Remove special character of substring only by Regular Expression - javascript

<html lang="en">
;;;;;;
<body>
<script>
var a = 3;
var b = a * 10;
</script>
</body>
</html>
This is the content of a string variable in my project.
I want to remove the ; characters in the JavaScript code, but not the ; characters above <body>.
That is, only remove ; between <script> and </script>.
Is this possible with a regular expression?
The purpose is to remove unnecessary characters from the string, which can include JavaScript and other languages.
As you know, JavaScript allows statements to be written with or without semicolons.

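Short answer
You can try a regex lookbehind assertion.
Drawback: lookbehind has limited browser support.
Long answer
You can use a capturing group to find the semicolon that needs to be removed, with the following pattern:
/(<script>[\S\s]*);(?=[\s\S]*<\/script>)/g
Then use the String.replace() method with "$1" as the replacement to remove the semicolon while keeping the captured text. Because [\S\s]* is greedy, each pass removes only one semicolon (the last one before the final </script>), so the replacement has to be repeated until the string stops changing.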
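As an alternative to a single pattern, here is a minimal sketch that first matches each script block and then strips semicolons only inside it, using a replacer function. The function name stripScriptSemicolons and the sample page string are made up for illustration. It assumes the script tags are well formed and that no semicolon inside the script is actually required (for example inside a for (;;) header or a string literal), which is a strong assumption for real code.

```javascript
// Hypothetical helper: remove ";" only inside <script>...</script> blocks.
function stripScriptSemicolons(html) {
  // Group 1: opening tag, group 2: script body (lazy), group 3: closing tag.
  const scriptBlock = /(<script\b[^>]*>)([\s\S]*?)(<\/script>)/gi;
  return html.replace(scriptBlock, (match, open, body, close) =>
    open + body.replace(/;/g, "") + close
  );
}

// Sample input modelled on the string from the question.
const page = [
  '<html lang="en">',
  ";;;;;;",
  "<body>",
  "<script>",
  "var a = 3;",
  "var b = a * 10;",
  "</script>",
  "</body>",
  "</html>",
].join("\n");

console.log(stripScriptSemicolons(page));
```

The semicolons above <body> are left alone because they never fall inside a matched script block.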

Related

Getting Javascript Regular Expression from Page

I'm trying to grab a regular expression from an HTML element so that I can use it for evaluation later, but for some reason when I grab the RE from the DOM it's different than when I statically define it in my Javascript. Here is my testing code:
<p id="test">\\b31\\b</p>
<script>
var j = document.getElementById('test').innerHTML;
console.log(j);
var i = "\\b31\\b"
console.log(i);
</script>
The results of this are \\b31\\b for j and \b31\b for i. Why isn't j also \b31\b? More importantly, how do I fix this? Because my regular expression evaluation later won't work with j, and I need to be able to grab regular expressions from the page to evaluate later, but right now I can't unless I statically define them.
I don't know exactly why this is happening, but you can simply fix it with:
var j = document.getElementById('test').innerHTML.replace(/\\\\/g,"\\");
See the example here:
https://jsfiddle.net/wjnm15gL/1/
You cannot "fix" this. By design, the backslash in JS is used to escape special characters, such as newlines (\n). If you want to use a literal backslash, a double backslash has to be used.
If you want to use two backslashes, use four:
console.log('\\\\'); // logs \\
In a JavaScript string literal, a \ character starts an escape sequence, so the character after it is interpreted rather than kept as-is. Try this in your console to see:
console.log('It\'s a wonderful day');
//It's a wonderful day
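That's why your j and i variables log differently to the console: the string literal assigned to i goes through escape processing, while innerHTML returns the characters exactly as they appear in the markup.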
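To make the difference concrete, here is a small sketch that runs without a DOM. It uses String.raw to stand in for text read from the page, since String.raw keeps backslashes verbatim the way innerHTML does; the variable names are illustrative only. It suggests that writing single backslashes in the markup avoids the problem, and that collapsing doubled backslashes is only needed when the markup cannot be changed.

```javascript
// Stand-in for text read from markup such as <p id="test">\b31\b</p>.
const fromPage = String.raw`\b31\b`;
// The same pattern written as a string literal needs doubled backslashes.
const fromLiteral = "\\b31\\b";
console.log(fromPage === fromLiteral); // true

// Either string can be handed to the RegExp constructor.
const re = new RegExp(fromPage);
console.log(re.test("room 31"));  // true
console.log(re.test("room 310")); // false

// If the markup really contains doubled backslashes, collapse each pair.
const doubled = String.raw`\\b31\\b`;
const collapsed = doubled.replace(/\\\\/g, "\\");
console.log(collapsed === fromPage); // true
```

When reading from a real element, textContent may be a safer choice than innerHTML, because innerHTML returns characters such as < and & as HTML entities.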

Js ReGex non-capturing group not working

My problem is that I need to capture a script src, but only if it is preceded by a script tag.
Here are my regex and the options I tried:
String: <script src="http://example.net"></script>
Regex: /(?:\<script[^]+src=("|'))([^]+)(?="|')/g
Match: <script src="http://example.net
Second option:
String: <script src="http://example.net"></script>
Regex: /(?!\<script[^]+src=("|'))([^]+)(?="|')/g
Match: script src="http://example.net
What I need to get is: http://example.net
I really appreciate any help.
This is the tool I'm using for testing: http://www.regexr.com/
Thanks,
A regular expression is not the right tool for parsing HTML, but to fix the problem you can call the exec() method in a loop and push the captured group of each match into an array.
var s = '<script src="http://foo.net"></script><script src="http://bar.com"></script>';
var re = /<script[^>]+?src=['"]([^'"]+)['"]/g,
    matches = [],
    m;
// With the g flag, each exec() call resumes from re.lastIndex.
while ((m = re.exec(s)) !== null) {
  matches.push(m[1]); // m[1] is the captured src value
}
console.log(matches); //=> [ 'http://foo.net', 'http://bar.com' ]
Not sure exactly what you're trying to do or where you got that syntax.
If you want the values of the src attribute in all script tags, why not just search for /<script[^>]*\ssrc="([^"]*)"/ and examine the first subexpression match?
The [^]+ syntax is specific to JavaScript regular expressions: an empty negated character class means "anything that is not nothing", so it matches every character, including newlines, one or more times. Because it is greedy, it runs far past the point where you want it to stop.
To match the characters inside the tag that come before the attribute you want, use [^>]+? instead, with a lazy quantifier.
For the second [^], since it sits between quotes, you only need to replace it with [^"'], which excludes quotes.
The result you need is not the whole match but the content of the capture group.
<script[^>]+?src=["']([^"']+)["']
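A short sketch of how the capture group from this pattern can be collected for every script tag, here with String.prototype.matchAll (ES2020) instead of an exec() loop. The sample string and variable names are illustrative, and the pattern still assumes reasonably regular markup.

```javascript
// Sample markup with both quote styles (illustrative input).
const html =
  '<script src="http://foo.net"></script>' +
  "<script src='http://bar.com'></script>";

// The g flag is required by matchAll; group 1 holds the src value.
const srcPattern = /<script[^>]+?src=["']([^"']+)["']/g;

// Keep only the captured group of each match.
const urls = Array.from(html.matchAll(srcPattern), (m) => m[1]);

console.log(urls); // [ 'http://foo.net', 'http://bar.com' ]
```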
Here's a start for you:
/<script src=\"(.*)(?=\")/g
Retrieve the value of the first capturing group returned by this expression.
Here is the regexr.com result:
String: <script src="http://example.net"></script>
Regex: /(?:<script src=")([^"]+)/g
group#1: http://example.net
And here is the example JavaScript code:
var s = '<script src="http://example.net"></script>';
var url = s.split(/(?:<script src=")([^"]+)/g)[1];
When this answer was written, JavaScript did not support lookbehind assertions (they were added in ES2018), so you could not both match only the URL and check that a script tag precedes it. As an alternative to lookbehind, this is the fastest and easiest solution that I know.
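In engines that implement ES2018 lookbehind assertions, the "only if a script tag comes before it" condition can be written directly, so that the whole match is the URL. This is a sketch under that assumption; older browsers will reject the pattern with a syntax error, and the sample string is made up for illustration.

```javascript
// Illustrative input: one script tag and one unrelated img tag.
const s =
  '<script src="http://example.net"></script>' +
  '<img src="http://other.example/pic.png">';

// (?<=...) asserts that a script tag's src=" (or src=') sits right before the match.
const srcAfterScript = /(?<=<script\b[^>]*\ssrc=["'])[^"']+/g;

const scriptUrls = s.match(srcAfterScript);

console.log(scriptUrls); // [ 'http://example.net' ]
```

The img src is skipped because the lookbehind cannot find <script before it without crossing a > character.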

Javascript Regular Expression: Only matching the last pattern

Context: I have some dynamically generated HTML which can have embedded javascript function calls inside. I'm trying to extract the function calls with a regular expression.
Sample HTML string:
<dynamic html>
<script language="javascript">
funcA();
</script>
<a little more dynamic html>
<script language="javascript">
funcB();
</script>
My goal is to extract the text "funcA();" and "funcB();" from the above snippet (either as a single string or an array with two elements would be fine). The regular expression I have so far is:
var regexp = /[\s\S]*<script .*>([\s\S]*)<\/script>[\s\S]*/gm;
Using html_str.replace(regexp, "$1") only returns "funcB();".
Now, this regexp works just fine when there is only ONE set of <script> tags in the HTML, but when there are multiple it only returns the LAST one when using the replace() method. Even removing the '/g' modifier matches only the last function call. I'm still a novice to regular expressions so I know I'm missing something fundamental here... Any help in pointing me in the right direction would be greatly appreciated. I've done a bit of research already but still haven't been able to get this issue resolved.
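Your wildcard matches are all greedy, so each one matches as much as it possibly can rather than just what you expect. The leading [\s\S]* therefore swallows everything up to the last <script> tag, which is why only funcB(); is captured.
Make them all non-greedy (.*? and [\s\S]*?) and it should work.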
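Here is a minimal sketch of the non-greedy approach applied to the sample HTML from the question. Instead of replace(), it collects the capture group of every match with matchAll (ES2020), which yields the array form the question mentions. The variable names are illustrative.

```javascript
// The sample HTML string from the question, joined with newlines.
const htmlStr = [
  "<dynamic html>",
  '<script language="javascript">',
  "funcA();",
  "</script>",
  "<a little more dynamic html>",
  '<script language="javascript">',
  "funcB();",
  "</script>",
].join("\n");

// [\s\S]*? is lazy, so each match stops at the nearest closing tag.
const scriptBody = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;

// Keep the captured body of each script block, without surrounding whitespace.
const calls = Array.from(htmlStr.matchAll(scriptBody), (m) => m[1].trim());

console.log(calls); // [ 'funcA();', 'funcB();' ]
```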

Javascript Regular Expression Match

Try
<script type="text/javascript">
var str=">1 people>9 people>1u people";
document.write(str.match(/>.*people/img).length);
</script>
at http://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_regexp_dot.
This code should return an array of size 3, but it returns an array of size 1.
What is the problem?
The .* part of your regexp is being "greedy" and taking as many characters as it can, in this case returning the entire string as a single match.
Write it like this instead, with a trailing ?:
str.match(/>.*?people/img)
See the section describing "?" in the Mozilla Developer Network JS Reference.
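The difference can be checked side by side with the string from the question. This is just a small comparison sketch; the variable names greedy and lazy are illustrative.

```javascript
const str = ">1 people>9 people>1u people";

// Greedy: .* runs to the last "people", so the whole string is one match.
const greedy = str.match(/>.*people/gim);

// Lazy: .*? stops at the first "people" after each ">".
const lazy = str.match(/>.*?people/gim);

console.log(greedy.length, greedy); // 1 [ '>1 people>9 people>1u people' ]
console.log(lazy.length, lazy);     // 3 [ '>1 people', '>9 people', '>1u people' ]
```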

Regex to remove all but file name from links

I am trying to write a regexp that removes file paths from links and images.
href="path/path/file" to href="file"
href="/file" to href="file"
src="/path/file" to src="file"
and so on...
I thought that I had it working, but it messes up if there are two paths in the string it is working on. I think my expression is too greedy. It finds the very last file in the entire string.
This is my code that shows the expression messing up on the test input:
<script type="text/javascript" src="/javascripts/jquery.js"></script>
<script type="text/javascript">
$(document).ready(function(){
var s = '<img src="/one/two/keep.this">';
var t = s.replace(/(src|href)=("|').*\/(.*)\2/gi,"$1=$2$3$2");
alert(t);
});
</script>
It gives the output:
The correct output should be:
<img src="keep.this">
Thanks for any tips!
It doesn't have to be a regular expression (assuming / delimiters):
var fileName = url.split('/').pop(); //pop takes the last element
I would suggest running separate regex replacements, one for a links and another for img tags. That is easier and clearer, and thus more maintainable.
This seems to work in case anyone else has the problem:
var t = s.replace(/(src|href)=('|")([^ \2]*\/)*\/?([^ \2]*)\2/gi,"$1=$2$4$2");
Try adding ? to make the * quantifiers non-greedy. You want them to stop matching when they encounter the ending quote character. The greedy versions will barrel right on past the ending quote if there's another quote later in the string, finding the longest possible match; the non-greedy ones will find the shortest possible match.
/(src|href)=("|').*?\/([^/]*?)\2/gi
Also I changed the second .* to [^/]* to allow the first .* to still match the full path now that it's non-greedy.
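One way to combine the answers above is to let the regex find each quoted attribute value and leave the path handling to split('/').pop() in a replacer function. The sketch below assumes attribute values are quoted and contain no query strings; the function name stripPaths is made up for illustration. A value ending in / would come out empty.

```javascript
// Hypothetical helper: reduce src/href values to their last path segment.
function stripPaths(html) {
  // Group 1: attribute name, group 2: quote character, group 3: the value (lazy).
  const attr = /\b(src|href)=(["'])(.*?)\2/gi;
  return html.replace(attr, (match, name, quote, value) =>
    name + "=" + quote + value.split("/").pop() + quote
  );
}

// Two paths in one string, the case that tripped up the greedy pattern.
const input = '<a href="path/path/file"><img src="/one/two/keep.this">';

console.log(stripPaths(input)); // <a href="file"><img src="keep.this">
```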
