javascript replace with regexp has strange behaviour - javascript

Maybe someone can give me a hint...
I have the following code and experience a strange behaviour in javascript (node.js):
var a = "img{http://my.image.com/imgae.jpg}"
var html = a.replace(/img\{(.*)\}/g, '<img src="$1" class="image">');
//result: <img src="http://my.image.com/imgae.jpg" class="image">
As you can see, the occurrence in the string (a markup thing) is replaced by an img tag with source as expected.
But now something strange. In the markup are probably several elements of type img{src}
var a = "img{http://my.image.com/imgae.jpg} some text between img{http://my.image.com/imgae.jpg}"
var html = a.replace(/img\{(.*)\}/g, '<img src="$1" class="image">');
//result: <img src="http://my.image.com/imgae.jpghttp://my.image.com/imgae.jpg" class="image">
The result is strange: $1 seems to collect and accumulate all of the matches, and there is only one image tag.
I am confused...

Try: a.replace(/img\{(.*?)\}/g, '<img src="$1" class="image">');
Adding ? after the * makes the quantifier non-greedy, so each match stops at the first closing brace.

Use a negated character class to stop at the first closing curly bracket:
var html = a.replace(/img\{([^}]*)\}/g, '<img src="$1" class="image">');

I think it's more important that you understand how this works. .* can be a dangerous pattern if you don't know what it will do: it is greedy and consumes as much as it can, and some linters warn against it.
If you break down your regex, you will find that img\{ matches the start of the string, (.*) matches http://my.image.com/imgae.jpg} some text between img{http://my.image.com/imgae.jpg, and the final \} matches the last closing }, because this is the longest string that matches the expression.
The best solution is to use ([^}]*), which matches anything except }, because you know that nothing between the image braces will be a closing brace.
You can test your regex to see what it is matching:
var reg = /img\{(.*)\}/ // no g flag, so match() also returns the capture groups
var a = "img{http://my.image.com/imgae.jpg} img{http://my.image.com/imgae.jpg}"
var groups = a.match(reg)
// groups[0] is the whole match; groups[1] is what the first group matched
// groups[1] === "http://my.image.com/imgae.jpg} img{http://my.image.com/imgae.jpg"
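To make the difference concrete, here is a minimal sketch that runs the three patterns discussed above (greedy, non-greedy, and negated character class) against the two-image string from the question. The variable names are only illustrative.

```javascript
// Same two-image input as in the question.
var a = "img{http://my.image.com/imgae.jpg} some text between img{http://my.image.com/imgae.jpg}";
var tag = '<img src="$1" class="image">';

var greedy  = a.replace(/img\{(.*)\}/g, tag);    // one match spanning both images
var lazy    = a.replace(/img\{(.*?)\}/g, tag);   // each match stops at the first }
var negated = a.replace(/img\{([^}]*)\}/g, tag); // a match can never cross a }

console.log(greedy);
console.log(lazy);
console.log(negated);
console.log(lazy === negated); // true for this input
```

The non-greedy and negated-class versions agree here. They can differ on multi-line input, because . does not match a newline by default while [^}] does.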

Related

How do I use JavaScript regex to search for everything except for one instance?

I am using JavaScript regex and would like to strip HTML tags out of a string except for one situation.
Let's take this string for example:
"<a>link me</a>
<p class="highlight">paragraph</p>
<replace meta="data"></replace>"
I would like to use string replace to transform it into:
(all HTML tags are stripped except for <[/?]replace[.*]>)
"link me paragraph <replace meta="data"></replace>"
The regex for removing all tags would be:
html = String(html).replace(/<[^>]+>/gm, '');
How would one go about placing the exception for <replace> and </replace> in there?
Use negative lookahead:
/(?!<\/?replace)<[^>]+>/gm
The (?!<\/?replace) negative lookahead asserts that <[^>]+> cannot match if it's a replace opening or closing tag.
Regex101
var str = `<a>link me</a>
<p class="highlight">paragraph</p>
<replace meta="data">DO NOT REPLACE</replace>`;
var re = /(?!<\/?replace)<[^>]+>/gm;
document.querySelector('pre').textContent = str.replace(re, '');
<pre></pre>
The classic approach is to first match and capture what you want to keep (in this case, the <replace> tags), then as an alternative match everything else you don't want to keep (in this case, all other tags), then replace what matched with the captured content, which will have the effect of throwing away the unwanted tags:
var string = `<a>link me</a>
<p class="highlight">paragraph</p>
<replace meta="data"></replace>`;
var re = /(<\/?replace.*?>)|<.*?>/g;
// ^^^^^^^^^^^^^^^^^ CAPTURE WHAT WE WANT TO KEEP
// ^^^^^ DON'T CAPTURE WHAT WE DON'T WANT TO KEEP
var result = string.replace(re, '$1');
// ^^^^ REPLACE WITH CAPTURE
document.getElementById('result').textContent = result;
<pre id="result"></pre>
Negative look-ahead also is a fine solution, but some might consider this approach a bit more straightforward.
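As a rough comparison that runs outside the browser (no DOM needed), the sketch below applies both approaches to the sample string and checks that they produce the same text. The variable names are assumptions made for illustration.

```javascript
var input = '<a>link me</a>\n<p class="highlight">paragraph</p>\n<replace meta="data"></replace>';

// Approach 1: the negative lookahead refuses to match at <replace ...> and </replace>.
var viaLookahead = input.replace(/(?!<\/?replace)<[^>]+>/g, '');

// Approach 2: capture the tags to keep, match every other tag, and put back
// the capture (which is empty when the second alternative matched).
var viaCapture = input.replace(/(<\/?replace.*?>)|<.*?>/g, '$1');

console.log(viaLookahead);
console.log(viaLookahead === viaCapture); // true for this input
```

Both patterns only check that the tag name starts with replace, so a tag such as <replacement> would be kept as well. Adding \b after replace in either pattern narrows that down.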

how to replace all occurrence of string between two symbols?

I'm working with RegEx in JavaScript and this is where I am stuck.
I have a simple string like
<html><body><span style=3D"font-family:Verdana; color:#000; font-size:10pt;=
"><div><font face=3D"verdana, geneva" size=3D"2">http://72.55.146.142:8880/=
order003.png.zip,120</body></html>
All I need to do is write JavaScript that removes every substring enclosed in "<" and ">".
I wrote something like this -
var strReplaceAll = Body;
var intIndexOfMatch = strReplaceAll.indexOf( "<" );
while (intIndexOfMatch != -1){
strReplaceAll = strReplaceAll.replace(/<.*>/,'')
intIndexOfMatch = strReplaceAll.indexOf( "<" );
}
But the problem is that if Body contains
test<abc>test2<adg>
it gives me only
test
and if Body contains
<html>test<abc>test2<adg>
it gives me nothing. Please let me know how I can get
testtest2
as the final output.
Try this regex instead:
<[^>]+>
DEMO:
http://regex101.com/r/kI5cJ7/2
DISCUSSION
Put the html code in a string and apply to this string the regex.
var htmlCode = ...;
htmlCode = htmlCode.replace(/<[^>]+>/g, '');
The original regex consumes too many characters (* is a greedy quantifier).
Check this page about Repetition with Star and Plus, especially the part on "Watch Out for The Greediness!".
Most people new to regular expressions will attempt to use <.+>. They will be surprised when they test it on a string like This is a <EM>first</EM> test. You might expect the regex to match <EM> and when continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not what we wanted.
/(<.*?>)/g
Just use this, and replace all the occurrences with "". The g flag is needed so that every match is replaced, not only the first one.
See demo.
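A short sketch of the suggested fix applied to the two inputs from the question. The helper name stripTags is hypothetical; with the g flag a single replace() call is enough, so the while loop is no longer needed.

```javascript
// Hypothetical helper: removes anything that looks like <...>.
// [^>]+ cannot run past the first >, so each tag is matched on its own.
function stripTags(text) {
  return text.replace(/<[^>]+>/g, '');
}

var first = stripTags('test<abc>test2<adg>');
var second = stripTags('<html>test<abc>test2<adg>');

console.log(first);  // testtest2
console.log(second); // testtest2
```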

Performing a non-greedy regular expression match in JavaScript

My input string looks something like:
var someString = 'This is a nice little string with <a target="_" href="/carSale/12/..">link1</a>. But there is more that we want to do with this. Lets insert another <a target="_" href="/carSale/13/..">link2</a> ';
My end goal is to match every anchor element that has "carSale" within its href attribute and replace it with the text inside the anchor.
For example:
Replace <a target="_" href="/carSale/12/..">link1</a> with string link1
but it should not replace
<a target="_" href="/bikeSale/12/..">link3</a>
since the above href does not contain the string "carSale"
I have created a regular expression object for this. But it seems to be performing a greedy match.
var regEx = /(<a.*carSale.*>)(.*)(<\/a>)/;
var someArr = someString.match(regEx);
console.log(someArr[0]);
console.log(someArr[1]);
console.log(someArr[2]);
console.log(someArr[3]);
Appending the modifier 'g' at the end of the regular expression gives bizarre results.
Fiddle here :
http://jsfiddle.net/jameshans/54X5b/
Rather than using a regular expression, use a parser. This won't break as easily and uses the native (native as in the browser's) parser so is less susceptible to bugs:
var div = document.createElement("div");
div.innerHTML = someString;
// Get links
var links = div.querySelectorAll("a");
for (var i = 0; i < links.length; ++i) {
    var a = links[i];
    // If the link has an href containing the desired text
    if (a.href.indexOf("carSale") >= 0) {
        // Replace the element with its text; go through the link's own parent,
        // since the link need not be a direct child of div
        a.parentNode.replaceChild(document.createTextNode(a.textContent), a);
    }
}
See http://jsfiddle.net/prankol57/d72Vr/
However, if you are confident that your HTML will always follow the pattern specified by your regex, then you can use it. For the caveats, see the question "RegEx match open tags except XHTML self-contained tags".
Online Demo
I am not sure what your matching groups are, but how about this expression:
/^<a.*href="((?:.*)carSale(?:.*))".*>(.*)<\/a>$/
Note that in this expression I am matching href to contain carSale which I think is where you want the expression to match.
And since you want to replace the whole expression as I understand all you need to do is:
var result = '<a target="_" href="\/carSale/12\/..">link1<\/a>'.replace(/(^<a.*href="((?:.*)carSale(?:.*))".*>(.*)<\/a>$)/,"temp text");
Or this one:
/(<a.*?carSale.*?>)(.*?)(<\/a>)/
The ? makes your repeater non-greedy, so it eats as little as possible, versus the default behavior of * which is to eat as much as possible. So with the ? added, the (.*?) will stop at the first </a> rather than the last one
(<a[^>]*(href=\"([^>]*(?=carSale)[^>]*)\")[^>]*>)([^<]*)(<\/a>)*
Groups 3 and 4 are the ones you are interested in.
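Putting the ideas from these answers together, here is a minimal sketch that replaces every carSale link with its text and leaves other links alone. It assumes double-quoted href values and no nested tags inside the anchor; the bikeSale sentence is added to the sample string for illustration, and the pattern is one possible variation, not the exact expression from any answer above.

```javascript
var someString = 'This is a nice little string with <a target="_" href="/carSale/12/..">link1</a>. ' +
  'Not this one: <a target="_" href="/bikeSale/12/..">link3</a>. ' +
  'Lets insert another <a target="_" href="/carSale/13/..">link2</a> ';

// [^>]* keeps the match inside one opening tag, and [^"]* keeps it inside the
// href value, so a bikeSale link cannot "borrow" carSale from a later link.
var carSaleLink = /<a\b[^>]*href="[^"]*carSale[^"]*"[^>]*>(.*?)<\/a>/g;

// $1 is the link text captured by the non-greedy (.*?).
var result = someString.replace(carSaleLink, '$1');
console.log(result);
```

Using negated character classes instead of .*? for the tag and attribute parts matters once the g flag is on: a plain <a.*?carSale could start at the bikeSale anchor and stretch to the next carSale href.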

Extracting regex group in javascript not working

I am trying to extract an img's src using JavaScript's RegExp, yet I am not able to use the groups. I have come up with something like this so far:
>str
"xxxxx src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg" xxxxx"
>str.match(re)
["src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg""]
>re
/src=\"(.*?)\"/g
>str.match(re)[1]
undefined
Yet I only get the match for the whole pattern
Unsure why you'd want to do this with regex;
Simply use:
document.getElementById('idOfImgElement').src;
Or easier, with jQuery:
$('#idOfImgElement').attr('src');
But if you really want to do this with regex, use:
var str = "<img src=\"http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg\" />";
var matches = (/src=\"(.*?)\"/g).exec(str);
console.log(matches[1]);
How many possible img tags might be in your string? Assuming 1 then forget the global flag (and iterating through an exec) and simply make a regex that explains the whole string and use capture groups. Then you can specify, in your index of match, to return the capture group you know will represent it. Since, in these types of questions, there always seems to be a caveat not mentioned I made the expression more tight against some possibilities like other attributes in the element. So just in case it is more complex than you let on you can use this regex:
(?:<img )?[^>]+src=(["'])(.*?)\1
The quote capture group is necessary so that the opening and closing quotes match up. You show double quotes, but is that guaranteed? In this regex the second capture group is always going to be the URL (the contents of src, to be more precise). The non-greedy (.*?) stops at the first matching closing quote, so later attributes are not swallowed.
In code:
var str = '<img src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg" />';
var src = str.match(/(?:<img )?[^>]+src=(["'])(.*?)\1/)[2];
I have achieved this with regexp without gm flags
var re_imgsrc = /.*src=\"(.*?)\".*/
imageUrl = str.replace(re_imgsrc, "$1");
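The reason str.match(re)[1] is undefined in the question is the g flag: with it, match() returns only the full matches and drops the capture groups. The sketch below shows the three behaviours side by side; the variable names are illustrative.

```javascript
var str = 'xxxxx src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg" xxxxx';

// With g: an array of full matches only, so index 1 does not exist here.
var withGlobal = str.match(/src="(.*?)"/g);

// Without g: the full match at index 0, then the capture groups.
var withoutGlobal = str.match(/src="(.*?)"/);

// For every match plus its groups, loop with exec() on a global regex.
var re = /src="(.*?)"/g;
var urls = [];
var m;
while ((m = re.exec(str)) !== null) {
  urls.push(m[1]);
}

console.log(withGlobal[1]);    // undefined
console.log(withoutGlobal[1]); // the URL
console.log(urls);
```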

Javascript regex expression to replace multiple strings?

I have a string like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove the leading domain, the underscores and the percent-encoded parts?
For now I'm just doing some multiple replace, like
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and go on, but it's really ugly.. any help?
Thanks!
EDIT:
the exact output would be "My happy dog is cool!" so I would like to get rid of the initial address, remove the underscores and percent escapes, and put the spaces in the right place!
The problem is that trying to put a regex on Chrome "something goes wrong". Is it a problem of Chrome or my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
JS Fiddle demo.
The reason I took this approach is that RegEx is fairly expensive, and is often tricky to fine tune to the point where edge-cases become less troublesome; so I opted to use simple string manipulation to reduce the RegEx work.
Effectively the above creates a substring of the given str variable, from the index point of the lastIndexOf('/') (which does exactly what you'd expect) and adding 1 to that so the substring is from the point after the / not before it.
The regex: (_) matches an underscore, the | serves as an or operator, and (%\d{2,}) matches a % sign followed by two or more digit characters. Note that it only matches decimal digits, so an escape such as %2F would not be matched.
The parentheses around each alternative create capture groups; they are not required here, because replace() substitutes the whole match with the ' ' (single-space) string passed as its second argument.
References:
lastIndexOf().
replace().
substring().
You can decode the percent escapes with decodeURIComponent (the older unescape also works here but is deprecated):
str = decodeURIComponent("http://something.org/dom/My_happy_dog_%28is%29cool!")
str = str.replace("http://something.org/dom/","");
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
You haven't defined the problem very exactly. To get rid of all stretches of characters ending in %<digit><digit> you'd say
var re = /.*%\d\d/g;
var str = str.replace(re, "");
ok, if you want to replace all that stuff I think that you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
test
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");
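For comparison, here is a small sketch that produces the "My happy dog is cool!" output asked for in the question. It combines the lastIndexOf idea from above with a single replace; treating each run of underscores and percent escapes as one separator is an assumption about the desired result, not something the answers above state.

```javascript
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";

// Keep only the part after the last slash.
var lastSegment = str.substring(str.lastIndexOf('/') + 1);

// Collapse each run of underscores and %XX escapes (hex digits) into one space.
var cleaned = lastSegment.replace(/(?:_|%[0-9A-Fa-f]{2})+/g, ' ');
console.log(cleaned); // My happy dog is cool!

// Alternative: decode the escapes instead of dropping them.
var decoded = decodeURIComponent(lastSegment).replace(/_/g, ' ');
console.log(decoded); // My happy dog (is)cool!
```

Which variant fits depends on whether the characters behind the escapes (here the parentheses) should survive in the output.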
