I want to replace every src attribute inside a string of HTML, adding '?a' to the end of each URL, to fix a caching issue I'm having.
So far I've got this:
str.replace(/src=".*?"/gi, "src");
But I have no idea how to write the 2nd parameter so that it takes what was matched and adds the '?a' at the end.
Example:
<img src="mysite.com/logo.png" /> should become
<img src="mysite.com/logo.png?a" />
Thank you in advance, Daniel!
The first problem is that you are matching the entire src attribute instead of what's within the src attribute.
The second problem is that you aren't putting the match into your replacement.
This would be the correct regex:
str.replace(/src="(.*?)"/gi, 'src="$1?a"');
Adding brackets around the match .*? makes sure you match just that part instead of the whole regex, adding $1 into the second parameter makes sure you put the match back into the replacement.
You also have to re-enter src="" into the replacement, because the whole regex match will be replaced.
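As a quick check of the capture-group replacement described above, here is a minimal sketch. The function name addCacheSuffix and the sample markup are made up for illustration. Like the pattern in the answer, it only handles double-quoted src values and would add a second ? if a URL already had a query string.

```javascript
// Hypothetical helper: appends "?a" to every double-quoted src value.
function addCacheSuffix(html) {
  // $1 is whatever the (.*?) group captured, i.e. the URL itself.
  // The literal src="..." is written out again because the whole match is replaced.
  return html.replace(/src="(.*?)"/gi, 'src="$1?a"');
}

const sample = '<img src="mysite.com/logo.png" />';
console.log(addCacheSuffix(sample));
// → <img src="mysite.com/logo.png?a" />
```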
You can use this /src="(.*?)"/gi regex to find and replace the content.
I have the following code which replaces the current URL using JavaScript:
window.location.replace(window.location.href.replace(/\/?$/, '#/view-0'));
However if I have a URL like:
domain.com/#/test or domain.com/#/
It will append the #/view-0 to the current hash. What I want is to replace EVERYTHING after the last part of the URL, including any query strings or hashes.
So presume my regex doesn't handle that... How can I amend it, to be more aggressive?
The following syntax may help:
location.href.replace(/[?#].*$/, '#/view')
It will replace everything after (and together with) ? or # in the string with #/view.
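One thing to note about the pattern above: if the URL contains neither ? nor #, nothing matches and the string comes back unchanged. A small variation, sketched here with a hypothetical helper name (withHash) and made-up URLs, strips any query string or hash first and then always appends the new hash.

```javascript
// Hypothetical helper: drop everything from the first ? or # onward,
// then append the new hash unconditionally.
function withHash(url, hash) {
  return url.replace(/[?#].*$/, '') + hash;
}

console.log(withHash('domain.com/#/test', '#/view-0'));        // domain.com/#/view-0
console.log(withHash('domain.com/page?x=1#/old', '#/view-0')); // domain.com/page#/view-0
console.log(withHash('domain.com/', '#/view-0'));              // domain.com/#/view-0
```

In a browser, the result could then be passed to window.location.replace as in the question.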
(^[^\/]*?\/)(?:.*)
Use this, and replace with $1 (the first captured group) followed by your string.
See demo.
http://regex101.com/r/sA7pZ0/28
I can't post the exact data I'm trying to extract, but here's a basic scenario with the same outcome. I'm grabbing the body of a page and trying to extract a bit.ly link from it. So let's say, for example, this is the chunk of data I'm trying to grab the link from.
String:
http://bit.ly/Pq8AkS</div><div class="shareUnit"><div class="-cx-PRIVATE-fbTimelineExternalShareUnit__wrapper"><div><div class="-cx-PRIVATE-fbTimelineExternalShareUnit__root -cx-PRIVATE-fbTimelineExternalShareUnit__hasImage"><a class="-cx-PRIVATE-fbTimelineExternalShareUnit__video -cx-PRIVATE-fbTimelineExternalShareUnit__image -cx-PRIVATE-fbTimelineExternalShareUnit__content" ajaxify="/ajax/flash/expand_inline.php?target_div=uikk85_59&share_id=271663136271285&max_width=403&max_height=403&context=timelineSingle" rel="async" href="#" onclick="CSS.addClass(this, "-cx-PRIVATE-fbTimelineExternalShareUnit__loading");CSS.removeClass(this, "-cx-PRIVATE-fbTimelineExternalShareUnit__video");"><i class="-cx-PRIVATE-fbTimelineExternalShareUnit__play"></i><img class="img" src="http://external.ak.fbcdn.net/safe_image.php?d=AQDoyY7_wjAyUtX2&w=155&h=114&url=http%3A%2F%2Fi1.ytimg.com%2Fvi%2FDre21lBu2zU%2Fmqdefault.jpg" alt="" /></a>
Now, I can get what I'm looking for with the following code, but the link isn't always going to be exactly 6 characters long, so this causes an issue...
Body = document.getElementsByTagName("body")[0].innerHTML;
regex = /2Fbit.ly%2F(.{6})&h/g;
Matches = regex.exec(Body);
Here's what I was originally trying, but the problem is that it grabs too much data. It goes all the way to the last "&h" in the string above instead of stopping at the first one it hits.
Body = document.getElementsByTagName("body")[0].innerHTML;
regex = /2Fbit.ly%2F(.*)&h/g;
Matches = regex.exec(Body);
So basically the main part of the string I'm trying to focus on is "%2Fbit.ly%2FPq8AkS&h" so that I can get the "Pq8AkS" out of it. When I use the (.*) it grabs everything between "%2F" and the very last "&h" in the large string above.
You should not be using a regex on HTML. Use DOM functions to get the desired link object, then get the href attribute from that, then you can use a regex on just the href.
By default .* is greedy meaning that it matches the most it can match and still find a match. If you want it to be non-greedy (match the least possible), you can use this .*? instead like this:
regex = /2Fbit.ly%2F(.*?)&h/;
I also don't think you want the g flag on the regex as there should only be one match in the right URL.
If you show the rest of your HTML, we could offer advice on finding the right link object rather than trying to match the entire body HTML.
FYI, another trick for a non-greedy match is to do something like this:
regex = /2Fbit.ly%2F([^&]*)&h/;
Which matches a series of characters that are not & followed by &h which accomplishes the same goal as long as & can't be in the matched sequence.
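To see the three variants side by side, here is a minimal sketch. The input string is a shortened, made-up stand-in for the page body (an assumption, not the asker's real data), with two "&h" occurrences so the greedy pattern has something to overshoot to. The dot in bit.ly is escaped here so it only matches a literal dot.

```javascript
// Made-up sample containing two "&h" sequences.
const body = 'u=http%3A%2F%2Fbit.ly%2FPq8AkS&h=AAAA&w=155&h=114';

const greedy  = /2Fbit\.ly%2F(.*)&h/.exec(body)[1];     // runs to the LAST "&h"
const lazy    = /2Fbit\.ly%2F(.*?)&h/.exec(body)[1];    // stops at the FIRST "&h"
const negated = /2Fbit\.ly%2F([^&]*)&h/.exec(body)[1];  // cannot cross any "&"

console.log(greedy);  // Pq8AkS&h=AAAA&w=155
console.log(lazy);    // Pq8AkS
console.log(negated); // Pq8AkS
```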
By default + and * are greedy and match as much as possible. You need a non-greedy match for your (.*). A quick search gives the solution as
? directly following a quantifier makes the quantifier non-greedy (makes it match minimum instead of maximum of the interval defined).
So try changing your regex= line to
regex = /2Fbit.ly%2F(.*?)&h/g;
Edit: @jfriend00's answer below is more complete.
I have a string that contains HTML image elements that is stored in a var.
I want to remove the image elements from the string.
I have tried: var content = content.replace(/<img.+>/,"");
and: var content = content.find("img").remove(); but had no luck.
Can anyone help me out at all?
Thanks
var content = content.replace(/<img[^>]*>/g,"");
[^>]* means any number of characters other than >. If you use .+ instead and there are multiple tags, the replace operation removes them all at once, including any content between them. Quantifiers are greedy by default, meaning they take the largest possible valid match.
/g at the end means replace all occurrences (by default, it only removes the first occurrence).
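The difference between the greedy .+ and the negated class, and the effect of the g flag, can be sketched as follows. The sample markup is made up for illustration.

```javascript
// Made-up sample with two image tags and text between them.
const html = '<p>a<img src="x.png">b<img src="y.png">c</p>';

// Greedy .+ runs to the last ">" in the string, swallowing everything after the first <img.
const greedyResult = html.replace(/<img.+>/, '');

// [^>]* stops at the tag's own ">", but without g only the first tag is removed.
const firstOnly = html.replace(/<img[^>]*>/, '');

// With g, every tag is removed and the text between them survives.
const allRemoved = html.replace(/<img[^>]*>/g, '');

console.log(greedyResult); // <p>a
console.log(firstOnly);    // <p>ab<img src="y.png">c</p>
console.log(allRemoved);   // <p>abc</p>
```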
$('<p>').html(content).find('img').remove().end().html()
The following Regex should do the trick:
var content = content.replace(/<img[^>"']*((("[^"]*")|('[^']*'))[^"'>]*)*>/g,"");
It first matches the <img. Then [^>"']* matches any character except for >, " and ' any number of times. Then (("[^"]*")|('[^']*')) matches two " with any character in between (except " itself, which is this part [^"]*) or the same thing, but with two ' characters.
An example of this would be "asf<>!('" or 'akl>"<?'.
This is again followed by any character except for >, " and ' any number of times. The Regex concludes when it finds a > outside a set of single or double quotes.
This would then account for having > characters inside attribute strings, as pointed out by @Derek 朕會功夫, and would therefore match and remove all four image tags in the following test scenario:
<img src="blah.png" title=">:(" alt=">:)" /> Some text between <img src="blah.png" title="<img" /> More text between <img /><img src='asdf>' title="sf>">
This is of course inspired by @Matt Coughlin's answer.
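The test scenario above can be run directly. This sketch compares the quote-aware pattern with the simpler /<img[^>]*>/g on that same input; the variable names are made up for illustration.

```javascript
// The test scenario from the answer, in a template literal because it mixes ' and ".
const input = `<img src="blah.png" title=">:(" alt=">:)" /> Some text between <img src="blah.png" title="<img" /> More text between <img /><img src='asdf>' title="sf>">`;

// Quote-aware pattern: a ">" inside a quoted attribute value does not end the tag.
const quoteAware = /<img[^>"']*((("[^"]*")|('[^']*'))[^"'>]*)*>/g;
// Simple pattern: stops at the first ">" it sees, even inside a quoted value.
const simple = /<img[^>]*>/g;

const cleaned = input.replace(quoteAware, '');
const partlyCleaned = input.replace(simple, '');

console.log(cleaned.includes('<img')); // false: all four tags are gone
console.log(partlyCleaned);            // still contains attribute debris such as :(
```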
Use the text() function, it will remove all HTML tags!
var content = $("<p>"+content+"</p>").text();
I'm in IE right now...this worked great, but my tags come out in upper case (after using innerHTML, i think) ... so I added "i" to make it case insensitive. Now Chrome and IE are happy.
var content = content.replace(/<img[^>]*>/gi,"");
Does this work for you?:
var content = content.replace(/<img[^>]*>/g, '')
You could load the text as a DOM element, then use jQuery to find all images and remove them. I generally try to treat XML (html in this case) as XML and not try to parse through the strings.
var element = $('<p>My paragraph has images like this <img src="foo"/> and this <img src="bar"/></p>');
element.find('img').remove();
newText = element.html();
console.log(newText);
To do this without regex or libraries (read jQuery), you could use DOMParser to parse your string, then use plain JS to do any manipulations and re-serialize to get back your string.
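A minimal sketch of that DOMParser approach, assuming a browser environment (DOMParser is not available in plain Node.js). The function name removeImages is made up for illustration.

```javascript
// Hypothetical helper, browser only: parse the string, remove the images, serialize back.
function removeImages(html) {
  // Parse as an HTML document; the fragment ends up inside <body>.
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Remove every <img> element from the parsed tree.
  doc.querySelectorAll('img').forEach(function (img) {
    img.remove();
  });
  // Re-serialize the body contents to a string.
  return doc.body.innerHTML;
}
```

Because the HTML parser does the tokenizing, quoted > characters and upper-case tag names need no special handling.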
I'm trying to parse and amend some html (as a string) using javascript and in this html, there are references (like img src or css backgrounds) to filenames which contain full stops/periods/dots/.
e.g.
<img src="../images/filename.01.png"> <img src="../images/filename.02.png">
<div style="background:url(../images/file.name.with.more.dots.gif)">
I've tried, struggled and failed to come up with a neat regex to allow me to parse this string and spit it back out without the dots in those filenames, e.g.
<img src="../images/filename01.png"/> <img src="../images/filename02.png"/>
<div style="background:url(../images/filenamewithmoredots.gif)">
I only want to affect the image filenames, and obviously I want to leave the filetype alone.
A regex like:
/(.*)(?=(.gif|.png|.jpg|.jpeg))/
allows me to match the main part of the filename and the extension separately, but it also matches across the whole of the string, not just within the one filename I want.
I have no control over the incoming html, I'm just consuming it.
Help me please overflowers, you're my only hope!
I agree that this is not a problem suitable for regular expression, much less one neat expression.
But I trust that you are not here to hear that. So, in case you want to keep the input as string...
var src, result = '<img src="../images/filename.01.png"> <img src="../images/filename.02.png"><div style="background:url(../images/file.name.with.more.dots.gif)">';
do {
src = result;
result = src.replace( /((?:url(\()|href=|src=)['"]?(?:[^'"\/]*\/)*[^'"\/]*)\.(?=[^\.'")]*\.(?:gif|png|jpe?g)['")>}\s])/g, '$1' );
} while (result != src)
Basically, it keeps removing the second-to-last dot in each image URL's filename until only the extension dot is left. Here is a breakdown of the expression in case you need to modify it. Tread lightly:
( Start main capturing group, since JS regex has no lookbehind (true when this was written; ES2018 later added it).
(?:url(\()|href=|src=)['"]? Start of an url. it would be safer to force url() to be properly quoted so that we can use back reference, but unfortunately your given example is not.
(?:[^'"\/]*\/)* Folder part of the url.
[^'"\/]* Part of the file name that comes before second last dot.
) close main group.
\. This is the second last dot we want to get rid of.
(?= Lookahead.
[^\.'")]* Part of the file name that goes between second last dot and last dot.
\.(?:gif|png|jpe?g) Make sure the url ends in image extension.
['")>}\s] Closing the url, which can be a quote, ')', '>', '}', or spaces. Should use a back reference here if possible. (Was ['"]?\b when first answered)
) End of lookahead.
Consider using the DOM instead of regular expressions. One way is to create fake elements.
var fake = document.createElement('div');
fake.innerHTML = incomingHTML; // Not really part of JS standard but all the 'main' browsers support it
var background = fake.childNodes[0].style.background;
// Now use a regex if need be: /url\(\"?(.*)\"?\)/
// If img is at childNodes[1]
var url = fake.childNodes[1].src;
With jQuery this is far easier:
$(incomingHTML).find('img').each(function() { $(this).attr('src'); });
Your problem is the greedy match in .*. Maybe better try something like this
([^\/]*)(?=(.gif|.png|.jpg|.jpeg))
[^\/] is a character class that matches every character but slashes
another point is, you need to escape the . to match it literally
([^\/]*)(?=\.(gif|png|jpg|jpeg))
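As a rough illustration of that last pattern, the sketch below applies it to a single path (not to a whole HTML string) and strips the dots from the captured name. The function name stripDots is hypothetical, and the paths are taken from the question.

```javascript
// Matches the file name (no slashes) that sits right before an image extension.
const imageName = /([^\/]*)(?=\.(gif|png|jpg|jpeg))/;

// Hypothetical helper: removes the dots inside the file name of ONE path string.
function stripDots(path) {
  // The lookahead is zero-width, so the whole match is just the name part;
  // the extension dot is outside the match and stays untouched.
  return path.replace(imageName, function (name) {
    return name.replace(/\./g, '');
  });
}

console.log(stripDots('../images/filename.01.png'));
// → ../images/filename01.png
console.log(stripDots('../images/file.name.with.more.dots.gif'));
// → ../images/filenamewithmoredots.gif
```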
The problem is that . means "any character".
Escape it:
/(.*)(?=(\.gif|\.png|\.jpg|\.jpeg))/
Because of the way that jQuery deals with script tags, I've found it necessary to do some HTML manipulation using regular expressions (yes, I know... not the ideal tool for the job). Unfortunately, it seems like my understanding of how captured groups work in JavaScript is flawed, because when I try this:
var scriptTagFormat = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/ig;
html = html.replace(
scriptTagFormat,
'<span class="script-placeholder" style="display:none;" title="$2">$3</span>');
The script tags get replaced with the spans, but the resulting title attribute is blank. Shouldn't $2 match the content of the src attribute of a script tag?
Nesting of groups is irrelevant; their numbering is determined strictly by the positions of their opening parentheses within the regex. In your case, that means it's group #1 that captures the whole src="value" sequence, and group #2 that captures just the value part.
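The numbering rule can be seen with a small sketch. The sample tag is made up for illustration.

```javascript
// Two nested groups: the outer "(" opens first, so it is group 1; the inner one is group 2.
const m = '<script src="app.js">'.match(/(src="(.*?)")/);

console.log(m[1]); // src="app.js"  (outer group: the whole attribute)
console.log(m[2]); // app.js        (inner group: just the value)
```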
Try this:
/<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/ig
See here: rubular
As stema wrote, the .*? matches too much. With the negative lookahead (?:(?!src).)* you will match only until a src attribute.
But actually in this case you could also just move the .*? into the optional part:
/<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig
See here: rubular
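To make the difference concrete, here is a sketch that runs the question's pattern and the pattern with .*? moved into the optional part on the same input. The sample tag is made up, with another attribute placed before src so the problem shows up; the replacement strings use $2/$3 and $1/$2 respectively because the group numbering differs.

```javascript
// Made-up sample: src is NOT the first attribute.
const html = '<script type="text/javascript" src="foo.js">var x;</script>';

// Pattern from the question: the optional group is tried right after "<script ",
// fails to find src= there, and is skipped, so $2 is empty.
const original = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/ig;
// Pattern with .*? moved inside the optional group: the group can now reach src=.
const fixed = /<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig;

const before = html.replace(original, '<span title="$2">$3</span>');
const after = html.replace(fixed, '<span title="$1">$2</span>');

console.log(before); // <span title="">var x;</span>
console.log(after);  // <span title="foo.js">var x;</span>
```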
The .*? matches too much because the following group is optional, so your src attribute gets consumed by one of the surrounding .*? instead of by the group. If you remove the ? after your first group, it works.
Update: As @morja pointed out, your solution is to move the first .*? into the optional src part.
Just for completeness: /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/ig
You can see it here on rubular (corrected my link also)
If you don't want to use the content of the first capturing group, then make it a non capturing group using (?:)
/<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig
Then your wanted result is in $1 and $2.
Could you post the html you are retrieving? Your code works fine in a simple example: jsfiddle (warning: alert box)
My first guess is that one of your script tags does not have a src meaning you are left with a single capture group (the script contents).
I'm thinking that regular expressions by themselves can't do exactly what I'm looking for, so here's my modification to work around the problem:
var scriptTagFormat = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/ig;
html = html.replace(
scriptTagFormat,
'<span class="script-placeholder" style="display:none;" $1>$4</span>');
Before, I wanted to avoid setting non-standard attributes on the replacement span. This code blindly copies all attributes instead. Luckily, the non-standard attributes aren't stripped out of the DOM when I insert the HTML, so it will work for my purposes.