Remove image elements from string - javascript

I have a string that contains HTML image elements that is stored in a var.
I want to remove the image elements from the string.
I have tried: var content = content.replace(/<img.+>/,"");
and: var content = content.find("img").remove(); but had no luck.
Can anyone help me out at all?
Thanks

var content = content.replace(/<img[^>]*>/g,"");
[^>]* means any number of characters other than >. If you use .+ instead and the string contains multiple tags, the replace operation removes them all at once, including any content between them. Quantifiers are greedy by default, meaning they take the largest possible valid match.
/g at the end means replace all occurrences (by default, it only removes the first occurrence).
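A quick way to see the difference, as a sketch with a made-up sample string (the sample and the variable names are illustrative, not from the question):

```javascript
// Hypothetical sample input with two image tags.
const sample = 'a <img src="1.png"> b <img src="2.png"> c';

// Greedy .+ runs from the first <img to the last > on the line,
// so the text between the two tags is removed as well.
const greedy = sample.replace(/<img.+>/, "");

// [^>]* stops at the first >, and /g repeats the match for every tag.
const perTag = sample.replace(/<img[^>]*>/g, "");

console.log(JSON.stringify(greedy)); // "a  c"
console.log(JSON.stringify(perTag)); // "a  b  c"
```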

$('<p>').html(content).find('img').remove().end().html()

The following Regex should do the trick:
var content = content.replace(/<img[^>"']*((("[^"]*")|('[^']*'))[^"'>]*)*>/g,"");
It first matches the <img. Then [^>"']* matches any character except for >, " and ' any number of times. Then (("[^"]*")|('[^']*')) matches two " with any character in between (except " itself, which is this part [^"]*) or the same thing, but with two ' characters.
An example of this would be "asf<>!('" or 'akl>"<?'.
This is again followed by any character except for >, " and ' any number of times. The Regex concludes when it finds a > outside a set of single or double quotes.
This would then account for having > characters inside attribute strings, as pointed out by #Derek 朕會功夫 and would therefore match and remove all four image tags in the following test scenario:
<img src="blah.png" title=">:(" alt=">:)" /> Some text between <img src="blah.png" title="<img" /> More text between <img /><img src='asdf>' title="sf>">
This is of course inspired by #Matt Coughlin's answer.
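To check this against the test scenario, here is a small sketch that runs both the simple pattern and the quote-aware pattern on that string (the variable names are just for illustration):

```javascript
// Test string from the scenario above: four image tags, some with > inside quotes.
const input = `<img src="blah.png" title=">:(" alt=">:)" /> Some text between <img src="blah.png" title="<img" /> More text between <img /><img src='asdf>' title="sf>">`;

// Simple pattern: stops at the first >, even when it sits inside a quoted attribute.
const simple = input.replace(/<img[^>]*>/g, "");

// Quote-aware pattern: skips over quoted attribute values before looking for >.
const quoteAware = input.replace(/<img[^>"']*((("[^"]*")|('[^']*'))[^"'>]*)*>/g, "");

console.log(JSON.stringify(simple));     // still contains leftover attribute fragments
console.log(JSON.stringify(quoteAware)); // only the text between the tags remains
```

Keep in mind that this only covers > inside quoted attribute values; it is still a regex and not an HTML parser.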

Use the text() function, it will remove all HTML tags!
var content = $("<p>"+content+"</p>").text();

I'm in IE right now...this worked great, but my tags come out in upper case (after using innerHTML, i think) ... so I added "i" to make it case insensitive. Now Chrome and IE are happy.
var content = content.replace(/<img[^>]*>/gi,"");

Does this work for you?:
var content = content.replace(/<img[^>]*>/g, '')

You could load the text as a DOM element, then use jQuery to find all images and remove them. I generally try to treat XML (html in this case) as XML and not try to parse through the strings.
var element = $('<p>My paragraph has images like this <img src="foo"/> and this <img src="bar"/></p>');
element.find('img').remove();
var newText = element.html();
console.log(newText);

To do this without regex or libraries (read jQuery), you could use DOMParser to parse your string, then use plain JS to do any manipulations and re-serialize to get back your string.
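A minimal sketch of that idea, assuming a browser environment where DOMParser is available (the function name removeImages is made up for this example):

```javascript
// Assumes a browser: DOMParser is not available in plain Node.js.
function removeImages(html) {
  // Parse the string into a detached HTML document.
  const doc = new DOMParser().parseFromString(html, "text/html");
  // Remove every <img> element found in the parsed document.
  doc.querySelectorAll("img").forEach((img) => img.remove());
  // Serialize the body back to a string.
  return doc.body.innerHTML;
}

// In a browser console:
// removeImages('<p>Hello <img src="foo.png"> world</p>');
```

Note that the parser may normalize the markup, so the returned string is not guaranteed to match the input character for character outside the removed tags.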

Related

error in parsing web page using javascript

I am trying to parse a page using javascript this is part of page:
<div class="title">
<h1>
Affect and Engagement in Game-BasedLearning Environments
</h1>
</div>
This is the link to the page source: view-source:http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6645369?tp=&arnumber=6645369
I am using this:
$(data).find('h1').each(function()
{console.log($(this).text());
});
Now I am able to get the value inside the header, but the displayed value has a lot of space before and after it. I tried to remove the whitespace with the replace function, but the replacement isn't happening. I don't understand what is in front of and behind the header value. I want to remove the extra space.
With a string pattern, replace only replaces the first instance found, so it may have removed just one space. Try this instead, using a regular expression with the g flag:
text.replace(/ /g, '');
This should remove all spaces, even the ones inside your string text. To avoid this, you may only want to replace double spaces instead:
text.replace(/  /g, '');
Also you may want to remove new lines:
text.replace(/\n/g, '');
Here is an example JSFiddle
If you know for sure that your string is only surrounded on either end by spaces, but you want to preserve everything inside, you can use trim:
text.trim();
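To compare the two approaches, here is a small sketch using the heading text from the question with some made-up surrounding whitespace (the exact whitespace in the real page is an assumption):

```javascript
// Heading text from the question, padded with hypothetical whitespace.
const raw = "\n   Affect and Engagement in Game-BasedLearning Environments\n  ";

// Removes every space character, including the ones between words;
// newlines are left in place because the pattern only matches spaces.
const noSpaces = raw.replace(/ /g, "");

// Removes leading and trailing whitespace (spaces and newlines) only.
const trimmed = raw.trim();

console.log(JSON.stringify(noSpaces)); // "\nAffectandEngagementinGame-BasedLearningEnvironments\n"
console.log(JSON.stringify(trimmed));  // "Affect and Engagement in Game-BasedLearning Environments"
```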
Since you're already using jQuery, you can take advantage of its $.trim function, which removes leading and trailing whitespace.
$(data).find('h1').each(function() {
console.log($.trim($(this).text()));
});
Reference: $.trim()
Try using JavaScript's trim() to get rid of the whitespace on both sides.
Function's Reference:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/Trim
Your HTML actually contains these spaces within your <h1> element, so it is to be expected that they are present in the result of .text().
Normally you'd just use .trim(). However, you'll likely want to replace line breaks inside the text as well.
$(data).find('h1').each(function() {
var text = $(this).text();
// Replaces any multiple whitespace sequences with a single space.
// Even inside the text!
// E.g. " \r\n \t" -> " "
text = text.replace(/\s+/g, " ");
// Trim leading/trailing whitespace.
text = text.trim();
console.log(text);
});
Fiddle for your pleasure.

replace similar string in a text using javascript regex

we have a text like:
this is a test :rep more text more more :rep2 another text text qweqweqwe.
or
this is a test :rep:rep2 more text more more :rep2:rep another text text qweqweqwe. (without space)
we should replace :rep with TEXT1 and :rep2 with TEXT2.
problem:
when try to replace using something like:
rgobj = new RegExp(":rep","gi");
txt = txt.replace(rgobj,"TEXT1");
rgobj = new RegExp(":rep2","gi");
txt = txt.replace(rgobj,"TEXT2");
we get TEXT1 in both of them because :rep2 starts with :rep and :rep is processed first.
If you require that :rep always end with a word boundary, make it explicit in the regex:
new RegExp(":rep\\b","gi");
(If you don't require a word boundary, you can't distinguish what is meant by "hello I got :rep24 eggs" -- is that :rep, :rep2, or :rep24?)
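As a sketch of how the word boundary behaves on the kind of text from the question (the sample string is shortened and the variable names are made up):

```javascript
const txt = "test :rep more :rep2 and :rep:rep2 end";

// \b after "rep" fails when the next character is a digit,
// so :rep2 is left alone in this first pass.
const step1 = txt.replace(new RegExp(":rep\\b", "gi"), "TEXT1");

// Second pass handles :rep2 with the same boundary rule.
const step2 = step1.replace(new RegExp(":rep2\\b", "gi"), "TEXT2");

console.log(step1); // test TEXT1 more :rep2 and TEXT1:rep2 end
console.log(step2); // test TEXT1 more TEXT2 and TEXT1TEXT2 end
```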
EDIT:
Based on the new information that the match strings are provided by the user, the best solution is to sort the match strings by length, longest first, and perform the replacements in that order. That way the longest strings get replaced first, eliminating the risk that the beginning of a long string will be partially replaced by a shorter substring match included in that long string. Thus, :replongeststr is replaced before :replong, which is replaced before :rep.
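A minimal sketch of the longest-first approach, assuming the user-supplied match strings arrive as a plain object mapping each token to its replacement (the function name replaceTokens and that input shape are assumptions for this example):

```javascript
// Replaces each key of `replacements` in `text`, longest key first.
function replaceTokens(text, replacements) {
  // Sort so that :rep2 is handled before :rep.
  const keys = Object.keys(replacements).sort((a, b) => b.length - a.length);
  let result = text;
  for (const key of keys) {
    // Escape regex metacharacters, since the keys come from the user.
    const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // A function replacement avoids special handling of "$" in the replacement text.
    result = result.replace(new RegExp(escaped, "gi"), () => replacements[key]);
  }
  return result;
}

const out = replaceTokens(
  "this is a test :rep:rep2 more :rep2:rep end",
  { ":rep": "TEXT1", ":rep2": "TEXT2" }
);
console.log(out); // this is a test TEXT1TEXT2 more TEXT2TEXT1 end
```

One caveat with sequential passes: if a replacement text itself contains one of the shorter tokens, a later pass will replace it again.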
If your data is always consistent, replace :rep2 before :rep.
Otherwise, you could search for :rep\s, searching for the space after the keyword. Just make sure you replace the space as well.

Javascript string replace of dots (.) within filenames

I'm trying to parse and amend some html (as a string) using javascript and in this html, there are references (like img src or css backgrounds) to filenames which contain full stops/periods/dots/.
e.g.
<img src="../images/filename.01.png"> <img src="../images/filename.02.png">
<div style="background:url(../images/file.name.with.more.dots.gif)">
I've tried, struggled and failed to come up with a neat regex to allow me to parse this string and spit it back out without the dots in those filenames, e.g.
<img src="../images/filename01.png"/> <img src="../images/filename02.png"/>
<div style="background:url(../images/filenamewithmoredots.gif)">
I only want to affect the image filenames, and obviously I want to leave the filetype alone.
A regex like:
/(.*)(?=(.gif|.png|.jpg|.jpeg))/
allows me to match the main part of the filename and the extension separately, but it also matches across the whole of the string, not just within the one filename I want.
I have no control over the incoming html, I'm just consuming it.
Help me please overflowers, you're my only hope!
I agree that this is not a problem suitable for regular expression, much less one neat expression.
But I trust that you are not here to hear that. So, in case you want to keep the input as string...
var src, result = '<img src="../images/filename.01.png"> <img src="../images/filename.02.png"><div style="background:url(../images/file.name.with.more.dots.gif)">';
do {
src = result;
result = src.replace( /((?:url(\()|href=|src=)['"]?(?:[^'"\/]*\/)*[^'"\/]*)\.(?=[^\.'")]*\.(?:gif|png|jpe?g)['")>}\s])/g, '$1' );
} while (result != src)
Basically, it keeps removing the second-to-last dot of each image URL's filename until there are none left. Here is a breakdown of the expression in case you need to modify it. Tread lightly:
( start main capturing group since js regx has no lookbehind.
(?:url(\()|href=|src=)['"]? Start of an url. it would be safer to force url() to be properly quoted so that we can use back reference, but unfortunately your given example is not.
(?:[^'"\/]*\/)* Folder part of the url.
[^'"\/]* Part of the file name that comes before second last dot.
) close main group.
\. This is the second last dot we want to get rid of.
(?= Lookahead.
[^\.'")]* Part of the file name that goes between second last dot and last dot.
\.(?:gif|png|jpe?g) Make sure the url ends in image extension.
['")>}\s] Closing the url, which can be a quote, ')', '>', '}', or spaces. Should use a back reference here if possible. (Was ['"]?\b when first answered)
) End of lookahead.
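As an alternative sketch, not the loop above: a replacer function can strip the dots from each filename in a single pass. The function name stripFilenameDots is made up, and the pattern assumes filenames contain no whitespace, quotes, parentheses or angle brackets:

```javascript
function stripFilenameDots(html) {
  // ([^\s'"()<>\/]+)  the filename stem (last path segment, may contain dots)
  // \.(gif|png|jpe?g)  the image extension, kept as is
  // (?=['")>\s])       the URL must end here
  return html.replace(
    /([^\s'"()<>\/]+)\.(gif|png|jpe?g)(?=['")>\s])/gi,
    (match, stem, ext) => stem.replace(/\./g, "") + "." + ext
  );
}

const html =
  '<img src="../images/filename.01.png"> <img src="../images/filename.02.png">' +
  '<div style="background:url(../images/file.name.with.more.dots.gif)">';

console.log(stripFilenameDots(html));
```

Unlike the loop, this does not check for src=, href= or url(, so it would also rewrite an image filename that appears in ordinary text.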
Consider using the DOM instead of regular expressions. One way is to create fake elements.
var fake = document.createElement('div');
fake.innerHTML = incomingHTML; // Not really part of JS standard but all the 'main' browsers support it
var background = fake.childNodes[0].style.background;
// Now use a regex if need be: /url\(\"?(.*)\"?\)/
// If img is at childNodes[1]
var url = fake.childNodes[1].src;
With jQuery this is far easier:
$(incomingHTML).find('img').each(function() { $(this).attr('src'); });
Your problem is the greedy match in .*. Maybe better try something like this
([^\/]*)(?=(.gif|.png|.jpg|.jpeg))
[^\/] is a character class that matches every character but slashes
Another point: you need to escape the . to match it literally
([^\/]*)(?=\.(gif|png|jpg|jpeg))
The problem is that . means "any character".
Escape it:
/(.*)(?=(\.gif|\.png|\.jpg|\.jpeg))/
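To illustrate why the escape matters, here is a sketch comparing the unescaped and escaped lookaheads from the answers above (the two test strings are made up):

```javascript
const unescaped = /([^\/]*)(?=(.gif|.png|.jpg|.jpeg))/;
const escaped = /([^\/]*)(?=\.(gif|png|jpg|jpeg))/;

// Hypothetical name with no extension at all: "apng" is "a" followed by "png",
// which the unescaped ".png" happily accepts.
const noExtension = "images/apng";
console.log(unescaped.test(noExtension)); // true
console.log(escaped.test(noExtension));   // false

// With the escaped dot, the stem and the extension are captured separately.
const m = "../images/filename.01.png".match(escaped);
console.log(m[1], m[2]); // filename.01 png
```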

getElementById replace HTML

<script type="text/javascript">
var haystackText = document.getElementById("navigation").innerHTML;
var matchText = 'Subscribe to RSS';
var replacementText = '<ul><li>Some Other Thing Here</li></ul>';
var replaced = haystackText.replace(matchText, replacementText);
document.getElementById("navigation").innerHTML = replaced;
</script>
I'm attempting to try and replace a string of HTML code to be something else. I cannot edit the code directly, so I'm using Javascript to alter the code.
If I use the above method Matching Text on a regular string, such as just 'Subscribe to RSS', I can replace it fine. However, once I try to replace an HTML string, the code 'fails'.
Also, what if the HTML I wish to replace contains line breaks? How would I search for that?
<ul><li>\n</li></ul>
??
What should I be using or doing instead of this? Or am I just missing a small step? I did search around here, but maybe my keywords for the search weren't optimal to find a result that fit my situation...
Edit: Gonna mention, I'm writing this script in the footer of my page, well after the text I wish to replace, so it's not an issue of the script being written before what I want to overwrite to appear. :)
Currently you are using String.replace(substring, replacement) that will search for an exact match of the substring and replace it with the replacement e.g.
"Hello world".replace("world", "Kojichan") => "Hello Kojichan"
The problem with exact matches is that it doesn't allow anything else but exact matches.
To solve the problem, you'll have to start to use regular expressions. When using regular expression you have to be aware of
special characters such as ?, /, and \, which need to be escaped as \?, \/, and \\
multiline mode /regexp/m
global matching if you want to replace more than one instance of the expression /regexp/g
quantifiers for allowing multiple instances of white space: \s+ for [1..n] white-space characters and \s* for [0..n] white-space characters.
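For the line-break case from the question, a sketch along these lines may help. It assumes the only difference between the expected and actual markup is whitespace between the tags; the sample markup and replacement are made up:

```javascript
// Hypothetical markup with line breaks and indentation between the tags.
const haystack = "<div><ul>\n  <li>\n  </li>\n</ul></div>";

// \s* tolerates any amount of whitespace (including newlines) between tags;
// the / in closing tags is escaped because it delimits the regex literal.
const pattern = /<ul>\s*<li>\s*<\/li>\s*<\/ul>/g;

const replaced = haystack.replace(pattern, "<p>Replaced</p>");
console.log(replaced); // <div><p>Replaced</p></div>
```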
To use regular expression instead of substring matching you just need to change String.replace("substring", "replacement") to String.replace(/regexp/, "replacement") e.g.
"Hello world".replace(/world/, "Kojichan") => "Hello Kojichan"
From MDN:
Note: If a <div>, <span>, or <noembed> node has a child text node that
includes the characters (&), (<), or (>), innerHTML returns these
characters as &amp;, &lt; and &gt; respectively. Use element.textContent
to get a correct copy of these text nodes' contents.
So since textContent (or innerText) won't get you the HTML, you'd have to modify your search string appropriately.
You can use Regular Expressions.
I recommend using a regular expression. Note that ? and / are special characters in regular expressions. For global, multi-line matching, you need the g and m flags set on the regular expression.
Regular expression matching of HTML (other than plain text) that comes out of a web page is a bad idea and is troublesome to make work cross-browser (particularly in IE). The HTML that comes out of a web page does not always look the same as what was put in, because some browsers reconstitute the HTML and don't actually store what went in. Attributes can change order, quote marks can change or disappear, entities can change, etc.
If you want to modify whole tags, then you should directly access the DOM and operate on the actual objects in the page.

Regex to put quotes for html attributes

I have a scenario like this:
In HTML tags, if an attribute value is not surrounded by either single or double quotes, I want to put double quotes around it.
How do I write a regex for that?
If you repeat this regex as many times as there might be tags in an element, that should work so long as the text is fairly normal and not containing lots of special characters that might give false positives.
"<a href=www.google.com title = link >".replace(/(<[^>]+?=)([^"'\s][^\s>]+)/g,"$1'$2'")
Regex says: open tag (<) followed by one or more not close tags ([^>]+) ungreedily (?) followed by equals (=) all captured as the first group ((...)) and followed by second group ((...)) capturing not single or double quote or space ([^"'\s]) followed by not space or close tag ([^\s>]) one or more times (+) and then replace that with first captured group ($1) followed by second captured group in single quotes ('$2')
For example with looping:
html = "<a href=www.google.com another=something title = link >";
newhtml = null;
while(html != newhtml){
if(newhtml)
html = newhtml;
var newhtml = html.replace(/(<[^>]+?=)([^"'\s][^\s>]+)/,"$1'$2'");
}
alert(html);
But this is a bad way to go about your problem. It is better to use an HTML parser to parse, then re-format the HTML as you want it. That would ensure well-formatted HTML, whereas regular expressions could only ensure well-formatted HTML if the input is exactly as expected.
Very helpful! I made a slight change to allow it to match attributes with a single character value:
/(<[^>]+?=)([^"'\s>][^\s>]*)/g (changed one or more + to zero or more * and added > to the first match in second group).
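Putting the modified pattern together with the looping idea from the answer above, as a sketch (the function name quoteAttributes and the sample tag are made up; like the original, it adds single quotes and does not handle spaces around the = sign):

```javascript
function quoteAttributes(html) {
  const pattern = /(<[^>]+?=)([^"'\s>][^\s>]*)/g;
  let previous;
  // Each pass quotes at most one attribute per tag, so repeat until nothing changes.
  do {
    previous = html;
    html = html.replace(pattern, "$1'$2'");
  } while (html !== previous);
  return html;
}

const quoted = quoteAttributes("<a href=www.google.com another=something id=a>");
console.log(quoted); // <a href='www.google.com' another='something' id='a'>
```

Values that themselves contain = (for example a URL with a query string) are not covered by this sketch and may need a real HTML parser.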
