Javascript: using String.replace() to replace/remove HTML anchors - javascript

I am trying to accomplish a task that would be very simple if there were a way to replace one simple string with another.
I have the HTML source of a page as a string. It contains several internal anchors such as
<a href="#about">, <a href="#contact">, <a href="#top">, <a href="#bottom">, <a href="#who-we-are">, etc.
All of the anchor names are stored in an array (['about','contact'],...), and I need to remove every occurrence of a string like
href="#whatever"
(where whatever is different each time) so that the result is
<a>
With a simple search and replace, I would iterate through my array and replace each occurrence of
'href="#'+anchorname+'"'
with an empty string. But after many attempts with string.replace() I still haven't found a way to accomplish this.
In other words (posted also in the comments):
A much simpler way to put my question would be this:
Suppose my string contains the following three strings
<a href="#contact"> <a href="#what"> <a href="#more">
How do I use JavaScript to replace them (but NOT any other tag with the same pattern) with <a>?

All of the anchors are stored in an array (['about','contact'],...), and I need to remove every occurrence of a string like href="#whatever" (where whatever is each time something different) so that the result is <a>
For this you could do something like the following (using jQuery):
var tets_array = $("a[href^='#']"); // select all a tags having href attr starting with #
Then take the resulting collection and modify it as needed.

I don't know if I understand the question, but you can try this:
var index = myArray.indexOf('whatever');
myArray[index] = "whatever2"

One specific statement would be as follows:
var mystring = '<a href="#thistag"> <a href="#thattag">';
var repstring = mystring;
var myarray = ['thistag', 'theothertag'];
for (var i = 0; i < myarray.length; i++) {
var reptag = myarray[i];
repstring = repstring.replace('a href="#' + reptag + '"', 'a');
}
Then repstring contains the final string. Note the alternating single and double quote characters, which ensure that the double quotes stay in their usual place as part of the HTML text. Obviously you can change reptag (i.e., the content of myarray) to include the # character, at which point you would alter the repstring.replace(...) line to match.
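One caveat with the loop above: when the pattern passed to replace() is a plain string, only the first occurrence is replaced. The following is a minimal sketch of a literal replace-all built on split() and join(), which needs no regex escaping. The helper name replaceAllLiteral and the sample string are made up for illustration.

```javascript
// Hypothetical helper: replace every literal occurrence of `needle` in `text`.
function replaceAllLiteral(text, needle, replacement) {
  return text.split(needle).join(replacement);
}

// Sample input (made up); note that "#thistag" appears twice.
let repstring = '<a href="#thistag"> <a href="#thattag"> <a href="#thistag">';
const myarray = ['thistag', 'thattag'];

for (const reptag of myarray) {
  repstring = replaceAllLiteral(repstring, 'a href="#' + reptag + '"', 'a');
}

console.log(repstring);
```

Engines that support ES2021 also provide String.prototype.replaceAll, which does the same for string patterns.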

If I understand you correctly then you can replace all of those using regular expressions.
var tagString = '<a href="#contact"> <a href="#what"> <a href="#more">';
// Find every occurrence which starts with '#' and ends with '"'
var regEx = new RegExp('#.*?"', 'g');
// Replace those occurrences with '"' to remove them
tagString = tagString.replace(regEx, '"');
EDIT:
To replace specific tags in an array, you can do the following:
var tags = ['<a href="#a">', '<a href="#b">', '<a href="#c">'];
var tagsToReplace = ['a', 'c'];
for (var i = 0, len = tags.length; i < len; i++) {
var matches = tags[i].match(/#.*"/);
if (matches === null) {
continue;
}
var anchor = matches[0].substr(1).replace('"', ''); // Get only the anchor
if (tagsToReplace.indexOf(anchor) !== -1) {
tags[i] = tags[i].replace('#' + anchor, 'replaced');
}
}

Thanks for all of the suggestions. I tried several variations of them and noticed that some worked in Firefox but not in Chrome, which led me to this thread:
Why doesn't the javascript replace global flag work in Chrome or IE, and how to I work around it?
That led me to the solution:
for (var i=0; i < myTags.length ; i++)
{
var find = 'href="#' + myTags[i] + '"';
var regex = new RegExp(find, 'gi');
MyWholeString = MyWholeString.replace(regex, '');
}
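Because the pattern above is built from the array entries, an anchor name containing regex metacharacters (for example a dot) would change what the pattern matches. The sketch below escapes each name first and also removes the whitespace before href, so the tag ends up as plain <a>. The names escapeRegExp and removeAnchors and the sample string are assumptions for illustration, not part of the original solution.

```javascript
// Hypothetical helper: escape characters that are special inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Remove href="#name" (plus the whitespace before it) for every listed name.
function removeAnchors(html, anchorNames) {
  let result = html;
  for (const name of anchorNames) {
    const pattern = new RegExp('\\s*href="#' + escapeRegExp(name) + '"', 'gi');
    result = result.replace(pattern, '');
  }
  return result;
}

// Made-up sample: only "contact" and "more" are in the list.
const sample = '<a href="#contact"> <a href="#what"> <a href="#more"> <a href="#contact">';
console.log(removeAnchors(sample, ['contact', 'more']));
```

Anchors that are not in the list, such as #what here, are left untouched.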

Related

Javascript replace multiple images calls with regex

I need to write a script that reads an HTML file in which there are multiple images declared. The thing is the call to the image is simple and I need to add several instructions. From this code :
<img src="myImage1.jpg" class="image_document">
<img src="myImage2.jpg" class="image_document">
I have to get this :
<a onclick="bloquedefilementchapitres();" href="articles/myImage1.jpg" class="fancybox" rel="images"><img src="articles/myImage1.jpg" class="image_document"></a>
<a onclick="bloquedefilementchapitres();" href="articles/myImage2.jpg" class="fancybox" rel="images"><img src="articles/myImage2.jpg" class="image_document"></a>
So, I tried to use regex to perform this thinking that every image could be detected by regex. Then, every filename could be stored in an array so that I would be able to use the filename to write the link.
Here is what I did :
var recherche_images,
urls = [],
str = theHTMLcode,
rex = /<img[^>]+src="?([^"\s]+)"/g;
while (recherche_images = rex.exec(str)) {
urls.push(recherche_images[1]);
}
for (var v = 0; v < urls.length; v++) {
str = str.replace(/<img src=\"/, '<a onclick=\"bloquedefilementchapitres();\" href=\"articles/' + urls[v] + '\" class=\"fancybox\" rel=\"images-' + idunite + '\"><img src=\"articles/');
}
str = str.replace(/image_document\">/g, 'image_document\"></a>');
If the HTML document contains only one image, it works. If two images are declared, the program loops on the first image and ignores the second one.
Is there any way to tell the replace function to start at a certain point in the string? Is there a better way to do this?
There are better ways to achieve this as noted in the comments to your question.
Regarding your code:
There is an issue with the for loop in your code: v++ is executed twice, once in the for loop's increment expression and once in the body.
Another issue is the way the .replace() method is used. Without the g flag in the regex, replace() only finds and replaces the first occurrence. Check the documentation and this SO question for more details.
The code below uses the g flag in the regex to find all img matches in the string; each match is passed to a function that returns the replacement string.
What you need is something like this:
var recherche_images,
urls = [],
str = '<img src="myImage1.jpg" class="image_document"><img src="myImage2.jpg" class="image_document">',
rex = /<img[^>]+src="?([^"\s]+)"/g;
while (recherche_images = rex.exec(str)) {
urls.push(recherche_images[1]);
console.log(recherche_images[1]);
}
var idunite = 'test';
var v = -1;
str = str.replace(/<img src=\"/g, function(mtch) {
v++;
return '<a onclick=\"bloquedefilementchapitres();\" href=\"articles/' + urls[v] + '\" class=\"fancybox\" rel=\"images-' + idunite + '\"><img src=\"articles/';
});
str = str.replace(/image_document\">/g, 'image_document\"></a>');
console.log(str);
JSFiddle.
Pure JS solution:
var x = document.getElementsByClassName('image_document');
for (var i = 0; i < x.length; i++) {
var obj = x[i];
var n = document.createElement('a');
n.href = "articles/" + obj.attributes.src.value;
obj.src = "articles/" + obj.attributes.src.value;
n.className = "fancybox";
n.rel = "images";
n.innerHTML = obj.outerHTML;
n.addEventListener('click', bloquedefilementchapitres);
console.log(n);
obj.parentNode.replaceChild(n, obj);
}
<img src="myImage1.jpg" class="image_document">
<img src="myImage2.jpg" class="image_document">
It seems like you can do this with a single global replace:
For example if your HTML string is in a variable called s then it would be:
s.replace(/<img src="([^\"]+)" class="image_document">/g,
"<a onclick=\"bloquedefilementchapitres();\" href=\"articles/$1\" class=\"fancybox\" rel=\"images\"><img src=\"articles/$1\" class=\"image_document\"></a>");
That seems to work for me ok in the console for a string with one or many img tags. Apologies if I'm missing additional complexity/requirements.
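A minimal sketch of that single replace wrapped in a function, to make explicit that replace() returns a new string which has to be captured. The function name wrapImages is an assumption, and the pattern only matches img tags written exactly as in the question.

```javascript
// Wrap every <img src="..." class="image_document"> in a fancybox link.
// $1 in the replacement string refers to the captured file name.
function wrapImages(html) {
  return html.replace(
    /<img src="([^"]+)" class="image_document">/g,
    '<a onclick="bloquedefilementchapitres();" href="articles/$1" class="fancybox" rel="images">' +
      '<img src="articles/$1" class="image_document"></a>'
  );
}

const source =
  '<img src="myImage1.jpg" class="image_document">\n' +
  '<img src="myImage2.jpg" class="image_document">';

// replace() does not modify `source`; the result must be stored.
const wrapped = wrapImages(source);
console.log(wrapped);
```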

Complex Regex : how to don't match reserved word after //

I'm making a webcode editor, I'm working on the text markup so I wrote this regex : /\b(?:abstract|arguments|boolean|break|byte|case|catch|char|const|class|continue|debugger|default|delete|do|double|else|enum|eval|export|extends|false|final|finally|float|for|function|goto|if|implements|import|in|instanceof|int|interface|let|long|native|new|null|package|private|protected|public|return|short|static|super|switch|synchronized|this|throw|throws|transient|true|try|typeof|var|void|volatile|while|with|yield|alert|all|anchor|anchors|area|assign|blur|button|checkbox|clearInterval|clearTimeout|clientInformation|close|closed|confirm|constructor|crypto|decodeURI|decodeURIComponent|defaultStatus|document|element|elements|embed|embeds|encodeURI|encodeURIComponent|escape|event|fileUpload|focus|form|forms|frame|innerHeight|innerWidth|layer|layers|link|location|mimeTypes|navigate|navigator|frames|frameRate|hidden|history|image|images|offscreenBuffering|open|opener|option|outerHeight|outerWidth|packages|pageXOffset|pageYOffset|parent|parseFloat|parseInt|password|pkcs11|plugin|prompt|propertyIsEnum|radio|reset|screenX|screenY|scroll|secure|select|self|setInterval|setTimeout|status|submit|taint|text|textarea|top|unescape|untaint|window|onblur|onclick|onerror|onfocus|onkeydown|onkeypress|onkeyup|onmouseover|onload|onmouseup|onmousedown|onsubmit)\b(?=(?:[^"]*"[^"]*")*[^"]*$)(?=(?:[^']*'[^']*')*[^']*$)(?![^<]*>)(?![^\/*]*\*\/)/gm
This is the group of reserved words:
/\b(?:abstract|arguments|boolean|break|byte|case|catch|char|const|class|continue|debugger|default|delete|do|double|else|enum|eval|export|extends|false|final|finally|float|for|function|goto|if|implements|import|in|instanceof|int|interface|let|long|native|new|null|package|private|protected|public|return|short|static|super|switch|synchronized|this|throw|throws|transient|true|try|typeof|var|void|volatile|while|with|yield|alert|all|anchor|anchors|area|assign|blur|button|checkbox|clearInterval|clearTimeout|clientInformation|close|closed|confirm|constructor|crypto|decodeURI|decodeURIComponent|defaultStatus|document|element|elements|embed|embeds|encodeURI|encodeURIComponent|escape|event|fileUpload|focus|form|forms|frame|innerHeight|innerWidth|layer|layers|link|location|mimeTypes|navigate|navigator|frames|frameRate|hidden|history|image|images|offscreenBuffering|open|opener|option|outerHeight|outerWidth|packages|pageXOffset|pageYOffset|parent|parseFloat|parseInt|password|pkcs11|plugin|prompt|propertyIsEnum|radio|reset|screenX|screenY|scroll|secure|select|self|setInterval|setTimeout|status|submit|taint|text|textarea|top|unescape|untaint|window|onblur|onclick|onerror|onfocus|onkeydown|onkeypress|onkeyup|onmouseover|onload|onmouseup|onmousedown|onsubmit)\b
This skips markup inside double quotes:
(?=(?:[^"]*"[^"]*")*[^"]*$)
This skips markup inside single quotes:
(?=(?:[^']*'[^']*')*[^']*$)
This skips markup inside a tag <>:
(?![^<]*>)
This skips markup inside a /* */ comment:
(?![^\/*]*\*\/)
Now I'm stuck on the last piece: I need to skip markup inside a single-line // comment:
(?!\/\/[\w\s\'\"][^\n]*)|(?!\/\/)
Any suggestion?
My suggestion is not to use a regex for this kind of parsing job. Since you are building something in JavaScript, you can use jison to convert a grammar that you design into a working JavaScript function that parses text according to that grammar.
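Short of a full parser, a common middle ground is a single pass in which one regex matches comments and strings before it tries keywords, so that only keywords outside of them are marked up. The sketch below is an assumption-laden illustration, not the approach from the question: the keyword list is shortened, <b> stands in for the real markup, and HTML escaping is ignored.

```javascript
// One alternation, tried left to right at each position:
// 1: // comment, 2: /* */ comment, 3: quoted string, 4: keyword.
const TOKEN =
  /(\/\/[^\n]*)|(\/\*[\s\S]*?\*\/)|("(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')|\b(var|function|return|if|else)\b/g;

// Hypothetical highlighter: wrap keywords, leave comments and strings as they are.
function highlight(code) {
  return code.replace(TOKEN, (match, lineComment, blockComment, str, keyword) =>
    keyword ? '<b>' + keyword + '</b>' : match
  );
}

console.log(highlight('var a = "if"; // return var\nreturn a;'));
```

Because a comment or string is consumed as one match, the keywords inside it are never seen by the keyword alternative.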
If you are curious, this is my solution. Please let me know whether your eyes are bleeding or whether it is a good solution:
//finding the string that I need to manipulate
regcomment2 =/(\/\/[\w\s\'\"][^\n]*)|(\/\/)/gm;
//this the loop to find and replace
var str = finale.match(regcomment2);
if(finale.match(regcomment2)){
str = str.toString();
var arr = str.split(",");
var arrcheck = str.split(",");
var text = "";
var i;
for (i = 0; i < arr.length; i++) {
//writing right code
arr[i]= arr[i].replace(/(<.*?[^ok]>)/g,"");
console.log("Commento Split Dopo = " + arr[i]);
console.log("Commento Arr2 = " + arrcheck[i]);
//replace original code with right code
finale = finale.replace(arrcheck[i],arr[i]);
}
}

Split in javascript without tags with specific text

I'm trying to split the string
<div id = 'tostart'><button>todo </button>hometown todo </div>
with "to" as the keyword.
The problem is that I must not split inside the tags, only outside them, so the split should give a result like
arr = ["<div id = 'tostart'><button>","do","</button>home","wn ","do </div>"]
Is there a regex that can achieve this?
Thanks in advance.
Use this:
var str = "<div id = 'tostart'><button>todo </button>hometown todo </div>";
var res = str.replace(/to/g, '|').replace(/(.*?)(<.*?)\|(.*?>)/g, '$1$2to$3');
console.log(res.split("\|"));
output :
["<div id = 'tostart'><button>", "do </button>home", "wn ", "do </div>"]
@musefan:
This is really an improvisation.
First I replaced every to with |, then I selected the pipes that were inside < and > and replaced them with to again. Finally I split on the | characters left over from the first replace.
regex : (.*?)(<.*?)\|(.*?>)
selects all | characters that are inside < and >
I am relying on your HTML using &lt; and &gt; to escape stray < and > characters, which browsers tolerate but validators don't!
str.split(/to(?=[^>]*(?=<|$))/g);
As others have said, regex isn't going to work for really messy HTML (e.g. inline script elements).
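A small sketch of the lookahead split above applied to the sample string, generalised to an arbitrary keyword. The function name splitOutsideTags is made up, and the approach still assumes that no stray > appears outside of tags.

```javascript
// Split `html` on `keyword`, but only where the keyword is not inside a tag:
// after the keyword, the next angle bracket must be "<" (or the input must end).
function splitOutsideTags(html, keyword) {
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(escaped + '(?=[^>]*(?:<|$))', 'g');
  return html.split(pattern);
}

const str = "<div id = 'tostart'><button>todo </button>hometown todo </div>";
console.log(splitOutsideTags(str, 'to'));
// → ["<div id = 'tostart'><button>", "do </button>home", "wn ", "do </div>"]
```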
This is a very quick and dirty function that will do what you want. Note that there is probably a more efficient way to do this, and also that it does not cater for any > characters that might be part of an attribute value. However, it does work for your example input:
function splitNonTag(input, splitText) {
var inTag = false;//flag to check if we are in a tag or not
var result = [];//array for storing results
var temp = "";//string to store current result
for (var i = 0; i < input.length; i++) {
var c = input[i];//get the current character to process
//check if we are not in a tag and have found a split match
if (!inTag && input.substring(i).indexOf(splitText) == 0) {
result.push(temp);//add the split data to the results
temp = "";//clear the buffer ready for next set of split data
i += splitText.length - 1;//skip to the end of the split delimiter as we don't keep this data
continue;//continue directly to next iteration of for loop
}
temp += c;//append current character to buffer as this is part of the split data
//check if we are entering, or exiting a tag and set the flag as needed
if (c == '<') inTag = true;
else if (c == '>') inTag = false;
}
//if we have any left over buffer data then this should become the last split result item
if (temp)
result.push(temp);
return result;
}
var input = "<div id = 'tostart'><button>todo </button>hometown todo </div>";
var result = splitNonTag(input, 'to');
console.log(result);
Here is a working example

JS Regex to find href of several a tags

I need a regex to find the contents of the hrefs from these a tags :
<p class="bc_shirt_delete">
delete
</p>
Just the urls, not the href/ tags.
I'm parsing a plain text ajax request here, so I need a regex.
You can try this regex:
/href="([^\'\"]+)/g
Example at: http://regexr.com?333d1
Update: or easier via non greedy method:
/href="(.*?)"/g
This will do it nicely. http://jsfiddle.net/grantk/cvBae/216/
Regex example: https://regex101.com/r/nLXheV/1
var str = '<p href="missme" class="test">delete</p>'
var patt = /<a[^>]*href=["']([^"']*)["']/g;
while(match=patt.exec(str)){
alert(match[1]);
}
Here is a robust solution:
let href_regex = /<a([^>]*?)href\s*=\s*(['"])([^\2]*?)\2\1*>/i,
link_text = 'another article link',
href = link_text.replace ( href_regex , '$3' );
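What it does:
detects a tags
lazily skips over other HTML attributes and captures them as group (1), so you stay DRY
matches the href attribute
takes into consideration possible whitespace around =
captures the ' or " quote as group (2), so you stay DRY
matches anything up to the closing quote and captures it as group (3); note that \2 inside a character class is not a backreference in JavaScript, so it is the lazy quantifier that stops the match at the closing quote
matches the closing quote via group (2)
matches group (1) (the other attributes) again
matches whatever else is there until the tag closes
sets the i flag to ignore case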
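The steps above can be sketched in a form that collects every href instead of replacing. This is an illustration under assumptions rather than the answer's exact regex: it puts the backreference after a lazy match (a backreference cannot be used inside a character class in JavaScript), it uses String.prototype.matchAll, and the sample markup is invented.

```javascript
// Collect the href value of every <a> tag in an HTML string.
// Group 1 is the opening quote, group 2 the URL, \1 the matching closing quote.
function extractHrefs(html) {
  const pattern = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1/gi;
  return Array.from(html.matchAll(pattern), (m) => m[2]);
}

// Invented sample: the data-href attribute and the <p href> must be ignored.
const markup =
  '<p class="bc_shirt_delete"><a href="/delete?id=1">delete</a></p>' +
  '<a class="x" data-href="no" href=\'/b\'>b</a><p href="missme">x</p>';

console.log(extractHrefs(markup));
```

Requiring whitespace before href is what keeps data-href from being picked up.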
You may not need a regex to do that.
o = document.getElementsByTagName('a');
urls = Array();
for (i =0; i < o.length; i++){
urls[i] = o[i].href;
}
If it is plain text, you can insert it into a non-displayed DOM element (e.g. one with display: none) and then process it in the way I described.
It might be easier to use jQuery
var html = '<li><h2 class="saved_shirt_name">new shirt 1</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936923&A=Delete">Delete Shirt</button></li><li><h2 class="saved_shirt_name">new shirt 2</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936924&A=Delete">Delete Shirt</button></li><li><h2 class="saved_shirt_name">new shirt 3</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936925&A=Delete">Delete Shirt</button></li>';
$(html).find('[data-href]');
And iterate each node
UPDATE (because post updated)
Let html be your raw response
var matches = $(html).find('[href]');
var hrefs = [];
$.each(matches, function(i, el){ hrefs.push($(el).attr('href'));});
//hrefs is an array of matches
I combined a few solutions around and came up with this (Tested in .NET):
(?<=href=[\'\"])([^\'\"]+)
Explanation:
(?<=) : lookbehind, so these characters are not included in the match
[\'\"] : matches either a single or a double quote
[^] : matches any character except those listed after '^'
+ : one or more occurrences of the preceding token
This works well and is not greedy with the quotes, as it stops matching the moment it finds a quote.
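Lookbehind is also available in JavaScript engines that implement ES2018, so a similar pattern can be tried there. A minimal sketch, with the assumption that requiring whitespace before href is acceptable (it keeps data-href out of the results); the function name and sample markup are made up.

```javascript
// Return all href values using a lookbehind, or an empty array if none match.
function hrefValues(html) {
  return html.match(/(?<=\shref=['"])[^'"]+/g) || [];
}

const snippet =
  '<a href="/one">1</a> <a class="x" data-href="skip" href=\'/two\'>2</a>';

console.log(hrefValues(snippet));
```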
var str = "";
str += "<p class=\"bc_shirt_delete\">";
str += "delete";
str += "</p>";
var matches = [];
str.replace(/href=("|')(.*?)("|')/g, function(a, b, match) {
matches.push(match);
});
console.log(matches);
or if you don't care about the href:
var matches = str.match(/href=("|')(.*?)("|')/);
console.log(matches);
What about spaces around = ?
This code handles them:
var matches = str.match(/href( *)=( *)("|'*)(.*?)("|'*)( |>)/);
console.log(matches);
It's important to be non-greedy, and to cater for matching ' or " quotes.
var test = '<a href="#" class="foo bar"> banana\n' +
"<a href='http://google.de/foo?yes=1&no=2' data-href='foobar'/>";
test = test.replace(/href=(?:\'.*?\'|\".*?\")/gi, '');
Disclaimer: the one case it does not handle is HTML5 attributes such as data-href, whose href=... part is matched as well.
In this specific case, this is probably the fastest pattern:
/f="([^"]*)/
It captures ALL characters (letters, digits, newlines, etc.) from f=" up to, but not including, the next "; flags such as /is are unnecessary, and it returns null if empty.
But if the source contains lots of other links, you need to make sure you match exactly the one you are looking for. You can do that by including more of the surrounding source in the pattern, for example (this of course depends on the site's source code):
/bc_shirt_delete">\s*<a href="([^"]*)

how to replace multiple words using javascript replace() function?

here is my code:
var keys = keyword.split(' ');
//alert(keys);
for(var i=0; i<keys.length; i++)
{
var re = new RegExp(keys[i], "gi");
var NewString = oldvar.replace(re, '<span style="background-color:#FFFF00">'+keys[i]+'</span>');
document.getElementById("wordlist").innerHTML=NewString;
alert(keys[i]);
}
But if I enter the string "a b", it is split into the two letters "a" and "b",
and the replace function highlights "a", but when it gets to "b" it overwrites the previous result and only "b" is highlighted.
I want to highlight both "a" and "b".
How can I solve this?
I have another problem: the highlighting also replaces every "a" and "b" inside the HTML tags. How can I prevent the tags from being modified? I still need all of the HTML tags when I display the whole text.
You can actually do a single regex replace like this:
var re = new RegExp(keys.join("|"), "gi");
oldvar = oldvar.replace(re, replacer);
document.getElementById("wordlist").innerHTML = oldvar;
function replacer(str)
{
return '<span style="background-color:#FFFF00">' + str + '</span>';
}
Example - http://jsfiddle.net/zEXrq/1/
What it does is merge all keys into a single regex separated by |, which matches any of the words, and then run the replacer function on each match.
Example 2 - http://jsfiddle.net/zEXrq/2/
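The second part of the question (not touching the letters inside HTML tags) can be handled in the same single pass by letting the regex match whole tags first and returning them unchanged. This is a hedged sketch: the function name highlightWords is made up, keywords are escaped before being joined, and it assumes tags contain no > inside attribute values.

```javascript
// Highlight keywords in text, leaving anything inside <...> untouched.
function highlightWords(html, keywords) {
  const escaped = keywords
    .filter(Boolean) // drop empty strings produced by repeated spaces
    .map((k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  if (escaped.length === 0) return html;

  // Alternative 1 matches a whole tag; group 1 captures a keyword.
  const pattern = new RegExp('<[^>]*>|(' + escaped.join('|') + ')', 'gi');
  return html.replace(pattern, (match, word) =>
    word ? '<span style="background-color:#FFFF00">' + word + '</span>' : match
  );
}

console.log(highlightWords('<a href="b">a b</a>', 'a b'.split(' ')));
```

When a tag is matched, group 1 is undefined, so the callback returns the tag as it was.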
var keys = keyword.split(' ');
//alert(keys);
for(var i=0; i<keys.length; i++)
{
var re = new RegExp(keys[i], "gi");
oldvar = oldvar.replace(re, '<span style="background-color:#FFFF00">'+keys[i]+'</span>');
document.getElementById("wordlist").innerHTML=oldvar;
alert(keys[i]);
}
Edit:
It seems that oldvar is not changed during the loop, so only the last replacement is ever applied. You have to update oldvar in order to replace all the words.
You should do the operations on the same variable. You read oldvar inside the loop but never store the changed content back into oldvar, so only the last iteration's replacement survives.
You're calling replace on the variable oldvar (which is not declared in this snippet) in each iteration and thus starting from the same point - the non-highlighted string - every time. Without having seen all of the code, I would guess that simply replacing var NewString = with oldvar = and .innerHTML=NewString with .innerHTML=oldvar will solve your problem.
