JavaScript - replacing text that is not inside tag attributes

I'm trying to replace text inside an HTML string with Javascript.
The tricky part is that I need to replace the text only if it isn't inside a tag (meaning it's not part of an attribute):
var html_str = '<div>hi blah blah<img src="img.jpg" alt="say hi" />hi!</div>';
In this example, after I do html_str.replace("hi","hello"); I want to replace only the text inside the div and a tags, avoiding the <img alt=".." or the href="....
Some more info:
html_str = document.body.innerHTML;, therefore the elements are unknown. The example above is only an example.
Regex solutions are more than welcome.
The hi and hello values are stored in variables, so the actual call looks like this: html_str.replace(var1,var2);
The REAL code is this:
var html_str = document.body.innerHTML;
var replaced_txt = "hi";
var replace_with = "hello";
var replaced_html = html_str.replace(replaced_txt,replace_with);
I hope I explained myself well.
Thanks in advance.

This maybe?
var obj = {'hi':'hello','o':'*','e':'3','ht':'HT','javascrpit':'js','ask':'ASK','welcome':'what\'s up'}; // This may contain a lot more data
(function helper(parent, replacements) {
[].slice.call(parent.childNodes, 0).forEach(function (child) {
if (child.nodeType == Node.TEXT_NODE) {
for (var from in replacements) {
child.nodeValue = child.nodeValue.replace(from, replacements[from]);
}
}
else {
helper(child, replacements);
}
});
}(document.body, obj));
http://jsfiddle.net/G8fYq/4/ (uses document.body directly)
If you want to make the changes visible immediately then you could also pass document.body and forget about the whole container stuff.
Update to allow for multiple replacements in one run.
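One caveat with the snippet above: when the first argument to String.prototype.replace is a plain string, only the first occurrence in each text node is replaced. Here is a minimal sketch of the same walk that replaces every occurrence, assuming the search terms are literal strings (not regexes). The name replaceInTextNodes is made up for this illustration.

```javascript
// Hypothetical helper: walks the tree below `root` and rewrites text nodes only.
// Attribute values are never touched because they are not text nodes.
function replaceInTextNodes(root, replacements) {
  // Copy childNodes into a real array so the loop does not depend on the live NodeList.
  Array.prototype.slice.call(root.childNodes).forEach(function (child) {
    if (child.nodeType === 3) { // 3 === Node.TEXT_NODE
      Object.keys(replacements).forEach(function (from) {
        // split/join replaces every literal occurrence, not just the first one.
        child.nodeValue = child.nodeValue.split(from).join(replacements[from]);
      });
    } else if (child.nodeType === 1) { // 1 === Node.ELEMENT_NODE
      replaceInTextNodes(child, replacements);
    }
  });
}

// Only runs where a DOM is available (for example in a browser).
if (typeof document !== 'undefined') {
  replaceInTextNodes(document.body, { hi: 'hello' });
}
```

As in the original snippet, the replacements are applied one after another, so a later key can change text produced by an earlier one (for example 'hi' becoming 'hello' and then 'o' becoming '*').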
You could also try XPath in JavaScript, though the following solution will not work in Internet Explorer.
var
replacements = {'hi':'hello','o':'*','e':'3','ht':'HT','javascrpit':'js','ask':'ASK','welcome':'what\'s up'},
elements = document.evaluate('//text()', document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null),
i = 0,
textNode, from;
for (; i < elements.snapshotLength; i += 1) {
textNode = elements.snapshotItem(i);
for (from in replacements) {
if (replacements.hasOwnProperty(from)) {
textNode.nodeValue = textNode.nodeValue.replace(from, replacements[from]);
}
}
}

It is actually possible to do this with a simple negative lookahead:
(?![^<>]*>)
Append it to your match pattern and it will reject any match that sits inside a tag, which includes attribute values. Here is an example:
Javascript regex: Find all URLs outside <a> tags - Nested Tags
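Applied to the question's sample string, the lookahead could be used roughly like this. It is only a sketch: replaceOutsideTags and escapeRegExp are names invented here, and it assumes that attribute values never contain a literal > and that the text contains no stray < or >.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replaces `search` only where it is NOT followed by
// a ">" before the next "<", i.e. only outside of tags.
function replaceOutsideTags(html, search, replacement) {
  var re = new RegExp(escapeRegExp(search) + '(?![^<>]*>)', 'g');
  // A function is used so that "$" sequences in `replacement` stay literal.
  return html.replace(re, function () { return replacement; });
}

var sample = '<div>hi blah blah<img src="img.jpg" alt="say hi" />hi!</div>';
var out = replaceOutsideTags(sample, 'hi', 'hello');
console.log(out);
// <div>hello blah blah<img src="img.jpg" alt="say hi" />hello!</div>
```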

You can try this, but regexp and HTML are not friends:
var str = '<div>hi blah blah<img src="img.jpg" alt="say hi" />hi!</div>',
rx = new RegExp('(\>[^\>]*)' + 'hi' + '([^\>=]*\<)', 'g');
str = str.replace(rx, '$1hello$2');

Related

Javascript regex neglect div span tags

I have the below text
<span> is an </span>
And I wanted to change the an into a and I use the below regex pattern to do that.
const regExFinder = new RegExp("an", 'gi');
const sourceHTML = "<span> is an </span>";
sourceHTML.replace(regExFinder, `$&`);
But the output looks like this. Can anybody give me an idea of how to ignore the tags and only change the text inside them?
<spa> is a </spa>
And what if my source HTML looks like this:
<div> an <span> is an </span></div>
You have a couple of options.
const str = "<div> an <span> is an </span></div>";
// method 1: negative lookaheads (probably the best for regex)
str.replace(/an(?![^<>]*>)/gi, "a");
// method 2: rely on having a space after the "an" (not reliable)
str.replace(/an /gi, "a ")
// method 3: rely on "an" being its own word (depends on the situation)
str.replace(/\ban/gi, "a")
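To see where the three methods differ, it helps to run them on a slightly nastier input than the one above. The sample string below is an assumption chosen to expose the edge cases (an attribute value and a longer word that both contain "an"), and the last pattern is just one possible way to combine methods 1 and 3.

```javascript
var sample = '<span class="an"> an ant </span>';

// method 1: lookahead only. Tags are safe, but "ant" is changed as well.
var viaLookahead = sample.replace(/an(?![^<>]*>)/gi, 'a');
// method 2: trailing space. Breaks the opening tag because of "span ".
var viaTrailingSpace = sample.replace(/an /gi, 'a ');
// method 3: leading word boundary. Changes the attribute value and "ant".
var viaLeadingBoundary = sample.replace(/\ban/gi, 'a');
// combined: whole word "an" only, and only outside of tags.
var combined = sample.replace(/\ban\b(?![^<>]*>)/gi, 'a');

console.log(viaLookahead);       // <span class="an"> a at </span>
console.log(viaTrailingSpace);   // <spa class="an"> a ant </span>
console.log(viaLeadingBoundary); // <span class="a"> a at </span>
console.log(combined);           // <span class="an"> a ant </span>
```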
I parse the whole string into a DOM element and then go through all span elements to change their content from "an" to "a". The metacharacter \b in the regular expression denotes a word boundary.
Edit:
After digging a bit deeper I can now operate on all text nodes and change the strings in question:
var html='<div> an <span> is an </span>apple and this <span> is a </span> banana.</div>';
var b=document.createElement('body');
b.innerHTML=html;
// use the "optional filter function" to do the changes:
getTextNodesIn(b,n=>n.textContent=n.textContent.replace(/\ban\b/g,'a'));
// output:
console.log(b.innerHTML);
// I just realised that I can also use Chris West's original function:
// https://cwestblog.com/2014/03/14/javascript-getting-all-text-nodes/
function getTextNodesIn(elem, opt_fnFilter) {
var textNodes = [];
if (elem) {
for (var nodes = elem.childNodes, i = nodes.length; i--;) {
var node = nodes[i], nodeType = node.nodeType;
if (nodeType == 3) {
if (!opt_fnFilter || opt_fnFilter(node, elem)) {
textNodes.push(node);
}
}
else if (nodeType == 1 || nodeType == 9 || nodeType == 11) {
textNodes = textNodes.concat(getTextNodesIn(node, opt_fnFilter));
}
}
}
return textNodes;
}
"Fun fact": In ES6 notation the function can be re-written in an even shorter way as:
function getTN(elem, opt_flt) {
if (elem) return [...elem.childNodes].reduce((tn,node)=>{
var nty = node.nodeType;
if (nty==3 && (!opt_flt || opt_flt(node, elem))) tn.push(node);
else if (nty==1 || nty==9 || nty==11) tn=tn.concat(getTN(node, opt_flt));
return tn
}, []);
}
You can check this solution. I've removed all HTML tags from the string and then applied the replacement. It works for both of your test cases, but note that the result is plain text: the markup is discarded.
const regExFinder = new RegExp("an", 'gi');
let sourceHTML = "<div> an <span> is an </span></div>";
sourceHTML = sourceHTML.replace(/<[^>]*>?/gm, '').trim(); // removing HTML tags
sourceHTML = sourceHTML.replace(regExFinder, 'a');
console.log(sourceHTML)

JS RegExp finding word that is not in tag and replace string [duplicate]

This question already has answers here:
How can I change an element's text without changing its child elements?
(16 answers)
Closed 5 years ago.
I need to write a second RegExp that finds the value of variable d in a sentence, but only where it is not inside a tag. Occurrences inside tags should be skipped.
The regex '(?:^|\\b)('+d+')(?=\\b|$)' finds d, but I need to exclude the <span> tag with class="description".
The matched word is then wrapped in a new tag.
sentence = "This is some word. <span class='description'>word</span> in tag should be skipped"
d = 'word'
re = new RegExp('(?:^|\\b)('+d+')(?=\\b|$)', 'gi')
sentence = sentence.replace(re, "<span>$1</span>")
Result I'm trying to achieve is:
"This is some <span>word</span>. <span class='description'>word</span> in tag should be skipped"
I'm using coffeescript, thanks for the help.
Try this one: (word)(?![^<>]*<\/)
Full code:
var sentence = "This is some word. <span class='description'>word</span> in tag should be skipped"
var d = 'word'
var re = new RegExp('('+d+')(?![^<>]*<\/)', 'gi')
sentence = sentence.replace(re, "<span>$1</span>")
I based this answer on this snippet: https://regex101.com/library/gN4vI6
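For reference, here is the same lookahead wrapped in a small function with the search term escaped. This is a sketch, not a robust solution: wrapOutsideElements and escapeForRegExp are made-up names, and the \b anchors assume the word starts and ends with a word character.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeForRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wraps `word` in <span> unless the next tag after it
// is a closing tag (which is taken to mean "the word sits inside an element").
function wrapOutsideElements(sentence, word) {
  var re = new RegExp('\\b(' + escapeForRegExp(word) + ')\\b(?![^<>]*</)', 'gi');
  return sentence.replace(re, '<span>$1</span>');
}

var sentence = "This is some word. <span class='description'>word</span> in tag should be skipped";
console.log(wrapOutsideElements(sentence, 'word'));
// This is some <span>word</span>. <span class='description'>word</span> in tag should be skipped
```

Two limitations are worth knowing. Text that is directly followed by its parent's closing tag is skipped too, so '<p>a word</p>' comes back unchanged. And attribute values are not protected by this particular lookahead: "<span class='word'>x</span>" gets the attribute value wrapped, which breaks the markup.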
Trying to manipulate HTML with regular expressions is not a good idea: sooner or later you'll bump into some boundary condition where it fails. Maybe some < or > occur inside attribute values, or even inside text nodes, while the searched term may also occur at unexpected places, like in HTML comments, attribute values, or script tags, ... The list of boundary cases is long.
Furthermore, your search term may contain characters that have a special meaning in regular expression syntax, so you should at least escape those.
Here is a solution that interprets the string as HTML, using the DOM capabilities, and only replaces text in text nodes:
function escapeRegExp(str) {
return str.replace(/[\[\]\/{}()*+?.\\^$|-]/g, "\\$&");
}
function wrapText(sentence, word) {
const re = new RegExp("\\b(" + escapeRegExp(word) + ")\\b", "gi"),
span = document.createElement('span');
span.innerHTML = sentence;
Array.from(span.childNodes, function (node) {
if (node.nodeType !== 3) return;
node.nodeValue.split(re).forEach(function (part, i) {
let add;
if (i%2) {
add = document.createElement('span');
add.textContent = part;
add.className = 'someClass';
} else {
add = document.createTextNode(part);
}
span.insertBefore(add, node);
});
span.removeChild(node);
});
return span.innerHTML;
}
const html = 'This is some word. <span class="word">word</span> should stay',
result = wrapText(html, 'word');
console.log(result);
Recursing into elements
In comments you mentioned that you would now also like to have the replacements happening within some tags, like p.
I'll assume that you want this to happen for all elements, except those that have a certain class, e.g. the class that you use for the wrapping span elements, but you can of course customise the condition to your needs (like only recursing into p, or ...).
The code needs only a few modifications:
function escapeRegExp(str) {
return str.replace(/[\[\]\/{}()*+?.\\^$|-]/g, "\\$&");
}
function wrapText(sentence, word) {
const re = new RegExp("\\b(" + escapeRegExp(word) + ")\\b", "gi"),
doc = document.createElement('span');
doc.innerHTML = sentence;
(function recurse(elem) {
Array.from(elem.childNodes, function (node) {
// Customise this condition as needed:
if (node.classList && !node.classList.contains('someClass')) recurse(node);
if (node.nodeType !== 3) return;
node.nodeValue.split(re).forEach(function (part, i) {
let add;
if (i%2) {
add = document.createElement('span');
add.textContent = part;
add.className = 'someClass';
} else {
add = document.createTextNode(part);
}
elem.insertBefore(add, node);
});
elem.removeChild(node);
});
})(doc);
return doc.innerHTML;
}
const html = '<p><b>Some word</b></p>. <span class="someClass">word</span> should stay',
result = wrapText(html, 'word');
console.log(result);

Regular expression to avoid replace part of html tag

For example, I have the string <i class='highlight'>L</i>olopolo, and I need to change each letter l to <i class='highlight'>l</i>. How can I write a regular expression that ignores the tag and everything inside it?
Try this:
var string = "<i class='highlight'>L</i>olopolo";
string = string.replace(/l(?![^>]+>)(?![^<]*<\/)/g, "<i class='highlight'>l</i>");
alert(string);
If you want to match arbitrary text, you can use the code below:
var text = "foo";
var re = new RegExp(text + '(?![^>]+>)(?![^<]*</)', 'g');
var string = "<i class='highlight'>foobar</i>foobarfoobar";
string = string.replace(re, "<i class='highlight'>" + text + "</i>");
alert(string);
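One thing to watch with the pattern above: because the first lookahead uses [^>]+, it also skips a match in plain text whenever any tag follows later in the string (for example the foo in 'foo<b>x</b>'). A possible variant, sketched under the assumption that the text itself contains no stray < or >, uses [^<>]* instead and escapes the search text. The name highlightOutsideTags is invented for this example.

```javascript
// Hypothetical helper: wraps every occurrence of `text` that is neither inside
// a tag nor directly inside an element in <i class='highlight'>...</i>.
function highlightOutsideTags(html, text) {
  // Escape regex metacharacters so `text` is matched literally.
  var escaped = text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // 1st lookahead: not inside a tag. 2nd lookahead: next tag is not a closing tag.
  var re = new RegExp(escaped + '(?![^<>]*>)(?![^<]*</)', 'g');
  return html.replace(re, function (match) {
    return "<i class='highlight'>" + match + "</i>";
  });
}

console.log(highlightOutsideTags("<i class='highlight'>foobar</i>foobarfoobar", 'foo'));
// <i class='highlight'>foobar</i><i class='highlight'>foo</i>bar<i class='highlight'>foo</i>bar
console.log(highlightOutsideTags('foo<b>x</b>', 'foo'));
// <i class='highlight'>foo</i><b>x</b>
```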
As mentioned, using regular expressions is not the best idea, so the next best thing is to loop over the text nodes and add the elements.
var charSplit = "l";
var elem = document.querySelector(".x");
var nodes = elem.childNodes;
for(var i=nodes.length-1;i>=0;i--){
var node = nodes[i];
if(node.nodeType === 3) { // this is a text node
var last = node;
var parts = node.nodeValue.split(charSplit); // split on the character we are supposed to match
node.nodeValue = parts[parts.length-1]; // set the text node's value to the last part
for (var j=parts.length-2; j>=0;j--){ // loop backwards, ignoring the last index since that text is already in the text node
var it = document.createElement("i"); // create the new element to add
it.className="highlight";
it.innerHTML = charSplit;
node.parentNode.insertBefore(it,last); // add it before the text node
var tx = document.createTextNode(parts[j]); // create a new text node for the text that comes before the element
node.parentNode.insertBefore(tx,it);
last = tx;
}
}
}
<p class="x"><i class='highlight'>L</i>olopolo</p>
I would suggest something like this, with minimal (and not so complicated) regex usage. If your string is initially part of html -> you can get parent(s), and change textContent and innerHTML:
tag=document.getElementsByTagName('p')[0]; /*just as example,can be anything else*/
str=tag.textContent;
reg=/(l)/gi;
tag.innerHTML=str.replace(reg,"<i class='highlight'>"+'$1'+"</i>");
Demo: http://jsfiddle.net/LzbkhLx7/
P.S. Explanation - textContent will give you 'pure' string/text, without HTML tags - and then you can easily wrap every occurrence of l/L.
document.getElementsByClassName("highlight")[0].innerHTML = "l";
No need for regex.
Or if you want to change the letter from upper to lower case
var el = document.getElementsByClassName("highlight")[0];
el.innerHTML = el.innerHTML.toLowerCase();
Of course you'll have to make sure you can call toLowerCase (or other method) on the innerHTML before doing it.

Match characters at start of string, ignore strings in html tags

A little help required please...
I have a regular expression that matches characters at the start of a string as follows:
If I have a set of strings like so:
Ray Fox
Foster Joe
Finding Forrester
REGEX
/\bfo[^\b]*?\b/gi
This will match 'FO' in Fox, Foster, and Forrester as expected:
However, I am faced with an issue when the strings are wrapped in HTML tags like so:
<span class="fontColor1">Ray Fox</span>
<span class="fontColor2">Foster Joe</span>
<span class="fontColor3">Finding Forrester</span>
This will match 'FO' in fontColor* as well.
I'm fairly green with Regular expressions, I need a little help updating the query so that it only searches values between HTML tags where HTML tags exist, but still works correctly if HTML tags do not exist.
You can use an HTML parser to extract the plain text and match against that.
var root;
try {
root = document.implementation.createHTMLDocument("").body;
}
catch(e) {
root = document.createElement("body");
}
root.innerHTML = '<span class="fontColor1">Ray Fox</span>\
<span class="fontColor2">Foster Joe</span>\
<span class="fontColor3">Finding Forrester</span>';
//If you are using jQuery
var text = $(root).text();
//Proceed as normal with the text variable
If you are not using jQuery, you can replace $(root).text() with findText(root), where findText:
function findText(root) {
var ret = "",
nodes = root.childNodes;
for (var i = 0; i < nodes.length; ++i) {
if (nodes[i].nodeType === 3) {
ret += nodes[i].nodeValue;
} else if (nodes[i].nodeType === 1) {
ret += findText(nodes[i]);
}
}
return ret;
}
What about
<.*?span.*?>(.*?)<\s?\/.*?span.*?>
And where do you have text where html tags don't exist? That makes no sense.
EDIT:
This solution will not match nested tags, but as the question is written, that doesn't seem to be an issue.

How to replace text not within a specific-Tag in JavaScript

I have a string (partly HTML) where I want to replace the string :-) into bbcode :wink:. But this replacement should not happen within <pre>, but in any other tag (or even not within a tag).
For example, I want to replace
:-)<pre>:-)</pre><blockquote>:-)</blockquote>
to:
:wink:<pre>:-)</pre><blockquote>:wink:</blockquote>
I already tried it with the following RegEx, but it does not work (nothing gets replaced):
var s = ':-)<pre>:-)</pre><blockquote>:-)</blockquote>';
var regex = /:\-\)(?!(^<pre>).*<\/pre>)/g;
var r = s.replace(regex, ':wink:');
Can someone please help me? :-)
This ought to do it:-
var src = ":-)<pre>:-)</pre><blockquote>:-)</blockquote>"
var result = src.replace(/(<pre>(?:[^<](?!\/pre))*<\/pre>)|(\:\-\))/gi, fnCallback)
function fnCallback(s)
{
if (s == ":-)") return ":wink:"
return s;
}
alert(result);
It works because any pre element will get picked up by the first option in the regex and once consumed means that any contained :-) can't be matched since the processor will have moved beyond it.
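The same "consume the block first" idea can be written a bit more loosely so that it also copes with attributes on the pre tag and with child tags inside it. This is a sketch under the assumption that pre elements are not nested; replaceSmileysOutsidePre is a made-up name.

```javascript
// Hypothetical helper: replaces :-) with :wink: everywhere except inside <pre>...</pre>.
function replaceSmileysOutsidePre(html) {
  // Alternative 1 swallows a whole <pre ...>...</pre> block (lazily, up to the first </pre>);
  // alternative 2 matches the smiley anywhere else.
  var re = /<pre\b[^>]*>[\s\S]*?<\/pre>|:-\)/gi;
  return html.replace(re, function (match) {
    // A <pre> block is returned untouched; only a bare smiley is replaced.
    return match === ':-)' ? ':wink:' : match;
  });
}

console.log(replaceSmileysOutsidePre(':-)<pre>:-)</pre><blockquote>:-)</blockquote>'));
// :wink:<pre>:-)</pre><blockquote>:wink:</blockquote>
```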
You could avoid hellish regexes altogether if you use a suitable library such as jQuery, e.g.:
var excludeThese = ['pre'];
// loop over all elements on the page except those whose tag name is in the
// excludeThese array, replacing :-) with :wink:
// (caveat: an ancestor's html() still contains the markup of any nested <pre>)
$('*:not(' + excludeThese.join(',') + ')').each(function() {
$(this).html($(this).html().replace(/:\-\)/g,':wink:'));
});
Just thought it'd be worth offering a DOM solution:
E.g.
var div = document.createElement('div');
div.innerHTML = ":-)<pre>:-)</pre><blockquote>:-)</blockquote>";
replace(div, /:-\)/g, ":wink:", function(){
// Custom filter function.
// Returns false for <pre> elements.
return this.nodeName.toLowerCase() !== 'pre';
});
div.innerHTML; // <== here's your new string!
And here's the replace function:
function replace(element, regex, replacement, filter) {
var cur = element.firstChild;
if (cur) do {
if ( !filter || filter.call(cur) ) {
if ( cur.nodeType == 1 ) {
replace( cur, regex, replacement, filter );
} else {
cur.data = cur.data.replace( regex, replacement );
}
}
} while ( cur = cur.nextSibling );
}
Almost good: your negative lookbehind and lookahead were not in the right position and need a slight adjustment:
/(?<!(<pre>)):-\)(?!(<\/pre>))/g
This looks for every ":-)"
...but not if there is a <pre> directly behind the regex cursor
...but not if there is a </pre> directly ahead of the regex cursor
As a side effect, <pre>:-):-)</pre> works too, but <pre>:-):-):-)</pre> does not.
https://regex101.com/r/CO0DAD/1
P.S. Tested in Firefox 104 (other browsers could behave differently).
Try:
var regex = /:-\)(?!(^)*<\/pre>)/g;
