How to replace words appearing anywhere through the DOM? - javascript

I have 2 arrays, one for old words and one for new ones:
old_words = ['word1', 'word2', 'word3'];
new_words = ['nword1', 'nword2', 'nword3'];
I need to find which words inside the DOM match any in the old_words array, replace it with the same index one in new_words, and wrap it in a tag.
And yes, I know traversing the DOM is probably not the best option, but I don't know where the words might appear since it is meant to work on any website.
I can't figure out how to do this... this is what I have so far:
const regexp = new RegExp('(' + old_words.join('|') + ')', 'ig');
old_words.forEach(function (word, idx) {
let addSpan = word.replace(regexp, `<span data-old-word="$&"> ${new_words[idx]}</span>`);
// that's all I have
});
the thing is, I don't know what to change to check the entire DOM as it fails...any tips would be appreciated...
I have previously attempted to do:
if (document.body.innerText.indexOf(word) !== -1){
//change
}
but it crashes because of the size. I assume I could take the values from the only but still I'm unsure how to go about it.

You could start with an empty HTML document and create the elements yourself using createElement().
Or, if every element has a class, you could use getElementsByClassName().
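Another way to approach the question's goal, without reading or writing innerHTML, is to walk the text nodes and replace matches node by node. The following is only a sketch under assumptions: escapeRegExp, splitByWords and replaceWordsInDom are hypothetical helper names, whole-word case-insensitive matching is assumed, and the data-old-word attribute is taken from the attempt in the question.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Split one text string into plain pieces and replacement pieces.
// Replacement pieces carry the matched old word in `old`.
function splitByWords(text, oldWords, newWords) {
  const pattern = new RegExp('\\b(' + oldWords.map(escapeRegExp).join('|') + ')\\b', 'gi');
  const lookup = new Map(oldWords.map((w, i) => [w.toLowerCase(), newWords[i]]));
  const parts = [];
  let last = 0;
  for (const m of text.matchAll(pattern)) {
    if (m.index > last) parts.push({ text: text.slice(last, m.index) });
    parts.push({ text: lookup.get(m[0].toLowerCase()), old: m[0] });
    last = m.index + m[0].length;
  }
  if (last < text.length) parts.push({ text: text.slice(last) });
  return parts;
}

// Browser-only part: collect the text nodes first, then replace them,
// so the walker is not disturbed while it is still iterating.
function replaceWordsInDom(root, oldWords, newWords) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);
  for (const node of nodes) {
    // Assumption: script and style contents should be left alone.
    if (/^(SCRIPT|STYLE)$/.test(node.parentNode.nodeName)) continue;
    const parts = splitByWords(node.nodeValue, oldWords, newWords);
    if (!parts.some((p) => p.old !== undefined)) continue;
    const frag = document.createDocumentFragment();
    for (const p of parts) {
      if (p.old === undefined) {
        frag.appendChild(document.createTextNode(p.text));
      } else {
        const span = document.createElement('span');
        span.dataset.oldWord = p.old; // becomes data-old-word="..."
        span.textContent = p.text;
        frag.appendChild(span);
      }
    }
    node.parentNode.replaceChild(frag, node);
  }
}

// Usage in a page: replaceWordsInDom(document.body, old_words, new_words);
```

Like any per-text-node approach, this will not find a word that is split across elements.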

Related

Replace all occurences of custom tag with URL in it [JS]

I'm making an online text editor for a website I'm building, and I use custom tags for the markup.
To make it easier to read, the markup is highlighted in blue, which I do by using the following function:
var imgOccurences = (informationText.match(/\[img/gi)).length;
for(var i = 0; i < imgOccurences; i++){
var imgLocation = informationText.indexOf('[img');
var endImgLocation = informationText.indexOf(']', imgLocation+1);
if(imgLocation != -1 && endImgLocation != -1){
var informationTextTemp1 = informationText.slice(0, imgLocation);
var informationTextTemp2 = informationText.slice(endImgLocation+1, -1);
var informationTextTemp3 = informationText.slice(imgLocation, endImgLocation+1);
informationTextTemp3 = "<span class='highlightWord'>"+informationTextTemp3+"</span>";
informationText = informationTextTemp1 + informationTextTemp3 + informationTextTemp2;
}
}
However the problem I face is that, when normalizing the text to HTML, I cannot use regex expressions, which I was previously using with the other tags, on the [img] tag, due to the fact that I wanted to highlight the image tag, and all of its contents, which includes a URL.
So I decided to count up all the occurrences of just the '[img' part of the [img] tag and then look for the next occurrence of ']', then slice it out of the normal text, then highlight it using a span, and then add it back to the normal text, while I put it in a for loop.
However only the first occurrence of the [img] tag is highlighted, and I am unsure as to how I should deal with this. Any help would be appreciated.
Basically I need to get everything which looks like: [img src='www.example.com/image.png'] and make it look like: <span class='highlightWord'>[img src='example.com/image.png']</span> and then put it into the .innerHTML of the div called textHighlights.
Expected result:
The result I got:
You can do it much simpler since the .replace method accepts a regular expression as a parameter for the matching string.
informationText = informationText.replace(/(\[img.+?\])/gi, '<span class="highlightWord">$1</span>');
The above will replace all matches directly (by wrapping them in the span you want)
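As for why the loop in the question only highlights the first tag: indexOf('[img') always searches from the start of the string, so every iteration finds the same tag again, and slice(endImgLocation+1, -1) also drops the last character. A hedged sketch of both routes follows; highlightWithReplace and highlightWithLoop are hypothetical names, and the loop version is case-sensitive, as in the question.

```javascript
// The one-line route: a global regex wraps every [img ...] tag.
function highlightWithReplace(informationText) {
  return informationText.replace(/(\[img.+?\])/gi, '<span class="highlightWord">$1</span>');
}

// The loop route, with a moving search position instead of restarting at 0.
function highlightWithLoop(informationText) {
  let out = '';
  let pos = 0;
  while (true) {
    const start = informationText.indexOf('[img', pos);
    const end = start === -1 ? -1 : informationText.indexOf(']', start + 1);
    if (start === -1 || end === -1) break;
    out += informationText.slice(pos, start) +
      '<span class="highlightWord">' + informationText.slice(start, end + 1) + '</span>';
    pos = end + 1; // continue after the tag that was just wrapped
  }
  return out + informationText.slice(pos); // keep the tail, including the last character
}

const sample = "a [img src='1.png'] b [img src='2.png'] c";
console.log(highlightWithReplace(sample));
```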

How to find a substring only in the text portion of an HTML string, with Javascript?

UPDATE: I am no longer specifically in need of the answer to this question - I was able to solve the (larger) problem I had in an entirely different way (see my comment). However, I'll check in occasionally, and if a viable answer arrives, I'll accept it. (It may take a week or three, though, as I'm only here sporadically.)
I have a string. It may or may not have HTML tags in it. So, it could be:
'This is my unspanned string'
or it could be:
'<span class="someclass">This is my spanned string</span>'
or:
'<span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>'
or:
'<span class="no-text"><span class="silly-example"></span></span><span class="some-class">This is my spanned string</span>'
I want to find the index of a substring, but only in the portion of the string that, if the string were turned into a DOM element, would be (a) TEXT node(s). In the example, only in the part of the string that has the plain text This is my string.
However, I need the location of the substring in the whole string, not only in the plain text portion.
So, if I'm searching for "span" in each of the strings above:
searching the first one will return 13 (0-based),
searching the second will skip the opening span tag in the string and return 35 for the string span in the word spanned
searching the third will skip the empty span tag and the openings of the two nested span tags, and return 91
searching the fourth will skip the nested span tags and the opening of the second span tag, and return 100
I don't want to remove any of the HTML tags, I just don't want them included in the search.
I'm aware that attempting to use regex is almost certainly a bad idea, probably even for simplistic strings as my code will be encountering, so please refrain from suggesting it.
I'm guessing I will need to use an HTML parser (something I've never done before). Is there one with which I can access the original parsed strings (or at least their lengths) for each node?
Might there be a simpler solution than that?
I did search around and wasn't able to find anyone asking this particular question before, so if someone knows of something I missed, I apologize for faulty search skills.
The search could loop through the string char by char. If inside a tag, skip the tag, search the string only outside tags and remember partial match in case the text is matched partially then interrupted with another tag, continue the search outside the tag.
Here is a little function I came up with:
function customSearch(haysack,needle){
var start = 0;
var a = haysack.indexOf(needle,start);
var b = haysack.indexOf('<',start);
while(b < a && b != -1){
start = haysack.indexOf('>',b) + 1;
a = haysack.indexOf(needle,start);
b = haysack.indexOf('<',start);
}
return a;
}
It returns the results you expected based in your examples. Here is a JSFiddle where the results are logged in the console.
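One caveat with the function above: if the string contains a '<' that is never closed by a '>', start is reset to 0 and the loop never ends. Below is a sketch of the same idea with a guard for that case; searchOutsideTags is a hypothetical name, and treating the rest of the string as markup after an unterminated '<' is an assumption.

```javascript
// Find `needle` in `haystack`, skipping anything between '<' and '>'.
// Returns the index in the whole string, or -1 if there is no match in text.
function searchOutsideTags(haystack, needle) {
  let start = 0;
  while (true) {
    const hit = haystack.indexOf(needle, start);
    if (hit === -1) return -1;
    const open = haystack.indexOf('<', start);
    // No tag ahead, or the match lies in the text before the next tag.
    if (open === -1 || hit < open) return hit;
    const close = haystack.indexOf('>', open);
    if (close === -1) return -1; // unterminated tag: stop instead of looping forever
    start = close + 1; // resume searching after the tag
  }
}

console.log(searchOutsideTags('This is my unspanned string', 'span')); // 13
console.log(searchOutsideTags('<span class="someclass">This is my spanned string</span>', 'span')); // 35
```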
Let's start with your third example:
var desiredSubString = 'span';
var entireString = '<span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>';
Remove all HTML elements from entireString, above, to establish textString:
var textString = entireString.replace(/(data-([^"]+"[^"]+"))/ig,"");
textString = textString.replace(/(<([^>]+)>)/ig,"");
You can then find the index of the start of the textString within the entireString:
var indexOfTextString = entireString.indexOf(textString);
Then you can find the index of the start of the substring you're looking for within the textString:
var indexOfSubStringWithinTextString = textString.indexOf(desiredSubString);
Finally you can add indexOfTextString and indexOfSubStringWithinTextString together:
var indexOfSubString = indexOfTextString + indexOfSubStringWithinTextString;
Putting it all together:
var entireString = '<span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>';
var desiredSubString = 'span';
var textString = entireString.replace(/(data-([^"]+"[^"]+"))/ig,"");
textString = textString.replace(/(<([^>]+)>)/ig,"");
var indexOfTextString = entireString.indexOf(textString);
var indexOfSubStringWithinTextString = textString.indexOf(desiredSubString);
var indexOfSubString = indexOfTextString + indexOfSubStringWithinTextString;
You could use the browser's own HTML parser and XPath engine to search only inside the text nodes and do whatever processing you need.
Here's a partial solution:
var haystack = ' <span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>';
var needle = 'span';
var elt = document.createElement('elt');
elt.innerHTML = haystack;
var iter = document.evaluate('.//text()[contains(., "' + needle + '")]', elt).iterateNext();
if (iter) {
var position = iter.textContent.indexOf(needle);
var range = document.createRange();
range.setStart(iter, position);
range.setEnd(iter, position + needle.length);
// At this point, range points at the first occurrence of `needle`
// in `haystack`. You can now delete it, replace it with something
// else, and so on, and after that, set your original string to the
// innerHTML of the document fragment representing the range.
console.log(range);
}
JSFiddle.

How to properly bold search terms from Twitter, strange regex case in JS

I'm retrieving tweets from Twitter with the Twitter API and displaying them in my own client.
However, I'm having some difficulty properly highlighting the right search terms. I want an effect like the following:
The way I'm trying to do this in JS is with a function called highlightSearchTerms(), which takes the text of the tweet and an array of keywords to bold as arguments. It returns the text of the fixed tweet. I'm bolding keywords by wrapping them in a <span> that has the class .search-term.
I'm having a lot of problems, which include:
Running a simple replace doesn't preserve case
There is a lot of conflict with the keyword being in href tags
If I try to do a for loop with a replace, I don't know how to only modify search terms that aren't in an href, and that I haven't already wrapped with the span above
An example tweet I want to be able to handle for:
Input:
This is a keyword. This is a <a href="http://search.twitter.com/q=%23keyword">
#keyword</a> with a hashtag. This is a link with kEyWoRd:
http://thiskeyword.com.
Expected Output:
This is a
<span class="search-term">keyword</span>
. This is a <a href="http://search.twitter.com/q=%23keyword"> #
<span class="search-term">keyword</span>
</a> with a hashtag. This is a link with
<span class="search-term">kEyWoRd</span>
:<a href="http://thiskeyword.com">http://this
<span class="search-term">keyword.com</span>
</a>.
I've tried many things, but unfortunately I can't quite find out the right way to tackle the problem. Any advice at all would be greatly appreciated.
Here is my code that works for some cases but ultimately doesn't do what I want. It fails to handle for when the keyword is in the later half of the link (e.g. http://twitter.com/this_keyword). Sometimes it strangely also highlights 2 characters before a keyword as well. I doubt the best solution would resemble my code too much.
function _highlightSearchTerms(text, keywords){
for (var i=0;i<keywords.length;i++) {
// create regex to find all instances of the keyword, catch the links that potentially come before so we can filter them out in the next step
var searchString = new RegExp("[http://twitter.com/||q=%23]*"+keywords[i], "ig");
// create an array of all the matched keyword terms in the tweet, we can't simply run a replace all as we need them to retain their initial case
var keywordOccurencesInitial = text.match(searchString);
// create an array of the keyword occurences we want to actually use, I'm sure there's a better way to create this array but rather than try to optimize, I just worked with code I know should work because my problem isn't centered around this block
var keywordOccurences = [];
if (keywordOccurencesInitial != null) {
for(var i3=0;i3<keywordOccurencesInitial.length;i3++){
if (keywordOccurencesInitial[i3].indexOf("http://twitter.com/") > -1 || keywordOccurencesInitial[i3].indexOf("q=%23") > -1)
continue;
else
keywordOccurences.push(keywordOccurencesInitial[i3]);
}
}
// replace our matches with search term
// the regex should ensure to NOT catch terms we've already wrapped in the span
// i took the negative lookbehind workaround from http://stackoverflow.com/a/642746/1610101
if (keywordOccurences != null) {
for(var i2=0;i2<keywordOccurences.length;i2++){
var searchString2 = new RegExp("(q=%23||http://twitter.com/||<span class='search-term'>)?"+keywordOccurences[i2].trim(), "g"); // don't replace what we've alrdy replaced
text = text.replace(searchString2,
function($0,$1){
return $1?$0:"<span class='search-term'>"+keywordOccurences[i2].trim()+"</span>";
});
}
}
}
return text;
}
Here's something you can probably work with:
var getv = document.getElementById('tekt').value;
var keywords = "keyword,big elephant"; // comma delimited keyword list
var rekeywords = "(" + keywords.replace(/\, ?/ig,"|") + ")"; // wraps keywords in ( and ), and changes , to a pipe (character for regex alternation)
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
alert(keyrex);
document.getElementById('tekt').value = document.getElementById('tekt').value.replace(keyrex,"<span class=\"search-term\">$1</span>");
And here is a variation that attempts to deal with word forms. If the word ends with ed,es,s,ing,etc, it chops it off and also, while looking for word-boundaries at the end of the word, it also looks for words ending in common suffixes. It's not perfect, for instance the past tense of ride is rode. Accounting for that with Regex is nigh-impossible without opening yourself up to tons of false-positives.
var getv = document.getElementById('tekt').value;
var keywords = "keywords,big elephant";
var rekeywords = "(" + keywords.replace(/(es|ing|ed|d|s|e)?\b(\s*,\s*|$)/ig,"(es|ing|ed|d|s|e)?$2").replace(/,/g,"|") + ")";
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
console.log(keyrex);
document.getElementById('tekt').value = document.getElementById('tekt').value.replace(keyrex,"<span class=\"search-term\">$1</span>");
Edit
This is just about perfect. Do you know how to slightly modify it so the keyword in thiskeyword.com would also be highlighted?
Change this line
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
to (All I did was remove both \\b's):
var keyrex = new RegExp("(#?" + rekeywords + ")(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
But be warned, you'll have problems like smiles ending up as s<span class="search-term">mile</span>s (if a user searches for mile), and there's nothing regex can do about that. Regex's definition of a word is a run of alphanumeric characters; it has no dictionary to check.
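To see what the lookahead in keyrex does, here is a small self-contained sketch. It takes an array of keywords instead of a comma-delimited string and escapes them first; both choices, and the escapeRegExp helper, are assumptions rather than part of the answer above. The lookahead only accepts a match if the next '<' comes before the next '>' (so the match is in text, not inside a tag), or if no '>' follows at all.

```javascript
// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap keywords that occur in text, but not inside tags or attribute values.
// $1 is the text as it appears in the input, so its case is preserved.
function highlightSearchTerms(text, keywords) {
  const alternation = keywords.map(escapeRegExp).join('|');
  const keyrex = new RegExp(
    '(#?\\b(?:' + alternation + ')\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))',
    'gi'
  );
  return text.replace(keyrex, '<span class="search-term">$1</span>');
}

const tweet = 'A Keyword <a href="http://x.com/keyword">#keyword</a> end keyword';
console.log(highlightSearchTerms(tweet, ['keyword']));
```

The keyword inside the href is left alone because a '>' follows it before any '<'. Note that this version keeps the \\b word boundaries, so it would not highlight the keyword inside thiskeyword.com.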

Splitting a long phrase into an array

I need to take the phrase
It’s that time of year when you clean out your closets, dust off shelves, and spruce up your floors. Once you’ve taken care of the dust and dirt, what about some digital cleaning? Going through all your files and computers may seem like a daunting task, but we found ways to make the process fairly painless.
and upon pressing a button
split it into an array
iterate over that array at each step
Build SPAN elements as you go, along with the attributes
Add the SPAN elements to the original DIV
Add a click handler to the SPAN elements, or to the DIV, which causes the style on the SPAN to change on mouseover.
So far I had
function splitString(stringToSplit, separator) {
var arrayOfStrings = stringToSplit.split(separator);
print('The original string is: "' + stringToSplit + '"');
print('The separator is: "' + separator + '"');
print("The array has " + arrayOfStrings.length + " elements: ");
for (var i=0; i < arrayOfStrings.length; i++)
print(arrayOfStrings[i] + " / ");
}
var space = " ";
var comma = ",";
splitString(tempestString, space);
splitString(tempestString);
splitString(monthString, comma);
for (var i=0; i < myArray.length; i++)
{
}
var yourSpan = document.createElement('span');
yourSpan.innerHTML = "Hello";
var yourDiv = document.getElementById('divId');
yourDiv.appendChild(yourSpan);
yourSpan.onmouseover = function () {
alert("On MouseOver");
}
and for html I have
The DIV that will serve as your input (and output) is here, with
id="transcriptText":</p>
<div id="transcriptText"> It’s that time of year when you clean out your
closets, dust off shelves, and spruce up your floors. Once you’ve taken
care of the dust and dirt, what about some digital cleaning? Going
through all your files and computers may seem like a daunting task, but
we found ways to make the process fairly painless.</div>
<br>
<div id="divideTranscript" class="button"> Transform the
Transcript! </div>
Any help on how to move on? I have been stuck for quite some time.
Well, first off this looks like homework.
That said, I'll try to help without giving you the actual code, since we're not supposed to give actual working solutions to homework. You're splitting the string too many times (once is all that's needed based on the instructions you gave) and you have to actually store the result of the split call somewhere that your other code can use it.
Your instructions say to add attributes to the span, but not which attributes nor what their contents should be.
Your function should follow the instructions:
1) Split the string. Since it doesn't specify on what, I'd assume words. So split it on spaces only and leave the punctuation where it is.
2) With the array of words returned from the split() function, iterate over it as you attempted to, but the body of the loop (inside the braces) is where you want to concatenate the <span> starting and ending tags around the original word.
3) Use document.createElement() to make the current span into a DOM element. Attach the mouseover and click handlers to it, then appendChild() it to the div.
4) Add a handler to your button that calls the above function.
Note that it's possibly more efficient to use the innerHTML property to insert all the spans at once, but then you have to loop again to add the hover/click handlers.

Regex to search html return, but not actual html jQuery

I'm making a highlighting plugin for a client to find things in a page, and I decided to test it with a help viewer I'm still building, but I'm having an issue that'll (probably) require some regex.
I do not want to parse HTML, and I'm totally open to doing this differently; this just seems like the best/right way.
http://oscargodson.com/labs/help-viewer
http://oscargodson.com/labs/help-viewer/js/jquery.jhighlight.js
Type something in the search... ok, refresh the page, now type, like, class or class=" or type <a and you'll notice it'll search the actual HTML (as expected). How can I only search the text?
If I do .text() it'll vaporize all the HTML and what I get back will just be a big blob of text, but I still want the HTML so I don't lose formatting, links, images, etc. I want this to work like CMD/CTRL+F.
You'd use this plugin like:
$('article').jhighlight({find:'class'});
To remove them:
.jhighlight('remove')
==UPDATE==
While Mike Samuel's idea below does in fact work, it's a tad heavy for this plugin. It's mainly for a client looking to erase bad words and/or MS Word characters during a "publishing" process of a form. I'm looking for a more lightweight fix, any ideas?
You really don't want to use eval, mess with innerHTML or parse the markup "manually". The best way, in my opinion, is to deal with text nodes directly and keep a cache of the original html to erase the highlights. Quick rewrite, with comments:
(function($){
$.fn.jhighlight = function(opt) {
var options = $.extend($.fn.jhighlight.defaults, opt)
, txtProp = this[0].textContent ? 'textContent' : 'innerText';
if ($.trim(options.find).length < 1) return this;
return this.each(function(){
var self = $(this);
// use a cache to clear the highlights
if (!self.data('htmlCache'))
self.data('htmlCache', self.html());
if(opt === 'remove'){
return self.html( self.data('htmlCache') );
}
// create Tree Walker
// https://developer.mozilla.org/en/DOM/treeWalker
var walker = document.createTreeWalker(
this, // walk only on target element
NodeFilter.SHOW_TEXT,
null,
false
);
var node
, matches
, flags = 'g' + (!options.caseSensitive ? 'i' : '')
, exp = new RegExp('('+options.find+')', flags) // capturing
, expSplit = new RegExp(options.find, flags) // no capturing
, highlights = [];
// walk this wayy
// and save matched nodes for later
while(node = walker.nextNode()){
if (matches = node.nodeValue.match(exp)){
highlights.push([node, matches]);
}
}
// must replace stuff after the walker is finished
// otherwise replacing a node will halt the walker
for(var nn=0,hln=highlights.length; nn<hln; nn++){
var node = highlights[nn][0]
, matches = highlights[nn][1]
, parts = node.nodeValue.split(expSplit) // split on matches
, frag = document.createDocumentFragment(); // temporary holder
// add text + highlighted parts in between
// like a .join() but with elements :)
for(var i=0,ln=parts.length; i<ln; i++){
// non-highlighted text
if (parts[i].length)
frag.appendChild(document.createTextNode(parts[i]));
// highlighted text
// skip last iteration
if (i < ln-1){
var h = document.createElement('span');
h.className = options.className;
h[txtProp] = matches[i];
frag.appendChild(h);
}
}
// replace the original text node
node.parentNode.replaceChild(frag, node);
};
});
};
$.fn.jhighlight.defaults = {
find:'',
className:'jhighlight',
color:'#FFF77B',
caseSensitive:false,
wrappingTag:'span'
};
})(jQuery);
If you're doing any manipulation on the page, you might want to replace the caching with another clean-up mechanism, not trivial though.
You can see the code working here: http://jsbin.com/anace5/2/
You also need to add display:block to your new html elements, the layout is broken on a few browsers.
In the javascript code prettifier, I had this problem. I wanted to search the text but preserve tags.
What I did was start with HTML, and decompose that into two bits.
The text content
Pairs of (index into text content where a tag occurs, the tag content)
So given
Lorem <b>ipsum</b>
I end up with
text = 'Lorem ipsum'
tags = [6, '<b>', 11, '</b>']
which allows me to search on the text, and then based on the result start and end indices, produce HTML including only the tags (and only balanced tags) in that range.
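A rough sketch of that decomposition, not the prettifier's actual code: decompose and recompose are hypothetical names, tags are found with a naive /<[^>]+>/ pattern (so attribute values containing '>' would break it), and entities are not decoded. Each tag is recorded together with the number of text characters that precede it, so in this sketch the closing </b> of the example lands at offset 11.

```javascript
// Split an HTML string into its text and a flat list of (offset, tag) pairs.
function decompose(html) {
  const tags = [];
  let text = '';
  let last = 0;
  for (const m of html.matchAll(/<[^>]+>/g)) {
    text += html.slice(last, m.index);
    tags.push(text.length, m[0]); // offset into the text, then the tag itself
    last = m.index + m[0].length;
  }
  text += html.slice(last);
  return { text, tags };
}

// Inverse operation: weave the tags back into the text at their offsets.
function recompose(text, tags) {
  let html = '';
  let last = 0;
  for (let i = 0; i < tags.length; i += 2) {
    html += text.slice(last, tags[i]) + tags[i + 1];
    last = tags[i];
  }
  return html + text.slice(last);
}

const parts = decompose('Lorem <b>ipsum</b>');
console.log(parts.text); // Lorem ipsum
console.log(parts.tags); // [ 6, '<b>', 11, '</b>' ]
```

A search then runs on parts.text, and the offsets in parts.tags tell you where tags fall relative to a match.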
Have a look here: getElementsByTagName() equivalent for textNodes.
You can probably adapt one of the proposed solutions to your needs (i.e. iterate over all text nodes, replacing the words as you go - this won't work in cases such as <tag>wo</tag>rd but it's better than nothing, I guess).
I believe you could just do:
$('#article :not(:has(*))').jhighlight({find : 'class'});
Since it grabs all leaf nodes in the article it would require valid xhtml, that is, it would only match link in the following example:
<p>This is some paragraph content with a <a>link</a></p>
DOM traversal / selector application could slow things down a bit so it might be good to do:
article_nodes = article_nodes || $('#article :not(:has(*))');
article_nodes.jhighlight({find : 'class'});
Maybe something like this could be helpful:
>+[^<]*?(s(<[\s\S]*?>)?e(<[\s\S]*?>)?e)[^>]*?<+
The first part >+[^<]*? finds > of the last preceding tag
The third part [^>]*?<+ finds < of the first subsequent tag
In the middle we have (<[\s\S]*?>)? between characters of our search phrase (in this case - "see").
After regular expression searching you could use the result of the middle part to highlight search phrase for user.
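Such a pattern could be generated from the search phrase rather than written by hand. This is a sketch only: buildTagTolerantRegExp and escapeRegExp are hypothetical names, and the gap between characters is made non-capturing here (unlike the pattern above) so that group 1 is always the matched phrase, including any tags inside it.

```javascript
// Escape regex metacharacters so each character is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a regex that finds `phrase` in text between tags, allowing one
// optional tag between any two consecutive characters of the phrase.
function buildTagTolerantRegExp(phrase) {
  const gap = '(?:<[\\s\\S]*?>)?';
  const core = Array.from(phrase, (ch) => escapeRegExp(ch)).join(gap);
  return new RegExp('>+[^<]*?(' + core + ')[^>]*?<+');
}

const re = buildTagTolerantRegExp('see');
const m = '<p>you s<b>e</b>e it</p>'.match(re);
console.log(m && m[1]); // s<b>e</b>e
```

As with the original pattern, the text has to be preceded by a '>' and followed by a '<', so a bare string without surrounding tags will not match.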
