Insert a span around nth-word within a div - javascript

I'm designing a rudimentary spell checker of sorts. Suppose I have a div with the following content:
<div>This is some text with xyz and other text</div>
My spell checker correctly identifies the div (returning a jQuery object named current_object) and the index of the word (5 in this example, since indexing starts at zero).
What I need to do now is surround this word with a span, e.g.
<span class="spelling-error">xyz</span>
Leaving me with the final structure like this:
<div>
This is some text with
<span class="spelling-error">xyz</span>
and other text
</div>
However, I need to do this without altering the user's existing selection, moving the caret, or invoking methods that do so, e.g.
window.getSelection().getRangeAt(0).cloneRange().surroundContents();
In other words, if the user is working on the 4th div in the contenteditable document, my code should identify issues in the other divs (1st to 3rd) without removing focus from the 4th div.
Many thanks!

You've tagged this post as jQuery but I don't think it's particularly necessary to use it. I've written you an example.
https://jsfiddle.net/so0jrj2b/2/
// Redefine the innerHTML for our spellcheck target
spellcheck.innerHTML = (function(text)
{
    // We're using an IIFE here to keep namespaces tidy.
    // words is each word in the text, split apart on spaces
    var words = text.split(" ");
    // newWords is our array of words after spellchecking.
    var newWords = [];
    // Loop through the words.
    for (var i = 0; i < words.length; ++i)
    {
        // Pull the word from our array.
        var word = words[i];
        if (i === 5) // spellcheck logic here.
        {
            // Push the wrapped word to the array.
            newWords.push("<span class=\"mistake\">" + word + "</span>");
        }
        else
        {
            // Push the word back to the array unchanged.
            newWords.push(word);
        }
    }
    // Return the rejoined text block.
    return newWords.join(" ");
})(spellcheck.innerHTML);
It's worth noting that the IIFE here can easily be replaced by moving that logic into its own function declaration, which makes it reusable.
Be aware that you also need to account for punctuation when spellchecking.
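Assigning innerHTML as above rebuilds every child node of the div, so it is only safe on divs that don't hold the caret. A more surgical option, sketched below under the assumption that the div holds a single text node, is to compute the word's character offsets (skipping punctuation) and then split just that text node with the standard Text.splitText() method, which calls no Selection APIs. The names nthWordRange and wrapTextNodeRange are hypothetical, and the "word" regex is an assumption you would tune for your spellchecker.

```javascript
// Find the character offsets of the nth word (zero-based) in a string, skipping punctuation.
// Assumption: a "word" is a run of letters/digits, optionally joined by an apostrophe.
function nthWordRange(text, n) {
  const wordRe = /[\p{L}\p{N}]+(?:['’][\p{L}\p{N}]+)*/gu;
  let count = 0;
  let match;
  while ((match = wordRe.exec(text)) !== null) {
    if (count === n) {
      return { start: match.index, end: match.index + match[0].length, word: match[0] };
    }
    count++;
  }
  return null; // fewer than n + 1 words
}

// Browser-only (defined, not called here): wrap characters [start, end) of one Text node
// in a span by splitting the node; no Selection or Range methods are called.
function wrapTextNodeRange(textNode, start, end, className) {
  const wordNode = textNode.splitText(start); // textNode keeps the text before the word
  wordNode.splitText(end - start); // wordNode now holds only the word
  const span = textNode.ownerDocument.createElement("span");
  span.className = className;
  wordNode.parentNode.replaceChild(span, wordNode);
  span.appendChild(wordNode);
  return span;
}

const range = nthWordRange("This is some text with xyz and other text", 5);
console.log(range); // { start: 23, end: 26, word: 'xyz' }
```

In the page, assuming the div's only child is its text node, you would call `const r = nthWordRange(div.firstChild.data, 5);` and then, if r is not null, `wrapTextNodeRange(div.firstChild, r.start, r.end, "spelling-error");`.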

Related

Highlight a word in a webpage [duplicate]

I am writing a jQuery plugin that will do a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the HTML quite yet.
At the moment my approach is to take an entire DOM element and all nested elements and simply run a regex find/replace for a given term. In the replacement I simply wrap a span around the matched term and use that span as my anchor for highlighting, scrolling, etc. It is vital that no characters inside any HTML tags are matched.
This is as close as I have gotten:
(?<=^|>)([^><].*?)(?=<|$)
It does a very good job of capturing all characters that are not in an html tag, but I'm having trouble figuring out how to insert my search term.
Input: any HTML element (this could be quite large, e.g. <body>)
Search term: 1 or more characters
Replacement text: <span class='highlight'>$1</span>
UPDATE
The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...
Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>
However, I am having some trouble using it in my JavaScript. With the following code, Chrome gives me the error "Invalid regular expression: /(?<=^|>)(.*?)(Mary)(?=.*?<|$)/: Invalid group".
var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
    var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
    $(this).html(text);
});
It's breaking on the group (?<=^|>) - is this something clumsy or a difference in the Regex engines?
UPDATE
The reason this regex breaks on that group is that JavaScript did not support regex lookbehinds at the time (they were added in ECMAScript 2018). For reference and possible workarounds: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
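Without lookbehind, one workaround is to split the HTML string on tags (a split with a capturing group keeps the tags in the result) and run the replacement only on the text pieces. This is a minimal sketch, not an HTML parser: it assumes no < or > inside attribute values, comments or scripts, and a search for e.g. amp would still match inside an entity such as &amp;. It also escapes the search term, which the code in the question doesn't do, so input like ( won't throw. escapeRegExp and highlightOutsideTags are hypothetical names.

```javascript
// Escape regex metacharacters so the user's input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap each occurrence of `term` in a span, but only in the text between tags.
// A string heuristic, not a parser: assumes no '<' or '>' inside attributes, comments or scripts.
function highlightOutsideTags(html, term, className) {
  if (!term) return html;
  const termRe = new RegExp(escapeRegExp(term), "gi");
  // A capturing split keeps the tags: even indices are text, odd indices are tags.
  return html
    .split(/(<[^>]*>)/)
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(termRe, (m) => `<span class="${className}">${m}</span>`)
    )
    .join("");
}

console.log(highlightOutsideTags('<a class="Mary">Mary had</a>', "mary", "highlight"));
// <a class="Mary"><span class="highlight">Mary</span> had</a>
```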
Just use jQuery's built-in text() method. It returns the combined text of the selected DOM element.
For the DOM approach (docs for the Node interface): run over all child nodes of an element. If the child is an element node, recurse into it. If it's a text node, search its text (node.data), and if you want to highlight or change something, shorten the node's text to end at the found position, then insert a highlight span with the matched text and another text node for the rest of the text.
Example code (adjusted, origin is here):
(function iterate_node(node) {
    if (node.nodeType === 3) { // Node.TEXT_NODE
        var text = node.data,
            match = /any regular expression/.exec(text); // non-global regex; indexOf also applicable
        if (match && match[0].length > 0) {
            var pos = match.index,
                length = match[0].length; // the length of what was actually found
            node.data = text.substr(0, pos); // split into a part before...
            var rest = document.createTextNode(text.substr(pos + length)); // a part after
            var highlight = document.createElement("span"); // and a part between
            highlight.className = "highlight";
            highlight.appendChild(document.createTextNode(text.substr(pos, length)));
            node.parentNode.insertBefore(rest, node.nextSibling); // insert after
            node.parentNode.insertBefore(highlight, node.nextSibling);
            iterate_node(rest); // maybe there are more matches
        }
    } else if (node.nodeType === 1) { // Node.ELEMENT_NODE
        // iterate over a static copy, so newly inserted highlight spans are not revisited
        var children = Array.prototype.slice.call(node.childNodes);
        for (var i = 0; i < children.length; i++) {
            iterate_node(children[i]); // run recursive on DOM
        }
    }
})(content); // any dom node
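Stripped of the DOM calls, the text-node step amounts to cutting node.data into alternating unmatched and matched pieces. The sketch below (segmentText is a hypothetical name) does that with RegExp.prototype.exec, using each match's actual length rather than a hard-coded one; in the browser you would then turn plain pieces into text nodes and matched pieces into highlight spans.

```javascript
// Cut a string into alternating unmatched / matched pieces, the same split the
// DOM walker performs on node.data before inserting highlight spans.
function segmentText(text, pattern) {
  // exec() needs the g flag to move through every match.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  const segments = [];
  let last = 0;
  let m;
  while ((m = re.exec(text)) !== null) {
    if (m[0] === "") {
      re.lastIndex++; // skip empty matches, which would otherwise loop forever
      continue;
    }
    if (m.index > last) segments.push({ text: text.slice(last, m.index), match: false });
    segments.push({ text: m[0], match: true }); // the real match, not a fixed length
    last = m.index + m[0].length;
  }
  if (last < text.length) segments.push({ text: text.slice(last), match: false });
  return segments;
}

console.log(segmentText("the cat sat", /at/));
```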
There's also highlight.js, which might be exactly what you want.

Splitting a long phrase into an array

I need to take the phrase
It’s that time of year when you clean out your closets, dust off shelves, and spruce up your floors. Once you’ve taken care of the dust and dirt, what about some digital cleaning? Going through all your files and computers may seem like a daunting task, but we found ways to make the process fairly painless.
and, upon pressing a button:
1) split it into an array
2) iterate over that array at each step
3) build SPAN elements as you go, along with the attributes
4) add the SPAN elements to the original DIV
5) add a click handler to the SPAN elements, or to the DIV, which causes the style on the SPAN to change on mouseover.
So far I had
function splitString(stringToSplit, separator) {
    var arrayOfStrings = stringToSplit.split(separator);
    print('The original string is: "' + stringToSplit + '"');
    print('The separator is: "' + separator + '"');
    print("The array has " + arrayOfStrings.length + " elements: ");
    for (var i=0; i < arrayOfStrings.length; i++)
        print(arrayOfStrings[i] + " / ");
}
var space = " ";
var comma = ",";
splitString(tempestString, space);
splitString(tempestString);
splitString(monthString, comma);
for (var i=0; i < myArray.length; i++)
{
}
var yourSpan = document.createElement('span');
yourSpan.innerHTML = "Hello";
var yourDiv = document.getElementById('divId');
yourDiv.appendChild(yourSpan);
yourSpan.onmouseover = function () {
    alert("On MouseOver");
}
and for html I have
<p>The DIV that will serve as your input (and output) is here, with
id="transcriptText":</p>
<div id="transcriptText"> It’s that time of year when you clean out your
closets, dust off shelves, and spruce up your floors. Once you’ve taken
care of the dust and dirt, what about some digital cleaning? Going
through all your files and computers may seem like a daunting task, but
we found ways to make the process fairly painless.</div>
<br>
<div id="divideTranscript" class="button"> Transform the
Transcript! </div>
Any help on how to move on? I have been stuck for quite some time.
Well, first off this looks like homework.
That said, I'll try to help without giving you the actual code, since we're not supposed to give actual working solutions to homework. You're splitting the string too many times (once is all that's needed based on the instructions you gave) and you have to actually store the result of the split call somewhere that your other code can use it.
Your instructions say to add attributes to the span, but not which attributes nor what their contents should be.
Your function should follow the instructions:
1) Split the string. Since it doesn't specify on what, I'd assume words. So split it on spaces only and leave the punctuation where it is.
2) With the array of words returned from split(), iterate over it as you attempt to, but inside the braces that scope the loop is where you want to concatenate the <span> start and end tags around the original word.
3) Use document.createElement() to make the current span into a DOM element. Attach the mouseover and click handlers to it, then appendChild() it to the div.
4) Add a handler to your button that calls the above function.
Note that it's possibly more efficient to use the innerHTML property to insert all the spans at once, but then you have to loop again to add the hover/click handlers.
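To illustrate the innerHTML alternative from the note above: the markup for all spans can be built as one string (escaping each word so text like < survives), and a single listener on the div can stand in for per-span handlers, because mouseover and mouseout events bubble. This is a sketch with hypothetical names (wordsToSpanHtml, attachDelegatedHover) and assumes words are whitespace-separated with punctuation left attached.

```javascript
// Escape characters that are special in HTML text and attribute values.
function escapeHtml(s) {
  return s.replace(/[&<>"']/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" })[c]);
}

// Build the markup for every word's span as one string, ready for a single innerHTML assignment.
function wordsToSpanHtml(text, className) {
  const trimmed = text.trim();
  if (trimmed === "") return "";
  return trimmed
    .split(/\s+/)
    .map((word) => `<span class="${className}">${escapeHtml(word)}</span>`)
    .join(" ");
}

// Browser-only (defined, not called here): one listener pair on the div handles every span.
function attachDelegatedHover(div, hoverClass) {
  div.addEventListener("mouseover", (event) => {
    const span = event.target.closest("span");
    if (span && div.contains(span)) span.classList.add(hoverClass);
  });
  div.addEventListener("mouseout", (event) => {
    const span = event.target.closest("span");
    if (span && div.contains(span)) span.classList.remove(hoverClass);
  });
}
```

In the browser this would be used as `div.innerHTML = wordsToSpanHtml(div.textContent, "word");` followed by `attachDelegatedHover(div, "hovered");`, where both class names are placeholders.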

Replace string - How to replace each word once

I have an XML dictionary as shown below.
<word definition="The primary income-earner in a household">bread winner</word>
<word definition="One who wins, or gains by success in competition, contest, or gaming">winner</word>
Whenever a word from the dictionary appears in my HTML, that word should be replaced with a link that has the definition as its title. When the link is hovered, the user should see the definition.
var allwords = xmlDoc.getElementsByTagName("word");
for (var i=0; i<allwords.length; i++)
{
    var name = allwords[i].lastChild.nodeValue;
    var linked = '' + allwords[i].lastChild.nodeValue + '';
}
Here is my replacer:
function replacer(oldstring, newstring) {
    document.body.innerHTML = document.body.innerHTML.replace(oldstring, newstring);
}
But the problem is this:
once "bread winner" changes to its linked form, "winner" changes too, since "bread winner" contains "winner". So "winner" gets changed twice, and the markup gets mangled.
Is there a way to make sure that, once "bread winner" has changed, "winner" is not changed again?
Thanks in advance!
What about something like this:
for (var i=0; i<allwords.length; i++)
{
    if (allwords[i].firstChild.nodeName.toLowerCase() === 'a') {
        // This word has been linked already, skip it
        continue;
    }
    // your code
}
You need some kind of sentry to prevent processing a term that's already been processed. I'd recommend wrapping the replaced terms with another element (not clear on how your html is structured, so I'm not sure what would work here, but a span would be the simplest way in normal html). Then your logic would just skip replacing words that had a parent element of whatever you decided to wrap it with.
You'll need to iterate through the matching text nodes, and only replace those that don't have an A tag as an ancestor in the DOM.
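Both suggestions come down to replacing each occurrence only once. One way to get that, sketched below, is a single regex alternation applied in one pass: split() with a capturing group returns plain text and matched terms alternately, and nothing that has been inserted is scanned again. Terms are sorted longest first, so that when two terms start at the same position (e.g. "bread" and "bread winner") the longer one wins. It assumes the XML has already been read into a plain object mapping each term to its definition (the definitions map, linkTerms, and the href="#" target are hypothetical), and that terms begin and end with word characters, since they are wrapped in \b. In the page you would apply it to text nodes without an A ancestor rather than to document.body.innerHTML.

```javascript
// Escape regex metacharacters so each term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape characters that are special in HTML text and attribute values.
function escapeHtml(s) {
  return s.replace(/[&<>"']/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" })[c]);
}

// Link every dictionary term in a plain-text string exactly once.
function linkTerms(text, definitions) {
  const terms = Object.keys(definitions);
  if (terms.length === 0) return escapeHtml(text);
  const byLower = new Map(terms.map((t) => [t.toLowerCase(), definitions[t]]));
  // Longest first, so "bread winner" beats "bread" when both start at the same position.
  const alternation = [...terms].sort((a, b) => b.length - a.length).map(escapeRegExp).join("|");
  // One capturing group: split() then returns plain text and matched terms alternately.
  const re = new RegExp("\\b(" + alternation + ")\\b", "i");
  return text
    .split(re)
    .map((part, i) =>
      i % 2 === 0
        ? escapeHtml(part)
        : `<a href="#" title="${escapeHtml(byLower.get(part.toLowerCase()))}">${escapeHtml(part)}</a>`
    )
    .join("");
}

console.log(linkTerms("The bread winner and the winner.", {
  "bread winner": "Primary earner",
  winner: "One who wins",
  bread: "Baked food",
}));
```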
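If you were doing this in PHP, strtr($text, $replacePairs) with an array of replacements would replace each term once: longer keys are tried first, and replaced text is not searched again. JavaScript has no built-in equivalent.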

Regex to search html return, but not actual html jQuery

I'm making a highlighting plugin for a client to find things in a page. I decided to test it with a help viewer I'm still building, but I'm having an issue that will (probably) require some regex.
I do not want to parse HTML, and I'm totally open to doing this differently; this just seems like the best/right way.
http://oscargodson.com/labs/help-viewer
http://oscargodson.com/labs/help-viewer/js/jquery.jhighlight.js
Type something in the search. Now refresh the page and type, say, class or class=" or <a: you'll notice it searches the actual HTML (as expected). How can I search only the text?
If I use .text(), it strips out all the HTML and I just get back a big blob of text, but I still want the HTML so I don't lose formatting, links, images, etc. I want this to work like CMD/CTRL+F.
You'd use this plugin like:
$('article').jhighlight({find:'class'});
To remove them:
.jhighlight('remove')
==UPDATE==
While Mike Samuel's idea below does in fact work, it's a tad heavy for this plugin. It's mainly for a client looking to erase bad words and/or MS Word characters during a "publishing" process of a form. I'm looking for a more lightweight fix, any ideas?
You really don't want to use eval, mess with innerHTML or parse the markup "manually". The best way, in my opinion, is to deal with text nodes directly and keep a cache of the original html to erase the highlights. Quick rewrite, with comments:
(function($){
    $.fn.jhighlight = function(opt) {
        // 'remove' restores the HTML cached before the first highlight
        if (opt === 'remove') {
            return this.each(function(){
                var self = $(this);
                if (self.data('htmlCache') !== undefined)
                    self.html(self.data('htmlCache'));
            });
        }
        // copy the defaults so one call's options don't leak into the next
        var options = $.extend({}, $.fn.jhighlight.defaults, opt)
          , txtProp = 'textContent' in document.documentElement ? 'textContent' : 'innerText';
        if ($.trim(options.find).length < 1) return this;
        // escape regex metacharacters so the search is literal, like CTRL+F
        var find = options.find.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        return this.each(function(){
            var self = $(this);
            // use a cache to clear the highlights
            if (self.data('htmlCache') === undefined)
                self.data('htmlCache', self.html());
            // create Tree Walker
            // https://developer.mozilla.org/en/DOM/treeWalker
            var walker = document.createTreeWalker(
                this, // walk only on target element
                NodeFilter.SHOW_TEXT,
                null,
                false
            );
            var node
              , matches
              , flags = 'g' + (!options.caseSensitive ? 'i' : '')
              , exp = new RegExp('(' + find + ')', flags) // capturing
              , expSplit = new RegExp(find, flags) // no capturing
              , highlights = [];
            // walk this way
            // and save matched nodes for later
            while ((node = walker.nextNode())) {
                if ((matches = node.nodeValue.match(exp))) {
                    highlights.push([node, matches]);
                }
            }
            // must replace stuff after the walker is finished
            // otherwise replacing a node will halt the walker
            for (var nn = 0, hln = highlights.length; nn < hln; nn++) {
                var textNode = highlights[nn][0]
                  , found = highlights[nn][1]
                  , parts = textNode.nodeValue.split(expSplit) // split on matches
                  , frag = document.createDocumentFragment(); // temporary holder
                // add text + highlighted parts in between
                // like a .join() but with elements :)
                for (var i = 0, ln = parts.length; i < ln; i++) {
                    // non-highlighted text
                    if (parts[i].length)
                        frag.appendChild(document.createTextNode(parts[i]));
                    // highlighted text
                    // skip last iteration
                    if (i < ln - 1) {
                        var h = document.createElement('span');
                        h.className = options.className;
                        h[txtProp] = found[i];
                        frag.appendChild(h);
                    }
                }
                // replace the original text node
                textNode.parentNode.replaceChild(frag, textNode);
            }
        });
    };
    $.fn.jhighlight.defaults = {
        find: '',
        className: 'jhighlight',
        color: '#FFF77B',
        caseSensitive: false,
        wrappingTag: 'span'
    };
})(jQuery);
If you're doing any other manipulation on the page, you might want to replace the caching with another clean-up mechanism, though that's not trivial.
You can see the code working here: http://jsbin.com/anace5/2/
You also need to add display: block to your new HTML elements; the layout is broken in a few browsers.
In the JavaScript code prettifier, I had this problem. I wanted to search the text but preserve tags.
What I did was start with HTML and decompose it into two parts:
1) the text content
2) pairs of (index into the text content where a tag occurs, the tag content)
So given
Lorem <b>ipsum</b>
I end up with
text = 'Lorem ipsum'
tags = [6, '<b>', 11, '</b>']
which allows me to search on the text, and then based on the result start and end indices, produce HTML including only the tags (and only balanced tags) in that range.
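A minimal sketch of the decomposition described above, using the same flat [index, tag, index, tag, ...] layout. The names decompose and recompose are hypothetical, and it assumes there is no < or > inside attribute values or comments; entities are left untouched, so a search for amp could still hit &amp;.

```javascript
// Split markup into plain text plus a flat list of [index, tag, index, tag, ...],
// where each index is the position in the text at which the tag occurred.
function decompose(html) {
  const tags = [];
  let text = "";
  let last = 0;
  const tagRe = /<[^>]*>/g;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    text += html.slice(last, m.index);
    tags.push(text.length, m[0]);
    last = m.index + m[0].length;
  }
  text += html.slice(last);
  return { text, tags };
}

// Rebuild markup from the two parts; with the original tags this round-trips exactly.
function recompose(text, tags) {
  let html = "";
  let pos = 0;
  for (let i = 0; i < tags.length; i += 2) {
    html += text.slice(pos, tags[i]) + tags[i + 1];
    pos = tags[i];
  }
  return html + text.slice(pos);
}

const { text, tags } = decompose("Lorem <b>ipsum</b>");
console.log(text, tags); // Lorem ipsum [ 6, '<b>', 11, '</b>' ]
console.log(text.indexOf("m ip")); // 4, a match that spans the <b> tag
```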
Have a look here: getElementsByTagName() equivalent for textNodes.
You can probably adapt one of the proposed solutions to your needs (i.e. iterate over all text nodes, replacing the words as you go - this won't work in cases such as <tag>wo</tag>rd but it's better than nothing, I guess).
I believe you could just do:
$('#article :not(:has(*))').jhighlight({find : 'class'});
Since it grabs all leaf nodes in the article it would require valid xhtml, that is, it would only match link in the following example:
<p>This is some paragraph content with a link</p>
DOM traversal / selector application could slow things down a bit so it might be good to do:
article_nodes = article_nodes || $('#article :not(:has(*))');
article_nodes.jhighlight({find : 'class'});
Maybe something like this could help:
>+[^<]*?(s(<[\s\S]*?>)?e(<[\s\S]*?>)?e)[^>]*?<+
The first part >+[^<]*? finds the > of the preceding tag.
The third part [^>]*?<+ finds the < of the following tag.
In the middle we have (<[\s\S]*?>)? between characters of our search phrase (in this case - "see").
After regular expression searching you could use the result of the middle part to highlight search phrase for user.
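The pattern above is written by hand for one phrase; a small helper can generate the middle part for any phrase. This sketch (tagTolerantPattern is a hypothetical name) escapes each character and allows any number of tags between characters with (?:<[^>]*>)*, a variation on the answer's (<[\s\S]*?>)?, which allows at most one. Unlike the full pattern above, it has no >...< anchors, so it could also match inside a tag's attributes.

```javascript
// Build a regex that matches `phrase` even when tags sit between its characters.
// Assumption: no '>' inside attribute values.
function tagTolerantPattern(phrase) {
  const escapeChar = (c) => c.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp([...phrase].map(escapeChar).join("(?:<[^>]*>)*"), "i");
}

const match = tagTolerantPattern("see").exec("I s<b>e</b>e it");
console.log(match[0], match.index); // s<b>e</b>e 2
```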

Javascript: Whitespace Characters being Removed in Chrome (but not Firefox)

Why would the code below eliminate the whitespace around matched keyword text when replacing it with an anchor link? Note that this only happens in Chrome, not in Firefox.
For complete context, the file is located at: http://seox.org/lbp/lb-core.js
To view the code in action (no errors found yet), the demo page is at http://seox.org/test.html. Copying the first paragraph and pasting it into a rich text editor (e.g. Dreamweaver, or Gmail with the rich text editor turned on) reveals the problem, with words bunched together. Pasting it into a plain text editor does not.
// Find page text (not in links) -> doxdesk.com
function findPlainTextExceptInLinks(element, substring, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType===1) {
            if (child.tagName.toLowerCase()!=='a')
                findPlainTextExceptInLinks(child, substring, callback);
        } else if (child.nodeType===3) {
            var index= child.data.length;
            while (true) {
                index= child.data.lastIndexOf(substring, index);
                if (index===-1 || limit.indexOf(substring.toLowerCase()) !== -1)
                    break;
                // don't match an alphanumeric char
                var dontMatch =/\w/;
                if(child.nodeValue.charAt(index - 1).match(dontMatch) || child.nodeValue.charAt(index+keyword.length).match(dontMatch))
                    break;
                // alert(child.nodeValue.charAt(index+keyword.length + 1));
                callback.call(window, child, index)
            }
        }
    }
}
// Linkup function, call with various type cases (below)
function linkup(node, index) {
    node.splitText(index+keyword.length);
    var a= document.createElement('a');
    a.href= linkUrl;
    a.appendChild(node.splitText(index));
    node.parentNode.insertBefore(a, node.nextSibling);
    limit.push(keyword.toLowerCase()); // Add the keyword to memory
    urlMemory.push(linkUrl); // Add the url to memory
}
// lower case (already applied)
findPlainTextExceptInLinks(lbp.vrs.holder, keyword, linkup);
Thanks in advance for your help. I'm nearly ready to launch the script, and will gladly credit you in the comments for your assistance.
It's not anything to do with the linking functionality; it happens to copied links that are already on the page too, and the credit content, even if the processSel() call is commented out.
It seems to be a weird bug in Chrome's rich-text copy function. The content in the holder is fine; if you cloneContents() the selected range and alert its innerHTML at the end, the spaces are clearly there. But spaces just before, just after, and at the inner edges of any inline element (not just links!) don't show up in the rich text.
Even if you add new text nodes to the DOM containing spaces next to a link, Chrome swallows them. I was able to make it look right by inserting non-breaking spaces:
var links= lbp.vrs.holder.getElementsByTagName('a');
for (var i= links.length; i-->0;) {
    links[i].parentNode.insertBefore(document.createTextNode('\xA0 '), links[i]);
    links[i].parentNode.insertBefore(document.createTextNode(' \xA0'), links[i].nextSibling);
}
but that's pretty ugly, should be unnecessary, and doesn't fix up other inline elements. Bad Chrome!
var keyword = links[i].innerHTML.toLowerCase();
It's unwise to rely on innerHTML to get text from an element, as the browser may escape or not-escape characters in it. Most notably &, but there's no guarantee over what characters the browser's innerHTML property will output.
As you seem to be using jQuery already, grab the content with text() instead.
var isDomain = new RegExp(document.domain, 'g');
if (isDomain.test(linkUrl)) { ...
That'll fail every second time, because global regexps remember their previous state (lastIndex): when used with methods like test, you're supposed to keep calling repeatedly until they return no match.
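A quick way to see the lastIndex effect, using a hypothetical URL: with the g flag, each successful test() moves lastIndex past the match, so the next call on the same string starts there, fails, and resets lastIndex to 0.

```javascript
const url = "http://example.com/page"; // hypothetical URL

const globalRe = /example\.com/g;
const withG = [globalRe.test(url), globalRe.test(url), globalRe.test(url)];
console.log(withG); // [ true, false, true ]

// Without g, test() always starts from the beginning of the string.
const plainRe = /example\.com/;
const withoutG = [plainRe.test(url), plainRe.test(url), plainRe.test(url)];
console.log(withoutG); // [ true, true, true ]
```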
You don't seem to need g (multiple matches) here... but then you don't seem to need regexp here either as a simple String indexOf would be more reliable. (In a regexp, each . in the domain would match any character in the link.)
Better still, use the URL decomposition properties on Location to do a direct comparison of hostnames, rather than crude string-matching over the whole URL:
if (location.hostname===links[i].hostname) { ...
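Outside an anchor element, the standard URL class exposes the same hostname property, which makes the difference easy to check. This sketch (sameHost is a hypothetical helper) assumes absolute URLs; for relative hrefs you would pass a base, e.g. new URL(href, location.href). The last line shows the crude match it avoids: an unescaped . in a regexp built from the domain matches other characters too.

```javascript
// Compare only the host parts of two absolute URLs.
function sameHost(href, pageUrl) {
  return new URL(href).hostname === new URL(pageUrl).hostname;
}

console.log(sameHost("http://seox.org/test.html", "http://seox.org/lbp/lb-core.js")); // true
console.log(sameHost("http://seoxXorg.com/", "http://seox.org/")); // false

// The string-matching version is fooled, because "." matches any character.
console.log(new RegExp("seox.org").test("http://seoxXorg.com/")); // true
```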
// don't match an alphanumeric char
var dontMatch =/\w/;
if(child.nodeValue.charAt(index - 1).match(dontMatch) || child.nodeValue.charAt(index+keyword.length).match(dontMatch))
    break;
If you want to match words on word boundaries, and case insensitively, I think you'd be better off using a regex rather than plain substring matching. That'd also save doing four calls to findText for each keyword as it is at the moment. You can grab the inner bit (in if (child.nodeType==3) { ...) of the function in this answer and use that instead of the current string matching.
The annoying thing about making regexps from string is adding a load of backslashes to the punctuation, so you'll want a function for that:
// Backslash-escape string for literal use in a RegExp
//
function RegExp_escape(s) {
    return s.replace(/([/\\^$*+?.()|[\]{}])/g, '\\$1');
}
var keywordre= new RegExp('\\b'+RegExp_escape(keyword)+'\\b', 'gi');
You could even do all the keyword replacements in one go for efficiency:
var keywords= [];
var hrefs= [];
for (var i=0; i<links.length; i++) {
    ...
    var text= $(links[i]).text();
    keywords.push('(\\b'+RegExp_escape(text)+'\\b)');
    hrefs.push(links[i].href);
}
var keywordre= new RegExp(keywords.join('|'), 'gi');
and then, for each match in linkup, check which match group has non-zero length and link with the entry of hrefs at the same position (group n corresponds to hrefs[n-1]).
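A sketch of the combined approach, with two assumptions: the links are given as hypothetical { text, href } pairs, and hrefs is filled with push() so that capture group n lines up with hrefs[n - 1]. In exec() results, groups that didn't take part in the match are undefined, which is how hrefForMatch (a hypothetical name) picks the right URL.

```javascript
// Backslash-escape string for literal use in a RegExp (same helper as above).
function RegExp_escape(s) {
  return s.replace(/([/\\^$*+?.()|[\]{}])/g, "\\$1");
}

// Build one regex with a capture group per keyword; hrefs[n] belongs to group n + 1.
function buildKeywordMatcher(pairs) {
  const keywords = [];
  const hrefs = [];
  for (const { text, href } of pairs) {
    keywords.push("(\\b" + RegExp_escape(text) + "\\b)");
    hrefs.push(href);
  }
  return { re: new RegExp(keywords.join("|"), "gi"), hrefs };
}

// Find the capture group that took part in this match and return its href.
function hrefForMatch(match, hrefs) {
  for (let g = 1; g < match.length; g++) {
    if (match[g] !== undefined && match[g].length > 0) return hrefs[g - 1];
  }
  return null;
}

// Usage with hypothetical keywords and URLs.
const { re, hrefs } = buildKeywordMatcher([
  { text: "winner", href: "/winner" },
  { text: "node.js", href: "/node" },
]);
const found = [];
let m;
while ((m = re.exec("The winner uses Node.js, not nodeXjs.")) !== null) {
  found.push([m[0], hrefForMatch(m, hrefs)]);
}
console.log(found); // [ [ 'winner', '/winner' ], [ 'Node.js', '/node' ] ]
```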
I'd like to help you more, but it's hard to guess without being able to test it. I suppose you can get around it by adding space-like characters around your links, e.g. .
By the way, this feature of yours that adds helpful links on copying is really interesting.
