A better way to extract innerText around getSelection()


I'm working on a Chrome extension that will extract a plain text url from a selection,
but the selection will always be a fraction of the url, so I need to scan left and right.
I basically need to extract 500 characters around the selection from innerText, since
I don't want to parse html elements, and also don't want to re-assemble textContent's.
Here's what I have so far, it's working quite well, but I feel there's a better way
that doesn't modify the document object and then restore it back to original state ...
// create a random 8 digit number - or bigger since we don't want duplicates
var randId = Math.floor(Math.random()*(90000000)+10000000);
var selection = document.getSelection();
var selectionRange = selection.getRangeAt(0);
// backup content, and then clone selection the user has made
var selectionContents = selectionRange.cloneContents();
var selectionContentsTxt = selectionContents.textContent;
// add random number to previous selection, this will expand selection
selectionContents.textContent = randId;
selectionRange.insertNode(selectionContents);
// find the random number in the document (in innerText in document.body)
var everything = document.body.innerText;
var offset = everything.indexOf(randId);
// restore backed up content to original state,- replace the selection
selectionContents = selectionRange.extractContents();
selectionContents.textContent = selectionContentsTxt;
selectionRange.insertNode(selectionContents);
// now we have the selection location offset in document.body.innerText
everything = document.body.innerText;
var extract = everything.substring(offset-250, offset+250);
extract holds the extracted text, with line breaks, etc, evaluated by the browser.
This will by the way be Chrome only so I don't need cross compatibility at all.
Now I can parse the selection - from middle to the edges to find the url,
note that starting the parse from middle is important so I didn't just extract
an innerText from a parentNode of my getSelection() without knowing the offsets ...
Is there a better way of doing this? maybe copying the entire document to a DOMParser,
but then how do I apply the getSelection from the original document to it ?
------ UPDATE ------
I've simplified the code above (forgot to mention I'm working with mouse clicks so I detect event
click, after a double click to select some text) and now it finds the caretOffset in
document.body.innerText but it still modifies the document and then restores it,
so still thinking of a better way to do this.
I'm wondering if insertNode and deleteContents is bad in some way ?
var randId = Math.floor(Math.random()*(90000000)+10000000);
var range = document.caretRangeFromPoint(event.clientX, event.clientY);
range.insertNode(document.createTextNode(randId));
var everything = document.body.innerText;
var offset = everything.indexOf(randId);
range.deleteContents();
var everything = document.body.innerText;
var extract = everything.substring(offset-250, offset+250);
document.getSelection().anchorNode.parentNode.normalize();
Any thoughts?
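Once an offset into the text is known, the "parse from the middle to the edges" step can be sketched as a pure string function. This is only an illustration: the helper name expandToToken is hypothetical, and it assumes the url is delimited by whitespace on both sides, which may not hold for every page.

```javascript
// Hypothetical helper (not from the question): grow outward from `offset`
// until whitespace is reached on each side, then return the token found.
// Assumption: the url contains no whitespace and is surrounded by it.
function expandToToken(text, offset) {
  var isSpace = function (ch) { return /\s/.test(ch); };
  // clamp the offset so out-of-range values do not break the scan
  var start = Math.max(0, Math.min(offset, text.length));
  var end = start;
  while (start > 0 && !isSpace(text.charAt(start - 1))) start--;
  while (end < text.length && !isSpace(text.charAt(end))) end++;
  return text.substring(start, end);
}

var sample = "see http://example.com/a/b?x=1 for details";
console.log(expandToToken(sample, sample.indexOf("a/b")));
// prints "http://example.com/a/b?x=1"
```

In the extension, text would be the 500 character extract and offset the position of the click inside it (250 when the window is not cut off at the start of the page).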

Related

Set value of textarea at desired line

So I can append text to a textarea using this method
document.getElementById('myArea').value += msg;
This tacks the new input onto the end of the current input.
Suppose the textarea already contains text. Suppose also that using "=" instead of "+=" and inputting the values textarea already had along with the new ones is not a possible solution in this context
How would one input new text to this textarea on the correct line and in the correct position with respect to the text that is already in place?
Here is a YouTube video demonstrating the problem
https://www.youtube.com/watch?v=GpwEuI3_73I&feature=youtu.be
UPDATE:
Instead of sending one letter at a time, I sent the whole textarea each time a key is pressed. Obviously more computationally taxing, but that's the only solution I have right now. I am still interested in hearing any better solutions if you have one!

I'm assuming you send only the last character typed (as in your original approach), and it is stored in a variable named "newChar".
Take this as pseudo-code, although I hope it does not require many changes to actually work:
// deserialize the text of the target textarea
var txt = targetTextarea.value;
var txtAsArray = txt.split(/\r?\n/);
var txtLine = txtAsArray[cursorRowNum];
// write the new character in the right position (but in memory)
txtLine = txtLine.substring(0, cursorColNum) + newChar + txtLine.substring(cursorColNum);
// now serialize the text back and update the target textarea
txtAsArray[cursorRowNum] = txtLine;
txt = txtAsArray.join("\n");
targetTextarea.value = txt;
A reference used was: How in node to split string by newline ('\n')?
Regarding performance, there is no additional network activity here, and we are accessing the DOM only twice (first and last line). Remember that accessing the DOM is around 100 times slower than plain variables in memory, as shown by http://www.phpied.com/dom-access-optimization/ .
That "txt = txtAsArray.join("\n");" might need to be "txt = txtAsArray.join("\r\n");" on Windows. Detecting if you are in one or the other is explained at How to find the operating system version using JavaScript as pointed by Angel Joseph Piscola.
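The same idea can be kept as a small pure function, which makes it easy to try outside the page. This is a sketch only: the name insertAtRowCol is hypothetical, and it assumes the row and column of the cursor are sent along with the character.

```javascript
// Hypothetical helper: insert `newChar` at (row, col) of a multi-line string.
// Rows outside the text leave it unchanged; the column is clamped to the line.
function insertAtRowCol(text, row, col, newChar) {
  var lines = text.split(/\r?\n/);
  if (row < 0 || row >= lines.length) return text;
  var line = lines[row];
  var c = Math.max(0, Math.min(col, line.length));
  lines[row] = line.slice(0, c) + newChar + line.slice(c);
  return lines.join("\n");
}

console.log(JSON.stringify(insertAtRowCol("abc\ndef", 1, 1, "X")));
// prints "abc\ndXef"
```

On the page this would be used as targetTextarea.value = insertAtRowCol(targetTextarea.value, cursorRowNum, cursorColNum, newChar), so the DOM is still touched only twice.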

Hi, this will append text to the existing text in the textarea.
I have tried it:
var msg = "Hi How are you ?";
document.getElementById('myArea').value += msg;

How to limit Javascript's window.find to a particular DIV?

Is it possible to use Javascript in Safari/Firefox/Chrome to search a particular div container for a given text string. I know you can use window.find(str) to search the entire page but is it possible to limit the search area to the div only?
Thanks!

Once you look up your div (which you might do via document.getElementById or any of the other DOM functions, various specs here), you can use either textContent or innerText to find the text of that div. Then you can use indexOf to find the string in that.
Alternately, at a lower level, you can use a recursive function to search through all text nodes in the window, which sounds a lot more complicated than it is. Basically, starting from your target div (which is an Element), you can loop through its childNodes and search their nodeValue string (if they're Texts) or recurse into them (if they're Elements).
The trick is that a naive version would fail to find "foo" in this markup:
<p><span>fo</span>o</p>
...since neither of the two Text nodes there has a nodeValue with "foo" in it (one of them has "fo", the other "o").
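One way around this, sketched below, is to search the concatenation of the text node values and then map the match back to a node index and an offset inside that node. The function name locateAcrossNodes is hypothetical, and the input is assumed to be the nodeValue strings of the div's text nodes in document order (collected, for example, by the recursive walk described above).

```javascript
// Hypothetical helper: find `needle` in the joined text of several text nodes
// and report where the match starts and ends as (node index, offset) pairs.
function locateAcrossNodes(nodeValues, needle) {
  var joined = nodeValues.join("");
  var at = joined.indexOf(needle);
  if (needle.length === 0 || at === -1) return null;
  var endAt = at + needle.length;
  var result = { startNode: -1, startOffset: 0, endNode: -1, endOffset: 0 };
  var pos = 0; // offset of the current node inside the joined text
  for (var i = 0; i < nodeValues.length; i++) {
    var next = pos + nodeValues[i].length;
    if (result.startNode === -1 && at < next) {
      result.startNode = i;
      result.startOffset = at - pos;
    }
    if (endAt <= next) {
      result.endNode = i;
      result.endOffset = endAt - pos;
      break;
    }
    pos = next;
  }
  return result;
}

// The <p><span>fo</span>o</p> case: two text nodes, "fo" and "o"
console.log(locateAcrossNodes(["fo", "o"], "foo"));
// startNode 0, startOffset 0, endNode 1, endOffset 1
```

The two pairs have the same shape as the arguments of Range.setStart and Range.setEnd, once the indices are mapped back to the actual text nodes.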

Depending on what you are trying to do, there is an interesting way of doing this that does work, although it requires some effort.
First, searching starts at the location where the user last clicked. So to get to the correct context, you can force a click on the div. This will place the internal pointer at the beginning of the div.
Then, you can use window.find as usual to find the element. It will highlight and move toward the next item found. You could create your own dialog and handle the true or false returned by find, as well as check the position. So for example, you could save the current scroll position, and if the next returned result is outside of the div, you restore the scroll. Also, if it returns false, then you can say there were no results found.
You could also show the default search box. In that case, you would be able to specify the starting position, but not the ending position because you lose control.
Some example code to help you get started. I could also try putting up a jsfiddle if there is enough interest.
Syntax:
window.find(aStringToFind, bCaseSensitive, bBackwards, bWrapAround, bWholeWord, bSearchInFrames, bShowDialog);
For example, to start searching inside of myDiv, try
document.getElementById("myDiv").click(); //Place cursor at the beginning
window.find("t", 0, 0, 0, 0, 0, 0); //Go to the next location, no wrap around
You could set a blur (lose focus) event handler to let you know when you leave the div so you can stop the search.
To save the current scroll position, use document.body.scrollTop. You can then set it back if it tries to jump outside of the div.
Hope this helps!
~techdude

As per the other answer you won't be able to use the window.find functionality for this. The good news is, you won't have to program this entirely yourself, as there nowadays is a library called rangy which helps a lot with this. So, as the code itself is a bit too much to copy paste into this answer I will just refer to a code example of the rangy library that can be found here. Looking in the code you will find
searchScopeRange.selectNodeContents(document.body);
which you can replace with
searchScopeRange.selectNodeContents(document.getElementById("content"));
To search only specifically in the content div.

If you are still looking for something, I think I found a pretty nice solution.
Here it is : https://www.aspforums.net/Threads/211834/How-to-search-text-on-web-page-similar-to-CTRL-F-using-jQuery/
And I'm working on removing jQuery (wip) : codepen.io/eloiletagant/pen/MBgOPB
Hope it's not too late :)

You can make use of Window.find() to search for all occurrences in a page and Node.contains() to filter out unsuitable search results.
Here is an example of how to find and highlight all occurrences of a string in a particular element:
var searchText = "something"
var container = document.getElementById("specificContainer");
// selection object
var sel = window.getSelection()
sel.collapse(document.body, 0)
// array to store ranges found
var ranges = []
// find all occurrences in a page
while (window.find(searchText)) {
// filter out search results outside of a specific element
if (container.contains(sel.anchorNode)){
ranges.push(sel.getRangeAt(sel.rangeCount - 1))
}
}
// remove selection
sel.collapseToEnd()
// Handle ranges outside of the while loop above.
// Otherwise Safari freezes for some reason (Chrome doesn't).
if (ranges.length == 0){
alert("No results for '" + searchText + "'")
} else {
for (var i = 0; i < ranges.length; i++){
var range = ranges[i]
if (range.startContainer == range.endContainer){
// Range includes just one node
highlight(i, range)
} else {
// More complex case: range includes multiple nodes
// Get all the text nodes in the range
var textNodes = getTextNodesInRange(
range.commonAncestorContainer,
range.startContainer,
range.endContainer)
var startOffset = range.startOffset
var endOffset = range.endOffset
for (var j = 0; j < textNodes.length; j++){
var node = textNodes[j]
range.setStart(node, j==0? startOffset : 0)
range.setEnd(node, j==textNodes.length-1?
endOffset : node.nodeValue.length)
highlight(i, range)
}
}
}
}
function highlight(index, range){
var newNode = document.createElement("span")
// TODO: define CSS class "highlight"
// or use newNode.style.backgroundColor = "yellow" instead
newNode.className = "highlight"
range.surroundContents(newNode)
// scroll to the first match found
if (index == 0){
newNode.scrollIntoView()
}
}
function getTextNodesInRange(rootNode, firstNode, lastNode){
var nodes = []
var startNode = null, endNode = lastNode
var walker = document.createTreeWalker(
rootNode,
// search for text nodes
NodeFilter.SHOW_TEXT,
// Logic to determine whether to accept, reject or skip node.
// In this case, only accept nodes that are between
// firstNode and lastNode
{
acceptNode: function(node) {
if (!startNode) {
if (firstNode == node){
startNode = node
return NodeFilter.FILTER_ACCEPT
}
return NodeFilter.FILTER_REJECT
}
if (endNode) {
if (lastNode == node){
endNode = null
}
return NodeFilter.FILTER_ACCEPT
}
return NodeFilter.FILTER_REJECT
}
},
false
)
while(walker.nextNode()){
nodes.push(walker.currentNode);
}
return nodes;
}
For the Range object, see https://developer.mozilla.org/en-US/docs/Web/API/Range.
For the TreeWalker object, see https://developer.mozilla.org/en-US/docs/Web/API/TreeWalker

var elements = [];
$(document).find("*").each(function () {
// strings have no contains() method; indexOf() does the same check
if ($(this).text().indexOf(yourText) !== -1)
elements.push($(this));
});
console.log(elements);
I didn't try it, but according to the jQuery documentation it should work.

Here is how I am doing with jquery:
var result = $('#elementid').text().indexOf('yourtext') > -1
it will return true or false

Maybe you are trying to not use jquery...but if not, you can use this $('div:contains(whatyouarelookingfor)') the only gotcha is that it could return parent elements that also contain the child div that matches.

Regex to search html return, but not actual html jQuery

I'm making a highlighting plugin for a client to find things in a page and I decided to test it with a help viewer im still building but I'm having an issue that'll (probably) require some regex.
I do not want to parse HTML, and im totally open on how to do this differently, this just seems like the best/right way.
http://oscargodson.com/labs/help-viewer
http://oscargodson.com/labs/help-viewer/js/jquery.jhighlight.js
Type something in the search... ok, refresh the page, now type, like, class or class=" or type <a you'll notice it'll search the actual HTML (as expected). How can I only search the text?
If i do .text() it'll vaporize all the HTML and what i get back will just be a big blob of text, but i still want the HTML so I dont lose formatting, links, images, etc. I want this to work like CMD/CTRL+F.
You'd use this plugin like:
$('article').jhighlight({find:'class'});
To remove them:
.jhighlight('remove')
==UPDATE==
While Mike Samuel's idea below does in fact work, it's a tad heavy for this plugin. It's mainly for a client looking to erase bad words and/or MS Word characters during a "publishing" process of a form. I'm looking for a more lightweight fix, any ideas?

You really don't want to use eval, mess with innerHTML or parse the markup "manually". The best way, in my opinion, is to deal with text nodes directly and keep a cache of the original html to erase the highlights. Quick rewrite, with comments:
(function($){
$.fn.jhighlight = function(opt) {
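// copy the defaults so repeated calls do not overwrite them
var options = $.extend({}, $.fn.jhighlight.defaults, opt === 'remove' ? {} : opt)
, txtProp = this[0].textContent ? 'textContent' : 'innerText';
// nothing to search for; 'remove' still has to run
if (opt !== 'remove' && $.trim(options.find).length < 1) return this;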
return this.each(function(){
var self = $(this);
// use a cache to clear the highlights
if (!self.data('htmlCache'))
self.data('htmlCache', self.html());
if(opt === 'remove'){
return self.html( self.data('htmlCache') );
}
// create Tree Walker
// https://developer.mozilla.org/en/DOM/treeWalker
var walker = document.createTreeWalker(
this, // walk only on target element
NodeFilter.SHOW_TEXT,
null,
false
);
var node
, matches
, flags = 'g' + (!options.caseSensitive ? 'i' : '')
, exp = new RegExp('('+options.find+')', flags) // capturing
, expSplit = new RegExp(options.find, flags) // no capturing
, highlights = [];
// walk this wayy
// and save matched nodes for later
while(node = walker.nextNode()){
if (matches = node.nodeValue.match(exp)){
highlights.push([node, matches]);
}
}
// must replace stuff after the walker is finished
// otherwise replacing a node will halt the walker
for(var nn=0,hln=highlights.length; nn<hln; nn++){
var node = highlights[nn][0]
, matches = highlights[nn][1]
, parts = node.nodeValue.split(expSplit) // split on matches
, frag = document.createDocumentFragment(); // temporary holder
// add text + highlighted parts in between
// like a .join() but with elements :)
for(var i=0,ln=parts.length; i<ln; i++){
// non-highlighted text
if (parts[i].length)
frag.appendChild(document.createTextNode(parts[i]));
// highlighted text
// skip last iteration
if (i < ln-1){
var h = document.createElement('span');
h.className = options.className;
h[txtProp] = matches[i];
frag.appendChild(h);
}
}
// replace the original text node
node.parentNode.replaceChild(frag, node);
};
});
};
$.fn.jhighlight.defaults = {
find:'',
className:'jhighlight',
color:'#FFF77B',
caseSensitive:false,
wrappingTag:'span'
};
})(jQuery);
If you're doing any manipulation on the page, you might want to replace the caching with another clean-up mechanism, not trivial though.
You can see the code working here: http://jsbin.com/anace5/2/
You also need to add display:block to your new html elements, the layout is broken on a few browsers.

In the javascript code prettifier, I had this problem. I wanted to search the text but preserve tags.
What I did was start with HTML, and decompose that into two bits.
The text content
Pairs of (index into text content where a tag occurs, the tag content)
So given
Lorem <b>ipsum</b>
I end up with
text = 'Lorem ipsum'
tags = [6, '<b>', 10, '</b>']
which allows me to search on the text, and then based on the result start and end indices, produce HTML including only the tags (and only balanced tags) in that range.

Have a look here: getElementsByTagName() equivalent for textNodes.
You can probably adapt one of the proposed solutions to your needs (i.e. iterate over all text nodes, replacing the words as you go - this won't work in cases such as <tag>wo</tag>rd but it's better than nothing, I guess).

I believe you could just do:
$('#article :not(:has(*))').jhighlight({find : 'class'});
Since it grabs all leaf nodes in the article it would require valid xhtml, that is, it would only match link in the following example:
<p>This is some paragraph content with a link</p>
DOM traversal / selector application could slow things down a bit so it might be good to do:
article_nodes = article_nodes || $('#article :not(:has(*))');
article_nodes.jhighlight({find : 'class'});

Maybe something like this could help:
>+[^<]*?(s(<[\s\S]*?>)?e(<[\s\S]*?>)?e)[^>]*?<+
The first part >+[^<]*? finds > of the last preceding tag
The third part [^>]*?<+ finds < of the first subsequent tag
In the middle we have (<[\s\S]*?>)? between characters of our search phrase (in this case - "see").
After regular expression searching you could use the result of the middle part to highlight search phrase for user.
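The middle part of that expression can be generated for any search phrase. The sketch below is an assumption-laden illustration, not the pattern above verbatim: the names escapeRegExp and tagTolerantRegExp are hypothetical, it allows any number of tags between two characters, and it leaves out the outer parts that anchor the match between tags, so it can also match inside attribute values.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a case-insensitive pattern that matches `phrase`
// even when tags sit between its characters, e.g. s<b>e</b>e for "see".
function tagTolerantRegExp(phrase) {
  var optionalTags = "(?:<[^>]*>)*";
  var body = phrase.split("").map(escapeRegExp).join(optionalTags);
  return new RegExp(body, "i");
}

var re = tagTolerantRegExp("see");
console.log(re.test("<p>s<b>e</b>e you</p>")); // prints true
console.log(re.test("<p>sea</p>"));            // prints false
```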

Calculating text selection offsets in nested elements in Javascript

The Problem
I am trying to figure out the offset of a selection from a particular node with javascript.
Say I have the following HTML
<p>Hi there. This <strong>is blowing my mind</strong> with difficulty.</p>
If I select from blowing to difficulty, it gives me the offset from the #text node inside of the <strong>. I need the string offset from the <p>'s innerHTML and the length of the selection. In this case, the offset would be 26 and the length would be 40.
My first thought was to do something with string offsets, etc. but you could easily have something like
<p> Hi there. This <strong>is awesome</strong>. For real. It <strong>is awesome</strong>.</p>
which would break that method because there are identical nodes. I also need the option to throw out nodes. Say I have something like this
<p>Hi there. This <strong>is blowing my mind</strong> with difficulty.</p>
I want to throw out any elements with rel="inserted" when I do the calculation. I still want 26 and 40 as the result.
What I'm looking for
The solution needs to be recursive. If there was a <span> with a <strong> in it, it would still need to traverse to the <p>.
The solution needs to remove the length of any element with rel="inserted". The contents are important, but the tags themselves are not. All other tags are important. I'd strongly prefer not to remove any elements from the DOM when I do all of this.
I am using document.getSelection() to get the selection object. This solution only has to work in WebKit. jQuery is an option, but I'd prefer to do it without it if possible.
Any ideas would be greatly appreciated.
I have no control over the HTML I am doing all of this on.

I think I solved my issue. I ended up not calculating the offset like I originally planned. Instead, I am storing the "path" from the chunk (aka <p>). Here is the code:
function isChunk(node) {
if (node == undefined || node == null) {
return false;
}
return node.nodeName == "P";
}
function pathToChunk(node) {
var components = new Array();
// While the last component isn't a chunk
var found = false;
while (found == false) {
var childNodes = node.parentNode.childNodes;
var children = new Array(childNodes.length);
for (var i = 0; i < childNodes.length; i++) {
children[i] = childNodes[i];
}
components.unshift(children.indexOf(node));
if (isChunk(node.parentNode) == true) {
found = true
} else {
node = node.parentNode;
}
}
return components.join("/");
}
function nodeAtPathFromChunk(chunk, path) {
var components = path.split("/");
var node = chunk;
for (var i = 0; i < components.length; i++) {
var component = components[i];
node = node.childNodes[component];
}
return node;
}
With all of that, you can do something like this:
var p = document.getElementsByTagName('p')[0];
var piece = nodeAtPathFromChunk(p, "1/0"); // returns desired node
var path = pathToChunk(piece); // returns "1/0"
Now I just need to expand all of that to support the beginning and the end of a selection. This is a great building block though.

What does this offset actually mean? An offset within the innerHTML of an element is going to be extremely fragile: any insertion of a new node or change to an attribute of an element preceding the point in the document the offset represents is going to make that offset invalid.
I strongly recommend using the browser's built-in support for this in the form of DOM Range. You can get hold of a range representing the current selection as follows:
var range = window.getSelection().getRangeAt(0);
If you're going to be manipulating the DOM based on this offset that you want, you're best off doing so using nodes instead of string representations of those nodes.

You can use the following JavaScript code:
var text = window.getSelection();
var start = text.anchorOffset;
alert(start);
var end = text.focusOffset - text.anchorOffset;
alert(end);

Just check if your selected element is a paragraph, and if not use something like Prototype's Element.up() method to select the first paragraph parent.
For example:
if(selected_element.nodeName != 'P') {
parent_paragraph = $(selected_element).up('p');
}
Then just find the difference between the parent_paragraph's text offset and your selected_element's text offset.
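A text offset of this kind can be computed by walking the paragraph and adding up the lengths of the text nodes that come before the target. The sketch below is hypothetical (the name textOffsetWithin is not from any library) and it counts text only, so the result differs from an innerHTML offset because tags contribute nothing. It reads only nodeType, nodeValue and childNodes, so it is demonstrated here on plain objects that stand in for DOM nodes.

```javascript
// Hypothetical helper: number of text characters inside `root` that come
// before `target` in document order, or -1 if `target` is not inside `root`.
function textOffsetWithin(root, target) {
  var total = 0;
  var found = false;
  (function visit(node) {
    if (found) return;
    if (node === target) { found = true; return; }
    if (node.nodeType === 3) { total += node.nodeValue.length; return; } // text node
    var kids = node.childNodes || [];
    for (var i = 0; i < kids.length && !found; i++) visit(kids[i]);
  })(root);
  return found ? total : -1;
}

// Stand-in for <p>Hi there. This <strong>is blowing my mind</strong> with difficulty.</p>
var t1 = { nodeType: 3, nodeValue: "Hi there. This " };
var t2 = { nodeType: 3, nodeValue: "is blowing my mind" };
var t3 = { nodeType: 3, nodeValue: " with difficulty." };
var strong = { nodeType: 1, childNodes: [t2] };
var p = { nodeType: 1, childNodes: [t1, strong, t3] };

// "blowing" starts 3 characters into t2, which is what range.startOffset reports
console.log(textOffsetWithin(p, t2) + 3); // prints 18
```

With a real selection, target would be range.startContainer and the added value range.startOffset.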

Maybe you could use the jQuery selectors to ignore the rel="inserted"?
$('a[rel!=inserted]').doSomething();
http://api.jquery.com/attribute-not-equal-selector/
What code are you using now to select from blowing to difficulty?

How do I force the browser to render the tags rather than reveal them to the user on altered div:innerHTML?

I'm allowing the user to select text contained within <div></div> and change it to bolded text. In other words from <div>this is some text</div> to <div>this is <b>some</b> text</div>. All is working except that when I change the div.innerHTML to this is <b>some</b> text, the <b>some</b> tags are shown to the user rather than being rendered as HTML and displaying some bolded. This is all happening client side with Javascript.
How do I force the browser to render the tags rather than reveal them to the user?
Per request, here is the code...
HTML...
<div id="blob">
One simple, but not very efficient implementation of a dictionary is a linked
list. In this implementation all operations take linear time in the worst case
(and even in the average case), assuming that insertions first check whether the
item is in the current list. A more scalable implementation of a dictionary is a
balanced search tree. In this lecture note we present two even more efficient data
structures based on hashing.
</div>
Javascript...
tagText(document.getElementById("blob"),"<b>","</b>");
and...
//======================================================================
function tagText(el,tagstart,tagend)
{
var range = window.getSelection().getRangeAt(0);
var rtxt = range.startContainer.textContent;
var rlen = rtxt.length;
var start = range.startOffset;
var stop = range.endOffset;
var result = rtxt.substring(0,start) + tagstart + rtxt.substring(start,stop) + tagend + rtxt.substring(stop,rlen);
// el.innerHTML = result;
range.startContainer.textContent = result;
var txt = el.innerHTML;
el.innerHTML = txt;
}
//======================================================================
Looking at the div:innerHTML via firebug shows that the tags are escaped (&lt;b&gt;) rather than <b>.

That's not what's causing your problem, but..
Isn't this wrong?
var txt = el.innerHTML;
el.innerHTML = txt;
Edit:
Try this:
<div id="blob">
One simple, but not very efficient implementation of a dictionary is a linked
list. In this implementation all operations take linear time in the worst case
(and even in the average case), assuming that insertions first check whether the
item is in the current list. A more scalable implementation of a dictionary is a
balanced search tree. In this lecture note we present two even more efficient data
structures based on hashing.
</div>
<input type="button" value="sample" onclick='tagText();'>
and...
<script>
function tagText() {
var range = window.getSelection().getRangeAt(0);
range.surroundContents(document.createElement("b"));
}
</script>

I believe the problem is this line:
range.startContainer.textContent = result;
You aren't actually setting the innerHTML of the div element, you are setting the text of the container of the range, which will not interpret tags as HTML. Instead, try setting the inner HTML of the div directly from your results and clear the range.
For clarification - this code will actually convert the selected content to bold.
var rtxt = range.startContainer.textContent;
var rlen = rtxt.length;
var start = range.startOffset;
var stop = range.endOffset;
var result = rtxt.substring(0,start) + tagstart + rtxt.substring(start,stop) + tagend + rtxt.substring(stop,rlen);
el.innerHTML = result;
If you want to maintain the selection then you will need to programmatically reset the range.
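One caveat when building the string for innerHTML from plain text: any "<" or "&" already present in the text will then be read as markup. A minimal sketch of the string step with escaping added is shown below. The helper names escapeHtml and wrapRange are hypothetical, and the sketch assumes the selection starts and ends in the same text node, as the code above does.

```javascript
// Escape the characters that would otherwise be interpreted as markup.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap text[start, stop) in the given tags and escape
// everything else, so the result is safe to assign to innerHTML.
function wrapRange(text, start, stop, tagStart, tagEnd) {
  return escapeHtml(text.substring(0, start)) + tagStart +
         escapeHtml(text.substring(start, stop)) + tagEnd +
         escapeHtml(text.substring(stop));
}

console.log(wrapRange("this is some text", 8, 12, "<b>", "</b>"));
// prints this is <b>some</b> text
```

The result would then be assigned with el.innerHTML = wrapRange(rtxt, start, stop, tagstart, tagend).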

In jQuery, that'd be $('#blob').wrap('<b></b>');. You don't even need to write a helper function then. Seriously, use a library. Don't waste your time figuring out low-level stuff.
