Javascript .replace command replace page text?

Can the JavaScript command .replace replace text in any webpage? I want to create a Chrome extension that replaces specific words in any webpage with something else (for example, cake instead of pie).

The .replace method is a string operation, so it's not immediately simple to run the operation on HTML documents, which are composed of DOM Node objects.
Use TreeWalker API
The best way to go through every node in a DOM and replace text in it is to use the document.createTreeWalker method to create a TreeWalker object. This is a practice that is used in a number of Chrome extensions!
// create a TreeWalker of all text nodes
var allTextNodes = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT),
// some temp references for performance
tmptxt,
tmpnode,
// compile the RE and cache the replace string, for performance
cakeRE = /cake/g,
replaceValue = "pie";
// iterate through all text nodes
while (allTextNodes.nextNode()) {
tmpnode = allTextNodes.currentNode;
tmptxt = tmpnode.nodeValue;
tmpnode.nodeValue = tmptxt.replace(cakeRE, replaceValue);
}
To replace parts of text with another element or to add an element in the middle of text, use the DOM splitText, createElement, and insertBefore methods.
See also how to replace multiple strings with multiple other strings.
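For the multiple-words case, one possible approach is a single pass with a lookup table, sketched below. The function name replaceWords and the replacements object are hypothetical and not part of the original answer. The function works on a plain string, so inside the TreeWalker loop it would be applied to each text node's nodeValue. It only matches whole words made of ASCII word characters, which is an assumption about what "specific words" means here.

```javascript
// Hypothetical helper: swap whole words according to a lookup table.
// Each word is looked up once, so "cake" -> "pie" and "pie" -> "cake"
// can be applied together without one replacement undoing the other.
function replaceWords(text, replacements) {
  return text.replace(/\w+/g, function (word) {
    // Only replace words that are own keys of the table.
    return Object.prototype.hasOwnProperty.call(replacements, word)
      ? replacements[word]
      : word;
  });
}

console.log(replaceWords("cake and pie", { cake: "pie", pie: "cake" }));
// prints "pie and cake"
```

Because the match is on whole words, "cupcake" is left alone; a case-insensitive variant would need to normalise the word before the lookup.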
Don't use innerHTML or innerText or jQuery .html()
// the innerHTML property of any DOM node is a string
document.body.innerHTML = document.body.innerHTML.replace(/cake/g,'pie')
It's generally slower (especially on mobile devices).
It effectively removes and replaces the entire DOM, which is not awesome and could have some side effects: it destroys all event listeners attached in JavaScript code (via addEventListener or .onxxxx properties) thus breaking the functionality partially/completely.
This is, however, a common, quick, and very dirty way to do it.

Ok, so the createTreeWalker method is the RIGHT way of doing this and it's a good way. I unfortunately needed to do this to support IE8 which does not support document.createTreeWalker. Sad Ian is sad.
If you want to do this with a .replace on the page text using a non-standard innerHTML call like a naughty child, you need to be careful because it WILL replace text inside a tag, leading to XSS vulnerabilities and general destruction of your page.
What you need to do is only replace text OUTSIDE of tag, which I matched with:
var search_re = new RegExp("(?:>[^<]*)(" + stringToReplace + ")(?:[^>]*<)", "gi");
gross, isn't it. you may want to mitigate any slowness by replacing some results and then sticking the rest in a setTimeout call like so:
// replace some chunk of stuff, the first section of your page works nicely
// if you happen to have that organization
//
setTimeout(function() { /* replace the rest */ }, 10);
which will return immediately after replacing the first chunk, letting your page continue with its happy life. for your replace calls, you're also going to want to replace large chunks in a temp string
var tmp = element.innerHTML.replace(search_re, whatever);
/* more replace calls, maybe this is in a for loop, i don't know what you're doing */
element.innerHTML = tmp;
so as to minimize reflows (when the page recalculates positioning and re-renders everything). for large pages, this can be slow unless you're careful, hence the optimization pointers. again, don't do this unless you absolutely need to. use the createTreeWalker method zetlen has kindly posted above..
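One caveat with the pattern above: stringToReplace is concatenated into the RegExp source as-is, so characters such as . or + inside it are treated as regex syntax (and "c++" would throw). A minimal sketch of escaping it first; escapeRegExp and buildSearchRegExp are hypothetical helper names, not from the original answer:

```javascript
// Hypothetical helper: escape characters that are special in a RegExp source.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds the same "text outside of tags" pattern as above,
// but with the search string treated literally.
function buildSearchRegExp(stringToReplace) {
  return new RegExp(
    "(?:>[^<]*)(" + escapeRegExp(stringToReplace) + ")(?:[^>]*<)",
    "gi"
  );
}

console.log(buildSearchRegExp("c++").test("<b>c++</b>")); // prints true
```

This only fixes the escaping; it does not make regex-based HTML rewriting any safer than the answer already warns.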

Have you tried something like this?
$('body').html($('body').html().replace('pie','cake'));
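Note that when the first argument to .replace is a string, only the first occurrence is replaced, so the one-liner above changes a single 'pie'. A small illustration on a plain string (the sample text is made up); the same caveats about rewriting the whole body's HTML still apply:

```javascript
var text = "pie, pie and more pie";

// String pattern: only the first match is replaced.
var firstOnly = text.replace("pie", "cake");

// Regex with the g flag: every match is replaced.
var every = text.replace(/pie/g, "cake");

console.log(firstOnly); // prints "cake, pie and more pie"
console.log(every);     // prints "cake, cake and more cake"
```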

Related

Whats the best way to insert content into an HTML document with JavaScript

I was wondering, since there are so many ways to add content dynamically, which way is better.
I know of only three methods:
that add to a body's text node
document.body.innerHTML+="<div>"+myContent+"</div>";
add to the current text node
document.write("<div>"+myContent+"</div>");
and add a whole new node
//case based but for example purpose
var node = document.createElement("div");
node.appendChild(document.createTextNode(myContent));
document.body.appendChild(node);
The third example is obviously more lines, which is why I'm wondering why I should even consider it, given that download times tend to be worse than parse times?

document.body.innerHTML+="<div>"+myContent+"</div>";
Destroys any existing event handlers and form data, and creates new elements from the generated source code.
Doesn't safely escape myContent.
document.write("<div>"+myContent+"</div>");
Wipes out the entire document if it is in a closed state
Doesn't safely escape myContent.
var node = document.createElement("div");
node.appendChild(document.createTextNode(myContent));
document.body.appendChild(node);
Verbose, but safe
Option 3 is usually the best.
The third example is obviously more lines, which is why I'm wondering why I should even consider it, given that download times tend to be worse than parse times?
Micro-optimisations generally aren't worth the effort. Most of the size difference will be eliminated by HTTP compression anyway.
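If the innerHTML route is used anyway, myContent would need escaping first. A minimal sketch of such an escaper, assuming myContent is meant to be displayed as plain text; escapeHtml is a hypothetical helper, not a built-in:

```javascript
// Hypothetical helper: make a string safe to embed in HTML text or
// in a quoted attribute value. The ampersand is handled first so the
// entities produced by the later steps are not escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<b>"x" & y</b>'));
// prints &lt;b&gt;&quot;x&quot; &amp; y&lt;/b&gt;
```

It would be used as "<div>" + escapeHtml(myContent) + "</div>". The createTextNode version avoids the issue entirely because the text never passes through the HTML parser.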

1) document.body.innerHTML+="<div>"+myContent+"</div>";
Useful when you are inserting static HTML (no event listeners required) into a specific <element>. In this case performance is good.
2) document.write("<div>"+myContent+"</div>");
I would not recommend this approach; it can overwrite the whole page.
3) Useful when event listeners need to be added dynamically.
var node = document.createElement("div");
// You can Add Event here.
node.addEventListener("click", function(){ alert("Event Added"); });
node.appendChild(document.createTextNode(myContent));
document.body.appendChild(node);

Convert textNode content to a string

I'm having a problem with a textNode that I can't convert to a string.
I'm trying to scrape a site and get certain information out of it, and when I use an XPath to find the text I'm after I get a textNode back.
When I look in the Chrome developer tools, I can see that the textNode itself contains the text I'm after, but how do I convert the textNode to plain text?
Here is the line of code I use:
abstracts = ZU.xpath(doc, '//*[@id="abstract"]/div/div/par/text()');
I have tried things like .innerHTML, toString, and textContent, but nothing has worked so far.

I usually use Text.wholeText when I want the content of a textNode as a string. A textNode is an object, not the string itself, so toString does not return its text, and innerHTML is not defined on text nodes.
Example: from https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText
The Text.wholeText read-only property returns the full text of all Text nodes logically adjacent to the node. The text is concatenated in document order. This allows to specify any text node and obtain all adjacent text as a single string.
Syntax
str = textnode.wholeText;
Notes and example:
Suppose you have the following simple paragraph within your webpage (with some whitespace added to aid formatting throughout the code samples here), whose DOM node is stored in the variable para:
<p>Thru-hiking is great! <strong>No insipid election coverage!</strong>
However, <a href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>
You decide you don’t like the middle sentence, so you remove it:
para.removeChild(para.childNodes[1]);
Later, you decide to rephrase things to, “Thru-hiking is great, but casting a ballot is tricky.” while preserving the hyperlink. So you try this:
para.firstChild.data = "Thru-hiking is great, but ";
All set, right? Wrong! What happened was you removed the strong element, but the removed sentence’s element separated two text nodes. One for the first sentence, and one for the first word of the last. Instead, you now effectively have this:
<p>Thru-hiking is great, but However, <a
href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>
You’d really prefer to treat all those adjacent text nodes as a single one. That’s where wholeText comes in: if you have multiple adjacent text nodes, you can access the contents of all of them using wholeText. Let’s pretend you never made that last mistake. In that case, we have:
assert(para.firstChild.wholeText == "Thru-hiking is great! However, ");
wholeText is just a property of text nodes that returns the string of data making up all the adjacent (i.e. not separated by an element boundary) text nodes combined.
Now let’s return to our original problem. What we want is to be able to replace the whole text with new text. That’s where replaceWholeText() comes in:
para.firstChild.replaceWholeText("Thru-hiking is great, but ");
We’re removing every adjacent text node (all the ones that constituted the whole text) but the one on which replaceWholeText() is called, and we’re changing the remaining one to the new text. What we have now is this:
<p>Thru-hiking is great, but <a
href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>
Some uses of the whole-text functionality may be better served by using Node.textContent, or the longstanding Element.innerHTML; that’s fine and probably clearer in most circumstances. If you have to work with mixed content within an element, as seen here, wholeText and replaceWholeText() may be useful.
More info: https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText
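As a rough sketch of how this might apply to the question: assuming the XPath call returns an array-like list of text nodes rather than a single node (an assumption, not something the question confirms), the string has to be read from each node in turn. nodesToText is a hypothetical helper name, and the plain objects in the example merely stand in for real Text nodes.

```javascript
// Hypothetical helper: concatenate the text of every node in a list.
// textContent is available on text nodes and elements alike.
function nodesToText(nodes) {
  var parts = [];
  for (var i = 0; i < nodes.length; i++) {
    parts.push(nodes[i].textContent);
  }
  return parts.join("");
}

// Stand-ins for the Text nodes an XPath query might return.
var fakeNodes = [{ textContent: "First part. " }, { textContent: "Second part." }];
console.log(nodesToText(fakeNodes)); // prints "First part. Second part."
```

If that assumption holds, reading .textContent directly on the returned list would give undefined, which may be why the earlier attempts appeared not to work.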

How can I separately retrieve the HTML that's before and after a child element inside a parent element?

We're writing a web app that relies on Javascript/jQuery. It involves users filling out individual words in a large block of text, kind of like Mad Libs. We've created a sort of HTML format that we use to write the large block of text, which we then manipulate with jQuery as the user fills it out.
Part of a block of text might look like this:
<span class="fillmeout">This is a test of the <span>NOUN</span> Broadcast System.</span>
Given that markup, I need to separately retrieve and manipulate the text before and after the inner <span>; we're calling those the "prefix" and "suffix".
I know that you can't parse HTML with simple string manipulation, but I tried anyway; I tried using split() on the <span> and </span> tags. It seemed simple enough. Unfortunately, Internet Explorer casts all HTML tags to uppercase, so that technique fails. I could write a special case, but the error has taught me to do this the right way.
I know I could simply use extra HTML tags to manually denote the prefix and suffix, but that seems ugly and redundant; I'd like to keep our markup format as lean and readable and writable as possible.
I've looked through the jQuery docs, and can't find a function that does exactly what I need. There are all sorts of functions to add stuff before and after and around and inside elements, but none that I can find to retrieve what's already there. I could remove the inner <span>, but then I don't know how I can tell what came before the deleted element apart from what came after it.
Is there a "right" way to do what I'm trying to do?

With simple string manipulations you can also use Regex.
That should solve your problem.
var array = $('.fillmeout').html().split(/<\/?span>/i);

Use your jQuery API! $('.fillmeout').children() and then you can manipulate that element as required.
http://api.jquery.com/children/

For completeness, I thought I should point out that the cleanest answer is to put the prefix and suffix text each in its own <span> like this, and then you can use jQuery selectors and methods to directly access the desired text:
<span class="fillmeout">
<span class="prefix">This is a test of the </span>
<span>NOUN</span>
<span class="suffix"> Broadcast System.</span>
</span>
Then, the code would be as simple as:
var fillme = $(".fillmeout").eq(0);
var prefix = fillme.find(".prefix").text();
var suffix = fillme.find(".suffix").text();
FYI, I would not call this level of simplicity "ugly and redundant" as you theorized. You're using HTML markup to delineate the text into separate elements that you want to separately access. That's just smart, not redundant.
By way of analogy, imagine you have toys of three separate colors (red, white and blue) and they are initially organized by color and you know that sometime in the future you are going to need to have them separated by color again. You also have three boxes to store them in. You can either put them all in one box now and manually sort them out by color again later or you can just take the already separated colors and put them each into their own box so there's no separation work to do later. Which is easier? Which is smarter?
HTML elements are like the boxes. They are containers for your text. If you want the text separated out in the future, you might as well put each piece of text into its own named container so it's easy to access just that piece of text in the future.

Several of these answers almost got me what I needed, but in the end I found a function not mentioned here: .contents(). It returns an array of all child nodes, including text nodes, that I can then iterate over (recursively if needed) to find what I need.

I'm not sure if this is the 'right' way either, but you could replace the SPANs with an element you could consistently split the string on:
jQuery('.fillmeout span').replaceWith('|');
http://api.jquery.com/replaceWith/
http://jsfiddle.net/mdarnell/P24se/

You could use
$('.fillmeout span').get(0).previousSibling.textContent
$('.fillmeout span').get(0).nextSibling.textContent
This works in IE9, but sadly not in IE versions older than 9.

Based on your example, you could use your target as a delimiter to split the sentence.
var str = $('.fillmeout').html();
str = str.split('<span>NOUN</span>');
This would return an array of ["This is a test of the ", " Broadcast System."]. Here's a jsFiddle example.

You could just use the nextSibling and previousSibling native JavaScript (coupled with jQuery selectors):
$('.fillmeout span').each(
function(){
var prefix = this.previousSibling.nodeValue,
suffix = this.nextSibling.nodeValue;
});
JS Fiddle proof of concept.
References:
each().
node.nextSibling.
node.previousSibling.

If you want to use the DOM instead of parsing the HTML yourself and you can't put the desired text in its own elements, then you will need to look through the DOM for text nodes and find the text nodes before and after the span tag.
jQuery isn't a whole lot of help when dealing with text nodes instead of element nodes so the work is mostly done in plain javascript like this:
$(".fillmeout").each(function() {
var node = this.firstChild, prefix = "", suffix = "", foundSpan = false;
while (node) {
if (node.nodeType == 3) {
// if text node
if (!foundSpan) {
prefix += node.nodeValue;
} else {
suffix += node.nodeValue;
}
} else if (node.nodeType == 1 && node.tagName == "SPAN") {
// if element and span tag
foundSpan = true;
}
node = node.nextSibling;
}
// here prefix and suffix are the text before and after the first
// <span> tag in the HTML
// You can do with them what you want here
});
Note: This code does not assume that all text before the span is located in one text node and one text node only. It might be, but it also might not be so it collates all the text nodes together that are before and after the span tag. The code would be simpler if you could just reference one text node on each side, but it isn't 100% certain that that is a safe assumption.
This code also handles the case where there is no text before or after the span.
You can see it work here: http://jsfiddle.net/jfriend00/P9YQ6/

Extract single element from XMLHttpRequest

I am actually making a Sidebar Gadget, (which is AJAX-based) and I am looking for a way to extract a single element from an AJAX Request.
The only way I found yet was to do something like that:
var temp = document.createElement("div");
temp.innerHTML = HttpRequest.innerText;
document.body.appendChild(temp);
temp.innerHTML = document.getElementById("WantedElement").innerText;
But it is pretty ugly, I would like to extract WantedElement directly from the request without adding it to the actual document...
Thank you!

If you're in control of the data, the way you're doing it is probably the best method. Other answers here have their benefits but also they're all rather flawed. For instance, the querySelector() method is only available to Windows Desktop Gadgets running in IE8 mode on the host machine. Regular expressions are particularly unreliable for parsing HTML and should not be used.
If you're not in control of the data or if the data is not transferred over a secure protocol, you should be more concerned about security than code aesthetics -- you may be introducing potential security risks to the gadget and the host machine by inserting unsanitized HTML into the document. Since gadgets run with user or admin level privileges, the obvious security risk is untrusted source/MITM script injection, leaving a hole for malicious scripts to wreak havoc on the machine it's running on.
One potential solution is to use the htmlfile ActiveXObject:
function getElementFromResponse(html, divId)
{
  var h = new ActiveXObject("htmlfile");
  h.open();
  // disable activex controls
  h.parentWindow.ActiveXObject = function () {};
  // write the html to the document
  h.write(html);
  h.close();
  // look up the element by the id that was passed in
  return h.getElementById(divId).innerText;
}
You could also make use of IE8's toStaticHTML() method, but your gadget would need to be running in IE8 mode.

One option would be to use regular expressions:
var str = response.match(/<div id="WantedElement">(.+)<\/div>/);
str[1]; // contents of div (str[0] is the whole match, including the tags)
However, if your server response is more complex, I'd suggest you to use a data format like JSON for the response. Then it would be much cleaner to parse at the client side.
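A minimal sketch of the JSON variant, assuming the server can be changed to send an object instead of markup. The field name wantedElement and the function name extractWanted are made up for illustration:

```javascript
// Hypothetical helper: pull one field out of a JSON response body.
// JSON.parse throws on malformed input, so real code may want a try/catch.
function extractWanted(responseText) {
  var data = JSON.parse(responseText);
  return data.wantedElement;
}

console.log(extractWanted('{"wantedElement": "Hello", "other": 1}'));
// prints "Hello"
```

Nothing has to be inserted into the document to read the value, which also sidesteps the ugliness the question complains about.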

You could append the response from XMLHttpRequest inside a hidden div, and then call getElementById to get the desired element. Later remove the div when done with it. Or maybe create a function that handles this for you.
function addNinjaNodeToDOM(html) {
  var ninjaDiv = document.createElement("div");
  ninjaDiv.innerHTML = html;
  ninjaDiv.style.display = 'none';
  // attach the hidden div so that document.getElementById can find its contents
  document.body.appendChild(ninjaDiv);
  return ninjaDiv;
}
var wrapper = addNinjaNodeToDOM(HttpRequest.innerText);
// getElementById exists on document, not on elements
var requiredNode = document.getElementById("WantedElement");
// do something with requiredNode
document.body.removeChild(wrapper); // remove when done
The only reason for appending it to the DOM was because getElementById will not work unless its part of the DOM tree. See MDC.
However, you can still run selector and XPath queries on detached DOM nodes. That would save you from having to append elements to the DOM.
var superNinjaDiv = document.createElement('div');
superNinjaDiv.innerHTML = html;
var requiredNode = superNinjaDiv.querySelector("[id=someId]");

I think using getElementById to lookup the element in this case is not a good approach. This is because of extra steps you have to take to use it. You wrap the element in a DIV, inject in DOM, lookup your element using getElementById and then remove the injected DIV from DOM.
DOM manipulation is expensive and injection might cause unnecessary reflow as well. The problem is that you have a document.getElementById and not an element.getElementById, which would allow you to query without injection into the document.
To solve this, using querySelector is an obvious solution and far easier. Otherwise, I would suggest using getElementsByClassName if you can and if your element has a class defined.
getElementsByClassName is defined on ELEMENT and hence can be used without injecting the element in DOM.
Hope this helps.

It's somewhat unusual to pass HTML through an AJAX request; normally you pass a JSON string that the client can evaluate directly and work with.
That being said, I don't think there's a way to parse HTML in javascript the way you want that's cross-browser, but here's a way to do it in Mozilla derivatives:
var r = document.createRange();
r.selectNode(document.body);
var domNode = r.createContextualFragment(HTTPRequest.innerText);

jquery: fastest DOM insertion?

I got this bad feeling about how I insert larger amounts of HTML.
Lets assume we got:
var html="<table>..<a-lot-of-other-tags />..</table>"
and I want to put this into
$("#mydiv")
previously I did something like
var html_obj = $(html);
$("#mydiv").append(html_obj);
Is it correct that jQuery is parsing html to create DOM-Objects ? Well this is what I read somewhere (UPDATE: I meant that I have read, jQuery parses the html to create the whole DOM tree by hand - its nonsense right?!), so I changed my code:
$("#mydiv").attr("innerHTML", $("#mydiv").attr("innerHTML") + html);
Feels faster, is it ? And is it correct that this is equivalent to:
document.getElementById("mydiv").innerHTML += html ? or is jquery doing some additional expensive stuff in the background ?
Would love to learn alternatives as well.

Try the following:
$("#mydiv").append(html);
The other answers, including the accepted answer, are slower by 2-10x: jsperf.
The accepted answer does not work in IE 6, 7, and 8 because you can't set innerHTML of a <table> element, due to a bug in IE: jsbin.

innerHTML is remarkably fast, and in many cases you will get the best results just setting that (I would just use append).
However, if there is much already in "mydiv" then you are forcing the browser to parse and render all of that content again (everything that was there before, plus all of your new content). You can avoid this by appending a document fragment onto "mydiv" instead:
var frag = document.createDocumentFragment();
frag.innerHTML = html;
$("#mydiv").append(frag);
In this way, only your new content gets parsed (unavoidable) and the existing content does not.
EDIT: My bad... I've discovered that innerHTML isn't well supported on document fragments. You can use the same technique with any node type. For your example, you could create the root table node and insert the innerHTML into that:
var frag = document.createElement('table');
frag.innerHTML = tableInnerHtml;
$("#mydiv").append(frag);

What are you attempting to avoid? "A bad feeling" is incredibly vague. If you have heard "the DOM is slow" and decided to "avoid the DOM", then this is impossible. Every method of inserting code into a page, including innerHTML, will result in DOM objects being created. The DOM is the representation of the document in your browser's memory. You want DOM objects to be created.
The reason why people say "the DOM is slow" is because creating elements with document.createElement(), which is the official DOM interface for creating elements, is slower than using the non-standard innerHTML property in some browsers. This doesn't mean that creating DOM objects is bad, it is necessary to create DOM objects, otherwise your code wouldn't do anything at all.

The answer about using a DOM fragment is on the right track. If you have a bunch of HTML objects that you are constantly inserting into the DOM then you will see some speed improvements using the fragment. This post by John Resig explains it pretty well:
http://ejohn.org/blog/dom-documentfragments/

The fastest way to append items
The fastest way to append to the DOM tree is to buffer all of your appends into a single DOM fragment, then append the fragment to the DOM.
This is the method I use in my game engine.
// Returns a new Buffer object
function Buffer() {
  // the fragment that collects nodes before they reach the document
  var domFragment = document.createDocumentFragment();
  // Adds a node to the DOM fragment
  function add(node) {
    domFragment.appendChild(node);
  }
  // Flushes the buffer to a node
  function flush(targetNode) {
    // if the target node is not given then use the body
    targetNode = targetNode || document.body;
    // append the fragment to the target (this empties the fragment)
    targetNode.appendChild(domFragment);
  }
  // return the buffer
  return {
    "add": add,
    "flush": flush
  };
}
//to make a buffer do this
var buffer = Buffer();
//to add elements to the buffer do the following
buffer.add(someNode1);
//continue to add elements to the buffer
buffer.add(someNode2);
buffer.add(someNode3);
buffer.add(someNode4);
buffer.add(someN...);
//when you are done adding nodes flush the nodes to the containing div in the dom
buffer.flush(myContainerNode);
Using this object I am able to render ~1000 items to the screen ~40 times a second in Firefox 4.
Here's a use case.

For starters, write a script that times how long it takes to do it 100 or 1,000 times with each method.
To make sure the repeats aren't somehow optimized away--I'm no expert on JavaScript engines--vary the html you're inserting every time, say by putting '0001' then '0002' then '0003' in a certain cell of the table.
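A rough sketch of such a timing script. The function name timeInsertions and the one-cell table markup are made up for illustration; the insert callback is whatever method is being measured (for example a function that appends its argument to #mydiv), and Date.now() only has millisecond resolution, so small repeat counts will be noisy.

```javascript
// Hypothetical harness: call `insert` with slightly different HTML on each
// repeat and return the elapsed time in milliseconds.
function timeInsertions(insert, repeats) {
  var start = Date.now();
  for (var i = 1; i <= repeats; i++) {
    // Zero-padded counter ('0001', '0002', ...) so no two inserts are identical.
    var label = ("000" + i).slice(-4);
    insert("<table><tr><td>" + label + "</td></tr></table>");
  }
  return Date.now() - start;
}

// Example with a stand-in callback that just collects the strings.
var collected = [];
var elapsed = timeInsertions(function (html) { collected.push(html); }, 1000);
console.log(collected.length + " inserts took " + elapsed + " ms");
```

Running it once per candidate method with the same repeat count gives numbers that can be compared directly.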

I create a giant string with and then append this string with jquery.
Works good and fast, for me.
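One way the string-building step might look, as a sketch only: buildTableHtml is a hypothetical name, the table layout is invented, and the cell values are assumed to be HTML-safe already. The pieces are collected in an array and joined once at the end.

```javascript
// Hypothetical helper: turn an array of rows (arrays of cell strings)
// into one HTML string that can be appended in a single call.
function buildTableHtml(rows) {
  var parts = ["<table>"];
  for (var r = 0; r < rows.length; r++) {
    parts.push("<tr>");
    for (var c = 0; c < rows[r].length; c++) {
      parts.push("<td>" + rows[r][c] + "</td>");
    }
    parts.push("</tr>");
  }
  parts.push("</table>");
  return parts.join("");
}

console.log(buildTableHtml([["a", "b"], ["c"]]));
// prints <table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>
```

The result would then be passed to a single $("#mydiv").append(...) call, so the document is touched once rather than once per row.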

You mention being interested in alternatives. If you look at the listing of DOM-related jQuery plugins you'll find several that are dedicated to programmatically generating DOM trees. See for instance SuperFlyDom or DOM Elements Creator; but there are others.
