How safe is it use document.body.innerHTML.replace?

How safe is it use document.body.innerHTML.replace? - javascript

Is running something like:
document.body.innerHTML = document.body.innerHTML.replace('old value', 'new value')
dangerous?
I'm worried that maybe some browsers might screw up the whole page, and since this is JS code that will be placed on sites out of my control, who might get visited by who knows what browsers I'm a little worried.
My goal is only to look for an occurrence of a string in the whole body and replace it.

Definitely potentially dangerous - particularly if your HTML code is complex, or if it's someone else's HTML code (i.e. its a CMS or your creating reusable javascript). Also, it will destroy any eventlisteners you have set on elements on the page.
Find the text-node with XPath, and then do a replace on it directly.
Something like this (not tested at all):
var i=0, ii, matches=xpath('//*[contains(text(),"old value")]/text()');
ii=matches.snapshotLength||matches.length;
for(;i<ii;++i){
var el=matches.snapshotItem(i)||matches[i];
el.wholeText.replace('old value','new value');
}
Where xpath() is a custom cross-browser xpath function along the lines of:
function xpath(str){
if(document.evaluate){
return document.evaluate(str,document,null,6,null);
}else{
return document.selectNodes(str);
}
}

I agree with lucideer, you should find the node containing the text you're looking for, and then do a replace. JS frameworks make this very easy. jQuery for example has the powerful :contains('your text') selector
http://api.jquery.com/contains-selector/

If you want rock solid solution, you should iterate over DOM and find value to replace that way.
However, if 'old value' is a long string that never could be mixed up with tag, attribute or attbibute value you are relatively safe by just doing replace.

Related

Javascript creating an HTML object vs creating an HTML string

Pretty simple question that I couldn't find an answer to, maybe because it's a non-issue, but I'm wondering if there is a difference between creating an HTML object using Javascript or using a string to build an element. Like, is it a better practice to declare any HTML elements in JS as JS objects or as strings and let the browser/library/etc parse them? For example:
jQuery('<div />', {'class': 'example'});
vs
jQuery('<div class="example></div>');
(Just using jQuery as an example, but same question applies for vanilla JS as well.)
It seems like a non-issue to me but I'm no JS expert, and I want to make sure I'm doing it right. Thanks in advance!

They're both "correct". And both are useful at different times for different purposes.
For instance, in terms of page-speed, these days it's faster to just do something like:
document.body.innerHTML = "<header>....big string o' html text</footer>";
The browser will spit it out in an instant.
As a matter of safety, when dealing with user-input, it's safer to build elements, attach them to a documentFragment and then append them to the DOM (or replace a DOM node with your new version, or whatever).
Consider:
var userPost = "My name is Bob.<script src=\"//bad-place.com/awful-things.js\"></script>",
paragraph = "<p>" + userPost + "</p>";
commentList.innerHTML += paragraph;
Versus:
var userPost = "My name is Bob.<script src=\"//bad-place.com/awful-things.js\"></script>",
paragraph = document.createElement("p");
paragraph.appendChild( document.createTextNode(userPost) );
commentList.appendChild(paragraph);
One does bad things and one doesn't.
Of course, you don't have to create textNodes, you could use innerText or textContent or whatever (the browser will create the text node on its own).
But it's always important to consider what you're sharing and how.
If it's coming from anywhere other than a place you trust (which should be approximately nowhere, unless you're serving static pages, in which case, why are you building html?), then you should keep injection in mind -- only the things you WANT to be injected should be.

Either can be preferable depending on your particular scenario—ie, if everything is hard-coded, option 2 is probably better, as #camus said.
One limitation with the first option though, is that this
$("<div data-foo='X' />", { 'class': 'example' });
will not work. That overload expects a naked tag as the first parameter with no attributes at all.
This was reported here

1/ is better if your attribubes depends on variables set before calling the $ function , dont have to concatenate strings and variables. Aside from that fact ,since you can do both , and it's just some js code somebody else wrote , not a C++ DOM API hardcoded in the browser...

Javascript - get strings inside a string

var str = '<div part="1">
<div>
...
<p class="so">text</p>
...
</div>
</div><span></span>';
I got a long string stored in var str, I need to extract the the strings inside div part="1". Can you help me please?

you could create a DOM element and set its innerHTML to your string.
Then you can iterate through the childNodes and read the attributes you want ;)
example
var str = "<your><html>";
var node = document.createElement("div");
node.innerHTML = str;
for(var i = 0; i < node.childNodes.length; i++){
console.log(node.childNodes[i].getAttribute("part"));
}

If you're using a library like JQuery, this is trivially easy without having to go through the horrors of parsing HTML with regex.
Simply load the string into a JQuery object; then you'll be able to query it using selectors. It's as simple as this:
var so = $(str).find('.so');
to get the class='so' elememnt.
If you want to get all the text in part='1', then it would be this:
var part1 = $(str).find('[part=1]').text();
Similar results can be achieved with Prototype library, or others. Without any library, you can still do the same thing using the DOM, but it'll be much harder work.
Just to clarify why it's a bad idea to do this sort of thing in regex:
Yes, it can be done. It is possible to scan a block of HTML code with regex and find things within the string.
However, the issue is that HTML is too variable -- it is defined as a non-regular language (bear in mind that the 'reg' in 'regex' is for 'regular').
If you know that your HTML structure is always going to look the same, it's relatively easy. However if it's ever going to be possible that the incoming HTML might contain elements or attributes other than the exact ones you're expecting, suddenly writing the regex becomes extremely difficult, because regex is designed for searching in predictable strings. When you factor in the possibility of being given invalid HTML code to parse, the difficulty factor increases even more.
With a lot of effort and good understanding of the more esoteric parts of regex, it can be done, with a reasonable degree of reliability. But it's never going to be perfect -- there's always going to be the possibility of your regex not working if it's fed with something it doesn't expect.
By contrast, parsing it with the DOM is much much simpler -- as demonstrated, with the right libraries, it can be a single line of code (and very easy to read, unlike the horrific regex you'd need to write). It'll also be much more efficient to run, and gives you the ability to do other search operations on the same chunk of HTML, without having to re-parse it all again.

Techniques to avoid building HTML strings in JavaScript?

It is very often I come across a situation in which I want to modify, or even insert whole blocks of HTML into a page using JavaScript. Usually it also involves changing several parts of the HTML dynamically depending on certain parameters.
However, it can make for messy/unreadable code, and it just doesn't seem right to have these little snippets of HTML in my JavaScript code, dammit.
So, what are some of your techniques to avoid mixing HTML and JavaScript?

The Dojo toolkit has a quite useful system to deal with HTML fragments/templates. Let's say the HTML snippet mycode/mysnippet.tpl.html is something like the following
<div>
<span dojoAttachPoint="foo"></span>
</div>
Notice the dojoAttachPoint attribute. You can then make a widget mycode/mysnippet.js using the HTML snippet as its template:
dojo.declare("mycode.mysnippet", [dijit._Widget, dijit._Templated], {
templateString: dojo.cache("mycode", "mysnippet.tpl.html"),
construct: function(bar){
this.bar = bar;
},
buildRendering: function() {
this.inherited(arguments);
this.foo.innerHTML = this.bar;
}
});
The HTML elements given attach point attributes will become class members in the widget code. You can then use the templated widget like so:
new mycode.mysnippet("A cup of tea would restore my normality.").placeAt(someOtherDomElement);
A nice feature is that if you use dojo.cache and Dojo's build system, it will insert the HTML template text into the javascript code, so that the client doesn't have to make a separate request.
This may of course be way too bloated for your use case, but I find it quite useful - and since you asked for techniques, there's mine. Sitepoint has a nice article on it too.

There are many possible techniques. Perhaps the most obvious is to have all elements on the page but have them hidden - then your JS can simply unhide them/show them as required. This may not be possible though for certain situations. What if you need to add a number (unspecified) of duplicate elements (or groups of elements)? Then perhaps have the elements in question hidden and using something like jQuery's clone function insert them as required into the DOM.
Alternatively if you really have to build HTML on the fly then definitely make your own class to handle it so you don't have snippets scattered through your code. You could employ jQuery literal creators to help do this.

I'm not sure if it qualifies as a "technique", but I generally tend to avoid constructing blocks of HTML in JavaScript by simply loading the relevant blocks from the back-end via AJAX and using JavaScript to swap them in and out/place them as required. (i.e.: None of the low-level text shuffling is done in JavaScript - just the DOM manipulation.)
Whilst you of course need to allow for this during the design of the back-end architecture, I can't help but think to leads to a much cleaner set up.

Sometimes I utilise a custom method to return a node structure based on provided JSON argument(s), and add that return value to the DOM as required. It ain't accessible once JS is unavailable like some backend solutions could be.

After reading some of the responses I managed to come up with my own solution using Python/Django and jQuery.
I have the HTML snippet as a Django template:
<div class="marker_info">
<p> _info_ </p>
more info...
</div>
In the view, I use the Django method render_to_string to load the templates as strings stored in a dictionary:
snippets = { 'marker_info': render_to_string('templates/marker_info_snippet.html')}
The good part about this is I can still use the template tags, for example, the url function. I use simplejson to dump it as JSON and pass it into the full template. I still wanted to dynamically replace strings in the JavaScript code, so I wrote a function to replace words surrounded by underscores with my own variables:
function render_snippet(snippet, dict) {
for (var key in dict)
{
var regex = new RegExp('_' + key + '_', 'gi');
snippet = snippet.replace(regex, dict[key]);
}
return snippet;
}

Is it possible to get jquery objects from an html string thats not in the DOM?

For example in javascript code running on the page we have something like:
var data = '<html>\n <body>\n I want this text ...\n </body>\n</html>';
I'd like to use and at least know if its possible to get the text in the body of that html string without throwing the whole html string into the DOM and selecting from there.

First, it's a string:
var arbitrary = '<html><body>\nSomething<p>This</p>...</body></html>';
Now jQuery turns it into an unattached DOM fragment, applying its internal .clean() method to strip away things like the extra <html>, <body>, etc.
var $frag = $( arbitrary );
You can manipulate this with jQuery functions, even if it's still a fragment:
alert( $frag.filter('p').get() ); // says "<p>This</p>"
Or of course just get the text content as in your question:
alert( $frag.text() ); // includes "This" in my contrived example
// along with line breaks and other text, etc
You can also later attach the fragment to the DOM:
$('div#something_real').append( $frag );
Where possible, it's often a good strategy to do complicated manipulation on fragments while they're unattached, and then slip them into the "real" page when you're done.

The correct answer to this question, in this exact phrasing, is NO.
If you write something like var a = $("<div>test</div>"), jQuery will add that div to the DOM, and then construct a jQuery object around it.
If you want to do without bothering the DOM, you will have to parse it yourself. Regular expressions are your friend.

It would be easiest, I think, to put that into the DOM and get it from there, then remove it from the DOM again.
Jquery itself is full of tricks like this. It's adding all sorts off stuff into the DOM all the time, including when you build something using $('<p>some html</p>'). So if you went down that road you'd still effectively be placing stuff into the DOM then removing it again, temporarily, except that it'd be Jquery doing it.

John Resig (jQuery author) created a pure JS HTML parser that you might find useful. An example from that page:
var dom = HTMLtoDOM("<p>Data: <input disabled>");
dom.getElementsByTagName("body").length == 1
dom.getElementsByTagName("p").length == 1
Buuuut... This question contains a constraint that I think you need to be more critical of. Rather than working around a hard-coded HTML string in a JS variable, can you not reconsider why it's that way in the first place? WHAT is that hard-coded string used for?
If it's just sitting there in the script, re-write it as a proper object.
If it's the response from an AJAX call, there is a perfectly good jQuery AJAX API already there. (Added: although jQuery just returns it as a string without any ability to parse it, so I guess you're back to square one there.)

Before throwing it in the DOM that is just a plain string.
You can sure use REGEX.

node selection and manipulation out of the dom (What is jQuery's trick ?)

Hi I would like to do dom selection and manipulation out of the dom.
The goal is to build my widget out of the dom and to insert it in the dom only once it is ready.
My issue is that getElementById is not supported on a document fragment. I also tried createElement and cloneNode, but it does not work either.
I am trying to do that in plain js. I am used to do this with jQuery which handles it nicely. I tried to find the trick in jQuery source, but no success so far...
Olivier

I have done something similar, but not sure if it will meet your needs.
Create a "holding area" such as a plain <span id="spanReserve"></span> or <td id="cellReserve"></td>. Then you can do something like this in JS function:
var holdingArea = document.getElementById('spanReserve');
holdingArea.innerHTML = widgetHTMLValue;

jQuery will try to use getElementById first, and if that doesn't work, it'll then search all the DOM elements using getAttribute("id") until it finds the one you need.
For instance, if you built the following DOM structure that isn't attached to the document and it was assigned to the javascript var widget:
<div id="widget">
<p><strong id="target">Hello</strong>, world!</p>
</div>
You could then do the following:
var target;
// Flatten all child elements in the div
all_elements = widget.getElementsByTagName("*");
for(i=0; i < all_elements.length; i++){
if(all_widget_elements[i].getAttribute("id") === "target"){
target = all_widget_elements[i];
break;
}
}
target.innerHTML = "Goodbye";
If you need more than just searching by ID, I'd suggest installing Sizzle rather than duplicating the Sizzle functionality. Assuming you have the ability to install another library.
Hope this helps!

EDIT:
what about something simple along these lines:
DocumentFragment.prototype.getElementById = function(id) {
for(n in this.childNodes){
if(id == n.id){
return n;
}
}
return null;
}
Why not just use jQuery or the selection API in whatever other lib youre using? AFAIK all the major libs support selection on fragments.
If you wan tto skip a larger lib like jQ/Prototype/Dojo/etc.. then you could jsut use Sizzle - its the selector engine that powers jQ and Dojo and its offered as a standalone. If thats out of the question as well then i suppose you could dive in to the Sizzle source and see whats going on. All in all though it seems like alot of effort to avoid a few 100k with the added probaility that the code you come up with is going to be slower runtime wise than all the work pulled into Sizzle or another open source library.
http://sizzlejs.com/
Oh also... i think (guessing) jQ's trick is that elements are not out of the DOM. I could be wrong but i think when you do something like:
$('<div></div>');
Its actually in the DOM document its just not part of the body/head nodes. Could be totally wrong about that though, its just a guess.
So you got me curious haha. I took a look at sizzle.. than answer is - its not using DOM methods. It seems using an algorithm that compares the various DOMNode properties mapped to types of selectors - unless im missing something... which is entirely possible :-)
However as noted below in comments it seems Sizzle DOES NOT work on DocumentFragments... So back to square one :-)

Modern browsers ( read: not IE ) have the querySelector method in Element API. You can use that to get and element by id within a DocumentFragment.
jQuery uses sizzle.js
What it does on DocumentFragments is: deeply loop through all the elements in the fragment checking if an element's attribute( in your case 'id' ) is the one you're looking for. To my knowledge, sizzle.js uses querySelector too, if available, to speed things up.
If you're looking for cross browser compatibility, which you probably are, you will need to write your own method, or check for the querySelector method.

It sounds like you are doing to right things. Not sure why it is not working out.
// if it is an existing element
var node = document.getElementById("footer").cloneNode(true);
// or if it is a new element use
// document.createElement("div");
// Here you would do manipulation of the element, setAttribute, add children, etc.
node.childNodes[1].childNodes[1].setAttribute("style", "color:#F00; font-size:128px");
document.documentElement.appendChild(node)

You really have two tools to work with, html() and using the normal jQuery manipulation operators on an XML document and then insert it in the DOM.
To create a widget, you can use html():
$('#target').html('<div><span>arbitrarily complex JS</span><input type="text" /></div>');
I assume that's not what you want. Therefore, look at the additional behaviors of the jQuery selector: when passed a second parameter, it can be its own XML fragment, and manipulation can happen on those documents. eg.
$('<div />').append('<span>').find('span').text('arbitrarily complex JS'). etc.
All the operators like append, appendTo, wrap, etc. can work on fragments like this, and then they can be inserted into the DOM.
A word of caution, though: jQuery uses the browser's native functions to manipulate this (as far as I can tell), so you do get different behaviors on different browsers. Make sure to well formed XML. I've even had it reject improperly formed HTML fragments. Worst case, though, go back and use string concatenation and the html() method.

We Keep Coding

JavaScript is the programming language of the Web.