innerHTML alternative for retrieving contents of page?


I'm currently using innerHTML to retrieve the contents of an HTML element and I've discovered that in some browsers it doesn't return exactly what is in the source.
For example, using innerHTML in Firefox on the following line:
<div id="test"><strong>Bold text</strong></strong></div>
Will return:
<strong>Bold text</strong>
In IE, it returns the original string, with two closing strong tags. I'm assuming in most cases it's not a problem (and may be a benefit) that Firefox cleans up the incorrect code. However, for what I'm trying to accomplish, I need the exact code as it appears in the original HTML source.
Is this at all possible? Is there another JavaScript function I can use?

I don't think you can get the incorrect HTML back in modern browsers. That is the right behaviour, because there is no source for dynamically generated HTML. Firefox's innerHTML, for example, returns part of the DOM tree serialised as a string, not the HTML source. This is not a problem, because the second </strong> tag is ignored by the browser anyway.

innerHTML is generated not from the actual source of the document (i.e. the HTML file) but from the DOM that the browser builds when it parses that source. So if IE shows you the incorrect HTML code, it's probably some kind of bug. There is no method that retrieves the invalid HTML code in every browser.

You can't in general get the original invalid HTML for the reasons Ivan and Andris said.
IE is also “fixing” your code just like Firefox does, albeit in a way you don't notice on serialisation, by creating an Element node with the tagName /strong to correspond to the bogus end-tag. There is no guarantee at all that IE will happen to preserve other invalid markup structures through a parse/serialise cycle.
In fact, even for valid code the output of innerHTML won't be exactly the same as the input. Attribute order isn't maintained, tagName case isn't maintained (IE gives you <STRONG>), whitespace in various places is lost, entity references aren't maintained, and so on. If you “need the exact code”, you will have to keep a copy of the exact code, for example in a JavaScript variable in a <script> block written after the content in question.

If you don't need the HTML to render (e.g., you're going to use it as a JS template or something) you can put it in a textarea and retrieve the contents from its value. (Reading innerHTML also avoids the clean-up, but current browsers return the text with < and > escaped as entities.)
<textarea id="myTemplate"><div id="test"><strong>Bold text</strong></strong></div></textarea>
And then:
$('#myTemplate').val() === '<div id="test"><strong>Bold text</strong></strong></div>'
Other than that, the browser gets to decide how to interpret the HTML and it will only return you its interpretation, not the original.

innerText? Or does that have the same effect?

You must use innerXML property. It does exactly what you want to achieve.

Related

Is it syntactically wrong to use the html converter in jsView like so: data-link="html{:property}" instead of data-link="{html:property}"?

We are using the html converter on a lot of our templates that render in JsViews/JsRender. We came across an issue where JsViews was failing with a "Mismatch" error when a tag was in the text it was rendering. We did not notice this until we recently updated to the latest versions. Here is the snippet we were originally using, which is now causing an error:
<div id="Resizable" data-link="html{:Text}"></div>
Now, I noticed that the JsRender API documentation says to write the tag as follows, and when I do so, it renders the data correctly, encoding the HTML content as wanted.
<div id="Resizable" data-link="{html:Text}"></div>
My question is this: Was it not set up properly before, and we just never noticed the error? Did this change in the latest version? And is the latter way the only correct way to use the html encoder? Any help is greatly appreciated. Thanks!

Here is the documentation topic which explains the syntax for data-linked elements: http://www.jsviews.com/#linked-elem-syntax
See especially the section on Full syntax - multiple targets, multiple tags, multiple bindings... where it says:
The full syntax allows you to bind multiple expressions each to a different target 'attrib', and is written like this: data-link="attrib1{linkExpression1} attrib2{linkExpression2} ..."
And note what it says lower down:
The default target for most elements is innerText, but for inputs and select it is value.
And it gives some examples:
<div data-link="{:name}"></div> (one-way binding to innerText - default target attrib - so automatically HTML encodes).
<div data-link="html{:name}"></div> (one-way binding to innerHTML)
So what this is saying is that the default target for a data-linked div is innerText - which means that if you inject HTML markup it will in effect HTML encode that markup 'for free'. It won't insert HTML tags as inner HTML.
If you did add the HTML converter, you would write it like this <div data-link="{>name}"></div> - but adding HTML encoding when you are inserting as innerText won't change what the user sees.
(An alternative syntax for the same thing would be what you wrote above <div data-link="{html:name}"></div>). See the documentation here on the HTML converter: http://www.jsviews.com/#html.
If you WANT to insert as inner HTML, then you use the HTML target, which is the second example above: <div data-link="html{:name}"></div>.
And you could then add encoding as in <div data-link="html{>name}"></div>.
But more likely using the default innerText target and no explicit converter is what you need.
See also this SO response to a similar question How to keep helper function generated HTML tags for JsViews
BTW - no, this should not have changed behavior in the latest version. If you saw a change in behavior, can you add an issue on JsViews GitHub project, ideally with a jsfiddle showing something which rendered differently between the two versions?

After the help from Boris, and looking at the documentation, the answer is clear. It is not syntactically incorrect, but is used in two different ways. One is for encoding the data, and the other is for setting the value to the innerHTML property.
{html:property} ---> encoding
html{:property} ---> use innerHTML at target
html{html:property} ---> This fixed our problem, and was the solution we needed.
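
To make the difference between encoding and the innerHTML target concrete, here is a rough sketch of what HTML encoding does to a value before it is inserted. The htmlEncode helper is hypothetical and written only for illustration; it is not JsViews' own converter, which may handle more cases.

```javascript
// Hypothetical helper, not part of JsViews: escapes the characters that
// would otherwise be interpreted as markup.
function htmlEncode(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

var text = 'Size <b>"large"</b>';
var encoded = htmlEncode(text);
console.log(encoded); // Size &lt;b&gt;&quot;large&quot;&lt;/b&gt;
```

Assigning the raw string as innerHTML (the html{:property} case) would create a real <b> element, while assigning the encoded string as innerHTML (the html{html:property} case) should show the tags as literal text.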

Can't create some kinds of nodes from HTML strings using innerHTML

The following javascript (run in the chrome console) does not do what I'd expect:
> var elem = document.createElement("foo");
undefined
> elem.innerHTML = "<tr></tr>"
"<tr></tr>"
> elem.outerHTML
"<foo></foo>"
The <tr> tag has disappeared!
This seems specific to table-related elements. Using <div> or <span> works as expected.
I expect what I'm doing is invalid, as "foo" is not a known element, and presumably table-related elements can only appear within a <table>. Interestingly, the following code works just fine:
> var elem = document.createElement("foo"), tr = document.createElement("tr");
> elem.appendChild(tr);
> elem.outerHTML
"<foo><tr></tr></foo>"
So it seems like the construction itself (a <tr> not within a <table>) is allowable, but the method of using innerHTML to place it there does not work - perhaps this goes through some HTML cleanup, which removes things that are not strictly valid, while creating DOM nodes directly is not subject to the same validation.
My question: is there any way to populate an arbitrary DOM node from a string without running into such cleanup / validation issues? My use case will end up with a perfectly valid structure (I plan to place this as the child of a <table> sometime later), but the browser is stopping me while I'm trying to build the individual parts.
It sounds a little like DocumentFragment should be what I'm looking for, but as far as I can tell those are only constructable programmatically - they don't support innerHTML.
some background on why I want to do this:
My use case is javascript-based live templating (i.e not outputting html strings, but actual DOM nodes). So the requirements are:
template input must be allowed to be arbitrary HTML (this is why I'm using innerHTML and not constructing nodes programmatically)
it must be possible to create sub-templates that are then attached into a larger document (that's why I can't just create the whole <table> at once).
The second point is how I encountered this bug. My template contains a sub-template.
var row = Html("<tr></tr>");
var table = Html(["<table><thead>", row, "</thead></table>"]);
I will later add code like:
row.append(Html(["<td>", column.header, "</td>"]));
to actually populate the columns. So when it's fully constructed, the html will be valid. But in the intermediate stages, each template / snippet is constructed under a single element. That means that templates like:
Html(["Hello <span>", name, "</span>"]);
still come out as a single node (so that they can be manipulated as a single entity):
<foo>Hello <span>bob</span></foo>
When the template results in only a single child inside the <foo>, the outer node is removed. But during construction, the row template above should look like <foo><tr></tr></foo>. Due to the validation behaviour I'm seeing when using innerHTML it just ends up as <foo></foo>.
I've checked all code works the same in both firefox & chrome, so I don't expect I'm just hitting a browser bug.

Unfortunately the answer to your general question is no, there is no way to use innerHTML to add arbitrarily incomplete HTML fragments. I know this is not the answer you want to hear but that's the way it is.
One of the most misunderstood things about innerHTML stems from the way the API is designed. It overloads the + and = operators to perform DOM insertion. This tricks programmers into thinking that it is merely doing string operations when in fact innerHTML behaves more like a function than a variable. It would be less confusing to people if innerHTML was designed like this:
element.innerHTML('some <b>html</b> here');
Unfortunately it's too late to change the API, so we must instead understand that it is really an API instead of merely an attribute/variable.
Now, to understand the so-called "validation" behavior of innerHTML. When you modify innerHTML it triggers a call to the browser's HTML compiler. It's the same compiler that compiles your HTML file/document. There's nothing special about the HTML compiler that innerHTML calls. Therefore, whatever you can do in an HTML file you can pass to innerHTML (the one exception being that embedded JavaScript doesn't get executed - probably for security reasons).
This makes sense from the point of view of a browser developer. Why include two separate HTML compilers in the browser? Especially considering the fact that HTML compilers are huge, complex beasts.
The down side to this is that incomplete HTML will be handled the same way it is handled for html documents. In the case of <td> elements not inside a table most browsers will simply strip it away (as you've observed for yourself). That is essentially what you're trying to do - create invalid/incomplete HTML.
There are two work arounds to this:
Extract the table from the page, then using string processing (regex etc.) insert the <td> into the table string, then innerHTML the whole table back into the page.
Parse the inserted HTML string and if you find any <td> or <tr> (or <option>) extract out the html element and insert it using DOM methods.
Unfortunately both are quite painful.

Mihai Stancu's comment about jquery made me think: surely jquery manages this if you call $("<tr></tr>"). I know jquery has a shortcut for strings that look like single tags, but it must work for complex HTML as well.
So I took a dive into the jquery source code, and found just the ticket:
https://github.com/jquery/jquery/blob/6a0ee2d9ed34b81d4ad0662423bf815a3110990f/src/manipulation.js#L450
It's using a regex to detect just the name of the first tag in the string, then using this info to figure out what "context" it needs to wrap it in for the innerHTML process to treat it as valid. I think this technique should work for all well-formed inputs.
I've adopted this code into a standalone function which will turn an arbitrary string into a DOM node:
https://gist.github.com/gfxmonk/5299096
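
The wrapping technique described above can be sketched roughly as follows. This is a simplified illustration, not the jQuery source or the linked gist: the names wrapMap, chooseWrap and htmlToFragment are made up here, and the table of wrappers covers only a few common cases.

```javascript
// Wrappers that give the HTML parser a valid context for certain tags.
// Each entry: [nesting depth, opening markup, closing markup].
var wrapMap = {
  thead: [1, "<table>", "</table>"],
  tbody: [1, "<table>", "</table>"],
  tfoot: [1, "<table>", "</table>"],
  col: [2, "<table><colgroup>", "</colgroup></table>"],
  tr: [2, "<table><tbody>", "</tbody></table>"],
  td: [3, "<table><tbody><tr>", "</tr></tbody></table>"],
  th: [3, "<table><tbody><tr>", "</tr></tbody></table>"],
  option: [1, "<select multiple>", "</select>"]
};
var defaultWrap = [0, "", ""];

// Captures the name of the first tag in the string, e.g. "tr" for "<tr></tr>".
var firstTag = /<([a-z][^\s\/>]*)/i;

// Pure string logic: decides which wrapper (if any) the fragment needs.
function chooseWrap(html) {
  var match = firstTag.exec(html);
  var name = match ? match[1].toLowerCase() : "";
  return Object.prototype.hasOwnProperty.call(wrapMap, name)
    ? wrapMap[name]
    : defaultWrap;
}

// Browser-only: parses the string inside the chosen context and returns
// the resulting nodes in a DocumentFragment.
function htmlToFragment(html, doc) {
  doc = doc || document;
  var wrap = chooseWrap(html);
  var container = doc.createElement("div");
  container.innerHTML = wrap[1] + html + wrap[2];
  // Descend through the wrapper elements to reach the parsed content.
  for (var i = 0; i < wrap[0]; i++) {
    container = container.lastChild;
  }
  var fragment = doc.createDocumentFragment();
  while (container.firstChild) {
    fragment.appendChild(container.firstChild);
  }
  return fragment;
}
```

In a browser, htmlToFragment("<tr></tr>") should then return a fragment containing the <tr>, which can be appended to any element, including an unknown one such as <foo>.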

Generating/selecting non-standard HTML tags with jQuery, a good idea?

I've noticed that jQuery can create, and access non-existent/non-standard HTML tags. For example,
$('body').append('<fake></fake>').html('blah');
var foo = $('fake').html(); // foo === 'blah'
Will this break in some kind of validation? Is it a bad idea, or are there times this is useful? The main question is, although it can be done, should it be done?
Thanks in advance!

You can use non-standard HTML tags and most of the browsers should work fine, that's why you can use HTML5 tags in browsers that don't recognize them and all you need to do is tell them how to style them (particularly which tags are display: block). But I wouldn't recommend doing it for two reasons: first it breaks validation, and second you may use some tag that will later get added to HTML and suddenly your page stops working in newer browsers.

The biggest issue I see with this is that if you create a tag that's useful to you, who's to say it won't someday become standard? If that happens it may end up playing a role or get styles that you don't anticipate, breaking your code.

The rules of HTML do say that if manipulated through script the result should be valid both before and after the manipulation.
Validation is a means to an end, so if it works for you in some way, then I wouldn't worry too much about it. That said, I wouldn't do it to "sneak" past validation while using something like facebook's <fb:fan /> element - I'd just suck it up and admit the code wasn't valid.

HTML as such allows you to use any markup you like. Browsers may react differently to unknown tags (and don't they to known ones, too?), but the general bottom line is that they ignore unknown tags and try to render their contents instead.
So technically, nothing is stopping you from using <fake> elements (compare what IE7 would do with an HTML5 page and the new tags defined there). HTML standardization has always been an after-the-fact process. Browser vendors invented tags and at some point the line was drawn and it was called HTMLx.
The real question is, if you positively must do it. And if you care whether the W3C validator likes your document or not. Or if you care whether your fellow programmers like your document or not.
If you can do the same and stay within the standard, it's not worth the hassle.

There's really no reason to do something like this. The better way is to use classes like
<p class = "my_class">
And then do something like
$('p.my_class').html('bah');
Edit:
The main reason that it's bad to use fake tags is because it makes your HTML invalid and could screw up the rendering of your page on certain browsers since they don't know how to treat the tag you've created (though most would treat it as some kind of DIV).
That's the main reason this isn't good, it just breaks standards and leads to confusing code that is difficult to maintain because you have to explain what your custom tags are for.
If you were really determined to use custom tags, you could make your web page a valid XML file and then use XSLT to transform the XML into valid HTML. But in this case, I'd just stick with classes.

Must I specify the ending tag with jquery.append()?

I am studying somebody else's jQuery script, and I noticed that he opens a tag without closing it, but browsers don't seem to care (not yet tested with IE).
it is written :
$('#mydiv').append('<ul>')
But there is nowhere a
.append('</ul>')
The script does not close the list, but browsers do it automatically (I just did an 'inspect element' in the browser).
Is that a 'legal' behavior, or one should always close a tag in a javascript autogenerated content ?

To do it properly, you should be appending:
$('#mydiv').append('<ul></ul>')
Yes browsers will handle it (specifically the .innerHTML implementation handles it, not jQuery), at least the major ones, but why not be safe in all cases and use valid markup?
$('#mydiv').append('<ul>')
...still calls .innerHTML, not createElement, only in $('<ul>') is document.createElement() called. As I said originally, the browser handles it with .append(), not jQuery and not document.createElement (which doesn't take syntax like this anyway).
You can test and play with what I mean here

Short answer: you should.
Long answer that led to the short answer:
When you say .append('<ul>'),
or even .append('<ul></ul>'), behind the scenes jQuery calls document.createElement and the browser knows what to do.
It's not like jQuery actually puts that string of HTML anywhere, but rather parses it and creates the necessary DOM elements
UPDATE-
As Nick pointed out, this might not always be the case. Relevant source: init
If you pass it just ul, it just calls createElement. If the html string is more complicated, it will go into buildFragment which is more complicated than that.
Based on this, I would say the best/fastest way to create a single element thru jQuery, is to do something like
$('<ul>').appendTo($target);
UPDATE 2-
So apparently jQuery only calls createElement in some methods, but append ends up calling clean which has a regex that closes tags. So either way, you're safe, jQuery saves you as usual.
Relevant source:
...
} else if ( typeof elem === "string" ) {
// Fix "XHTML"-style tags in all browsers
elem = elem.replace(rxhtmlTag, "<$1></$2>");
...
UPDATE 3- So it turns out jQuery doesn't fix anything for you when you call append; it just injects the string into a temporary div element. Most browsers seem to know how to deal with the HTML even if it is not closed properly, but to be safe it's probably best to close it yourself! Or if you're feeling lazy, do something like .append($('<ul>')), which doesn't use innerHTML.
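
One way to sidestep the question of unclosed tags altogether is to build the nodes with DOM methods, so that no HTML string is handed to the parser. The sketch below is only an illustration: buildList is a made-up helper, and the "mydiv" id is taken from the question.

```javascript
// Builds a <ul> with one <li> per item using DOM methods only.
// `doc` is assumed to be a Document (pass `document` in a browser).
function buildList(doc, items) {
  var list = doc.createElement("ul");
  items.forEach(function (item) {
    var entry = doc.createElement("li");
    // createTextNode inserts the item as text, never as markup.
    entry.appendChild(doc.createTextNode(item));
    list.appendChild(entry);
  });
  return list;
}

// Browser usage, assuming an element with id "mydiv" exists:
// document.getElementById("mydiv").appendChild(buildList(document, ["one", "two"]));
```

Because every element is created explicitly, there is no closing tag to forget, at the cost of more verbose code than a one-line append.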

Regex behaving differently in IE6/IE7

My HTML is something like this :
<select>
<option>ABC (123)</option>
<option>XYZ (789)</option>
</select>
What I'm trying to do is: using jQuery and a regex, I replace the "(" with <span>(. Here's my jQuery line:
$(this).html($(this).html().replace(/\(/g,"<span>("));
It's working as intended in Firefox, Chrome and Safari, but (as usual) not behaving correctly in IE6/IE7 (the text after "(" just gets removed).
Any thoughts ?
PS: I'm doing it this way because I need "(number)" to be in a different color, and a <span> in an <option> is not valid.

I don't think it's the regex breaking. The below works fine in IE7:
alert("(test".replace(/\(/g,"<span>("));
What's probably happening is that IE6/7 have no clue how to render a span inside an option and just fail to display anything.

You're saying that a span in an option is not valid, and yet that's exactly what you're trying to add. Invalid code isn't prettier just because it happens to be valid at load time, if you know you're going to mess it up later. So really, if that's your only concern, do add this span declaratively in your HTML, rather than injecting it later.
But if you want to fix this, I think it might help if you rewrite the regex to cover the entire tag. In a lot of cases, IE doesn't let you just change the HTML to whatever, but will use its own internal representation to fix up the code, according to its own preferences. When you write a table, for instance, IE will always internally infer a tbody, even if there isn't any in the code. In the same manner, if you inject a <span> and there's no </span>, IE might insert one by itself. To get around this, make sure you inject the code in its entirety in one go:
$(this).html($(this).html().replace(/\((.*?)\)/g,"<span>($1)</span>"));
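
The difference between the two patterns can be checked on plain strings, outside any browser. This is just a small illustration using one of the option labels from the question; it says nothing about how IE then renders the result.

```javascript
var label = "ABC (123)";

// Original approach: only the opening parenthesis is matched, so the
// inserted <span> is never closed.
var unbalanced = label.replace(/\(/g, "<span>(");

// Suggested approach: match the whole parenthesised group and wrap it.
var balanced = label.replace(/\((.*?)\)/g, "<span>($1)</span>");

console.log(unbalanced); // ABC <span>(123)
console.log(balanced);   // ABC <span>(123)</span>
```

With the second pattern the injected markup is balanced, so the browser has nothing to repair when the string is assigned back through .html().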

I don't have IE7 but in IE6 the following
javascript:"<select><option>ABC (123)</option><option>XYZ (789)</option></select>".replace(/\(/g,"<strong>(")
yields
<select><option>ABC <strong>(123)</option><option>XYZ <strong>(789)</option></select>
And gets correctly displayed (aside that <strong> has no effect). Also works fine when using <span> instead of <strong>
