Can't create some kinds of nodes from HTML strings using innerHTML

The following JavaScript (run in the Chrome console) does not do what I'd expect:
> var elem = document.createElement("foo");
undefined
> elem.innerHTML = "<tr></tr>"
"<tr></tr>"
> elem.outerHTML
"<foo></foo>"
The <tr> tag has disappeared!
This seems specific to table-related elements. Using <div> or <span> works as expected.
I expect what I'm doing is invalid, as "foo" is not a known element, and presumably table-related elements can only appear within a <table>. Interestingly, the following code works just fine:
> var elem = document.createElement("foo"), tr = document.createElement("tr");
> elem.appendChild(tr);
> elem.outerHTML
"<foo><tr></tr></foo>"
So it seems like the construction itself (a <tr> not within a <table>) is allowable, but the method of using innerHTML to place it there does not work. Perhaps this goes through some HTML cleanup which removes things that are not strictly valid, while creating DOM nodes directly is not subject to the same validation.
My question: is there any way to populate an arbitrary DOM node from a string without running into such cleanup/validation issues? My use case will end up with a perfectly valid structure (I plan to place this as the child of a <table> sometime later), but the browser is stopping me while I'm trying to build the individual parts.
It sounds a little like DocumentFragment should be what I'm looking for, but as far as I can tell those are only constructable programmatically - they don't support innerHTML.
Some background on why I want to do this:
My use case is JavaScript-based live templating (i.e. not outputting HTML strings, but actual DOM nodes). So the requirements are:
template input must be allowed to be arbitrary HTML (this is why I'm using innerHTML and not constructing nodes programmatically)
it must be possible to create sub-templates that are then attached into a larger document (that's why I can't just create the whole <table> at once).
The second point is how I encountered this bug. My template contains a sub-template.
var row = Html("<tr></tr>");
var table = Html(["<table><thead>", row, "</thead></table>"]);
I will later add code like:
row.append(Html(["<td>", column.header, "</td>"]));
to actually populate the columns. So when it's fully constructed, the html will be valid. But in the intermediate stages, each template / snippet is constructed under a single element. That means that templates like:
Html(["Hello <span>", name, "</span>"]);
still come out as a single node (so that they can be manipulated as a single entity):
<foo>Hello <span>bob</span></foo>
When the template results in only a single child inside the <foo>, the outer node is removed. But during construction, the row template above should look like <foo><tr></tr></foo>. Due to the validation behaviour I'm seeing when using innerHTML it just ends up as <foo></foo>.
I've checked that all the code works the same in both Firefox and Chrome, so I don't expect I'm just hitting a browser bug.

Unfortunately the answer to your general question is no, there is no way to use innerHTML to add arbitrarily incomplete HTML fragments. I know this is not the answer you want to hear but that's the way it is.
One of the most misunderstood things about innerHTML stems from the way the API is designed. It hides DOM parsing and insertion behind the = and += operators (it is an accessor property, so assigning to it runs code). This tricks programmers into thinking that it is merely doing string operations when in fact innerHTML behaves more like a function than a variable. It would be less confusing to people if innerHTML were designed like this:
element.innerHTML('some <b>html</b> here');
Unfortunately it's too late to change the API, so we must instead understand that it really is an API call rather than merely an attribute/variable.
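To make the "function, not variable" point concrete, here is a toy model built on an ordinary JavaScript accessor property. ToyElement and its regex "parser" are invented for illustration and are not how browsers implement innerHTML. What does carry over to the real property is that `el.innerHTML += x` reads (serializes) the whole content and then writes (re-parses) all of it, replacing the existing child nodes.

```javascript
// Toy model only: a plain object whose innerHTML is an accessor property.
// ToyElement and its regex "parser" are invented for illustration.
class ToyElement {
  constructor() {
    this.nodes = [];     // stand-in for the parsed child nodes
    this.parseCount = 0; // how many times the "parser" has run
  }

  // Reading innerHTML serializes the current nodes back into a string.
  get innerHTML() {
    return this.nodes.join("");
  }

  // Writing innerHTML throws away the old nodes and parses the whole string again.
  set innerHTML(html) {
    this.parseCount += 1;
    this.nodes = String(html).match(/<[^>]*>|[^<]+/g) || [];
  }
}

const el = new ToyElement();
el.innerHTML = "some <b>html</b>";
// `+=` is really: el.innerHTML = el.innerHTML + " here" (serialize, then re-parse everything)
el.innerHTML += " here";
console.log(el.parseCount); // 2
console.log(el.innerHTML);  // some <b>html</b> here
```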
Now, to understand the so-called "validation" behavior of innerHTML: when you set innerHTML it triggers a call to the browser's HTML parser. It's the same parser that parses your HTML file/document, run with the element you assign to as its context. There's nothing special about the parser that innerHTML calls. Therefore, whatever you can write in an HTML file you can pass to innerHTML (the one exception being that <script> elements inserted this way don't get executed, probably for security reasons).
This makes sense from the point of view of a browser developer. Why include two separate HTML parsers in the browser? Especially considering that HTML parsers are huge, complex beasts.
The downside to this is that incomplete HTML will be handled the same way it is handled for HTML documents. In the case of <td> or <tr> elements outside a table, browsers will simply strip them away (as you've observed for yourself): with an ordinary element such as your <foo> as the parsing context, a stray table tag is a parse error and gets ignored. That is essentially what you're trying to do: create invalid/incomplete HTML.
There are two workarounds for this:
Extract the table from the page, use string processing (regex et al.) to insert the <td> into the table string, then set the whole table back into the page with innerHTML.
Parse the inserted HTML string and, if you find any <td> or <tr> (or <option>), extract that HTML element and insert it using DOM methods.
Unfortunately both are quite painful.
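The detection step of the second workaround could look roughly like the sketch below, assuming a regular-expression scan is good enough. CONTEXT_TAGS and findContextSensitiveTags are made-up names, the tag list is my own selection of elements the parser only keeps inside a table or a select, and a regex will also match tags that appear inside comments or attribute values.

```javascript
// Elements the HTML parser drops when they appear outside their usual parent
// (table parts outside <table>, options outside <select>). Illustrative list only.
const CONTEXT_TAGS = ["caption", "col", "colgroup", "tbody", "thead", "tfoot",
  "tr", "td", "th", "option", "optgroup"];

// Returns the distinct context-sensitive start tags found in `html`,
// lowercased, in order of first appearance. Closing tags (</td>) are ignored.
function findContextSensitiveTags(html) {
  // The lookahead stops "col" from matching the start of "colgroup", etc.
  const pattern = new RegExp("<(" + CONTEXT_TAGS.join("|") + ")(?=[\\s/>])", "gi");
  const found = [];
  for (const match of html.matchAll(pattern)) {
    const name = match[1].toLowerCase();
    if (!found.includes(name)) found.push(name);
  }
  return found;
}

// findContextSensitiveTags("<tr><td>a</td></tr>") gives ["tr", "td"]
```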

Mihai Stancu's comment about jquery made me think: surely jquery manages this if you call $("<tr></tr>"). I know jquery has a shortcut for strings that look like single tags, but it must work for complex HTML as well.
So I took a dive into the jquery source code, and found just the ticket:
https://github.com/jquery/jquery/blob/6a0ee2d9ed34b81d4ad0662423bf815a3110990f/src/manipulation.js#L450
It's using a regex to detect just the name of the first tag in the string, then using this info to figure out what "context" it needs to wrap it in for the innerHTML process to treat it as valid. I think this technique should work for all well-formed inputs.
I've adopted this code into a standalone function which will turn an arbitrary string into a DOM node:
https://gist.github.com/gfxmonk/5299096
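A condensed sketch of the same technique follows. WRAP_MAP, wrapForParsing and parseFragment are hypothetical names, and the wrapper table is my own simplified approximation rather than a copy of jQuery's source or of the gist. Only wrapForParsing is pure string work; parseFragment needs a real browser document.

```javascript
// Simplified wrapper table in the spirit of jQuery's manipulation.js
// (an approximation, not a copy): [depth, opening wrapper, closing wrapper].
const WRAP_MAP = {
  option: [1, "<select multiple>", "</select>"],
  optgroup: [1, "<select multiple>", "</select>"],
  caption: [1, "<table>", "</table>"],
  colgroup: [1, "<table>", "</table>"],
  thead: [1, "<table>", "</table>"],
  tbody: [1, "<table>", "</table>"],
  tfoot: [1, "<table>", "</table>"],
  col: [2, "<table><colgroup>", "</colgroup></table>"],
  tr: [2, "<table><tbody>", "</tbody></table>"],
  td: [3, "<table><tbody><tr>", "</tr></tbody></table>"],
  th: [3, "<table><tbody><tr>", "</tr></tbody></table>"],
};

// Name of the first tag in the string, e.g. "tr" for "<tr><td>x</td></tr>".
const FIRST_TAG = /<([a-z][^\s/>]*)/i;

// Pure string step: wrap `html` so the parser sees it in a valid context.
function wrapForParsing(html) {
  const match = FIRST_TAG.exec(html);
  const tag = match ? match[1].toLowerCase() : "";
  // hasOwnProperty avoids picking up Object.prototype keys such as "constructor".
  const entry = Object.prototype.hasOwnProperty.call(WRAP_MAP, tag)
    ? WRAP_MAP[tag]
    : [0, "", ""];
  const [depth, open, close] = entry;
  return { depth, html: open + html + close };
}

// Browser-only step: parse the wrapped string, walk down past the wrappers,
// and move the real nodes into a DocumentFragment.
function parseFragment(doc, html) {
  const { depth, html: wrapped } = wrapForParsing(html);
  let node = doc.createElement("div");
  node.innerHTML = wrapped;
  for (let i = 0; i < depth; i++) node = node.lastChild;
  const fragment = doc.createDocumentFragment();
  while (node.firstChild) fragment.appendChild(node.firstChild);
  return fragment;
}
```

In a browser, `parseFragment(document, "<tr><td>x</td></tr>")` should return a DocumentFragment holding the <tr>. As with jQuery, only the first tag decides the wrapper, so a string that mixes, say, a top-level <tr> and <div> will not round-trip cleanly.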

Related

How to re-render hyperHTML to the same element after content change

I am trying to support the same type of thing as React.Children
My code looks like
const elem = document.getElementById("profile")
const render = hyperHTML.bind(elem);
const name = elem.textContent
render`<b>Hi ${name}</b>`
So the API looks like
<div id="profile">alax</div> 🢂 <div id="profile"><b>Hi alax</b></div>
I am using a MutationObserver to re-render when the content changes.
But if the content is changed, hyperHTML says it's rendering to the right element, but the element keeps its innerHTML (no update).
I can see that the <!--_hyper: -2001947635;--> comment is removed and then the content is set, but setting up the render and hyperHTML.bind again does nothing.
Any thoughts would be great! Thx
Update
The fix for the above problem is to call hyperHTML.bind`` first; then your normal render using hyperHTML will work.
Context -
I am using hyperHTML to create a custom element library(hyper-element)
My use case: I work on a mixed-tech project (some people use jQuery).
Side note, on the why. I want to support something like partial templates
Example of a partial template:
<user-list data="[{name:'ann',url:''},{name:'bob',url:''}]">
<div>{#name}</div>
</user-list>
Output:
<user-list data="[{name:'ann',url:''},{name:'bob',url:''}]">
<div>ann</div>
<div>bob</div>
</user-list>
This is one use of setting custom content in an element you control
At the moment, content set by third parties is working and re-rendering:
https://jsfiddle.net/k25e6ufv/16/
My problem now is rendering another custom element and passing the content to the child element.
It looks like hyperHTML is setting the child element's content in front of the element and creating the element without setting the content.
Scroll down to bottom of source to see implementation!
https://jsfiddle.net/k25e6ufv/14/
Rendering crazy-cats:
Html`
xxx: ${this.wrapedContent} zzzz
`
Current output:
wrapedContent: ppp time:11:35:48 ~ crazy-cats: **Party 11:35:48** xxx: zzzz
<crazy-cats>Party 11:37:21 xxx: <!--_hyper: -362006176;--> zzzz </crazy-cats>
Desired output:
wrapedContent: ppp time:11:35:48 ~ crazy-cats: xxx: **Party 11:35:48** zzzz
<crazy-cats> xxx: Party 11:37:21 zzzz </crazy-cats>

I will try to answer as best as I can, but I'll start saying that when asking for help, it'd be much easier/better to show the simplest use case you are trying to solve.
There is a lot of "surrounding" code in your fiddles, so I'll try to answer only the hyperHTML-related bits.
hyper-element?
I am not sure what the goal of the library is, but hyperHTML exposes hyper.Component, and there's also an official HyperHTMLElement class to extend, which does most of the things you manually implement in your examples.
I'll keep answering your questions but please consider trying, at least, the official alternative and maybe push some change there if needed.
partial templates
hyperHTML's pattern and strength is the Template Literal standard. Accordingly, generating a template literal from the DOM would require either parsing the content or evaluating code. Neither solution is the way to go.
Custom Elements require JavaScript to work, and without JS your partial template is useless and also potentially confusing for the user/consumer.
You don't want to define what to do with the data in the layout, you want to define a Custom Element behavior within the class that defines it.
That means: get rid of old-style in-DOM output, and simply use the Custom Element class to define its content. You maintain the related class only instead of maintaining a layout that has no knowledge about how the CE should represent that data.
TL;DR the following is a bad hyperHTML pattern:
<user-list data="[{name:'ann',url:''},{name:'bob',url:''}]">
<div>{#name}</div>
</user-list>
all you want to do is to write this:
<user-list data="[{name:'ann',url:''},{name:'bob',url:''}]"></user-list>
But be careful: the data attribute in hyperHTML is special only if passed through the template literal. If you want to pass JSON text to the component, give the attribute a different name.
// hyperHTML data is special, no need to use JSON
render`<c-e data=${{as: 'it is'}}></c-e>`
The above snippet is different from having JSON as the data attribute's text, so your example should use the name data-json, and the class should remember to JSON.parse(this.dataset.json) in its constructor (or have an attribute observer that does that for you).
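A minimal sketch of that JSON.parse step, written as a standalone function so it can run outside a component. readJsonAttribute is a hypothetical helper, and it assumes the markup is changed to data-json (which shows up as dataset.json). Note that the attribute value in the question, [{name:'ann',url:''},...], is JavaScript object-literal syntax rather than JSON, so JSON.parse would reject it; the markup would need double quotes, e.g. data-json='[{"name":"ann","url":""}]'.

```javascript
// Hypothetical helper: read and parse a JSON payload stored in a data-json attribute.
// In a browser `dataset` would be element.dataset; here it is any object with a `json` key.
function readJsonAttribute(dataset, fallback = null) {
  const raw = dataset.json;
  if (raw === undefined || raw === "") return fallback;
  try {
    return JSON.parse(raw);
  } catch (err) {
    // JSON requires double-quoted keys and strings; JavaScript-style
    // literals such as {name:'ann'} end up here and get the fallback.
    return fallback;
  }
}

// Inside a component this might be: this.users = readJsonAttribute(this.dataset, []);
```

Returning a fallback keeps a malformed attribute from throwing inside the component; a real component might prefer to log the error instead.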
hyperHTML owns elements
When you write:
it looks like hyperHTML is setting the child element's content in front of the element and creating the element without setting the content
you are assuming you should care at all what hyperHTML does: you shouldn't.
The only thing you should understand is that hyperHTML owns the nodes it handles. If you trash those nodes via other libraries or manually, you are doing something wrong.
hyperHTML(document.body)`<p>hello ${'world'}</p>`;
// obtrusive libraries ... later on ...
document.body.textContent = 'bye bye';
// hyperHTML still owns the body content
hyperHTML(document.body)`<p>hello ${'world'}</p>`;
The above snippet is perfectly fine and totally wrong at the same time.
You don't update the body content manually, you don't interfere with its content via jQuery or other libraries, and you should never trash the content at all.
Once you've chosen hyperHTML to handle a bound context, that's it, you've made your choice.
This is true for pretty much every library in this world. If you use Angular to create something and you mess it all up via jQuery, that breaks. If you write Backbone templates and you later mess with their content manually, that breaks.
If you bind an element to hyperHTML and you mess it up with other libraries, that breaks.
The only thing that won't break are wires, meaning the moment you create a wire, you can append it directly and that's actually a DOM node so it will be there, and it will be handled by hyperHTML.
Yet you should use hyperHTML to handle those changes, never jQuery or JS itself.
The output is right
When you say that the output should not contain the comment you are assuming you should care what output is produced via hyperHTML: you shouldn't!
hyperHTML uses comments as delimiters, and these are absolutely fine both for performance, since they are unaffected by repaints and reflows, and for partial changes like the following one:
hyperHTML(document.body)`<p>${'a'} b ${'c'}</p>`
Both a and c will have a comment as anchor node to be able to update their content with anything later on.
hyperHTML(document.body)`<p>${[list, of, nodes]} b ${otherThing}</p>`
You change interpolations? All good, hyperHTML knows what to replace and where.
force-own the content
If you use a different template literal to re-populate a bound node you are trashing the cache and creating new content.
At that point you are better off with innerHTML because all the features of hyperHTML will be gone.
To start with, if your content can change so much, use an array.
hyper(document.body)`${['text']}`;
// you can clean up the text through empty array
hyper(document.body)`${[]}`;
// re-populate it with new content
hyper(document.body)`${['a', 'b', 'c']}`;
The above example is still better than changing the template, because all the optimizations for the content will already be there.
However, if you want to be sure the node is the one initially created via hyperHTML, assuming no third-party script mutates or trashes that node, you can use a wire.
const body = hyper()`<p>my ${'content'}</p>`;
document.body.textContent = '';
document.body.appendChild(body);
It's a bit extreme but at least faster.
In summary
It looks like you are trying to sneak hyperHTML into an application that trashes the layout all the time through different third-party libraries.
Unless you create a closed Shadow DOM reference and you drop partial templates through layout, you'll always have issues with libraries based on side effects on DOM content, libraries that mutate elements they don't own.
In hyperHTML the ownership concept is key: just as in React you cannot change a component's defined JSX at runtime, you should never try to change the defined template literal for hyperHTML at runtime.
Now, as much as I'd like to solve all your issues, I feel it's right to ask you: are you sure hyperHTML is really the solution for your current app? It looks like surrounding side effects caused by third-party libraries would constantly break your expectations if you don't use closed-mode Shadow DOM and hyperHTML only to update your DOM.

JavaScript HTML injection efficiency/best practice

I'm looking to inject HTML via JavaScript into a page at work.
What I'd like to know is if injecting a re-write of the page is more or less efficient than injecting snippets throughout the page with methods like getElementById().
For example:
document.getElementById("Example").innerHTML = '<h2 id="Example" name="Example">Text</H2>'
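// getElementsByClassName returns a collection, so each element has to be set in turn
for (const el of document.getElementsByClassName("Example")) el.innerHTML = '<h1>Test</h1>';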
...etc. Is this more efficient/effective than simply injecting my own version of the entire page's HTML start to finish?
Edit: Per Lix's comment, I should clarify that I likely will be injecting a large amount of content into the page, but it will affect no more than a dozen elements at any time.

If your project can manage it, it could be better to create DOM Elements and append them to the tree.
The big problem with efficiency would be that setting the .innerHTML property first removes all the existing child nodes and only then parses the HTML and appends the result to the DOM.
It's obvious that you should avoid removing and then re-appending identical elements, so if you're sure the "Example" elements will always remain on the page, your way of setting them seems to be a nice optimization.
If you want to optimize it even further, you could parse the HTML you want to append into nodes and have a function that checks which ones should be appended and which ones shouldn't. But be aware that accessing the DOM is costly. Read more about the ECMA-DOM bridge.
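The "checks which ones should be appended" step can be kept separate from the DOM work. This sketch assumes each injected element has a stable key (an id, for example); planUpdate is a hypothetical helper that only compares key lists, so the actual DOM calls can happen once, afterwards. It ignores reordering and changes to an element's content.

```javascript
// Hypothetical helper: compare what is on the page with what should be there.
// Keys identify elements (e.g. their ids); only the differences need DOM work.
function planUpdate(currentKeys, desiredKeys) {
  const current = new Set(currentKeys);
  const desired = new Set(desiredKeys);
  return {
    toAppend: desiredKeys.filter((key) => !current.has(key)),  // new elements to create
    toRemove: currentKeys.filter((key) => !desired.has(key)),  // stale elements to drop
    unchanged: desiredKeys.filter((key) => current.has(key)),  // leave these alone
  };
}

const plan = planUpdate(["a", "b", "c"], ["b", "c", "d"]);
// plan.toAppend is ["d"], plan.toRemove is ["a"], plan.unchanged is ["b", "c"]
```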
Edit: In some cases it might be better to let the browser do the HTML parsing and injecting through innerHTML. It depends on the amount of HTML you're inserting and the amount you're deleting. See Nelson Menezes's comments about innerHTML vs. append.

Depends on the context. If it was only decoration of existing content, then your proposal would suffice. I'd use jQuery anyway, but that's only my preference.
But when injecting the actual content you have two concerns:
maintainability - Make the structure of your code readable and subject to easy change when you need (and you will need).
accessibility - When JavaScript is disabled, no content will be visible at all. You should provide a link to the desired content in a <noscript/> tag or ensure accessibility for everyone in any other way you prefer. That's a minority of internet users at the moment, but for professional webmasters they count.
To address both of the above concerns I prefer to use ajax to load a whole page, part of one, or even plain text into an existing element. It keeps the code readable, because the content sits in another file completely separated from the script. And since it's a file, you can link to it directly when JavaScript is disabled. It makes the content accessible to anyone.
For plain JavaScript you'd use the XMLHttpRequest object.
With jQuery it's even simpler. Depending on what you need you may use .load, .get or .ajax.

Best practice today is using jQuery Manipulation functions.
Most of the time you'd use one of these three functions:
Replace the contents of the matched elements:
$("div").html("New content");
Append content inside the matched elements, after their existing children:
$("div").append("New content");
Remove the matched elements:
$("div").remove();

Strange issues loading html fragments with innerHTML

I have a UserJS script for Opera which displays its own interface, currently using DOM methods to create elements. This works well, but is hard to maintain as the interface is tied to the code. So I'm looking for a way to separate layout from code. Also, I want to keep things simple and really don't want to rely on a framework (jQuery...) for that. I don't care about cross-browser functionality; this thing can only work on Opera anyway.
I moved all the styling into CSS, and that helped. Now I'm looking for a way to abstract the layout. A good part of the UI is quite dynamic, so I can't just use one big static HTML file. The idea I came up with was to have a piece of HTML containing the layout for the different UI parts, extract fragments from that and put everything together as needed.
This works pretty well to some degree:
create a div, never parent it.
use .innerHTML to load the HTML into it
use getElementsByClassName() to find widgets in it
clone them with widget.cloneNode(true)
parent it etc...
I know of some issues with cloneNode() (risk of duplicate ids, and event handlers missing in the clone) but I can work around them.
The problem is, with .innerHTML loading I get different results from the current DOM code, even though I use a captured layout from the DOM code version! I'm seeing this with tables, for example. For a simple
<table><tr><td></td></tr></table>
the innerHTML version shows up with <tbody> tags in it in Dragonfly, and CSS rules like this one don't apply anymore because of it:
table > tr > td { ... }
I have a baaaad feeling about all this ...
Are there other big differences between DOM-built and HTML layouts?
Maybe I should really be using <tbody> in the DOM code?
How would you do this ?
Bonus question:
What is the reason createDocumentFragment() exists? What can you do with it that can't be done otherwise?

You're right: for a table whose markup doesn't include <tbody>, the HTML parser inserts a <tbody> element itself, which is why it shows up in Dragonfly and when you read the table's innerHTML property. Tables built with DOM methods (appending <tr> directly to <table>) don't get one.
But this shouldn't cause too much trouble for you: as for the CSS issue, drop the > from your selectors (> restricts matching to direct children), or include the tbody explicitly.
One possible benefit of DocumentFragment is that when you need to do a significant amount of DOM manipulation, you may get a performance gain by manipulating only a document fragment and attaching it to the DOM once all transformations are done.
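A sketch of that pattern, assuming a list that should receive several <li> items: every element is built inside the fragment, and the live tree is touched by a single appendChild. appendListItems is a hypothetical name; `doc` would be the browser's document.

```javascript
// Builds <li> elements off-document, then inserts them with one appendChild.
// `doc` is the browser's document; `parent` is any element already on the page.
function appendListItems(doc, parent, labels) {
  const fragment = doc.createDocumentFragment();
  for (const label of labels) {
    const li = doc.createElement("li");
    li.textContent = label; // textContent avoids any HTML parsing
    fragment.appendChild(li);
  }
  // Appending a fragment moves its children into `parent`; the fragment is left empty.
  parent.appendChild(fragment);
  return parent;
}

// In a browser: appendListItems(document, document.querySelector("ul"), ["a", "b", "c"]);
```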

Work with AJAX response with DOM methods

I'm retrieving an entire HTML document via AJAX - and that works fine. But I need to extract certain parts of that document and do things with them.
Using a framework (jquery, mootools, etc) is not an option.
The only solution I can think of is to grab the body of the HTML document with a regex (yes, I know, terrible) ie. <body>(.*)</body> put that into the current page's DOM in a hidden element, and work with it from there.
Is there an easier/better way?
Update
I've done some testing, and inserting an entire HTML document into a created element behaves a bit differently across browsers I've tested. For example:
FF3.5: keeps the contents of the HEAD and BODY tags
IE7 / Safari4: Only includes what's between ...
Opera 10.10: Keeps HEAD and everything inside it, Keeps contents of BODY
The behavior of IE7 and Safari is ideal, but different browsers are doing this differently. Since I'm loading a predetermined HTML document, I think I'm going to use the regex to grab what I want and insert it into a DOM element, unless someone has other suggestions.

Elements can exist without being in the page itself. Just dump the HTML into a dummy div.
var wrapper = document.createElement('div');
wrapper.innerHTML = "<ul><li>foo</li><li>bar</li></ul>";
wrapper.getElementsByTagName('li').length; // 2
Given your edits, we run into a sticky situation, since you want getElementById. The matter would probably be easy if you could just create a new virtual document via document.implementation.createDocument, but IE doesn't support that at all.
Using a regex is a messy business, since what if we see something like <body><input value="</body>" /></body>? You could probably just make your regex greedy so that it moves on to the last instance of </body>, but if you do end up running into troubles, a more thorough parsing may be necessary. Even if a full framework isn't an option, you might end up wanting to use something like Sizzle, the core of libraries like jQuery, to look for the element you want. Or, if you're really feeling in a purist sort of mood, you could write the recursive search function yourself - but why take that hit if someone else has already taken it?
var response_el = document.createElement('html'), foo;
response_el.innerHTML = the_html_elements_content;
foo = Sizzle('#foo', response_el);
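If you stay with the regex approach from the question, a sketch along these lines addresses two problems with <body>(.*)</body>: `.` does not match newlines, and the <body> tag may carry attributes. extractBody is a hypothetical helper; the greedy [\s\S]* runs to the last </body>, which handles the <input value="</body>" /> case above, but it is still a heuristic rather than a parser.

```javascript
// Returns the markup between the opening <body ...> tag and the LAST </body>,
// or null if no body element is found. A heuristic, not an HTML parser.
function extractBody(html) {
  // [\s\S] also matches newlines (plain `.` does not), and the greedy `*`
  // runs to the final </body>, so "</body>" inside an attribute value is skipped.
  const match = /<body\b[^>]*>([\s\S]*)<\/body\s*>/i.exec(html);
  return match ? match[1] : null;
}

// The result can then go into a detached element, e.g. wrapper.innerHTML = extractBody(responseText);
```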

innerHTML alternative for retrieving contents of page?

I'm currently using innerHTML to retrieve the contents of an HTML element and I've discovered that in some browsers it doesn't return exactly what is in the source.
For example, using innerHTML in Firefox on the following line:
<div id="test"><strong>Bold text</strong></strong></div>
Will return:
<strong>Bold text</strong>
In IE, it returns the original string, with two closing strong tags. I'm assuming in most cases it's not a problem (and may be a benefit) that Firefox cleans up the incorrect code. However, for what I'm trying to accomplish, I need the exact code as it appears in the original HTML source.
Is this at all possible? Is there another JavaScript function I can use?

I don't think you can retrieve incorrect HTML code in modern browsers. And that's the right behaviour, because there is no source for dynamically generated HTML. For example, Firefox's innerHTML returns part of the DOM tree represented as a string, not the HTML source. And this is not a problem, because the second </strong> tag is ignored by the browser anyway.

innerHTML is not generated from the actual source of the document (i.e. the HTML file) but is derived from the DOM that the browser has built. So if IE somehow shows you incorrect HTML code, then it's probably some kind of bug. There is no method that retrieves the invalid HTML code in every browser.

You can't in general get the original invalid HTML for the reasons Ivan and Andris said.
IE is also “fixing” your code just like Firefox does, albeit in a way you don't notice on serialisation, by creating an Element node with the tagName /strong to correspond to the bogus end-tag. There is no guarantee at all that IE will happen to preserve other invalid markup structures through a parse/serialise cycle.
In fact, even for valid code the output of innerHTML won't be exactly the same as the input. Attribute order isn't maintained, tagName case isn't maintained (IE gives you <STRONG>), whitespace in various places is lost, entity references aren't maintained, and so on. If you “need the exact code”, you will have to keep a copy of the exact code, for example in a JavaScript variable in a <script> block written after the content in question.

If you don't need the HTML to render (e.g., you're going to use it as a JS template or something), you can put it in a textarea and retrieve its text through the textarea's value (jQuery's .val()), as long as the markup doesn't itself contain </textarea>. Reading innerHTML instead would give the text back with < and > escaped as &lt; and &gt;.
<textarea id="myTemplate"><div id="test"><strong>Bold text</strong></strong></div></textarea>
And then:
$('#myTemplate').val() === '<div id="test"><strong>Bold text</strong></strong></div>'
Other than that, the browser gets to decide how to interpret the HTML, and it will only return its interpretation to you, not the original.

innerText? Or does that have the same effect?

You must use innerXML property. It does exactly what you want to achieve.
