When to use NodeIterator

When to use NodeIterator - javascript

Benchmark compares QSA & .forEach vs a NodeIterator
toArray(document.querySelectorAll("div > a.klass")).forEach(function (node) {
// do something with node
});
var filter = {
acceptNode: function (node) {
var condition = node.parentNode.tagName === "DIV" &&
node.classList.contains("klass") &&
node.tagName === "A";
return condition ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT
}
}
// FIREFOX Y U SUCK
var iter = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, filter, false);
var node;
while (node = iter.nextNode()) {
// do thing with node
}
Now either NodeIterator's suck or I'm doing it wrong.
Question: When should I use a NodeIterator ?
In case you don't know, DOM4 specifies what NodeIterator is.

NodeIterator (and TreeWalker, for that matter) are almost never used, because of a variety of reasons. This means that information on the topic is scarce and answers like #gsnedders' come to be, which completely miss the mark. I know this question is almost a decade old, so excuse my necromancy.
Initiation & Performance
=
It is true that the initiation of a NodeIterator is waaay slower than a method like querySelectorAll, but that is not the performance you should be measuring.
The thing about NodeIterators is that they are live-ish in the way that, just like an HTMLCollection or live NodeList, you can keep using the object after initiating it once.
The NodeList returned by querySelectorAll is static and will have to be re-initiated every time you need to match newly added elements.
This version of the jsPerf puts the NodeIterator in the preparation code. The actual test only tries to loop over all newly added elements with iter.nextNode(). You can see that the iterator is now orders of magnitudes faster.
Selector performance
=
Okay, cool. Caching the iterator is faster. This version, however, shows another significant difference. I've added 10 classes (done[0-9]) that the selectors shouldn't be matching. The iterator loses about 10% of its speed, while the querySelectors lose 20%.
On the other hand, this version, shows what happens when you add another div > at the start of the selector. The iterator loses 33% of its speed, while the querySelectors got a speed INCREASE of 10%.
Removing the initial div > at the start of the selector like in this version shows that both methods become slower, because they match more than earlier versions. Like expected, the iterator is relatively more performant than the querySelectors in this case.
This means that filtering on basis of a node's own properties (its classes, attributes, etc.) is probably faster in a NodeIterator, while having a lot of combinators (>, +, ~, etc.) in your selector probably means querySelectorAll is faster.This is especially true for the  (space) combinator. Selecting elements with querySelectorAll('article a') is way easier than manually looping over all parents of every a element, looking for one that has a tagName of 'ARTICLE'.
P.S. in §3.2, I give an example of how the exact opposite can be true if you want the opposite of what the space combinator does (exclude a tags with an article ancestor).
3 Impossible selectors
3.1 Simple hierarchical relationships
Of course, manually filtering elements gives you practically unlimited control. This means that you can filter out elements that would normally be impossible to match with CSS selectors. For example, CSS selectors can only "look back" in the way that selecting divs that are preceded by another div is possible with div + div. Selecting divs that are followed by another div is impossible.
However, inside a NodeFilter, you can achieve this by checking node.nextElementSibling.tagName === 'DIV'. The same goes for every selection CSS selectors can't make.
3.2 More global hierarchical relationships
Another thing I personally love about the usage of NodeFilters, is that when passed to a TreeWalker, you can reject a node and its whole sub-tree by returning NodeFilter.FILTER_REJECT instead of NodeFilter.FILTER_SKIP.
Imagine you want to iterate over all a tags on the page, except for ones with an article ancestor.With querySelectors, you'd type something like
let a = document.querySelectorAll('a')
a = Array.prototype.filter.call(a, function (node) {
while (node = node.parentElement) if (node.tagName === 'ARTICLE') return false
return true
})
While in a NodeFilter, you'd only have to type this
return node.tagName === 'ARTICLE' ? NodeFilter.FILTER_REJECT : // ✨ Magic happens here ✨
node.tagName === 'A' ? NodeFilter.FILTER_ACCEPT :
NodeFilter.FILTER_SKIP
In conclusion
You don't initiate the API every time you need to iterate over nodes of the same kind. Sadly, that assumption was made with the question being asked, and the +500 answer (giving it a lot more credit) doesn't even address the error or any of the perks NodeIterators have.
There's two main advantages NodeIterators have to offer:
Live-ishness, as discussed in §1
Advanced filtering, as discussed in §3
(I can't stress enough how useful the NodeFilter.FILTER_REJECT example is)
However, don't use NodeIterators when any of the following is true:
Its instance is only going to be used once/a few times
Complex hierarchical relationships are queried that are possible with CSS selectors
(i.e. body.no-js article > div > div a[href^="/"])
Sorry for the long answer :)

It's slow for a variety of reasons. Most obviously is the fact that nobody uses it so quite simply far less time has been spent optimizing it. The other problem is it's massively re-entrant, every node having to call into JS and run the filter function.
If you look at revision three of the benchmark, you'll find I've added a reimplementation of what the iterator is doing using getElementsByTagName("*") and then running an identical filter on that. As the results show, it's massively quicker. Going JS -> C++ -> JS is slow.
Filtering the nodes entirely in JS (the getElementsByTagName case) or C++ (the querySelectorAll case) is far quicker than doing it by repeatedly crossing the boundary.
Note also selector matching, as used by querySelectorAll, is comparatively smart: it does right-to-left matching and is based on pre-computed caches (most browsers will iterate over a cached list of all elements with the class "klass", check if it's an a element, and then check if the parent is a div) and hence they won't even bother with iterating over the entire document.
Given that, when to use NodeIterator? Basically never in JavaScript, at least. In languages such as Java (undoubtedly the primary reason why there's an interface called NodeIterator), it will likely be just as quick as anything else, as then your filter will be in the same language as the filter. Apart from that, the only other time it makes sense is in languages where the memory usage of creating a Node object is far greater than the internal representation of the Node.

Related

Optimizing jQuery selector / addBack() when dealing with a large collection

I use jQuery to intentionally remove css classes from elements in a potentially large html table. See below for an explanation why I am doing that.
Currently I am doing it like this:
var tableElements = $("#TreeListElemente").find("*").addBack();
tableElements.removeClass("dxtl dxtl__B2 dxtl__B0 dxtlSelectionCell dxtlHeader dxtl__B3 dxtlControl dx-wrap dxtl__IM dxeHyperlink");
The table sometimes is large and has many elements. I would like to speed up the page load / DOM manipulation.
The IE's built-in Javascript profiler tells me that especially the .addBack() is slow. It seems to do some kind of sorting, which is totally unnecessary to my use case. Could I get rid of that? Is there another way to include the selected element itself, besides addBack()?
IE javascript profiler: Execution time for a collection of about 60000 elements. The inclusive times are in the third column
Or is there another, more efficient way to remove classes from a large set of elements, with selecting an element, itself, and all children?
Note: Why am I doing this: I am using the DevXpress TreeList Component which comes with it's own styling. There is no easy way to "unstyle it" on the server side, thus I chose to do that client-side, the way demonstrated above. In the end, I am selecting the TreeList, all child elements, and remove the relevant css classes from them.
Update/Solution 1
I have successfully implemented the solution proposed by Frédéric Hamidi an got quite an improvement:
IE javascript profiler: Execution time for a collection of about 60000 elements, using the proposal by Frederic. The inclusive times are in the third column
The time needed for the addBack() operation are just gone, remaining just the other stuff. This means an improvement by more than factor 4 overall. Yay!
Update/Solution 2
I have also implemented the solution proposed by A. Wolff and got a slight additional improvement:
IE javascript profiler: Execution time for a collection of about 60000 elements, using the proposal by A. Wolff. The inclusive times are in the third column
The time needed for the find() operation is gone, remaining just the other stuff again. This means an slight improvement of some 10s of milliseconds on my machine. Cool!
This is the solution I am using now:
$("#TreeListElemente, #TreeListElemente [class]").removeClass("dxtl dxtl__B2 dxtl__B0 dxtlSelectionCell dxtlHeader dxtl__B3 dxtlControl dx-wrap dxtl__IM dxeHyperlink");

addBack() does perform a sort to put the matched elements in document order. The easy alternative, add(), does the exact same thing, so it won't solve your problem.
However, the documentation is helpful enough to provide a solution:
To create a jQuery object with elements in a well-defined order and
without sorting overhead, use the $(array_of_DOM_elements) signature.
Therefore, to avoid that overhead, you can write:
var ancestor = $("#TreeListElemente"),
tableElements = $(ancestor.find("*").get().concat(ancestor[0]));
get() and concat() end up building two arrays under the hood, though, so it will affect performance. The end result may be faster than your current approach, depending on the number of elements you match.

The relevant selector to select element with ID TreeListElemente and all its descendants would be:
"#TreeListElemente, #TreeListElemente *"
Now you could filter out descendants having class:
"#TreeListElemente, #TreeListElemente [class]"
So it would give:
$("#TreeListElemente, #TreeListElemente [class]").removeClass("dxtl dxtl__B2 dxtl__B0 dxtlSelectionCell dxtlHeader dxtl__B3 dxtlControl dx-wrap dxtl__IM dxeHyperlink");

Here's a thought:
function deClassify(jq, classes) {
var remove = classes.join(' ');
jq.find('.' + classes.join(',.')).removeClass(remove);
jq.removeClass(remove);
}
deClassify($('.keepme'), ['remove', 'remove2', 'remove3']);
.remove, .remove2, .remove3 {
color: red;
}
.keepme, .keepme2 {
font-weight: bold;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="keepme remove remove2">
<div class="keepme2 remove remove3">x</div>
</div>
This avoids "selecting" non-matching elements, reducing the load, and of course no extra sorting is involved...

Jquery to pure javascript and how the intepreter looks up with dom for elements

I have a couple of questions about the inner workings of JavaScript and how the interpreter handles certain queries
The following JQuery will correctly get all the images that contain the word "flowers" in the src
$("img[src*='flowers']");
Jquery makes this very simple but what the pure javascript version?
We have a very large DOM. I take it if I do $("*[src*='flowers']") this will greatly affect performance (wildcard element). I'm interested in what the Javascript interpreter does differently between $("img[src*='flowers']") and $("*[src*='flowers']")

Well, the clearest way to explain the difference is to show you how you'd write both DOM queries in plain JS:
jQuery's $("img[src*='flowers']"):
var images = document.getElementsByTagName('img');//gets all img tags
var result = [];
for (var i = 0; i < images.length;i++)
{
if (images[i].getAttribute('src').indexOf('flowers') !== -1)
{//if img src attribute contains flowers:
result.push(images[i]);
}
}
So as you can see, you're only searching through all img elements, and checking their src attribute. If the src attribute contains the substring "flowers", the add it to the result array.
Whereas $("[src*='flowers']") equates to:
var all = document.getElementsByTagName('*');//gets complete DOM
var result = [];
for (var i =0; i <all.length; i++)
{
if (all[i].hasAttribute('src') && all[i].getAttribute('src').indexOf('flowers') !== -1)
{//calls 2 methods, for each element in DOM ~= twice the overhead
result.push(all[i]);
}
}
So the total number of nodes will be a lot higher than just the number of img nodes. Add to that the fact that you're calling two methods (hasAttribute and getAttibute) for all img elements (thanks to short-circuit evaluation, all elements that don't have an src attribute, the getAttribute method won't be called) there's just a lot more going on behind the scenes in order for you to get the same result.
note:
I'm not saying that this is exactly how jQuery translates the DOM queries for you, it's a simplified version, but the basic principle stands. The second version (slower version) just deals with a lot more elements than the first. That's why it's a lot slower, too.

When you use *[src..] you will try to find all elements from the page, but when you use $("img[src..]") the search is restricted to img elements, like this: imgs = document.getElementsByTagName("img")
Heres a JSFiddle getting those images using pure javascript.
Edit:
turn console on so you can see the return from console.log

The direct JavaScript methods are document.querySelector or document.querySelectorAll. The problem with those is that they are not supported in all browsers, jQuery (through SizzleJS) provides a browser compatible way of doing these things. SizzleJS delegates to document.querySelectorAll if it is available, and it falls back on other mechanisms when it is not available. So unless you want to write the fall back code yourself, it's probably best to stick with something like SizzleJS, which provides the selector functionality without the overhead of jQuery.

jQuery context slows down search [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Performance of jQuery selector with context
In the jQuery DOCS it says
By default, selectors perform their searches within the DOM starting
at the document root. However, an alternate context can be given for
the search by using the optional second parameter to the $() function.
Based on that my understanding is that a selection using a context passed in as the second parameter should be faster then the same selection without the context passed in. However I ran some tests and it seems as if this isn't the case, or at least isn't always the case.
To elaborate, I originally wanted to see if searching for multiple elements at once ($("div1, #div2")) was faster then searching for the two separately ($("#div1") $("div2")). I then decided to test it with the context and without to see how much faster it was with the context, but was surprised when it turned out that the context seemed to be slowing it down.
For example given the following basic HTML markup
<div id="testCnt">
<div id="Div0"></div>
<div id="Div1"></div>
<div id="Div2"></div>
<div id="Div3"></div>
<div id="Div4"></div>
<div id="Div5"></div>
<div id="Div6"></div>
<div id="Div7"></div>
<div id="Div8"></div>
<div id="Div9"></div>
</div>
And the following JavaScript (jQuery 1.8.2, and tested using FireBug)
$(function () {
var $dvCnt = $('#testCnt');
var dvCnt = $dvCnt[0];
console.time('Individual without cache');
for (var i = 0; i < 10000; i++) {
$('#Div0').text('Test');
$('#Div1').text('Test');
$('#Div2').text('Test');
$('#Div3').text('Test');
$('#Div4').text('Test');
$('#Div5').text('Test');
$('#Div6').text('Test');
$('#Div7').text('Test');
$('#Div8').text('Test');
$('#Div9').text('Test');
}
console.timeEnd('Individual without cache');
console.time('Individual with $cache');
for (var i = 0; i < 10000; i++) {
$('#Div0', $dvCnt).text('Test');
$('#Div1', $dvCnt).text('Test');
$('#Div2', $dvCnt).text('Test');
$('#Div3', $dvCnt).text('Test');
$('#Div4', $dvCnt).text('Test');
$('#Div5', $dvCnt).text('Test');
$('#Div6', $dvCnt).text('Test');
$('#Div7', $dvCnt).text('Test');
$('#Div8', $dvCnt).text('Test');
$('#Div9', $dvCnt).text('Test');
}
console.timeEnd('Individual with $cache');
console.time('Individual with DOM cache');
for (var i = 0; i < 10000; i++) {
$('#Div0', dvCnt).text('Test');
$('#Div1', dvCnt).text('Test');
$('#Div2', dvCnt).text('Test');
$('#Div3', dvCnt).text('Test');
$('#Div4', dvCnt).text('Test');
$('#Div5', dvCnt).text('Test');
$('#Div6', dvCnt).text('Test');
$('#Div7', dvCnt).text('Test');
$('#Div8', dvCnt).text('Test');
$('#Div9', dvCnt).text('Test');
}
console.timeEnd('Individual with DOM cache');
console.time('Multiple without cache');
for (var i = 0; i < 10000; i++) {
$('#Div0,#Div1 ,#Div2 ,#Div3 ,#Div4 ,#Div5 ,#Div6, #Div7, #Div8, #Div9').text('Test');
}
console.timeEnd('Multiple without cache');
console.time('Multiple with $cache');
for (var i = 0; i < 10000; i++) {
$('#Div0,#Div1 ,#Div2 ,#Div3 ,#Div4 ,#Div5 ,#Div6, #Div7, #Div8, #Div9', $dvCnt).text('Test');
}
console.timeEnd('Multiple with $cache');
console.time('Multiple with DOM cache');
for (var i = 0; i < 10000; i++) {
$('#Div0,#Div1 ,#Div2 ,#Div3 ,#Div4 ,#Div5 ,#Div6, #Div7, #Div8, #Div9', dvCnt).text('Test');
}
console.timeEnd('Multiple with DOM cache');
});
Here's a jsbin
I'm getting something like the following results
Individual without cache: 11490ms
Individual with $cache: 13315ms
Individual with DOM cache: 14487ms
Multiple without cache: 7557ms
Multiple with $cache: 7824ms
Multiple with DOM cache: 8589ms
Can someone shed some insight on whats going on? Specifically why the search is slowing down when the jQuery context is passed in?
EDIT:
Most of the anwsers here (as well as Performance of jQuery selector with context) basically say that that either the DOM in this example is too small to really gain much or that selecting by ID is going to be fast regardless. I understand both points, the main point of my question is why would the context slow down the search, the size of the DOM shouldn't make a difference for that, and neither should the fact that searching by ID is already very fast.
#pebble suggested that the reason that its slower is because jQuery can't use the native browser methods (getElementByID), this seems to make sense to me, but then why is it faster to search for multiple elements in one selection?
Anyway I dumped the tests into a jsPerf adding cases to search by class and was again surprised to see that the search for multiple classes with a cache this time was the fastest.

I would imagine there are lots of situations where using context will slow things down, mainly because jQuery will try and use browser native methods where it can - rather than traverse the entire dom. One example of this would be using document.getElementById as in your example.
why the slow down?
getElementById only exists on the document object - you have no way of using this on a contextual element - i.e. element.getElementById. So my theory would be that jQuery first does the id request using document.getElementById, and then, if there is a context set - scans through the parents of each element to tell if any of them exist as children of the context - thereby slowing the process down.
Other examples of selectors that may be slow
You will also find other places where depending on the selector you are using you will get performance increases - all down to what methods jQuery can use to speed up it's work. For example:
$('.className');
Would most likely translate to using getElementsByClassName or any other native method offered to select by className, However:
$('.className .anotherClassName');
Wouldn't be able to use this (as it has to take the relationship into account) and would have to use a mixture of querySelector (if it exists) and or pure javascript logic to work things out.
Having a good knowledge of what native methods are available will help you optimise your jQuery queries.
Ways to optimise
If you wish to optimise using a context, I would imagine this would prove a faster query than without:
$('div', context);
This will be because getElementsByTagName has existed since the dawn of time for a while, and can be used in pure JavaScript directly on a DOM element. However if you are to do this, it may be quicker to do the following:
$().pushStack( context[0].getElementsByTagName('div') );
or
$( context[0].getElementsByTagName('div') );
Mainly because you cut down on the jQuery function calls, although this is much less succinct. Another thing to be aware of with regards to many of the popular JavaScript environments - calling a function without arguments is a lot faster than calling with.
A relatively unused method for optimising certain jQuery selectors is to use the jQuery eq pseudo selector - this can speed things up in a similar way to using LIMIT 0,1 in SQL queries - for example:
$('h2 > a');
Would scan inside all H2s looking for A elements, however if you know from the start that there is only ever going to be one A tag within your H2s you can do this:
$('h2 > a:eq(0)');
Plus if you know there is only ever going to be one H2 - the logic is the same:
$('h2:eq(0) > a:eq(0)');
The difference between $().pushStack and $().add
In response to Jasper's comment here is the difference between the two functions:
.add:
function (a,b){var c=typeof a=="string"?p(a,b):p.makeArray(a&&a.nodeType?
[a]:a),d=p.merge(this.get(),c);return this.pushStack(bh(c[0])||bh(d[0])?
d:p.unique(d))}
.pushStack:
function (a,b,c){var d=p.merge(this.constructor(),a);return
d.prevObject=this,d.context=this.context,b==="find"?d.selector=this.selector
+(this.selector?" ":"")+c:b&&(d.selector=this.selector+"."+b+"("+c+")"),d}
The major difference is that .add() uses .pushStack() to acheive it's goals - add allows support for a lot more data types - even jQuery objects. Whereas .pushStack is only designed for DOM Elements, which makes it more optimal if that is what you are using :)
A quicker way to select by ID?
This is obvious, but I thought I'd put this here as sometimes things are missed - a quicker way to select an element by id would be to do the following:
$(document.getElementById('id'));
All because there is no way jQuery/Sizzle can out-do a native method, and it also means you avoid any string parsing on jQuery/Sizzle's part. It's no where near as neat as it's jQuery counterpart though, and probably wont gain that much speed increase, but it is worth mentioning as an optimisation. You could do the following if you were to use ids often.
jQuery.byid = function(id){
return jQuery(document.getElementById(id))
};
$.byid('elementid');
The above would be slightly slower that my previous example, but should still out-do jQuery.

Since you are selecting by ID, jQuery (or sizzle, i forget) is skipping ahead to the faster document.getElementById() in this case. You may get different results when using classes, however even then it may vary by browser.
You could make your testing easier using something like http://jsperf.com/

You are not going to benefit with context when you use an id since that is highly optimized in the browser.
With a id you can call out and say hey. A non programming example, you are in a room of people, you yell out a name, the person answers.
Now lets look at context. Lets say you know the name is a mans name so you separate the room into men and women. You than ask the group of men for their name. One extra step for something that is rather easy.
You will benefit when you are looking up specific things like attributes. Something that is harder for the browser to look up and is not highly optimized. Say you are looking for an input that has a specific attribute. It would be better to reference an element you know that contains it so it does not have to search every input on the page.
Now the fun part is the context selector is slower. It is better to use find. Why? It has to deal with the creation of multiple jQuery objects. :)
So instead of
$('.myClass', dvCnt).text('Test');
do
$(dvCnt).find('.myClass').text('Test');
if you are doing multiple look ups, it is better to store the first one into a variable
var myDiv = $(dvCnt)
myDiv.find('.myClass1').text('Test');
myDiv.find('.myClass2').text('Test');
But now with jQuery doing to querySelector, these optimizations are a smaller deal unless you are using the made up jQuery selectors that querySelector does not support. For browsers that do not support querySelector, the context is important.

You seem to be using #elementid attribute to perform the tests.
Remember that an ID in a HTML page is supposed to be Unique. So this will not make a difference if you give it a context or not when searching for ID..
This test might make more sense if you are trying to target elements with classes or the element tag themselves.
$('.mydiv' , $('#innerDiv')) might be faster than $('.mydiv')

conditionals in javascript based on page currently showing

because of some problems with joomla "in-content javascript" I have to give all my js logic to one file, but there are problems with inconsistence of dom elements across my site (it is ajax driven, so there is only one script and various DOMs).
What is the best solution to make some conditionals solving this problem..
Is it checking $(selector).length, or is there any better solution..
And in case of the $(selector).length , is there a way to save this selector to variable (performance issues)
for example some kind of
var selector = ($(selector).length !== 0) ? this : false ;
if(selector) { makeSomething; }
The this is actually pointing to Window object..So is there any way to make it like this without need of reselection?
Thanks

var $obj = $('selector');
if ($obj.length) { makeSomething(); }
Actually, this is only meaningful if you are searching for the existence of a certain element (that might identify a whole page) and running several operations based on that.
If you just want to do something on the elements like
$('selector').append('x');
the condition might be useless, because if the jQuery collection is empty, the methods won't run anyways (as pointed out by #Gary Green).

node selection and manipulation out of the dom (What is jQuery's trick ?)

Hi I would like to do dom selection and manipulation out of the dom.
The goal is to build my widget out of the dom and to insert it in the dom only once it is ready.
My issue is that getElementById is not supported on a document fragment. I also tried createElement and cloneNode, but it does not work either.
I am trying to do that in plain js. I am used to do this with jQuery which handles it nicely. I tried to find the trick in jQuery source, but no success so far...
Olivier

I have done something similar, but not sure if it will meet your needs.
Create a "holding area" such as a plain <span id="spanReserve"></span> or <td id="cellReserve"></td>. Then you can do something like this in JS function:
var holdingArea = document.getElementById('spanReserve');
holdingArea.innerHTML = widgetHTMLValue;

jQuery will try to use getElementById first, and if that doesn't work, it'll then search all the DOM elements using getAttribute("id") until it finds the one you need.
For instance, if you built the following DOM structure that isn't attached to the document and it was assigned to the javascript var widget:
<div id="widget">
<p><strong id="target">Hello</strong>, world!</p>
</div>
You could then do the following:
var target;
// Flatten all child elements in the div
all_elements = widget.getElementsByTagName("*");
for(i=0; i < all_elements.length; i++){
if(all_widget_elements[i].getAttribute("id") === "target"){
target = all_widget_elements[i];
break;
}
}
target.innerHTML = "Goodbye";
If you need more than just searching by ID, I'd suggest installing Sizzle rather than duplicating the Sizzle functionality. Assuming you have the ability to install another library.
Hope this helps!

EDIT:
what about something simple along these lines:
DocumentFragment.prototype.getElementById = function(id) {
for(n in this.childNodes){
if(id == n.id){
return n;
}
}
return null;
}
Why not just use jQuery or the selection API in whatever other lib youre using? AFAIK all the major libs support selection on fragments.
If you wan tto skip a larger lib like jQ/Prototype/Dojo/etc.. then you could jsut use Sizzle - its the selector engine that powers jQ and Dojo and its offered as a standalone. If thats out of the question as well then i suppose you could dive in to the Sizzle source and see whats going on. All in all though it seems like alot of effort to avoid a few 100k with the added probaility that the code you come up with is going to be slower runtime wise than all the work pulled into Sizzle or another open source library.
http://sizzlejs.com/
Oh also... i think (guessing) jQ's trick is that elements are not out of the DOM. I could be wrong but i think when you do something like:
$('<div></div>');
Its actually in the DOM document its just not part of the body/head nodes. Could be totally wrong about that though, its just a guess.
So you got me curious haha. I took a look at sizzle.. than answer is - its not using DOM methods. It seems using an algorithm that compares the various DOMNode properties mapped to types of selectors - unless im missing something... which is entirely possible :-)
However as noted below in comments it seems Sizzle DOES NOT work on DocumentFragments... So back to square one :-)

Modern browsers ( read: not IE ) have the querySelector method in Element API. You can use that to get and element by id within a DocumentFragment.
jQuery uses sizzle.js
What it does on DocumentFragments is: deeply loop through all the elements in the fragment checking if an element's attribute( in your case 'id' ) is the one you're looking for. To my knowledge, sizzle.js uses querySelector too, if available, to speed things up.
If you're looking for cross browser compatibility, which you probably are, you will need to write your own method, or check for the querySelector method.

It sounds like you are doing to right things. Not sure why it is not working out.
// if it is an existing element
var node = document.getElementById("footer").cloneNode(true);
// or if it is a new element use
// document.createElement("div");
// Here you would do manipulation of the element, setAttribute, add children, etc.
node.childNodes[1].childNodes[1].setAttribute("style", "color:#F00; font-size:128px");
document.documentElement.appendChild(node)

You really have two tools to work with, html() and using the normal jQuery manipulation operators on an XML document and then insert it in the DOM.
To create a widget, you can use html():
$('#target').html('<div><span>arbitrarily complex JS</span><input type="text" /></div>');
I assume that's not what you want. Therefore, look at the additional behaviors of the jQuery selector: when passed a second parameter, it can be its own XML fragment, and manipulation can happen on those documents. eg.
$('<div />').append('<span>').find('span').text('arbitrarily complex JS'). etc.
All the operators like append, appendTo, wrap, etc. can work on fragments like this, and then they can be inserted into the DOM.
A word of caution, though: jQuery uses the browser's native functions to manipulate this (as far as I can tell), so you do get different behaviors on different browsers. Make sure to well formed XML. I've even had it reject improperly formed HTML fragments. Worst case, though, go back and use string concatenation and the html() method.

We Keep Coding

JavaScript is the programming language of the Web.