Javascript: traversing only HTML DOM elements - javascript

I need to traverse the DOM of an HTML page without visiting nodes that are not elements, such as text nodes. I want just the elements that are tags in the HTML page. Is it possible? How?

Without seeing how you are actually navigating the tree, I can only give you a basic example showing how to check the nodeType:
function getFirstChildElement (el) {
el = el.firstChild;
while (el && el.nodeType !== 1)
el = el.nextSibling;
return el;
}

You could simply use the children property of the parent element instead of childNodes, firstChild, lastChild, and the like.
Unlike childNodes, the children property refers only to the elements (the tags) and skips other nodes such as text, which is exactly what you want.
Just to illustrate, here's a short demo:
function showChildElements (el) {
for (var i = 0; i < el.children.length; i++) {
alert(el.children[i].tagName);
}
}
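The demo above lists only the direct child elements. To cover the whole tree, as the question asks, the same children property can be applied recursively. The following is a minimal sketch under that assumption, not a definitive implementation; walkElements and its visit callback are hypothetical names introduced for illustration.

```javascript
// Depth-first walk that visits element nodes only.
// `root` is assumed to be an Element (for example document.body);
// `visit` is a hypothetical callback invoked once per element.
function walkElements(root, visit) {
  visit(root);
  // `children` contains element nodes only, so text and comment
  // nodes are never seen by this loop.
  for (var i = 0; i < root.children.length; i++) {
    walkElements(root.children[i], visit);
  }
}

// Usage in a browser:
// walkElements(document.body, function (el) { console.log(el.tagName); });
```

Because the recursion follows children rather than childNodes, no nodeType check is needed anywhere in the walk.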

Related

Angular JS text highlighter directive

I wanted a directive to highlight text in an element based on a search string.
Most of the available solutions use a filter instead of a directive and are used like this:
<div ng-bind-html-unsafe="This is the contents for this div | highlight:highlightText"></div>
Here is an example
I would rather use a directive than a filter because I don't like the idea of having to put the content of an element in the ng-bind-html-unsafe attribute. I feel like an element's contents should be inside of it.
Anyway, I wrote a directive for this but was wondering if there is a better way to write it. I feel like it is not the most efficient of methods. Here is the fiddle. Notice that text within the <code> element is not highlighted. This is due to .contents() only returning the direct children and text nodes of the element. This behaviour is fine unless there is a very simple way to recurse through the contents of each child element.
Thanks in advance.
To traverse the contents of each child element, you can use recursion. Put the code that adds highlighters and the code that removes them into two functions, and call these functions for each child element.
.contents() returns a jQuery object. If node.nodeType === 1, convert the node to an angular element and call contents() on it again.
/*Function to add Highlighters*/
scope.addHighlight = function (elm, value) {
angular.forEach(elm.contents(), function (node) {
if (node.nodeType === 3 && scope.needle.test(node.nodeValue)) {
node = angular.element(node);
node.after(node[0].nodeValue.replace(scope.needle, '<span class="highlight">$1</span>')).remove();
} else if (node.nodeType === 1) {
node = angular.element(node);
if (node.contents().length > 0) scope.addHighlight(node, value);
}
});
}
/*Function to remove current Highlighters*/
scope.removeHighlight = function (elm, value) {
angular.forEach(elm.contents(), function (node) {
var nodetype = node.nodeType; // declared with var to avoid leaking a global
node = angular.element(node);
if (node[0].nodeName === 'SPAN' && node.hasClass('highlight')) {
node.after(node.html()).remove();
elm[0].normalize();
}
if (node.children().length > 0 && nodetype === 1) scope.removeHighlight(node, value);
});
}
Here is the updated fiddle.

in pure Javascript, how do I get all elements inside the body tag excluding a certain div and its children?

I'm trying to find all the elements inside the body tag, but there is one element (a div) with a certain "hidden" class that I want to exclude from my array of elements, together with its children.
here is my var that contains all the elements in the body:
allTagsInBody = document.body.getElementsByTagName('*');
and here is the div that I want to exclude from this list:
<div class="myHiddenElement">
<button>Click here</button>
<div> <button>Click here</button> </div>
<button>Click here</button>
</div>
the problem is that I don't know how many elements there are inside that div and how far nested they are.
As you iterate through each element, you need to check not only whether it has your hidden class but also whether any of its parent elements has it. Thus you need to recursively check each element's parents. This can be very expensive depending on the number of elements on the page and how deeply nested they are, but here's how it's done:
var arr = [];
var len;
var i;
var nodes = document.querySelectorAll('body *');
function checkNode(node) {
if (node.classList.contains('myHiddenElement')) {
return true;
} else if (node.parentNode.nodeType === 1) {
return checkNode(node.parentNode);
}
return false;
};
for (i = 0, len = nodes.length; i < len; i++) {
if (checkNode(nodes[i])) {
continue;
} else {
arr.push(nodes[i]);
}
}
Here's a JSFiddle example: http://jsfiddle.net/xzCfs/5/
Unfortunately I don't think there is a way to do this with CSS selectors since the :not() selector only accepts simple selectors, not compound ones (e.g., :not(.myHiddenClass *) <-- would be awesome if that worked).
document.querySelectorAll( '*:not(.myHiddenElement)' );
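querySelectorAll together with the CSS3 :not() selector will exclude the element itself. Note that the descendants of that element still match this selector, so they are not excluded by it.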
Try this
var elems = document.body.childNodes;
var filtered = []; // holds the nodes that don't have the 'myHiddenElement' class
// Only the direct child nodes of body are inspected here
for (var i = 0; i < elems.length; i++)
{
    if (elems[i].className != 'myHiddenElement')
        filtered.push(elems[i]);
}
If all else fails you can always recursively traverse the DOM (it's what all the libraries do anyway):
Here's a generic DOM traverse function:
// Note: Even though this function accepts a callback it is synchronous:
function traverse (node, callback) {
// The callback function must return true to continue processing
// otherwise stop processing down this branch:
if (callback(node)) {
for (var i=0;i < node.childNodes.length; i++) {
traverse(node.childNodes[i],callback);
}
}
}
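So, to build up your collection:
var elements = [];
// Start at document.body: the document node itself has nodeType 9, so the
// callback below would return false for it and nothing would be traversed.
traverse(document.body,function(node){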
// We only care about element nodes, ignore comments, attributes etc:
if (node.nodeType == 1 && node.className != "myHiddenElement") {
elements.push(node);
return true; // continue parsing this branch
}
return false; // ignore this branch and its children
});
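The ancestor check in the first answer is repeated for every element, while the traversal approach prunes the hidden subtree once. As a rough sketch of that pruning idea restricted to element nodes, the function below walks children and tests the class with classList.contains, so it also works when the hidden element carries additional classes. The name collectElementsExcluding and its parameters are hypothetical and not taken from any of the answers.

```javascript
// Collect every element below `root`, skipping any element that carries
// `hiddenClass` together with its whole subtree.
// `root` itself is not added to the result.
function collectElementsExcluding(root, hiddenClass) {
  var result = [];

  function visit(el) {
    for (var i = 0; i < el.children.length; i++) {
      var child = el.children[i];
      if (child.classList.contains(hiddenClass)) {
        continue; // prune: neither this element nor its descendants are added
      }
      result.push(child);
      visit(child);
    }
  }

  visit(root);
  return result;
}

// Usage in a browser:
// var visible = collectElementsExcluding(document.body, 'myHiddenElement');
```

Each element is visited at most once, so the cost grows with the number of elements that are kept plus the pruned roots, independent of nesting depth.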

Why does `childNodes` return a number larger than I expect?

Could you please look at this jsFiddle example, and tell me why the number '11' is alerted rather than '5' (the number of <li> elements)?
From jsFiddle:
HTML
<ul id="list">
<li>milk</li>
<li>butter</li>
<li>eggs</li>
<li>orange juice</li>
<li>bananas</li>
</ul>
JavaScript
var list = document.getElementById('list');
var list_items = list.childNodes;
alert(list_items.length);
The childNodes property, depending on the browser used, returns text nodes as well as the elements that are children of the parent node. So, technically, the whitespace between the <li> tags is also counted among the childNodes.
To avoid processing them, you may check that nodeType != 3. Here is a list of node types.
var list = document.getElementById('list');
var list_items = list.childNodes;
var li_items = [];
for (var i=0; i<list_items.length; i++) {
console.log(list_items[i].nodeType);
// Add all the <li> nodes to an array, skip the text nodes
if (list_items[i].nodeType != 3) {
li_items.push(list_items[i]);
}
}
You have text nodes there.
You can skip them while iterating with...
for (var i = 0, length = list_items.length; i < length; i++) {
if (list_items[i].nodeType != 1) {
continue;
}
// Any code here that accesses list_items[i] is sure to be working with an element.
}
Alternatively, you could do it in a more functional way...
list_items = Array.prototype.filter.call(list_items, function(element) {
return element.nodeType == 1;
});
The childNodes property returns a NodeList object, not a true array, so it has no filter() method of its own. That is why filter() is borrowed from Array.prototype with call() here.
As others have pointed out, the childNodes count includes the text nodes generated by the whitespace between the <li> elements.
<ul id="list"><li>milk</li><li>butter</li><li>eggs</li><li>orange juice</li><li>bananas</li></ul>
That will give you 5 childNodes because it omits the whitespace.
Text nodes are included in the child nodes count. To get the proper value, you'd need to strip out the text nodes, or make sure they are not in your markup. Any whitespace between tags becomes a text node, so your count is the total number of element nodes plus text nodes.
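To make the arithmetic visible, the child nodes can be tallied by type. This is only an illustrative sketch; countChildNodeTypes is a hypothetical helper name. For the list in the question, the 11 that is alerted is consistent with 5 <li> elements plus 6 whitespace text nodes (one before the first item, four between items, one after the last).

```javascript
// Tally the child nodes of `parent` by node type.
// nodeType 1 is an element node, nodeType 3 is a text node.
function countChildNodeTypes(parent) {
  var counts = { elements: 0, text: 0, other: 0 };
  for (var i = 0; i < parent.childNodes.length; i++) {
    var type = parent.childNodes[i].nodeType;
    if (type === 1) {
      counts.elements++;
    } else if (type === 3) {
      counts.text++;
    } else {
      counts.other++; // comments and any other node types
    }
  }
  return counts;
}

// Usage in a browser:
// console.log(countChildNodeTypes(document.getElementById('list')));
```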
I cobbled together a solution for this that I like. (I got the idea from this blog post.)
1) First I get the number of child elements nodes by using:
nodeObject.childElementCount;
2) Then I wrote a function that will return any child element node by index number. I did this by using firstElementChild and nextElementSibling in a for loop.
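function getElement(x, parentNode) {
    var item = parentNode.firstElementChild;
    // Step forward x times; stop early if there are fewer child elements than that
    for (var i = 0; i < x && item; i++) {
        item = item.nextElementSibling;
    }
    return item; // null if index x does not exist
}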
This returns whichever child element I need. It avoids the problem of childNodes returning all the other node types, which are not helpful when you only want to work with the elements. I am sure someone more experienced than me could clean this up, but I found it so helpful that I had to post it.
Use obj.children instead.
var list = document.getElementById('list');
var list_items = list.children;
alert(list_items.length);
The difference between children and childNodes is that childNodes contains all nodes, including text nodes and comment nodes, while children contains only element nodes.
from w3schools.

TEXT_NODE: returns ONLY text?

I'm using JavaScript to extract all the text from a DOM object. My algorithm goes over the DOM object itself and its descendants; if a node is of type TEXT_NODE, it accumulates its nodeValue.
For some weird reason I also get things like:
#hdr-editions a { text-decoration:none; }
#cnn_hdr-editionS { text-align:left;clear:both; }
#cnn_hdr-editionS a { text-decoration:none;font-size:10px;top:7px;line-height:12px;font-weight:bold; }
#hdr-prompt-text b { display:inline-block;margin:0 0 0 20px; }
#hdr-editions li { padding:0 10px; }
How do I filter this? Do I need to use something else? I want ONLY text.
From the looks of things, you're also collecting the text from <style> elements. You might want to run a check for those:
var ignore = { "STYLE":0, "SCRIPT":0, "NOSCRIPT":0, "IFRAME":0, "OBJECT":0 }
if (element.tagName in ignore)
continue;
You can add any other elements to the object map to ignore them.
You want to skip over style elements.
In your loop, you could do this...
if (element.tagName == 'STYLE') {
continue;
}
You also probably want to skip over script, textarea, etc.
This is text as far as the DOM is concerned. You'll have to filter out (skip) <script> and <style> tags.
[Answer added after reading OP's comments to Andy's excellent answer]
The problem is that you see the text nodes inside elements whose content is normally not rendered by browsers - such as STYLE and SCRIPT tags.
When you scan the DOM tree (using depth-first search, I assume), your scan should skip over the content of such tags.
For example - a recursive depth-first DOM tree walker might look like this:
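function walker(domObject, extractorCallback) {
    if (domObject == null) return; // fail fast
    extractorCallback(domObject);
    if (domObject.nodeType != Node.ELEMENT_NODE) return;
    var childs = domObject.childNodes;
    for (var i = 0; i < childs.length; i++)
        walker(childs[i], extractorCallback); // pass the callback down the tree
}
var textvalue = "";
// Start at the root element: the document node is not an element node,
// so walker(document, ...) would return before visiting any children.
walker(document.documentElement, function(node) {
    if (node.nodeType == Node.TEXT_NODE)
        textvalue += node.nodeValue;
});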
In such a case, if your walker encounters tags whose content you know you don't want to see, it should simply not descend into that part of the tree. So walker() has to be adapted like this:
var ignore = { "STYLE":0, "SCRIPT":0, "NOSCRIPT":0, "IFRAME":0, "OBJECT":0 }
function walker(domObject, extractorCallback) {
if (domObject == null) return; // fail fast
extractorCallback(domObject);
if (domObject.nodeType != Node.ELEMENT_NODE) return;
if (domObject.tagName in ignore) return; // <--- HERE
var childs = domObject.childNodes;
for (var i = 0; i < childs.length; i++)
walker(childs[i], extractorCallback); // pass the callback down the tree
}
That way, if we see a tag that you don't like, we simply skip it and all its children, and your extractor will never be exposed to the text nodes inside such tags.
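Putting the pieces of this thread together, a text extractor that returns the accumulated string directly might look like the sketch below. It is one possible arrangement, not the answerers' code: extractText, IGNORED_TAGS and the two numeric constants are names introduced here, and the ignore list simply reuses the tags suggested above.

```javascript
var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
var TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Tags whose text content is normally not rendered (assumed list).
var IGNORED_TAGS = { STYLE: true, SCRIPT: true, NOSCRIPT: true, IFRAME: true, OBJECT: true };

// Return the concatenated text of `node` and its descendants,
// skipping everything inside the ignored tags.
function extractText(node) {
  if (node == null) return '';
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  if (node.nodeType === ELEMENT_NODE &&
      Object.prototype.hasOwnProperty.call(IGNORED_TAGS, node.tagName)) {
    return ''; // do not descend into this subtree
  }
  var text = '';
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += extractText(children[i]);
  }
  return text;
}

// Usage in a browser:
// var pageText = extractText(document);
```

Because the function descends into any node that has child nodes, it can be started at the document node as well as at an element.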

How do I select text nodes with jQuery?

I would like to get all descendant text nodes of an element, as a jQuery collection. What is the best way to do that?
jQuery doesn't have a convenient function for this. You need to combine contents(), which will give just child nodes but includes text nodes, with find(), which gives all descendant elements but no text nodes. Here's what I've come up with:
var getTextNodesIn = function(el) {
return $(el).find(":not(iframe)").addBack().contents().filter(function() {
return this.nodeType == 3;
});
};
getTextNodesIn(el);
Note: If you're using jQuery 1.7 or earlier, the code above will not work. To fix this, replace addBack() with andSelf(). andSelf() is deprecated in favour of addBack() from 1.8 onwards.
This is somewhat inefficient compared to pure DOM methods and has to include an ugly workaround for jQuery's overloading of its contents() function (thanks to @rabidsnail in the comments for pointing that out), so here is a non-jQuery solution using a simple recursive function. The includeWhitespaceNodes parameter controls whether or not whitespace text nodes are included in the output (in jQuery they are automatically filtered out).
Update: Fixed bug when includeWhitespaceNodes is falsy.
function getTextNodesIn(node, includeWhitespaceNodes) {
var textNodes = [], nonWhitespaceMatcher = /\S/;
function getTextNodes(node) {
if (node.nodeType == 3) {
if (includeWhitespaceNodes || nonWhitespaceMatcher.test(node.nodeValue)) {
textNodes.push(node);
}
} else {
for (var i = 0, len = node.childNodes.length; i < len; ++i) {
getTextNodes(node.childNodes[i]);
}
}
}
getTextNodes(node);
return textNodes;
}
getTextNodesIn(el);
Jauco posted a good solution in a comment, so I'm copying it here:
$(elem)
.contents()
.filter(function() {
return this.nodeType === 3; //Node.TEXT_NODE
});
$('body').find('*').contents().filter(function () { return this.nodeType === 3; });
jQuery.contents() can be used with jQuery.filter to find all child text nodes. With a little twist, you can find grandchildren text nodes as well. No recursion required:
$(function() {
var $textNodes = $("#test, #test *").contents().filter(function() {
return this.nodeType === Node.TEXT_NODE;
});
/*
* for testing
*/
$textNodes.each(function() {
console.log(this);
});
});
div { margin-left: 1em; }
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<div id="test">
child text 1<br>
child text 2
<div>
grandchild text 1
<div>grand-grandchild text 1</div>
grandchild text 2
</div>
child text 3<br>
child text 4
</div>
I was getting a lot of empty text nodes with the accepted filter function. If you're only interested in selecting text nodes that contain non-whitespace, try adding a nodeValue conditional to your filter function, like a simple $.trim(this.nodeValue) !== '':
$('element')
.contents()
.filter(function(){
return this.nodeType === 3 && $.trim(this.nodeValue) !== '';
});
http://jsfiddle.net/ptp6m97v/
Or to avoid strange situations where the content looks like whitespace, but is not (e.g. the soft hyphen ­ character, newlines \n, tabs, etc.), you can try using a Regular Expression. For example, \S will match any non-whitespace characters:
$('element')
.contents()
.filter(function(){
return this.nodeType === 3 && /\S/.test(this.nodeValue);
});
If you can make the assumption that all children are either Element Nodes or Text Nodes, then this is one solution.
To get all child text nodes as a jquery collection:
$('selector').clone().children().remove().end().contents();
To get a copy of the original element with non-text children removed:
$('selector').clone().children().remove().end();
For some reason contents() didn't work for me, so in case it doesn't work for you either, here's a solution I made: I created jQuery.fn.descendants, with the option to include text nodes or not.
Usage
Get all descendants including text nodes and element nodes
jQuery('body').descendants('all');
Get all descendants returning only text nodes
jQuery('body').descendants(true);
Get all descendants returning only element nodes
jQuery('body').descendants();
Coffeescript Original:
jQuery.fn.descendants = ( textNodes ) ->
# if textNodes is 'all' then textNodes and elementNodes are allowed
# if textNodes is true then only textNodes will be returned
# if textNodes is not provided as an argument then only element nodes
# will be returned
allowedTypes = if textNodes is 'all' then [1,3] else if textNodes then [3] else [1]
# nodes we find
nodes = []
dig = (node) ->
# loop through children
for child in node.childNodes
# push child to collection if has allowed type
nodes.push(child) if child.nodeType in allowedTypes
# dig through child if has children
dig child if child.childNodes.length
# loop and dig through nodes in the current
# jQuery object
dig node for node in this
# wrap with jQuery
return jQuery(nodes)
Drop In Javascript Version
var __indexOf=[].indexOf||function(e){for(var t=0,n=this.length;t<n;t++){if(t in this&&this[t]===e)return t}return-1}; /* indexOf polyfill ends here*/ jQuery.fn.descendants=function(e){var t,n,r,i,s,o;t=e==="all"?[1,3]:e?[3]:[1];i=[];n=function(e){var r,s,o,u,a,f;u=e.childNodes;f=[];for(s=0,o=u.length;s<o;s++){r=u[s];if(a=r.nodeType,__indexOf.call(t,a)>=0){i.push(r)}if(r.childNodes.length){f.push(n(r))}else{f.push(void 0)}}return f};for(s=0,o=this.length;s<o;s++){r=this[s];n(r)}return jQuery(i)}
Unminified Javascript version: http://pastebin.com/cX3jMfuD
This is cross browser, a small Array.indexOf polyfill is included in the code.
Can also be done like this:
var textContents = $(document.getElementById("ElementId").childNodes).filter(function(){
return this.nodeType == 3;
});
The above code filters the text nodes from the direct child nodes of a given element.
if you want to strip all tags, then try this
function:
String.prototype.stripTags=function(){
var rtag=/<.*?[^>]>/g;
return this.replace(rtag,'');
}
usage:
var newText=$('selector').html().stripTags();
For me, plain old .contents() appeared to work to return the text nodes, just have to be careful with your selectors so that you know they will be text nodes.
For example, this wrapped all the text content of the TDs in my table with pre tags and had no problems.
jQuery("#resultTable td").contents().wrap("<pre/>")
I had the same problem and solved it with:
Code:
$.fn.nextNode = function(){
var contents = $(this).parent().contents();
return contents.get(contents.index(this)+1);
}
Usage:
$('#my_id').nextNode();
It is like next(), but it also returns text nodes.
This gets the job done regardless of the tag names. Select your parent.
It gives an array of strings with no duplications for parents and their children.
$('parent')
.find(":not(iframe)")
.addBack()
.contents()
.filter(function() {return this.nodeType == 3;})
//.map((i,v) => $(v).text()) // uncomment if you want strings
