Extract text of node, ignoring childNodes

Given a simple structure like this:
<td><span>Text1</span></td>
<td><span>Text2</span></td>
<td><span>Text3</span></td>
<td><span><a href='#'>Link</a>Text4</span></td>
I am trying to extract all of Text1-4 with JavaScript, without any child nodes.
Loop for the cols
...
x = rows[i].getElementsByTagName("TD")[n].getElementsByTagName('span')[0];
...
Output for each x
Text1
Text2
Text3
<a href='#'>Link</a>Text4
Is there a simple way to "ignore" the html tags of an element?
Edit
I tried this
if(x.hasChildNodes()){
x = rows[i].getElementsByTagName("TD")[n].getElementsByTagName('span')[0].getElementsByTagName('a')[0];
}
but that gives me Link obviously

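Loop over each span's childNodes and keep only the text nodes (nodeType === TEXT_NODE); that skips the text inside nested elements such as the <a>. (.innerText or .textContent would instead return all nested text, including "Link".) Replace div with td for your example; I only used div to show the result.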
const tags = document.querySelectorAll('div > span');
tags.forEach(tag => {
const nodes = tag.childNodes;
nodes.forEach(node => {
if(node.nodeType === node.TEXT_NODE) {
console.log(node.nodeValue);
}
});
});
<div><span>Text1</span></div>
<div><span>Text2</span></div>
<div><span>Text3</span></div>
<div><span><a href='#'>Link</a>Text4</span></div>
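The same text-node filter can be tried outside a browser. This is a minimal sketch, assuming plain objects that mimic the DOM's nodeType, nodeValue and childNodes properties; TEXT_NODE (3) mirrors the DOM constant Node.TEXT_NODE, and the helpers textNode, elementNode and ownText are hypothetical names used only for illustration. In a browser you would pass the real span elements instead.

```javascript
// Mirrors the DOM constant Node.TEXT_NODE (3) so the sketch runs without a browser.
const TEXT_NODE = 3;

// Hypothetical helpers that build DOM-like objects for illustration only.
const textNode = (value) => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] });
const elementNode = (...childNodes) => ({ nodeType: 1, nodeValue: null, childNodes });

// Concatenate the direct text-node children only; nested elements are skipped.
function ownText(el) {
  let text = "";
  for (const node of el.childNodes) {
    if (node.nodeType === TEXT_NODE) {
      text += node.nodeValue;
    }
  }
  return text;
}

// The four spans from the question; the last one starts with a link.
const spans = [
  elementNode(textNode("Text1")),
  elementNode(textNode("Text2")),
  elementNode(textNode("Text3")),
  elementNode(elementNode(textNode("Link")), textNode("Text4")),
];

console.log(spans.map(ownText).join(", ")); // Text1, Text2, Text3, Text4
```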

Related

How does Javascript filter out HTML tags while selecting words using regular expressions?

Note:
I'm not parsing HTML with regex;
here I only use it for plain text.
It's just that the replacement goes beyond the plain text and also affects the HTML tags.
Why does everyone say I should use the DOM instead of regular expressions?
The DOM obviously cannot select all words on a web page based on an array of words.
Before, I used document.createTreeWalker() to filter all the text nodes, but it was too complicated and caused more errors.
So I want to do it with a simple regex instead. Do you have a better way?
Wouldn't it work to just filter out all text inside "<>" with a very simple regex? Why make it so complicated?
I need to select the words on the page that appear in an array of words, and wrap each of them in a 'span' tag (keeping the original HTML tags).
The problem with my code is that it also replaces text inside the attribute values of the HTML tags.
I need a regular expression that skips HTML tags and selects the words.
I added a condition to the regular expression, (^<.*>), but it didn't work and broke my code.
How can I do this?
My code:
Error: in <div id="text">, the attribute value "text" should not be wrapped in a SPAN tag
<!DOCTYPE html>
<html>
<head>
<style>span{background:#ccc;}</style>
<script>
//wrap span tags for all words
function add_span(word_array, element_) {
for (let i = 0; i < word_array.length; i++) {
var reg_str = "([\\s.?,\"\';:!()\\[\\]{}<>\/])"; // + "^(<.*>)"
var reg = new RegExp(reg_str + "(" + word_array[i] + ")" + reg_str, 'g');
element_ = element_.replace(reg, '$1<span>$2</span>$3');
}
return element_;
}
window.onload = function(){
console.log(document.body.innerText);
// word array
var word_array = ['is', 'test', 'testis', 'istest', 'text']
var text_html = add_span(word_array, document.body.innerHTML);
document.body.innerHTML = text_html;
console.log(text_html);
}
</script>
</head>
<body>
<div id="text"><!--Error: The id attribute value here should not be wrapped in a SPAN tag-->
is test testis istest,
is[test]testis{istest}testis(istest)testis istest
</div>
</body></html>

I had fun with this one and learned a few things too. You could replace the traversal implementation with TreeWalker if you'd like. I added a nested div#text2 to demonstrate how it works with arbitrary tree depth. I tried to keep the same general approach you were using, but needed to make some modifications to the regex and add tree traversal. Hope this helps!
function traverse(tree) {
const queue = [tree];
while (queue.length) {
const node = queue.shift();
if (node.nodeType === Node.TEXT_NODE) {
const textContent = node.textContent.trim();
if (textContent) {
const textContentWithSpans = textContent
.replaceAll(/\b(is|test|testis|istest|text)\b/g, '<span>$&</span>');
const template = document.createElement('template');
template.innerHTML = textContentWithSpans;
const fragment = template.content;
node.parentNode.replaceChild(fragment, node);
}
}
for (let child of node.childNodes) {
queue.push(child);
}
}
}
traverse(document.getElementById('demo-wrapper'));
<div id="demo-wrapper">
<div id="text">
is test testis istest,
is[test]testis{istest}testis(istest)testis istest
<div id="text2">
foo bar test istest
</div>
</div>
</div>
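One caveat with the answer above: it feeds the text back through innerHTML, so a text node containing characters such as < or & (which textContent returns unescaped) would be re-parsed as markup, and the trim() call drops the whitespace around each text node. A sketch that avoids both, assuming you then build the nodes yourself (for example with document.createTextNode and a span whose textContent is set), splits a string into plain and matched segments. escapeRegExp and splitOnWords are hypothetical names, and escaping the words is an added assumption in case the word list ever contains regex metacharacters.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split `text` into segments; whole-word matches get { match: true }.
// Longer words are tried first, so at any position the longest listed word wins.
function splitOnWords(text, words) {
  const alternation = [...words]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const re = new RegExp(`\\b(?:${alternation})\\b`, "g");
  const segments = [];
  let last = 0;
  for (const m of text.matchAll(re)) {
    if (m.index > last) segments.push({ text: text.slice(last, m.index), match: false });
    segments.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) segments.push({ text: text.slice(last), match: false });
  return segments;
}

const words = ["is", "test", "testis", "istest", "text"];
console.log(splitOnWords("is[test]testis", words));
```

In a browser, each segment would become either a text node or a span, appended to a DocumentFragment that then replaces the original text node, so no string is ever parsed as HTML.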

jQuery .not() function not working within .parent() function [duplicate]

If I have html like this:
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
I'm trying to use .text() to retrieve just the string "This is some text", but if I were to say $('#listItem').text(), I get "This is some textFirst span textSecond span text".
Is there a way to get (and possibly remove, via something like .text("")) just the free text within a tag, and not the text within its child tags?
The HTML was not written by me, so this is what I have to work with. I know that it would be simple to just wrap the text in tags when writing the html, but again, the html is pre-written.

I liked this reusable implementation based on the clone() method found here to get only the text inside the parent element.
Code provided for easy reference:
$("#foo")
.clone() //clone the element
.children() //select all the children
.remove() //remove all the children
.end() //again go back to selected element
.text();

Simple answer:
$("#listItem").contents().filter(function(){
return this.nodeType == 3;
})[0].nodeValue = "The text you want to replace with"

This seems like a case of overusing jQuery to me. The following will grab the text, ignoring the other nodes:
document.getElementById("listItem").childNodes[0];
You'll need to trim that but it gets you what you want in one, easy line.
EDIT
The above will get the text node. To get the actual text, use this:
document.getElementById("listItem").childNodes[0].nodeValue;
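childNodes[0] is only the text you want when that text really comes first; if the markup starts with whitespace before an element, or with the element itself, you get something else. A slightly more defensive sketch, assuming DOM-like objects built by hypothetical helpers, picks the first direct text node that has non-whitespace content (firstOwnTextNode is an illustrative name, not a DOM API).

```javascript
const TEXT_NODE = 3; // mirrors the DOM constant Node.TEXT_NODE

// Hypothetical stand-ins for DOM nodes (only nodeType, nodeValue and childNodes).
const textNode = (value) => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] });
const elementNode = (...childNodes) => ({ nodeType: 1, nodeValue: null, childNodes });

// Return the first direct child text node with non-whitespace content, or null.
function firstOwnTextNode(element) {
  for (const node of element.childNodes) {
    if (node.nodeType === TEXT_NODE && node.nodeValue.trim() !== "") {
      return node;
    }
  }
  return null;
}

// Markup indented so childNodes[0] is a whitespace-only text node,
// and the real text comes after the first child element.
const li = elementNode(
  textNode("\n  "),
  elementNode(textNode("First span text")),
  textNode("\n  This is some text\n")
);

const found = firstOwnTextNode(li);
console.log(JSON.stringify(li.childNodes[0].nodeValue)); // "\n  "
console.log(found && found.nodeValue.trim());             // This is some text
```

Assigning to nodeValue on the returned node replaces just that text, which covers the "possibly remove" part of the question.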

Easier and quicker:
$("#listItem").contents().get(0).nodeValue

Similar to the accepted answer, but without cloning:
$("#foo").contents().not($("#foo").children()).text();
And here is a jQuery plugin for this purpose:
$.fn.immediateText = function() {
return this.contents().not(this.children()).text();
};
Here is how to use this plugin:
$("#foo").immediateText(); // get the text without children

isn't the code:
var text = $('#listItem').clone().children().remove().end().text();
just becoming jQuery for jQuery's sake? When simple operations involve that many chained commands & that much (unnecessary) processing, perhaps it is time to write a jQuery extension:
(function ($) {
function elementText(el, separator) {
var textContents = [];
for(var chld = el.firstChild; chld; chld = chld.nextSibling) {
if (chld.nodeType == 3) {
textContents.push(chld.nodeValue);
}
}
return textContents.join(separator);
}
$.fn.textNotChild = function(elementSeparator, nodeSeparator) {
if (arguments.length<2){nodeSeparator="";}
if (arguments.length<1){elementSeparator="";}
return $.map(this, function(el){
return elementText(el,nodeSeparator);
}).join(elementSeparator);
}
} (jQuery));
to call:
var text = $('#listItem').textNotChild();
the arguments are in case a different scenario is encountered, such as
<li>some text<a>more text</a>again more</li>
<li>second text<a>more text</a>again more</li>
var text = $("li").textNotChild(".....","<break>");
text will have value:
some text<break>again more.....second text<break>again more
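The same logic works without jQuery. This sketch runs over DOM-like objects built by hypothetical helpers (in a browser you would pass something like document.querySelectorAll('li') instead), and it reproduces the two-separator output described above; the default separators mirror the plugin's behaviour when arguments are omitted.

```javascript
const TEXT_NODE = 3; // mirrors the DOM constant Node.TEXT_NODE

// Hypothetical stand-ins for DOM nodes (only nodeType, nodeValue and childNodes).
const textNode = (value) => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] });
const elementNode = (...childNodes) => ({ nodeType: 1, nodeValue: null, childNodes });

// Join each element's own text nodes with nodeSeparator,
// then join the per-element results with elementSeparator.
function textNotChild(elements, elementSeparator = "", nodeSeparator = "") {
  return Array.from(elements, (element) =>
    Array.from(element.childNodes)
      .filter((node) => node.nodeType === TEXT_NODE)
      .map((node) => node.nodeValue)
      .join(nodeSeparator)
  ).join(elementSeparator);
}

// <li>some text<a>more text</a>again more</li>
// <li>second text<a>more text</a>again more</li>
const items = [
  elementNode(textNode("some text"), elementNode(textNode("more text")), textNode("again more")),
  elementNode(textNode("second text"), elementNode(textNode("more text")), textNode("again more")),
];

console.log(textNotChild(items, ".....", "<break>"));
// some text<break>again more.....second text<break>again more
```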

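Try this:
$('#listItem').not($('#listItem').children()).text()
(Note: .not() filters the matched set, which here contains only #listItem itself, so this still returns the children's text; calling .contents() first, as in $('#listItem').contents().not($('#listItem').children()).text(), excludes it.)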

It'll need to be something tailored to the needs, which are dependent on the structure you're presented with. For the example you've provided, this works:
$(document).ready(function(){
var $tmp = $('#listItem').children().remove();
$('#listItem').text('').append($tmp);
});
Demo: http://jquery.nodnod.net/cases/2385/run
But it's fairly dependent on the markup being similar to what you posted.

$($('#listItem').contents()[0]).text()
Short variant of Stuart's answer.
or with get()
$($('#listItem').contents().get(0)).text()

I presume this would be a fine solution also - if you want to get contents of all text nodes that are direct children of selected element.
$(selector).contents().filter(function(){ return this.nodeType == 3; }).text();
Note: jQuery documentation uses similar code to explain contents function: https://api.jquery.com/contents/
P.S. There's also a slightly uglier way to do that, but it shows in more depth how things work and allows a custom separator between text nodes (maybe you want a line break there):
$(selector).contents().filter(function(){ return this.nodeType == 3; }).map(function() { return this.nodeValue; }).toArray().join("");

jQuery.fn.ownText = function () {
return $(this).contents().filter(function () {
return this.nodeType === Node.TEXT_NODE;
}).text();
};

If the position index of the text node is fixed among its siblings, you can use
$('parentselector').contents().eq(index).text()

This is an old question but the top answer is very inefficient. Here's a better solution:
$.fn.myText = function() {
var str = '';
this.contents().each(function() {
if (this.nodeType == 3) {
str += this.textContent || this.innerText || '';
}
});
return str;
};
And just do this:
$("#foo").myText();

I propose using createTreeWalker to find all the text nodes that are direct children of the element (this function can be used to extend jQuery):
function textNodesOnlyUnder(el) {
var resultSet = [];
var n = null;
var treeWalker = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, function (node) {
// accept only non-empty text nodes whose parent is el itself (comparing ids would misfire when el has no id)
if (node.parentNode === el && node.textContent.trim().length != 0) {
return NodeFilter.FILTER_ACCEPT;
}
return NodeFilter.FILTER_SKIP;
});
while (n = treeWalker.nextNode()) {
resultSet.push(n);
}
return resultSet;
}
window.onload = function() {
var ele = document.getElementById('listItem');
var textNodesOnly = textNodesOnlyUnder(ele);
var resultingText = textNodesOnly.map(function(val, index, arr) {
return 'Text element N. ' + index + ' --> ' + val.textContent.trim();
}).join('\n');
document.getElementById('txtArea').value = resultingText;
}
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
<textarea id="txtArea" style="width: 400px;height: 200px;"></textarea>
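To see why the parent check matters, here is a sketch of a depth-first walk that records every text node together with the element that directly contains it, over DOM-like objects (hypothetical helpers; in a browser the TreeWalker does the walking). Checking the parent by identity identifies direct children even when the element has no id, whereas comparing ids would also accept nested text nodes whose parent lacks an id. Keeping only the nodes whose parent is the root gives the "own" text; keeping all of them gives roughly what .text() or textContent returns.

```javascript
const TEXT_NODE = 3; // mirrors the DOM constant Node.TEXT_NODE

// Hypothetical stand-ins for DOM nodes (only nodeType, nodeValue and childNodes).
const textNode = (value) => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] });
const elementNode = (...childNodes) => ({ nodeType: 1, nodeValue: null, childNodes });

// Depth-first, document-order walk yielding [textNode, parentElement] pairs.
function* textNodesWithParent(root) {
  for (const node of root.childNodes) {
    if (node.nodeType === TEXT_NODE) {
      yield [node, root];
    } else {
      yield* textNodesWithParent(node);
    }
  }
}

// <li id="listItem"> This is some text <span>First span text</span> <span>Second span text</span> </li>
const listItem = elementNode(
  textNode("\n  This is some text\n  "),
  elementNode(textNode("First span text")),
  textNode("\n  "),
  elementNode(textNode("Second span text")),
  textNode("\n")
);

const all = [...textNodesWithParent(listItem)];
const own = all
  .filter(([node, parent]) => parent === listItem && node.nodeValue.trim() !== "")
  .map(([node]) => node.nodeValue.trim());
const everything = all.map(([node]) => node.nodeValue.trim()).filter(Boolean);

console.log(own.join(" | "));        // This is some text
console.log(everything.join(" | ")); // This is some text | First span text | Second span text
```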

I wouldn't bother with jQuery for this, especially not the solutions that make unnecessary clones of the elements. A simple loop grabbing text nodes is all you need. In modern JavaScript (as of this writing — "modern" is a moving target!) and trimming whitespace from the beginning and end of the result:
const { childNodes } = document.getElementById("listItem");
let text = "";
for (const node of childNodes) {
if (node.nodeType === Node.TEXT_NODE) {
text += node.nodeValue;
}
}
text = text.trim();
Live Example:
const { childNodes } = document.getElementById("listItem");
let text = "";
for (const node of childNodes) {
if (node.nodeType === Node.TEXT_NODE) {
text += node.nodeValue;
}
}
console.log(text);
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
Some people would use reduce for this. I'm not a fan; I think a simple loop is clearer, but this usage does update the accumulator on each iteration, so it's not actually abusing reduce:
const { childNodes } = document.getElementById("listItem");
const text = [...childNodes].reduce((text, node) =>
node.nodeType === Node.TEXT_NODE ? text + node.nodeValue : text
, "").trim();
console.log(text);
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
Or without creating a temporary array:
const { childNodes } = document.getElementById("listItem");
const text = Array.prototype.reduce.call(childNodes, (text, node) =>
node.nodeType === Node.TEXT_NODE ? text + node.nodeValue : text
, "").trim();
console.log(text);
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>

Using plain JavaScript in IE 9+ compatible syntax in just a few lines (var rather than const/let, which IE 9 does not support):
var childNodes = document.querySelector('#listItem').childNodes;
if (childNodes.length > 0) {
childNodesLoop:
for (var i = 0; i < childNodes.length; i++) {
//only target text nodes (nodeType of 3)
if (childNodes[i].nodeType === 3) {
//do not target any whitespace in the HTML
if (childNodes[i].nodeValue.trim().length > 0) {
childNodes[i].nodeValue = 'Replacement text';
//optimized to break out of the loop once primary text node found
break childNodesLoop;
}
}
}
}

Getting all the text in an element, without the text in any child elements, still seems non-trivial in 2022.
No jQuery needed though.
To get all raw textNode(s) content:
const getElementTextWithoutChildElements = (el) =>
Array.from(el.childNodes) // iterator to array
.filter(node => node.nodeType === 3) // only text nodes
.map(node => node.textContent) // get text
.join('') // stick together
;
Or similar, using reduce:
const getElementTextWithoutChildElements = (el) =>
[].reduce.call(
el.childNodes,
(a, b) => a + (b.nodeType === 3 ? b.textContent : ''),
''
);
Should work with this:
<div>
you get this
<b>not this</b>
you get this too
</div>
will return:
you get this
you get this too
Whitespace between elements can be tricky; I suggest using .trim() and/or normalizing all whitespace.
For debugging and logging, to quickly identify elements, I find this is usually enough:
getElementTextWithoutChildElements(...).replace(/\s+/g, ' ').trim();
// 'you get this you get this too'
Though you might want to handle whitespace differently, for example per node inside the map() or reduce() callback itself.
e.g. whitespace handling per node:
const getElementTextWithoutChildElements_2 = (el) =>
Array.from(el.childNodes)
.filter(node => node.nodeType === 3)
.map(node => node.textContent.trim()) // added .trim()
.join(',') // added ','
;
Quick tests for things above:
document.body.innerHTML = `
you get this
<b>not this</b>
you get this too
`;
// '\n you get this\n <b>not this</b>\n you get this too\n'
getElementTextWithoutChildElements(document.body);
// '\n you get this\n \n you get this too\n'
getElementTextWithoutChildElements(document.body).replace(/\s+/g, ' ').trim();
// 'you get this you get this too'
getElementTextWithoutChildElements_2(document.body);
// 'you get this,you get this too'
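The per-node version above (getElementTextWithoutChildElements_2) still emits an empty entry for every whitespace-only text node between child elements, for example 'a,,b' when two elements are separated only by a space. A sketch that collapses internal whitespace and drops the empty parts, again assuming DOM-like objects built by hypothetical helpers; ownTextParts is an illustrative name.

```javascript
const TEXT_NODE = 3; // mirrors the DOM constant Node.TEXT_NODE

// Hypothetical stand-ins for DOM nodes (only nodeType, nodeValue and childNodes).
const textNode = (value) => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] });
const elementNode = (...childNodes) => ({ nodeType: 1, nodeValue: null, childNodes });

// Own text nodes only: collapse whitespace runs to one space, trim,
// and drop nodes that end up empty.
const ownTextParts = (element) =>
  Array.from(element.childNodes)
    .filter((node) => node.nodeType === TEXT_NODE)
    .map((node) => node.nodeValue.replace(/\s+/g, " ").trim())
    .filter((part) => part !== "");

// <div>a<b>x</b> <i>y</i>b</div>
const div = elementNode(
  textNode("a"),
  elementNode(textNode("x")),
  textNode(" "),
  elementNode(textNode("y")),
  textNode("b")
);

// The example from above: you get this <b>not this</b> you get this too
const body = elementNode(
  textNode("\n  you get this\n  "),
  elementNode(textNode("not this")),
  textNode("\n  you get this too\n")
);

console.log(ownTextParts(div).join(","));  // a,b
console.log(ownTextParts(body).join(",")); // you get this,you get this too
```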

This is a good way for me
var text = $('#listItem').clone().children().remove().end().text();

I came up with a specific solution that should be much more efficient than cloning the element and modifying the clone. It only works under the following two conditions:
You only need to read the text (not change it)
The text you want to extract comes before the child elements
With that said, here is the code:
// 'element' is a jQuery element
function getText(element) {
var text = element.text();
var childLength = element.children().text().length;
return text.slice(0, text.length - childLength);
}
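The idea behind this is string arithmetic: .text() returns the element's leading own text followed by the children's text, so dropping children().text().length characters from the end leaves the prefix. A sketch with the strings from the question (no jQuery; ownPrefix is a hypothetical name) also shows how strict the second condition is: indented markup adds whitespace text nodes between and after the spans, and those count as own text that comes after the children, so the cut lands in the wrong place.

```javascript
// Own text = full text minus the children's text, assuming all own text comes first.
function ownPrefix(fullText, childText) {
  return fullText.slice(0, fullText.length - childText.length);
}

const childText = "First span text" + "Second span text";

// No whitespace between the nodes: the subtraction works.
const compact = "This is some text" + childText;
console.log(JSON.stringify(ownPrefix(compact, childText))); // "This is some text"

// Indented markup: whitespace text nodes after the first span shift the cut.
const indented = "\n  This is some text\n  " + "First span text" + "\n  " + "Second span text" + "\n";
console.log(JSON.stringify(ownPrefix(indented, childText))); // "\n  This is some text\n  Firs"
```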

Live demo
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
<input id="input" style="width: 300px; margin-top: 10px;">
<script type="text/javascript">
$("#input").val($("#listItem").clone().find("span").remove().end().text().trim());
//use .trim() to remove leading and trailing whitespace
</script>

For beginners:
I preferred DUzun's answer because it's simple to understand and more efficient than the accepted answer. But it only partially worked for me, because you can't call it directly on a native element taken from a class selector like this
$(".landing-center .articlelanding_detail").get(0).immediateText() //gives .immediateText is not a function error
or this
$(".landing-center .articlelanding_detail")[0].immediateText() //gives .immediateText is not a function error
because once you extract the native Element out of the $() result with [index] or .get(index), you lose the jQuery object's methods and chainability, as mentioned here. Also, most of the solutions are written only in terms of ids, which is not very elegant when you need them repeatedly for elements matched by a class selector.
So, I wrote jQuery plugin:
$.fn.mainText = function(x=0) {
return $.trim(this.eq(x).contents().not(this.eq(x).children()).text().replace(/[\t\n]+/g,' '));
};
This returns the text of the element, excluding its child elements, whether you select it by id or by class. It also replaces runs of \t and \n with a single space and trims the result to get a clean string.
Use it like this:
Case 1
$("#example").mainText(); // get the text of element with example id
Case 2
$(".example").mainText(); // get the text of first element with example class
Case 3
$(".example").mainText(1); // get the text of second element with example class and so on..

Alternative version of the answer without jQuery:
[...document.getElementById("listItem").childNodes].find(c => c.nodeType === Node.TEXT_NODE).nodeValue

Just like the question, I was trying to extract text in order to do some regex substitution on it, but I ran into problems where my inner elements (i.e. <i>, <div>, <span>, etc.) were also getting removed.
The following code seems to work well and solved all my problems.
It uses some of the answers provided here, but in particular it only substitutes the text when the node's nodeType === 3.
$(el).contents().each(function() {
console.log(" > Content: %s [%s]", this, (this.nodeType === 3));
if (this.nodeType === 3) {
var text = this.textContent;
console.log(" > Old : '%s'", text);
var regex = new RegExp("\\[\\[" + rule + "\\.val\\]\\]", "g"); // rule, value and actual come from the surrounding code
text = text.replace(regex, value);
regex = new RegExp("\\[\\[" + rule + "\\.act\\]\\]", "g");
text = text.replace(regex, actual);
console.log(" > New : '%s'", text);
this.textContent = text;
}
});
The above loops through all the child nodes of the given el (which was simply obtained with $("div.my-class[name='some-name']")). Child elements are skipped; only the text nodes (where this.nodeType === 3) get the regex substitution. Only direct children are visited, so text inside nested elements is left untouched.
The this.textContent = text line writes the substituted text back; in my case, I was looking for tokens like [[min.val]], [[max.val]], etc.
This short code excerpt will help anyone trying to do what the question was asking ... and a bit more.
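The substitution step itself can be isolated as a pure string function. A sketch, assuming rule, value and actual mean what they do in the snippet above (a rule name such as min and the two replacement values); substituteTokens and escapeRegExp are hypothetical names. Escaping rule guards against names containing regex metacharacters, and passing a function as the replacement stops $ sequences such as $& in the values from being interpreted as replacement patterns.

```javascript
// Escape regex metacharacters so `rule` is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Replace [[rule.val]] with `value` and [[rule.act]] with `actual` in one string.
function substituteTokens(text, rule, value, actual) {
  const name = escapeRegExp(rule);
  return text
    .replace(new RegExp(`\\[\\[${name}\\.val\\]\\]`, "g"), () => String(value))
    .replace(new RegExp(`\\[\\[${name}\\.act\\]\\]`, "g"), () => String(actual));
}

console.log(substituteTokens("Range [[min.val]] to [[max.val]], now [[min.act]]", "min", "10", 7));
// Range 10 to [[max.val]], now 7
```

Inside the .each() loop this would become this.textContent = substituteTokens(this.textContent, rule, value, actual);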

Not sure how flexible or how many cases you need it to cover, but for your example, if the text always comes before the first HTML tags – why not just split the inner html at the first tag and take the former:
$('#listItem').html().split('<span')[0];
and if you need it wider maybe just
$('#listItem').html().split('<')[0];
and if you need the text between two markers, like after one thing but before another, you can do something like this (untested), with if statements so that either marker can be left out without causing errors:
var startMarker = '';// put any starting marker here
var endMarker = '<';// put the end marker here
var myText = String( $('#listItem').html() );
// if the start marker is found, take the string after it (an empty marker is skipped, since split('') would split into single characters)
if ( startMarker.length > 0 && myText.indexOf(startMarker) >= 0 ) myText = myText.split(startMarker)[1];
// if the end marker is found, take the string before it
if ( endMarker.length > 0 && myText.indexOf(endMarker) >= 0 ) myText = myText.split(endMarker)[0];
console.log(myText); // output the text between the markers; a marker that is empty or not found is simply ignored
I generally make utility functions for useful things like this, make them error free, and then rely on them frequently once solid, rather than always rewriting this type of string manipulation and risking null references etc. That way, you can re-use the function in lots of projects and never have to waste time on it again debugging why a string reference has an undefined reference error. Might not be the shortest 1 line code ever, but after you have the utility function, it is one line from then on. Note most of the code is just handling parameters being there or not to avoid errors :)
For example:
/**
* Get the text between two string markers.
**/
function textBetween(__string,__startMark,__endMark){
var hasText = typeof __string !== 'undefined' && __string.length > 0;
if(!hasText) return __string;
var myText = String( __string );
var hasStartMarker = typeof __startMark !== 'undefined' && __startMark.length > 0 && __string.indexOf(__startMark)>=0;
var hasEndMarker = typeof __endMark !== 'undefined' && __endMark.length > 0 && __string.indexOf(__endMark) >= 0;
if( hasStartMarker ) myText = myText.split(__startMark)[1];
if( hasEndMarker ) myText = myText.split(__endMark)[0];
return myText;
}
// now with 1 line from now on, and no jquery needed really, but to use your example:
var textWithNoHTML = textBetween( $('#listItem').html(), '', '<'); // should return text before first child HTML tag if the text is on page (use document ready etc)

Use an extra condition to check if innerHTML and innerText are the same. Only in those cases, replace the text.
$(function() {
$('body *').each(function () {
console.log($(this).html());
console.log($(this).text());
if($(this).text() === "Search" && $(this).html()===$(this).text()) {
$(this).html("Find");
}
})
})
http://jsfiddle.net/7RSGh/

To be able to trim the result, use DotNetWala's answer like so:
$("#foo")
.clone() //clone the element
.children() //select all the children
.remove() //remove all the children
.end() //again go back to selected element
.text()
.trim();
I found out that the shorter version, document.getElementById("listItem").childNodes[0], won't work with trim() directly, because it returns a Text node rather than a string; you need its .nodeValue first.

Just put it in a <p> or <font> and grab that with $('#listItem font').text()
First thing that came to mind
<li id="listItem">
<font>This is some text</font>
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>

You can try this
alert(document.getElementById('listItem').firstChild.data)

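I am not a jQuery expert, but how about,
$('#listItem').children().first().text()
(Note: this returns the text of the first child element, "First span text", rather than the free text of the li.)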

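This is untested, but I think you may be able to try something like this:
$('#listItem').not('span').text();
http://api.jquery.com/not/
(Note: .not('span') filters the matched set, which contains only #listItem itself, so this still returns the span text as well.)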

How to traverse just the first level of <LI> elements `with no class tag` [duplicate]

If I have html like this:
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
I'm trying to use .text() to retrieve just the string "This is some text", but if I were to say $('#list-item').text(), I get "This is some textFirst span textSecond span text".
Is there a way to get (and possibly remove, via something like .text("")) just the free text within a tag, and not the text within its child tags?
The HTML was not written by me, so this is what I have to work with. I know that it would be simple to just wrap the text in tags when writing the html, but again, the html is pre-written.

I liked this reusable implementation based on the clone() method found here to get only the text inside the parent element.
Code provided for easy reference:
$("#foo")
.clone() //clone the element
.children() //select all the children
.remove() //remove all the children
.end() //again go back to selected element
.text();

Simple answer:
$("#listItem").contents().filter(function(){
return this.nodeType == 3;
})[0].nodeValue = "The text you want to replace with"

This seems like a case of overusing jquery to me. The following will grab the text ignoring the other nodes:
document.getElementById("listItem").childNodes[0];
You'll need to trim that but it gets you what you want in one, easy line.
EDIT
The above will get the text node. To get the actual text, use this:
document.getElementById("listItem").childNodes[0].nodeValue;

Easier and quicker:
$("#listItem").contents().get(0).nodeValue

Similar to the accepted answer, but without cloning:
$("#foo").contents().not($("#foo").children()).text();
And here is a jQuery plugin for this purpose:
$.fn.immediateText = function() {
return this.contents().not(this.children()).text();
};
Here is how to use this plugin:
$("#foo").immediateText(); // get the text without children

isn't the code:
var text = $('#listItem').clone().children().remove().end().text();
just becoming jQuery for jQuery's sake? When simple operations involve that many chained commands & that much (unnecessary) processing, perhaps it is time to write a jQuery extension:
(function ($) {
function elementText(el, separator) {
var textContents = [];
for(var chld = el.firstChild; chld; chld = chld.nextSibling) {
if (chld.nodeType == 3) {
textContents.push(chld.nodeValue);
}
}
return textContents.join(separator);
}
$.fn.textNotChild = function(elementSeparator, nodeSeparator) {
if (arguments.length<2){nodeSeparator="";}
if (arguments.length<1){elementSeparator="";}
return $.map(this, function(el){
return elementText(el,nodeSeparator);
}).join(elementSeparator);
}
} (jQuery));
to call:
var text = $('#listItem').textNotChild();
the arguments are in case a different scenario is encountered, such as
<li>some text<a>more text</a>again more</li>
<li>second text<a>more text</a>again more</li>
var text = $("li").textNotChild(".....","<break>");
text will have value:
some text<break>again more.....second text<break>again more

Try this:
$('#listItem').not($('#listItem').children()).text()

It'll need to be something tailored to the needs, which are dependent on the structure you're presented with. For the example you've provided, this works:
$(document).ready(function(){
var $tmp = $('#listItem').children().remove();
$('#listItem').text('').append($tmp);
});
Demo: http://jquery.nodnod.net/cases/2385/run
But it's fairly dependent on the markup being similar to what you posted.

$($('#listItem').contents()[0]).text()
Short variant of Stuart answer.
or with get()
$($('#listItem').contents().get(0)).text()

I presume this would be a fine solution also - if you want to get contents of all text nodes that are direct children of selected element.
$(selector).contents().filter(function(){ return this.nodeType == 3; }).text();
Note: jQuery documentation uses similar code to explain contents function: https://api.jquery.com/contents/
P.S. There's also a bit uglier way to do that, but this shows more in depth how things work, and allows for custom separator between text nodes (maybe you want a line break there)
$(selector).contents().filter(function(){ return this.nodeType == 3; }).map(function() { return this.nodeValue; }).toArray().join("");

jQuery.fn.ownText = function () {
return $(this).contents().filter(function () {
return this.nodeType === Node.TEXT_NODE;
}).text();
};

If the position index of the text node is fixed among its siblings, you can use
$('parentselector').contents().eq(index).text()

This is an old question but the top answer is very inefficient. Here's a better solution:
$.fn.myText = function() {
var str = '';
this.contents().each(function() {
if (this.nodeType == 3) {
str += this.textContent || this.innerText || '';
}
});
return str;
};
And just do this:
$("#foo").myText();

I propose to use the createTreeWalker to find all texts elements not attached to html elements (this function can be used to extend jQuery):
function textNodesOnlyUnder(el) {
var resultSet = [];
var n = null;
var treeWalker = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, function (node) {
if (node.parentNode.id == el.id && node.textContent.trim().length != 0) {
return NodeFilter.FILTER_ACCEPT;
}
return NodeFilter.FILTER_SKIP;
}, false);
while (n = treeWalker.nextNode()) {
resultSet.push(n);
}
return resultSet;
}
window.onload = function() {
var ele = document.getElementById('listItem');
var textNodesOnly = textNodesOnlyUnder(ele);
var resultingText = textNodesOnly.map(function(val, index, arr) {
return 'Text element N. ' + index + ' --> ' + val.textContent.trim();
}).join('\n');
document.getElementById('txtArea').value = resultingText;
}
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
<textarea id="txtArea" style="width: 400px;height: 200px;"></textarea>

I wouldn't bother with jQuery for this, especially not the solutions that make unnecessary clones of the elements. A simple loop grabbing text nodes is all you need. In modern JavaScript (as of this writing — "modern" is a moving target!) and trimming whitespace from the beginning and end of the result:
const { childNodes } = document.getElementById("listItem");
let text = "";
for (const node of childNodes) {
if (node.nodeType === Node.TEXT_NODE) {
text += node.nodeValue;
}
}
text = text.trim();
Live Example:
const { childNodes } = document.getElementById("listItem");
let text = "";
for (const node of childNodes) {
if (node.nodeType === Node.TEXT_NODE) {
text += node.nodeValue;
}
}
console.log(text);
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
Some people would use reduce for this. I'm not a fan, I think a simple loop is clearer, but this usage does update the accumulator on each iteration, so it's not actually abusing reduce:
const { childNodes } = document.getElementById("listItem");
const text = [...childNodes].reduce((text, node) =>
node.nodeType === Node.TEXT_NODE ? text + node.nodeValue : text
, "").trim();
const { childNodes } = document.getElementById("listItem");
const text = [...childNodes].reduce((text, node) =>
node.nodeType === Node.TEXT_NODE ? text + node.nodeValue : text
, "").trim();
console.log(text);
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
Or without creating a temporary array:
const { childNodes } = document.getElementById("listItem");
const text = Array.prototype.reduce.call(childNodes, (text, node) =>
node.nodeType === Node.TEXT_NODE ? text + node.nodeValue : text
, "").trim();
const { childNodes } = document.getElementById("listItem");
const text = Array.prototype.reduce.call(childNodes, (text, node) =>
node.nodeType === Node.TEXT_NODE ? text + node.nodeValue : text
, "").trim();
console.log(text);
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>

Using plain JavaScript in IE 9+ compatible syntax in just a few lines:
const childNodes = document.querySelector('#listItem').childNodes;
if (childNodes.length > 0) {
childNodesLoop:
for (let i = 0; i < childNodes.length; i++) {
//only target text nodes (nodeType of 3)
if (childNodes[i].nodeType === 3) {
//do not target any whitespace in the HTML
if (childNodes[i].nodeValue.trim().length > 0) {
childNodes[i].nodeValue = 'Replacement text';
//optimized to break out of the loop once primary text node found
break childNodesLoop;
}
}
}
}

Get all text in an element without text in any child elements still seems non trivial to do in 2022.
No jQuery needed though.
To get all raw textNode(s) content:
const getElementTextWithoutChildElements = (el) =>
Array.from(el.childNodes) // iterator to array
.filter(node => node.nodeType === 3) // only text nodes
.map(node => node.textContent) // get text
.join('') // stick together
;
Or similar, using reduce:
const getElementTextWithoutChildElements = (el) =>
[].reduce.call(
el.childNodes,
(a, b) => a + (b.nodeType === 3 ? b.textContent : ''),
''
);
Should work with this:
<div>
you get this
<b>not this</b>
you get this too
</div>
will return:
you get this
you get this too
Whitespace between elements can be tricky; I suggest using .trim() and/or normalizing all whitespace, e.g.
For debugging and logging to quickly identify elements I find this is usually enough:
getElementTextWithoutChildElements(...).replace(/\s+/g, ' ').trim();
// 'you get this you get this too'
Though you might want to tweak whitespace differently, perhaps within the reduce() function itself to handle whitespace per node.
e.g. whitespace handling per node:
const getElementTextWithoutChildElements_2 = (el) =>
Array.from(el.childNodes)
.filter(node => node.nodeType === 3)
.map(node => node.textContent.trim()) // added .trim()
.join(',') // added ','
;
Quick tests for things above:
document.body.innerHTML = `
you get this
<b>not this</b>
you get this too
`;
// '\n you get this\n <b>not this</b>\n you get this too\n'
getElementTextWithoutChildElements(document.body);
// '\n you get this\n \n you get this too\n'
getElementTextWithoutChildElements(document.body).replace(/\s+/g, ' ').trim();
// 'you get this you get this too'
getElementTextWithoutChildElements_2(document.body);
// 'you get this,you get this too'

This is a good way for me
var text = $('#listItem').clone().children().remove().end().text();

I came up with a specific solution that should be much more efficient than cloning the element and modifying the clone, as the currently accepted solution does. It only works under the following two conditions:
You are getting only the text
The text you want to extract is before the child elements
With that said, here is the code:
// 'element' is a jQuery element
function getText(element) {
var text = element.text();
var childLength = element.children().text().length;
return text.slice(0, text.length - childLength);
}
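To see why the second condition matters, here is the same slice arithmetic on plain strings. This is a sketch: `ownTextBySlice` is a hypothetical name, `fullText` stands in for `element.text()` and `childText` for `element.children().text()`, and whitespace is left out to keep the lengths easy to follow.

```javascript
// Hypothetical stand-in for getText(): fullText ~ element.text(),
// childText ~ element.children().text().
function ownTextBySlice(fullText, childText) {
  // Keep everything except the last childText.length characters.
  return fullText.slice(0, fullText.length - childText.length);
}

const childText = 'First span text' + 'Second span text';

// Own text comes before the children: the slice returns it.
const textFirst = ownTextBySlice('This is some text' + childText, childText);
console.log(textFirst); // 'This is some text'

// Own text comes after the children: the slice returns the wrong characters.
const textLast = ownTextBySlice(childText + 'trailing text', childText);
console.log(textLast); // 'First span te'
```

The length arithmetic only strips the right characters when all of the child text sits at the end of the element's text.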

Live demo
<li id="listItem">
This is some text
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>
<input id="input" style="width: 300px; margin-top: 10px;">
<script type="text/javascript">
$("#input").val($("#listItem").clone().find("span").remove().end().text().trim());
//use .trim() to remove any white space
</script>

For beginners:
I preferred #DUzun's answer because it's simple to understand and more efficient than the accepted answer. But it only partially worked for me as you can't directly pass the element with a class selector like this
$(".landing-center .articlelanding_detail").get(0).immediateText() //gives .immediateText is not a function error
or this
$(".landing-center .articlelanding_detail")[0].immediateText() //gives .immediateText is not a function error
because once you extract the native Element from the $() result with [index] or .get(index), you lose jQuery object method chainability, as mentioned here. Also, most of the solutions only deal with ids and are not elegant to reuse for elements selected with a class selector.
So I wrote a jQuery plugin:
$.fn.mainText = function(x=0) {
return $.trim(this.eq(x).contents().not(this.eq(x).children()).text().replace(/[\t\n]+/g,' '));
};
This returns the text of the element, excluding child elements, regardless of whether an id or a class selector is used. It also replaces every run of \t or \n characters with a single space to get a clean string.
Use it like this:
Case 1
$("#example").mainText(); // get the text of element with example id
Case 2
$(".example").mainText(); // get the text of first element with example class
Case 3
$(".example").mainText(1); // get the text of second element with example class and so on..

Alternative version of the answer without jQuery:
[...document.getElementById("listItem").childNodes].find(c => c.nodeType === Node.TEXT_NODE).nodeValue
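The one-liner above returns the first text node even when that node is whitespace only (for example the newline before a child tag), and it throws a TypeError if there is no text node at all, because find() then returns undefined. A slightly more defensive sketch; `firstNonBlankText` is a hypothetical helper, and it only needs an object with an iterable `childNodes`, which a real DOM element provides:

```javascript
// Same value as Node.TEXT_NODE; defined here so the sketch runs outside a browser too.
const TEXT_NODE = 3;

// Return the trimmed value of the first direct text-node child that is not
// whitespace only, or null when there is none.
function firstNonBlankText(el) {
  for (const node of el.childNodes) {
    if (node.nodeType === TEXT_NODE && node.nodeValue.trim() !== '') {
      return node.nodeValue.trim();
    }
  }
  return null;
}

// In a browser: firstNonBlankText(document.getElementById("listItem"))
```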

Just like the question, I was trying to extract text in order to do some regex substitution on it, but I was running into problems where my inner elements (i.e. <i>, <div>, <span>, etc.) were also getting removed.
The following code seems to work well and solved all my problems.
It uses some of the answers provided here but in particular, will only substitute the text when the element is of nodeType === 3.
$(el).contents().each(function() {
console.log(" > Content: %s [%s]", this, (this.nodeType === 3));
if (this.nodeType === 3) {
var text = this.textContent;
console.log(" > Old : '%s'", text);
var regex = new RegExp("\\[\\[" + rule + "\\.val\\]\\]", "g"); // rule, value and actual are defined elsewhere in my code
text = text.replace(regex, value);
regex = new RegExp("\\[\\[" + rule + "\\.act\\]\\]", "g");
text = text.replace(regex, actual);
console.log(" > New : '%s'", text);
this.textContent = text;
}
});
What the above does is loop through all the child nodes of the given el (which was simply obtained with $("div.my-class[name='some-name']")). Inner elements, and any text inside them, are skipped; the regex substitution is applied only to the text nodes (as determined by if (this.nodeType === 3)).
The this.textContent = text portion simply replaces the substituted text, which in my case, I was looking for tokens like [[min.val]], [[max.val]], etc.
This short code excerpt will help anyone trying to do what the question was asking ... and a bit more.
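One caveat with building the RegExp from `rule`: if the rule name contains regex metacharacters (a dot, a bracket, and so on) they are interpreted as a pattern, and a `$` sequence in `value` or `actual` is treated as a replacement pattern by `String.prototype.replace`. A hedged sketch of the same substitution as a pure function, with the hypothetical helpers `escapeRegExp` and `replaceTokens`; the text-node loop above would call it as `this.textContent = replaceTokens(this.textContent, rule, value, actual)`:

```javascript
// Escape characters that have special meaning in a regular expression so the
// string is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace [[rule.val]] with value and [[rule.act]] with actual.
// Replacer functions are used so "$" in value/actual is inserted literally.
function replaceTokens(text, rule, value, actual) {
  const name = escapeRegExp(rule);
  return text
    .replace(new RegExp('\\[\\[' + name + '\\.val\\]\\]', 'g'), () => value)
    .replace(new RegExp('\\[\\[' + name + '\\.act\\]\\]', 'g'), () => actual);
}
```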

Not sure how flexible or how many cases you need it to cover, but for your example, if the text always comes before the first HTML tags – why not just split the inner html at the first tag and take the former:
$('#listItem').html().split('<span')[0];
and if you need it wider maybe just
$('#listItem').html().split('<')[0];
and if you need the text between two markers, like after one thing but before another, you can do something like (untested) and use if statements to make it flexible enough to have a start or end marker or both, while avoiding null ref errors:
var startMarker = '';// put any starting marker here
var endMarker = '<';// put the end marker here
var myText = String( $('#listItem').html() );
// if a start marker is given, take the string after it (split('') would split into characters)
if (startMarker) myText = myText.split(startMarker)[1];
// if an end marker is given, take the string before it
if (endMarker) myText = myText.split(endMarker)[0];
console.log(myText); // output text between the first occurrence of the markers, assuming both markers exist. If they don't this will throw an error, so some if statements to check params is probably in order...
I generally make utility functions for useful things like this, make them error free, and then rely on them frequently once solid, rather than always rewriting this type of string manipulation and risking null references etc. That way, you can re-use the function in lots of projects and never have to waste time on it again debugging why a string reference has an undefined reference error. Might not be the shortest 1 line code ever, but after you have the utility function, it is one line from then on. Note most of the code is just handling parameters being there or not to avoid errors :)
For example:
/**
* Get the text between two string markers.
**/
function textBetween(__string,__startMark,__endMark){
var hasText = typeof __string !== 'undefined' && __string.length > 0;
if(!hasText) return __string;
var myText = String( __string );
var hasStartMarker = typeof __startMark !== 'undefined' && __startMark.length > 0 && __string.indexOf(__startMark)>=0;
var hasEndMarker = typeof __endMark !== 'undefined' && __endMark.length > 0 && __string.indexOf(__endMark) >= 0;
if( hasStartMarker ) myText = myText.split(__startMark)[1];
if( hasEndMarker ) myText = myText.split(__endMark)[0];
return myText;
}
// now with 1 line from now on, and no jquery needed really, but to use your example:
var textWithNoHTML = textBetween( $('#listItem').html(), '', '<'); // should return text before first child HTML tag if the text is on page (use document ready etc)
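Whichever split variant is used, note that .html() returns serialized HTML source rather than rendered text, so characters in the text come back escaped: & as &amp;, < as &lt;, > as &gt; and a non-breaking space as &nbsp;. A minimal sketch of decoding just those four after splitting; `textBeforeFirstTag` and `decodeSerializedText` are hypothetical names, and in a browser, reading the text node's nodeValue directly avoids the issue entirely:

```javascript
// Text before the first tag in an HTML string (the split('<') approach above).
function textBeforeFirstTag(html) {
  return html.split('<')[0];
}

// Undo the escaping that HTML serialization applies to text content.
// &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
function decodeSerializedText(s) {
  return s
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&nbsp;/g, '\u00A0')
    .replace(/&amp;/g, '&');
}

const html = 'Tom &amp; Jerry <span id="firstSpan">First span text</span>';
console.log(textBeforeFirstTag(html));                       // 'Tom &amp; Jerry '
console.log(decodeSerializedText(textBeforeFirstTag(html))); // 'Tom & Jerry '
```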

Use an extra condition to check if innerHTML and innerText are the same. Only in those cases, replace the text.
$(function() {
$('body *').each(function () {
console.log($(this).html());
console.log($(this).text());
if($(this).text() === "Search" && $(this).html()===$(this).text()) {
$(this).html("Find");
}
})
})
http://jsfiddle.net/7RSGh/

To be able to trim the result, use DotNetWala's answer like so:
$("#foo")
.clone() //clone the element
.children() //select all the children
.remove() //remove all the children
.end() //again go back to selected element
.text()
.trim();
I found out that using the shorter version like document.getElementById("listItem").childNodes[0] won't work with jQuery's trim().

Just put it in a <p> or <font> and grab that with $('#listItem font').text()
First thing that came to mind
<li id="listItem">
<font>This is some text</font>
<span id="firstSpan">First span text</span>
<span id="secondSpan">Second span text</span>
</li>

You can try this
alert(document.getElementById('listItem').firstChild.data)

I am not a jquery expert, but how about,
$('#listItem').children().first().text()

This is untested, but I think you may be able to try something like this:
$('#listItem').not('span').text();
http://api.jquery.com/not/

Append multiple items in JavaScript

I have the following function and I am trying to figure out a better way to append multiple items using appendChild().
When the user clicks on Add, each item should look like this:
<li>
<input type="checkbox">
<label>Content typed by the user</label>
<input type="text">
<button class="edit">Edit</button>
<button class="delete">Delete</button>
</li>
and I have this function to add these elements:
function addNewItem(listElement, itemText) {
var listItem = document.createElement("li");
var listItemCheckbox = document.createElement("input");
var listItemLabel = document.createElement("label");
var editableInput = document.createElement("input");
var editButton = document.createElement("button");
var deleteButton = document.createElement("button");
// define types
listItemCheckbox.type = "checkbox";
editableInput.type = "text";
// define content and class for buttons
editButton.innerText = "Edit";
editButton.className = "edit";
deleteButton.innerText = "Delete";
deleteButton.className = "delete";
listItemLabel.innerText = itemText.value;
// appendChild() - append these items to the li
listElement.appendChild(listItem);
listItem.appendChild(listItemCheckbox);
listItem.appendChild(listItemLabel);
listItem.appendChild(editButton);
listItem.appendChild(deleteButton);
if (itemText.value.length > 0) {
itemText.value = "";
inputFocus(itemText);
}
}
But as you can see, I am calling appendChild() on listItem four times. Is it possible to add multiple items with a single appendChild() call?

You can do it with DocumentFragment.
var documentFragment = document.createDocumentFragment();
documentFragment.appendChild(listItem);
listItem.appendChild(listItemCheckbox);
listItem.appendChild(listItemLabel);
listItem.appendChild(editButton);
listItem.appendChild(deleteButton);
listElement.appendChild(documentFragment);
DocumentFragments allow developers to place child elements onto an
arbitrary node-like parent, allowing for node-like interactions
without a true root node. Doing so allows developers to produce
structure without doing so within the visible DOM

You can use the append method in JavaScript.
It is similar to jQuery's append method, but it is not supported in Internet Explorer (or, at the time of writing, Edge).
You can change this code
listElement.appendChild(listItem);
listItem.appendChild(listItemCheckbox);
listItem.appendChild(listItemLabel);
listItem.appendChild(editButton);
listItem.appendChild(deleteButton);
to
listElement.append(listItem);
listItem.append(listItemCheckbox, listItemLabel, editButton, deleteButton);

Personally, I don't see why you would do this.
But if you really need to replace all the appendChild() with one statement, you can assign the outerHTML of the created elements to the innerHTML of the li element.
You just need to replace the following:
listElement.appendChild(listItem);
listItem.appendChild(listItemCheckbox);
listItem.appendChild(listItemLabel);
listItem.appendChild(editButton);
listItem.appendChild(deleteButton);
With the following:
listItem.innerHTML+= listItemCheckbox.outerHTML + listItemLabel.outerHTML + editButton.outerHTML + deleteButton.outerHTML;
listElement.appendChild(listItem);
Explanation:
The outerHTML attribute of the element DOM interface gets the serialized HTML fragment describing the element including its descendants. So assigning the outerHTML of the created elements to the innerHTML of the li element is similar to appending them to it.

Merging the answers by #Atrahasis and #Slavik:
if (Node.prototype.appendChildren === undefined) {
Node.prototype.appendChildren = function() {
let children = [...arguments];
if (
children.length == 1 &&
Object.prototype.toString.call(children[0]) === "[object Array]"
) {
children = children[0];
}
const documentFragment = document.createDocumentFragment();
children.forEach(c => documentFragment.appendChild(c));
this.appendChild(documentFragment);
};
}
This accepts children as multiple arguments, or as a single array argument:
foo.appendChildren(bar1, bar2, bar3);
bar.appendChildren([bar1, bar2, bar3]);
Update – June 2020
Almost all current browsers now support append and the "spread operator".
The calls above can be re-written as:
foo.append(bar1, bar2, bar3);
bar.append(...[bar1, bar2, bar3]);

Let's try this:
let parentNode = document.createElement('div');
parentNode.append(...[
document.createElement('div'),
document.createElement('div'),
document.createElement('div'),
document.createElement('div'),
document.createElement('div')
]);
console.log(parentNode);

Need to append several children? Just make it plural with appendChildren!
First things first:
HTMLLIElement.prototype.appendChildren = function () {
for ( var i = 0 ; i < arguments.length ; i++ )
this.appendChild( arguments[ i ] );
};
Then, for any list element:
listElement.appendChildren( a, b, c, ... );
//check :
listElement.childNodes;//a, b, c, ...
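The same trick works with every element type that has the appendChild method, of course, such as HTMLDivElement: define appendChildren on that prototype (or on Element.prototype to cover all elements at once).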

You can use createContextualFragment; it returns a DocumentFragment created from a string.
It is ideal when you have to build and append more than one node to an existing element, because you can add them all together without the drawbacks of innerHTML.
https://developer.mozilla.org/en-US/docs/Web/API/Range/createContextualFragment
// ...
var listItem = document.createElement("li");
var documentFragment = document.createRange().createContextualFragment(`
<input type="checkbox">
<label>Content typed by the user</label>
<input type="text">
<button class="edit">Edit</button>
<button class="delete">Delete</button>
`)
listItem.appendChild(documentFragment)
// ...

You could just group the elements into a single innerHTML group like this:
let node = document.createElement('li');
node.innerHTML = '<input type="checkbox"><label>Content typed by the user</label> <input type="text"><button class="edit">Edit</button><button class="delete">Delete</button>';
document.getElementById('orderedList').appendChild(node);
then appendChild() is only used once.

It's possible to write your own function if you use the built-in arguments object:
function appendMultipleNodes(){
var args = [].slice.call(arguments);
for (var x = 1; x < args.length; x++){
args[0].appendChild(args[x])
}
return args[0]
}
Then you would call the function as such:
appendMultipleNodes(parent, nodeOne, nodeTwo, nodeThree)
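On ES2015+ the copy of arguments is unnecessary; rest parameters give the same function a clearer signature. A sketch under that assumption (same hypothetical name as above); it works with any parent that has an appendChild() method, real DOM nodes included:

```javascript
// Append each node to parent in order, then return parent (as the version above does).
function appendMultipleNodes(parent, ...nodes) {
  for (const node of nodes) {
    parent.appendChild(node);
  }
  return parent;
}

// Browser usage: appendMultipleNodes(parent, nodeOne, nodeTwo, nodeThree)
```

Where Element.prototype.append is available, parent.append(nodeOne, nodeTwo, nodeThree) does the same in one call.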

Why isn't anybody mentioning the element.append() function?!
You can simply use it to append multiple items at once, like so:
listItem.append(listItemCheckbox, listItemLabel, editButton, deleteButton);

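This is a quick fix (note that 'afterend' inserts the HTML after the selected element, as a sibling; use 'beforeend' to insert it as the element's last child):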
document.querySelector("#parentid .parenClass").insertAdjacentHTML('afterend', yourChildElement.outerHTML);

I really recommend this one:
[listItemCheckbox, listItemLabel, editButton, deleteButton]
.forEach((item) => listItem.appendChild(item));
Since appendChild() can't append multiple children at once, I think this one looks better.

Here's also a helper function that uses the fragment technique introduced in #Slavik's answer and combines it with the DOMParser API:
function createHtmlFromString(stringHtml) {
const parser = new DOMParser();
const htmlFragment = document.createDocumentFragment();
const children = parser.parseFromString(stringHtml, "text/html").body
.children;
htmlFragment.replaceChildren(...children);
return htmlFragment;
}
Now to append multiple children with this, you can make the code much more readable and brief, e.g.:
const htmlFragment = createHtmlFromString(`<div class="info">
<span></span>
<h2></h2>
<p></p>
<button></button>
</div>
<div class="cover">
<img />
</div>
`);
Here's also a working example of these used in action: example link.
Note 1: You could add text content in the tags above too, and it works, but if the data comes from a user (or is fetched from an API), you shouldn't trust it. Instead, first build the fragment using the function above and then do something like this:
htmlFragment.querySelector(".info > span").textContent = game.name;
Note 2: Don't use innerHTML to insert HTML built from untrusted data; it is insecure.

A handy way to dynamically add elements to a web page. This function takes three arguments, one of which (wrapper) is optional. If a wrapper element is given, the parent element, with its new children, is appended inside it. Useful when creating tables dynamically.
function append(parent, child, wrapper="") {
if (typeof child.forEach === 'function') { // arrays and NodeLists, including single-item ones
child.forEach(c => {
parent.appendChild(c);
});
} else {
parent.appendChild(child);
}
if (typeof wrapper == 'object') {
wrapper.appendChild(parent);
}
}

I would like to add that if you want to add some variability to your html, you can also add variables like this:
let node = document.createElement('div');
node.classList.add("some-class");
node.innerHTML = `<div class="list">
<div class="title">${myObject.title}</div>
<div class="subtitle">${myObject.subtitle}</div>
</div>`;
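If myObject.title or myObject.subtitle can contain user-supplied text, interpolating it straight into innerHTML lets that text inject markup. A minimal sketch of escaping the values first; `escapeHtml` and `listMarkup` are hypothetical helpers, and setting textContent on separately created elements is the alternative that avoids escaping altogether:

```javascript
// Escape characters that are significant in HTML text and quoted attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // first, so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// The template from the answer above, with both values escaped.
function listMarkup(obj) {
  return `<div class="list">
<div class="title">${escapeHtml(obj.title)}</div>
<div class="subtitle">${escapeHtml(obj.subtitle)}</div>
</div>`;
}

// Browser usage: node.innerHTML = listMarkup(myObject);
```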

Using jQuery to gather all text nodes from a wrapped set, separated by spaces

I'm looking for a way to gather all of the text in a jQuery wrapped set, but I need to create spaces between sibling nodes that have no text nodes between them.
For example, consider this HTML:
<div>
<ul>
<li>List item #1.</li><li>List item #2.</li><li>List item #3.</li>
</ul>
</div>
If I simply use jQuery's text() method to gather the text content of the <div>, like such:
var $div = $('div'), text = $div.text().trim();
alert(text);
that produces the following text:
List item #1.List item #2.List item #3.
because there is no whitespace between each <li> element. What I'm actually looking for is this (note the single space between each sentence):
List item #1. List item #2. List item #3.
This suggest to me that I need to traverse the DOM nodes in the wrapped set, appending the text for each to a string, followed by a space. I tried the following code:
var $div = $('div'), text = '';
$div.find('*').each(function() {
text += $(this).text().trim() + ' ';
});
alert(text);
but this produced the following text:
List item #1.List item #2.List item #3. List item #1. List item #2. List item #3.
I assume this is because I'm iterating through every descendant of <div> and appending the text, so I'm getting the text nodes within both <ul> and each of its <li> children, leading to duplicated text.
I think I could probably find/write a plain JavaScript function to recursively walk the DOM of the wrapped set, gathering and appending text nodes - but is there a simpler way to do this using jQuery? Cross-browser consistency is very important.
Thanks for any help!

jQuery deals mostly with elements; its text-node powers are relatively weak. You can get a list of all children with contents(), but you'd still have to walk it checking types, so that's really no different from just using plain DOM childNodes. There is no method to recursively get text nodes, so you would have to write something yourself, e.g. something like:
function collectTextNodes(element, texts) {
for (var child= element.firstChild; child!==null; child= child.nextSibling) {
if (child.nodeType===3)
texts.push(child);
else if (child.nodeType===1)
collectTextNodes(child, texts);
}
}
function getTextWithSpaces(element) {
var texts= [];
collectTextNodes(element, texts);
for (var i= texts.length; i-->0;)
texts[i]= texts[i].data;
return texts.join(' ');
}
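getTextWithSpaces() joins every text node, including whitespace-only ones such as the newlines between <div>, <ul> and the first <li> in the question's markup, so the result can contain stray runs of spaces. A sketch that trims each node and skips the empty ones; `collectTrimmedText` and `getTrimmedTextWithSpaces` are hypothetical names, and the node types are spelled out as numbers so it runs outside a browser too:

```javascript
const TEXT_NODE = 3;    // Node.TEXT_NODE
const ELEMENT_NODE = 1; // Node.ELEMENT_NODE

// Depth-first walk collecting trimmed, non-empty text-node values.
function collectTrimmedText(element, out = []) {
  for (const child of element.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      const text = child.nodeValue.trim();
      if (text !== '') out.push(text);
    } else if (child.nodeType === ELEMENT_NODE) {
      collectTrimmedText(child, out);
    }
  }
  return out;
}

function getTrimmedTextWithSpaces(element) {
  return collectTrimmedText(element).join(' ');
}

// Browser usage: getTrimmedTextWithSpaces(document.querySelector('div'))
// -> 'List item #1. List item #2. List item #3.' for the question's markup
```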

This is the simplest solution I could think of:
$("body").find("*").contents().filter(function(){return this.nodeType!==1;});

You can use the jQuery contents() method to get all nodes (including text nodes), then filter your set down to only the text nodes.
$("body").find("*").contents().filter(function(){return this.nodeType===3;});
From there you can create whatever structure you need.

I built on #bobince's terrific answer to make a search tool that searches all columns of a table and filters the rows to show only those that match (case-insensitively) all of a user's search terms (provided in any order).
And here is my javascript/jQuery code:
$(function orderFilter() {
// recursively collect all text from child elements (returns void)
function collectTextNodes(element, texts) {
for (
let child = element.firstChild;
child !== null;
child = child.nextSibling
) {
if (child.nodeType === Node.TEXT_NODE) {
texts.push(child);
} else if (child.nodeType === Node.ELEMENT_NODE) {
collectTextNodes(child, texts);
}
}
}
// separate all text from all children with single space
function getAllText(element) {
const texts = [];
collectTextNodes(element, texts);
for (let i = texts.length; i-- > 0; ) texts[i] = texts[i].data;
return texts.join(' ').replace(/\s\s+/g, ' ');
}
// check to see if the search value appears anywhere in child text nodes
function textMatchesFilter(tbody, searchVal) {
const tbodyText = getAllText(tbody).toLowerCase();
const terms = searchVal.toLowerCase().replace(/\s\s+/g, ' ').split(' ');
return terms.every(searchTerm => tbodyText.includes(searchTerm));
}
// filter orders to only show those matching certain fields
$(document).on('keyup search', 'input.js-filter-orders', evt => {
const searchVal = $(evt.target).val();
const $ordersTable = $('table.js-filterable-table');
$ordersTable.find('tbody[hidden]').removeAttr('hidden');
if (searchVal.length <= 1) return;
// Auto-click the "Show more orders" button and reveal any collapsed rows
$ordersTable
.find('tfoot a.show-hide-link.collapsed, tbody.rotate-chevron.collapsed')
.each((_idx, clickToShowMore) => {
clickToShowMore.click();
});
// Set all tbodies to be hidden, then unhide those that match
$ordersTable
.find('tbody')
.attr('hidden', '')
.filter((_idx, tbody) => textMatchesFilter(tbody, searchVal))
.removeAttr('hidden');
});
});
For our purposes, it works perfectly! Hope this helps others!
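The matching rule in textMatchesFilter() does not depend on the DOM, so it can be pulled out and tested on plain strings. A sketch with the hypothetical name `matchesAllTerms`: it splits the query on any run of whitespace (tabs included) and drops empty terms, while keeping the original every()/includes() logic:

```javascript
// True when every whitespace-separated term of `query` occurs somewhere in
// `text`, ignoring case and term order.
function matchesAllTerms(text, query) {
  const haystack = text.toLowerCase();
  const terms = query.toLowerCase().trim().split(/\s+/).filter(Boolean);
  return terms.every(term => haystack.includes(term));
}
```

An empty query matches everything, because every() on an empty array returns true; the original handler already returns early for queries of one character or less.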
