JavaScript wrapping unwrapped plain text


I have some non-static content on my webpage and I need all unwrapped plain text to be wrapped inside an anchor element with a class; however, I need to maintain the placement of that text.
I've searched around on this site and found questions like these, which ask the same thing, except that the text is always at the beginning or the end and the answers always prepend/append the content back into the <div>.
To make the question even more complicated, there are scenarios where the content will be only unwrapped plain text, and I'll need to wrap that as well.
My HTML:
<div>
<a>1</a>
<a>2</a>
3
<a>4</a>
</div>
Sometimes:
<div>
1
</div>
I've tried all the answers on this page but they all reorder the text.
Any ideas?

Iterate over all text nodes of the element:
$('div').contents().filter(function() {
    return this.nodeType === 3;
}).wrap('<a class="awesomeClass"></a>');
.contents() retrieves all child nodes of the element, not only element nodes.
The .filter callback discards all nodes that are not text nodes. It works as follows:
Each DOM node has the property nodeType, which specifies its type (what a surprise). This value is a constant. Element nodes have the value 1 and text nodes have the value 3. .filter will remove all nodes from the set for which the callback returns false.
Then each of the text nodes is wrapped in a new element.
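For reference, the same filter can be sketched without jQuery. The helper name textNodesOf is made up for this illustration; it only relies on the childNodes and nodeType properties that every DOM node has, so it can also be tried on plain objects shaped like nodes.

```javascript
// Hypothetical helper (not part of jQuery or the DOM API).
// Returns only the direct children of `parent` that are text nodes.
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

function textNodesOf(parent) {
  // childNodes is array-like, so copy it into a real array before filtering
  return Array.from(parent.childNodes).filter(function (node) {
    return node.nodeType === TEXT_NODE;
  });
}

// In a browser: textNodesOf(document.querySelector('div'))
```

Wrapping each returned node would still need createElement and replaceChild (or jQuery's .wrap as above); this sketch only covers the selection step.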
I'm having a whitespace problem.
If your HTML looks like
<div>
1
</div>
then the element has one child node, a text node. The value of the text node does not only consist of the visible character(s), but also of all the whitespace characters, e.g. the line break following the opening tag. Here they are made visible (⋅ is a space, ¬ is a line break):
<div>¬
⋅⋅1¬
</div>
The value of the text node is therefore ¬⋅⋅1¬.
A way to solve this would be to trim the node values, removing leading and trailing whitespace characters:
$('div').contents().filter(function() {
    return this.nodeType === 3;
}).each(function() {
    this.nodeValue = $.trim(this.nodeValue);
}).wrap('<a class="awesomeClass"></a>');
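One side effect of trimming alone is that whitespace-only text nodes (such as the line breaks between the <a> elements in the first example) become empty strings and still get wrapped. A possible refinement, sketched here in plain JavaScript under the made-up name nonBlankTextNodes, is to skip those nodes before wrapping:

```javascript
// Hypothetical helper: direct text-node children of `parent` that contain
// at least one non-whitespace character.
function nonBlankTextNodes(parent) {
  return Array.from(parent.childNodes).filter(function (node) {
    return node.nodeType === 3 && node.nodeValue.trim() !== '';
  });
}

// In a browser: nonBlankTextNodes(document.querySelector('div'))
```

With jQuery, the same condition could go into the .filter callback instead.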

Related

How to iterate over all the children, including those without tags, in a DOM element with JavaScript

I am trying to parse a ruby tag like this:
<div id="foo">
<ruby>
<rb>気</rb>
<rp>(</rp>
<rt>き</rt>
<rp>)</rp>
</ruby>
が
<ruby>
<rb>狂</rb>
<rp>(</rp>
<rt>くる</rt>
<rp>)</rp>
</ruby>
ってしまう。
</div>
The problem is that I am unable to iterate over all the children, including those without tags. All the functions like:
document.getElementById("foo").children and $("#foo").children()
return only the two ruby tags without the text in between.
I am trying to get a list like:
{ruby}
が
{ruby}
ってしまう
Is there a way I can get a list of all the tags and text?

You can use Node.childNodes (See documentation)
document.getElementById("foo").childNodes
But here is where it can get tricky:
In your HTML, between the <div> tag and the <ruby> tag, there is whitespace and a newline. This is parsed as a text node. So .childNodes will return 5 nodes:
A text node for what is between <div> and the first <ruby> (a newline and whitespace).
The first <ruby> element.
A text node containing the text between the two <ruby> elements (including two newlines and whitespace).
The second <ruby> element.
A text node containing the text between the second </ruby> and </div> (including a newline and whitespace).
So, if you only want the elements plus the text nodes that actually contain some text:
const nodes = [...document.getElementById('foo').childNodes].filter(node => !node.nodeValue || node.nodeValue.trim())
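To get from that filtered list to something like the list in the question, one option is to map each node to a label. This is only a sketch: describeChildren is a hypothetical helper, and the '{tag}' notation simply mirrors the question.

```javascript
// Hypothetical helper: turns the children of `parent` into a list such as
// ['{ruby}', 'が', '{ruby}', 'ってしまう。'].
function describeChildren(parent) {
  const result = [];
  for (const node of Array.from(parent.childNodes)) {
    if (node.nodeType === 1) {
      // element node: record its tag name in braces
      result.push('{' + node.nodeName.toLowerCase() + '}');
    } else if (node.nodeType === 3 && node.nodeValue.trim() !== '') {
      // text node with visible content: record the trimmed text
      result.push(node.nodeValue.trim());
    }
    // whitespace-only text nodes and comments are skipped
  }
  return result;
}

// In a browser: describeChildren(document.getElementById('foo'))
```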

About querySelector() with multiple selectors

I had a situation in which I wanted to focus either an input tag, if it existed, or its container if it didn't. So I thought of an intelligent way of doing it:
document.querySelector('.container input, .container').focus();
Funny, though, querySelector always returns the .container element.
I started to investigate and found that, no matter the order in which the different selectors are put, querySelector always returns the same element.
For example:
var elem1 = document.querySelector('p, div, pre');
var elem2 = document.querySelector('pre, div, p');
elem1 === elem2; // true
elem1.tagName; // "P".
My question is: What are the "reasons" of this behavior and what "rules" (if any) make P elements have priority over DIV and PRE elements.
Note: In the situation mentioned above, I came up with a less elegant but functional solution:
(document.querySelector('.container input') ||
document.querySelector('.container') ).focus();

document.querySelector returns only the first element matched, starting from the first element in the markup. As written on MDN:
Returns the first element within the document (using depth-first
pre-order traversal of the document's nodes|by first element in
document markup and iterating through sequential nodes by order of
amount of child nodes) that matches the specified group of selectors.
If you want all elements that match the query, use document.querySelectorAll (docs), i.e. document.querySelectorAll('pre, div, p'). This returns a NodeList (an array-like collection) of the matched elements, in document order.

The official document says that,
Returns the first element within the document (using depth-first pre-order traversal of the document's nodes|by first element in document markup and iterating through sequential nodes by order of amount of child nodes) that matches the specified group of selectors.
So in your first case, .container is the parent element, so it comes first in the document and is the one that is matched and returned. In your second case, the paragraph must be the first element in the document compared with the pre and div elements, so it was returned.

That's precisely the intended behavior of .querySelector() — it finds all the elements in the document that match your query, and then returns the first one.
That's not "the first one you listed", it's "the first one in the document".
This works, essentially, like a CSS selector. The selectors p, div, pre and pre, div, p are identical; they both match three different types of element. So the reason elem1.tagName == 'P' is simply that you have a <p> on the page before any <pre> or <div> tags.
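To see why the order of the selectors does not matter, here is a toy model of the depth-first pre-order traversal that the MDN quote describes. It uses plain objects instead of real elements, and firstInDocumentOrder is a made-up name, so treat it as an illustration rather than how browsers actually implement querySelector.

```javascript
// Toy model only: plain objects stand in for DOM elements.
// Returns the first node, in depth-first pre-order, whose tag is in `tags`.
function firstInDocumentOrder(node, tags) {
  if (tags.includes(node.tag)) return node;
  for (const child of node.children || []) {
    const hit = firstInDocumentOrder(child, tags);
    if (hit) return hit;
  }
  return null;
}

// Assumed page layout: a <p> appears before the <div> and the <pre>.
const page = {
  tag: 'body',
  children: [
    { tag: 'p' },
    { tag: 'div', children: [{ tag: 'pre' }] },
  ],
};

const first = firstInDocumentOrder(page, ['p', 'div', 'pre']);
const second = firstInDocumentOrder(page, ['pre', 'div', 'p']);
console.log(first === second, first.tag); // logs: true p
```

The list of tags only decides what counts as a match; which match comes back depends on position in the tree.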

You can try selecting all elements with document.querySelectorAll("p.a, p.b") as shown in the example below and using a loop to focus on all elements that are found.
<html>
<body>
<p class="a">element 1</p>
<p class="b">element 2</p>
<script>
var list=document.querySelectorAll("p.a, p.b");
for (let i = 0; i < list.length; i++) {
list[i].style.backgroundColor = "red";
}
</script>
</body>
</html>

Why one DIV has only one child but has three child nodes?

I have two divs that look like this:
<div id="responseframe">
<div id="oldframe">
</div>
</div>
I thought the #oldframe DIV is the only child of #responseframe. However, when I write this in JavaScript,
var old = document.getElementById("responseframe");
var nodesnumber = old.childNodes.length;
console.log("-------------Here is the nodes number of reponseframe---------: " + nodesnumber);
for (var i = 0; i < nodesnumber; i++) {
    // print the name of every child node
    var nodesname = old.childNodes[i].nodeName;
    console.log("-------------Here is the nodes name of reponseframe's child---------: " + nodesname);
}
console told me #responseframe has 3 child nodes and,
childNode[0] is #text;
childNode[1] is DIV ;
childNode[2] is #text
Why are there two #text nodes? Thank you for any ideas.

Because you added a new line after <div id="responseframe"> and after the first </div>.
If you put everything on one line, there will be only one child node: the div.
Html:
<div id="responseframe"><div id="oldframe"></div></div>
Output:
-------------Here is the nodes number of reponseframe---------: 1
-------------Here is the nodes name of reponseframe's child---------: DIV
Here is fiddle: http://jsfiddle.net/cassln/t7kec97u/2/

The Node.childNodes property returns all direct child nodes of the parent element, including text nodes and comment nodes.
So in your case you have:
<div id="responseframe"><!-- this whole space area is considered by html as single space so you got your first #text Node
--><div id="oldframe"><!-- this area is ignored because this is not direct child area of the responseframe
--></div><!-- this whole space area is considered by html as single space so you got your second #text Node
--></div>
So in the end the direct children are: #text0 #DIV(oldframe) #text1.
If you want only the direct element nodes (without text nodes and comment nodes), you need the children property (Element.children).
var old = document.getElementById("responseframe");
var nodesnumber = old.children.length;
console.log("-------------Here is the nodes number of reponseframe---------: " + nodesnumber);
for (var i = 0; i < nodesnumber; i++) {
    // print the name of every child element
    var nodesname = old.children[i].nodeName;
    console.log("-------------Here is the nodes name of reponseframe's child---------: " + nodesname);
}
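To compare the two views side by side, something along these lines could be used. summarizeChildren is a hypothetical helper; it derives the element list from childNodes by checking nodeType, which should give the same elements as children.

```javascript
// Hypothetical helper: compares what childNodes and children would report.
// Works on real elements and on plain objects shaped like nodes.
function summarizeChildren(parent) {
  const all = Array.from(parent.childNodes);
  const elements = all.filter(function (node) {
    return node.nodeType === 1; // 1 = element node
  });
  return {
    childNodeNames: all.map(function (node) { return node.nodeName; }),
    elementNames: elements.map(function (node) { return node.nodeName; }),
  };
}

// In a browser: summarizeChildren(document.getElementById('responseframe'))
```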

How do I use jQuery to get a set of every element with certain text, but NOT the parents?

I want to use jQuery to select every element that has a certain text string in it, but not the parents of that element. How do I do this? I have no control whatsoever over the HTML code, but here is an example:
<body>
<div>
<p>This is a paragraph and <a>this is not</a>.</p>
Here we have a div.
</div>
</body>
If I use the word "this" as my match word, I want jQuery to provide me with a set containing the <a> and the <p>, but not the <div> or the <body>.
Again, I have no control AT ALL over the HTML!
Thanks!
** Clarification: I do want the parent of the element if the parent ALSO has a "this" in its immediate text. Thus, I want the <a> AND the <p>.

Update:
Here is what I came up with: jsfiddle
var myArray = $('*:contains("this")','body').filter(function(){
    if($(this).contents().filter(function(){
        return(this.nodeType == 3);
    }).text().indexOf('this')===-1){
        return false;
    }
    return true;
});
$.each(myArray,function(){
    console.log(this.nodeName);
});
It starts out similar to the link posted by Robin, but it forces the search to run only in the context of body elements - this keeps your scripts safe if they are not inline.
The next part is a filter that checks whether the current element's direct text nodes contain the text.
This is a bit convoluted, but to walk through it:
.contents() - docs - gets the immediate child nodes
.filter() - docs - we only want to test text nodes, so we filter out everything else
this.nodeType - w3 spec - checks whether the node is a text node
.text() - docs - gets the combined text of those text nodes as a string
.indexOf() - checks that string for our search string
Note that the text is checked twice, once by :contains() at the top and again in the filter. The first check isn't needed per se, but I think it should reduce the number of deeper tests and speed things up slightly.
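The "direct text nodes only" test from the walk-through can also be written as a small standalone function. This is a sketch with a made-up name (hasDirectText); it concatenates the element's own text nodes and then searches the result, which is what the .contents().filter().text().indexOf() chain does.

```javascript
// Hypothetical helper: true if the element's own text nodes
// (not text nested inside child elements) contain `word`.
function hasDirectText(el, word) {
  let ownText = '';
  for (const node of Array.from(el.childNodes)) {
    if (node.nodeType === 3) ownText += node.nodeValue;
  }
  return ownText.indexOf(word) !== -1;
}

// In a browser, for example:
// Array.from(document.body.querySelectorAll('*'))
//   .filter(function (el) { return hasDirectText(el, 'this'); });
```

Like indexOf itself, the check is case-sensitive.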

Here's my solution with pure JS.
Code:
function findElmsWithWord(word, elm, found){
    if (elm.nodeType === 3 && elm.data.indexOf(word) !== -1)
        found.push(elm.parentNode);
    else
        for (var i = 0; i < elm.childNodes.length; i++)
            findElmsWithWord(word, elm.childNodes[i], found);
}
var elms = [];
findElmsWithWord('this', document.body, elms);
console.log(elms);
It recursively walks the DOM until it finds the text nodes that contain the word in question, and then adds each such text node's parent to the result.

Remove empty tags using RegEx

I want to delete empty tags such as <label></label>, <font> </font> so that:
<label></label><form></form>
<p>This is <span style="color: red;">red</span>
<i>italic</i>
</p>
will be cleaned as:
<p>This is <span style="color: red;">red</span>
<i>italic</i>
</p>
I have this RegEx in JavaScript. It deletes the empty tags, but it also deletes this: "<i>italic</i></p>"
str=str.replace(/<[\S]+><\/[\S]+>/gim, "");
What am I missing?

You have "not spaces" as your character class, which means "<i>italic</i></p>" will match. The first half of your regex will match "<(i>italic</i)>" and the second half "</(p)>". (I've used brackets to show what each [\S]+ matches.)
Change this:
/<[\S]+><\/[\S]+>/
To this:
/<[^/>][^>]*><\/[^>]+>/
Overall you should really be using a proper HTML processor, but if you're munging HTML soup this should suffice :)
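As a quick check, here is the suggested pattern run against the markup from the question. The variable names are only for this illustration, and the g flag is added so that every empty pair is replaced.

```javascript
const input =
  '<label></label><form></form>\n' +
  '<p>This is <span style="color: red;">red</span>\n' +
  '<i>italic</i>\n' +
  '</p>';

// [^/>] stops the opening part from starting at a closing tag such as </i>,
// and [^>]* keeps it from running past the end of the tag.
const emptyTag = /<[^/>][^>]*><\/[^>]+>/g;

const cleaned = input.replace(emptyTag, '');
console.log(cleaned);
```

Only <label></label> and <form></form> are removed; the <p>, <span> and <i> elements stay as they were.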

Regex is not for HTML. If you're in JavaScript anyway, I'd encourage you to use jQuery DOM processing.
Something like:
$('*:empty').remove();
Alternatively:
$("*").filter(function()
{
    // keep only the elements whose HTML is empty or whitespace-only
    return $.trim($(this).html()).length === 0;
}).remove();

All the regex answers only handle
<label></label>
but not cases such as
<label> </label>
<label> </label>
<label>
</label>
Try this pattern to match all of the above:
<[^/>]+>[ \n\r\t]*</[^>]+>
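A sketch of how that pattern might be applied. stripEmptyTags is a hypothetical helper, and the loop is an addition that goes beyond the answer: removing an empty child can leave its parent empty, so the replacement is repeated until the string stops changing.

```javascript
// Hypothetical helper: strips empty or whitespace-only tag pairs, repeating
// until nothing changes so that newly emptied parents are removed too.
function stripEmptyTags(html) {
  const emptyPair = /<[^/>]+>[ \n\r\t]*<\/[^>]+>/g;
  let previous;
  do {
    previous = html;
    html = html.replace(emptyPair, '');
  } while (html !== previous);
  return html;
}

console.log(stripEmptyTags('<div><label> </label>\n</div><p>kept</p>'));
// logs: <p>kept</p>
```

Note that the pattern does not check that the opening and closing tag names agree.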

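You could try /<[\S]+?><\/[\S]+?>/ -- the difference is the ?s after the +s, which match "as few as possible" (AKA "non-greedy match") nonspace characters (though 1 or more), instead of the bare +s, which match "as many as possible" (AKA "greedy match"). Be aware that \S also matches < and >, so on input such as <i>italic</i></p> even the non-greedy version can still expand across tag boundaries; a negated class such as [^>] is the more reliable fix.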
Avoiding regular expressions altogether, as the other answer recommends, is also an excellent idea, but I wanted to point out the important greedy vs non-greedy distinction, which will serve you well in a huge variety of situations where regexes are warranted.
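The greedy versus non-greedy difference is easiest to see on a small example. The snippet below is only an illustration (the sample strings are made up, apart from the one taken from the question), and it also shows a case where laziness alone is not enough.

```javascript
const sample = '<b>bold</b>';

// Greedy: .+ grabs as much as it can, so the match runs to the last ">".
console.log(sample.match(/<.+>/)[0]);  // logs: <b>bold</b>

// Non-greedy: .+? stops at the first ">" that lets the pattern succeed.
console.log(sample.match(/<.+?>/)[0]); // logs: <b>

// On the input from the question, \S can still step over ">" and "<"
// until the rest of the pattern fits, so the lazy version matches anyway.
const fromQuestion = '<i>italic</i></p>';
console.log(/<[\S]+?><\/[\S]+?>/.test(fromQuestion));    // logs: true

// A negated class cannot cross a tag boundary, so this one does not match.
console.log(/<[^/>][^>]*><\/[^>]+>/.test(fromQuestion)); // logs: false
```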

I like MattMitchell's jQuery solution but here is another option using native JavaScript.
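function CleanChildren(elem)
{
    // Walk backwards: childNodes is live, so removing a node shifts later indexes.
    for (var i = elem.childNodes.length - 1; i >= 0; i--)
    {
        var child = elem.childNodes[i];
        if (child.nodeType !== 1)
            continue; // keep text and comment nodes
        CleanChildren(child);
        // remove the element if it has nothing left inside it
        if (!child.hasChildNodes())
            elem.removeChild(child);
    }
}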

Here's a modern native JavaScript solution; which is actually quite similar to the jQuery one from 2010. I adapted it from that answer for a project that I am working on, and thought I would share it here.
document.querySelectorAll("*:empty").forEach((x)=>{x.remove()});
document.querySelectorAll returns a NodeList, which is essentially an array-like list of all DOM nodes that match the CSS selector given to it as an argument.
*:empty is a selector which selects all elements (* means "any element") that are empty (which is what :empty means).
This will select any empty element within the entire document. If you only want to remove empty elements from within a certain part of the page (i.e. only those within some div element), you can add an id to that element and then use the selector #id *:empty, which means any empty element within the element with an id of id.
This is almost certainly what you want. Technically some important tags (e.g. <meta> tags, <br> tags, <img> tags, etc) are "empty"; so without specifying a scope, you will end up deleting some tags you probably care about.
forEach loops through every element in the resulting NodeList, and runs the anonymous function (x)=>{x.remove()} on it. x is the current element in the list, and calling .remove() on it removes that element from the DOM.
Hopefully this helps someone. It's amazing to see how far JavaScript has come in just 8 years; from almost always needing a library to write something complex like this in a concise manner to being able to do so natively.
Edit
So, the method detailed above will work fine in most circumstances, but it has two issues:
Elements like <div> </div> are not treated as :empty (note the space in between). CSS Level 4 selectors fix this with the introduction of the :blank selector (which is like empty except it ignores whitespace), but currently only Firefox supports it (in vendor-prefixed form).
Self-closing tags are caught by :empty - and this will remain the case with :blank, too.
I have written a slightly larger function which deals with these two use cases:
document.querySelectorAll("*").forEach((x)=>{
    let tagName = "</" + x.tagName + ">";
    // the element's HTML must end with its closing tag, and its
    // contents must not contain any non-whitespace character
    if (x.outerHTML.slice(-tagName.length).toUpperCase() == tagName
        && !/[^\s]/.test(x.innerHTML)) {
        x.remove();
    }
});
We iterate through every element on the page. We grab that element's tag name (for example, if the element is a div this would be DIV), and use it to construct a closing tag, e.g. </DIV>.
That tag is 6 characters long. We check if the upper-cased last 6 characters of the element's HTML match that. If they do, we continue. If they don't, the element doesn't have a closing tag, and therefore must be self-closing. This is preferable to a list, because it means you don't have to update anything should a new self-closing tag get added to the spec.
Then, we check whether the contents of the element contain any non-whitespace character. /[^\s]/ is a RegEx. [] is a set in RegEx, and will match any character that appears inside it. If ^ is the first character, the set becomes negated: it will match any character that is NOT in the set. \s means whitespace: tabs, spaces, line breaks. So what [^\s] says is "any character that is not white space".
Putting that together, if the tag is not self-closing, and its contents do not contain any non-whitespace character, then we remove it.
Of course, this is a bit bigger and less elegant than the previous one-liner. But it should work for essentially every case.
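The two conditions (the element's HTML ends with its closing tag, and its contents are nothing but whitespace) can also be tried outside a browser with a string-only sketch. isEmptyPairedElement is a hypothetical helper that takes the values an element's tagName, outerHTML and innerHTML properties would hold.

```javascript
// Hypothetical helper: decides from strings alone whether an element
// would count as "empty but with a closing tag".
function isEmptyPairedElement(tagName, outerHTML, innerHTML) {
  const closingTag = '</' + tagName + '>';
  // Elements serialized without a closing tag (br, img, ...) fail this test.
  const hasClosingTag =
    outerHTML.slice(-closingTag.length).toUpperCase() === closingTag.toUpperCase();
  // True when innerHTML has no non-whitespace character at all.
  const onlyWhitespace = !/[^\s]/.test(innerHTML);
  return hasClosingTag && onlyWhitespace;
}

console.log(isEmptyPairedElement('DIV', '<div> </div>', ' '));  // logs: true
console.log(isEmptyPairedElement('BR', '<br>', ''));            // logs: false
console.log(isEmptyPairedElement('P', '<p>text</p>', 'text'));  // logs: false
```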

This is an issue of greedy regex. Try this:
str=str.replace(/<[^>]+><\/[\S]+>/gim, "");
or
str=str.replace(/<[\S]+?><\/[\S]+>/gim, "");
In your regex, <[\S]+> matches <i>italic</i> and the <\/[\S]+> matches the </p>

You can use this one
text = text.replace(/<[^/>][^>]*>\s*<\/[^>]+>/gim, "");

Found this on CodePen. It uses jQuery, but it does the job:
$('element').each(function() {
    if ($(this).text() === '') {
        $(this).remove();
    }
});
You will need to change element so that it points to where you want to remove empty tags. Do not point it at the whole document, because that runs into the problem I raised in my reply to Toastrackenigma's answer.

Remove empty tags with cheerio (this will also remove images):
$('*')
    .filter(function(index, el) {
        return (
            $(el)
                .text()
                .trim().length === 0
        )
    })
    .remove()
Remove empty tags with cheerio, but keep images:
$('*')
    .filter(function(index, el) {
        return (
            el.tagName !== 'img' &&
            $(el).find(`img`).length === 0 &&
            $(el)
                .text()
                .trim().length === 0
        )
    })
    .remove()

<([^>]+)\s*>\s*<\/\1\s*>
Tested against:
<div>asdf</div> -- no match
<div></div> -- matches
<div></notdiv> -- no match
<div >
</div > -- matches
Try it yourself at https://regexr.com/
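The same cases can be checked in JavaScript. The variable name is made up for this sketch; the pattern is the one given above.

```javascript
// \1 refers back to whatever the first group captured, so the closing
// tag has to repeat the opening tag's name.
const sameNamePair = /<([^>]+)\s*>\s*<\/\1\s*>/;

console.log(sameNamePair.test('<div>asdf</div>'));   // logs: false
console.log(sameNamePair.test('<div></div>'));       // logs: true
console.log(sameNamePair.test('<div></notdiv>'));    // logs: false
console.log(sameNamePair.test('<div >\n</div >'));   // logs: true
```

Because the group captures everything up to the >, an opening tag with attributes such as <div class="a"></div> is not matched by this pattern.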
