JavaScript - Remove empty paragraphs from HTML string

In a JavaScript object, I have a string of HTML (node.innerHTML) and I need to remove empty paragraphs from the string, then return the string. Empty paragraphs include <p></p>, <p> </p>, and <p>&nbsp;</p>. Ideally, I think the string should be parsed as HTML for processing rather than handled with a regex. I have tried all sorts of approaches and cannot get it to work correctly.
Here is code I have tried, but it returns an object with only prevObject data, plus it does not seem to remove the empty paragraphs.
function strip_empty_p (node) {
var html = $(node.innerHTML);
html = html.filter($('p'),function () {
return this.innerHTML == "" ||
this.innerHTML == " " ||
this.innerHTML == "&nbsp;"
}).remove();
node.innerHTML = html.innerHTML;
return node.innerHTML;
}

If you have access to a node, it'd be much better to work with its DOM children directly instead of reading innerHTML and parsing the string.
const parent = document.querySelector('div');
// Copy the live childNodes list first so removals cannot shift the iteration
Array.from(parent.childNodes).forEach(child => child.nodeType === Node.ELEMENT_NODE
&& !child.textContent.trim()
&& parent.removeChild(child));
<div>
<p></p>
<p> </p>
<p>Not empty</p>
<p> </p>
<p>Also not empty</p>
<p></p>
</div>
This approach is more reliable, and usually faster, than trying to process the markup as if it were plain text.
If you load it as text from somewhere, convert it to an HTML node first, if it's well-formed enough.
If it's malformed HTML, then life becomes a lot more difficult and you'd have to do some tricky and error-prone string parsing.
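When no DOM is available at all (for example, the string is processed outside a browser), a narrow regex can cover the three cases listed in the question. This is only a rough sketch under that assumption, not a general HTML parser; stripEmptyParagraphs is a hypothetical helper name, and nested or malformed markup is out of scope.

```javascript
// Hypothetical helper: removes <p> elements whose content is only
// whitespace and/or &nbsp; entities. Assumes paragraphs are not nested.
function stripEmptyParagraphs(html) {
  // <p\b keeps tags such as <pre> from matching; \s also covers U+00A0
  return html.replace(/<p\b[^>]*>(?:\s|&nbsp;)*<\/p>/gi, '');
}

const sample = '<div><p></p><p> </p><p>Not empty</p><p>&nbsp;</p></div>';
console.log(stripEmptyParagraphs(sample));
```

Paragraphs that contain only an empty child element (such as <p><span></span></p>) are not matched by this pattern, which is one reason the DOM route above is preferable when it is available.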

You can do this via jQuery. You can check either the .text() or the .html() of each paragraph. Here's an example on jsfiddle: https://jsfiddle.net/akvap5mg/
$(document).ready(function(){
$("p").each(function(){
var $this = $(this);
if($this.text().trim() === '') {
$this.remove();
}
});
});

If you literally remove empty paragraphs from the DOM, you need to be careful, because every removal changes the collection you are iterating over. This is what ultimately worked for me:
var d = document.getElementsByTagName("div")[0];
for (var i = 0, max = d.children.length - 1; i <= max; i++) {
if ( d.children[i].textContent.trim() == ""){
d.removeChild(d.children[i]);
// the collection just shrank by one, so step both counters back
max--;
i--;
}
}
If you don't adjust variables max and i, then detecting the DOM elements gets thrown off and an empty paragraph may not get removed.
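The shifting can be reproduced without a browser. In the sketch below a plain array stands in for the live d.children collection (an assumption made so it runs anywhere), and the two function names are hypothetical. It also shows that iterating backwards avoids the bookkeeping altogether.

```javascript
// Forward loop with no index adjustment: after a removal, the next
// item slides into the current index and is never inspected.
function removeEmptyForward(items) {
  const list = items.slice();
  for (let i = 0; i < list.length; i++) {
    if (list[i].trim() === '') {
      list.splice(i, 1);
    }
  }
  return list;
}

// Backward loop: a removal only shifts items that were already visited.
function removeEmptyBackward(items) {
  const list = items.slice();
  for (let i = list.length - 1; i >= 0; i--) {
    if (list[i].trim() === '') {
      list.splice(i, 1);
    }
  }
  return list;
}

const paragraphs = ['', ' ', 'Not empty', ' ', 'Also not empty', ''];
console.log(removeEmptyForward(paragraphs));  // one blank entry survives
console.log(removeEmptyBackward(paragraphs)); // only the two non-empty entries remain
```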
Alternatively, the following code merely hides the empty paragraphs instead of removing them, so there is no need to adjust max and i:
var d = document.getElementsByTagName("div")[0];
for (var i = 0, max = d.children.length - 1; i <= max; i++) {
if ( d.children[i].textContent.trim() == ""){
d.children[i].style.display = "none";
}
}

Related

JQuery | change a part of a string with links

I have the following code:
<div class="TopMenu">
<h3>Create an Account</h3>
<h3>yup</h3>
<h3>lol</h3>
yo
<ul>
<li sytle="display:">
start or
finish
</li>
</ul>
and I'm using:
$('.TopMenu li:contains("or")').each(function() {
var text = $(this).text();
$(this).text(text.replace('or', 'triple'));
});
It works fine, but suddenly the links aren't active,
how do I fix it?
Thank you very much in advance.
Here's what your jQuery basically translates to when it's being run:
text = this.textContent;
// text = "\n\t\tstart or\n\t\t finish\n\t\t\n";
text = text.replace('or','triple');
// text = "\n\t\tstart triple\n\t\t finish\n\t\t\n";
this.textContent = text;
// essentially, remove everything from `this` and put a single text node there
Okay, that's not a great explanation XD The point is, setting textContent (or, in jQuery, calling .text()), replaces the element's contents with that text.
What you want to do is just affect the text nodes. I'm not aware of how to do this in jQuery, but here's some Vanilla JS:
function recurse(node) {
var nodes = node.childNodes, l = nodes.length, i;
for( i=0; i<l; i++) {
if( nodes[i].nodeType == 1) recurse(nodes[i]);
else if( nodes[i].nodeType == 3) {
nodes[i].nodeValue = nodes[i].nodeValue.replace(/\bor\b/g,'triple');
}
}
}
recurse(document.querySelector(".TopMenu"));
Note the regex-based replacement will prevent "boring" from becoming "btripleing". Use Vanilla JS and its magic powers or I shall buttbuttinate you!
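The difference between the plain-string replacement and the word-boundary regex is easy to check on a bare string. The sketch below uses a hypothetical replaceWord helper; escaping the search word is an extra precaution that the original answer does not need for a literal like "or".

```javascript
// Hypothetical helper: replaces every standalone occurrence of `word`.
function replaceWord(text, word, replacement) {
  // Escape regex metacharacters so the word is matched literally
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp('\\b' + escaped + '\\b', 'g');
  // A function replacement keeps "$" in the replacement text literal
  return text.replace(pattern, () => replacement);
}

console.log('boring or not'.replace('or', 'triple')); // "btripleing or not"
console.log(replaceWord('boring or not', 'or', 'triple')); // "boring triple not"
```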
Change .text() to .html()
$('.TopMenu li:contains("or")').each(function() {
var text = $(this).html();
$(this).html(text.replace('or', 'triple'));
});
Since or is a text node, you can use .contents() along with .replaceWith() instead:
$('.TopMenu li:contains("or")').each(function () {
var text = $(this).text();
$(this).contents().filter(function () {
return this.nodeType === 3 && $.trim(this.nodeValue).length;
}).replaceWith(' triple ');
});
You need to use .html() instead of .text(), like this:
$('.TopMenu li:contains("or")').each(function() {
var text = $(this).html();
$(this).html(text.replace('or', 'triple'));
});
Here is a live example: http://jsfiddle.net/7Mamj/
You are placing the anchors into text by doing that. You should iterate the matched elements' childNodes and only use replace on their textContent to avoid modifying any html tags or attributes.
$('.TopMenu li:contains("or")').each(function() {
for(var i = 0; i < this.childNodes.length; i++){
if(this.childNodes[i].nodeName != "#text") continue;
this.childNodes[i].textContent = this.childNodes[i].textContent.replace(' or ', ' triple ');
}
});
This is a slightly more complicated task. You need to replace text in text nodes (nodeType === 3), which can be done with contents() and an each() iteration:
$('.TopMenu li:contains("or")').contents().each(function() {
if (this.nodeType === 3) {
this.nodeValue = this.nodeValue.replace('or', 'triple');
}
});
All other approaches will either rewrite the markup in the <li> element (removing all attached events), or just remove the inner elements.
As discussed in the comments below, a more robust solution is to replace with a regular expression, i.e. this.nodeValue.replace(/\bor\b/g, 'triple'), which matches every "or" that is a standalone word and not part of another word.
DEMO: http://jsfiddle.net/48E6M/

Wrap a tag around multiple instances of a string using Javascript

I'm trying to wrap multiple instances of a string found in HTML in a tag (span or abbr) using pure JS. I have found a way to do it by using the code:
function wrapString() {
document.body.innerHTML = document.body.innerHTML.replace(/string/g, '<tag>string</tag>');
};
but using this code messes with a link’s href or an input’s value so I want to exclude certain tags (A, INPUT, TEXTAREA etc.).
I have tried this:
function wrapString() {
var allElements = document.getElementsByTagName('*');
for (var i=0;i<allElements.length;i++){
if (allElements[i].tagName != "SCRIPT" && allElements[i].tagName != "A" && allElements[i].tagName != "INPUT" && allElements[i].tagName != "TEXTAREA") {
allElements[i].innerHTML = allElements[i].innerHTML.replace(/string/g, '<span>string</span>');
}
}
}
but it didn't work, as it gets ALL the elements containing my string (HTML, BODY, parent DIV etc.), and it kept crashing my browser. I even tried jQuery's ":contains" selector, but I face the same problem, since I do not know the string's container beforehand and so cannot add it to the selector.
I want to use pure JavaScript to do that as I was planning on using it as a bookmark for quick access to any site but I welcome all answers regarding JQuery and other frameworks as well.
P.S. If something like that has already been answered I couldn't find it...
This is actually quite a complicated problem.
You need to:
1. recurse on the DOM tree
2. find all text nodes
3. do your replace on each text node's data
4. make the modified data into DOM nodes
5. insert the DOM nodes into the tree, before the original text node
6. remove the original text node
Here is a demo fiddle.
And if you still need tagName based exclusions, look at this fiddle
The code:
function wrapInElement(element, replaceFrom, replaceTo) {
var index, textData, wrapData, tempDiv;
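// recursion for the child nodes; iterate over a snapshot, because the
// replacements below insert new nodes into the live childNodes list
var children = Array.prototype.slice.call(element.childNodes);
for (index = 0; index < children.length; index++) {
wrapInElement(children[index], replaceFrom, replaceTo);
}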
// non empty text node?
if (element.nodeType == Node.TEXT_NODE && /\S/.test(element.data)) {
// replace
textData = element.data;
wrapData = textData.replace(replaceFrom, replaceTo);
if (wrapData !== textData) {
// create a div
tempDiv = document.createElement('div');
tempDiv.innerHTML = wrapData;
// insert
while (tempDiv.firstChild) {
element.parentNode.insertBefore(tempDiv.firstChild, element);
}
// remove text node
element.parentNode.removeChild(element);
}
}
}
function wrapthis() {
var body = document.getElementsByTagName('body')[0];
// a global regex wraps every occurrence in a text node, not just the first
wrapInElement(body, /this/g, "<span class='wrap'>this</span>");
}
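The tag-name exclusions asked for in the question can be added to the same walk. The sketch below models the idea on a hypothetical plain-object tree ({ tag, children } for elements, { text } for text nodes) so that it runs without a browser; the tree shape, the EXCLUDED set and the wrapMatches name are all assumptions for illustration, not part of the DOM API. It builds a new tree instead of editing in place, so no live list is modified while it is being iterated.

```javascript
// Assumed exclusion list, taken from the tags named in the question
const EXCLUDED = new Set(['A', 'INPUT', 'TEXTAREA', 'SCRIPT']);

// Returns an array of nodes that should replace `node` in its parent.
function wrapMatches(node, word, wrapperTag) {
  if (typeof node.text === 'string') {
    // Text node: split around the word and interleave wrapper elements
    const parts = node.text.split(word);
    if (parts.length === 1) return [node]; // no match, keep as-is
    const out = [];
    parts.forEach((part, i) => {
      if (part) out.push({ text: part });
      if (i < parts.length - 1) {
        out.push({ tag: wrapperTag, children: [{ text: word }] });
      }
    });
    return out;
  }
  // Element node: leave excluded subtrees untouched
  if (EXCLUDED.has(node.tag)) return [node];
  const children = [];
  for (const child of node.children) {
    children.push(...wrapMatches(child, word, wrapperTag));
  }
  return [{ tag: node.tag, children }];
}

const tree = {
  tag: 'DIV',
  children: [
    { text: 'a string b' },
    { tag: 'A', children: [{ text: 'string' }] }
  ]
};
console.log(JSON.stringify(wrapMatches(tree, 'string', 'SPAN')[0]));
```

In a real page the same decision (skip the element if its tagName is on the exclusion list, otherwise descend) would sit at the top of the recursive function.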

How To Append <a></a> Tags To Specific Words In An Element With jQuery

The tricky part here is not selecting the elements, but selecting the text within them. The only jQuery method that gives you back text nodes is .contents(). So I'm getting the contents of every element on the page, and I want to pick out a word, such as "hashtag", and then append tags to it.
What am I doing wrong here:
<html>
<p>
The word hashtag is in this sentence.
</p>
</html>
jQuery:
$(function() {
$('*')
.contents()
.filter(function(){
return this.nodeType === 3;
})
.filter(function(){
return this.nodeValue.indexOf('hashtag') != -1;
})
.each(function(){
alert("It works!")
});
});
$('*') grabs every element
.contents() grabs the contents of every element
.filter(function(){ return this.nodeType === 3; }) refines it down to the text contents of elements. (Node type 3 is text.)
return this.nodeValue.indexOf('hashtag') should grab the word "hashtag". Not sure if this is working.
!= -1; should prevent it from grabbing every single element in the HTML. Not sure about that one.
Why doesn't it work? I know I don't have anything appending tags yet, but can I at least select the word "hashtag"? Thanks!
If you want to do this for the whole page you can work on the HTML of the body element:
$(function() {
var regExp = new RegExp("\\b(" + "hashtag" + ")\\b", "gm");
var html = $('body').html();
$('body').html(html.replace(regExp, "<a href='#'>$1</a>"));
});
Keep in mind that this may be slow if your page is large. Also, all elements will be rewritten and thus lose their event handlers etc.
If you don't want this or want to restrict the replacement to certain elements, you can select and iterate over them:
$(function() {
var regExp = new RegExp("\\b(" + "hashtag" + ")\\b", "gm");
$('div, p, span').each(function() { // use your selector of choice here
var html = $(this).html();
$(this).html(html.replace(regExp, "<a href='#'>$1</a>"));
});
});
JS :
function replaceText() {
$("*").each(function() {
if($(this).children().length==0) {
$(this).html($(this).text().replace('hashtag', '<span style="color: red;">hashtag</span>'));
}
});
}
$(document).ready(replaceText);
$("html").ajaxStop(replaceText);
HTML :
<html>
<p>
The word hashtag is in this sentence.
</p>
</html>
Fiddle : http://jsfiddle.net/zCxsY/
Source : jQuery - Find and replace text, after body was loaded
This is done with a span, but it will obviously work with an <a> tag as well.
The clean variant would be this:
$(function() {
var searchTerm = 'hashtag';
$('body *').contents()
.filter(function () {
return this.nodeType == 3
&& this.nodeValue.indexOf(searchTerm) > -1;
})
.replaceWith(function () {
var i, l, $dummy = $("<span>"),
parts = this.nodeValue.split(searchTerm);
for (i=0, l=parts.length; i<l; i++) {
$dummy.append(document.createTextNode(parts[i]));
if (i < l - 1) {
$dummy.append( $("<a>", {href: "", text: searchTerm}) );
}
}
return $dummy.contents();
})
});
It splits the value of the text node at searchTerm and re-joins the parts as a sequence of either new text nodes or <a> elements. The nodes created this way replace the respective text node.
This way all text values keep their original meaning, which cannot be guaranteed when you call replace() on them and feed them to .html() (think of text that contains HTML special characters).
See jsFiddle: http://jsfiddle.net/Tomalak/rGcxw/
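The remark about HTML special characters can be checked on plain strings. In the sketch below, escapeHtml and linkifyText are hypothetical helper names; the point is only that each piece of text is escaped before it is joined with markup, which is what creating text nodes does implicitly.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Splits at searchTerm and re-joins with an <a> element around each match,
// escaping every text fragment so it keeps its original meaning.
function linkifyText(text, searchTerm) {
  const link = '<a href="">' + escapeHtml(searchTerm) + '</a>';
  return text.split(searchTerm).map(escapeHtml).join(link);
}

console.log(linkifyText('1 < 2 hashtag', 'hashtag'));
// 1 &lt; 2 <a href="">hashtag</a>
```

Feeding the unescaped text '1 < 2 hashtag' straight into .html() would instead leave the browser to guess what the stray "<" means.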
I don't know jQuery very well, but I think you can't just say .indexOf('hashtag'); you have to iterate through the text itself, say with substring. There is probably a jQuery function that will do this for you, but that might be your problem with finding 'hashtag'.

Clean Microsoft Word Pasted Text using JavaScript

I am using a 'contenteditable' <div/> and enabling PASTE.
It is amazing how much markup gets pasted in from a clipboard copy from Microsoft Word. I am battling this, and have gotten about halfway there using Prototype's stripTags() function (which unfortunately does not seem to let me keep some tags).
However, even after that, I wind up with a mind-blowing amount of unneeded markup code.
So my question is, is there some function (using JavaScript), or approach I can use that will clean up the majority of this unneeded markup?
Here is the function I wound up writing that does the job fairly well (as far as I can tell anyway).
I am certainly open for improvement suggestions if anyone has any. Thanks.
function cleanWordPaste( in_word_text ) {
var tmp = document.createElement("DIV");
tmp.innerHTML = in_word_text;
var newString = tmp.textContent||tmp.innerText;
// this next piece converts line breaks into break tags
// and removes the seemingly endless crap code
newString = newString.replace(/\n\n/g, "<br />").replace(/.*<!--.*-->/g,"");
// this next piece removes any break tags (up to 10) at beginning
for ( var i=0; i<10; i++ ) {
if ( newString.substr(0,6)=="<br />" ) {
newString = newString.replace("<br />", "");
}
}
return newString;
}
Hope this is helpful to some of you.
You can either use the full CKEditor which cleans on paste, or look at the source.
I am using this:
$(body_doc).find('body').bind('paste',function(e){
var rte = $(this);
_activeRTEData = $(rte).html();
beginLen = $.trim($(rte).html()).length;
setTimeout(function(){
var text = $(rte).html();
var newLen = $.trim(text).length;
//identify the first char that changed to determine caret location
caret = 0;
for(i=0;i < newLen; i++){
if(_activeRTEData[i] != text[i]){
caret = i-1;
break;
}
}
var origText = text.slice(0,caret);
var newText = text.slice(caret, newLen - beginLen + caret + 4);
var tailText = text.slice(newLen - beginLen + caret + 4, newLen);
newText = newText.replace(/(.*(?:endif-->))|([ ]?<[^>]*>[ ]?)|(&nbsp;)|([^}]*})/g,'');
newText = newText.replace(/[·]/g,'');
$(rte).html(origText + newText + tailText);
$(rte).contents().last().focus();
},100);
});
body_doc is the editable iframe; if you are using an editable div you could drop the .find('body') part. Basically it detects a paste event, checks the location, cleans the new text and then places the cleaned text back where it was pasted. (Sounds confusing... but it's not really as bad as it sounds.)
The setTimeout is needed because you can't grab the text until it is actually pasted into the element; paste events fire as soon as the paste begins.
How about having a "paste as plain text" button which displays a <textarea>, allowing the user to paste the text in there? That way, all tags will be stripped for you. That's what I do with my CMS; I gave up trying to clean up Word's mess.
You can do it with regex:
- Remove the head tag
- Remove script tags
- Remove style tags
let clipboardData = event.clipboardData || window.clipboardData;
let pastedText = clipboardData.getData('text/html');
// non-greedy match up to and including the closing tag, so no stray ">" is left behind
pastedText = pastedText.replace(/<head[^>]*>[\s\S]*?<\/head\s*>/gi, '');
pastedText = pastedText.replace(/<script[^>]*>[\s\S]*?<\/script\s*>/gi, '');
pastedText = pastedText.replace(/<style[^>]*>[\s\S]*?<\/style\s*>/gi, '');
// pastedText = pastedText.replace(/<(?!(\/\s*)?(b|i|u)[>,\s])([^>])*>/g, '');
Here is the sample: https://stackblitz.com/edit/angular-u9vprc
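The three replacements can also be bundled into one function and tried on a string outside the paste handler. This is a sketch only: stripWordMarkup is a hypothetical name, the comment removal is an addition borrowed from the other answers in this thread, and like any regex approach it assumes the tags are reasonably well-formed.

```javascript
// Hypothetical helper: drops <head>, <script> and <style> blocks plus
// HTML comments (including Word's conditional comments) from pasted HTML.
function stripWordMarkup(html) {
  return html
    // \1 refers back to whichever tag name opened the block
    .replace(/<(head|script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    .replace(/<!--[\s\S]*?-->/g, '');
}

const pasted =
  '<head><title>x</title></head>' +
  '<!--[if gte mso 9]><xml></xml><![endif]-->' +
  '<p>Hi</p><style>p{margin:0}</style>';
console.log(stripWordMarkup(pasted)); // <p>Hi</p>
```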
I did something like that long ago, where I totally cleaned up the content of a rich text editor and converted font tags to styles, brs to p's, etc., to keep it consistent between browsers and prevent certain ugly things from getting in via paste. I took my recursive function and ripped out most of it except for the core logic. This might be a good starting point ("result" is an object that accumulates the result, which probably takes a second pass to convert to a string), if that is what you need:
var cleanDom = function(result, n) {
var nn = n.nodeName;
if(nn=="#text") {
var text = n.nodeValue;
}
else {
if(nn=="A" && n.href)
...;
else if(nn=="IMG" && n.src) {
....
}
else if(nn=="DIV") {
if(n.className=="indent")
...
}
else if(nn=="FONT") {
}
else if(nn=="BR") {
}
if(!UNSUPPORTED_ELEMENTS[nn]) {
if(n.childNodes.length > 0)
for(var i=0; i<n.childNodes.length; i++)
cleanDom(result, n.childNodes[i]);
}
}
}
This works great to remove any comments from HTML text, including those from Word:
function CleanWordPastedHTML(sTextHTML) {
var sStartComment = "<!--", sEndComment = "-->";
while (true) {
var iStart = sTextHTML.indexOf(sStartComment);
if (iStart == -1) break;
var iEnd = sTextHTML.indexOf(sEndComment, iStart);
if (iEnd == -1) break;
sTextHTML = sTextHTML.substring(0, iStart) + sTextHTML.substring(iEnd + sEndComment.length);
}
return sTextHTML;
}
Had a similar issue with line-breaks being counted as characters and I had to remove them.
$(document).ready(function(){
$(".section-overview textarea").bind({
paste : function(){
setTimeout(function(){
//textarea
var text = $(".section-overview textarea").val();
// look for any "\n" occurrences and remove them
var newString = text.replace(/\n/g, '');
// print new string
$(".section-overview textarea").val(newString);
},100);
}
});
});
Could you paste to a hidden textarea, copy from same textarea, and paste to your target?
Hate to say it, but I eventually gave up making TinyMCE handle Word crap the way I want. Now I just have an email sent to me every time a user's input contains certain HTML (look for <span lang="en-US"> for example) and I correct it manually.

JavaScript to add HTML tags around content

I was wondering if it is possible to use JavaScript to add a <div> tag around a word in an HTML page.
I have a JS search that searches a set of HTML files and returns a list of files that contain the keyword. I'd like to be able to dynamically add a <div class="highlight"> around the keyword so it stands out.
If an alternate search is performed, the original <div>'s will need to be removed and new ones added. Does anyone know if this is even possible?
Any tips or suggestions would be really appreciated.
Cheers,
Laurie.
In general you will need to parse the html code in order to ensure that you are only highlighting keywords and not invisible text or code (such as alt text attributes for images or actual markup). If you do as Jesse Hallett suggested:
$('body').html($('body').html().replace(/(pretzel)/gi, '<b>$1</b>'));
You will run into problems with certain keywords and documents. For example:
<html>
<head><title>A history of tables and tableware</title></head>
<body>
<p>The table has a fantastic history. Consider the following:</p>
<table><tr><td>Year</td><td>Number of tables made</td></tr>
<tr><td>1999</td><td>12</td></tr>
<tr><td>2009</td><td>14</td></tr>
</table>
<img src="/images/a_grand_table.jpg" alt="A grand table from designer John Tableius">
</body>
</html>
This relatively simple document might be found by searching for the word "table", but if you just replace text with wrapped text you could end up with this:
<<span class="highlight">table</span>><tr><td>Year</td><td>Number of <span class="highlight">table</span>s made</td></tr>
and this:
<img src="/images/a_grand_<span class="highlight">table</span>.jpg" alt="A grand <span class="highlight">table</span> from designer John <span class="highlight">Table</span>ius">
This means you need parsed HTML. And parsing HTML is tricky. But if you can assume a certain quality control over the html documents (i.e. no open-angle-brackets without closing angle brackets, etc) then you should be able to scan the text looking for non-tag, non-attribute data that can be further-marked-up.
Here is some Javascript which can do that:
function highlight(word, text) {
var result = '';
//char currentChar;
var csc; // current search char
var wordPos = 0;
var textPos = 0;
var partialMatch = ''; // container for partial match
var inTag = false;
// iterate over the characters in the array
// if we find an HTML element, ignore the element and its attributes.
// otherwise try to match the characters to the characters in the word
// if we find a match append the highlight text, then the word, then the close-highlight
// otherwise, just append whatever we find.
for (textPos = 0; textPos < text.length; textPos++) {
csc = text.charAt(textPos);
if (csc == '<') {
inTag = true;
result += partialMatch;
partialMatch = '';
wordPos = 0;
}
if (inTag) {
result += csc ;
} else {
var currentChar = word.charAt(wordPos);
if (csc == currentChar && textPos + (word.length - wordPos) <= text.length) {
// we are matching the current word
partialMatch += csc;
wordPos++;
if (wordPos == word.length) {
// we've matched the whole word
result += '<span class="highlight">';
result += partialMatch;
result += '</span>';
wordPos = 0;
partialMatch = '';
}
} else if (wordPos > 0) {
// we thought we had a match, but we don't, so append the partial match and move on
result += partialMatch;
result += csc;
partialMatch = '';
wordPos = 0;
} else {
result += csc;
}
}
if (inTag && csc == '>') {
inTag = false;
}
}
return result;
}
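A more compact variant of the same idea, offered as a sketch under the same assumption (every "<" has a matching ">"), is to split the markup on tags and only touch the pieces in between. highlightOutsideTags and clearHighlights are hypothetical names; the second function covers the requirement from the question that old highlights be removed before a new search.

```javascript
// Wraps `word` in a highlight span, but only in text between tags.
function highlightOutsideTags(word, html) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(escaped, 'g');
  return html
    .split(/(<[^>]*>)/) // the capturing group keeps the tags, at odd indexes
    .map((part, i) =>
      i % 2 === 1
        ? part // a tag: leave names and attributes untouched
        : part.replace(pattern, m => '<span class="highlight">' + m + '</span>'))
    .join('');
}

// Removes the spans again; safe here because they only ever wrap plain text.
function clearHighlights(html) {
  return html.replace(/<span class="highlight">([^<]*)<\/span>/g, '$1');
}

const doc = '<p>The table has tables</p><img alt="A grand table">';
const marked = highlightOutsideTags('table', doc);
console.log(marked);
console.log(clearHighlights(marked) === doc); // true
```

Like the function above, this matches case-sensitively and would also match inside <title> or <script> content, so it is a starting point rather than a complete highlighter.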
Wrapping is pretty easy with jQuery:
$('span').wrap('<div class="highlight"></div>'); // wraps each span in a div.highlight
Then, to remove, something like this:
$('div.highlight').each(function(){ $(this).after( $(this).text() ); }).remove();
Sounds like you will have to do some string splitting, though, so wrap may not work unless you want to pre-wrap all your words with some tag (ie. span).
The DOM API does not provide a super easy way to do this. As far as I know the best solution is to read text into JavaScript, use replace to make the changes that you want, and write the entire content back. You can do this either one HTML node at a time, or modify the whole <body> at once.
Here is how that might work in jQuery:
$('body').html($('body').html().replace(/(pretzel)/gi, '<b>$1</b>'));
Couldn't you just write a selector like this to wrap it all?
$("* :contains('foo')").wrap("<div class='bar'></div>");
adam wrote the code above to do the removal:
$('div.bar').each(function(){ $(this).after( $(this).text() ); }).remove();
edit: on second thought, the first statement returns an element which would wrap the element with the div tag and not the sole word. maybe a regex replace would be a better solution here.
