Chrome extension breaks DOM - javascript

I'm making a Chrome extension that replaces certain text on a page with new text and a link. To do this I'm using document.body.innerHTML, which I've read breaks the DOM. When the extension is enabled it seems to break the loading of YouTube videos and pages at codepen.io. I've tried to fix this by excluding YouTube and codepen in the manifest, and by filtering them out in the code below, but it doesn't seem to be working.
Can anyone suggest an alternative to using document.body.innerHTML or see other problems in my code that may be breaking page loads? Thanks.
var texts=["some text","more text"];
if(!window.location.href.includes("www.google.com")||!window.location.href.includes("youtube.com")||!window.location.href.includes("codepen.io")){
for(var i=0;i<texts.length;i++){
if(document.documentElement.textContent || document.documentElement.innerText.includes(texts[i])){
var regex = new RegExp(texts[i],'g');
document.body.innerHTML = document.body.innerHTML.replace(regex,
"<a href='https://www.somesite.org'>replacement text</a>");
}
}
}

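Using innerHTML to do this is like using a shotgun to do brain surgery. Assigning to document.body.innerHTML makes the browser throw away every element in the body and rebuild it from the string, so event listeners and any state that page scripts attached to the old elements are lost. The regex also matches inside attributes and scripts, not just visible text, so the result can even be invalid HTML. You will end up having to whitelist every single website that uses any JavaScript at this rate, which is obviously not feasible.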
The correct way to do it is to not touch innerHTML at all. Recursively iterate through all the DOM nodes (using firstChild, nextSibling) on the page and look for matches in text nodes. When you find one, replace that single node (replaceChild) with your link (createElement), and new text nodes (createTextNode, appendChild, insertBefore) for any leftover bits.
Essentially you will want to look for a node like:
Text: this is some text that should be linked
And programmatically replace it with nodes like:
Text: this is
Element: a href="..."
Text: replacement text
Text: that should be linked
Additionally if you want to support websites that generate content with JavaScript you'll have to run this replacement process on dynamically inserted content as well. A MutationObserver would be one way to do that, but bear in mind this will probably slow down websites.
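A minimal sketch of the node-by-node approach described above, assuming the needle is a literal string rather than a regex. It uses document.createTreeWalker instead of hand-written firstChild/nextSibling recursion (a swapped-in technique, same idea). The function names splitAroundMatches, linkifyTextNode and linkifyAll are made up for illustration, and the link target and label are the placeholders from the question.

```javascript
// Split a string into plain pieces and matches of `needle` (a literal, not a regex).
function splitAroundMatches(text, needle) {
  const parts = [];
  let from = 0;
  let at = needle ? text.indexOf(needle, from) : -1; // empty needle: no matches
  while (at !== -1) {
    if (at > from) parts.push({ match: false, value: text.slice(from, at) });
    parts.push({ match: true, value: needle });
    from = at + needle.length;
    at = text.indexOf(needle, from);
  }
  if (from < text.length) parts.push({ match: false, value: text.slice(from) });
  return parts;
}

// Replace one text node with a mix of text nodes and <a> elements. Needs a DOM.
function linkifyTextNode(node, needle, href, label) {
  const parts = splitAroundMatches(node.nodeValue, needle);
  if (!parts.some(p => p.match)) return; // nothing to do, leave the node alone
  const frag = document.createDocumentFragment();
  for (const p of parts) {
    if (p.match) {
      const a = document.createElement('a');
      a.href = href;
      a.textContent = label;
      frag.appendChild(a);
    } else {
      frag.appendChild(document.createTextNode(p.value));
    }
  }
  node.parentNode.replaceChild(frag, node);
}

// Collect the text nodes first, then edit them, so the walk is not disturbed.
function linkifyAll(root, needle, href, label) {
  // Only the direct parent is checked here; a fuller version would check ancestors too.
  const skip = new Set(['SCRIPT', 'STYLE', 'TEXTAREA', 'A']);
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) {
    const parent = walker.currentNode.parentNode;
    if (parent && !skip.has(parent.nodeName)) nodes.push(walker.currentNode);
  }
  nodes.forEach(n => linkifyTextNode(n, needle, href, label));
}

// Only runs where a DOM exists (e.g. in a content script).
if (typeof document !== 'undefined' && document.body) {
  linkifyAll(document.body, 'some text', 'https://www.somesite.org', 'replacement text');
}
```

Because only individual text nodes are swapped out, the rest of the page keeps its elements and event listeners. Separately, note that the host check in the question joins its negated tests with ||, which is true for practically every URL; && is presumably what was meant.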

Related

Document.write nondestructive alternative [duplicate]

In tutorials I've learnt to use document.write. Now I understand that by many this is frowned upon. I've tried print(), but then it literally sends it to the printer.
So what are alternatives I should use, and why shouldn't I use document.write? Both w3schools and MDN use document.write.
The reason that your HTML is replaced is because of an evil JavaScript function: document.write().
It is most definitely "bad form." It only does what you expect while the page is still loading; if you call it after the page has loaded, it replaces your entire document with the input. And in XHTML documents served as XML it does not work at all.
the problem:
document.write writes to the document stream. Calling document.write on a closed (or loaded) document automatically calls document.open which will clear the document.
-- quote from the MDN
document.write() has two henchmen, document.open(), and document.close(). When the HTML document is loading, the document is "open". When the document has finished loading, the document has "closed". Using document.write() at this point will erase your entire (closed) HTML document and replace it with a new (open) document. This means your webpage has erased itself and started writing a new page - from scratch.
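To make the open/closed distinction concrete, here is a rough sketch of a guard that only calls document.write while the document is still being parsed and appends to the body otherwise. The function name writeOrAppend and the injectable doc parameter are assumptions for illustration, and checking readyState is only an approximation: it does not cover every case (for example, browsers treat writes from async scripts differently).

```javascript
// Write into the stream while the page is still parsing; otherwise append,
// so a finished document is never wiped. `doc` defaults to the global document
// and can be replaced by a stand-in object outside a browser.
function writeOrAppend(html, doc = globalThis.document) {
  if (doc.readyState === 'loading') {
    doc.write(html); // document is still "open": safe to write
    return 'written';
  }
  // Document is "closed": document.write would reopen and clear it.
  doc.body.insertAdjacentHTML('beforeend', html);
  return 'appended';
}
```

The appended markup lands at the end of the body rather than at the script's position, so this is a safety net, not an exact replacement.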
I believe document.write() causes the browser to have a performance decrease as well (correct me if I am wrong).
an example:
This example writes output to the HTML document after the page has loaded. Watch document.write()'s evil powers clear the entire document when you press the "exterminate" button:
I am an ordinary HTML page. I am innocent, and purely for informational purposes. Please do not <input type="button" onclick="document.write('This HTML page has been successfully exterminated.')" value="exterminate"/>
me!
the alternatives:
.innerHTML This is a wonderful alternative, but this property has to be set on the element where you want to put the text.
Example: document.getElementById('output1').innerHTML = 'Some text!';
.createTextNode() is the alternative recommended by the W3C.
Example: var para = document.createElement('p');
para.appendChild(document.createTextNode('Hello, '));
document.body.appendChild(para); // attach the paragraph so it actually appears on the page
NOTE: This is known to have some performance decreases (slower than .innerHTML). I recommend using .innerHTML instead.
the example with the .innerHTML alternative:
I am an ordinary HTML page.
I am innocent, and purely for informational purposes.
Please do not
<input type="button" onclick="document.getElementById('output1').innerHTML = 'There was an error exterminating this page. Please replace <code>.innerHTML</code> with <code>document.write()</code> to complete extermination.';" value="exterminate"/>
me!
<p id="output1"></p>
Here is code that should replace document.write in-place:
document.write=function(s){
var scripts = document.getElementsByTagName('script');
var lastScript = scripts[scripts.length-1];
lastScript.insertAdjacentHTML("beforebegin", s);
}
You can combine the insertAdjacentHTML method with the document.currentScript property.
The insertAdjacentHTML() method of the Element interface parses the specified text as HTML or XML and inserts the resulting nodes into the DOM tree at a specified position:
'beforebegin': Before the element itself.
'afterbegin': Just inside the element, before its first child.
'beforeend': Just inside the element, after its last child.
'afterend': After the element itself.
The document.currentScript property returns the <script> element whose script is currently being processed. The best position is beforebegin: new HTML is inserted before the <script> itself. To match document.write's native behavior, one would use afterend, but then the nodes from consecutive calls end up in reverse order rather than in the order you called them (as document.write would place them). The order in which your HTML appears is probably more important than where it is placed relative to the <script> tag, hence the use of beforebegin.
document.currentScript.insertAdjacentHTML(
'beforebegin',
'This is a document.write alternative'
)
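The ordering point is easier to see in a toy model. In the sketch below a plain array stands in for the parent's child list and the string 'SCRIPT' marks the running script; insertRelativeToScript is a made-up helper, not a DOM API, and it only imitates where insertAdjacentHTML would put each piece.

```javascript
// Toy model: an array stands in for the parent's children, 'SCRIPT' marks the <script>.
function insertRelativeToScript(children, position, item) {
  const i = children.indexOf('SCRIPT');
  if (i === -1) throw new Error('no SCRIPT marker in list');
  if (position === 'beforebegin') children.splice(i, 0, item);   // right before the script
  else if (position === 'afterend') children.splice(i + 1, 0, item); // right after the script
  else throw new Error('unsupported position: ' + position);
  return children;
}

const before = ['SCRIPT'];
['first', 'second', 'third'].forEach(s => insertRelativeToScript(before, 'beforebegin', s));
console.log(before.join(' ')); // first second third SCRIPT

const after = ['SCRIPT'];
['first', 'second', 'third'].forEach(s => insertRelativeToScript(after, 'afterend', s));
console.log(after.join(' ')); // SCRIPT third second first
```

With beforebegin each new piece lands between the previous one and the script, so call order is preserved; with afterend each new piece lands directly after the script, pushing earlier pieces further away.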
As a recommended alternative to document.write you could use DOM manipulation to directly query and add node elements to the DOM.
Just dropping a note here to say that, although using document.write is highly frowned upon due to performance concerns (synchronous DOM injection and evaluation), there is also no actual 1:1 alternative if you are using document.write to inject script tags on demand.
There are a lot of great ways to avoid having to do this (e.g. script loaders like RequireJS that manage your dependency chains) but they are more invasive and so are best used throughout the site/application.
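For reference, a common non-1:1 way to inject a script on demand is to create the element yourself, sketched below. Unlike document.write, this does not block the parser: the script runs some time later, so anything that depends on it has to wait for the returned promise. The name loadScript and the injectable doc parameter are assumptions for illustration, and the URLs in the usage comment are placeholders.

```javascript
// Load a script by URL and resolve once it has run.
// `doc` defaults to the global document; a stand-in can be passed outside a browser.
function loadScript(src, doc = globalThis.document) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement('script');
    script.src = src;
    script.async = false; // scripts added this way then execute in insertion order
    script.onload = () => resolve(script);
    script.onerror = () => reject(new Error('Failed to load ' + src));
    doc.head.appendChild(script);
  });
}

// Usage (placeholder URLs):
// loadScript('https://example.com/a.js')
//   .then(() => loadScript('https://example.com/b.js'))
//   .catch(err => console.error(err));
```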
I fail to see the problem with document.write. If you are using it before the onload event fires, as you presumably are, to build elements from structured data for instance, it is the appropriate tool to use. There is no performance advantage to using insertAdjacentHTML or explicitly adding nodes to the DOM after it has been built. I just tested it three different ways with an old script I once used to schedule incoming modem calls for a 24/7 service on a bank of 4 modems.
By the time it is finished this script creates over 3000 DOM nodes, mostly table cells. On a 7 year old PC running Firefox on Vista, this little exercise takes less than 2 seconds using document.write from a local 12kb source file and three 1px GIFs which are re-used about 2000 times. The page just pops into existence fully formed, ready to handle events.
Using insertAdjacentHTML is not a direct substitute as the browser closes tags which the script requires remain open, and takes twice as long to ultimately create a mangled page. Writing all the pieces to a string and then passing it to insertAdjacentHTML takes even longer, but at least you get the page as designed. Other options (like manually re-building the DOM one node at a time) are so ridiculous that I'm not even going there.
Sometimes document.write is the thing to use. The fact that it is one of the oldest methods in JavaScript is not a point against it, but a point in its favor - it is highly optimized code which does exactly what it was intended to do and has been doing since its inception.
It's nice to know that there are alternative post-load methods available, but it must be understood that these are intended for a different purpose entirely; namely modifying the DOM after it has been created and memory allocated to it. It is inherently more resource-intensive to use these methods if your script is intended to write the HTML from which the browser creates the DOM in the first place.
Just write it and let the browser and interpreter do the work. That's what they are there for.
PS: I just tested using an onload param in the body tag and even at this point the document is still open and document.write() functions as intended. Also, there is no perceivable performance difference between the various methods in the latest version of Firefox. Of course there is a ton of caching probably going on somewhere in the hardware/software stack, but that's the point really - let the machine do the work. It may make a difference on a cheap smartphone though. Cheers!
The question depends on what you are actually trying to do.
Usually, instead of doing document.write you can use someElement.innerHTML or, better, document.createElement with someElement.appendChild.
You can also consider using a library like jQuery and using the modification functions in there: http://api.jquery.com/category/manipulation/
This is probably the most correct, direct replacement: insertAdjacentHTML.
Try using getElementById() or getElementsByName() to access a specific element, then set its innerHTML property:
<html>
<body>
<div id="myDiv1"></div>
<div id="myDiv2"></div>
</body>
<script type="text/javascript">
var myDiv1 = document.getElementById("myDiv1");
var myDiv2 = document.getElementById("myDiv2");
myDiv1.innerHTML = "<b>Content of 1st DIV</b>";
myDiv2.innerHTML = "<i>Content of second DIV element</i>";
</script>
</html>
Use
var documentwrite =(value, method="", display="")=>{
switch(display) {
case "block":
var x = document.createElement("p");
break;
case "inline":
var x = document.createElement("span");
break;
default:
var x = document.createElement("p");
}
var t = document.createTextNode(value);
x.appendChild(t);
if(method==""){
document.body.appendChild(x);
}
else{
document.querySelector(method).appendChild(x);
}
}
and call the function based on your requirement as below
documentwrite("My sample text"); //print value inside body
documentwrite("My sample text inside id", "#demoid", "block"); // print value inside id and display block
documentwrite("My sample text inside class", ".democlass","inline"); // print value inside class and display inline
I'm not sure if this will work exactly, but I thought of
var docwrite = function(doc) {
document.write(doc);
};
This solved the problem with the error messages for me.

How can I make my Javascript code select all 'b' tags on a page instead of only the first one?

We have a very large website that is quite old and has a lot of 'b' tags. My boss wants to change them to 'strong' tags but this will require a lot of time to change manually so she was hoping we could change it with some code.
I had a nice bit of jQuery code that worked (intermittently), but I couldn't get it to work on the site as it uses jQuery 1.9.1 and cannot be upgraded.
I then found this piece of JavaScript which does what I need but only works on the first 'b' tag on the page; all others stay as 'b' tags. I don't know enough about JavaScript selectors to change the firstChild selector.
<script>
function replaceElement(source, newType) {
// Create the document fragment
const frag = document.createDocumentFragment();
// Fill it with what's in the source element
while (source.firstChild) {
frag.appendChild(source.firstChild);
}
// Create the new element
const newElem = document.createElement(newType);
// Empty the document fragment into it
newElem.appendChild(frag);
// Replace the source element with the new element on the page
source.parentNode.replaceChild(newElem, source);
}
// Replace the first <b> with a <strong>
replaceElement(document.querySelector('b'), 'strong');
</script>
You might use querySelectorAll:
Array.from(document.querySelectorAll('b')).forEach(e=>{
replaceElement(e, 'strong');
});
But this is really an XY problem. You really should make the change server side, for example by using some search/replace (learn to use your code editor). You're adding to the technical debt here.
Note also that there's no obvious reason to prefer strong over b in HTML5.
Use getElementsByTagName(). It's more efficient than querySelectorAll because it doesn't have to parse selectors, and it describes better what you are really trying to do - get elements by tag name.
var elements = document.getElementsByTagName("b");
// The collection is live: each replaced <b> drops out of it,
// so keep replacing the first item until none are left.
while (elements.length > 0) {
  replaceElement(elements[0], "strong");
}
Alternatively, take a static snapshot of the collection with Array.from() and iterate over that.
You would be better off finding the source of the <b> tags and changing them there as Denys has mentioned.
Updating the DOM would have little benefit and would cause performance issues when there are many tags on a page.
Does this system use a CMS or database to store the content? I would look to use something like these 2 SQL queries to replace them across the site:
update content_table set content_column = replace(content_column, '<b>','<strong>');
update content_table set content_column = replace(content_column, '</b>','</strong>');

JavaScript Library/Function to find Unclosed HTML Tags

I am currently looking for a solution to find and list out any unclosed HTML tags from an arbitrary slice of raw HTML. I don't feel like this should be an awful problem, but I cannot seem to find something that does it in JS. Unfortunately, this needs to be client-side since it is being used for rendering annotations to HTML pages. Obviously, annotations are somewhat nasty business, since they select or apply formatting that may apply to only part of an HTML element (i.e., a markup overlaid onto an existing HTML markup).
One simple use-case is where you might want to only render part of an HTML page, but then inject the rest later. For example, imagine a hypothetical segment:
<p>This is my text <StartDelayedInject/> with a comment I added. </p>
<p> But it doesn't exist until now. </p> <StopDelayedInject/>
I'll be doing some pre-processing to rebuild the HTML so that I wrap partial elements into span-type elements that apply the appropriate formatting. Initially this would be parsed in the form:
<p><span>This is my text</span></p>
After some user action, it would then be modified to a form such as:
<p><span>This is my text</span><span>with a comment I added.</span></p>
<p>But it doesn't exist until now.</p>
This is a very simplified example case (obviously things like ul elements and tables get hairier), but it gives the general principle. However, to do this effectively, I need to be able to check a segment of HTML and figure out which tags have been opened but not closed. If I know that, I can wrap the last unterminated text data into a span, close the unclosed tag, and know to return to that point to inject the remainder of the content when needed. I also need to know which tags were still open so that when I inject or modify another segment of content, I can put it in the right place (e.g., get "with a comment I added." into the first paragraph).
From my understanding of context-free grammars, this should be a relatively trivial task: each time you enter or exit a tag, you push to or pop from a stack of the tags opened but not yet closed. With that said, I'd much rather use a library that's a more mature solution than write a naive parser for that purpose. I'd assume there's some JS HTML parser around that would do this, right? Plenty of them know how to close tags, so clearly at some point they calculate this.
The problem is that JavaScript only has access to the html in two ways:
In a sense that each element is an object with properties and methods created by the browser on page load.
In a sense that it is a string of text.
Using the first method of interfacing with html, there is no way to detect unclosed tags as you only have access to the objects that the browser creates for you after it parses the html.
Using the second method, you would have to run the entire string of html through an html parser. Some people might assume you could do it simply with a regular expression; however, this is not feasible. I refer you to this fantastic Stack Overflow question.
Even if you found a really robust html parser to use, you would still run into the problem created by the fact that, before your JavaScript even touches it, the browser will have attempted to parse the potentially broken html and there could be errors everywhere.
Edit:
If you like the parser idea, John Resig created this example one you might want to reference.
Not perfect but here's my quick method for checking for mismatch between open/close tags:
function find_unclosed_tags(str) {
str = str.toLowerCase();
var tags = ["a", "span", "div", "ul", "li", "h1", "h2", "h3", "h4", "h5", "h6", "p", "table", "tr", "td", "b", "i", "u"];
var mismatches = [];
tags.forEach(function(tag) {
var pattern_open = '<'+tag+'( |>)';
var pattern_close = '</'+tag+'>';
var diff_count = (str.match(new RegExp(pattern_open,'g')) || []).length - (str.match(new RegExp(pattern_close,'g')) || []).length;
if(diff_count != 0) {
mismatches.push("Open/close mismatch for tag " + tag + ".");
}
});
return mismatches;
}
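Counting only reports that something is unbalanced. The stack idea from the question also gives the order of the tags that are still open. Below is a naive sketch of it, assuming reasonably tidy markup: it scans tags with a regular expression, so it does not handle comments, script or style contents, a ">" inside an attribute value, or HTML's implied end tags (such as an unclosed li). The names findOpenTags and VOID_TAGS are made up for illustration.

```javascript
// Elements that never take a closing tag.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

// Return the names of tags still open at the end of `html`, outermost first.
function findOpenTags(html) {
  const stack = [];
  // Groups: 1 = "/" for a closing tag, 2 = tag name, 3 = "/" for a self-closing tag.
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*?(\/?)>/g;
  let m;
  while ((m = tagPattern.exec(html)) !== null) {
    const closing = m[1] === '/';
    const name = m[2].toLowerCase();
    const selfClosing = m[3] === '/';
    if (closing) {
      // Pop back to the most recent matching open tag; stray closers are ignored.
      const at = stack.lastIndexOf(name);
      if (at !== -1) stack.length = at; // also drops anything opened inside it
    } else if (!selfClosing && !VOID_TAGS.has(name)) {
      stack.push(name);
    }
  }
  return stack;
}

console.log(findOpenTags('<p>This is my text '));          // [ 'p' ]
console.log(findOpenTags('<div><p>text</p><ul><li>item')); // [ 'div', 'ul', 'li' ]
```

Closing the returned tags in reverse order gives a well-formed fragment, and the same list tells you where to resume when the rest of the content is injected.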

replace text function not working in explorer

I have a js replace function to replace text next to two radio buttons on a pre set form.
Script is as follows.
document.body.innerHTML=document.body.innerHTML.replace("Payment by <b>Moneybookers</b>
e-wallet<br>","");
document.body.innerHTML=document.body.innerHTML.replace("Maestro, Visa and other credit/debit cards by <b>Moneybookers</b>","Pago con Diners Club, Mastercard o Visa");}onload=x;
The script works fine in Chrome and Firefox, however, the script is not actioned in Explorer.
I believe it has something to do with there being , / - within the text I am replacing? When I use the function to replace text with no , / - in it, it works fine in Explorer. However, when I try to replace text such as "Maestro, Visa and other credit/debit cards by Moneybookers", it does not work in Explorer. I'm assuming this is because of the comma and forward slash. Honestly I've tried everything but just cannot get this to work. Any help would be greatly appreciated!!
Not sure whether it's related (I'm a Mac user without IE) but you shouldn't use multiline strings. Use \n instead.
What is returned by innerHTML varies from one browser to another, because older browsers did not follow a common standard for it (the content will be the same, but the way it is written out can be different). Doing a replace like that is likely to fail in some browsers. You should take another approach to do your replace.
A better approach would be to wrap the text you want to replace in a span; this way you can more easily target the content you want to replace.
<span id="thatFirstThing">Payment by <b>Moneybookers</b>e-wallet<br></span>
And afterwards you can do
document.getElementById("thatFirstThing").innerHTML = "";
P.S.: Doing an innerHTML replace on the body also has a huge side effect. Since you are replacing the content of your whole page, all the event handlers that were bound on your page will disappear.
Edit: If you can't modify the HTML page, it's a little bit trickier, because the DOM is not well suited to this kind of thing. What you could do is target the parent element by navigating through the DOM with document.getElementById and childNodes. Once you have your parent element, just write the new content you want, without doing a replace.
In the end it would look something like this :
document.getElementById("someSection").childNodes[0].childNodes[1].childNodes[0].innerHTML = "";

