Regex to remove <script> and its content in Javascript

I am trying to remove scripts and their content from an HTML body, and this is what I have come up with so far:
just_text = just_text.replace(/<\s*script[^>]*>(<\s*\/script[^>]*>|$)/ig, '');
It does not work the way I want it to; I still get the content.
Can you please help me?
Thank you

The answer to such questions is always the same: Don't use regular expressions. Instead, parse the HTML, modify the DOM and serialize it back to HTML if you need to.
Example:
var container = document.createElement('div');
container.innerHTML = just_text;
// find and remove `script` elements
var scripts = container.getElementsByTagName('script');
for (var i = scripts.length; i--; ) {
    scripts[i].parentNode.removeChild(scripts[i]);
}
just_text = container.innerHTML;
If you want to remove the script tags from the page itself, it's basically the same:
var scripts = document.body.getElementsByTagName('script');
for (var i = scripts.length; i--; ) {
    scripts[i].parentNode.removeChild(scripts[i]);
}
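For what it's worth, here is a small sketch of why the pattern in the question leaves the content in place. After the opening tag it only accepts a closing tag or the end of the string, so it can never match a script that has anything between its two tags. The sample strings are made up for illustration.

```javascript
// The pattern from the question, unchanged.
var pattern = /<\s*script[^>]*>(<\s*\/script[^>]*>|$)/ig;

// Hypothetical sample inputs, for illustration only.
var emptyScript = '<p>a</p><script></script><p>b</p>';
var withContent = '<p>a</p><script>alert(1)</script><p>b</p>';

// An empty script element matches, because the closing tag follows
// the opening tag immediately.
var emptyResult = emptyScript.replace(pattern, '');

// A script with content does not match, because "alert(1)" is neither
// a closing tag nor the end of the string.
var contentResult = withContent.replace(pattern, '');

console.log(emptyResult);   // <p>a</p><p>b</p>
console.log(contentResult); // <p>a</p><script>alert(1)</script><p>b</p>
```

That is the "I still get the content" behaviour from the question, and one more reason to go through the DOM instead.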

Related

How to extract all html text excluding what's inside the script tag?

For a project I want to create a variable that stores all the text within the html, so pretty much everything between tags, titles, paragraphs, everything visible for a user on a webpage. However I don't want my javascript code that's between the script tag to show up in this output too.
I was trying with something like this:
var content = $("html").remove("script").text()
But this is not working.

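First remove the script elements:
var r = document.getElementsByTagName('script');
// iterate backwards: the collection is live and shrinks as elements are removed
for (var i = (r.length-1); i >= 0; i--) {
    // keep the script with id "a"; remove all the others
    if(r[i].getAttribute('id') != 'a'){
        r[i].parentNode.removeChild(r[i]);
    }
}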
And then:
var txt = document.body.innerText;
OR
var txt = $('body').text();

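// getBodyText is a hypothetical wrapper name; the return statement needs an enclosing function
function getBodyText() {
    // work on a clone so the scripts in the live page are left untouched
    var contentDiv = $('<div/>', {
        html: $('body').clone()
    });
    contentDiv.find('script').remove();
    return contentDiv.text();
}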

escape blacklist html tags outside of code tags

I have a (white)list of tags that I will allow as HTML outside of code tags. For any HTML written within <code> tags, I want to use regex in JavaScript to replace the < and > characters with &#60; and &#62;.
So the <b> symbols should be replaced within <code><b>bold</b></code>, but not the <code> tags themselves - they should remain as html.
I don't want to allow the <script> tag outside of the <code> block, so I won't have 'script' in my whitelist. If the script tag is within the code tags, then the ASCII replacement should take place.
This is quite similar to how the RTE works here on Stack Overflow.
I need to do this all client-side using javascript + regex. Any help would be much appreciated.
Thanks

Basically, you can do something like this:
function changeCode() {
    var codeTags = document.getElementsByTagName('code');
    for(var i = 0; i < codeTags.length; i++) {
        var current = codeTags[i];
        current.innerHTML = current.innerHTML.replace(/</g, "&lt;").replace(/>/g, "&gt;");
    }
}
window.onload = changeCode; // runs once the page has finished loading
It still needs some work: it only rewrites the HTML of each <code> tag, escaping the < and > characters with a simple regex. You should also add something like a common class or name, so that the loop runs only on the code tags you want:
for(var i = 0; i < codeTags.length; i++) {
    var current = codeTags[i];
    // skip code tags that have neither the class nor the name
    if (current.className.indexOf('someclass') === -1 && current.getAttribute('name') !== 'somename') continue;
    current.innerHTML = current.innerHTML.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}
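If you have the code as a plain string before it goes into the page, the same escaping can be done with a small helper. This is only a sketch: escapeHtml is a made-up name, not something from the code above, and it assumes the input is raw text rather than markup that has already been escaped. It also escapes &, which has to happen first so that the entities it produces are not escaped a second time.

```javascript
// Hypothetical helper (the name is an assumption, not part of the answer).
// Escapes the characters that would otherwise be read as markup.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')   // must run first, see note above
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Made-up sample input for illustration.
var escaped = escapeHtml('<b>bold</b> & <script>');
console.log(escaped); // &lt;b&gt;bold&lt;/b&gt; &amp; &lt;script&gt;
```

Note that calling it on text that is already escaped will escape it again, so it should be applied exactly once.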

Get the DOM element that represents the script tag where the running code is embedded

Example:
<script>
// I want to have the dom element of the script tag that is holding the code that goes here
</script>
If I do:
<script>
alert(this)
</script>
this is the window.
I want to access the script tag itself.

This should find the <script> elements you have in your page.
var scripts = document.getElementsByTagName("script");
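for (var i = 0; i < scripts.length; i++) {
    var script = scripts[i];
    // Now you can do stuff with script
}
If you want the one the code is in, I think you can do this. It relies on the script running while the page is still being parsed, when it is the last script element in the document so far: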
var scripts = document.getElementsByTagName("script");
var script = scripts[scripts.length - 1];
Just curious: what do you want to do with it once you've got it?

Found in another Stack Overflow answer (https://stackoverflow.com/a/6933817); maybe it solves your problem:
var target = document.documentElement; // start at the root element
while (target.childNodes.length && target.lastChild.nodeType == 1) { // find last HTMLElement child node
    target = target.lastChild;
}
// target is now the script element
alert(target.parentNode); // this is p

Turn jQuery into vanilla JS - Insert p element after h1

Any ideas on how I would convert this jQuery to vanilla JS:
$('.section > h1').after('<p>This paragraph was inserted with jQuery</p>');
I am new to jQuery and even newer to vanilla JS.
This is as far as I got:
var newP = document.createElement('p');
var pTxt = document.createTextNode('This paragraph was inserted with JavaScript');
var header = document.getElementsByTagName('h1');
Not sure where to go from here?

jQuery does a lot for you behind the scenes. The equivalent plain DOM code might look something like this:
// Get all header elements
var header = document.getElementsByTagName('h1'),
    parent,
    newP,
    text;
// Loop through the elements
for (var i=0, m = header.length; i < m; i++) {
    parent = header[i].parentNode;
    // Check for "section" in the parent's classname
    if (/(?:^|\s)section(?:\s|$)/i.test(parent.className)) {
        newP = document.createElement("p");
        text = document.createTextNode('This paragraph was inserted with JavaScript');
        newP.appendChild(text);
        // Insert the new P element after the header element in its parent node
        parent.insertBefore(newP, header[i].nextSibling);
    }
}
Note that you can also use textContent/innerText instead of creating the text node. It's good that you're trying to learn how to directly manipulate the DOM rather than just letting jQuery do all the work. It's nice to understand this stuff, just remember that jQuery and other frameworks are there to lighten these loads for you :)

You might find this function useful (I didn't test)
function insertAfter(node, referenceNode) {
    referenceNode.parentNode.insertBefore(node, referenceNode.nextSibling);
}

Oh it's not so bad...
var h1s = document.getElementsByTagName('h1');
for (var i=0, l=h1s.length; i<l; i++) {
    var h1 = h1s[i], parent = h1.parentNode;
    if (parent.className.match(/\bsection\b/i)) {
        var p = document.createElement('p');
        p.innerHTML = 'This paragraph was inserted with JavaScript';
        parent.insertBefore(p, h1.nextSibling);
    }
}
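One thing to watch with the \bsection\b test above: a hyphen counts as a word boundary, so it also matches class names such as sub-section. The (?:^|\s)section(?:\s|$) pattern in the first answer does not have that problem. If you would rather avoid the regex altogether, something like the following sketch should do; hasClass is a made-up helper name, and the class strings are invented for illustration.

```javascript
// Hypothetical helper: true only when `name` is one of the
// whitespace-separated tokens in `className`.
function hasClass(className, name) {
    return className.split(/\s+/).indexOf(name) !== -1;
}

var wordBoundary = /\bsection\b/i;

var regexOnHyphen = wordBoundary.test('sub-section');       // true (false positive)
var helperOnHyphen = hasClass('sub-section', 'section');    // false
var helperOnList = hasClass('intro section', 'section');    // true

console.log(regexOnHyphen, helperOnHyphen, helperOnList);   // true false true
```

Unlike the regexes with the i flag, this comparison is case-sensitive.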

Dynamically adding target attribute to a collection of pre-existing anchor tags within a known div element

I have a div with id="images".
The div contains some images that are each wrapped in an anchor tag with no target attribute.
I'd like to insert a script into my page that gets a reference to each of these anchor elements and adds a target="new" attribute to them (at runtime) so that when they are clicked they each open in a new window.
I don't want to hardcode the target attributes on the anchor tags. This is a post deployment workaround. I'm not using jquery in this application.
<div id="images"><img src="foo.png" />...etc </div>

No jQuery required! You can do this easily using native DOM methods:
// Find all the anchors you want to modify
var anchors = document.getElementById('images').getElementsByTagName('a'),
    i = anchors.length;
// Add the target to each one
while(i--) anchors[i].target = "new";

You can traverse all the anchor elements inside your div, first by looking up the div itself, and then you can use the element.getElementsByTagName method:
var imagesDiv = document.getElementById('images'),
    images = imagesDiv.getElementsByTagName('a');
for (var i = 0, n = images.length; i < n; i++) {
    images[i].target = "_blank";
}

function replaceAllAnchors(Source,stringToFind,stringToReplace){
    //sample call: body=replaceAllAnchors(body,'<a ','<a target="_blank" ');
    var temp = Source;
    var replacedStr="";
    var index = temp.indexOf(stringToFind);
    while(index != -1){
        temp = temp.replace(stringToFind,stringToReplace);
        replacedStr=replacedStr+temp.substr(0,temp.indexOf("/a>")+3);
        temp=temp.substr(temp.indexOf("/a>")+3);
        index = temp.indexOf(stringToFind);
    }
    replacedStr=replacedStr+temp;
    return replacedStr;
}

Why can't you use jQuery? I've added this here for other people who google.
It's 1 line of code in a loop:
$('#images a').each(function(){ $(this).attr('target', '_blank'); });
Now isn't that much simpler? Use jQuery if you can.

We Keep Coding

JavaScript is the programming language of the Web.
