How to find multiple different matches using regex in Javascript? [duplicate]

var html = '<p>sup</p>'
I want to run document.querySelectorAll('p') on that text without inserting it into the dom.
In jQuery you can do $(html).find('p')
If that's not possible, what's the cleanest way to do a temporary insert that doesn't interfere with anything, query only that element, and then remove it?
(I'm doing ajax requests and trying to parse the returned html)

With IE 10 and above, you can use the DOMParser object to parse an HTML string directly into a DOM document.
var parser = new DOMParser();
var doc = parser.parseFromString(html, "text/html");
var paragraphs = doc.querySelectorAll('p');

You can create a temporary element, insert the HTML into it, and run querySelectorAll on it:
var element = document.createElement('div');
element.insertAdjacentHTML('beforeend', '<p>sup</p>');
element.querySelectorAll('p')

Related

How to remove a complete tag from a string using javascript? [duplicate]

This question already has answers here:
Removing all script tags from html with JS Regular Expression
(14 answers)
Closed 3 years ago.
My input is as follows:
input = 'hello <script>alert("I am stealing your data");</script>'
I want to remove the complete script tag from the string, so the output should look like
output = "hello"
I tried the following command, but it's not removing the complete tag.
input.replace(/(<([^>]+)>)/ig, '');
It gives this result:
'hello alert("I am stealing your data");'

You should not use regular expressions for this. Instead use the DOM parser capabilities:
var input = 'hello <script\>alert("I am stealing your data");</script\>';
var span = document.createElement("span");
span.innerHTML = input; // This will not execute scripts
// Remove all script tags within this span element:
Array.from(span.querySelectorAll("script"), script => script.remove());
// Get the remaining HTML out of it
var scriptless = span.innerHTML;
console.log(scriptless);
Just note that it is a very bad idea to let the user pass arbitrary HTML to your application. Sanitising involves a lot more than just removing script tags.
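To illustrate that last point, here is a small string-only sketch. The regular expression merely stands in for "remove every script element" (it is not a recommended sanitiser), and the sample input is made up for this example. Even with all script elements gone, an event-handler attribute survives, and a typical browser would run it once the markup is assigned to innerHTML.

```javascript
// Illustration only: the regex below stands in for "remove all <script> elements".
// The input string is a made-up example, not taken from the question.
const input = 'hello <script>alert(1)</script><img src="x" onerror="alert(2)">';

// Strip complete <script>...</script> elements.
const withoutScripts = input.replace(/<script\b[\s\S]*?<\/script\s*>/gi, '');
console.log(withoutScripts);
// → hello <img src="x" onerror="alert(2)">

// An inline event handler is still present after the script element is gone.
const stillDangerous = /\son\w+\s*=/i.test(withoutScripts);
console.log(stillDangerous);
// → true
```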

You do not need to use a regular expression, because those can be easy to trick and are not fit for parsing HTML content, especially not untrusted HTML content.
Instead, you can use a DOMParser to create a new document and use the DOM API to find and remove all script tags, then return the rest of the content:
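function sanitise(input) {
    const parser = new DOMParser();
    const doc = parser.parseFromString(input, "text/html");
    //find all script tags; querySelectorAll returns a static list, so
    //removing elements while iterating does not skip any of them
    //(the live collection from getElementsByTagName would)
    const scripts = doc.querySelectorAll('script');
    for (const script of scripts)
        script.remove(); //remove from the DOM
    return doc.body.textContent.trim();
}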
//using the + because otherwise Stack Snippets breaks
console.log(sanitise("hello <script>alert('I am stealing your data');</scr"+"ipt>"))

Looping on Document nodes (Including childrens )

I'm trying to replace "document.write" inside an iframe that contains an ad.
I'm currently parsing the HTML string with DOMParser, which returns a DOM document. I would like to loop over this document and insert each node, with all of its children, into the real iframe DOM.
But I've run into two problems:
1. If the string passed to the original document.write doesn't contain <head>, <html>, and <body>, the DOMParser API adds them anyway.
2. I can loop over all nodes like this:
ParsedHtml.querySelectorAll("*").forEach(function(node) {
but the first iteration gives me the <html> node with every other node as its children, a later one gives me the <body> node with all of its children, and as the iteration continues I reach those same child nodes again.
So what I'd really like to do is parse the string from document.write and insert node after node, including all children, directly into the iframe DOM in the correct order (head first, then body, ...).
Thanks!
Code example:
var RealDocumentWrite = document.write;
document.write = function(str) {
    // call our parser here instead of using document.write
};
// our parser:
function Parser(str) {
    var parser = new DOMParser();
    ParsedHtml = parser.parseFromString(str, "text/html");
    ParsedHtml.querySelectorAll("*").forEach(function(node) {
        // the problems described above occur inside this loop
    });
}
Example execution:
If our input is:
var samplehtml = '<html><head><script>console.log(1);</script></head><body><div id="111"></div></body></html>'
and we call console.log(node); inside the loop, the output is:
https://ibb.co/hxzDc8
As you can see, in the first iteration the node is the entire <html> element including all of its children (head, body, ...), the second is only the head, then just the script inside the head, then the entire body, and then just the div inside the body.
That's not what I need: I need to parse the string as a document and append the nodes to the iframe in the correct order.
New code:
ParsedHtml.querySelectorAll("head > *").forEach(function(node) {
IframeDoc.head.appendChild(node);
});
ParsedHtml.querySelectorAll("body > *").forEach(function(node) {
IframeDoc.body.appendChild(node);
});
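One way to keep the original order, including text nodes that the "head > *" and "body > *" selectors skip, is to move child nodes one at a time. This is a minimal sketch, not the asker's final solution; moveChildren is a hypothetical helper name, and ParsedHtml and IframeDoc are assumed to be the documents from the question.

```javascript
// Hypothetical helper: moves every child node (elements, text, comments)
// from source to target, preserving their order.
function moveChildren(source, target) {
  // appendChild detaches the node from its old parent first,
  // so source.firstChild advances on every iteration.
  while (source.firstChild) {
    target.appendChild(source.firstChild);
  }
}
```

In the question's setup this might be called as moveChildren(ParsedHtml.head, IframeDoc.head) followed by moveChildren(ParsedHtml.body, IframeDoc.body).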

How to convert from a DOM Object to html?

I have taken a string of html in and have converted it to a DOM Object.
var htmlString = document.getElementById("textarea").value;
var parser = new DOMParser();
var html = parser.parseFromString(htmlString, 'text/html');
How do I take the DOM Object that I created and convert it back to html? Could you please show an example how I could put it on an html page?

From the HTMLDocument that parseFromString() gives you, you can retrieve its documentElement and then that element's innerHTML.
console.log(html.documentElement.innerHTML);
Note that the markup may be normalized to make it a valid document, so you may end up with more than you started with:
var markup = '<span>Foo</span>';
var parser = new DOMParser();
var doc = parser.parseFromString(markup, 'text/html');
console.log(doc.documentElement.innerHTML);
// "<head></head><body><span>Foo</span></body>"
Or, have corrections made for you:
var markup = '<table><div>Foo</div></table>';
// ...
console.log(doc.documentElement.innerHTML);
// "<head></head><body><div>Foo</div><table></table></body>"

You seem to be creating an entire Document; I'm not sure whether that is intentional. But this should work with what you have now:
var parser = new DOMParser();
var html = parser.parseFromString('<b>ok</b>', 'text/html');
document.write(html.body.innerHTML);

How to replace text between two XML tags using jQuery or JavaScript?

I have XML markup like the following. I want to replace the text inside one of the tags (in this case <begin>...</begin>) using JavaScript or jQuery.
<part>
<begin>A new beginning</begin>
<framework>Stuff here...</framework>
</part>
The source is inside a textarea. I have the following code, but it is obviously not doing what I want.
code=$("xml-code").val(); // content of XML source
newBegin = "The same old beginning"; // new text inside <begin> tags
newBegin = "<begin>"+newBegin +"</begin>";
code=code.replace("<begin>",newBegin); // replace content
This is just appending to the existing text inside the begin tags. I have a feeling this can be done only using Regex, but unfortunately I have no idea how to do it.

You can use the parseXML() jQuery function, then just replace the appropriate node with .find()/.text()
var s = "<part><begin>A new beginning</begin><framework>Stuff here...</framework></part>";
var xmlDoc = $($.parseXML(s));
xmlDoc.find('begin').text('New beginning');
alert(xmlDoc.text());
http://jsfiddle.net/x3aJc/

Similar to the other answer, using the $.parseXML() function, you could do this:
// $.parseXML() returns an XMLDocument, so wrap it in $() before calling .find()
var xml = $($.parseXML($("xml-code").val()));
xml.find('begin').text('The same old beginning');
Note that there is no need to replace a whole node; just change its text. Also, this works if there are multiple <begin> nodes that need the new text as well.

You can use a regular expression, but it's better not to. Use DOM parsers.
var code = $('xml-code').val(); // content of XML source (a textarea, so read it with .val())
var newBegin = "The same old beginning"; // new text inside <begin> tags
var regexp = /(<begin>)[\s\S]*?(<\/begin>)/i; // lazily match everything between the <begin> tags
code = code.replace(regexp, '$1' + newBegin + '$2');
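If a regular expression is used despite that advice, a slightly more defensive version might look like the sketch below. replaceTagText is a hypothetical helper name; it assumes the tag has no attributes, is not nested inside itself, and that tagName is a plain name with no regex metacharacters. A replacer function is used so that a "$" in the new text is not interpreted as a replacement pattern.

```javascript
// Hypothetical helper: replaces the text between <tagName> and </tagName>.
// Assumes an attribute-free, non-nested tag and a plain tag name.
function replaceTagText(xml, tagName, newText) {
  // Escape characters that are special inside XML text content.
  const escaped = newText
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  const pattern = new RegExp('(<' + tagName + '>)[\\s\\S]*?(</' + tagName + '>)', 'g');
  // A replacer function avoids "$1"-style interpretation of the new text.
  return xml.replace(pattern, (match, open, close) => open + escaped + close);
}

const source = '<part>\n<begin>A new beginning</begin>\n<framework>Stuff here...</framework>\n</part>';
const updated = replaceTagText(source, 'begin', 'The same old beginning');
console.log(updated);
```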

Is there a way to convert HTML into normal text without actually write it to a selector with Jquery?

As I understand it so far, in jQuery the html() function converts HTML into text, for example,
$("#myDiv").html(result);
converts "result" (which is the HTML code) into normal text and displays it in myDiv.
Now, my question is: is there a way I can simply convert the HTML and put it into a variable?
For example:
var temp;
temp = html(result);
Something like this. Of course this does not work, but how can I put the converted text into a variable without writing it to the screen? Since I'm checking the converted text in a loop, I thought it would be quite a waste of resources to keep writing it to the screen on every iteration.
Edit:
Sorry for the confusion. For example, if result is " <p>abc</p> " then $("#mydiv").html(result) makes mydiv display "abc", which "converts" HTML into normal text by removing the <p> tags. So how can I put "abc" into a variable without doing something like var temp=$("#mydiv").text()?

Here is a no-jQuery solution:
function htmlToText(html) {
var temp = document.createElement('div');
temp.innerHTML = html;
return temp.textContent; // Or return temp.innerText if you need to return only visible text. It's slower.
}
Works great in IE ≥9.

No, the html method doesn't turn HTML code into text, it turns HTML code into DOM elements. The browser will parse the HTML code and create elements from it.
You don't have to put the HTML code into the page to have it parsed into elements, you can do that in an independent element:
var d = $('<div>').html(result);
Now you have a jQuery object that contains a div element that has the elements from the parsed HTML code as children. Or:
var d = $(result);
Now you have a jQuery object that contains the elements from the parsed HTML code.

You could simply strip all HTML tags:
var text = html.replace(/(<([^>]+)>)/g, "");

Why not use .text()
$("#myDiv").html($(result).text());

You can try:
var tmp = $("<div>").attr("style","display:none");
var html_text = tmp.html(result).text();
tmp.remove();
But modifying the string with a regular expression is simpler, because it doesn't use DOM traversal.
You can convert the HTML to a text string with a regexp, as in user Crozin's answer.
P.S.
You may also want to replace <br> tags with newline characters (note [^>]* and the g flag, so that <br />, <br/> and every occurrence are matched):
var text = html.replace(/<\s*br[^>]*>/gi,'\n')
.replace(/(<([^>]+)>)/g, "");

var temp = $(your_selector).html();
the variable temp is a string containing the HTML

$("#myDiv").html(result); does not format text into HTML code. You can use .html() to do a couple of things.
If you call $("#myDiv").html(); without passing any parameters to the html() function, then you are "getting" the HTML that is currently in that div element.
So you could say,
var whatsInThisDiv = $("#myDiv").html();
console.log(whatsInThisDiv); //will print whatever is nested inside of <div id="myDiv"></div>
If you pass a parameter to your .html() call, you set the HTML to whatever is stored in the variable or string you pass. For instance:
var htmlToReplaceCurrent = '<div id="childOfmyDiv">Hi! Im a child.</div>';
$("#myDiv").html(htmlToReplaceCurrent);
That will leave your dom looking like this...
<div id="myDiv">
<div id="childOfmyDiv">Hi! Im a child.</div>
</div>

Easiest, safe solution: use DOMParser.
For more advanced usage, I suggest you try DOMPurify.
It's cross-browser (and supports Node.js), and only 19 kB gzipped.
Here is a fiddle I've created that converts HTML to text:
const dirty = "Hello <script>in script<\/script> <b>world</b><p> Many other <br/>tags are stripped</p>";
const config = { ALLOWED_TAGS: [''], KEEP_CONTENT: true, USE_PROFILES: { html: true } };
// Clean HTML string and write into the div
const clean = DOMPurify.sanitize(dirty, config);
document.getElementById('sanitized').innerText = clean;
Input: Hello <script>in script<\/script> <b>world</b><p> Many other <br/>tags are stripped</p>
Output: Hello world Many other tags are stripped

Using the DOM has several disadvantages. One not mentioned in the other answers: media will be loaded, causing network traffic.
I recommend using a regular expression to remove the tags, after first replacing certain tags such as br, p, ol, ul, and headers with \n newlines.
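A rough sketch of that regex approach follows. htmlToPlainText is a hypothetical helper name and the sample markup is made up; the function does not decode entities such as &amp;, and it keeps the text inside script or style elements, so it is only suitable for simple, trusted markup.

```javascript
// Hypothetical helper: converts simple HTML to plain text without touching the DOM.
function htmlToPlainText(html) {
  return html
    // <br>, <br/>, <br /> become newlines.
    .replace(/<\s*br\s*\/?>/gi, '\n')
    // Closing block-level tags (p, li, ol, ul, h1-h6) become newlines.
    .replace(/<\/\s*(p|li|ol|ul|h[1-6])\s*>/gi, '\n')
    // Remove every remaining tag.
    .replace(/<[^>]+>/g, '')
    // Collapse runs of newlines and trim the ends.
    .replace(/\n{2,}/g, '\n')
    .trim();
}

const plain = htmlToPlainText('<h1>Title</h1><p>First<br/>second</p><ul><li>item</li></ul>');
console.log(JSON.stringify(plain));
// → "Title\nFirst\nsecond\nitem"
```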
