Prevent jQuery from downloading resources when parsing HTML string


var strHTML = "<div><img src='/fake/path/fakeImage.jpg'/><span id='target'>text to extract</span></div>";
var dom = $(strHTML);
var extractedText = dom.find("#target").text();
alert(extractedText);
When I convert the HTML string to a jQuery object, jQuery makes a GET request to retrieve the images, as you can see in the network tab of the developer tools.
How can I convert an HTML string to a jQuery object without downloading any resources from the parsed string?
Note: jQuery.parseHTML does not return a jQuery object, so you cannot call .find() on its result, for example.

I don't think this is possible, since it's not jQuery (or JavaScript) that loads the image but the browser: as soon as a src attribute is set on an img element, the browser will attempt to download it.
One thing you can do is change the element name from img to something else before building the DOM, or rename the src attribute, for example:
// change the <img> elements to <my_img> to avoid image fetching
strHTML = strHTML.replace(/<img /gi, "<my_img ").replace(/<\/img>/gi, "</my_img>");
// or the 2nd alternative: rename the src attribute of images
strHTML = strHTML.replace(/<img([^>]*?) src=/gi, "<img$1 my_src=");
// now it's safe to parse into a DOM - no images will be fetched
var dom = $(strHTML);
Note that this simple search and replace may also change text other than the elements you want, but it may be sufficient for your use case.
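A slightly more defensive variant of the second alternative can be sketched as a standalone function that only touches src inside img tags. The name disableImageSources and the data-src attribute are illustrative choices, not part of jQuery, and this is still a regex rewrite: an attribute value that itself contains ">" or " src=" would confuse it.

```javascript
// Hypothetical helper: rename src on <img> tags so the browser has nothing to fetch.
function disableImageSources(html) {
  // Match each <img ...> tag, then rename the src attribute inside that tag only.
  return html.replace(/<img\b[^>]*>/gi, function (tag) {
    return tag.replace(/(\s)src(\s*=)/i, "$1data-src$2");
  });
}

var strHTML = "<div><img src='/fake/path/fakeImage.jpg'/><span id='target'>text to extract</span></div>";
var safeHTML = disableImageSources(strHTML);
console.log(safeHTML);
// In a page with jQuery loaded, $(safeHTML).find("#target").text() can then be used as before.
```

The original URL stays available in data-src, so it can be restored later if the image is actually needed.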

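You can feed it through $.parseXML first. This only works if the string is well-formed XML (every tag closed, a single root element), and the result is an XML document rather than HTML elements: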
var strHTML = "<div><img src='/fake/path/fakeImage.jpg'/><span id='target'>text to extract</span></div>";
var dom = $($.parseXML(strHTML));
var extractedText = dom.find("#target").text();
alert(extractedText);

Related

Remove tag and content using Javascript

What I want is to get the content of a specific #id, remove a few tags and their contents, add some HTML before and after it, and then set that finished content as the body content.
I now have the following code, a mixture of JavaScript and jQuery. It is obviously not right, because the result is an [object Object] message.
My code looks like this:
var printContents = jQuery("#"+id).clone();
var printContentsBefore = '<html><head></head><body><table><tr>';
var printContentsAfter = '</tr></table></body></html>';
var mainContents = printContents.find(".td_print").remove();
document.body.innerHTML = printContentsBefore + mainContents + printContentsAfter;
Any ideas of how to make this work?

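Your code does not convert the cloned jQuery object to a string, so it is concatenated as [object Object]. Note also that .remove() returns the elements it removed, so mainContents holds the .td_print elements rather than the remaining content. Remove them first, then serialize the clone:
printContents.find(".td_print").remove();
document.body.innerHTML = printContentsBefore + printContents.html() + printContentsAfter;
Beware that the output of the .html method will not include the HTML representation of the container element itself (i.e. of the #id clone in your case).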

getting the src of a link with jquery

I have a link like this:
<a rel="Test Images" class="thickbox preview_link" href="http://www.localhost.com:8080/testwp/wp-content/uploads/2013/12/2013-10-02_1728.png">
I need to get the URL of that image inside a JavaScript file loaded on the same page.
I tried something like:
image_src = jQuery('a[class=thickbox preview_link]').attr('href');
but all I get in the console is Uncaught Error: Syntax error, unrecognized expression: a[class=thickbox preview_link].
I am using jQuery 1.10.2 on the site.

You would do:
jQuery('a.thickbox.preview_link').attr('href');
Your attribute selector syntax is incorrect: because the value contains a space, it must be wrapped in quotes ('a[class="thickbox preview_link"]'). The class selector is usually the better choice, though: it is generally faster than the attribute selector, and the order of the classes doesn't matter.

Just in case you need it, here's the vanilla JavaScript version.
Get the link(s):
var image = document.getElementsByClassName('thickbox preview_link');
Get the href (of the first one):
var image_href = image[0].getAttribute('href');
Better version:
// Declare the image_href variable
var image_href;
// Get a live HTMLCollection of all matching elements
var image = document.getElementsByClassName('thickbox preview_link');
// If there's only one match and/or you only want the first one's href:
// check that the DOM actually contains a match and, if so,
// update the image_href variable with its href attribute
if (image[0] !== undefined) {
  image_href = image[0].getAttribute('href');
}
Best of luck!

thickbox preview_link is actually 2 class tokens, so you're looking for a.thickbox.preview_link or (for that specific attribute) a[class="thickbox preview_link"] (notice the quotes)
image_src = jQuery('a.thickbox.preview_link').attr('href');
// or
image_src = jQuery('a[class="thickbox preview_link"]').attr('href');
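The difference between the two selectors can be illustrated without a DOM. The sketch below uses a hypothetical hasAllClasses helper (not a jQuery or browser API) to mimic how a class selector checks tokens, and a plain string comparison to mimic the exact-match attribute selector.

```javascript
// Hypothetical helper mirroring how a class selector such as a.thickbox.preview_link matches:
// every requested token must be present, in any order.
function hasAllClasses(classAttr, wanted) {
  var tokens = classAttr.trim().split(/\s+/);
  return wanted.every(function (name) {
    return tokens.indexOf(name) !== -1;
  });
}

var wanted = ["thickbox", "preview_link"];
console.log(hasAllClasses("thickbox preview_link", wanted)); // true
console.log(hasAllClasses("preview_link extra thickbox", wanted)); // true
// The attribute selector [class="thickbox preview_link"] compares the whole attribute value instead:
console.log("preview_link extra thickbox" === "thickbox preview_link"); // false
```

This is why the class selector keeps working if the markup later reorders the classes or adds another one, while the attribute selector silently stops matching.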

How can I use this dom xss?

In this case:
var count = $(count_el).data("count");
$(target).html("hello"+count);
Does this mean the DOM XSS is only exploitable if we can control the value of data("count"), for example through the URL, with a payload like <script>codes</script>?

jQuery.data stores the information on the element or object itself (if it is an element, it first pulls the data from the data-* attributes).
$el = $("<div data-count='1'></div>")
$el.data('count') // 1
$el.data() // {"count":1}
If, however, you are worried about XSS and malicious HTML somehow ends up in your data attributes, you could use .text() instead of .html().
$parent = $("<div></div>");
$child = $("<div data-count='<script>alert(\"malicious code\");</script>'></div>");
count = $child.data('count')
$parent.html(count) // this will run the embedded script
$parent.text(count) // this will not
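When the value has to be concatenated into an HTML string anyway, as in "hello" + count, escaping it first gives a similar effect to .text(). The escapeHtml function below is a hypothetical helper written for this illustration, not a jQuery API.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

var count = '<script>alert("malicious code");</script>';
var safeMarkup = "hello" + escapeHtml(count);
console.log(safeMarkup);
// In the page, $(target).html(safeMarkup) would then display the payload as text instead of running it.
```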

Why does innerHTML not change src of an image?

I set the src of an image object, and later I change it.
But if I add something to the content of the element, such as
meaning.innerHTML += ")";
(where meaning is the parent element of the image), then changing the src of the object no longer affects the document.
Example: http://jsfiddle.net/WcnCB/3/
Could you explain why this happens, and how to fix it?

meaning.innerHTML += ')'; does more than you think. Visually it just appends a ) character, but behind the scenes what happens is:
meaning.innerHTML = meaning.innerHTML + ')';
So, you're first converting the DOM to a string representation (HTML), then adding a ) character, and finally converting it back from HTML to the DOM. All elements the HTML represents are created again, and the children of meaning are replaced by those new elements. So your old image element is no longer in the document.
The simplest solution is to use createTextNode: http://jsfiddle.net/WcnCB/4/.
meaning.appendChild(document.createTextNode(")"));

By writing innerHTML += ... you are overwriting the previous HTML and destroying every reference to it - including the actual_button variable.
Why are you using innerHTML += ... anyway? You should be doing:
meaning.appendChild(document.createTextNode("(Something"));

When you commit the greatest sin of all, that is .innerHTML += (specifically innerHTML combined with +=; neither is bad on its own), what happens is:
1. The element's DOM subtree is serialized into an HTML string.
2. Some text is concatenated onto that HTML string.
3. All elements are removed from the target element.
4. The resulting HTML is parsed into a new DOM subtree. This means all the elements are new.
5. The new subtree is appended to the target element.
Given this, actual_button refers to a detached DOM element, not to the new img element created by parsing the HTML.
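The effect of that serialize-and-reparse round trip can be shown with an analogy that runs without a browser. In this sketch plain objects stand in for DOM nodes and JSON stands in for HTML; the names container and actualButton are made up for the illustration, and it is not how a browser is implemented.

```javascript
// Analogy only: plain objects stand in for DOM nodes, JSON stands in for HTML.
var container = { children: [{ tag: "img", src: "add.png" }] };
var actualButton = container.children[0]; // reference held by the script

// "innerHTML +=": serialize, throw the old children away, parse again, add the new text.
var serialized = JSON.stringify(container.children);
container.children = JSON.parse(serialized).concat([{ text: ")" }]);

actualButton.src = "loading.gif"; // changes only the old, detached object
console.log(container.children[0].src); // "add.png"
console.log(container.children[0] === actualButton); // false
```

Appending a node directly, as with appendChild(document.createTextNode(")")), corresponds to pushing onto container.children: the existing objects, and any references to them, stay intact.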

It works if you give the image an ID and look it up again after changing innerHTML:
var meaning = document.getElementById('meaning');
meaning.innerHTML += 'Something ...';
var actual_button = document.createElement('img');
actual_button.id = 'actual_button';
actual_button.src = 'http://www.pawelbrewczynski.tk/images/add.png';
actual_button.className = 'add_word';
meaning.appendChild(actual_button);
meaning.innerHTML += " ... and another.";
actual_button = document.getElementById('actual_button');
actual_button.src = 'http://www.pawelbrewczynski.tk/images/loading.gif';
http://jsfiddle.net/j8yEG/1/

Is there a way to convert HTML into normal text without actually write it to a selector with Jquery?

As far as I understand, jQuery's html() function can convert HTML into text. For example,
$("#myDiv").html(result);
converts result (which is HTML code) into normal text and displays it in myDiv.
Now, my question is: is there a way to simply convert the HTML and put it into a variable?
For example:
var temp;
temp = html(result);
Of course this does not work, but how can I put the converted text into a variable without writing it to the screen? Since I'm checking the converted text in a loop, it seems like quite a waste of resources to keep writing it to the screen on every iteration.
Edit:
Sorry for the confusion. For example, if result is " <p>abc</p> " then $("#mydiv").html(result) makes mydiv display "abc", which "converts" the HTML into normal text by removing the <p> tags. So how can I put "abc" into a variable without doing something like var temp = $("#mydiv").text()?

Here is a no-jQuery solution:
function htmlToText(html) {
  var temp = document.createElement('div');
  temp.innerHTML = html;
  // Use temp.innerText instead if you need only the visible text; it is slower.
  return temp.textContent;
}
It works in IE 9 and later.

No, the html method doesn't turn HTML code into text, it turns HTML code into DOM elements. The browser will parse the HTML code and create elements from it.
You don't have to put the HTML code into the page to have it parsed into elements, you can do that in an independent element:
var d = $('<div>').html(result);
Now you have a jQuery object that contains a div element that has the elements from the parsed HTML code as children. Or:
var d = $(result);
Now you have a jQuery object that contains the elements from the parsed HTML code.

You could simply strip all HTML tags:
var text = html.replace(/(<([^>]+)>)/g, "");

Why not use .text()?
$("#myDiv").html($(result).text());

You can try:
var tmp = $("<div>").attr("style", "display:none");
var html_text = tmp.html(result).text();
tmp.remove();
However, modifying the string with a regular expression is simpler, because it doesn't involve the DOM at all.
You can convert the HTML to a text string with a regexp, as in Crozin's answer.
P.S.
You may also like to replace <br> tags with newline characters first:
var text = html.replace(/<\s*br\b[^>]*>/gi, '\n')
.replace(/(<([^>]+)>)/g, "");

var temp = $(your_selector).html();
the variable temp is a string containing the HTML

$("#myDiv").html(result); does not format text into HTML code. You can use .html() to do a couple of things.
If you call $("#myDiv").html(); without passing any parameters to the html() function, you are getting the HTML that is currently in that div element.
So you could say:
var whatsInThisDiv = $("#myDiv").html();
console.log(whatsInThisDiv); // will print whatever is nested inside of <div id="myDiv"></div>
If you pass a parameter to your .html() call, you are setting the HTML to whatever is stored in the variable or string you pass. For instance:
var htmlToReplaceCurrent = '<div id="childOfmyDiv">Hi! Im a child.</div>';
$("#myDiv").html(htmlToReplaceCurrent);
That will leave your DOM looking like this:
<div id="myDiv">
  <div id="childOfmyDiv">Hi! Im a child.</div>
</div>

The easiest safe solution is to use DOMParser.
For more advanced usage, I suggest you try DOMPurify.
It's cross-browser (and supports Node.js), and is only 19 kB gzipped.
Here is an example that converts HTML to text:
const dirty = "Hello <script>in script<\/script> <b>world</b><p> Many other <br/>tags are stripped</p>";
const config = { ALLOWED_TAGS: [''], KEEP_CONTENT: true, USE_PROFILES: { html: true } };
// Clean HTML string and write into the div
const clean = DOMPurify.sanitize(dirty, config);
document.getElementById('sanitized').innerText = clean;
Input: Hello <script>in script</script> <b>world</b><p> Many other <br/>tags are stripped</p>
Output: Hello world Many other tags are stripped

Using the DOM has several disadvantages. One not mentioned in the other answers: media will be loaded, causing network traffic.
I recommend using a regular expression to remove the tags, after first replacing certain tags such as br, p, ol, ul, and headers with \n newlines.
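A minimal sketch of that regex approach might look like the following. The function name htmlToPlainText and the exact list of tags treated as line breaks are assumptions made for this example. It does not decode entities such as &amp;amp;, it keeps the contents of script and style elements, and it is not a sanitizer.

```javascript
// Hypothetical helper: regex-based HTML to text, without touching the DOM.
function htmlToPlainText(html) {
  return html
    // <br> and the closing tags of block-level elements become newlines
    .replace(/<\s*br\s*\/?\s*>/gi, "\n")
    .replace(/<\/\s*(p|div|li|ol|ul|h[1-6])\s*>/gi, "\n")
    // every remaining tag is dropped
    .replace(/<[^>]+>/g, "")
    // collapse runs of newlines and trim the ends
    .replace(/\n{2,}/g, "\n")
    .trim();
}

console.log(htmlToPlainText("<h1>Title</h1><p>abc<br/>def</p><ul><li>one</li><li>two</li></ul>"));
```

Because nothing is ever parsed into elements, no images or other media referenced by the markup are requested.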
