jQuery get html in div without any markup - javascript

I have some script written using the jQuery framework.
var site = {
link: $('#site-link').html()
}
This gets the html in the div site-link and assigns it to link. I later save link to the DB.
My issue is that I don't want the html, as I see this as being too dangerous, maybe?
I have tried:
link: $('#site-link').val()
... but this just gives me a blank value.
How can I get the value inside the div without any markup?

Try doing this:
$('#site-link').text()
From the jQuery API Documentation:
Get the combined text contents of each element in the set of matched
elements, including their descendants, or set the text contents of the
matched elements.

Use the .text() jquery method like this:
var site = {
link: $('#site-link').text()
}
Here is an example of what .val(), .html() and .text() do: jsfiddle example

Use the text() method.
Get the combined text contents of each element in the set of matched elements, including their descendants, or set the text contents of the matched elements.

Use the .text() function of jQuery to get only the text.
var site = {
link: $('#site-link').text()
}

To avoid html, use the text() method of jQuery.
var site = {
link: $('#site-link').text()
}
http://api.jquery.com/text/

If you are planning to store the result in the database and you are concerned about HTML, then using something like .text() rather than .html() is just an illusion of security.
NEVER EVER trust anything that comes from the client side!
Everything on the client side is replaceable and can be hijacked by the client rather easily. With the Tamper Data Firefox plugin, for example, even my mother could change the data sent to the server. She could send anything in place of the link: malicious scripts, whole websites, etc.
It is important that before saving the "link" to the database you validate it on the server side. You can write a regex to check if a string is a valid url, or just replace everything that is html.
It's also a good idea to html encode it before outputting. This way, even if html gets into your database, after encoding it will be just a harmless string (well, there is other stuff to be aware of, like UTF-7, but the web is a dangerous place).
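As a rough illustration of the two steps described above (validate before saving, encode before outputting), here is a minimal sketch. It assumes the server side happens to run JavaScript (for example Node.js), which the question does not state, and the helper names isHttpUrl and escapeHtml are made up for this example. It uses the standard URL constructor rather than a regex for the validity check.

```javascript
// Sketch only: isHttpUrl and escapeHtml are hypothetical helper names,
// and a JavaScript server environment is an assumption.

// Accept only absolute http(s) URLs; everything else is rejected.
function isHttpUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch (err) {
    return false; // not parseable as an absolute URL
  }
}

// Encode the characters that are significant in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

The ampersand is replaced first so that the entities produced by the later replacements are not encoded a second time.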

Related

Clone HTML and remove some nodes using jQuery

I'm developing a little tool for live editing using Chrome DevTools, and I have a little button "Save" which grabs the HTML and sends it to server to update the static file (.html) using Ajax. Very simple indeed.
My problem is that I need to filter the HTML code before sending it to the server. I need to remove some nodes, and I'm trying to achieve this using jQuery, something like this:
// I grab all the HTML code
var html = $('<div>').append($('html').clone()).html();
// Now I need to remove some nodes using jQuery
$(html).find('#some-node').remove();
// Send the filtered HTML to server
$.post('url/to/server/blahblahblah');
I already tried this: Using jQuery to search a string of HTML, with no success. I can't manage to use jQuery on my cloned HTML code.
Any idea about how to do this?
The DOM is not a string of HTML. With jQuery, you do DOM manipulation, not string manipulation.
What you're doing is
cloning the document (unnecessary because you convert it to HTML anyway),
appending that cloned document to a new div for some reason
converting the content of that div to an HTML string
converting that HTML back to DOM nodes $(html) (so we're back to the first point above)
finding and removing an element in those nodes
presumably posting the html variable to the server.
Unfortunately, the html string has not changed because you manipulated DOM nodes, not the string.
Hopefully you can see above that you're doing all sorts of conversions that have little to do with what you ultimately want.
I don't know why you'd need to do this, but all you need is to do a .clone(), then the .find().remove(), then .html()
var result = $("html").clone(false);
result.find("#some-node").remove();
var html = result.html();
Maybe like this?
var html = $('html').clone();
html.find('#some-node').remove();

jQuery: html() function does not match real HTML

I am trying to get the EXACT html content of a div.
When using the html() function from jQuery, the result does not match the actual content.
Please check this fiddle and click on the black square:
http://jsfiddle.net/qRska/6/
The code:
<div id="mydiv" style="width:100px; height: 100px; background-color:#000000; cursor:pointer;">
<div id="INSIDE" style="background-color:#ffffff; border-style:none;"></div>
</div>
$('#mydiv').click(function() {
alert($(this).html());
});
jQuery changes the color to RGB format and removes the border-style attribute.
How can I solve this problem?
The browser consumes the HTML, generates a DOM, then discards the HTML. innerHTML (which is what .html() eventually hits) gives a serialisation of the DOM back to HTML.
If you want to get the raw HTML, then you'll need to use XMLHttpRequest to fetch the source code of the current URL and then process it yourself.
What you want to do is unfortunately not possible. The original HTML is not available after it is parsed by the browser, so you have to jump through some hoops to prevent the browser from processing it.
One possible solution that I've used before is to wrap the HTML in comment tags, which would remain unchanged by the browser. You can then extract the comment using jQuery's .text() method; strip out the comment tags with string replacement; make the necessary changes to the markup; and then inject it back into the document.
The other alternative is to use AJAX to load the HTML. Make sure you set the dataType to 'text' so the response doesn't get processed.

jQuery parse HTML without loading images

I load HTML from other pages to extract and display data from that page:
$.get('http://example.org/205.html', function (html) {
console.log( $(html).find('#c1034') );
});
That does work but because of the $(html) my browser tries to load images that are linked in 205.html. Those images do not exist on my domain so I get a lot of 404 errors.
Is there a way to parse the page like $(html) but without loading the whole page into my browser?
Actually if you look in the jQuery documentation it says that you can pass the "owner document" as the second argument to $.
So what we can then do is create a virtual document so that the browser does not automatically load the images present in the supplied HTML:
var ownerDocument = document.implementation.createHTMLDocument('virtual');
$(html, ownerDocument).find('.some-selector');
Use regex and remove all <img> tags
html = html.replace(/<img[^>]*>/g,"");
Sorry for resuscitating an old question, but this is the first result when searching for how to try to stop parsed html from loading external assets.
I took Nik Ahmad Zainalddin's answer, however there is a weakness in it in that any elements in between <script> tags get wiped out.
<script>
</script>
Inert text
<script>
</script>
In the above example Inert text would be removed along with the script tags. I ended up doing the following instead:
html = html.replace(/<\s*(script|iframe)[^>]*>(?:[^<]*<)*?\/\1>/g, "").replace(/(<(\b(img|style|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))/g, "");
Additionally I added the capability to remove iframes.
Hope this helps someone.
Using the following way to parse html will load images automatically.
var wrapper = document.createElement('div'),
html = '.....';
wrapper.innerHTML = html;
If you use DOMParser to parse html, the images will not be loaded automatically. See https://github.com/panzi/jQuery-Parse-HTML/blob/master/jquery.parsehtml.js for details.
You could either use jQuery's remove() method to select the image elements
console.log( $(html).find('img').remove().end().find('#c1034') );
or remove them from the HTML string. Something like
console.log( $(html.replace(/<img[^>]*>/g,"")) );
Regarding background images, you could do something like this:
$(html).filter(function() {
return $(this).css('background-image') !== '';
}).remove();
The following regex removes all occurrences of <img>, <head>, <link>, <script> and <style> elements, as well as background and style attributes, from the data string returned by the AJAX load.
html = html.replace(/(<(\b(img|style|script|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))/g,"");
Test regex: https://regex101.com/r/nB1oP5/1
I wish there were a better way to work around this (other than using regex replace).
Instead of removing all img elements altogether, you can use the following regex to delete all src attributes instead:
html = html.replace(/src="[^"]*"/ig, "");
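Note that the pattern above only matches double-quoted values, and it also matches the tail of attributes such as data-src. A slightly stricter variant might look like the sketch below. The function name stripSrcAttributes is made up for this example, and the pattern still ignores unquoted values and srcset, so treat it as a starting point rather than a complete filter.

```javascript
// Sketch only: stripSrcAttributes is a hypothetical helper name.
// Removes src attributes whose value is double- or single-quoted.
// The leading \s keeps attributes such as data-src intact.
function stripSrcAttributes(html) {
  return html.replace(/\ssrc\s*=\s*("[^"]*"|'[^']*')/gi, "");
}
```

As with the other regex answers, this works on the string before it is handed to $(html), so the browser never sees the image URLs.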

Most secure javascript JSON Inline technique

I'm using varnish+esi to return external json content from a RESTFul API.
This technique allows me to manage request and refresh data without using webserver resources for each request.
e.g:
<head>
....
<script>
var data = <esi:include src='apiurl/data'>;
</script>
...
After the ESI include, Varnish will return:
var data = {attr:1, attr2:'martin'};
This works fine, but if the API returns an error, this technique will generate a parse error.
var data = <html><head><script>...api js here...</script></head><body><h1 ... api html ....
I solved this problem using a hidden div to parse and catch the error:
...
<b id=esi-data style=display:none;><esi:include src='apiurl/data'></b>
<script>
try{
var data = $.parseJSON($('#esi-data').html());
}catch{ alert('manage the error here');}
....
I've also tried using a script type text/esi, but the browser renders the html inside the script tag (wtf), e.g:
<script id=esi-data type='text/esi'><esi:include src='apiurl/data'></script>
Question:
Is there any way to wrap the tag so that the browser doesn't parse it?
Let me expand upon the iframe suggestion I made in my comment—it's not quite what you think!
The approach is almost exactly the same as what you're doing already, but instead of using a normal HTML element like a div, you use an iframe.
<iframe id="esi-data" src="about:blank"><esi:include src="apiurl/data"></iframe>
var $iframe = $('#esi-data');
try {
var data = $.parseJSON($iframe.html());
} catch (e) { ... }
$iframe.remove();
#esi-data { display: none; }
How is this any different from your solution? In three ways:
The data/error page are truly hidden from your visitors. An iframe has an embedded content model, meaning that any content within the <iframe>…</iframe> tags gets completely replaced in the DOM—but you can still retrieve the original content using innerHTML.
It's valid HTML5… sort-of. In HTML5, markup inside iframe elements is treated as text. Sure, you're meant to be able to parse it as a fragment, and it's meant to contain only phrasing content (and no script elements!), but it's essentially just treated as text by the validator—and by browsers.
Scripts from the error page won't run. The content gets parsed as text and replaced in the DOM with another document—no chance for any script elements to be processed.
Take a look at it in action. If you comment out the line where I remove the iframe element and inspect the DOM, you can confirm that the HTML content is being replaced with an empty document. Also note that the embedded script tag never runs.
Important: this approach could still break if the third party added an iframe element into their error page for some reason. Unlikely as this may be, you can bulletproof the approach a little more by combining your technique with this one: surround the iframe with a hidden div that you remove when you're finished parsing.
Here I go with another attempt.
Although I believe you already have what is possibly the best solution for this, the only workaround I can imagine is a fairly low-performance one: call esi:include in a separate HTML page, then retrieve the contents as if you were using AJAX on the server. Perhaps similar to this? Then check the contents you retrieved, maybe by using json_decode, and generate an error JSON string if decoding fails.
The greatest downside I see is that this would be very resource-intensive and would most likely delay your requests, since the separate page is requested as if your server itself were a client, then parsed, then sent back.
I'd honestly stick to your current solution.
This is a rather tricky problem with no really elegant solution, if there is a solution at all.
I asked you if it was an HTML(5) or XHTML(5) document because, in the latter case, a CDATA section can be used to wrap the content, changing your solution slightly to something like this:
...
<b id='esi-data' style='display:none;'>
<![CDATA[ <esi:include src='apiurl/data'> ]]>
</b>
<script>
try{
var data = $.parseJSON($('#esi-data').html());
}catch{ alert('manage the error here');}
....
Of course, this solution only works if:
you're using XHTML5 and
the error contains no CDATA section (because CDATA section nesting is impossible).
I don't know if switching from one serialization to the other is an option, but I wanted to clarify the intent of my question. It will hopefully help you out :).
Can't you simply change your API to return JSON { "error":"error_code_or_text" } on error? You can even do something meaningful in your interface to alert user about error if you do it that way.
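If the API can be changed that way, the client-side check becomes straightforward. The sketch below assumes the { "error": ... } shape suggested above; the function name readApiData and the "invalid_json" code are made up for this example.

```javascript
// Sketch only: readApiData and the "invalid_json" code are hypothetical.
// Parses the included text and separates API errors from usable data.
function readApiData(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    // The include returned something that is not JSON (e.g. an HTML error page).
    return { ok: false, error: "invalid_json" };
  }
  if (parsed !== null && typeof parsed === "object" && "error" in parsed) {
    // The API reported an error using the agreed JSON shape.
    return { ok: false, error: String(parsed.error) };
  }
  return { ok: true, data: parsed };
}
```

The try/catch is kept as a fallback in case a non-JSON error page still slips through.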
<script>var data = 999;</script>
<script>
data = <esi:include src='apiurl/data'>;
</script>
<script>
if(data == 999) alert("there was an error");
</script>
If there is an error and "data" is not JSON, then a javascript error will be thrown. The next script block will pick that up.

Let a user copy the source code... of an element with jquery

I can understand basic javascript and jquery but I'm having a hard time understanding how to allow a user to see the source code of an element for example.
If I have an element on a webpage like this
`<p>Hi I'm an element</p>`
everybody knows it will be displayed as this
Hi I'm an element
but I want a user to see this in its source code form
`<p>Hi I'm an element</p>`
How on earth is this done??
The basic idea is to get the HTML of an element, then show it somewhere as plain-text. We can use .html() to get the HTML and then .text() to output the same HTML as plain-text:
//on the click of a link
$('a').on('click', function () {
//append a container with the plain-text HTML of an element
$('body').append($('<div />').text($('form').html()));
});
Here is the demo: http://jsfiddle.net/YbJfs/
Note that this does not get the actual <form> tag, but you could place the form in a container, select the container, and then use the .html() of that container and you'll have the <form> tag as well.
Also, if you want to add the HTML to a form input or text-area, you can use .val() rather than .text().
Here is a demo: http://jsfiddle.net/YbJfs/1/
You can use...
element.outerHTML;
...though it isn't technically the "source code". It's the HTML rendered by the browser, which may have some differences.
Also, you need a shim for Firefox 10 and lower.
function outerHTML(el) {
return el.outerHTML || document.createElement('div')
.appendChild(el.cloneNode(true))
.parentNode
.innerHTML;
}
To grab the html of an element, either use native JavaScript's innerHTML or, if you want to use jQuery, use the html() method. Examples:
javascript:
var html = document.getElementById('myOb').innerHTML;
jQuery:
var html = $('#myOb').html();
