HTML Purifier to clean event attributes

I'm working to address some stored XSS vulnerabilities and I am using HTML Purifier. I have an input box on the page, and if I type '" onclick="alert(1);" the code is saved to the database and executed on the client. This happens even after running the input and output through HTML Purifier. It seems that HTML Purifier only strips these attributes when they are included within an HTML tag. I'm wondering if there is some config for HTML Purifier that will strip just the event attributes, or any other suggestions on how to clean these up.

HTML Purifier is purely intended for use on content which will be used as HTML on a page. It is not appropriate for validating content which, for example, will go in an attribute for an HTML element.
You can use some internal APIs of HTML Purifier to validate content for this case. However, for the example quoted in the comments, all you need is htmlspecialchars to do the right thing. The right choice of validator depends on what attribute you put the content in.
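
As a rough illustration of the escaping step, here is a minimal sketch in JavaScript (the answer refers to PHP's htmlspecialchars; the helper name escapeHtml and the input element are assumptions made for this example). It escapes the five characters that matter inside a quoted attribute value, so the injected quote can no longer close the attribute:

```javascript
// Hypothetical helper: escapes roughly the same five characters that PHP's
// htmlspecialchars handles when quotes are included (ENT_QUOTES).
function escapeHtml(value) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#039;",
  };
  return String(value).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// The payload from the question, placed in a double-quoted attribute value.
const payload = '" onclick="alert(1);';
const markup = '<input type="text" value="' + escapeHtml(payload) + '">';

console.log(markup);
// <input type="text" value="&quot; onclick=&quot;alert(1);">
```

This only covers a quoted attribute value or plain text content. Other contexts (URLs in href, inline scripts, style attributes) need their own validation, which is the point made above about choosing the validator per attribute.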

Related

Disable javascript for some part of my page

How can I disable JavaScript for some part of my page? For example, I have the following structure:
<html>
<head>
my js files
</head>
<body>
<div>
my components (they use javascript)
</div>
<div>
some untrusted content (may be some elements with javascript triggers such as 'onload' or something like that)
</div>
</body>
</html>
I don't want to process this content, only to output it as is, but without being vulnerable to XSS attacks.
update
I want to build a small service for posting text information from a simple form and saving it to the database. I also want to show it to the user in preview mode on the HTML page (which includes two elements: header and body).

Try replacing the <'s in your untrusted content with &lt;'s:
var somewhatSanitizedContent = untrustedContent.replace(/</g, "&lt;");
This should make all HTML tags appear as plain text, disabling <script> tags as well.
However, it may be a good idea to sanitize your input before storing it, instead of after.

How does a browser render this inline JavaScript within an encoded tag?

I was trying to perform a Reflective XSS attack on a tutorial website. The webpage basically consists of a form with an input field and a submit button. On submitting the form, the content of the input field are displayed on the same webpage.
I figured out that the website is blacklisting script tag and some of the JavaScript methods in order to prevent an XSS attack. So, I decided to encode my input and then tried submitting the form. I tried 2 different inputs and one of them worked and the other one didn't.
When I tried:
<body onload="&#97lert('Hi')"></body>
It worked and an alert box was displayed. However, when I encoded some characters in the HTML tag, something like:
&#60body onload="&#97lert('Hi')"&#62&#60/body&#62
It didn't work! It simply printed <body onload="alert('Hi')"></body> as it is on the webpage!
I know that browsers execute inline JavaScript as they parse an HTML document (please correct me if I'm wrong). But I'm not able to understand why the browser behaved differently for the two inputs that I've mentioned.
-------------------------------------------------------------Edit---------------------------------------------------------
I tried the same with a more basic XSS tutorial with no XSS protection. Again:
<script>alert("Hi")</script> -> Worked!
&#60s&#99ript&#62&#97lert("Hi")&#60/s&#99ript&#62 -> Didn't work! (Got printed as string on the Web Page)
So basically, if I encode anything in JavaScript, it works. But if I'm encoding anything that is HTML, it's not executing the JavaScript within that HTML!

I can't come up with words to describe this properly, so I'll just give you an example. Let's say we have this string:
<div>Hello World! &lt;span id="foo"&gt;Foobar&lt;/span&gt;</div>
When this gets parsed, you end up with a div element that contains the text:
Hello World! <span id="foo">Foobar</span>
Note that while there is something that looks like HTML inside the text, it is still just text, not HTML. For that text to become HTML, it would have to be parsed again.
Attributes work a little bit differently: HTML entities in attribute values do get decoded the first time.
tl;dr:
if the service you are using is stripping out tags, there's nothing you can do about it unless the script is poorly written in a way that results in the string getting parsed twice.
Demo: http://jsfiddle.net/W6UhU/ note how after setting the div's inner HTML equal to its inner text, the span becomes an HTML element rather than a string.

When an HTML page says &#60body, it treats it the same as if it said &lt;body
That is, it just displays the encoded characters and doesn't parse them as HTML. So you're not creating a new tag with onload attributes: http://jsfiddle.net/SSfNw/1/
alert(document.body.innerHTML);
// When an HTML page says &lt;body It treats it the same as if it said &lt;body
So in your case, you're never creating a body tag, just content that ends up getting moved into the body tag: http://jsfiddle.net/SSfNw/2/
alert(document.body.innerHTML)
// &lt;body onload="alert('Hi')"&gt;&lt;/body&gt;
In the case of <body onload="&#97lert('Hi')"></body>, the parser is able to create the body tag and, once within the body tag, it's also able to create the onload attribute. Once within the attribute, everything gets parsed as a string.
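
To make the difference concrete, here is a small sketch that models only the character-reference decoding step (the helper name decodeNumericReferences is made up for this example, and it is not how a browser is implemented). The browser decides what is a tag before it decodes references, so decoding can turn &#97 into a letter inside an attribute value, but a decoded &#60 is already plain text by the time it becomes "<":

```javascript
// Hypothetical helper: decodes decimal numeric character references such as
// &#97; (the trailing semicolon is treated as optional, as in the question).
// Assumption: inputs contain only valid code points; no range checking here.
function decodeNumericReferences(text) {
  return text.replace(/&#(\d+);?/g, (match, code) =>
    String.fromCodePoint(Number(code))
  );
}

// Case 1: the tag is written literally, so the parser creates a real onload
// attribute. Only the attribute VALUE is decoded, then run as script.
const attributeValue = "&#97lert('Hi')";
console.log(decodeNumericReferences(attributeValue));
// alert('Hi')

// Case 2: the angle brackets are encoded, so the parser never sees a tag.
// The whole input is text; decoding yields characters to display, not markup.
const encodedTag = "&#60body onload=\"&#97lert('Hi')\"&#62";
console.log(decodeNumericReferences(encodedTag));
// <body onload="alert('Hi')">
```

In the second case the decoded string looks like a tag, but it would only become one if something parsed it as HTML a second time, for example by assigning it to innerHTML.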

How to remove all javascript from an html string (with javascript or jquery)?

I want to display html provided by a user in a page. My page is almost entirely dynamic (JS code), and I was wondering if there's an easy way to sanitize it?
Like, maybe I could remove all the <script> and <iframe> tags and unbind all the events contained in the string (or remove any HTML attribute starting with 'on') so that no JavaScript code from the string can possibly be executed?
Can users possibly insert JavaScript with a CSS 'content' property in a style attribute?
The jQuery $(...).text(...) function doesn't help me, since I want to preserve any HTML markup or CSS styling.
If there's no easy solution I'm ready to live with a whitelist of HTML tags (table span div img a b u i strong...), but I'd rather not have to whitelist the attributes too.

The more foolproof way to show user content safely is to embed it in an iframe whose origin is a different domain than your host web page. This is what jsFiddle does. The main page is served from jsfiddle.net, but the user scripts are served from fiddle.jshell.net. This lets the user content do what it would normally do, but the browser's cross-origin protection keeps the user content from messing with the host page, its domain, its cookies, and so on.
Trying to strip all possible places that scripts could be in the content is a risky proposition in which you will probably be chasing new attack vectors forever. I'd personally much rather let the browser be in that business and put the user content on a different domain. Plus, allowing the user content to have its normal JS will also let it work as desired.

Possible to create custom "DOMs" by loading HTML from string in Javascript?

I'm trying to parse HTML in the browser. The browser receives 2 HTML files as strings, eg. HTML1 and HTML2.
I now need to parse these "documents" just as one would parse the current document. This is why I was wondering if it is possible to create custom documents based on these HTML strings (these strings are provided by the server or user).
So that for example the following would be valid:
$(html1Document).$("#someDivID")...
If anything is unclear, please ask me to clarify more.
Thanks.

var $docFragment = $(htmlString);
$docFragment.find("a"); // all anchors in the HTML string
Note that this ignores any document structure tags (<html>, <head> and <body>), but any contained tags will be available.

With jQuery you can do this:
$(your_document_string).someParsingMethod().another();

You can always append your HTML to some hidden div (through innerHTML or jQuery .html(..)). It won't be treated exactly as a new document, but you will still be able to search its contents.
It has a few side effects, though. For example, if your HTML defines any script tags, they'll be loaded. Also, the browser may (and probably will) remove html, body and similar tags.
edit
If you specifically need title and similar tags, you may try an iframe that loads the content from your server.

Get raw HTML from a div using js?

I'm working on a website where users can create and save their own HTML forms. Instead of inserting form elements and ids one by one in the database I was thinking to use js (preferably jquery) to just get the form's HTML (in code source format) and insert it in a text row via mysql.
For example I have a form in a div
<div class="new_form">
<form>
Your Name:
<input type="text" name="something" />
About You:
<textarea name=about_you></textarea>
</form>
</div>
With js is it possible to get the raw HTML within the "new_form" div?

To get all HTML inside the div
$(".new_form").html()
To get only the text it would be
$(".new_form").text()
You might need to validate the HTML, this question might help you (it's in C# but you can get the idea)

Yes, it is. Use the innerHTML property of the div. Since "new_form" is a class rather than an id, select it by class, like this:
var myHTML = document.querySelector('.new_form').innerHTML;

Note when you use innerHTML or html() as above you won't get the exact raw HTML you put in. You'll get the web browser's idea of what the current document objects should look like serialised into HTML.
There will be browser differences in the exact format that comes out, in areas like name case, spacing, attribute order, which characters are &-escaped, and attribute quoting. IE, in particular, can give you invalid HTML where attributes that should be quoted aren't. IE will also, incorrectly, output the current values of form fields in their value attributes.
You should also be aware of the cross-site-scripting risks involved in letting users submit arbitrary HTML. If you are to make this safe you will need some heavy duty HTML ‘purification’.
