Javascript implementation of anti-XSS escaping functions - javascript

The OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet lists rules to prevent XSS attacks by escaping data appropriately, and it contains links to reference implementations of these escaping methods in the Java language (HTML Escape, Attribute Escape, JavaScript Escape, CSS Escape, URL Escape).
Is there an implementation anywhere of these in Javascript, or do I have to 'roll my own'?
UPDATE: I mean Javascript running in the browser. For example, for escaping text rendered with the jQuery html() method (though of course text() is safer), or escaping data rendered using a template engine such as EJS.
UPDATE2: ESAPI JavaScript seems to be what I was looking for, though it's still only "Alpha Quality"

Since you tend to work with the DOM in (client-side) JavaScript, there is no need for HTML and HTML attribute escaping. For example, given untrusted input input,
var el = document.createElement('div');
el.setAttribute('title', input);
el.appendChild(document.createTextNode(input));
is perfectly safe, since you are never constructing (serialized) HTML in the first place.
If you are writing custom JavaScript or CSS from JavaScript code, you are doing something wrong (including using document.write or some data URI script src abominations), so there is no escaping provided for either. You can simply write your code or styles beforehand and then call the appropriate functions or set the appropriate classes.
encodeURI and encodeURIComponent can be used to encode URIs or their components.
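As a rough illustration of that last point, the sketch below builds a link from untrusted input. The helper name buildSearchUrl, the q parameter and the example.com address are made up for the example; only encodeURIComponent and encodeURI are standard functions.

```javascript
// Hypothetical helper: append untrusted text as the value of a "q" query parameter.
function buildSearchUrl(baseUrl, untrustedQuery) {
  // encodeURIComponent escapes characters such as & = ? / # that would
  // otherwise change the structure of the URL.
  return baseUrl + '?q=' + encodeURIComponent(untrustedQuery);
}

const url = buildSearchUrl('https://www.example.com/search', 'a&b=<c>');
console.log(url); // https://www.example.com/search?q=a%26b%3D%3Cc%3E

// encodeURI is meant for whole URIs, so it leaves reserved characters
// such as & and = untouched and only escapes things like spaces.
console.log(encodeURI('https://www.example.com/search?q=a b&x=1'));
```

Use encodeURIComponent for individual values and encodeURI only for a string that is already a complete URI.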

You can use the js-xss library. It worked for me against the test cases I've been using for injecting scripts into HTML.

Related

How to protect against Encoded URL XSS Attack

I have the following two questions:
1) How to protect from this kind of XSS Attacks?
https://www.example.com/index.php?&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041
Suppose that, for some reason, the query parameter is embedded in an image load event; it would then look like this:
<img src=x onload="&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041">
// And the browser treats it as
<img src=x onload="javascript:alert('XSS')">
I am already using PHP's htmlspecialchars() and filter_var() with URL sanitization, but this kind of encoded XSS easily gets past these functions.
How can we defend against such encoded XSS or neutralize any such attack?
2) Does an XSS attack have to be embedded in an HTML page, JavaScript, CSS, etc. in order to be triggered, or is there a way for XSS to work without being embedded?
htmlspecialchars is a perfectly good defence against XSS when you are inserting user input into an HTML document.
It stops any HTML syntax in the user input from breaking out of where you intend for it to go and being treated as JavaScript.
Your problem has nothing to do with the fact that the attack is encoded. The problem is that you are putting user input somewhere that JavaScript is expected (an onload attribute), so it is already being treated as JavaScript.
json_encode is the usual solution here (and then htmlspecialchars because the JavaScript is inside an HTML attribute).
However, that only works when you are taking user input and putting it into a script to be used as data. Here it seems that you are taking user input and just treating the whole thing as a JavaScript function.
If you do that then you are going to be vulnerable to XSS. You can mitigate it to some degree by implementing defenses against CSRF, but you almost certainly shouldn't be doing this in the first place.
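The two-step encoding described above (JavaScript string literal first, HTML attribute second) can be sketched in JavaScript, for example in a Node-rendered template. This swaps in JSON.stringify for PHP's json_encode and a small hand-written escapeHtml for htmlspecialchars; escapeHtml, onloadAttribute and handleLoad are hypothetical names used only for illustration.

```javascript
// Hand-written stand-in for htmlspecialchars (an assumption, not a library API).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build an onload attribute that passes user input to a fixed handler as DATA.
// handleLoad is a hypothetical function defined elsewhere on the page.
function onloadAttribute(userInput) {
  // Step 1: make the input a JavaScript string literal.
  const js = 'handleLoad(' + JSON.stringify(userInput) + ')';
  // Step 2: escape the resulting script for the HTML attribute context.
  return 'onload="' + escapeHtml(js) + '"';
}

console.log(onloadAttribute("');alert('XSS"));
// onload="handleLoad(&quot;&#39;);alert(&#39;XSS&quot;)"
```

The input stays inside the string literal, so the handler receives it as text; none of this helps if the whole attribute value is taken from the user.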

How to check if a string contains JavaScript code?

I'm building a forum-like web app. Users are allowed to submit rich HTML text to the server, such as p tags, div tags, etc. To keep the formatting, the server writes these tags back to the users' browsers directly (without HTML encoding). So I must check for potentially dangerous scripts to avoid XSS. Any JavaScript code is considered dangerous and is not allowed. How can I detect it, or is there a better solution?
dangerous example 1:
<script>alert('1')</script>
dangerous example 2:
<script src="..."></script>
dangerous example 3:
click me
Use an HTML Parser
Your requirements are straightforward:
You must disallow all <script> tags, but keep certain rich HTML tags.
You must be able to escape inline Javascript in links. i.e. stringify it or strip the unsafe attributes altogether.
The correct way to handle all of these is to employ a modern standards-compliant HTML parser that is able to syntactically analyse the structure of the rich HTML sent over, identifying the tags sent over and discovering the raw values in attributes. This is, in fact, how sanitisation, as one of the comments mentions, is done.
There are a number of pre-existing HTML parsers that are designed to target XSS-unsafe input. The npm library js-xss, for example, appears to be able to do exactly what you want:
Whitelisting only specific tags
Modify unsafe attributes to return a default value
You can even run this server-side as a command line utility.
Similar libraries already exist for most languages, and you should do a thorough search of your preferred language's package repository. Alternatively, you can launch a subprocess and collect your results directly from js-xss from the command line.
Avoid using regular expressions to parse HTML naively - while it is true most HTML parsers end up using regular expressions under the hood, they do so in a fairly limited fashion for strictly well-defined grammars after correctly lexing them.
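One small piece of what such a sanitizer does for links can be sketched as follows: once a real HTML parser has extracted and entity-decoded an href value, check its scheme against an allowlist. This is not how js-xss itself is implemented; isSafeHref, the allowed schemes and the example.invalid base URL are assumptions for illustration. Only the standard URL class is used.

```javascript
// Schemes this sketch allows in link targets (an assumption; adjust as needed).
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Hypothetical check for an href value that an HTML parser has already
// extracted and entity-decoded.
function isSafeHref(value) {
  try {
    // The base URL only lets relative links parse; example.invalid is a placeholder.
    const parsed = new URL(value, 'https://example.invalid/');
    // The URL parser lowercases the scheme and drops tabs and newlines,
    // so variants like "JaVaScRiPt:" or "java\nscript:" are normalised first.
    return ALLOWED_PROTOCOLS.has(parsed.protocol);
  } catch (err) {
    return false; // values that do not parse at all are rejected
  }
}

console.log(isSafeHref('https://www.example.com/')); // true
console.log(isSafeHref("javascript:alert('3')"));    // false
```

Attributes that fail the check would be stripped or replaced with a default value, as described in the list above.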
Use this regex
<script([^'"]|"(\\.|[^"\\])*"|'(\\.|[^'\\])*')*?<\/script>
for detecting all types of <script> tag
But I suggest using an iframe in sandbox mode to show ALL the HTML code; by doing that you prevent JavaScript code from being able to do anything bad.
http://www.w3schools.com/tags/att_iframe_sandbox.asp
I hope this helps!
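A minimal sketch of the sandboxed iframe idea, assuming the page markup is assembled as a string (for instance on a Node server). The helper names are hypothetical; sandbox and srcdoc are standard iframe attributes, and an empty sandbox value applies all restrictions, including blocking scripts.

```javascript
// Escape text for use inside a double-quoted HTML attribute value.
function escapeAttribute(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: wrap untrusted HTML in an iframe whose empty
// sandbox attribute disables script execution inside the frame.
function sandboxedFrame(untrustedHtml) {
  return '<iframe sandbox="" srcdoc="' + escapeAttribute(untrustedHtml) + '"></iframe>';
}

console.log(sandboxedFrame('<script>alert("1")</script>'));
// <iframe sandbox="" srcdoc="&lt;script&gt;alert(&quot;1&quot;)&lt;/script&gt;"></iframe>
```

The user's markup is still rendered inside the frame, but without allow-scripts in the sandbox value its scripts do not run.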

What HTML tags would be considered dangerous if stored in SQL Server?

Considering issues like CSRF, XSS, SQL Injection...
Site: ASP.net, SQL Server 2012
I'm reading a somewhat old page from MS: https://msdn.microsoft.com/en-us/library/ff649310.aspx#paght000004_step4
If I have a parametrized query, and one of my fields is for holding HTML, would a simple replace on certain tags do the trick?
For example, a user can type into a WYSIWYG textarea, make certain things bold, or create bullets, etc.
I want to be able to display the results from a SELECT query, so even if I HTMLEncoded it, it'll have to be HTMLDecoded.
What about a UDF that cycles through a list of scenarios? I'm curious as to the best way to deal with the seemingly sneaky ones mentioned on that page:
Quote:
An attacker can use HTML attributes such as src, lowsrc, style, and href in conjunction with the preceding tags to inject cross-site scripting. For example, the src attribute of the tag can be a source of injection, as shown in the following examples.
<img src="javascript:alert('hello');">
<img src="java&#010;script:alert('hello');">
<img src="java&#X0A;script:alert('hello');">
An attacker can also use the <style> tag to inject a script by changing the MIME type as shown in the following.
<style TYPE="text/javascript">
alert('hello');
</style>
So ultimately two questions:
Best way to deal with this from within the INSERT statement itself.
Best way to deal with this from code-behind.
Best way to deal with this from within the INSERT statement itself.
None. That's not where you should do it.
Best way to deal with this from code-behind.
Use a white-list, not a black-list. HTML encode everything, then decode specific tags that are allowed.
It's reasonable to be able to specify some tags that can be used safely, but it's not reasonable to be able to catch every possible exploit.
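The "encode everything, then decode specific tags" approach can be sketched like this. The question is about ASP.NET, so treat this JavaScript version as an illustration of the idea only; the tag list and the function names are assumptions. Only bare tags without attributes are re-enabled, so anything carrying an attribute such as onclick stays encoded.

```javascript
// Tags this sketch re-enables after encoding (an assumption; choose your own list).
const ALLOWED_TAGS = ['b', 'i', 'em', 'strong', 'p', 'ul', 'ol', 'li'];

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Encode the whole input, then turn back only attribute-free allowed tags.
function encodeThenAllow(input) {
  const encoded = escapeHtml(input);
  const pattern = new RegExp('&lt;(/?)(' + ALLOWED_TAGS.join('|') + ')&gt;', 'gi');
  return encoded.replace(pattern, '<$1$2>');
}

console.log(encodeThenAllow('<b>hi</b><script>alert(1)</script>'));
// <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

Because the default is "encoded", a tag or trick the list does not anticipate is displayed as text rather than executed.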
What HTML tags would be considered dangerous if stored in SQL Server?
None. SQL Server does not understand, nor try to interpret HTML tags. A HTML tag is just text.
However, HTML tags can be dangerous if output to a HTML page, because they can contain script.
If you want a user to be able to enter rich text, the following approaches should be considered:
Allow users (or the editor they are using) to generate BBCode, not HTML directly. When you output their BBCode markup, you convert any recognised tags to HTML without attributes that contain script, and any HTML to entities (& to &amp;, etc).
Use a tried and tested HTML sanitizer to remove "unsafe" markup from your stored input in combination with a Content Security Policy. You must do both, otherwise any gaps (and there will be gaps) in the sanitizer could allow an attack, and not all browsers fully support CSP yet (IE).
Note that both of these should be done at the point of output. Store the text "as is" in your database, and simply encode and process it into the correct format when it is output to the page.
Sanitize html both on the client and on the server before you stuff any strings into SQL.
Client side:
TinyMCE - does this automatically
CKEditor - does this automatically
Server side:
Pretty easy to do this with Node, or the language/platform of your choice.
https://www.realwebsite.com
the link above shows www.realwebsite.com while it actually takes you to www.dangerouswebsite.com...
<a '
href="https://www.dangerouswebsite.com">
https://www.realwebsite.com
<'/a>
Do not include the stray ' characters in the code; I put them there to keep the markup from being rendered, so you can see the code instead of just the link. (By the way, most websites block this, or anything to which you add something like onload="alert('TEXT')", but it can still be used to trick people into visiting dangerous websites. Although the real destination shows at the bottom of your browser, some people don't check it or don't understand what it means.)

Is there any reason to do a solution using a dynamically created <script> tag instead of eval()?

There's some good reasons to avoid the eval() function in JavaScript, namely security risks when including user input in the eval() code. However, in a situation where the eval() code does not include anything affected by user input (in my particular situation, we have dynamic templates defined in XML files - these templates can also specify complex validation functions, javascript code that is embedded in the XML, which is then received by the client via AJAX), is there any reason to avoid the eval() function?
I came up (I'm probably not the first, but I haven't seen this done) with a solution using a dynamically created inline tag instead of eval():
$(scriptObject).text(strJSCode);
A simple example can be seen at http://jsfiddle.net/H7EG9/1/ (I know this example does use user input, but that's just to make it easy to demonstrate).
Is there any reason to do this instead of eval()? The outcome is basically the same, although this option might appear less "scary" to the die-hard foes of eval().
I would use eval instead of creating script tags.
Script tags create overhead (they are DOM elements) but more importantly, you will need to use some sort of global variable to access the script in the script tags. If you use eval, you can simply do
var evalFunction = eval("(function(){...})"); // wrap function in () to make it an expression
var result = evalFunction(val);
IE8 and below do not allow scripts to be served in data: format. In that regard, eval() is more reliable.
That being said, if you are using AJAX to download a JS file from which you are getting that string, you could just set scriptObject.src = 'path/to/script.js'; The browser will have the file in cache and will therefore load it immediately.
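To make the eval approach above concrete, here is a small runnable sketch. The validator source is a made-up stand-in for code that would arrive inside one of the XML templates; it is assumed to come from a trusted source, never from user input.

```javascript
// Hypothetical validation source, standing in for code delivered in a template.
const validatorSource = '(function (value) { return value.length >= 3; })';

// The surrounding parentheses make the function an expression, so eval
// returns the function itself instead of declaring it.
const validate = eval(validatorSource);

console.log(validate('ab'));  // false
console.log(validate('abc')); // true
```

The function is held in a local variable, so no global is needed to reach it, which is the advantage over an injected script element mentioned above.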
eval is easy to write, but it is nearly as easy to add a script element to the head or body and append the js text to the new script element. Some edge cases, like variable hoisting, behave a little oddly when evaling a script.

Is createTextNode completely safe from HTML injection & XSS?

I'm working on a single page webapp. I'm doing the rendering by directly creating DOM nodes. In particular, all user-supplied data is added to the page by creating text nodes with document.createTextNode("user data").
Does this approach avoid any possibility of HTML injection, cross site scripting (XSS), and all the other evil things users could do?
It creates a plain text node, so yes, as far as it goes.
It is possible to create an XSS problem by using an unsafe method to get the data from whatever channel it is being input into to createTextNode though.
e.g. The following would be unsafe:
document.createTextNode('<?php echo $_GET['xss']; ?>');
… but the danger is from the PHP echo, not the JavaScript createTextNode.
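A safer way to get server-side data into that call can be sketched as follows, assuming a Node-style server that generates the inline script (this swaps in JavaScript for the PHP above). toScriptLiteral is a hypothetical helper name.

```javascript
// Turn an arbitrary string into a literal that can be placed in an inline script.
function toScriptLiteral(value) {
  // JSON.stringify yields a valid JavaScript string literal; replacing "<"
  // keeps a "</script>" inside the data from closing the surrounding script element.
  return JSON.stringify(String(value)).replace(/</g, '\\u003c');
}

const literal = toScriptLiteral('</script><script>alert(1)</script>');

// The generated line of script, as the server would write it into the page.
console.log('document.createTextNode(' + literal + ');');
```

The browser then sees an ordinary string literal, and createTextNode receives the original text unchanged.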
Yes, it's XSS safe, as would be using someElement.innerText = "...".
(The sibling answer adds confusion by including the XSS-vulnerable PHP snippet.)
