Preventing DOM XSS - javascript

Preventing DOM XSS - javascript

We recently on-boarded someone else's code which has since been tested, and failed, for DOM XSS attacks.
Basically the url fragments are being passed directly into jQuery selectors and enabling JavaScript to be injected, like so:
"http://website.com/#%3Cimg%20src=x%20onerror=alert%28/XSSed/%29%3E)"
$(".selector [thing="+window.location.hash.substr(1)+"]");
The problem is that this is occurring throughout their scripts and would need a lot of regression testing to fix e.g. if we escape the data if statements won't return true any more as the data won't match.
The JavaScript file in question is concatenated at build time from many smaller files so this becomes even more difficult to fix.
Is there a way to prevent these DOM XSS attacks with some global code without having to go through and debug each instance.
I proposed that we add a little regular expression at the top of the script to detect common chars used in XSS attacks and to simply kill the script if it returns true.
var xss = window.location.href.match(/(javascript|src|onerror|%|<|>)/g);
if(xss != null) return;
This appears to work but I'm not 100% happy with the solution. Does anyone have a better solution or any useful insight they can offer?

If you stick to the regular expression solution, which is far from ideal but may be the best choice given your constraints:
Rather than defining a regular expression matching malicious hashes (/(javascript|src|onerror|%|<|>)/g), I would define a regular expression matching sound hashes (e.g. /^[\w_-]*$/).
It will avoid false-positive errors (e.g. src_records), make it clear what is authorized and what isn't, and block more complex injection mechanisms.

Your issue is caused by that jQuery's input string may be treated as HTML, not only as selector.
Use native document.querySelector() instead of jQuery.
If support for IE7- is important for you, you can try Sizzle selector engine which likely, unlike jQuery and similar to native querySelector(), does not interpret input string as something different from a selector.

Related

client potential XSS jQuery

I have the following problem, Client Potential XSS and tried to fix it
$('#plazo input[value=' + $('#plazoActual').text() + ']').prop('checked', true);
but I don't understand why .text() is a vulnerability
Hope someone experienced can help, how to properly sanitize the above line.

The vulnerability is not $('#plazoActual').text(), but the fact that you take that value from the page (probably editable by a user), and use it in $(). It is reported, because $() can create elements, and if it's created from concatenated user input, it allows for injection (xss).
At first sight this particular case does not seem vulnerable though, because $() above does not create an element, it is just a selector for an input field (the first character is a # and there is a space). I think this cannot be exploited in general, but jQuery was a long time ago with all kinds of vulnerabilities also depending on the exact version. Also the input can potentially be manipulated so the selector finds something else in the page, which might be useful in a more complex attack.
A more robust and more secure way would be to find all inputs in #plazo (something like $('#plazo input')), iterate through them, and set checked where the value is $('#plazoActual').text(). So the point is to not have dynamic content in the selector.

Is it safe to save user created javascript in database?

I'm working on a code playground type of application where a user(web developer/designer) can input HTML, CSS and Javascript and view the result on an iframe. The inputted code will be saved in the database (MySQL) and rendered back again in an iframe on a show_results view/action.
Now the question: Is it safe to save javascripts directly in the database? If not, then where/how should I save it?

The database is not going to be your problem here. It's fairly trivial to use prepared statements to allow all kinds of characters to be stored safely in the database. Using anything other than prepared statements to store user input is insufficient, and essentially never recommended.
But you're talking about allowing arbitrary javascript to be executed, which is always going to be a security problem. As a commenter above implies, you're going to be replicating the complexities of jsfiddle.net without the security experience, the development know-how, or the express wish to keep on patching the vulnerabilities that will keep on cropping up.
Certainly you should be aware that what you're doing will completely compromise any domain that you set it up on, so that essentially that javascript should be only written on a throw-away domain or subdomain that you don't use for any other purpose. Of course, it's going to be trivial in such an environment to simply framebreak and pull a viewer off of the site that hosts the frame as well.
I'm sure this just scratches the surface of the potential abuses that arbitrary javascript execution (aka intentional self cross-site-scripting) will bring with it.
Since you're essentially re-inventing a very dangerous wheel with this concept, why not simply use some of the embedding services that already exist out there? codepen.io for example, allows you to embed it's snippets.

Yes it is safe as an architectural decision iff you are executing the javascript on the client side.
On any website you can use tools such as chrome's "inspect element" to manipulate the html, javascript etc on the client. Your system cannot assume that items on the client are not manipulated. This is why server side validation is still so important.
I completely disagree with kzqai.
If this was the case then fiddler would be in serious trouble.
There are potential problems that can be exposed more easily with what you are doing, but those problems already exist and are just obscure.
IFF you are executing javascript on the server side, this is a very complex decision. I would personally avoid it if possible because the game you are playing is that you are able to catch every possible scenario for trouble vs a bad guy being able to catch the 1 you did not.

It is safe as long as you correctly escape certain characters when inserting the value in the SQL statement. For example, if your Javascript code is:
var foo = 'hello world';
Then you will have to escape the single quotes when building the SQL statement:
INSERT INTO snippets (code) VALUES ('var foo = ''hello world'';')
In the statement above, two single quotes ('') are the way to represent just a single quote in a string enclosed by single quotes.
See the link below for further information on escaping characters:
http://dev.mysql.com/doc/refman/5.0/en/string-literals.html
EDIT
As Stephen P correctly points out, if you use prepared statements on the server side code then the framework under the hood will replace those characters for you.

Accurately unescape HTML entities in javascript

In javascript, I need to take a string and HTML un-escape it.
This question over here asks the same question, and the most popular answer involves populating a temporary div.
I've used this as well, but I think I've found a bug.
Simple example, correct behavior
If you have this string: Cats>Dogs
Unescaped, it should be: Cats>Dogs
Malformed example, wrong behavior
If you remove the semicolon and use this instead:Cats&gtDogs
You will get this as a result: Cats>Dogs
Isn't that wrong?
This struck me as odd. From what I understand, an escaped string requires the presence of a terminating semicolon, otherwise it's not escaped. After all, what if I had a store called guitars&amps? For all we know, this company exists but gets no business because it causes null reference exceptions everywhere it has records.
Any ideas on how I could perform escaping while knowingly avoiding escaping when the semicolon is missing? Currently, all I can think to do is perform the unescaping myself.
(The WYSIWYG preview in StackOverflow exhibits a similar unusual behavior, by the way. Try entering &ampgt;, this renders as >!)

Isn't that wrong?
Successful HTML parsers are tolerant. This is one of the things distinguishing them from, say, XML parsers. They don't necessarily stick to strict rules about markup, for the simple reason that there's a lot of incorrect markup out there. So they try to figure out what the markup is meant to represent. &gtDogs is more likely to mean >Dogs than &gtDogs, so that's what the parser goes with.

Javascript safely use user regex

I want to safely use regexp that the user inputs in order to parse text. I don't want to use a sandbox or anything. Can I just plug-in the regexp into String.match or would that cause problems? If not how might this be avoided?
The usecase is that of a writer who has a lot of text. The writer will transform that text with various regexes. Other users will run that author's regexes in order to get the intended output.

This should not cause any security errors. Worse case scenario you are using an invalid regEx string. Unless I am misunderstanding the question.
EDIT Worse case scenario is locking up your own browser. Thanks Anirudh for bringing this point up.

User Expression -> Same User
If you have the user input in a string, you can pass that directly to the constructor.
demo
var s = "This is my text";
$('input#expression').keyup(function(){
var e = new RegExp($(this).val());
$('#match').text(
JSON.stringify(s.match(e))
);
}).trigger("keyup");
It's much better than eval. There aren't any security problems that I know of. If there are any, it's probably a bug in the browser.
User Expression -> Other User(s)
As pointed out in the comments, it would be very complicated to differentiate between safe, and computationally intense/infeasible expressions in JavaScript.
If you don't trust the users to play nice, don't let them run expressions on each other's computers. At the very least, make sure user data is saved, in the case of the page needing to be refreshed, and don't run them without them being explicitly being invoked by the user.

If you are using Javascript it is running on their machine - so the sandbox is not involved.
They are free to mess up their machine as much as they want! Besides the worst that can happen is that the script breaks on their browser.

jQuery sanitizing comments and linkifying URLs

In terms of jQuery (or Javascript), what happens behind the scenes when a person posts a comment on Facebook, Twitter, or a blog?
For instance, do they sanitize the text first, and then pattern match URL's into an actual link? Are there other items of concern that the client-side should check in addition to doing some checks on the backend?
I have found a few regex's for turning URL's into links, but I'm not sure if there are better solutions.
I'm trying to wrap my head around the problem, but I'm having a difficult time knowing where to start. Any guidance you can provide is greatly appreciated!

This is a matter of opinion (in my opinion) so I'll CW this answer. Here's my opnion as a bona-fide citizen of the Internet:
There are two broad kinds of "sanitization": one is semantic sanitization, where input is checked to make sure it's what it's supposed to be (phone number, postal code, currency amount, whatever). The other is defensive sanitization, which is (again, in my opinion) a generally misguided, user-hostile activity.
Really, input is never really scary until it touches something: the database server, an HTML renderer, a JavaScript interpreter, and so on. The list is long.
As to point 1, I think that defensive sanitization is misguided because it ignores point 2 above: without knowing what environment you're defending from malicious input, you can't really sanitize it without greatly restricting the input alphabet, and even then the process may be fighting against itself. It's user-hostile because it needlessly restricts what legitimate users can do with the data they want to keep in their account. Who is to say that me wanting to include in my "comments" or "nickname" or "notes" fields characters that look like XML, or SQL, or any other language's special characters? If there's no semantic reason to filter inputs, why do that to your users?
Point 2 is really the crux of this. User input can be dangerous because server-side code (or client-side code, for that matter) can hand it over directly to unsuspecting interpretation environments where meta-characters important to each distinct environment can cause unexpected behavior. If you hand untouched user input directly to SQL by pasting it directly into a query template, then special SQL meta-characters like quotes can be used by a malicious user to control the database in ways you definitely don't want. However, that alone is no reason to prevent me from telling you that my name is "O'Henry".
The key issue with point 2 is that there are many different interpretation environments, and each of them is completely distinct as far as the threat posed by user input. Let's list a few:
SQL - quote marks in user input are a big potential problem; specific DB servers may have other exploitable syntax conventions
HTML - when user input is dropped straight into HTML, the browser's HTML parser will happily obey whatever embedded markup tells it to do, including run scripts, load tracker images, and whatever else. The key meta-characters are "<", ">", and "&" (the latter not so much because of attacks, but because of the mess they cause). It's probably also good to worry about quotes here too because user input may need to go inside HTML element attributes.
JavaScript - if a page template needs to put some user input directly into some running JavaScript code, the things to worry about are probably quotes (if the input is to be treated as a JavaScript string). If the user input needs to go into a regular expression, then a lot more scrubbing is necessary.
Logfiles - yes, logfiles. How do you look at logfiles? I do it on a simple command-line window on my Linux box. Such command-line "console" applications generally obey ancient "escape sequences" that date back to old ASCII terminals, for controlling cursor position and various other things. Well, embedded escape sequences in cleverly crafted user input can be used for crazy attacks that leverage those escape sequences; the general idea is to have some user input get dropped into some log file (maybe as part of a page error log) and trick an administrator into scrolling through the logfile in an xterm window. Wild, huh?
The key point here is that the exact techniques necessary to protect those environments from malformed or malicious input differ significantly from one to the next. Protecting your SQL server from malicious quotes is a completely different problem from guarding those quotes in HTML or JavaScript (and note that both of those are totally different from each other too!).
The bottom line: my opinion, therefore, is that the proper focus of attention when worrying about potentially malformed or malicious input is the process of writing user data, not reading it. As each fragment of user-supplied data is used by your software in cooperation with each interpreting environment, a "quoting" or "escaping" operation has to be done, and it has to be an operation specific to the target environment. How exactly that's arranged may vary all over the place. Traditionally in SQL, for example, one uses prepared statements, though there are times when the deficiencies of prepared statements make that approach difficult. When spitting out HTML, most server-side frameworks have all sorts of built-in hooks for HTML or XML escaping with entity notation (like & for "&"). Nowadays, the simplest way to protect things for Javascript is to leverage a JSON serializer, though of course there are other ways to go.

We Keep Coding

JavaScript is the programming language of the Web.