As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
In other words, what are the most-used techniques to sanitize input and/or output nowadays? What do people in industrial (or even just personal-use) websites use to combat the problem?
You should refer to the excellent OWASP website for a summary of attacks (including XSS) and defenses against them. Here's the simplest explanation I could come up with, which might actually be more readable than their web page (but probably nowhere nearly as complete).
Specifying a charset. First of all, ensure that your web page specifies the UTF-8 charset in the headers or in the very beginning of the head element HTML encode all inputs to prevent a UTF-7 attack in Internet Explorer (and older versions of Firefox) despite other efforts to prevent XSS.
HTML escaping. Keep in mind that you need to HTML-escape all user input. This includes replacing < with <, > with >, & with & and " with ". If you will ever use single-quoted HTML attributes, you need to replace ' with ' as well. Typical server-side scripting languages such as PHP provide functions to do this, and I encourage you to expand on these by creating standard functions to insert HTML elements rather than inserting them in an ad-hoc manner.
Other types of escaping. You still, however, need to be careful to never insert user input as an unquoted attribute or an attribute interpreted as JavaScript (e.g. onload or onmouseover). Obviously, this also applies to script elements unless the input is properly JavaScript-escaped, which is different from HTML escaping. Another special type of escaping is URL escaping for URL parameters (do it before the HTML escaping to properly include a parameter in a link).
Validating URLs and CSS values. The same goes for URLs of links and images (without validating based on approved prefixes) because of the javascript: URL scheme, and also CSS stylesheet URLs and data within style attributes. (Internet Explorer allows inserting JavaScript expressions as CSS values, and Firefox is similarly problematic with its XBL support.) If you must include a CSS value from an untrusted source, you should safely and strictly validate or CSS escape it.
Not allowing user-provided HTML. Do not allow user-provided HTML if you have the option. That is an easy way to end up with an XSS problem, and so is writing a "parser" for your own markup language based on simple regex substitutions. I would only allow formatted text if the HTML output were generated in an obviously safe manner by a real parser that escapes any text from the input using the standard escaping functions and individually builds the HTML elements. If you have no choice over the matter, use a validator/sanitizer such as AntiSamy.
Preventing DOM-based XSS. Do not include user input in JavaScript-generated HTML code and insert it into the document. Instead, use the proper DOM methods to ensure that it is processed as text, not HTML.
Obviously, I cannot cover every single case in which an attacker can insert JavaScript code. In general, HTTP-only cookies can be used to possibly make an XSS attack a bit harder (but by no means prevent one), and giving programmers security training is essential.
There are two kinds of XSS attack. One is where your site allows HTML to be injected somehow. This is not that hard to defend against: either escape all user input data, or strip all <> tags and support something like UBB-code instead. Note: URLs may still open you up to rick-rolling type attacks.
The more insiduous one is where some third-party site contains an IFRAME, SCRIPT or IMG tag or the like that hits a URL on your site, and this URL will use whatever authentication the user currently has towards your site. Thus, you should never, ever take any direct action in response to a GET request. If you get a GET request that attempts to do anything (update a profile, check out a shopping cart, etc), then you should respond with a form that in turn requires a POST to be accepted. This form should also contain a cross-site request forgery token, so that nobody can put up a form on a third party site that's set up to submit to your site using hidden fields (again, to avoid a masquerading attack).
There are only two major areas in your code which need to be addressed properly to avoid xss issues.
before using any user input value in queries, use the database helper functions like mysql_escape_string over the data and then use it in query. It will gurantee xss safety.
before displaying user input values back into form input fields, pass them through htmlspecialchars or htmlentities. This will convert all xss prone values into characters that the browser can display without being compromised.
Once you have done the above, you are more than 95% safe from xss attacks. Then you can go on and learn advanced techniques from security websites and apply additional security on your site.
What most frameworks do is that they discourage you to directly write html form code or do queries in string form, so that using the framework helper functions your code remains clean, while any serious problem can be addressed quickly by just updating one or two lines of code in the framework. You can simply write a little library of your own with common functions and reuse them in all your projects.
If you are developing in .NET one of the most effective ways to avoid XSS is to use the Microsoft AntiXSS Library. It's a very effective way to sanitize your input.
Related
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
In other words, what are the most-used techniques to sanitize input and/or output nowadays? What do people in industrial (or even just personal-use) websites use to combat the problem?
You should refer to the excellent OWASP website for a summary of attacks (including XSS) and defenses against them. Here's the simplest explanation I could come up with, which might actually be more readable than their web page (but probably nowhere nearly as complete).
Specifying a charset. First of all, ensure that your web page specifies the UTF-8 charset in the headers or in the very beginning of the head element HTML encode all inputs to prevent a UTF-7 attack in Internet Explorer (and older versions of Firefox) despite other efforts to prevent XSS.
HTML escaping. Keep in mind that you need to HTML-escape all user input. This includes replacing < with <, > with >, & with & and " with ". If you will ever use single-quoted HTML attributes, you need to replace ' with ' as well. Typical server-side scripting languages such as PHP provide functions to do this, and I encourage you to expand on these by creating standard functions to insert HTML elements rather than inserting them in an ad-hoc manner.
Other types of escaping. You still, however, need to be careful to never insert user input as an unquoted attribute or an attribute interpreted as JavaScript (e.g. onload or onmouseover). Obviously, this also applies to script elements unless the input is properly JavaScript-escaped, which is different from HTML escaping. Another special type of escaping is URL escaping for URL parameters (do it before the HTML escaping to properly include a parameter in a link).
Validating URLs and CSS values. The same goes for URLs of links and images (without validating based on approved prefixes) because of the javascript: URL scheme, and also CSS stylesheet URLs and data within style attributes. (Internet Explorer allows inserting JavaScript expressions as CSS values, and Firefox is similarly problematic with its XBL support.) If you must include a CSS value from an untrusted source, you should safely and strictly validate or CSS escape it.
Not allowing user-provided HTML. Do not allow user-provided HTML if you have the option. That is an easy way to end up with an XSS problem, and so is writing a "parser" for your own markup language based on simple regex substitutions. I would only allow formatted text if the HTML output were generated in an obviously safe manner by a real parser that escapes any text from the input using the standard escaping functions and individually builds the HTML elements. If you have no choice over the matter, use a validator/sanitizer such as AntiSamy.
Preventing DOM-based XSS. Do not include user input in JavaScript-generated HTML code and insert it into the document. Instead, use the proper DOM methods to ensure that it is processed as text, not HTML.
Obviously, I cannot cover every single case in which an attacker can insert JavaScript code. In general, HTTP-only cookies can be used to possibly make an XSS attack a bit harder (but by no means prevent one), and giving programmers security training is essential.
There are two kinds of XSS attack. One is where your site allows HTML to be injected somehow. This is not that hard to defend against: either escape all user input data, or strip all <> tags and support something like UBB-code instead. Note: URLs may still open you up to rick-rolling type attacks.
The more insiduous one is where some third-party site contains an IFRAME, SCRIPT or IMG tag or the like that hits a URL on your site, and this URL will use whatever authentication the user currently has towards your site. Thus, you should never, ever take any direct action in response to a GET request. If you get a GET request that attempts to do anything (update a profile, check out a shopping cart, etc), then you should respond with a form that in turn requires a POST to be accepted. This form should also contain a cross-site request forgery token, so that nobody can put up a form on a third party site that's set up to submit to your site using hidden fields (again, to avoid a masquerading attack).
There are only two major areas in your code which need to be addressed properly to avoid xss issues.
before using any user input value in queries, use the database helper functions like mysql_escape_string over the data and then use it in query. It will gurantee xss safety.
before displaying user input values back into form input fields, pass them through htmlspecialchars or htmlentities. This will convert all xss prone values into characters that the browser can display without being compromised.
Once you have done the above, you are more than 95% safe from xss attacks. Then you can go on and learn advanced techniques from security websites and apply additional security on your site.
What most frameworks do is that they discourage you to directly write html form code or do queries in string form, so that using the framework helper functions your code remains clean, while any serious problem can be addressed quickly by just updating one or two lines of code in the framework. You can simply write a little library of your own with common functions and reuse them in all your projects.
If you are developing in .NET one of the most effective ways to avoid XSS is to use the Microsoft AntiXSS Library. It's a very effective way to sanitize your input.
Are all known javascript framekillers vulnerable to XSS?
If yes, whould it be enough to sanitize window.location before breaking out of an iframe?
What would be the best way to do it?
Could you please give an example of possible XSS attack?
Thanks!
UPD: The reason I'm asking is because I got a vulnerability scan alert saying that JS framekiller code containing top.location.replace(document.location) is XSS vulnerable as document.location is controlled by the user.
What was right in their description: variables like 'document.location', 'window.location', 'self.location' are (partially) controlled by non-trusted user. This is because the choice of (sub)string in non-trusted domain and page location ('http://non.trusted.domain.com/mypage') and non-trusted request string ('http://my.domain.com/?myrequest') are formed according to user's intention that may not always be good for you.
What was wrong: this user-dependency is not necessarily XSS vulnerability. In fact, to form XSS you would need to have some code that effectively uses the content controlled by non-trusted user somewhere in your output stream for your page. In the example of simple framekiller like top.location.replace(window.location) there's no danger of XSS.
One example where we could talk about XSS would be code like
document.write('Click here')
Constructing URI like http://test.com/?dummy"<script>alert("Test")</script>"dummy and substituting instead of document.location by you code will trigger non-trusted script in trusted webpage's context. As constructing such URI and passing it unescaped is a challenge, real XSS would work in some more complex scenarios involving inserting non-trusted variables verbatim into flow of language directives, be it HTML, CSS, JS, PHP, etc.
Another well-known example of XSS-unaware development was history of inventing JSON. While JSON has got strong popularity (having me among its proponents too), initially it was intended as "quick-n-dirty" way of storing JS data as pieces of plain JS-formatted data structures. In order to "parse" JSON blocks, it was enough just to eval() them. Fortunately, people quickly recognised how flawed was this whole idea, so nowadays any knowledgeable developer in sane mind will always use proper safe JSON parser instead.
I have a textarea in which I have put validation code not to allow <script> tags and Javascript tags, but the user can enter descriptions like <strong onmouseover=alert(2)>.
So when someone hovers over this string tag JS alert box shows up.
How can I stop this kind of javascript injection?
You'll need to properly sanitize the HTML you allow. This is non-trivial, as you've discovered. (You probably need to disallow iframe and several other elements.)
Proper sanitizing requires a whitelist of elements, and within those a whitelist of attributes allowed on each. Obviously the various onXyz attributes would not be on the whitelist.
Sanitizing must happen server-side, because anything client-side can be bypassed. So without knowing what server technology you're using, one can't recommend something. For instance, JSoup is a well-known one for Java, but of course, that's not useful to you if you aren't using Java. :-) For .Net, there's the HTML Agility Pack or the Microsoft Anti-XSS library, but this is a very incomplete list.
There are a lot of tools called html purifiers. You can try this for example.
The easy answer is replace(/</g,'<');, but of course that prevents any HTML from being used. This is why BBCode, Markdown and other such languages exist: to provide formatting features without granting the user permission to post arbitrary code.
Alternatively, just search for things of the pattern /\bon[a-z]+=/i
I am currently doing some research into XSS prevention but I am a bit confused about DOM based attacks. Most papers I have read about the subject give the example of injecting JavaScript through URL parameters to modify the DOM if the value is rendered in the page by JavaScript instead of server-side code.
However, it appears that all modern browsers encode all special characters given through URL parameters if rendered by JavaScript.
Does this mean DOM based XSS attacks cannot be performed unless against older browsers such as IE6?
They are absolutely possible. If you don't filter output that originated from your users, that output can be anything, including scripts. The browser doesn't have a way to know whether it is a legitimate script controlled by you or not.
It's not a matter of modern browsers, it's the basic principle that the browser treats every content that comes from your domain as legitimate to execute.
There are other aspects that are indeed blocked (sometimes, not always) by modern browsers (although security flaws always exist) like cross-domain scripting, 3rd party access to resources etc.
Forget about those old-school XSS exampls from 10 years ago. Programmers who write javascript to render page by taking something unescaped from query params have either been fired or switched to frameworks like angular/backbone a long time ago.
However, reflected/stored XSS still widely exists. This requires proper escaping from both server side and client side. Modern frameworks all provide good support for escaping sensitive characters when rendering the HTML. For example, when rendering views from model data, angular has $sce(strict contextual escaping) service (https://docs.angularjs.org/api/ng/service/$sce) to address possible XSS threats. backbone models also have methods like "model.escape(attribute)" (http://backbonejs.org/#Model-escape) to eliminate the XSS threats.
A persistent follow-up of an admittedly similar question I had asked: What security restrictions should be implemented in allowing a user to upload a Javascript file that directs canvas animation?
I like to think I know JS decent enough, and I see common characters in all the XSS examples I've come accoss, which I am somewhat familiar with. I am lacking good XSS examples that could bypass a securely sound, rationally programmed system. I want people to upload html5 canvas creations onto my site. Any sites like this yet? People get scared about this all the time it seems, but what if you just wanted to do it for fun for yourself and if something happens to the server then oh well it's just an animation site and information is spread around like wildfire anyway so if anyone cares then i'll tell them not to sign up.
If I allow a single textarea form field to act as an IDE using JS for my programming language written in JS, and do string replacing, filtering, and validation of the user's syntax before finally compiling it into JS to be echoed by PHP, how bad could it get for me to host that content? Please show me how you could bypass all of my combined considerations, with also taking into account the server-side as well:
If JavaScript is disabled, preventing any POST from getting through, keeping constant track of user session.
Namespacing the Class, so they can only prefix their functions and methods with EXAMPLE.
Making instance
Storing my JS Framework in an external (immutable in the browser?) JS file, which needs to be at the top of the page for the single textarea field in the form to be accepted, as well as a server-generated key which must follow it. On the page that hosts the compiled user-uploaded canvas game/animation (1 per page ONLY), the server will verify the correct JS filename string before echoing the rest out.
No external script calls! String replacing on client and server.
Allowing ONLY alphanumeric characters, dashes and astericks.
Removing alert, eval, window, XMLHttpRequest, prototyping, cookie, obvious stuff. No native JS reserved words or syntax.
Obfuscating and minifying another external JS file that helps to serve the IDE and recognize the programming language's uniquely named Canvas API methods.
When Window unloads, store the external JS code in to two dynamically generated form fields to be checked by the server in POST. All the original code will be cataloged in the DB thoroughly for filtering purposes.
Strict variable naming conventions ('example-square1-lengthPROPERTY', 'example-circle-spinMETHOD')
Copy/Paste Disabled, setInterval to constantly check if enabled by the user. If so, then trigger a block to the database, change window.location immediately and check the session ID through POST to confirm in case JS becomes disabled between that timeframe.
I mean, can I do it then? How can one do harm if they can't use HEX or ASCII and stuff like that?
I think there are a few other options.
Good places to go for real-life XSS tests, by the way, are the XSS Cheat Sheet and HTML5 Security Cheetsheet (newer). The problem with that, however, is that you want to allow Javascript but disallow bad Javascript. This is a different, and more complex, goal than the usual way of preventing XSS, by preventing all scripts.
Hosting on a separate domain
I've seen this referred to as an "iframe jail".
The goal with XSS attacks is to be able to run code in the same context as your site - that is, on the same domain. This is because the code will be able to read and set cookies for that domain, intiate user actions or redress your design, redirect, and so forth.
If, however, you have two separate domains - one for your site, and another which only hosts the untrusted, user-uploaded content, then that content will be isolated from your main site. You could include it in an iframe, and yet it would have no access to the cookies from your site, no access to redress or alter the design or links outside its iframe, and no access to the scripting variables of your main window (since it is on a different domain).
It could, of course, set cookies as much as it likes, and even read back the ones that it set. But these would still be isolated from the cookies for your site. It would not be able to affect or read your main site's cookies. It could also include other code which could annoy/harrass the user, such as pop-up windows, or could attempt to phish (you'd need to make it visually clear in your out-of-iframe UI that the content served is not part of your site). However, this is still sandboxed from your main site, where you own personal payload - your session cookies and the integrity of your overarching page design and scripts, is preserved. It would carry no less but no more risk than any site on the internet that you could embed in an iframe.
Using a subset of Javascript
Subsets of Javascript have been proposed, which provide compartmentalisation for scripts - the ability to load untrusted code and have it not able to alter or access other code if you don't give it the scope to do so.
Look into things like Google CAJA - whose aim is to enable exactly the type of service that you've described:
Caja allows websites to safely embed DHTML web applications from third parties, and enables rich interaction between the embedding page and the embedded applications. It uses an object-capability security model to allow for a wide range of flexible security policies, so that the containing page can effectively control the embedded applications' use of user data and to allow gadgets to prevent interference between gadgets' UI elements.
One issue here is that people submitting code would have to program it using the CAJA API. It's still valid Javascript, but it won't have access to the browser DOM, as CAJA's API mediates access. This would make it difficult for your users to port some existing code. There is also a compilation phase. Since Javascript is not a secure language, there is no way to ensure code cannot access your DOM or other global variables without running it through a parser, so that's what CAJA does - it compiles it from Javascript input to Javascript output, enforcing its security model.
htmlprufier consists of thousands of regular expressions that attempt "purify" html into a safe subset that is immune to xss. This project is bypassesed very few months, because it isn't nearly complex enough to address the problem of XSS.
Do you understand the complexity of XSS?
Do you know that javascript can exist without letters or numbers?
Okay, they very first thing I would try is inserting a meta tag that changes the encoding to I don't know lets say UTF-7 which is rendered by IE. Within this utf-7 enocded html it will contain javascript. Did you think of that? Well guess what there is somewhere between a hundred thousand and a a few million other vectors I didn't think of.
The XSS cheat sheet is so old my grandparents are immune to it. Here is a more up to date version.
(Oah and by the way you will be hacked because what you are trying to do fundamentally insecure.)