I have a textarea in which I have put validation code not to allow <script> tags and Javascript tags, but the user can enter descriptions like <strong onmouseover=alert(2)>.
So when someone hovers over this strong tag, a JS alert box shows up.
How can I stop this kind of javascript injection?
You'll need to properly sanitize the HTML you allow. This is non-trivial, as you've discovered. (You probably need to disallow iframe and several other elements.)
Proper sanitizing requires a whitelist of elements, and within those a whitelist of attributes allowed on each. Obviously the various onXyz attributes would not be on the whitelist.
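To make the whitelist idea concrete, here is a minimal sketch using only Python's standard library. The ALLOWED policy, the SAFE_SCHEMES prefixes, and the sanitize name are assumptions made up for illustration; for real use, a maintained sanitizer library is a much safer choice than a hand-rolled one like this.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: allowed elements mapped to their allowed attributes.
ALLOWED = {"strong": set(), "em": set(), "p": set(), "br": set(), "a": {"href"}}
VOID = {"br"}  # elements that never get a closing tag
SAFE_SCHEMES = ("http://", "https://")  # assumed list of approved URL prefixes


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag; any text inside is still emitted, escaped
        kept = []
        for name, value in attrs:
            # Anything not whitelisted (all on* handlers included) is dropped.
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped, so it can never become markup.
        self.out.append(escape(data))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

With this policy, sanitize('<strong onmouseover=alert(2)>hi</strong>') keeps the strong element but loses the event handler.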
Sanitizing must happen server-side, because anything client-side can be bypassed. So without knowing what server technology you're using, one can't recommend something. For instance, JSoup is a well-known one for Java, but of course, that's not useful to you if you aren't using Java. :-) For .Net, there's the HTML Agility Pack or the Microsoft Anti-XSS library, but this is a very incomplete list.
There are a lot of tools called html purifiers. You can try this for example.
The easy answer is replace(/</g,'&lt;');, but of course that prevents any HTML from being used. This is why BBCode, Markdown and other such languages exist: to provide formatting features without granting the user permission to post arbitrary code.
Alternatively, just search for things of the pattern /\bon[a-z]+=/i
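A quick check of that pattern, translated to Python's re module, suggests it should be treated as a first-pass detector at most. The sample strings below are made up for illustration.

```python
import re

# The pattern from the answer above, translated to Python syntax.
NAIVE = re.compile(r"\bon[a-z]+=", re.IGNORECASE)

samples = [
    "<strong onmouseover=alert(2)>",        # matched
    "<strong onmouseover =alert(2)>",       # whitespace before '=' is valid HTML; not matched
    '<a href="javascript:alert(2)">x</a>',  # no on* attribute involved; not matched
]

flags = [bool(NAIVE.search(s)) for s in samples]
print(flags)
```

Only the first sample is flagged, which is why the whitelist approach described earlier is generally the more robust option.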
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
In other words, what are the most-used techniques to sanitize input and/or output nowadays? What do people in industrial (or even just personal-use) websites use to combat the problem?
You should refer to the excellent OWASP website for a summary of attacks (including XSS) and defenses against them. Here's the simplest explanation I could come up with, which might actually be more readable than their web page (but probably nowhere nearly as complete).
Specifying a charset. First of all, ensure that your web page specifies the UTF-8 charset in the headers or at the very beginning of the head element. Otherwise a UTF-7 attack in Internet Explorer (and older versions of Firefox) can succeed even if you HTML-encode all inputs and make other efforts to prevent XSS.
HTML escaping. Keep in mind that you need to HTML-escape all user input. This includes replacing < with &lt;, > with &gt;, & with &amp; and " with &quot;. If you will ever use single-quoted HTML attributes, you need to replace ' with &#39; as well. Typical server-side scripting languages such as PHP provide functions to do this, and I encourage you to expand on these by creating standard functions to insert HTML elements rather than inserting them in an ad-hoc manner.
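As a small illustration of the escaping described above, here is a sketch in Python using the standard html.escape function. The render_comment helper and the comment class name are hypothetical; the point is that element construction goes through one function instead of ad-hoc string concatenation.

```python
from html import escape


def render_comment(user_text):
    # quote=True also escapes " and ', so the result is safe in attribute values too.
    return f'<p class="comment">{escape(user_text, quote=True)}</p>'


print(render_comment("<strong onmouseover=alert(2)>hello</strong>"))
```

Note that html.escape writes the single quote as &#x27;, which is equivalent to &#39;.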
Other types of escaping. You still, however, need to be careful to never insert user input as an unquoted attribute or an attribute interpreted as JavaScript (e.g. onload or onmouseover). Obviously, this also applies to script elements unless the input is properly JavaScript-escaped, which is different from HTML escaping. Another special type of escaping is URL escaping for URL parameters (do it before the HTML escaping to properly include a parameter in a link).
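The ordering of URL escaping and HTML escaping can be sketched like this in Python. The /search path, the page parameter, and the search_link helper are assumptions for illustration only.

```python
from html import escape
from urllib.parse import quote


def search_link(term):
    # Step 1: URL-escape the parameter value so it cannot break out of the query string.
    url = f"/search?q={quote(term, safe='')}&page=1"
    # Step 2: HTML-escape the finished URL for the attribute, and the term for the link text.
    return f'<a href="{escape(url, quote=True)}">{escape(term)}</a>'


print(search_link("a&b"))
```

The & inside the user's term becomes %26 in the URL, while the & separating the parameters becomes &amp; in the HTML.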
Validating URLs and CSS values. The same goes for URLs of links and images (without validating based on approved prefixes) because of the javascript: URL scheme, and also CSS stylesheet URLs and data within style attributes. (Internet Explorer allows inserting JavaScript expressions as CSS values, and Firefox is similarly problematic with its XBL support.) If you must include a CSS value from an untrusted source, you should safely and strictly validate or CSS escape it.
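A minimal sketch of validating URLs against approved schemes, assuming that only absolute http and https links should be accepted (the ALLOWED_SCHEMES set and the is_safe_link name are illustrative choices, not a standard API):

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed policy


def is_safe_link(url):
    # urlsplit lowercases the scheme, so "JaVaScRiPt:" is still recognised.
    parts = urlsplit(url.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


print(is_safe_link("https://example.com/cat.png"))
print(is_safe_link("javascript:alert(1)"))
```

A URL that passes this check still has to be HTML-escaped when it is written into an attribute.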
Not allowing user-provided HTML. Do not allow user-provided HTML if you have the option. That is an easy way to end up with an XSS problem, and so is writing a "parser" for your own markup language based on simple regex substitutions. I would only allow formatted text if the HTML output were generated in an obviously safe manner by a real parser that escapes any text from the input using the standard escaping functions and individually builds the HTML elements. If you have no choice over the matter, use a validator/sanitizer such as AntiSamy.
Preventing DOM-based XSS. Do not include user input in JavaScript-generated HTML code and insert it into the document. Instead, use the proper DOM methods to ensure that it is processed as text, not HTML.
Obviously, I cannot cover every single case in which an attacker can insert JavaScript code. In general, HTTP-only cookies can be used to possibly make an XSS attack a bit harder (but by no means prevent one), and giving programmers security training is essential.
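For reference, marking a cookie as HTTP-only looks roughly like this with Python's standard http.cookies module. The cookie name and value are placeholders.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"          # placeholder session identifier
cookie["session"]["httponly"] = True  # hides the cookie from document.cookie

# The resulting Set-Cookie header line, ready to be sent with the response.
header = cookie.output()
print(header)
```

Injected script then cannot read the session cookie directly, although it can still act on the user's behalf within the page.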
There are two kinds of XSS attack. One is where your site allows HTML to be injected somehow. This is not that hard to defend against: either escape all user input data, or strip all <> tags and support something like UBB-code instead. Note: URLs may still open you up to rick-rolling type attacks.
The more insidious one is where some third-party site contains an IFRAME, SCRIPT or IMG tag or the like that hits a URL on your site, and this URL will use whatever authentication the user currently has towards your site. Thus, you should never, ever take any direct action in response to a GET request. If you get a GET request that attempts to do anything (update a profile, check out a shopping cart, etc), then you should respond with a form that in turn requires a POST to be accepted. This form should also contain a cross-site request forgery token, so that nobody can put up a form on a third party site that's set up to submit to your site using hidden fields (again, to avoid a masquerading attack).
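The token check described above could be sketched as follows. The session object is assumed to be a plain dict-like store tied to the logged-in user, and both function names are hypothetical.

```python
import hmac
import secrets


def issue_csrf_token(session):
    # Generate an unguessable token, remember it server-side, and embed it in the form.
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_post(session, submitted_token):
    # Accept the POST only if the form echoed back the token stored in the session.
    expected = session.get("csrf_token")
    if not expected or not submitted_token:
        return False
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())
```

A third-party page cannot read the token out of your form, so a forged submission fails the check.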
There are only two major areas in your code which need to be addressed properly to avoid xss issues.
Before using any user input value in queries, run it through the database helper functions such as mysql_escape_string, or better, use parameterized queries (mysql_escape_string is deprecated and was removed in PHP 7). Note that this protects against SQL injection rather than XSS; it does not make the stored data safe to display.
Before displaying user input values back into form input fields, pass them through htmlspecialchars or htmlentities. This will convert all XSS-prone characters into entities that the browser can display without being compromised.
Once you have done the above, you are more than 95% safe from xss attacks. Then you can go on and learn advanced techniques from security websites and apply additional security on your site.
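The two steps can be sketched together in Python, with a parameterized query standing in for manual escaping on the way in and HTML escaping on the way out. The comments table and both helper names are made up for this example, and sqlite3 is used only because it ships with Python.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")


def save_comment(body):
    # The ? placeholder keeps user data out of the SQL text (guards against SQL injection).
    conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))


def render_comments():
    # Escaping happens at output time (guards against XSS); the raw text stays in the database.
    rows = conn.execute("SELECT body FROM comments").fetchall()
    return "".join(f"<li>{escape(body)}</li>" for (body,) in rows)
```

Storing the raw text and escaping on output keeps the two concerns separate: the query layer handles SQL, the rendering layer handles HTML.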
What most frameworks do is discourage you from directly writing HTML form code or building queries as strings, so that by using the framework's helper functions your code remains clean, while any serious problem can be addressed quickly by just updating one or two lines of code in the framework. You can simply write a little library of your own with common functions and reuse them in all your projects.
If you are developing in .NET one of the most effective ways to avoid XSS is to use the Microsoft AntiXSS Library. It's a very effective way to sanitize your input.
I'm trying to display a html email on a html page. Technically I can do it but security is a concern, it's possible an attacker could form malicious code and put into a html email.
I've tried using the Microsoft XSS library to sanitize the HTML, but it strips out so much that it's basically not worth it.
I'm wondering if there's a better solution with iframes or something. eg, is there a way to secure data within an iframe?
GMail seems to display html emails, they must have a good html sanitizer.
Your basic options are:
sanitize the HTML (use a whitelist approach, for safety)
use an iframe with a src on a different domain, or with the html5 sandbox attribute
Both can be done effectively and there are lots of variations in the detail.
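As one possible shape for the iframe option, the email body can be placed in the srcdoc attribute of a sandboxed iframe. This is only a sketch: sandboxed_email_frame is a hypothetical helper, and it assumes the target browsers support both sandbox and srcdoc.

```python
from html import escape


def sandboxed_email_frame(email_html):
    # An empty sandbox attribute applies every restriction: scripts are disabled
    # and the content is treated as coming from a unique origin.
    # The email's HTML is attribute-escaped so it cannot break out of srcdoc.
    return f'<iframe sandbox srcdoc="{escape(email_html, quote=True)}"></iframe>'


print(sandboxed_email_frame('<p onclick="steal()">Hello</p>'))
```

Sanitizing the HTML as well gives a second layer of defence for browsers that ignore the sandbox attribute.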
Regarding sanitizing so much it wasn't worth it: good libraries like https://github.com/guardian/html-janitor/ (javascript) and https://github.com/jsocol/bleach (python) have the ability to customize the whitelist. It depends if you're just trying to present typical user-generated HTML emails with basic formatting or if you're trying to display fully "designed" newsletters with lots of images, tables, etc. If just the former, in some quick experimenting with bleach I was able to make most emails look good by simply adding br and div to the list of approved tags, so the whitespace didn't get wiped.
A persistent follow-up of an admittedly similar question I had asked: What security restrictions should be implemented in allowing a user to upload a Javascript file that directs canvas animation?
I like to think I know JS decently enough, and I see common characters in all the XSS examples I've come across, which I am somewhat familiar with. I am lacking good XSS examples that could bypass a securely sound, rationally programmed system. I want people to upload html5 canvas creations onto my site. Any sites like this yet? People get scared about this all the time it seems, but what if you just wanted to do it for fun for yourself, and if something happens to the server then oh well, it's just an animation site and information is spread around like wildfire anyway, so if anyone cares then I'll tell them not to sign up.
If I allow a single textarea form field to act as an IDE using JS for my programming language written in JS, and do string replacing, filtering, and validation of the user's syntax before finally compiling it into JS to be echoed by PHP, how bad could it get for me to host that content? Please show me how you could bypass all of my combined considerations, with also taking into account the server-side as well:
If JavaScript is disabled, preventing any POST from getting through, keeping constant track of user session.
Namespacing the Class, so they can only prefix their functions and methods with EXAMPLE.
Making instance
Storing my JS Framework in an external (immutable in the browser?) JS file, which needs to be at the top of the page for the single textarea field in the form to be accepted, as well as a server-generated key which must follow it. On the page that hosts the compiled user-uploaded canvas game/animation (1 per page ONLY), the server will verify the correct JS filename string before echoing the rest out.
No external script calls! String replacing on client and server.
Allowing ONLY alphanumeric characters, dashes and asterisks.
Removing alert, eval, window, XMLHttpRequest, prototyping, cookie, obvious stuff. No native JS reserved words or syntax.
Obfuscating and minifying another external JS file that helps to serve the IDE and recognize the programming language's uniquely named Canvas API methods.
When Window unloads, store the external JS code in to two dynamically generated form fields to be checked by the server in POST. All the original code will be cataloged in the DB thoroughly for filtering purposes.
Strict variable naming conventions ('example-square1-lengthPROPERTY', 'example-circle-spinMETHOD')
Copy/Paste Disabled, setInterval to constantly check if enabled by the user. If so, then trigger a block to the database, change window.location immediately and check the session ID through POST to confirm in case JS becomes disabled between that timeframe.
I mean, can I do it then? How can one do harm if they can't use HEX or ASCII and stuff like that?
I think there are a few other options.
Good places to go for real-life XSS tests, by the way, are the XSS Cheat Sheet and HTML5 Security Cheatsheet (newer). The problem with that, however, is that you want to allow Javascript but disallow bad Javascript. This is a different, and more complex, goal than the usual way of preventing XSS, by preventing all scripts.
Hosting on a separate domain
I've seen this referred to as an "iframe jail".
The goal with XSS attacks is to be able to run code in the same context as your site - that is, on the same domain. This is because the code will be able to read and set cookies for that domain, initiate user actions or redress your design, redirect, and so forth.
If, however, you have two separate domains - one for your site, and another which only hosts the untrusted, user-uploaded content, then that content will be isolated from your main site. You could include it in an iframe, and yet it would have no access to the cookies from your site, no access to redress or alter the design or links outside its iframe, and no access to the scripting variables of your main window (since it is on a different domain).
It could, of course, set cookies as much as it likes, and even read back the ones that it set. But these would still be isolated from the cookies for your site. It would not be able to affect or read your main site's cookies. It could also include other code which could annoy or harass the user, such as pop-up windows, or could attempt to phish (you'd need to make it visually clear in your out-of-iframe UI that the content served is not part of your site). However, this is still sandboxed from your main site, where your own payload - your session cookies and the integrity of your overarching page design and scripts - is preserved. It would carry no less but no more risk than any site on the internet that you could embed in an iframe.
Using a subset of Javascript
Subsets of Javascript have been proposed, which provide compartmentalisation for scripts - the ability to load untrusted code and have it not able to alter or access other code if you don't give it the scope to do so.
Look into things like Google CAJA - whose aim is to enable exactly the type of service that you've described:
Caja allows websites to safely embed DHTML web applications from third parties, and enables rich interaction between the embedding page and the embedded applications. It uses an object-capability security model to allow for a wide range of flexible security policies, so that the containing page can effectively control the embedded applications' use of user data and to allow gadgets to prevent interference between gadgets' UI elements.
One issue here is that people submitting code would have to program it using the CAJA API. It's still valid Javascript, but it won't have access to the browser DOM, as CAJA's API mediates access. This would make it difficult for your users to port some existing code. There is also a compilation phase. Since Javascript is not a secure language, there is no way to ensure code cannot access your DOM or other global variables without running it through a parser, so that's what CAJA does - it compiles it from Javascript input to Javascript output, enforcing its security model.
HTML Purifier consists of thousands of regular expressions that attempt to "purify" HTML into a safe subset that is immune to XSS. This project is bypassed every few months, because it isn't nearly complex enough to address the problem of XSS.
Do you understand the complexity of XSS?
Do you know that javascript can exist without letters or numbers?
Okay, the very first thing I would try is inserting a meta tag that changes the encoding to, I don't know, let's say UTF-7, which is rendered by IE. This UTF-7 encoded HTML will contain JavaScript. Did you think of that? Well, guess what: there are somewhere between a hundred thousand and a few million other vectors I didn't think of.
The XSS cheat sheet is so old my grandparents are immune to it. Here is a more up to date version.
(Oh, and by the way, you will be hacked, because what you are trying to do is fundamentally insecure.)
I want to allow user contributed Javascript in areas of my website.
1. Is this completely insane?
2. Are there any Javascript sanitizer scripts or good regex patterns out there to scan for alerts, iframes, remote script includes and other malicious Javascript?
3. Should this process be manually authorized (by a human checking the Javascript)?
4. Would it be more sensible to allow users to only use a framework (like jQuery) rather than giving them access to actual Javascript? This way it might be easier to monitor.
Thanks
I think the correct answer is 1.
As soon as you allow Javascript, you open yourself and your users to all kinds of issues. There is no perfect way to clean Javascript, and people like the Troll Army will take it as their personal mission to mess you up.
1. Is this completely insane?
Don't think so, but near. Let's see.
2. Are there any Javascript sanitizer scripts or good regex patterns out there to scan for alerts, iframes, remote script includes and other malicious Javascript?
Yeah, at least there are Google Caja and ADSafe to sanitize the code, allowing it to be sandboxed. I don't know what degree of trustworthiness they provide, though.
3. Should this process be manually authorized (by a human checking the Javascript)?
The sandbox may fail, so manual review would be a sensible solution, depending on the risk and the trade-off of being attacked by malicious (or faulty) code.
4. Would it be more sensible to allow users to only use a framework (like jQuery) rather than giving them access to actual Javascript? This way it might be easier to monitor.
JQuery is just plain Javascript, so if you're trying to protect from attacks, it won't help at all.
If it is crucial to prevent these kinds of attacks, you can implement a custom language, parse it in the backend and produce controlled, safe JavaScript; or you may consider another strategy, like providing an API and accessing it from a third-party component of your app.
Take a look at Google Caja:
Caja allows websites to safely embed DHTML web applications from third parties, and enables rich interaction between the embedding page and the embedded applications. It uses an object-capability security model to allow for a wide range of flexible security policies, so that the containing page can effectively control the embedded applications' use of user data and to allow gadgets to prevent interference between gadgets' UI elements.
Instead of checking for evil things like script includes, I would go for regex-based whitelisting of the few commands you expect to be used. Then involve a human to authorize and add new acceptable commands to the whitelist.
Think about all of the things YOU can do with JavaScript. Then think about the things you would do if you could do them on someone else's site. These are things that people will do just because they can, or to find out if they can. I don't think it is a good idea at all.
It might be safer to design/implement your own restricted scripting language, which can be very similar to JavaScript, but which is under the control of your own interpreter.
1. Probably. The scope for doing bad things is going to be much greater than it is when you simply allow HTML but try to avoid allowing JavaScript.
2. I do not know.
3. Well, two things: do you really want to spend your time doing this, and if you do this you had better make sure they see the JavaScript code rather than actual live JavaScript!
4. I can't see why this would make any difference, unless you do have someone approving posts and that person happens to be more at home with jQuery than plain JavaScript.
Host it on a different domain. Same-origin security policy in browsers will then prevent user-submitted JS from attacking your site.
It's not enough to host it on a different subdomain, because subdomains can set cookies on the higher-level domain, and this could be used for session fixation attacks.