Prevent user-entered scripts from running in webpage


In my application, there is a comment box. If someone enters a comment like
<script>alert("hello")</script>
then an alert appears when I load that page.
Is there any way to prevent this?

There are several ways to address this, but since you haven't mentioned which back-end technology you are using, it is hard to give anything but rough answers.
Also, you haven't mentioned if you want to allow, or deny, the ability to enter regular HTML in the box.
Method 1:
Sanitize inputs on the way in. When you accept something at the server, look for the script tags and remove them.
This is actually far more difficult to get right than might be expected.
Method 2:
Escape the data on the way back out to the browser. In PHP there is a function called
htmlentities which will turn all HTML special characters into entities, so the output renders as literally what was typed.
The words <script>alert("hello")</script> would appear on your page.
Method 3
White-list
This is far beyond the scope of a single post and really requires knowing your back-end system, but it is possible to allow some HTML tags while disallowing others.
This is insanely difficult to get right and you really are best using a library package that has been very well tested.
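
As a rough illustration of Method 2 in JavaScript, the sketch below escapes the characters that are special in HTML before the comment is written into the page. The helper name escapeHtml is an assumption made for this example, not a built-in, and it only covers values placed in element content or a quoted attribute.

```javascript
// Minimal sketch of output escaping, assuming the value ends up in HTML
// element content or inside a quoted attribute value.
// escapeHtml is a hypothetical helper name, not a built-in function.
function escapeHtml(text) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  // Replace each special character with its entity in a single pass.
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

const comment = '<script>alert("hello")</script>';
console.log(escapeHtml(comment));
// prints: &lt;script&gt;alert(&quot;hello&quot;)&lt;/script&gt;
```

The browser then displays the comment exactly as typed instead of executing it.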

You should treat user input as plain text rather than HTML. By correctly escaping HTML entities, you can render what looks like valid HTML text without having the browser try to execute it. This is good practice in general, for your client-side code as well as any user provided values passed to your back-end. Issues arising from this are broadly referred to as script injection or cross-site scripting.
Practically on the client-side this is pretty easy since you're using jQuery. When updating the DOM based on user input, rely on the text method in place of the html method. You can see a simple example of the difference in this jsFiddle.

The best way is to replace <script> with another string. For example, in C# use:
str = str.Replace("<script>", "O_o");
The other options have a lot of disadvantages.
1. Block JavaScript: this also disables any validation that is done in the front end. Also, once the data is retrieved from the database the script works again. I mean an attacker can inject a script as input in a form and it gets saved in the database; when you return the records from the database on another page, it is rendered as a script!
2. Render as text: in some technologies this needs third-party packages, which is a risk in itself. Maybe these packages have a backdoor!

Convert the value into a string; that solved it in my case.
example
var anything

Related

Can an attacker insert malicious code in html5 data attributes

So I learned earlier that using the data attribute in html5 you could insert values to be handled in a javascript file. e.g
<a class="check" data-name="...">Hey</a>
the handling javascript file will have a line to handle that link tag which might do this
var value=$('.check').data('name');
window.location.href="http://www.example.com/"+value;
Now I was wondering, can a malicious coder exploit this? Do you need to sanitize the value before using it for a redirect?

It really depends.
An attacker can modify anything he wants in his browser, so it doesn't matter how much sanitization you put in the front-end, an attacker can work his way around all your javascript functions and the like to circumvent your front-end code.
I'm not saying that you shouldn't sanitize your input in the front-end because it will always help in terms of usability and experience for a legitimate user.
If the address that you're redirecting your user to uses that data attribute to do something with the server, then yes, by all means sanitize it in both places: front and back end. Otherwise, you shouldn't worry; the worst-case scenario is that a malicious user (or a knowledgeable one) will end up on a 404 page.
** EDIT **
After reading your comment in this answer, here's my updated answer:
The dangers reside in how you're using that piece of information. Take the Google Analytics script as an example:
Google provides you with a script that helps you track your visitors' actions and behavior through the Google Analytics interface.
If you change any value in Google's script, Google Analytics won't work, and there's no way you can hack Google through the analytics script.
How does Google achieve this? They put all their security in the back end, and they sanitize modifiable user input that will be rendered in a website, stored in a database or that somehow interacts with the server.
Back to your case:
If you're going to use that data attribute to do a document.write(), an eval, do a database lookup or any sensitive operation (delete, update, retrieve data) then yes by all means: sanitize it.
How are you going to sanitize it? That's problem specific and more than likely you should ask a new question.

If the HTML is taken from user input or generated from user input, yes, you should definitely perform sanitation. However, if you're asking if data attributes are somehow vulnerable in a way other attributes aren't, the answer is no.

A user with access to the browser (e.g. via XSS) can insert anything into a data attribute. But (s)he can just redirect anywhere at anytime, so this trivial case is irrelevant.
If the value is set by a user via some other means, then the link could be set somewhere other than intended within the same domain. That might be annoying but it shouldn't be a security risk.
If you're doing something else, like including a javascript string for eval in the attribute and that comes from a user (e.g. via a database value), then you will create an XSS vulnerability. But you should never, ever, ever, trust user supplied values anyway. Nothing special about html data attributes there.

Do you need to sanitize the value before using it for a redirect?
No need to sanitize before, but you need to sanitize after.
In your example, if you are not sanitizing the data, you can become a victim of classic XSS.
E.g.: http://www.example.com/ + value, where value is search?q=<script>alert(1)</script>, and where the search page actually outputs the raw query to the browser.
p.s.: this is not specific to data-attributes. It will work the same with normal attributes.
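
One way to limit what the redirect in the question can carry is to check the attribute value against an allowlist before building the URL. This is only a sketch: SAFE_SEGMENT, encodeSegment and buildRedirectUrl are hypothetical names, the allowed character set is an assumption, and none of this replaces escaping on the page that receives the request.

```javascript
// Hypothetical allowlist: letters, digits, hyphen and underscore only.
const SAFE_SEGMENT = /^[A-Za-z0-9_-]+$/;

// Percent-encode a value so it stays a single path segment.
function encodeSegment(value) {
  return encodeURIComponent(String(value));
}

// Build the redirect target, or return null if the value looks suspicious.
function buildRedirectUrl(base, value) {
  const segment = String(value);
  if (!SAFE_SEGMENT.test(segment)) {
    return null; // the caller decides how to handle a rejected value
  }
  return base + encodeSegment(segment);
}

console.log(buildRedirectUrl('http://www.example.com/', 'profile'));
// prints: http://www.example.com/profile
console.log(buildRedirectUrl('http://www.example.com/', 'search?q=<script>alert(1)</script>'));
// prints: null
```

If arbitrary text has to be allowed, encodeSegment alone at least keeps the value from adding its own query string or markup to the URL.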

Is it possible to use JavaScript to break the HTML of a page?

I've been asked at work whether it is possible to write, on purpose or by accident, JavaScript that will remove specific characters from a HTML document and thus break the HTML. An example would be adding some JavaScript that removes the < symbol in the page. I've tried searching online and I know JavaScript can replace strings, but my knowledge of the language is negligible.
I've been asked to look into it as a way of hopefully addressing why a site I work on needs to have controls over who can add bespoke functionality to the page. I'm hoping it's not possible but would be grateful for the peace of mind!

Yes, and in fact you can do things far more insidious with javascript as well.
http://en.wikipedia.org/wiki/Cross-site_scripting

Yes, that's possible. The easiest example is
var body = document.getElementsByTagName('body')[0];
body.innerHTML = 'destroyed';
which will remove the whole page and just write "destroyed" instead. To get back to your example: in the same way it's possible to replace <:
var body = document.getElementsByTagName('body')[0];
body.innerHTML = body.innerHTML.replace('<','some other character');
Such "extreme" cases are very unlikely to happen by accident, but it's absolutely possible (particularly for inexperienced JavaScript developers) to break things on a site that usually shouldn't be affected by JavaScript.
Note that this will only mess up the displayed page in the client's browser and doesn't change your HTML file on the server in any way. Just find and remove/fix the "bad" lines of code and everything is fine again.
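
A detail worth knowing about the replace call: with a string as the first argument, JavaScript only replaces the first match, while a regular expression with the g flag replaces every match. The sketch below shows the difference on a plain string, so it runs without a browser; the sample markup is made up for the example.

```javascript
// Sample markup, invented for this illustration.
const html = '<p>Hello <b>world</b></p>';

// A string pattern replaces only the first "<".
const firstOnly = html.replace('<', '[');

// A regular expression with the g flag replaces every "<".
const all = html.replace(/</g, '[');

console.log(firstOnly); // prints: [p>Hello <b>world</b></p>
console.log(all);       // prints: [p>Hello [b>world[/b>[/p>
```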

Any client/browser can manipulate how the page is viewed at any time, for instance in chrome hit F12 and then you can write whatever you want in the html and you will see the changes immediately. But that's not to worry about...
The scary part is when JavaScript on the site communicates with the back-end server and supplies it with input parameters that are not sanitized on the server side before being processed in some way. SQL injection can also happen this way if the back end uses a database, which it almost always does, and so on...
A webpage can be manipulated in two ways: non-persistently or persistently.
[non-persistent]: this way you can manipulate your own access to a webpage. This won't affect other users in itself, but you can do harm once you're in.
[persistent]: this way the server-side code will be permanently affected by the injected code, and it will most likely affect other users.
The key thing here is to always sanitize the input a back-end server receives before it processes anything.

You could definitely write some javascript function to modify the contents of a file. If that file is your HTML page, then sure.
If you want to prevent this from happening, you can just set the permissions of that HTML file to be read-only, though.

you could:
Overwrite the page,
Mess with the innerHTML of the body tag (almost the same),
Insert illegal elements.

Yes. In the least, you could use it to write CSS that sets any element, class, ID... even the body to display:none;

I am using jQuery validator and need help writing my own addMethod(name, method, message) to validate a promotional code

I want to be able to validate a form field called promotional codes, without using a database.
There are two valid codes and the form field needs to match either one of them. They are codes like this: 'VK2012'.
I've tried equalTo with a hidden form field, but this doesn't quite work.
Any suggestions greatly appreciated.

First, the comments are right. You should do this on the server side. Client side validation of this sort really ought to be reserved for the case where you can safely assume that your users are acting in good faith (and as soon as you're talking about things like promotional codes, you cannot assume that). As far as a non-database solution goes, it's ugly and maintains poorly, but you could always hardwire the strings to compare to into the code on the server side. Alternately, for a somewhat less ugly (but somewhat more involved) version, you could put them into config files, which would let you change the codes without recompiling.
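
The hardwired, no-database variant could look roughly like the sketch below if the server side happens to run JavaScript (for example on Node.js, which is an assumption; the question does not name a back end). VALID_PROMO_CODES and isValidPromoCode are hypothetical names, 'VK2012' is the code from the question, and 'XX0000' is a made-up placeholder for the second code.

```javascript
// Server-side list of accepted codes. 'VK2012' comes from the question;
// 'XX0000' is a placeholder for the second code, which was not given.
const VALID_PROMO_CODES = new Set(['VK2012', 'XX0000']);

// Returns true only if the submitted value matches one of the known codes.
// Trimming and upper-casing are assumptions about how lenient to be.
function isValidPromoCode(input) {
  if (typeof input !== 'string') {
    return false;
  }
  return VALID_PROMO_CODES.has(input.trim().toUpperCase());
}

console.log(isValidPromoCode('VK2012'));   // prints: true
console.log(isValidPromoCode(' vk2012 ')); // prints: true
console.log(isValidPromoCode('FREE'));     // prints: false
```

Because the check runs on the server, the codes never appear in the page source the way they would with a client-side validator rule.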

JavaScript non-persistent security question

Despite my paranoia I've never really gotten around to understanding web security more, so my lack of knowledge is causing me a bit of confusion for this.
Example: Let's say you have 2 text boxes, both are for user input.
The user types in whatever they want into those two text boxes and clicks a button, the button then uses a bit of JavaScript and concatenates whatever is in those two text boxes and displays it out in a div.
My question is, in this case, since it's using JavaScript client side, do you need to really sanitize user input?
What if it outputted to a text box instead of a div? Or as an alert?
I understand that when it comes to forms/PHP you always want to sanitize input, but I'm not really familiar with JavaScript security precautions.
It's my understanding that since this is client-side, and no data is being saved by the server, that whatever the user does (tries to throw in some malicious code or whatnot) won't affect anyone but that user, correct?

No, this is not a security issue. The reason is that an attacker has to force a victim's browser into performing this action in order for it to be XSS.
However, if you grab input from something like document.location and then print it to the page using document.write(), then this is DOM-based XSS. But this is a very uncommon form of XSS.

You don't have to sanitize anything that is not going to the server.
If people want to do something to their instance of your page, the only one they can hurt is themselves. Look at everything you can do with an extension like GreaseMonkey ... we're talking a lot more than just concatenating strings and displaying them.

What precautions should I take to prevent XSS on user submitted HTML?

I'm planning on making a web app that will allow users to post entire web pages on my website. I'm thinking of using HTML Purifier but I'm not sure, because HTML Purifier edits the HTML and it's important that the HTML is maintained just how it was posted. So I was thinking of making some regex to get rid of all script tags and all the JavaScript attributes like onload, onclick, etc.
I saw a Google video a while ago that had a solution for this. Their solution was to use another website to post javascript in so the original website cannot be accessed by it. But I don't wanna purchase a new domain just for this.

be careful with homebrew regexes for this kind of thing
A regex like
s/(<.*?)onClick=['"].*?['"](.*?>)/$1 $3/
looks like it might get rid of onclick events, but you can circumvent it with
<a onClick<a onClick="malicious()">="malicious()">
running the regex on that will get you something like
<a onClick ="malicious()">
You can fix it by repeatedly running the regex on that string until it doesn't match, but that's just one example of how easy it is to get around simple regex sanitizers.
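
The same weakness shows up with an even simpler filter. The sketch below is not the regex above but an analogous, made-up one that deletes script tags: a single pass can be defeated by nesting one tag inside another, and repeating the pass until nothing changes closes only this particular hole. The function names are hypothetical.

```javascript
// Naive filter: delete every <script> and </script> tag in one pass.
function stripScriptTagsOnce(input) {
  return input.replace(/<\/?script>/gi, '');
}

// Repeat the pass until the string stops changing.
function stripScriptTagsUntilStable(input) {
  let previous;
  let current = input;
  do {
    previous = current;
    current = stripScriptTagsOnce(previous);
  } while (current !== previous);
  return current;
}

// Nesting a tag inside itself survives the single pass.
const payload = '<scr<script>ipt>alert(1)</scr</script>ipt>';

console.log(stripScriptTagsOnce(payload));
// prints: <script>alert(1)</script>
console.log(stripScriptTagsUntilStable(payload));
// prints: alert(1)
```

Even the repeated version leaves event-handler attributes, javascript: URLs and malformed markup untouched, so it should be read as a demonstration of the problem rather than a usable sanitizer.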

The most critical error people make when doing this is validating things on input.
Instead, you should validate on display.
The context matters when determining what is XSS and what isn't. Therefore, you can happily accept any input, as long as you pass it through appropriate cleaning functions when displaying it.
Consider that what constitutes 'XSS' will be different when the input is placed in <a href="HERE"> as opposed to <a>here!</a>.
Thus, all you need to do, is make sure that any time you write user data, you consider, very carefully, where you are displaying it, and make sure that it can't escape the context you are writing it to.
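
To make the two contexts concrete, here is a small sketch: text inside the link only needs entity escaping, while the href value also needs a check on the URL scheme, since escaping alone does not stop a javascript: URL. The helper names are hypothetical, and allowing only absolute http and https URLs is an assumption made for the example.

```javascript
// Entity-escape a value for element content or a quoted attribute.
function escapeHtml(text) {
  const replacements = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// href context: accept only absolute http(s) URLs, fall back to "#".
function safeHref(rawUrl) {
  let parsed;
  try {
    parsed = new URL(String(rawUrl)); // throws on relative or malformed input
  } catch (err) {
    return '#';
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return '#';
  }
  return escapeHtml(parsed.href);
}

// Each value is cleaned according to the place it is written to.
function renderLink(rawUrl, label) {
  return '<a href="' + safeHref(rawUrl) + '">' + escapeHtml(label) + '</a>';
}

console.log(renderLink('javascript:alert(1)', '<b>hi</b>'));
// prints: <a href="#">&lt;b&gt;hi&lt;/b&gt;</a>
console.log(renderLink('https://example.com/a?x=1&y=2', 'here!'));
// prints: <a href="https://example.com/a?x=1&amp;y=2">here!</a>
```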

If you can find any other way of letting users post content, that does not involve HTML, do that. There are plenty of user-side light markup systems you can use to generate HTML.
So I was thinking making some regex to get rid of all script tags and all the javascript attributes like onload, onclick, etc.
Forget it. You cannot process HTML with regex in any useful way. Let alone when security is involved and attackers might be deliberately throwing malformed markup at you.
If you can convince your users to input XHTML, that's much easier to parse. You still can't do it with regex, but you can throw it into a simple XML parser, and walk over the resulting node tree to check that every element and attribute is known-safe, and delete any that aren't, then re-serialise.
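
The tree walk described above might look something like this sketch. It assumes the markup has already been parsed into plain objects of the shape { tag, attrs, children } with strings as text nodes; that shape, the ALLOWED table and the function name are all invented for the illustration, and a real parser would hand you its own node types.

```javascript
// Hypothetical allowlist: tag name -> attribute names permitted on it.
const ALLOWED = new Map([
  ['p', []],
  ['b', []],
  ['i', []],
  ['a', ['href']],
]);

// Returns a cleaned copy of the node, or null if the element is not allowed.
// Text nodes (strings) pass through; they still need escaping when serialised.
function sanitizeNode(node) {
  if (typeof node === 'string') {
    return node;
  }
  if (!ALLOWED.has(node.tag)) {
    return null; // drop unknown elements together with their children
  }
  const attrs = {};
  const sourceAttrs = node.attrs || {};
  for (const name of ALLOWED.get(node.tag)) {
    if (Object.prototype.hasOwnProperty.call(sourceAttrs, name)) {
      attrs[name] = sourceAttrs[name]; // attribute values are not checked here
    }
  }
  const children = (node.children || [])
    .map(sanitizeNode)
    .filter((child) => child !== null);
  return { tag: node.tag, attrs, children };
}

const tree = {
  tag: 'p',
  attrs: { onclick: 'evil()' },
  children: [
    'hi',
    { tag: 'script', attrs: {}, children: ['alert(1)'] },
    { tag: 'a', attrs: { href: '/x', onclick: 'evil()' }, children: ['link'] },
  ],
};

console.log(JSON.stringify(sanitizeNode(tree)));
// prints: {"tag":"p","attrs":{},"children":["hi",{"tag":"a","attrs":{"href":"/x"},"children":["link"]}]}
```

Allowed attributes such as href would still need their values checked (for instance against javascript: URLs) before re-serialising.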
HTML Purifier edits the HTML and it's important that the HTML is maintained just how it was posted.
Why?
If it's so they can edit it in their original form, then the answer is simply to purify it on the way out to be displayed in the browser, not on the way in at submit-time.
If you must let users input their own free-form HTML — and in general I'd advise against it — then HTML Purifier, with a whitelist approach (ban all elements/attributes that aren't known-safe) is about as good as it gets. It's very very complicated and you may have to keep it up to date when hacks are found, but it's streets ahead of anything you're going to hack up yourself with regexes.
But I don't wanna purchase a new domain just for this.
You can use a subdomain, as long as any authentication tokens (in particular, cookies) can't cross between subdomains. (Which for cookies they can't by default as the domain parameter is set to only the current hostname.)
Do you trust your users with scripting capability? If not don't let them have it, or you'll get attack scripts and iframes to Russian exploit/malware sites all over the place...

Make sure that user content doesn't contain anything that could cause JavaScript to be run on your page.
You can do this by using an HTML stripping function that gets rid of all HTML tags (like strip_tags from PHP), or by using another similar tool. There are actually many reasons besides XSS to do this. If you have user submitted content, you want to make sure that it doesn't break the site layout.
I believe you can simply use a subdomain of your current domain to host JavaScript, and you will get the same security benefits for AJAX. Not for cookies, however.
In your specific case, filtering out the <script> tag and Javascript actions is probably going to be your best bet.

1) Use clean simple directory based URIs to serve user feed data.
When you dynamically create URIs to address the user's uploaded data, service account, or anything else off your domain, make sure you don't post information as parameters to the URI. That is an extremely easy point of manipulation that could be used to expose flaws in your server security and possibly even inject code onto your server.
2) Patch your server.
Ensure you keep your server up to date on all the latest security patches for all the services running on that server.
3) Take all possible server-side protections against SQL injection.
If somebody can inject code into your SQL database that can execute from services on your box, that person will own your box. At that point they can install malware onto your web server to be fed back to your users, or simply record data from the server and send it out to a malicious party.
4) Force all new uploads into a protected sandboxed area to test for script execution.
No matter how you try to remove script tags from submitted code there will be a way to circumvent your safeguards to execute script. Browsers are sloppy and do all kinds of stupid crap they are not supposed to do. Test your submissions in a safe area before you publish them for public consumption.
5) Check for beacons in submitted code.
This step requires the previous step and can be very complicated, because beacons can occur in script code that requires a browser plugin to execute, such as ActionScript, which is just as much a vulnerability as allowing JavaScript to execute from user-submitted code. If a user can submit code that can beacon out to a third party, then your users, and possibly your server, are completely exposed to data loss to a malicious third party.

You should filter ALL HTML and whitelist only the tags and attributes that are safe and semantically useful. WordPress is great at this and I assume that you will find the regular expressions used by WordPress if you search their source code.
