I'm working on a code playground type of application where a user (a web developer or designer) can input HTML, CSS and JavaScript and view the result in an iframe. The inputted code will be saved in the database (MySQL) and rendered back again in an iframe on a show_results view/action.
Now the question: Is it safe to save JavaScript directly in the database? If not, then where/how should I save it?
The database is not going to be your problem here. It's fairly trivial to use prepared statements to allow all kinds of characters to be stored safely in the database. Using anything other than prepared statements to store user input is insufficient, and essentially never recommended.
But you're talking about allowing arbitrary javascript to be executed, which is always going to be a security problem. As a commenter above implies, you're going to be replicating the complexities of jsfiddle.net without the security experience, the development know-how, or the express wish to keep on patching the vulnerabilities that will keep on cropping up.
Certainly you should be aware that what you're doing will completely compromise any domain that you set it up on, so that essentially that javascript should be only written on a throw-away domain or subdomain that you don't use for any other purpose. Of course, it's going to be trivial in such an environment to simply framebreak and pull a viewer off of the site that hosts the frame as well.
I'm sure this just scratches the surface of the potential abuses that arbitrary javascript execution (aka intentional self cross-site-scripting) will bring with it.
Since you're essentially re-inventing a very dangerous wheel with this concept, why not simply use some of the embedding services that already exist out there? codepen.io, for example, allows you to embed its snippets.
Yes it is safe as an architectural decision iff you are executing the javascript on the client side.
On any website you can use tools such as chrome's "inspect element" to manipulate the html, javascript etc on the client. Your system cannot assume that items on the client are not manipulated. This is why server side validation is still so important.
I completely disagree with kzqai.
If this was the case then fiddler would be in serious trouble.
There are potential problems that can be exposed more easily with what you are doing, but those problems already exist and are just obscure.
If you are executing JavaScript on the server side, this is a very complex decision. I would personally avoid it if possible, because the game you are playing is that you must catch every possible scenario for trouble, while a bad guy only needs to find the one you missed.
It is safe as long as you correctly escape certain characters when inserting the value in the SQL statement. For example, if your Javascript code is:
var foo = 'hello world';
Then you will have to escape the single quotes when building the SQL statement:
INSERT INTO snippets (code) VALUES ('var foo = ''hello world'';')
In the statement above, two single quotes ('') are the way to represent just a single quote in a string enclosed by single quotes.
See the link below for further information on escaping characters:
http://dev.mysql.com/doc/refman/5.0/en/string-literals.html
EDIT
As Stephen P correctly points out, if you use prepared statements on the server side code then the framework under the hood will replace those characters for you.
In my application, there is a comment box. If someone enters a comment like
<script>alert("hello")</script>
then an alert appears when I load that page.
Is there any way to prevent this?
There are several ways to address this, but since you haven't mentioned which back-end technology you are using, it is hard to give anything but rough answers.
Also, you haven't mentioned if you want to allow, or deny, the ability to enter regular HTML in the box.
Method 1:
Sanitize inputs on the way in. When you accept something at the server, look for the script tags and remove them.
This is actually far more difficult to get right than might be expected.
Method 2:
Escape the data on the way back out to the browser. In PHP there is a function called htmlentities which will turn all HTML special characters into entities, so the markup renders as literally what was typed.
The words <script>alert("hello")</script> would appear on your page.
Method 3:
White-list
This is far beyond the scope of a single post and really requires knowing your back-end system, but it is possible to allow some HTML elements while disallowing others.
This is insanely difficult to get right and you really are best using a library package that has been very well tested.
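Method 2 above can be sketched in JavaScript as follows. This is a minimal illustration, not a replacement for a tested library: escapeHtml and HTML_ENTITIES are hypothetical names, and the function only covers text placed in element content or inside a quoted attribute value.

```javascript
// Map of HTML meta-characters to their entity equivalents.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Escape a value for use in HTML element content or a quoted attribute value.
// (Hypothetical helper; not part of any library.)
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

const comment = '<script>alert("hello")</script>';
console.log(escapeHtml(comment));
// &lt;script&gt;alert(&quot;hello&quot;)&lt;/script&gt;
```

The browser then displays the comment as the literal text that was typed instead of running it.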
You should treat user input as plain text rather than HTML. By correctly escaping HTML entities, you can render what looks like valid HTML text without having the browser try to execute it. This is good practice in general, for your client-side code as well as any user provided values passed to your back-end. Issues arising from this are broadly referred to as script injection or cross-site scripting.
Practically on the client-side this is pretty easy since you're using jQuery. When updating the DOM based on user input, rely on the text method in place of the html method. You can see a simple example of the difference in this jsFiddle.
The best way is to replace <script> with another string. For example, in C# use:
str = str.Replace("<script>", "O_o");
The other options have a lot of disadvantages.
1. Blocking JavaScript: this also disables any validation that is done in the front end. The script also works again after it is retrieved from the database: an attacker can inject a script as input in a form, it is saved in the database, and when you return the records on another page it renders as a script.
2. Rendering as text: in some technologies this needs third-party packages, which are a risk in themselves. These packages may contain a backdoor.
Convert the value into a string; that solved it in my case.
Example:
var anything
I am designing a SQL engine. From the UI the user will select certain parameters, and based on those parameters I will build a SQL statement. The user will have the option to join tables, apply conditions, create temp tables, apply many built-in SQL functions, etc. I will write many functions which will look at the input and, based on that, perform some action which will ultimately produce a SQL statement.
I can do this code using any server side language but we want to try JavaScript.
Obviously this will require lot of string manipulation.
I am just worried that this will result in "A script on this page is causing Internet Explorer to run slowly. If it continues to run, your computer may become unresponsive." on IE or "A script on this page may be busy, or it may have stopped responding. You can stop the script now, open the script in the debugger, or let the script continue" on Firefox.
I know I am not sharing any code and the question is a little subjective. But I am sure some of you may have faced similar issues or challenges in a previous assignment, and your suggestions will be a big help.
Unless I have misunderstood your question, you are asking if there is a risk of getting that dreaded message when doing a couple of string manipulations.
If you want to only construct the query in the browser, then I think you are fine, as you won't have to, say, concatenate 1000 strings, or search a string amongst 1000. The only risk I would say is if you intend to do complicated regexp operations on big strings.
If I misunderstood, a few examples of the string manipulations you intend to do would help.
Of course, I am assuming you will send and run the SQL queries in an asynchronous manner.
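To give a sense of how little string work is involved, here is a minimal sketch of building one statement from UI selections. Everything in it is an assumption for illustration: the buildSelect function, the shape of its input object, the operator list, and the use of ? placeholders (which depends on whatever driver runs the query on the server).

```javascript
// Accept only plain identifiers so table and column names cannot carry SQL syntax.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;
const OPERATORS = new Set(['=', '<>', '<', '<=', '>', '>=', 'LIKE']);

function checkIdentifier(name) {
  if (typeof name !== 'string' || !IDENTIFIER.test(name)) {
    throw new Error('Invalid identifier: ' + name);
  }
  return name;
}

// Build a SELECT statement from UI selections (hypothetical input shape).
// Values are never concatenated into the SQL text; they are returned
// separately so the server can bind them as parameters.
function buildSelect({ table, columns, where = [] }) {
  const parts = [
    'SELECT ' + columns.map(checkIdentifier).join(', '),
    'FROM ' + checkIdentifier(table),
  ];
  const params = [];
  const conditions = where.map(({ column, op, value }) => {
    if (!OPERATORS.has(op)) {
      throw new Error('Unsupported operator: ' + op);
    }
    params.push(value);
    return checkIdentifier(column) + ' ' + op + ' ?';
  });
  if (conditions.length > 0) {
    parts.push('WHERE ' + conditions.join(' AND '));
  }
  return { sql: parts.join(' '), params };
}

const query = buildSelect({
  table: 'users',
  columns: ['id', 'name'],
  where: [{ column: 'age', op: '>', value: 18 }],
});
console.log(query.sql);    // SELECT id, name FROM users WHERE age > ?
console.log(query.params); // [ 18 ]
```

A handful of joins and array maps like this is nowhere near the workload that triggers a slow-script warning. The server still has to treat whatever the browser sends as untrusted.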
I've been asked at work whether it is possible to write, on purpose or by accident, JavaScript that will remove specific characters from a HTML document and thus break the HTML. An example would be adding some JavaScript that removes the < symbol in the page. I've tried searching online and I know JavaScript can replace strings, but my knowledge of the language is negligible.
I've been asked to look into it as a way of hopefully addressing why a site I work on needs to have controls over who can add bespoke functionality to the page. I'm hoping it's not possible but would be grateful for the peace of mind!
Yes, and in fact you can do things far more insidious with javascript as well.
http://en.wikipedia.org/wiki/Cross-site_scripting
yes, that's possible. the easiest example is
var body = document.getElementsByTagName('body')[0];
body.innerHTML = 'destroyed';
which will remove the whole page and just write "destroyed" instead. to get back to your example: in the same way it's possible to replace every <:
var body = document.getElementsByTagName('body')[0];
// the /g flag replaces every "<"; a plain string pattern would replace only the first one
body.innerHTML = body.innerHTML.replace(/</g, 'some other character');
such "extreme" cases are very unlikely to happen by accident, but it's absolutely possible (particularly for inexperienced javascript-developers) to break things on a site that usually shouldn't be affected by javascript.
note that this will only mess up the displayed page in the client's browser and doesn't change your html-file on the server in any way. just find and remove/fix the "bad" lines of code and everything is fine again.
Any client/browser can manipulate how the page is viewed at any time, for instance in chrome hit F12 and then you can write whatever you want in the html and you will see the changes immediately. But that's not to worry about...
The scary part is when JavaScript on the site communicates with the back-end server and supplies it with some input parameters that are not being sanitized on the server side before it is processed in some way. SQL Injection can also happen this way if the back-end utilizes a database which they almost always do, and so on...
A webpage can be manipulated in two ways: the manipulation is either non-persistent or persistent.
[non-persistent]: this way you can manipulate your own access to a webpage. This won't affect other users in itself, but you can do harm once you're in.
[persistent]: this way the injected code is stored on the server side permanently, and will most likely affect other users.
The key thing here is to always sanitize input on the back-end server before it is processed in any way.
You could definitely write some javascript function to modify the contents of a file. If that file is your HTML page, then sure.
If you want to prevent this from happening, you can just set the permissions of that HTML file to be read-only, though.
you could:
Overwrite the page,
Mess with the innerHTML of the body tag (almost the same),
Insert illegal elements.
Yes. At the very least, you could use it to write CSS that sets any element, class, ID... even the body to display:none;
In terms of jQuery (or Javascript), what happens behind the scenes when a person posts a comment on Facebook, Twitter, or a blog?
For instance, do they sanitize the text first, and then pattern match URL's into an actual link? Are there other items of concern that the client-side should check in addition to doing some checks on the backend?
I have found a few regex's for turning URL's into links, but I'm not sure if there are better solutions.
I'm trying to wrap my head around the problem, but I'm having a difficult time knowing where to start. Any guidance you can provide is greatly appreciated!
This is a matter of opinion (in my opinion) so I'll CW this answer. Here's my opinion as a bona-fide citizen of the Internet:
1. There are two broad kinds of "sanitization": one is semantic sanitization, where input is checked to make sure it's what it's supposed to be (phone number, postal code, currency amount, whatever). The other is defensive sanitization, which is (again, in my opinion) a generally misguided, user-hostile activity.
2. Really, input is never really scary until it touches something: the database server, an HTML renderer, a JavaScript interpreter, and so on. The list is long.
As to point 1, I think that defensive sanitization is misguided because it ignores point 2 above: without knowing what environment you're defending from malicious input, you can't really sanitize it without greatly restricting the input alphabet, and even then the process may be fighting against itself. It's user-hostile because it needlessly restricts what legitimate users can do with the data they want to keep in their account. Who is to say that I shouldn't want to include, in my "comments" or "nickname" or "notes" fields, characters that look like XML, or SQL, or any other language's special characters? If there's no semantic reason to filter inputs, why do that to your users?
Point 2 is really the crux of this. User input can be dangerous because server-side code (or client-side code, for that matter) can hand it over directly to unsuspecting interpretation environments where meta-characters important to each distinct environment can cause unexpected behavior. If you hand untouched user input directly to SQL by pasting it directly into a query template, then special SQL meta-characters like quotes can be used by a malicious user to control the database in ways you definitely don't want. However, that alone is no reason to prevent me from telling you that my name is "O'Henry".
The key issue with point 2 is that there are many different interpretation environments, and each of them is completely distinct as far as the threat posed by user input. Let's list a few:
SQL - quote marks in user input are a big potential problem; specific DB servers may have other exploitable syntax conventions
HTML - when user input is dropped straight into HTML, the browser's HTML parser will happily obey whatever embedded markup tells it to do, including run scripts, load tracker images, and whatever else. The key meta-characters are "<", ">", and "&" (the latter not so much because of attacks, but because of the mess they cause). It's probably also good to worry about quotes here too because user input may need to go inside HTML element attributes.
JavaScript - if a page template needs to put some user input directly into some running JavaScript code, the things to worry about are probably quotes (if the input is to be treated as a JavaScript string). If the user input needs to go into a regular expression, then a lot more scrubbing is necessary.
Logfiles - yes, logfiles. How do you look at logfiles? I do it on a simple command-line window on my Linux box. Such command-line "console" applications generally obey ancient "escape sequences" that date back to old ASCII terminals, for controlling cursor position and various other things. Well, embedded escape sequences in cleverly crafted user input can be used for crazy attacks that leverage those escape sequences; the general idea is to have some user input get dropped into some log file (maybe as part of a page error log) and trick an administrator into scrolling through the logfile in an xterm window. Wild, huh?
The key point here is that the exact techniques necessary to protect those environments from malformed or malicious input differ significantly from one to the next. Protecting your SQL server from malicious quotes is a completely different problem from guarding those quotes in HTML or JavaScript (and note that both of those are totally different from each other too!).
The bottom line: my opinion, therefore, is that the proper focus of attention when worrying about potentially malformed or malicious input is the process of writing user data, not reading it. As each fragment of user-supplied data is used by your software in cooperation with each interpreting environment, a "quoting" or "escaping" operation has to be done, and it has to be an operation specific to the target environment. How exactly that's arranged may vary all over the place. Traditionally in SQL, for example, one uses prepared statements, though there are times when the deficiencies of prepared statements make that approach difficult. When spitting out HTML, most server-side frameworks have all sorts of built-in hooks for HTML or XML escaping with entity notation (like &amp; for "&"). Nowadays, the simplest way to protect things for JavaScript is to leverage a JSON serializer, though of course there are other ways to go.
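As a rough sketch of the JSON serializer approach for the JavaScript environment: the toJsLiteral helper below is a hypothetical name, and the extra replacement of "<" is an assumption that the result will be written into an inline script block in an HTML page, where the text "</script>" must not appear.

```javascript
// Serialize a JSON-serializable value as a JavaScript literal for use inside
// an inline <script> block. JSON.stringify handles quotes, backslashes, and
// newlines; "<" is additionally escaped so the output cannot contain "</script>".
function toJsLiteral(value) {
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

const userName = "O'Henry";
const scriptLine = 'var userName = ' + toJsLiteral(userName) + ';';
console.log(scriptLine); // var userName = "O'Henry";
```

The name is stored and displayed unchanged; only its representation inside this one environment is quoted.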
I'm planning on making a web app that will allow users to post entire web pages on my website. I'm thinking of using HTML Purifier but I'm not sure because HTML Purifier edits the HTML and it's important that the HTML is maintained just how it was posted. So I was thinking of making some regex to get rid of all script tags and all the javascript attributes like onload, onclick, etc.
I saw a Google video a while ago that had a solution for this. Their solution was to use another website to post javascript in so the original website cannot be accessed by it. But I don't wanna purchase a new domain just for this.
be careful with homebrew regexes for this kind of thing
A regex like
s/(<.*?)onClick=['"].*?['"](.*?>)/$1 $2/
looks like it might get rid of onclick events, but you can circumvent it with
<a onClick<a onClick="malicious()">="malicious()">
running the regex on that will get you something like
<a onClick ="malicious()">
You can fix it by repeatedly running the regex on that string until it doesn't match, but that's just one example of how easy it is to get around simple regex sanitizers.
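The same nesting trick, and the repeat-until-stable fix, can be sketched with a simpler pattern than the onClick regex above. This is an illustration only: stripOnce and stripUntilStable are hypothetical names, the pattern only knows about exact <script> and </script> tags, and even the repeated version is not a sanitizer you should rely on.

```javascript
// One pass of a naive filter: delete exact <script> and </script> tags.
function stripOnce(html) {
  return html.replace(/<\/?script>/gi, '');
}

// Re-run the filter until the string stops changing.
function stripUntilStable(html) {
  let previous;
  do {
    previous = html;
    html = stripOnce(html);
  } while (html !== previous);
  return html;
}

// The attacker nests one tag inside another so that removing the inner
// tag reassembles the outer one.
const nested = '<scr<script>ipt>alert(1)</scr</script>ipt>';
console.log(stripOnce(nested));        // <script>alert(1)</script>
console.log(stripUntilStable(nested)); // alert(1)
```

A single pass hands the attacker a working script tag. Looping closes this particular hole, but variants the pattern does not match (for example a tag with attributes or extra whitespace) still get through.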
The most critical error people make when doing this is validating things on input.
Instead, you should validate on display.
The context matters when determining what is XSS and what isn't. Therefore, you can happily accept any input, as long as you pass it through appropriate cleaning functions when displaying it.
Consider that something that constitutes 'XSS' will be different when the input is placed in '<a href="HERE">' as opposed to '<a>HERE</a>'.
Thus, all you need to do, is make sure that any time you write user data, you consider, very carefully, where you are displaying it, and make sure that it can't escape the context you are writing it to.
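For the href context specifically, entity-escaping is not enough, because a value such as javascript:alert(1) contains no HTML meta-characters at all. A minimal sketch of a check for that context follows. The isSafeHref name, the list of allowed schemes, and the placeholder base URL are all assumptions for illustration.

```javascript
// Schemes we are willing to put into an href (an assumed policy).
const SAFE_SCHEMES = ['http:', 'https:', 'mailto:'];

// Decide whether a user-supplied value may be used as a link target.
// Relative URLs are resolved against a placeholder base so that they parse.
function isSafeHref(value) {
  let url;
  try {
    url = new URL(value, 'https://example.invalid/');
  } catch {
    return false; // not parseable as a URL at all
  }
  return SAFE_SCHEMES.includes(url.protocol);
}

console.log(isSafeHref('https://example.com/page')); // true
console.log(isSafeHref('/about'));                   // true
console.log(isSafeHref('javascript:alert(1)'));      // false
```

The accepted value still needs HTML attribute escaping when it is written into the page; this check only covers what the URL is allowed to do once clicked.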
If you can find any other way of letting users post content, that does not involve HTML, do that. There are plenty of user-side light markup systems you can use to generate HTML.
So I was thinking of making some regex to get rid of all script tags and all the javascript attributes like onload, onclick, etc.
Forget it. You cannot process HTML with regex in any useful way. Let alone when security is involved and attackers might be deliberately throwing malformed markup at you.
If you can convince your users to input XHTML, that's much easier to parse. You still can't do it with regex, but you can throw it into a simple XML parser, and walk over the resulting node tree to check that every element and attribute is known-safe, and delete any that aren't, then re-serialise.
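The tree walk described above might look roughly like this. It is a sketch under stated assumptions: the parsed document is represented as plain objects of the form { tag, attrs, children } with strings as text nodes (a made-up shape, not the output of any particular parser), and the ALLOWED whitelist is only an example.

```javascript
// Example whitelist: element name -> attribute names permitted on it.
// A Map is used so that names like "constructor" are not accidentally allowed.
const ALLOWED = new Map([
  ['p', []],
  ['b', []],
  ['i', []],
  ['a', ['href']],
]);

// Return a cleaned copy of the node, or null if the element is not allowed.
function sanitizeNode(node) {
  if (typeof node === 'string') {
    return node; // text node: kept as-is, escaped later at serialization time
  }
  const allowedAttrs = ALLOWED.get(node.tag);
  if (allowedAttrs === undefined) {
    return null; // unknown element: drop it together with its subtree
  }
  const attrs = {};
  const sourceAttrs = node.attrs || {};
  for (const name of allowedAttrs) {
    if (Object.prototype.hasOwnProperty.call(sourceAttrs, name)) {
      attrs[name] = sourceAttrs[name];
    }
  }
  const children = (node.children || [])
    .map(sanitizeNode)
    .filter((child) => child !== null);
  return { tag: node.tag, attrs, children };
}

const tree = {
  tag: 'p',
  attrs: { onclick: 'steal()' },
  children: [
    'hi ',
    { tag: 'script', attrs: {}, children: ['evil()'] },
    { tag: 'a', attrs: { href: '/x', onclick: 'steal()' }, children: ['link'] },
  ],
};
console.log(JSON.stringify(sanitizeNode(tree)));
// {"tag":"p","attrs":{},"children":["hi ",{"tag":"a","attrs":{"href":"/x"},"children":["link"]}]}
```

Allowed attribute values such as href still need their own checks, since a permitted attribute can carry a javascript: URL.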
HTML Purifier edits the HTML and it's important that the HTML is maintained just how it was posted.
Why?
If it's so they can edit it in their original form, then the answer is simply to purify it on the way out to be displayed in the browser, not on the way in at submit-time.
If you must let users input their own free-form HTML — and in general I'd advise against it — then HTML Purifier, with a whitelist approach (ban all elements/attributes that aren't known-safe) is about as good as it gets. It's very very complicated and you may have to keep it up to date when hacks are found, but it's streets ahead of anything you're going to hack up yourself with regexes.
But I don't wanna purchase a new domain just for this.
You can use a subdomain, as long as any authentication tokens (in particular, cookies) can't cross between subdomains. (Which for cookies they can't by default as the domain parameter is set to only the current hostname.)
Do you trust your users with scripting capability? If not don't let them have it, or you'll get attack scripts and iframes to Russian exploit/malware sites all over the place...
Make sure that user content doesn't contain anything that could cause JavaScript to be run on your page.
You can do this by using an HTML stripping function that gets rid of all HTML tags (like strip_tags from PHP), or by using another similar tool. There are actually many reasons besides XSS to do this. If you have user submitted content, you want to make sure that it doesn't break the site layout.
I believe you can simply use a sub-domain of your current domain to host JavaScript, and you will get the same security benefits for AJAX. Not for cookies, however.
In your specific case, filtering out the <script> tag and Javascript actions is probably going to be your best bet.
1) Use clean simple directory based URIs to serve user feed data.
When you dynamically create URIs to address the user's uploaded data, service account, or anything else off your domain, make sure you don't post information as parameters to the URI. That is an extremely easy point of manipulation that could be used to expose flaws in your server security and even possibly inject code onto your server.
2) Patch your server.
Ensure you keep your server up to date on all the latest security patches for all the services running on that server.
3) Take all possible server-side protections against SQL injection.
If somebody can inject code into your SQL database that can execute from services on your box, that person will own your box. At that point they can install malware onto your webserver to be fed back to your users, or simply record data from the server and send it out to a malicious party.
4) Force all new uploads into a protected sandboxed area to test for script execution.
No matter how you try to remove script tags from submitted code there will be a way to circumvent your safeguards to execute script. Browsers are sloppy and do all kinds of stupid crap they are not supposed to do. Test your submissions in a safe area before you publish them for public consumption.
5) Check for beacons in submitted code.
This step requires the previous step and can be very complicated, because beaconing can occur in script code that requires a browser plugin to execute, such as ActionScript, but it is just as much a vulnerability as allowing JavaScript to execute from user submitted code. If a user can submit code that can beacon out to a third party then your users, and possibly your server, are completely exposed to data loss to a malicious third party.
You should filter ALL HTML and whitelist only the tags and attributes that are safe and semantically useful. WordPress is great at this and I assume that you will find the regular expressions used by WordPress if you search their source code.