Javascript safely use user regex

I want to safely use regexp that the user inputs in order to parse text. I don't want to use a sandbox or anything. Can I just plug-in the regexp into String.match or would that cause problems? If not how might this be avoided?
The usecase is that of a writer who has a lot of text. The writer will transform that text with various regexes. Other users will run that author's regexes in order to get the intended output.

This should not cause any security problems. In the worst case you end up with an invalid regex string. Unless I am misunderstanding the question.
EDIT The worst-case scenario is locking up your own browser. Thanks Anirudh for bringing this point up.

User Expression -> Same User
If you have the user input in a string, you can pass that directly to the constructor.
demo
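var s = "This is my text";
$('input#expression').keyup(function(){
  var e;
  try {
    // The RegExp constructor throws a SyntaxError on an invalid pattern
    e = new RegExp($(this).val());
  } catch (err) {
    $('#match').text("Invalid expression: " + err.message);
    return;
  }
  // Show the match result (null when nothing matches)
  $('#match').text(JSON.stringify(s.match(e)));
}).trigger("keyup");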
It's much better than eval. There aren't any security problems that I know of. If there are any, it's probably a bug in the browser.
User Expression -> Other User(s)
As pointed out in the comments, it would be very complicated to differentiate between safe, and computationally intense/infeasible expressions in JavaScript.
If you don't trust the users to play nice, don't let them run expressions on each other's computers. At the very least, make sure user data is saved in case the page needs to be refreshed, and don't run the expressions unless the user explicitly invokes them.
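A minimal sketch of compiling and running a user-supplied expression defensively, meant to be called only when the user explicitly asks for it. The name runUserRegex and the MAX_PATTERN_LENGTH limit are assumptions made for illustration. Note that this only guards against malformed patterns and oversized input; it does not bound how long a valid but pathological expression can run.

```javascript
// Hypothetical helper: compile and run a user-supplied pattern without
// letting a malformed pattern or bad flags throw into the caller.
var MAX_PATTERN_LENGTH = 200; // assumed limit, tune for your use case

function runUserRegex(pattern, flags, text) {
  if (typeof pattern !== "string" || pattern.length > MAX_PATTERN_LENGTH) {
    return { ok: false, error: "Pattern missing or too long" };
  }
  var re;
  try {
    // Throws a SyntaxError for an invalid pattern or invalid flags
    re = new RegExp(pattern, flags || "");
  } catch (err) {
    return { ok: false, error: err.message };
  }
  // String.prototype.match returns null when nothing matches
  return { ok: true, matches: String(text).match(re) };
}
```

A caller would check the ok field and show error to the author of the expression instead of letting the exception break the page.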

If you are using Javascript it is running on their machine - so the sandbox is not involved.
They are free to mess up their machine as much as they want! Besides the worst that can happen is that the script breaks on their browser.

Related

Is it safe to save user created javascript in database?

I'm working on a code playground type of application where a user(web developer/designer) can input HTML, CSS and Javascript and view the result on an iframe. The inputted code will be saved in the database (MySQL) and rendered back again in an iframe on a show_results view/action.
Now the question: Is it safe to save javascripts directly in the database? If not, then where/how should I save it?

The database is not going to be your problem here. It's fairly trivial to use prepared statements to allow all kinds of characters to be stored safely in the database. Using anything other than prepared statements to store user input is insufficient, and essentially never recommended.
But you're talking about allowing arbitrary javascript to be executed, which is always going to be a security problem. As a commenter above implies, you're going to be replicating the complexities of jsfiddle.net without the security experience, the development know-how, or the express wish to keep on patching the vulnerabilities that will keep on cropping up.
Certainly you should be aware that what you're doing will completely compromise any domain that you set it up on, so that essentially that javascript should be only written on a throw-away domain or subdomain that you don't use for any other purpose. Of course, it's going to be trivial in such an environment to simply framebreak and pull a viewer off of the site that hosts the frame as well.
I'm sure this just scratches the surface of the potential abuses that arbitrary javascript execution (aka intentional self cross-site-scripting) will bring with it.
Since you're essentially re-inventing a very dangerous wheel with this concept, why not simply use some of the embedding services that already exist out there? codepen.io, for example, allows you to embed its snippets.

Yes it is safe as an architectural decision iff you are executing the javascript on the client side.
On any website you can use tools such as chrome's "inspect element" to manipulate the html, javascript etc on the client. Your system cannot assume that items on the client are not manipulated. This is why server side validation is still so important.
I completely disagree with kzqai.
If this was the case then fiddler would be in serious trouble.
There are potential problems that can be exposed more easily with what you are doing, but those problems already exist and are just obscure.
IFF you are executing javascript on the server side, this is a very complex decision. I would personally avoid it if possible because the game you are playing is that you are able to catch every possible scenario for trouble vs a bad guy being able to catch the 1 you did not.

It is safe as long as you correctly escape certain characters when inserting the value in the SQL statement. For example, if your Javascript code is:
var foo = 'hello world';
Then you will have to escape the single quotes when building the SQL statement:
INSERT INTO snippets (code) VALUES ('var foo = ''hello world'';')
In the statement above, two single quotes ('') are the way to represent just a single quote in a string enclosed by single quotes.
See the link below for further information on escaping characters:
http://dev.mysql.com/doc/refman/5.0/en/string-literals.html
EDIT
As Stephen P correctly points out, if you use prepared statements on the server side code then the framework under the hood will replace those characters for you.
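A rough sketch of what the prepared-statement approach could look like from JavaScript on the server. The db object is an assumption: any driver that exposes an execute(sql, params) call with "?" placeholders (mysql2 offers a call of roughly this shape, but check your driver's documentation). The table and column names are taken from the INSERT example above.

```javascript
// Hypothetical data-access helper. `db` is assumed to expose
// execute(sql, params) with "?" placeholders.
function saveSnippet(db, code) {
  // The snippet travels as a bound parameter, never as part of the SQL text,
  // so quotes inside the snippet need no manual escaping.
  return db.execute("INSERT INTO snippets (code) VALUES (?)", [code]);
}
```

With this shape the snippet var foo = 'hello world'; is passed through unchanged, and the doubling of single quotes shown above is no longer done by hand.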

Possible XSS Attack in Java Script

Fortify on Demand flags this code as a possible XSS problem:
if (window.location.search != '') {
window.location.href = window.location.href.substr(0,baseurl.length+1)+'currencyCode='+event.getCurrencyCode()+'&'+window.location.href.substr(baseurl.length+1);
} else {
window.location.href = window.location.href.substr(0,baseurl.length)+'?currencyCode='+event.getCurrencyCode()+window.location.href.substr(baseurl.length);
}
I'm far from being JavaScript expert, but I need to fix this code.
Can you please help?

I think Fortify has found that event.getCurrencyCode() could be any length string and may contain a cross-site scripting attack that might send an unsuspecting user to a malicious site or cause the browser to load JavaScript that does bad things to the user. You might be able to tell this by looking at the details tab of the finding in Fortify's Audit Workbench tool.
Assuming that the potentially malicious data could be supplied by event.getCurrencyCode, you need to whitelist validate this value either when the event is sourced or here in this code. I'm going to bet that the range of currency code values in this application is relatively small and that each is of limited length, so it should be possible to whitelist this value using JavaScript's built-in regex functionality.
As it stands now JavaScript will happily add a practically unlimited length string in that URL if it is supplied by the event, and with the UTF-8 character set there is a lot that an attacker can do (inlined JavaScript, etc.)
Hope this helps. Good luck.

Unfortunately the existing answer is incorrect, and ivy_lynx's comment doesn't really address the question.
Fortify On Demand is reporting a data flow vulnerability. This is a case where some data originates anywhere except from the programmer - and is reflected to the unaware end user's browser.
The potentially dangerous data comes from:
event.getCurrencyCode()
You haven't posted enough source for us to know what this is, but a pretty good guess is that the function is supposed to return nothing other than an ISO currency code, that is, three uppercase letters ("EUR" or "JPY" etc.). Note I am making a big assumption here; I cannot see the code.
The potentially dangerous data ends up going into the browser's location.
The problem is that the developer has no guarantee what event will be sent, or that unexpected data might appear in that currency code.
The simplest fix is for you to transform the return value from "event.getCurrencyCode()" into guaranteed three uppercase letters. There is no known attack that can be expressed in three such uppercase letters. So you could replace:
event.getCurrencyCode()
with
/^[A-Z][A-Z][A-Z]$/.exec( event.getCurrencyCode() )
(reference: http://www.w3schools.com/jsref/jsref_regexp_exec.asp )
That will correctly build your URL if and only if event.getCurrencyCode() resolves to three uppercase letters like "USD". Otherwise, "null" will go into the URL at the point where the currency code was expected.
Obviously, you need to work with a real JavaScript developer to implement such a fix so that no further problems are introduced.
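As a sketch of the same idea pulled into one place: validate the currency code first, and only then build the URL. The function name buildCurrencyUrl is made up for illustration, and it uses the standard URL class rather than the substr arithmetic from the question, so the parameter ends up at the end of the query string instead of the front. It also returns null for a bad code rather than letting "null" land in the URL.

```javascript
// Hypothetical helper: returns the new URL, or null if the code is not
// exactly three uppercase ASCII letters.
function buildCurrencyUrl(href, currencyCode) {
  if (typeof currencyCode !== "string" || !/^[A-Z]{3}$/.test(currencyCode)) {
    return null;
  }
  var url = new URL(href);
  // searchParams.set adds the parameter, or overwrites an existing one
  url.searchParams.set("currencyCode", currencyCode);
  return url.toString();
}
```

In the page this could be used as: var next = buildCurrencyUrl(window.location.href, event.getCurrencyCode()); if (next !== null) window.location.href = next;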

Preventing DOM XSS

We recently on-boarded someone else's code which has since been tested, and failed, for DOM XSS attacks.
Basically the url fragments are being passed directly into jQuery selectors and enabling JavaScript to be injected, like so:
"http://website.com/#%3Cimg%20src=x%20onerror=alert%28/XSSed/%29%3E)"
$(".selector [thing="+window.location.hash.substr(1)+"]");
The problem is that this occurs throughout their scripts and would need a lot of regression testing to fix; e.g. if we escape the data, if statements won't return true any more because the data won't match.
The JavaScript file in question is concatenated at build time from many smaller files so this becomes even more difficult to fix.
Is there a way to prevent these DOM XSS attacks with some global code, without having to go through and debug each instance?
I proposed that we add a little regular expression at the top of the script to detect common chars used in XSS attacks and to simply kill the script if it returns true.
var xss = window.location.href.match(/(javascript|src|onerror|%|<|>)/g);
if(xss != null) return;
This appears to work but I'm not 100% happy with the solution. Does anyone have a better solution or any useful insight they can offer?

If you stick to the regular expression solution, which is far from ideal but may be the best choice given your constraints:
Rather than defining a regular expression matching malicious hashes (/(javascript|src|onerror|%|<|>)/g), I would define a regular expression matching sound hashes (e.g. /^[\w_-]*$/).
It will avoid false-positive errors (e.g. src_records), make it clear what is authorized and what isn't, and block more complex injection mechanisms.
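A small sketch of that whitelist check, assuming the fragment should only ever contain letters, digits, "_" and "-". The helper name safeHashValue is made up for illustration.

```javascript
// Hypothetical helper: returns the fragment without its leading "#",
// or null when it contains anything outside letters, digits, "_" and "-".
function safeHashValue(hash) {
  var value = String(hash).replace(/^#/, "");
  return /^[\w-]*$/.test(value) ? value : null;
}
```

Callers would pass window.location.hash, check for null (and probably for the empty string as well) and only then build the selector. A value such as src_records passes, while the encoded img/onerror payload from the question is rejected because of the "%", "=" and "/" characters.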

Your issue is caused by the fact that jQuery may treat its input string as HTML, not only as a selector.
Use native document.querySelector() instead of jQuery.
If support for IE7 and older is important for you, you can try the Sizzle selector engine, which, unlike jQuery and like native querySelector(), likely does not interpret the input string as anything other than a selector.

How Do I Sanitize JS eval Input?

a="79 * 2245 + (79 * 2 - 7)";
b="";
c=["1","2","3","4","5","6","7","8","9","0","+","-","/","*"];
for (i=0;i<a.length;i++){
for (ii=0;ii<c.length;ii++){
b=(a.substring(0,i))+(c[ii])+(a.substring(i+1,a.length));
alert(eval(b.replace(" ","")));
}
}
I need to find out how to make it so that when I use eval, I know that the input will not stop the script, and if it would normally crash the script to just ignore it. I understand that eval is not a good function to use, but I want a quick and simple method by which I can solve this. The above code tries to output all of the answers with all of the possible replacements for any digit, sign or space in the above. i represents the distance through which it has gone in the string and ii represents the symbol that it is currently checking. a is the original problem and b is the modified problem.

Try catching the exception eval might throw, like this:
try{
alert(eval(b.replace(" ","")));
} catch (e){
//alert(e);
}

You can check for a few special cases and avoid some behaviors with regex or the like, but there is definitely no way to 'if it would normally crash just ignore it'.
That is akin to the halting problem, as mellamokb refers to. And there's no way to know positively if a script runs to completion besides running it.
One should be very careful to vet any strings that go to eval, and keep user input out of them as much as possible, except for really simple and verifiable things like an integer value. If you can find a way around eval altogether, then all the better.
For the calculation example you show, it's probably best to parse it properly into tokens and go from there rather than evaluate it in string form.
PS - if you really want to check out these one-offs to the expression in a, it is a somewhat interesting use of eval despite its faults. Can you explain why you are trimming the whitespace immediately before evaluation? I don't believe I can think of a situation where it affects the results. For (at least most) valid expressions it makes no difference, and while it might alter some of the invalid cases, I can't think of a case where it does so meaningfully.
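To illustrate the "parse it into tokens" suggestion, here is a minimal sketch of a tokenizer plus recursive-descent evaluator for +, -, *, / and parentheses. The function names are made up, and it only covers the kind of arithmetic shown in the question. Invalid input throws a SyntaxError, which the calling loop can catch and skip, and no user text ever reaches eval.

```javascript
// Split the input into number and operator tokens; throws on anything else.
function tokenize(input) {
  var tokens = [];
  var re = /\s*(?:(\d+(?:\.\d+)?)|([-+*\/()]))/y; // sticky: match at lastIndex only
  var text = String(input).replace(/\s+$/, "");
  var pos = 0;
  while (pos < text.length) {
    re.lastIndex = pos;
    var m = re.exec(text);
    if (!m) throw new SyntaxError("Unexpected character near position " + pos);
    tokens.push(m[1] !== undefined
      ? { type: "num", value: parseFloat(m[1]) }
      : { type: "op", value: m[2] });
    pos = re.lastIndex;
  }
  return tokens;
}

function evaluate(input) {
  var tokens = tokenize(input);
  var index = 0;

  function isOp(value) {
    var t = tokens[index];
    return t !== undefined && t.type === "op" && t.value === value;
  }

  // expression := term (("+" | "-") term)*
  function parseExpression() {
    var result = parseTerm();
    while (isOp("+") || isOp("-")) {
      var op = tokens[index++].value;
      var right = parseTerm();
      result = op === "+" ? result + right : result - right;
    }
    return result;
  }

  // term := factor (("*" | "/") factor)*
  function parseTerm() {
    var result = parseFactor();
    while (isOp("*") || isOp("/")) {
      var op = tokens[index++].value;
      var right = parseFactor();
      result = op === "*" ? result * right : result / right;
    }
    return result;
  }

  // factor := number | "-" factor | "(" expression ")"
  function parseFactor() {
    var token = tokens[index++];
    if (token === undefined) throw new SyntaxError("Unexpected end of input");
    if (token.type === "num") return token.value;
    if (token.value === "-") return -parseFactor();
    if (token.value === "(") {
      var inner = parseExpression();
      if (!isOp(")")) throw new SyntaxError("Expected ')'");
      index++;
      return inner;
    }
    throw new SyntaxError("Unexpected token " + token.value);
  }

  var value = parseExpression();
  if (index < tokens.length) {
    throw new SyntaxError("Unexpected token " + tokens[index].value);
  }
  return value;
}
```

In the loop from the question, alert(eval(...)) would become a try block around alert(evaluate(b)) with an empty catch, so malformed variants are skipped.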

jQuery sanitizing comments and linkifying URLs

In terms of jQuery (or Javascript), what happens behind the scenes when a person posts a comment on Facebook, Twitter, or a blog?
For instance, do they sanitize the text first, and then pattern match URL's into an actual link? Are there other items of concern that the client-side should check in addition to doing some checks on the backend?
I have found a few regex's for turning URL's into links, but I'm not sure if there are better solutions.
I'm trying to wrap my head around the problem, but I'm having a difficult time knowing where to start. Any guidance you can provide is greatly appreciated!

This is a matter of opinion (in my opinion) so I'll CW this answer. Here's my opinion as a bona-fide citizen of the Internet:
1. There are two broad kinds of "sanitization": one is semantic sanitization, where input is checked to make sure it's what it's supposed to be (phone number, postal code, currency amount, whatever). The other is defensive sanitization, which is (again, in my opinion) a generally misguided, user-hostile activity.
2. Really, input is never really scary until it touches something: the database server, an HTML renderer, a JavaScript interpreter, and so on. The list is long.
As to point 1, I think that defensive sanitization is misguided because it ignores point 2 above: without knowing what environment you're defending from malicious input, you can't really sanitize it without greatly restricting the input alphabet, and even then the process may be fighting against itself. It's user-hostile because it needlessly restricts what legitimate users can do with the data they want to keep in their account. Who is to say that I shouldn't be able to include in my "comments" or "nickname" or "notes" fields characters that look like XML, or SQL, or any other language's special characters? If there's no semantic reason to filter inputs, why do that to your users?
Point 2 is really the crux of this. User input can be dangerous because server-side code (or client-side code, for that matter) can hand it over directly to unsuspecting interpretation environments where meta-characters important to each distinct environment can cause unexpected behavior. If you hand untouched user input directly to SQL by pasting it directly into a query template, then special SQL meta-characters like quotes can be used by a malicious user to control the database in ways you definitely don't want. However, that alone is no reason to prevent me from telling you that my name is "O'Henry".
The key issue with point 2 is that there are many different interpretation environments, and each of them is completely distinct as far as the threat posed by user input. Let's list a few:
SQL - quote marks in user input are a big potential problem; specific DB servers may have other exploitable syntax conventions
HTML - when user input is dropped straight into HTML, the browser's HTML parser will happily obey whatever embedded markup tells it to do, including run scripts, load tracker images, and whatever else. The key meta-characters are "<", ">", and "&" (the latter not so much because of attacks, but because of the mess they cause). It's probably also good to worry about quotes here too because user input may need to go inside HTML element attributes.
JavaScript - if a page template needs to put some user input directly into some running JavaScript code, the things to worry about are probably quotes (if the input is to be treated as a JavaScript string). If the user input needs to go into a regular expression, then a lot more scrubbing is necessary.
Logfiles - yes, logfiles. How do you look at logfiles? I do it on a simple command-line window on my Linux box. Such command-line "console" applications generally obey ancient "escape sequences" that date back to old ASCII terminals, for controlling cursor position and various other things. Well, embedded escape sequences in cleverly crafted user input can be used for crazy attacks that leverage those escape sequences; the general idea is to have some user input get dropped into some log file (maybe as part of a page error log) and trick an administrator into scrolling through the logfile in an xterm window. Wild, huh?
The key point here is that the exact techniques necessary to protect those environments from malformed or malicious input differ significantly from one to the next. Protecting your SQL server from malicious quotes is a completely different problem from guarding those quotes in HTML or JavaScript (and note that both of those are totally different from each other too!).
The bottom line: my opinion, therefore, is that the proper focus of attention when worrying about potentially malformed or malicious input is the process of writing user data, not reading it. As each fragment of user-supplied data is used by your software in cooperation with each interpreting environment, a "quoting" or "escaping" operation has to be done, and it has to be an operation specific to the target environment. How exactly that's arranged may vary all over the place. Traditionally in SQL, for example, one uses prepared statements, though there are times when the deficiencies of prepared statements make that approach difficult. When spitting out HTML, most server-side frameworks have all sorts of built-in hooks for HTML or XML escaping with entity notation (like &amp; for "&"). Nowadays, the simplest way to protect things for Javascript is to leverage a JSON serializer, though of course there are other ways to go.
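To make the "escape for the target environment" idea concrete, here is a small sketch with one helper per environment: HTML, a regular expression, and a JavaScript string literal. The function names are made up for illustration, and real projects would normally lean on their framework's built-in escaping rather than hand-rolled helpers like these.

```javascript
// Escape text for insertion into HTML element content or quoted attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape text so that it matches literally inside a regular expression.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Produce a JavaScript string literal via the JSON serializer. "<" is
// additionally written as \u003c so the literal cannot close an inline
// script element if it is embedded in a page.
function toJsStringLiteral(text) {
  return JSON.stringify(String(text)).replace(/</g, "\\u003c");
}
```

The same input, for example the name O'Henry, comes out differently from each helper, which is the point: the stored value stays untouched and the escaping happens at the moment it is written into a particular environment.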
