security considerations when using (end-user-defined) JavaScript code inside Java


I am working on a Java project. In it, we want to enable an end-user to define variables which are calculated based on a set of given variables of primitive types or strings. At some point, all given variables are set to specific values, and then the calculations should be carried out. All resulting calculated variables must then be sent to Java.
I am in the process of evaluating ways for the end-user to define his calculations. The (current) idea is to let him write JavaScript and have that code interpreted/executed inside the Java program. I know of two ways to do this: either use the javax.script API or GraalVM/Truffle. In both, we would do it like this:
The given variables are passed into the script: in javax.script via ScriptEngine.put, in Graal/Truffle via Value.putMember.
The end-user can define variables in the global context (whose names must not collide with the ones coming from Java). How he sets their values is up to him - he can set them directly (to a constant, to one of the given variables, to the sum of some of them ...) or define objects and functions and set the values by calling those.
Once the given variables have fixed values, the script is executed.
All variables that were defined in the global context by the script will be sent to Java: in javax.script via ScriptEngine.get, in Graal/Truffle via Value.getMember.
NOTE: We would not grant the script access to any Java classes or methods: in javax.script by checking whether the script contains the string Java.type (and disallowing such a script), in Graal/Truffle by using the default Context (which has allowAllAccess=false).
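The four steps above can be sketched as follows. This is only an illustration of the data flow: it uses Node's built-in vm module as a stand-in host instead of the Java APIs named in the question, and the variable names and the user script are made up. Node's documentation states that vm is not a security mechanism, so this shows the put/run/get pattern, not a hardened sandbox.

```javascript
const vm = require("node:vm");

// Step 1: the host puts the given variables into a fresh global object.
const given = { price: 20, quantity: 3, label: "widget" };
const sandbox = vm.createContext({ ...given });

// Step 2: a hypothetical end-user script that defines new globals.
const userScript = `
  var total = price * quantity;
  var summary = label + " x" + quantity;
`;

// Step 3: run the script once the given values are fixed.
vm.runInContext(userScript, sandbox);

// Step 4: collect every global the script added, skipping the given ones.
const results = {};
for (const name of Object.keys(sandbox)) {
  if (!(name in given)) results[name] = sandbox[name];
}
console.log(results);
```

Only names that are not in the given set are read back, which mirrors the rule that user-defined names must not collide with the ones coming from the host.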
The internet is full of hints and tips regarding JavaScript security issues and how to avoid them. On the one hand, I have the feeling that none of them apply here (explanation below). On the other hand, I don't know JavaScript well - I have never used it for anything other than pure, side-effect-free calculations.
So I am looking for some guidance here: What kind of security issues could be present in this scenario?
Why I cannot see any security issues in this scenario:
This is pure JavaScript. It does not even allow creating Blobs (which are part of WebAPI, not JavaScript) which could be used to e.g. create a file on disk. I understand that JavaScript does not contain any functionality to escape its sandbox (like file access, threads, streams...), it is merely able to manipulate the data that is given into its sandbox. See this part of https://262.ecma-international.org/11.0/#sec-overview:
ECMAScript is an object-oriented programming language for performing
computations and manipulating computational objects within a host
environment. ECMAScript as defined here is not intended to be
computationally self-sufficient; indeed, there are no provisions in
this specification for input of external data or output of computed
results. Instead, it is expected that the computational environment of
an ECMAScript program will provide not only the objects and other
facilities described in this specification but also certain
environment-specific objects, whose description and behaviour are
beyond the scope of this specification except to indicate that they
may provide certain properties that can be accessed and certain
functions that can be called from an ECMAScript program.
The sandbox in our scenario only gets some harmless toys (i.e. given variables of primitive types or strings) put into it, and after the child has played with them (the script has run), the resulting buildings (user-defined variables) are taken out of it to preserve them (used inside Java program).

(1) Code running in a virtual machine might be able to escape. Even for well-known JS implementations such as V8, this happens regularly. By running untrusted code on your server, you are vulnerable whenever such a vulnerability becomes known. You should definitely prepare for that: do a risk assessment (e.g. which other data is accessible on the (virtual) machine the engine runs on: other customers' data? secrets?) and additionally harden your infrastructure against it.
(2) Does it halt? What happens if a customer runs while(true); ? Does that crash your server? One can defend against that by killing the execution after a certain timeout (don't try to validate the code; this will never work reliably).
(3) Are the resources used (e.g. memory) limited? With a = ""; while(true) a += "memory"; one can easily allocate a lot of memory, with negative impact on other programs. Make sure that memory usage is also limited, so that the program is killed before resources are exhausted.
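As a minimal sketch of point (2), here is what "kill the execution after a timeout" can look like. It assumes Node's built-in vm module rather than the Java engines from the question, and the helper name runWithTimeout is hypothetical. The timeout option only bounds running time; memory (point (3)) is not limited by it and typically needs a limit at the process or container level.

```javascript
const vm = require("node:vm");

// Runs untrusted source with a wall-clock limit (in milliseconds).
// Returns { ok: true, value } or { ok: false, reason }.
function runWithTimeout(source, timeoutMs) {
  try {
    const value = vm.runInNewContext(source, {}, { timeout: timeoutMs });
    return { ok: true, value };
  } catch (err) {
    // On a timeout, vm throws instead of returning.
    return { ok: false, reason: String(err.code || err.name) };
  }
}

const finite = runWithTimeout("1 + 2", 100);
const endless = runWithTimeout("while (true);", 100);

console.log(finite.ok, endless.ok);
```

The endless script is interrupted from the outside, so no attempt is made to decide in advance whether the code halts.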

Just some thoughts. You're essentially asking whether you can trust your sandbox/virtual machine. For that you either have to assume that you're using a good one, or, to be really sure, read through all its source code yourself. If you choose a trusted and well-known sandbox, I'd guess you can just trust it (JavaScript shouldn't be able to affect file system stuff outside of it).
On the other hand, why aren't you just doing all these calculations client side and then sending the result to your backend? It seems like a lot of setup just to be able to run JavaScript server side. If the argument for this is "not cheating" or something similar, then you can't avoid that even if the code is sent to the server (you have no idea who's sending you that JavaScript). In my opinion, doing this setup just to run it server side doesn't make sense; just run it client side.
If you do need to run it server side, then you need to consider whether your Java is running with root permissions (in which case it will likely also invoke the sandbox with root permissions). On my setup, Node.js executes under ~/home, so even in the worst case, where someone manages to delete everything, all they can do is wipe out the home directory. If you're running JavaScript server side, I'd strongly suggest at the very least never doing so as root. It shouldn't be able to do anything outside that sandbox, but then even in the worst case it can't wipe out your server.
Something else I'd consider (since I have no idea what your sandbox allows or limits) is whether you can request and make API calls with javascript in that sandbox (or anything similar), because if it's running under root and allows that it would give someone root access to your infrastructure (your infrastructure thinking it's your server making requests when it's actually malicious JS code).
You could also make a mistake or start up your VM with an incorrect argument or missing config option and it suddenly allows a vulnerability without you being aware of it, so you'll have to make sure you're setting it up correctly.
Something else: if you ever store that JS in a database instead of just executing it, you have to make sure it's not made directly available to other users without checking it, otherwise you'd have XSS. For example, say you build an app for "coding tests" and store the result of each test in a database, and then want to show that result to a potential employer. If you display that result to them directly, you'll execute malicious code in their browser.
But I don't really see a reason why you should care about any of this, just run it client side.

Related

Found fs.open with non literal argument at index 0 when using url built from arguments

I'm trying to do something like this (typescript)
window.open(`https://somelink/certificate/${regNumber}?registrationNumber=${type}`);
where regNumber and type are very dynamic.
ESLint is giving me an error
Found fs.open with non literal argument at index 0 security/detect-non-literal-fs-filename
I know this rule is about the path traversal vulnerability (https://owasp.org/www-community/attacks/Path_Traversal), but I simply cannot figure out how to get around it. Any ideas? Thanks

In your case, this rule can be safely ignored.
What the rule does is this: it makes a list of the fs module's method names, which includes open, and then checks whether any property accessed in the code (or, more specifically, any MemberExpression) matches one of those names.
So, while it'll generate a warning for fs.writeFile, and fs.open, for example, it'll also generate one for window.open - despite the fact that the client-side window object is completely different from fs.
fs methods allow for broad manipulation of the server's filesystem. Allowing arbitrary access to this is a bad idea.
window.open only allows a client's browser to open a window to another address, which is nearly innocuous and has very little chance of harming anything.
There's still a potential small vulnerability, but the potential vulnerability will exist regardless of your window.open code - if the webserver is set up improperly and allows arbitrary URL accesses to do something improper (which would be pretty unlikely), that means there's a big issue to fix on the server - but it's not an issue that client-side code should try to deal with.
If the project you're working on does not contain any server-side code, feel free to disable the security/detect-non-literal-fs-filename rule for the whole project.
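The false positive described above can be reproduced with a deliberately simplified version of the check. This is a sketch of the idea only, not the plugin's actual source; the function name isFlagged and its parameters are hypothetical.

```javascript
const fs = require("node:fs");

// Names of everything the fs module exports, e.g. "open", "readFile".
const fsMethodNames = new Set(Object.keys(fs));

// Simplified check: looks only at the property name and at whether the
// first argument is a literal. The object name is deliberately ignored,
// which is what makes `window.open(...)` look like `fs.open(...)`.
function isFlagged(objectName, propertyName, firstArgIsLiteral) {
  return fsMethodNames.has(propertyName) && !firstArgIsLiteral;
}

console.log(isFlagged("fs", "open", false));      // flagged, as intended
console.log(isFlagged("window", "open", false));  // flagged too: the false positive
console.log(isFlagged("window", "alert", false)); // not an fs name, not flagged
```

If the rule should stay enabled for the rest of the project, ESLint's standard directive comment silences it for a single line: // eslint-disable-next-line security/detect-non-literal-fs-filename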

Sandboxing Node.js modules - can it be done?

I'm learning Node.js (-awesome-), and I'm toying with the idea of using it to create a next-generation MUD (online text-based game). In such games, there are various commands, skills, spells etc. that can be used to kill bad guys as you run around and explore hundreds of rooms/locations. Generally speaking, these features are pretty static - you can't usually create new spells, or build new rooms. I however would like to create a MUD where the code that defines spells and rooms etc. can be edited by users.
That has some obvious security concerns; a malicious user could for example upload some JS that forks the child process 'rm -r /'. I'm not as concerned with protecting the internals of the game (I'm securing as much as possible, but there's only so much you can do in a language where everything is public); I could always track code changes wiki-style, and punish users who e.g. crash the server, or boost their power over 9000, etc. But I'd like to solidly protect the server's OS.
I've looked into other SO answers to similar questions, and most people suggest running a sandboxed version of Node. This won't work in my situation (at least not well), because I need the user-defined JS to interact with the MUD's engine, which itself needs to interact with the filesystem, system commands, sensitive core modules, etc. etc. Hypothetically all of those transactions could perhaps be JSON-encoded in the engine, sent to the sandboxed process, processed, and returned to the engine via JSON, but that is an expensive endeavour if every single call to get a player's hit points needs to be passed to another process. Not to mention it's synchronous, which I would rather avoid.
So I'm wondering if there's a way to "sandbox" a single Node module. My thought is that such a sandbox would need to simply disable the 'require' function, and all would be bliss. So, since I couldn't find anything on Google/SO, I figured I'd pose the question myself.

Okay, so I thought about it some more today, and I think I have a basic strategy:
var require = function(module) {
    throw "Uh-oh, untrusted code tried to load module '" + module + "'";
};
var module = null;
// use similar strategy for anything else susceptible
var loadUntrusted = function() {
    eval(code);
};
Essentially, we just use variables in a local scope to hide the Node API from eval'ed code, and run the code. Another point of vulnerability would be objects from the Node API that are passed into untrusted code. If e.g. a buffer was passed to an untrusted object/function, that object/function could work its way up the prototype chain, and replace a key buffer function with its own malicious version. That would make all buffers used for e.g. File IO, or piping system commands, etc., vulnerable to injection.
So, if I'm going to succeed in this, I'll need to partition untrusted objects into their own world - the outside world can call methods on it, but it cannot call methods on the outside world. Anyone can of course feel free to please tell me of any further security vulnerabilities they can think of regarding this strategy.
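The prototype-chain concern mentioned above can be shown in a few lines. The Message class is hypothetical and stands in for an engine object such as a buffer; the "untrusted" function plays the role of user-supplied code.

```javascript
// Hypothetical engine-side class; stands in for something like a Buffer.
class Message {
  constructor(text) { this.text = text; }
  render() { return this.text; }
}

// Untrusted code that only ever receives one instance...
function untrusted(msg) {
  // ...but can reach the shared prototype through it.
  Object.getPrototypeOf(msg).render = function () {
    return "tampered: " + this.text;
  };
}

untrusted(new Message("hello"));

// An unrelated instance created by the engine is now affected too.
const later = new Message("save game");
console.log(later.render()); // "tampered: save game"
```

Calling Object.freeze(Message.prototype) before handing out instances would block this particular reassignment, but it has to be done for every prototype reachable from any object that crosses the boundary, which is why partitioning untrusted objects into their own world is the sturdier approach.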

Is it possible to manipulate every Javascript variables, objects while or after running?

It seems there's no way to completely hide source/encrypt something to prevent users from inspecting the logic behind a script.
Aside from viewing the source, then, is it possible to manipulate every variable and object while a script is running?
It seems it is possible to some degree: by using Chrome's developer tools or Firebug, you can easily edit variables or even invoke functions on the global scope.
Then what about variables and functions inside instantiated objects or self-invoked anonymous functions? Here is an example:
var varInGlobal = 'On the global scope: easily editable';
function CustomConstructor()
{
    this.exposedProperty = 'Once instantiated, can be easily manipulated too.';
    this.func1 = function(){return func1InConstructor();}
    var var1InConstructor = 'Can be retrieved by invoking func1 from an instantiated object';
    // Can it be assigned a new value after this is instantiated?
    function func1InConstructor()
    {
        return var1InConstructor;
    }
}
var customObject = new CustomConstructor();
After this is run in a browser:
// CONSOLE WINDOW
varInGlobal = 'A piece of cake!';
customObject.exposedProperty = 'Has new value now!';
customObject.var1InConstructor; // undefined: the variable can't be accessed this way
customObject.func1(); // This is the correct way
At this stage, is it possible for a user to edit the variable "var1InConstructor" in customObject?
Here's another example:
There is an RPG game built in JavaScript. The hero in the game has two stats: strength and agility. The character's final damage is calculated by combining these two stats. It is clear that players can find out this logic by inspecting the source.
Let's assume the entire script is self-invoked and the stats/calculation functions are inside objects' constructors, so they can't normally be reached after instantiation. My question is: can players edit the character's str and agi while the game is running (by using Firebug or whatever) so they can steamroll everything and ruin the game?

The variable var1InConstructor cannot be re-bound under normal ECMAScript rules as it is visible only within the lexical scope. However, as alex (and others) rightly say, the client should not be trusted.
Here are some ways the user can exploit the assumption that the variable is read-only:
Use a JavaScript debugger (e.g. Firebug) and re-assign the variable while stopped at a breakpoint within the applicable scope.
Copy and paste the original source code, but add a setter with access to the variable. The user could even copy the entire program invalidating almost every assumption about execution.
Modify or inject a value at a usage site: an exploitation might be possible without ever actually updating the original variable (e.g. player.power = function () { return "godlike" }).
In the end, with a client-side program, there is no way to absolutely prevent a user from cheating without a centralized authority (read: server) auditing every action - and even then it still might be possible to cheat by reading additional game state, such as enemy positions.
JavaScript, being easy to read, edit, and execute dynamically is even easier to hack/fiddle with than a compiled application. Obfuscation is possible but, if someone wants to cheat, they will.
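A short sketch of the first and third points: the closure variable cannot be reached as a property, yet the usage site can be swapped out without touching it. The constructor is a trimmed version of the one in the question, and the replacement value is made up.

```javascript
function CustomConstructor() {
  var var1InConstructor = "original";
  this.func1 = function () { return var1InConstructor; };
}

const customObject = new CustomConstructor();

// The closure variable itself is not reachable as a property.
const direct = customObject.var1InConstructor; // undefined
const before = customObject.func1();           // "original"

// But the usage site can simply be replaced.
customObject.func1 = function () { return "godlike"; };
const after = customObject.func1();            // "godlike"

console.log(direct, before, after);
```

Any code that later calls customObject.func1() sees the replaced value, even though var1InConstructor still holds "original".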

I don't think this constitutes an answer, it could be seen as anecdotal, but it's a bit long for a comment.
Everything you do when it comes to the integrity of your coding on this issue has to revolve around needing to verify that the data hasn't changed outside of the logic of your game.
My experience with game development (via flash, primarily...but could be compared to javascript) is that you need to think about everything being a handshake where possible. When you are expecting data to come to the server from the client you want to make sure that you have some form of passage of communication that lessens the chance of someone simply sending false data. Store data on the server side as much as possible and use the client side code to call for it when it's needed, and refresh this data store often.
You'll find that HTML games tend to move a lot of the logic to the server side, even for menial tasks. Attacking an enemy or picking up an item are calls to functions within server-side code, which is why the game animation can carry on in some of these games while the connection times out in the background, causing error messages to pop up and the interface to refresh to the server's last known valid state.
Flash was easier in this regard, as you didn't have any access to alter or corrupt data unless it left the Flash environment.

Yes, anything run on the client should be untrusted if you're using the data from it to update server-side state.

As you suggested, you can't hide the logic/client-side code. You can make it "harder" for people to read the source by obfuscating it, but that is trivial to undo.
Assuming you're making a game from your example, the first rule of networked games is "never trust the client". You need to either run all the game logic on a server, or you need to validate all the input on a server. Never update the game state based on input from a client without validating it first.
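A minimal sketch of "never trust the client" for the RPG example from the question: the server keeps the stats and computes the damage itself, so whatever numbers the client sends along are ignored. The state shape, the function names and the damage formula (a plain sum of the two stats) are assumptions for illustration.

```javascript
// Authoritative state kept on the server (hypothetical shape).
const serverState = {
  hero: { str: 5, agi: 3 },
  enemy: { hp: 50 },
};

// The damage formula lives on the server; the question only says the two
// stats are combined, so a plain sum is assumed here.
function computeDamage(hero) {
  return hero.str + hero.agi;
}

// Handles an "attack" message. Any stats or damage the client claims are ignored.
function handleAttack(state, clientMessage) {
  if (clientMessage.type !== "attack") {
    return { ok: false, error: "unknown action" };
  }
  const damage = computeDamage(state.hero);
  state.enemy.hp = Math.max(0, state.enemy.hp - damage);
  return { ok: true, damage, enemyHp: state.enemy.hp };
}

// A tampered client claims huge numbers; the server does not use them.
const reply = handleAttack(serverState, { type: "attack", str: 9999, damage: 9999 });
console.log(reply);
```

The client only states its intent (the action type); everything that affects game state is derived from data the server already holds.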

You can't hide any variable.
Also, if the user is good at JavaScript, they can simply edit your script itself, without changing the variables' values through the console.

JS code that is injected into an HTML page using Ajax is pretty darn difficult to get your hands on, but it also has its limitations. Most notably, you can't use JS includes in injected HTML . . . only inline JS.
I've been working with some of that recently actually and it's a real pain to debug. You can't see it, step into it, or add breakpoints to it in any way that I can figure out . . . in Firebug or Chrome's built-in tool.
But, as others have said . . . I still wouldn't consider it trusted.

Javascript eval (and friends)

Some claim eval is evil.
Any regular HTML page may look like:
<script src="some-trendy-js-library.js"></script>
</body>
</html>
That is, assuming the person doing this knows his job and leaves javascript to load at the end of the page.
Here, we are basically loading a script file into the web browser. Some people have gone deeper and use this as a way to communicate with a 3rd party server...
<script src="//foo.com/bar.js"></script>
At this point, it's been found important to actually load those scripts conditionally at runtime, for whatever reason.
What is my point? While the mechanics differ, we're doing the same thing...executing a piece of plain text as code - aka eval().
Now that I've made my point clear, here goes the question...
Given certain conditions, such as an AJAX request, or (more interestingly) a websocket connection, what is the best way to execute a response from the server?
Here's a couple to get you thinking...
eval() the server's output. (did that guy over there just faint?)
run a named function returned by the server: var resp = sock.msg; myObj[resp]();
build my own parser to figure out what the server is trying to tell me without messing with the javascript directly.

Given certain conditions, such as an AJAX request, or (more interestingly) a websocket connection, what is the best way to execute a response from the server?
The main criticism of eval when used to parse message results is that it is overkill -- you are using a sledgehammer to swat a fly with all the extra risk that comes from overpowered tools -- they can bounce back and hit you.
Let's break the kinds of responses into a few different categories:
(1) Static javascript loaded on demand
(2) A dynamic response from a trusted source on a secure channel that includes no content specified by untrusted parties.
(3) A dynamic response from mixed sources (maybe mostly trusted but includes encoded strings specified by untrusted parties) that is mostly data
(4) Side-effects based on data
For (1), there is no difference between XHR+eval and <script src>, but XHR+eval has few advantages.
For (2), little difference. If you can unpack the response using JSON.parse you are likely to run into fewer problems, but eval's extra authority is less likely to be abused with data from a trusted source than otherwise so not a big deal if you've got a good positive reason for eval.
For (3), there is a big difference. eval's extra-abusable authority is likely to bite you even if you're very careful. This is brittle security-wise. Don't do it.
For (4), it's best if you can separate it into a data problem and a code problem. JSONP allows this if you can validate the result before execution. Parse the data using JSON.parse or something else with little abusable authority, so a function you wrote and approved for external use does the side-effects. This minimizes the excess abusable authority. Naive eval is dangerous here.
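To illustrate the difference for the mostly-data case, here is a small sketch: JSON.parse accepts only data, so a response that smuggles in an expression is rejected instead of being run. The payloads and the helper name parseResponse are made up for the example.

```javascript
// A response that is supposed to be data but carries code instead.
// Passed to eval, this expression would set globalThis.pwned as a side effect.
const hostile = "({ score: (globalThis.pwned = true, 1) })";
const honest = '{"score": 1}';

// Parses a response as pure data and reports failure instead of throwing.
function parseResponse(text) {
  try {
    return { ok: true, data: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const good = parseResponse(honest);
const bad = parseResponse(hostile);

console.log(good.ok, bad.ok, globalThis.pwned);
```

Because the hostile text is never evaluated, its side effect never happens; the parser has no authority beyond building plain values.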

"Evil" does not mean "forbidden". Sometimes, there are perfectly good reasons to use so-called "evil" features. They are just called "evil" since they can be, and often are, misused.
In your case, the client-side script is only allowed to make requests to "its own" server. This is the same server the original JavaScript came from, so the dynamic response is as trusted as the original code. A perfectly valid scenario for eval().

If you're fetching code from a domain you don't control, then handing over the code "raw" to the JavaScript interpreter always means you have to completely trust that domain, or else that you have to not care whether malicious code corrupts your own pages.
If you control the domain, then do whatever you want.

The server should provide you with data, not code. You should have the server respond with JSON data that your JS code can act accordingly. Having the server send names of functions to be called with myObj[resp](); is still tightly coupling the server logic with client logic.
It's hard to provide more suggestions without some example code.

Have your server return JSON, and interpret that JSON on the client. The client will figure out what to do with the JSON, just as the server figures out what to do with requests received from the client.
If your server starts returning executable code, you have a problem. NOT because something "bad" is going to happen (although it might), but because your server is not responsible for knowing what the client is or is not supposed to do.
That's like sending code to the server and expecting the server to execute it. Unless you've got a REALLY good reason (such as an in-browser IDE), that's a bad idea.
Use eval as much as you want, just make sure you're separating responsibilities.
Edit:
I see the flaw in this logic. The server is obviously telling the client what to do, simply because it supplied the scripts that the client executes. However, my point is that the server-side code should not be generating scripts on the fly. The server should be orchestrating, not producing.
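One way to keep that separation is a dispatch table on the client: the server sends a message describing what happened, and the client decides which of its own functions handles it. This is a sketch under assumptions; the message shape ({ type, payload }) and the handler names are hypothetical.

```javascript
// Handlers the client author wrote and approved; names are hypothetical.
const handlers = {
  showMessage: (payload) => `message: ${payload.text}`,
  setScore: (payload) => `score is now ${payload.value}`,
};

// Parses a server message and routes it to a known handler, if any.
function handleServerMessage(json) {
  const msg = JSON.parse(json);
  // Own-property check so inherited names such as "constructor" or
  // "toString" can never be invoked through the message.
  if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) {
    return `ignored unknown type: ${msg.type}`;
  }
  return handlers[msg.type](msg.payload);
}

console.log(handleServerMessage('{"type":"setScore","payload":{"value":7}}'));
console.log(handleServerMessage('{"type":"constructor","payload":{}}'));
```

Compared with calling myObj[resp]() directly, the table makes the set of server-triggerable actions explicit and rejects everything else.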

How do I safely "eval" user code in a webpage?

I'm working on a webapp to teach programming concepts. Webpages have some text about a programming concept, then let the user type in javascript code into a text editor window to try to answer a programming problem. When the user clicks "submit", I analyse the text they've typed to see if they have solved the problem. For example, I ask them to "write a function named f that adds three to its argument".
Here's what I'm doing to analyse the user's text:
Run JSLint on the text with strict settings, in particular without assuming browser or console functions.
If there are any errors, show the errors and stop.
eval(usertext);
Loop through conditions for passing the assignment, eval(condition). An example condition is "f(1)===4". Conditions come from trusted source.
Show passing/failing conditions.
My questions: is this good enough to prevent security problems? What else can I do to be paranoid? Is there a better way to do what I want?
In case it is relevant my application is on Google App Engine with Python backend, uses JQuery, has individual user accounts.

So from what I can tell if you are eval'ing a user's input only for them, this isn't a security problem. Only if their input is eval'd for other users you have a problem.
Eval'ing a user's input is no worse than them viewing source, looking at HTTP headers, using Firebug to inspect JavaScript objects, etc. They already have access to everything.
That being said if you do need to secure their code, check out Google Caja http://code.google.com/p/google-caja/

This is a trick question. There is no secure way to eval() user's code on your website.

Not clear if the eval() occurs on client or server side. For client side:
I think it's possible to eval safely in a well-configured iframe (https://www.html5rocks.com/en/tutorials/security/sandboxed-iframes/)
This should be 100% safe, but needs a couple of libraries and has some limitations (no es6 support): https://github.com/NeilFraser/JS-Interpreter
There are lighter alternatives but not 100% safe like https://github.com/commenthol/safer-eval.
Alternatively, I think something similar can be implemented manually by wrapping the code in a with statement and overriding this, globals and arguments. Although it will never be 100% safe, it may be viable in your case.

It can't be done. Browsers offer no API to web pages to restrict what sort of code can be executed within a given context.
However, that might not matter. If you don't use any cookies whatsoever on your website, then executing arbitrary Javascript may not be a problem. After all, if there is no concept of authentication, then there's no problem with forging requests. Additionally, if you can confirm that the user meant to execute the script he/she sent, then you should also be protected from attackers, e.g., if you will only run script typed onto the page and never script submitted via GET or POST data, or if you include some kind of unique token with those requests to confirm that the request originated with your website.
Still, the answer to the core question is pretty much that it can't be done, and that user input can never be trusted. Sorry :/

Your biggest issue will always be preventing infinite loops from occurring in user-provided code. You may be able to hide "private" references by running eval in the right context, e.g.:
let userInput = getUserInput();
setTimeout(() => {
    // A strict-mode function called with `this` set to undefined;
    // `let this = null` would be a syntax error.
    (function () {
        "use strict";
        let window = null;
        let global = null;
        // ... set any additional references to `null`
        eval(userInput);
    }).call(undefined);
}, 0);
And you could wrap the above code in a try/catch to prevent syntax and logic errors from crashing outside of the controlled eval scope, but you will (provably) never be able to detect whether incoming user input defines an infinite loop that will tie up javascript's single thread, rendering its runtime context completely stalled. The only solution to a problem like this is to define your own javascript interpreter, use it to process the user's input, and provide a mechanism to limit the number of steps your javascript interpreter is willing to take. That would be a lot of trouble!
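The step-limiting idea can be shown on a toy scale. The sketch below is not a JavaScript interpreter; it runs a tiny instruction set invented for this example, but the mechanism (count every step, abort past a budget) is the same one a full interpreter would use.

```javascript
// A toy instruction set, invented for this sketch:
//   ["set", name, value]             name = value
//   ["add", name, amount]            name += amount
//   ["jumpIfLess", name, limit, pc]  if (name < limit) goto pc
//   ["jump", pc]                     goto pc
function run(program, maxSteps) {
  const vars = new Map();
  let pc = 0;
  let steps = 0;
  while (pc < program.length) {
    steps += 1;
    if (steps > maxSteps) {
      throw new Error("step limit exceeded");
    }
    const [op, a, b, c] = program[pc];
    switch (op) {
      case "set": vars.set(a, b); pc += 1; break;
      case "add": vars.set(a, vars.get(a) + b); pc += 1; break;
      case "jumpIfLess": pc = vars.get(a) < b ? c : pc + 1; break;
      case "jump": pc = a; break;
      default: throw new Error("unknown instruction: " + op);
    }
  }
  return vars;
}

// Counts i from 0 up to 3, then falls off the end of the program.
const countToThree = [["set", "i", 0], ["add", "i", 1], ["jumpIfLess", "i", 3, 1]];
// Jumps to itself forever; only the step budget stops it.
const spinForever = [["jump", 0]];

console.log(run(countToThree, 100).get("i"));
```

Running spinForever with any budget throws "step limit exceeded" instead of hanging, without the host ever having to decide whether the program halts.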
