Sandboxing Node.js modules - can it be done?

I'm learning Node.js (-awesome-), and I'm toying with the idea of using it to create a next-generation MUD (online text-based game). In such games, there are various commands, skills, spells etc. that can be used to kill bad guys as you run around and explore hundreds of rooms/locations. Generally speaking, these features are pretty static - you can't usually create new spells, or build new rooms. I however would like to create a MUD where the code that defines spells and rooms etc. can be edited by users.
That has some obvious security concerns; a malicious user could, for example, upload some JS that spawns a child process to run 'rm -r /'. I'm not as concerned with protecting the internals of the game (I'm securing as much as possible, but there's only so much you can do in a language where everything is public); I could always track code changes wiki-style, and punish users who e.g. crash the server, or boost their power over 9000, etc. But I'd like to solidly protect the server's OS.
I've looked into other SO answers to similar questions, and most people suggest running a sandboxed version of Node. This won't work in my situation (at least not well), because I need the user-defined JS to interact with the MUD's engine, which itself needs to interact with the filesystem, system commands, sensitive core modules, etc. etc. Hypothetically all of those transactions could perhaps be JSON-encoded in the engine, sent to the sandboxed process, processed, and returned to the engine via JSON, but that is an expensive endeavour if every single call to get a player's hit points needs to be passed to another process. Not to mention it's synchronous, which I would rather avoid.
So I'm wondering if there's a way to "sandbox" a single Node module. My thought is that such a sandbox would need to simply disable the 'require' function, and all would be bliss. So, since I couldn't find anything on Google/SO, I figured I'd pose the question myself.

Okay, so I thought about it some more today, and I think I have a basic strategy:
var require = function (module) {
    throw "Uh-oh, untrusted code tried to load module '" + module + "'";
};
var module = null;
// use similar strategy for anything else susceptible
var loadUntrusted = function (code) { // `code` is the untrusted source string
    eval(code);
};
Essentially, we just use variables in a local scope to hide the Node API from eval'ed code, and run the code. Another point of vulnerability would be objects from the Node API that are passed into untrusted code. If e.g. a buffer was passed to an untrusted object/function, that object/function could work its way up the prototype chain, and replace a key buffer function with its own malicious version. That would make all buffers used for e.g. File IO, or piping system commands, etc., vulnerable to injection.
So, if I'm going to succeed in this, I'll need to partition untrusted objects into their own world - the outside world can call methods on it, but it cannot call methods on the outside world. Of course, please feel free to tell me of any further security vulnerabilities you can think of regarding this strategy.
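To illustrate the kind of hole I mean, here's a minimal sketch (the names are mine, for illustration) of why shadowing require alone isn't enough - any function reachable from eval'ed code exposes the Function constructor, which evaluates its body in the real global scope, where Node's globals are visible again:
var loadUntrusted = function (code) {
    var require = function () { throw "require is disabled"; };
    eval(code);
};
// untrusted code climbs from any plain function to its constructor
loadUntrusted(
    "var f = function () {};" +
    "var p = f.constructor('return process')();" + // the real process object
    "console.log('escaped, pid = ' + p.pid);"
);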

Related

security considerations when using (end-user-defined) JavaScript code inside Java

I am working on a Java project. In it, we want to enable an end-user to define variables which are calculated based on a set of given variables of primitive types or strings. At some point, all given variables are set to specific values, and then the calculations should be carried out. All resulting calculated variables must then be sent to Java.
I am in the process of evaluating ways for the end-user to define his calculations. The (current) idea is to let him write JavaScript and let that code be interpreted/executed inside the Java program. I know of two ways for this to be done: Either use the javax.scripting API or GraalVM/Truffle. In both, we would do it like this:
The given variables are given into the script. In javax.scripting via ScriptEngine.put, in Graal/Truffle via Value.putMember.
The end-user can define variables in the global context (whose names must not collide with the ones coming from Java). How he sets their values is up to him - he can set them directly (to a constant, to one of the given variables, to the sum of some of them ...) or define objects and functions and set the values by calling those.
When the time comes where the given variables have a fixed value, the script is executed.
All variables that were defined in the global context by the script will be sent to Java. In javax.scripting via ScriptEngine.get, in Graal/Truffle via Value.getMember.
NOTE: We would not grant the script access to any Java classes or methods. In javax.scripting, by checking whether the script contains the string Java.type (and disallowing such a script); in Graal/Truffle, by using the default Context (which has allowAllAccess=false).
The internet is full of hints and tips regarding JavaScript security issues and how to avoid them. On the one hand, I have the feeling that none of them apply here (explanation below). On the other hand, I don't know JavaScript well - I have never used it for anything else than pure, side-effect-free calculations.
So I am looking for some guidance here: What kind of security issues could be present in this scenario?
Why I cannot see any security issues in this scenario:
This is pure JavaScript. It does not even allow creating Blobs (which are part of WebAPI, not JavaScript) which could be used to e.g. create a file on disk. I understand that JavaScript does not contain any functionality to escape its sandbox (like file access, threads, streams...), it is merely able to manipulate the data that is given into its sandbox. See this part of https://262.ecma-international.org/11.0/#sec-overview:
ECMAScript is an object-oriented programming language for performing computations and manipulating computational objects within a host environment. ECMAScript as defined here is not intended to be computationally self-sufficient; indeed, there are no provisions in this specification for input of external data or output of computed results. Instead, it is expected that the computational environment of an ECMAScript program will provide not only the objects and other facilities described in this specification but also certain environment-specific objects, whose description and behaviour are beyond the scope of this specification except to indicate that they may provide certain properties that can be accessed and certain functions that can be called from an ECMAScript program.
The sandbox in our scenario only gets some harmless toys (i.e. given variables of primitive types or strings) put into it, and after the child has played with them (the script has run), the resulting buildings (user-defined variables) are taken out of it to preserve them (used inside Java program).
(1) Code running in a virtual machine might be able to escape. Even for well-known JS implementations such as V8 this commonly happens. By running untrusted code on your server, whenever such a vulnerability becomes known, you are vulnerable. You should definitely prepare for that, do a risk assessment, e.g. which other data is accessible on the (virtual) machine the engine runs on (other customers' data? secrets?), and additionally harden your infrastructure against that.
(2) Does it halt? What happens if a customer runs while(true); ? Does that crash your server? One can defend against that by killing the execution after a certain timeout (don't try to validate the code, this will never work reliably).
(3) Are the resources used limited (memory)? With a = ""; while(true) a += "memory"; one can easily allocate a lot of memory, with negative impact on other programs. One should make sure that memory usage is also limited, in such a way that the program is killed before resources are exhausted.
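To make (2) and (3) concrete, here is a rough Node.js sketch of the same pattern (the host language differs, but the idea carries over): run the untrusted script in a separate process with a hard timeout and a memory cap, instead of trying to validate the code up front; untrusted.js is a placeholder name:
// Sketch: kill the child on timeout, and cap its heap so while(true)
// or runaway allocation can't take down the host process.
const { execFile } = require("child_process");
execFile(
    "node",
    ["--max-old-space-size=64", "untrusted.js"], // ~64 MB heap cap
    { timeout: 2000 },                           // kill after 2 seconds
    (err, stdout) => {
        if (err) return console.error("killed or failed:", err.message);
        console.log("result:", stdout);
    }
);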
Just some thoughts. You're essentially asking if you can trust your sandbox/virtual machine; for that, you should either assume that you're using a good one, or accept that the only way to be really sure is to read through all its source code yourself. If you choose a trusted and well-known sandbox, I'd guess you can just trust it (JavaScript shouldn't be able to affect filesystem stuff outside of it).
On the other hand, why aren't you just doing all these calculations client-side and then sending the result to your backend? It seems like a lot of setup just to be able to run JavaScript server-side. If the argument for this is "not cheating" or something similar, then you can't avoid that even if your code is sent to the server (you have no idea who's sending you that JavaScript). In my opinion, doing this setup just to run it server-side doesn't make sense; just run it client-side.
If you do need to use it server-side, then you need to consider whether your Java is running with root permissions (in which case it will likely also invoke the sandbox with root permissions). On my setup, my Node.js executes under my home directory, so even if a worst case happens and someone manages to delete everything, the worst they can do is wipe out the home directory. If you're running JavaScript server-side, then I'd strongly suggest at the very least never doing so as root. It shouldn't be able to do anything outside that sandbox, but at least then, even in the worst case, it can't wipe out your server.
Something else I'd consider (since I have no idea what your sandbox allows or limits) is whether you can make requests and API calls with JavaScript in that sandbox (or anything similar), because if it's running under root and allows that, it would give someone root access to your infrastructure (your infrastructure thinking it's your server making requests when it's actually malicious JS code).
You could also make a mistake or start up your VM with an incorrect argument or missing config option and it suddenly allows a vulnerability without you being aware of it, so you'll have to make sure you're setting it up correctly.
Something else: if you ever store that JS in some database, instead of just executing it, then you have to make sure it's not made directly available to any other users without sanitizing it; otherwise you'd have XSS happening. For example, you build an app for "coding tests" and store the result of each test in a database, then you want to show that result to a potential employer; if you just directly display that result to them, you'll execute malicious code in their browser.
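If you do end up displaying stored scripts, the usual fix is to escape them before rendering. A rough sketch (escapeHTML, resultElement, and storedUserScript are placeholder names):
// Sketch: render stored user code as inert text, not as markup,
// so it is displayed instead of executed in the viewer's browser.
function escapeHTML(str) {
    return str.replace(/[&<>"']/g, function (c) {
        return { "&": "&amp;", "<": "&lt;", ">": "&gt;",
                 '"': "&quot;", "'": "&#39;" }[c];
    });
}
resultElement.innerHTML = escapeHTML(storedUserScript);
// or simply: resultElement.textContent = storedUserScript;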
But I don't really see a reason why you should care about any of this, just run it client side.

How to read audio data from a 'MediaStream' object in a C++ addon

After sweating blood and tears I've finally managed to set up a Node C++ addon and shove a web-platform standard MediaStream object into one of its C++ methods for good. For compatibility across different V8 and Node.js versions, I'm using Native Abstractions for Node.js (nan):
addon.cc
NAN_METHOD(SetStream)
{
    Nan::HandleScope scope;
    v8::Local<v8::Object> mediaStream = info[0]->ToObject();
}
addon.js
setStream(new MediaStream());
For what it's worth, this works correctly (i.e. it does not obliterate the renderer process on sight), and I can verify the presence of the MediaStream object, e.g. by returning its constructor name from the C++ method:
addon.cc
info.GetReturnValue().Set(mediaStream->GetConstructorName());
When called from JavaScript through setStream, this would return the string MediaStream, so the object is definitely there. I can also return the mediaStream object itself and everything will work correctly, so it's indeed the object I need.
So, how would I read audio data (i.e. audio samples) from this MediaStream object in C++? As a sidenote, the actual data read (and processing) would be done in a separate std::thread.
Bounty Update
I understand this would be sort of easier/possible if I were compiling Electron and/or Chromium myself, but I'd rather not get involved in that maintenance hell.
I was wondering if it would be possible without doing that, and as far as my research goes, I'm convinced I need 2 things to get this done:
The relevant header files, for which I believe blink public should be adequate
A chromium/blink library file (?), to resolve external symbols, similarly to the node.dylib file
Also, as I said, I believe I could compile chromium/blink myself and then I would have this lib file, but that would be a maintenance hell with Electron. With this in mind, I believe this question ultimately comes down to a C++ linking question. Is there any other approach to do what I'm looking for?
Edit
ScriptProcessorNode is not an option in my case, as its performance makes it nearly unusable in production. It would require processing audio samples on the UI/main thread, which is absolutely insane.
Edit 2
AudioWorklets have been available in Electron for some time now; unlike ScriptProcessorNode (or worse, AnalyzerNode), they are low-latency and very reliable for true C++-backed audio processing, even in real time.
If someone wants to go ahead and write an AudioWorklet-based answer, I'll gladly accept it, but beware: it's a very advanced field and a very deep rabbit hole, with countless obstacles to get through even before a very simple, general-purpose pass-through prototype (especially so because, currently in Electron, Atomics-synced, buffered cross-thread audio processing is required to pull this off due to https://github.com/electron/electron/issues/22503 -- although getting a native C++ addon into one audio renderer thread, let alone multiple threads at the same time, is probably equally as challenging).
The MediaStream header is part of Blink's renderer modules, and it's not obvious to me how you could retrieve it from a nan plugin.
So, instead let's look at what you do have, namely a v8::Object. I believe that v8::Object exposes all the functionality you need; it has:
GetPropertyNames()
Get(context, index)
Set(context, key, value)
Has(context, key)
Unless you really need a strictly defined interface, why not avoid the issue altogether and just use the dynamic type that you already have?
For getting audio data out specifically, you would need to call getAudioTracks() on the v8::Object, which probably looks something like this?
Note: I don't think you need a context, v8 seems to be happy with it being empty: v8/src/api/api.cc
Should look something like this, plus some massaging of types in and out of v8.
v8::Local<v8::Context> context = Nan::GetCurrentContext();
v8::MaybeLocal<v8::Value> get_audio_tracks =
    mediaStream->Get(context, Nan::New("getAudioTracks").ToLocalChecked());
// Maybe needs to be v8::Object or an array?
if (!get_audio_tracks.IsEmpty() && get_audio_tracks.ToLocalChecked()->IsFunction()) {
    v8::Local<v8::Function> fn = get_audio_tracks.ToLocalChecked().As<v8::Function>();
    // call mediaStream.getAudioTracks() with the stream as the receiver
    v8::MaybeLocal<v8::Value> audio_tracks = fn->Call(context, mediaStream, 0, nullptr);
}

ES6 exports/imports use-case, compared to traditional namespacing

I don't understand WHY, and in what scenario, this would be used.
My current web setup consists of lots of components, which are just functions or factory functions, each in their own file, and each function "rides" the app namespace, like: app.component.breadcrumbs = function(){...} and so on.
Then GULP just combines all the files, and I end up with a single file, so a page controller (each "page" has a controller which loads the components the page needs) can just load its components, like: app.component.breadcrumbs(data).
All the components can be easily accessed on demand, and the single JavaScript file is well cached and everything. This way of working seems extremely good; I've never seen any problem with it, and of course it can be (and is) scaled nicely.
So how are ES6 imports for functions any better than what I described?
What's the deal with importing functions instead of just attaching them to the app's namespace? It makes much more sense for them to be "attached".
Files structure
/dist/app.js // web app namespace and so on
/dist/components/breadcrumbs.js // some component
/dist/components/header.js // some component
/dist/components/sidemenu.js // some component
/dist/pages/homepage.js // home page controller
// GULP concat all above to
/js/app.js // this file is what is downloaded
Then inside homepage.js it can look like this:
app.routes.homepage = function () {
    "use strict";
    var DOM = { page: $('#page') };
    // append whatever components I want to this page
    DOM.page.append(
        app.component.header(),
        app.component.sidemenu(),
        app.component.breadcrumbs({ a: 1, b: 2, c: 3 })
    );
};
This is an extremely simplified code example but you get the point
Answers to this are probably a little subjective, but I'm going to do my best.
At the end of the day, both methods support creating a namespace for a piece of functionality so that it does not conflict with other things. Both work, but in my view, modules, ES6 or any other, provide a few extra benefits.
Explicit dependencies
Your example seems very biased toward a "load everything" approach, but you'll generally find that to be uncommon. If your components/header.js needs to use components/breadcrumbs.js, assumptions must be made. Has that file been bundled into the overall JS file? You have no way of knowing. Your two options are:
Load everything
Maintain a file somewhere that explicitly lists what needs to be loaded.
The first option is easy and in the short term is probably fine. The second is complicated for maintainability: because it would be maintained as an external list, it would be very easy to stop needing one of your component files but forget to remove it from the list.
It also means that you are essentially defining your own syntax for dependencies when again, one has now been defined in the language/community.
What happens when you want to start splitting your application into pieces? Say you have an application that is a single large file that drives 5 pages on your site, because they started out simple and it wasn't big enough to matter. Now the application has grown and should be served with a separate JS file per-page. You have now lost the ability to use option #1, and some poor soul would need to build this new list of dependencies for each end file.
What if you start using a file in a new place? How do you know which JS target files actually need it? What if you have twenty target files?
What if you have a library of components that are used across your whole company, and one of them starts relying on something new? How would that information be propagated to the many developers using these components?
Modules allow you to know with 100% certainty what is used where, with automated tooling. You only need to package the files you actually use.
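To make this concrete, here is a rough sketch of the homepage controller from the question rewritten with ES6 modules (the file layout and default exports are assumptions mirroring the structure above); the imports are the explicit, tool-readable dependency list:
// pages/homepage.js - dependencies are declared, not assumed to be bundled
import header from "./components/header.js";
import sidemenu from "./components/sidemenu.js";
import breadcrumbs from "./components/breadcrumbs.js";

export function homepage() {
    var DOM = { page: $('#page') };
    DOM.page.append(
        header(),
        sidemenu(),
        breadcrumbs({ a: 1, b: 2, c: 3 })
    );
}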
Ordering
Related to dependency listing is dependency ordering. If your library needs to create a special subclass of your header.js component, you are no longer only accessing app.component.header() from app.routes.homepage(), which would presumably be running at DOMContentLoaded. Instead you need to access it during the initial application execution. Simple concatenation offers no guarantees that it will have run yet. If you are concatenating alphabetically and your new thing is app.component.blueHeader(), then it would fail.
This applies to anything that you might want to do immediately at execution time. If you have a module that immediately looks at the page when it runs, or sends an AJAX request or anything, what if it depends on some library to do that?
This is another argument against #1 (load everything), so you start having to maintain a list again. That list is again going to be a custom thing you'll have come up with instead of a standardized system.
How do you train new employees to use all of this custom stuff you've built?
Modules execute files in order based on their dependencies, so you know for sure that the stuff you depend on will have executed and will be available.
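As a sketch of that guarantee (blueHeader is the hypothetical subclass from above, and the assumption that header() returns a DOM element is mine), the import itself forces header.js to execute first, so alphabetical concatenation order stops mattering:
// components/blueHeader.js - safe regardless of file naming, because
// importing header.js guarantees it has executed before this module runs.
import header from "./header.js";

export default function blueHeader() {
    var el = header(); // assumption: header() returns a DOM element
    el.classList.add("blue");
    return el;
}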
Scoping
Your solution treats everything as a standard script file. That's fine, but it means that you need to be extremely careful not to accidentally create global variables by placing them in the top-level scope of a file. This can be solved by manually adding (function(){ ... })(); around file content, but again, it's one more thing you need to know to do instead of having it provided for you by the language.
Conflicts
app.component.* is something you've chosen, but there is nothing special about it, and it is global. What if you wanted to pull in a new library from Github for instance, and it also used that same name? Do you refactor your whole application to avoid conflicts?
What if you need to load two versions of a library? That has obvious downsides if the library is big, but there are plenty of cases where you'd still rather trade a bigger bundle for an application that actually works. If you rely on a global object, it is now up to that library to make sure it also exposes an API like jQuery's noConflict. What if it doesn't? Do you have to add it yourself?
Encouraging smaller modules
This one may be more debatable, but I've certainly observed it within my own codebase. With modules, and the lack of boilerplate necessary to write modular code with them, developers are encouraged to look closely at how things get grouped. It is very easy to end up making "utils" files that are giant bags of functions thousands of lines long, because it is easier to add to an existing file than it is to make a new one.
Dependency webs
Having explicit imports and exports makes it very clear what depends on what, which is great, but the side-effect of that is that it is much easier to think critically about dependencies. If you have a giant file with 100 helper functions, that means that if any one of those helpers needs to depend on something from another file, it needs to be loaded, even if nothing is ever using that helper function at the moment. This can easily lead to a large web of unclear dependencies, and being aware of dependencies is a huge step toward thwarting that.
Standardization
There is a lot to be said for standardization. The JavaScript community has moved heavily in the direction of reusable modules. This means that if you hop into a new codebase, you don't need to start off by figuring out how things relate to each other. Your first step, at least in the long run, won't be to wonder whether something is AMD, CommonJS, System.register or what. By having a syntax in the language, it's one less decision to have to make.
The long and short of it is, modules offer a standard way for code to interoperate, whether that be your own code, or third-party code.
Because your current process concatenates everything into a single large file, only ever executes things after the whole file has loaded, and gives you 100% control over all the code you are executing, you've essentially defined your own module specification based on your own assumptions about your specific codebase. That is totally fine, and no-one is forcing you to change it.
No such assumptions can be made for the general case of JavaScript code however. It is precisely the objective of modules to provide a standard in such a way as to not break existing code, but to also provide the community with a way forward. What modules offer is another approach to that, which is one that is standardized, and one that offers clearer paths for interoperability between your own code and third-party code.

Dependency Injection vs. Managed Dependencies vs. Global Object

I'm working within a Javascript + BackboneJS (an MVC framework) + RequireJS framework, but this question is somewhat OO generic.
Let me start by explaining that in Backbone, your Views are a mix of traditional Views and Controllers, and your HTML Templates are the traditional MVC Views.
Been racking my head about this for a while and I'm not sure what the right/pragmatic approach should be.
I have a User object that contains user preferences (like unit system, language selection, anything else) that a lot of code depends on.
Some of my Views do most of the work without the use of templates (by using 3rd party libs, like Mapping and Graphing libs), and as such they have a dependency on the User object to take care of unit conversion, for example. I'm currently using RequireJS to manage that dependency without breaking encapsulation too much.
Some of my Views do very little work themselves, and only pass on Model data to my templating engine / templates, which do the work and DO have a dependency on the User object, again, for things like units conversion. The only way to pass this dependency into the template is by injecting it into the Model, and passing the model into the template engine.
My question is, how to best handle such a widely needed dependency?
- Create an App-wide reference/global object that is accessible everywhere? (YUK)
- Use RequireJS managed dependencies, even though it's generally only recommended to use managed dependency loading for class/object definitions rather than concrete objects.
- Or, only ever use dependency injection, and manually pass that dependency into everything that needs it?
From a purely technical point of view, I would argue that mutable globals (globals that may change), especially in JavaScript, are dangerous and wrong. Especially since JavaScript is full of code that gets executed asynchronously. Consider the following code:
window.loggedinuser = Users.get("Paul");
addSomeStuffToLoggedinUser();
window.loggedinuser = Users.get("Sam");
doSomeOtherStuffToLoggedinUser();
Now if addSomeStuffToLoggedinUser() executes asynchronously somewhere (e.g. it does an ajax call, and then another ajax call when the first one finishes), it may very well be adding stuff to the new loggedinuser ("Sam"), by the time it gets to the second ajax call. Clearly not what you want.
Having said that, I'm even less of a supporter of having some user object that we hand around all the time from function to function, ad infinitum.
Personally, having to choose between these two evils, I would choose a global scope for things that "very rarely change" --- unless perhaps I was building a nuclear power station or something. So, I tend to make the logged-in user available globally in my app, taking the risk that if somehow, for some reason, some call runs very late, and I have a situation where one user logs out and directly the other one logs in, something strange may happen. (Then again, if a meteor crashes into the datacenter that hosts my app, something strange may happen as well... I'm not protecting against that either.) Actually, a possible solution would be to reload the whole app as soon as someone logs out.
So, I guess it all depends on your app. One thing that makes it better (and makes you feel like you're still getting some OO karma points) is to hide your data in some namespaced singleton:
var myuser = MyApp.domain.LoggedinDomain.getLoggedinUser();
doSomethingCoolWith(myuser);
instead of
doSomethingCoolWith(window.loggedinuser);
although it's pretty much the same thing in the end...
I think you already answered your own question; you just want someone else to say it for you :) Use DI, but you aren't really "manually" passing that dependency into everything, since you need to reference it to use it anyway.
Considering the TDD approach, how would you test this? DI is best for a new project, but JS gives you flexible options to deal with concrete global dependencies when testing, i.e. context construction. Going way back, Yahoo laid out a module pattern where all modules were loosely coupled and not dependent on each other, but where it was OK to have global context. That global context can make your app construction more pragmatic for things that are constantly reused. It's just that you need to apply it judiciously/sparingly, and there need to be very strong cases for those things being dynamic.
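As a rough sketch of what DI looks like in Backbone (assuming the User is a Backbone model; convertUnits and currentUser are placeholder names), injecting the user keeps the view testable with a stub instead of a global:
// Sketch: pass the user in as an option instead of reaching for a global;
// tests can hand the view a stub user with known preferences.
var MapView = Backbone.View.extend({
    initialize: function (options) {
        this.user = options.user; // injected dependency
    },
    render: function () {
        // convertUnits is a placeholder for your unit-conversion helper
        this.$el.text(convertUnits(42, this.user.get("unitSystem")));
        return this;
    }
});
var view = new MapView({ user: currentUser });
view.render();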

multi-core programming using JavaScript?

So I have this seriously recursive function that I would like to use with my code. The issue is it doesn't really take advantage of dual-core machines, because JS is single-threaded. I have tried using web workers, but I don't really know much about multi-core programming. Would someone point me to some material that explains how it is done? I googled and found this sample link, but it's not really much help without documentation! =/
I would be glad if someone could show me how this could be done without webworkers though! That would be just awesome! =)
I came across this link on WHATWG. This is really weird, because it explains how to do multi-core programming with web workers etc., but when I execute it in my Chrome browser it throws errors. The same goes for other browsers.
Error: Uncaught ReferenceError: Worker is not defined in worker.js
UPDATE (2018-06-21): For people coming here in search of multi-core programming in JavaScript, not necessarily browser JavaScript (for that, the answer still applies as-is): Node.js now supports multi-threading behind a feature flag (--experimental-workers): release info, relevant issue.
Writing this off the top of my head, no guarantees for source code. Please go easy on me.
As far as I know, you cannot really program in threads with JavaScript. Web workers are a form of multi-programming; yet JavaScript is by its nature single-threaded (based on an event loop).
A web worker is a separate thread of execution in the sense that it doesn't share anything with the script that started it; there is no reference to the script's global object (typically called "window" in the browser), and no reference to any of your main script's variables other than data you send to the thread.
Think of the web worker as a little "server" that gets asked a question and provides an answer. You can only send strings to that server, and it can only parse the string and send back what it has computed.
// in the main script, one starts a worker by passing the file name of the
// script containing the worker to the constructor.
var w = new Worker("myworker.js");
// you want to react to the "message" event, if your worker wants to inform
// you of a result. The function typically gets the event as an argument.
w.addEventListener("message", function (evt) {
    // process evt.data, which is the message from the worker thread
    alert("The answer from the worker is " + evt.data);
});
You can then send a message (a string) to this thread using its postMessage() method:
w.postMessage("Hello, this is my message!");
A sample worker script (an "echo" server) can be:
// this is another script file, like "myworker.js"
self.addEventListener("message",
function (evt) {
var data = JSON.parse(evt.data);
/* as an echo server, we send this right back */
self.postMessage(JSON.stringify(data))
})
Whatever you post to that thread will be decoded, re-encoded, and sent back. Of course you can do whatever processing you want in between. That worker will stay active; you can call terminate() on it (in your main script; that'd be w.terminate()) to end it, or call self.close() in your worker.
To summarize: what you can do is you zip up your function parameters into a JSON string which gets sent using postMessage, decoded, and processed "on the other side" (in the worker). The computation result gets sent back to your "main" script.
To explain why this is not easier: More interaction is not really possible, and that limitation is intentional. Because shared resources (an object visible to both the worker and the main script) would be subject to two threads interfering with them at the same time, you would need to manage access (i.e., locking) to that resource in order to prevent race conditions.
The message-passing, shared-nothing approach is not that well-known mainly because most other programming languages (C and Java for example) use threads that operate on the same address space (while others, like Erlang, for instance, don't). Consider this:
It is really hard to code a larger project with mutexes (a mutual exclusion mechanism) because of the associated deadlock/race condition complexities. This is stuff that can make grown men cry!
It is really easy in comparison to do message-passing, shared-nothing semantics. The code is isolated; you know exactly what goes into your worker and what comes out of your worker. Deadlocks and race conditions are impossible to achieve!
Just try it out; it is capable of doing interesting things, probably all you want. Bear in mind that, as far as I know, it is still implementation-defined whether it takes advantage of multiple cores.
NB. I just got informed that at least some implementations will handle JSON encoding of messages for you.
So, to give an answer to your question (it's all above; tl;dr version): No, you cannot do this without web workers. But there is nothing really wrong with web workers, aside from browser support, as is the case with HTML5 in general.
As far as I remember, this is only possible with the new HTML5 standard. The keyword is "Web Worker".
See also:
HTML5: JavaScript Web Workers
JavaScript Threading With HTML5 Web Workers
Web workers are the answer on the client side. For Node.js there are many approaches. Most popular: spawn several processes with pm2 or a similar tool, or run a single process and spawn/fork child processes (see the sketch below). You can google around these and will find a lot of samples and tactics.
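A minimal sketch of the fork-per-core approach using Node's built-in cluster module (each worker re-runs this same file):
const cluster = require("cluster");
const os = require("os");

if (cluster.isMaster) {
    // fork one worker process per CPU core
    os.cpus().forEach(() => cluster.fork());
} else {
    // each worker can run the CPU-heavy recursive function in parallel
    console.log("worker " + process.pid + " started");
}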
Web workers are already well supported by all browsers. https://caniuse.com/#feat=webworkers
API & samples: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers
