Why would an exception cause resource leaks in Node.js?

If you look at the beginning of the Node.js documentation for domains it states:
By the very nature of how throw works in JavaScript, there is almost never any way to safely "pick up where you left off", without leaking references, or creating some other sort of undefined brittle state.
Again, in the code example it gives in that first section, it says:
Though we've prevented abrupt process restarting, we are leaking resources like crazy
I would like to understand why this is the case. What resources are leaking? They recommend that you only use domains to catch errors and safely shut down a process. Is this a problem with all exceptions, not just when working with domains? Is it bad practice to throw and catch exceptions in JavaScript? I know it's a common pattern in Python.
EDIT
I can understand why there could be resource leaks in a non-garbage-collected language: if an exception is thrown, any code you might run to clean up objects never runs.
The only reason I can imagine in JavaScript is that throwing an exception stores references to variables in the scope where the exception was thrown (and maybe to things in the call stack), keeping those references alive, and the exception object itself is then kept around and never gets cleaned up. Unless the leaking resources referred to are resources internal to the engine.
UPDATE
I've written a blog post explaining the answer to this a bit better now. Check it out.

Unexpected exceptions are the ones you need to worry about. If you don't know enough about the state of the app to add handling for a particular exception and manage any necessary state cleanup, then by definition, the state of your app is undefined, and unknowable, and it's quite possible that there are things hanging around that shouldn't be. It's not just memory leaks you have to worry about. Unknown application state can cause unpredictable and unwanted application behavior (like delivering output that's just wrong -- a partially rendered template, or an incomplete calculation result, or worse, a condition where every subsequent output is wrong). That's why it's important to exit the process when an unhandled exception occurs. It gives your app the chance to repair itself.
Exceptions happen, and that's fine. Embrace it. Shut down the process and use something like Forever to detect it and set things back on track. Clusters and domains are great, too. The text you were reading is not a caution against throwing exceptions, or continuing the process when you've handled an exception that you were expecting -- it's a caution against keeping the process running when unexpected exceptions occur.
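As a rough illustration, here is a minimal sketch of that shutdown pattern, assuming a handleRequest function and a PORT constant like the ones in the node.js docs sample quoted further down; the point is to log, refuse new work, and let a supervisor like Forever restart the process:

var d = require('domain').create();
var server;
d.on('error', function(er) {
  console.error('Unexpected error, shutting down:', er.stack);
  try {
    // Stop accepting new connections; in-flight requests get 30s to finish.
    server.close();
    setTimeout(function() {
      process.exit(1);
    }, 30000).unref();
  } catch (e) {
    // The server may already be in a bad state; just exit.
    process.exit(1);
  }
});
d.run(function() {
  server = require('http').createServer(handleRequest).listen(PORT);
});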

I think when they said "we are leaking resources", they really meant "we might be leaking resources". If http.createServer handles exceptions appropriately, threads and sockets shouldn't be leaked. However, they certainly could be if it doesn't handle things properly. In the general case, you never really know if something handles errors properly all the time.
I think they are wrong, or at least very misleading, when they said "By the .. nature of how throw works in JavaScript, there is almost never any way to safely ...". There is nothing about how throw works in JavaScript (vs. other languages) that makes it unsafe. There is also nothing about how throw/catch works in general that makes it unsafe - unless, of course, you use them incorrectly.
What they should have said is that exceptional cases (regardless of whether or not exceptions are used) need to be handled appropriately. There are a few different categories to recognize:
A. State
Exceptions that occur while external state (database writing, file output, etc) is in a transient state
Exceptions that occur while shared memory is in a transient state
Exceptions where only local variables might be in a transient state
B. Reversibility
Reversible / revertible state (e.g. database rollbacks)
Irreversible state (Lost data, unknown how to reverse, or prohibitive to reverse)
C. Data criticality
Data can be scrapped
Data must be used (even if corrupted)
Regardless of the type of state you're messing with, if you can reverse it, you should do that, and you're set. The problem is irreversible state. If you can destroy the corrupted data (or quarantine it for separate inspection), that is the best move for irreversible state. This is done automatically for local variables when an exception is thrown, which is why exceptions excel at handling errors in purely functional code (i.e. functions with no possible side effects). Likewise, any shared or external state should be deleted if that's acceptable. In the case of shared state, either throw exceptions until that shared state becomes local state and is cleaned up by the unwinding of the stack (either statically or via the GC), or restart the program (I've read people suggesting the use of something like nodejitsu's forever). For external state, this is likely more complicated.
The last case is when the data is critical. Well, then you're gonna have to live with the bugs you've created. Everyone has to deal with bugs, but it's the worst when your bugs involve corrupted data. This will usually require manual intervention (reconstructing the lost/damaged data, selectively pruning, etc.) - exception handling won't get you the whole way in the last case.
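For the reversible case above, here is a hedged sketch of what "revert it and you're set" can look like, assuming a hypothetical db client with a promise-returning query method against a store that supports transactions:

// Hypothetical db client; BEGIN/COMMIT/ROLLBACK bracket the transient state.
async function transfer(db, from, to, amount) {
  await db.query('BEGIN');
  try {
    await db.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, from]);
    await db.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, to]);
    await db.query('COMMIT');
  } catch (er) {
    await db.query('ROLLBACK'); // external state returns to its pre-transaction value
    throw er;                   // re-throw: the caller decides what happens next
  }
}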
I wrote a similar answer related to how to handle mid-operation failure in various cases in the context of multiple updates to some data storage: https://stackoverflow.com/a/28355495/122422

Taking the sample from the node.js documentation:
var d = require('domain').create();
d.on('error', function(er) {
  // The error won't crash the process, but what it does is worse!
  // Though we've prevented abrupt process restarting, we are leaking
  // resources like crazy if this ever happens.
  // This is no better than process.on('uncaughtException')!
  console.log('error, but oh well', er.message);
});
d.run(function() {
  require('http').createServer(function(req, res) {
    handleRequest(req, res);
  }).listen(PORT);
});
In this case you are leaking connections when an exception occurs in handleRequest before you close the socket.
"Leaked" in the sense that you finished processing the request without cleaning up afterwards. Eventually the connection will time out and close the socket, but if your server is under high load it may run out of sockets before that happens.
Depending on what you do in handleRequest you may also be leaking file handles, database connections, event listeners, etc.
Ideally you should handle your exceptions so you can clean up after them.
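For example, a minimal sketch of that cleanup for the sample above (same assumed handleRequest and PORT); note it only covers exceptions thrown synchronously from handleRequest:

require('http').createServer(function(req, res) {
  try {
    handleRequest(req, res);
  } catch (er) {
    // Fail just this one request and free its socket instead of leaking it.
    console.error('request failed:', er.message);
    if (!res.headersSent) res.writeHead(500);
    res.end();
  }
}).listen(PORT);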

Related

security considerations when using (end-user-defined) JavaScript code inside Java

I am working on a Java project. In it, we want to enable an end-user to define variables which are calculated based on a set of given variables of primitive types or strings. At some point, all given variables are set to specific values, and then the calculations should be carried out. All resulting calculated variables must then be sent to Java.
I am in the process of evaluating ways for the end-user to define his calculations. The (current) idea is to let him write JavaScript and let that code be interpreted/executed inside the Java program. I know of two ways for this to be done: Either use the javax.scripting API or GraalVM/Truffle. In both, we would do it like this:
The given variables are passed into the script. In javax.scripting via ScriptEngine.put, in Graal/Truffle via Value.putMember.
The end-user can define variables in the global context (whose names must not collide with the ones coming from Java). How he sets their values is up to him: he can set them directly (to a constant, to one of the given variables, to the sum of some of them, ...) or define objects and functions and set the values by calling those (see the sketch after this list).
When the time comes where the given variables have a fixed value, the script is executed.
All variables that were defined in the global context by the script will be sent to Java. In javax.scripting via ScriptEngine.get, in Graal/Truffle via Value.getMember.
NOTE: We would not grant the script access to any Java classes or methods. In javax.scripting, by checking whether the script contains the string Java.type (and disallowing such a script); in Graal/Truffle, by using the default Context (which has allowAllAccess=false).
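To make the setup concrete, here is a hypothetical end-user script of the kind described above; priceNet, taxRate, and discount stand in for the given variables put in from Java, and total and discountedTotal are the globals that would be read back out:

// priceNet, taxRate, discount: given variables injected from Java.
var total = priceNet * (1 + taxRate);

function clamp(v, lo, hi) {
  return Math.min(Math.max(v, lo), hi);
}

// discountedTotal is defined in the global context and read back in Java.
var discountedTotal = clamp(total - discount, 0, total);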
The internet is full of hints and tips regarding JavaScript security issues and how to avoid them. On the one hand, I have the feeling that none of them apply here (explanation below). On the other hand, I don't know JavaScript well - I have never used it for anything else than pure, side-effect-free calculations.
So I am looking for some guidance here: What kind of security issues could be present in this scenario?
Why I cannot see any security issues in this scenario:
This is pure JavaScript. It does not even allow creating Blobs (which are part of WebAPI, not JavaScript) which could be used to e.g. create a file on disk. I understand that JavaScript does not contain any functionality to escape its sandbox (like file access, threads, streams...), it is merely able to manipulate the data that is given into its sandbox. See this part of https://262.ecma-international.org/11.0/#sec-overview:
ECMAScript is an object-oriented programming language for performing computations and manipulating computational objects within a host environment. ECMAScript as defined here is not intended to be computationally self-sufficient; indeed, there are no provisions in this specification for input of external data or output of computed results. Instead, it is expected that the computational environment of an ECMAScript program will provide not only the objects and other facilities described in this specification but also certain environment-specific objects, whose description and behaviour are beyond the scope of this specification except to indicate that they may provide certain properties that can be accessed and certain functions that can be called from an ECMAScript program.
The sandbox in our scenario only gets some harmless toys (i.e. given variables of primitive types or strings) put into it, and after the child has played with them (the script has run), the resulting buildings (user-defined variables) are taken out of it to preserve them (used inside Java program).
(1) Code running in a virtual machine might be able to escape. Even for well-known JS implementations such as V8, such escapes are discovered regularly. By running untrusted code on your server, whenever such a vulnerability becomes known, you are vulnerable. You should definitely prepare for that: do a risk assessment, e.g. which other data is accessible on the (virtual) machine the engine runs on (other customers' data? secrets?), and additionally harden your infrastructure against that.
(2) Does it halt? What happens if a customer runs while(true); ? Does that crash your server? One can defend against that by killing the execution after a certain timeout (don't try to validate the code, this will never work reliably).
(3) Are the resources used limited (memory)? With a = ""; while(true) a += "memory"; one can easily allocate a lot of memory, with negative impact on other programs. One should make sure that also the memory usage is limited in such a way that the program is killed before resources are exhausted.
Just some thoughts. You're essentially asking whether you can trust your sandbox/virtual machine. For that, you should either assume you're using a good one, or the only way to be really sure is to read through all of its source code yourself. If you choose a trusted and well-known sandbox, I'd guess you can just trust it (JavaScript shouldn't be able to affect file-system stuff outside of it).
On the other hand, why aren't you just doing all these calculations client side and then sending the result to your backend? It seems like a lot of setup just to be able to run JavaScript server side. If the argument for this is "no cheating" or something similar, then you can't avoid that even if the code runs on your server (you have no idea who's sending you that JavaScript). In my opinion, doing this setup just to run it server side doesn't make sense; just run it client side.
If you do need to run it server side, then you need to consider whether your Java is running with root permissions (in which case it will likely also invoke the sandbox with root permissions). On my setup my nodejs is executing under ~/home, so even if the worst case happens and someone manages to delete everything, the worst they can do is wipe out the home directory. If you're running JavaScript server side, then I'd strongly suggest, at the very least, never doing so as root. It shouldn't be able to do anything outside that sandbox, but at least then even in the worst case it can't wipe out your server.
Something else I'd consider (since I have no idea what your sandbox allows or limits) is whether you can make requests and API calls with JavaScript in that sandbox (or anything similar), because if it's running under root and allows that, it would give someone root access to your infrastructure (your infrastructure thinking it's your server making requests when it's actually malicious JS code).
You could also make a mistake or start up your VM with an incorrect argument or missing config option and it suddenly allows a vulnerability without you being aware of it, so you'll have to make sure you're setting it up correctly.
Something else is that if you ever store that JS in a database, instead of just executing it, then you have to make sure it's not made directly available to any other users without checking it; otherwise you'd have XSS. For example, you build an app for "coding tests" and store the result of each test in a database, then you want to show that result to a potential employer. If you just display that result to them directly, you'll execute malicious code in their browser.
But I don't really see a reason why you should care about any of this, just run it client side.

Is try-catch faster than fail-only if-else?

My situation is I have a small JavaScript program running on a huge amount of data formatted as a two-dimensional array. Every cycle it loops through the data and calls a serializer method to serialize each record before doing the calculations. The data is the same for every cycle, so now I want to cache the serialized data in another array so that from the second cycle on it doesn't have to call the serializer again.
Here is my function:
var cached = [];
function getRow(x, y) {
  if (!cached[x]) {
    cached[x] = [];
  }
  if (!cached[x][y]) {
    cached[x][y] = serializer(data, x, y);
  }
  return cached[x][y];
}
So far it works. But as you can see, for every row it must check two conditions before returning the cached data, and those conditions are only helpful in the first cycle. So I'm thinking about using try-catch instead of if-else: in the first cycle the try block will fail every time, and I fill up the cache in the catch block. After the first cycle, when all the data has been cached, it will just read the value right away and return it.
But my concern is: is it faster by doing that or try-catch will slow down the function and the performance just the same?
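For reference, a sketch of the try/catch variant the question describes (getRowTryCatch is a hypothetical name, and it assumes serializer never returns undefined):

function getRowTryCatch(x, y) {
  try {
    var value = cached[x][y]; // throws TypeError while cached[x] is missing
    if (value !== undefined) {
      return value;
    }
  } catch (e) {
    cached[x] = []; // first cycle: create the row
  }
  return (cached[x][y] = serializer(data, x, y));
}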
try/catch is only going to be more efficient in situations where no error is thrown. And, let's be clear, with modern clients, the increase in efficiency is going to be marginal and probably not noticeable in most applications.
When errors are thrown, performance will suffer because an error object is instantiated and configured, a scope is created for that error object to exist within and (of course) the JS runtime has to recover from the error.
In my experience, try/catch is generally used incorrectly in the first place. It should not be a matter of "choice" to use try/catch; it should be a matter of necessity. If there is a chance your code can cause an error and there is a way for you to code it to avoid that error, do that; no try/catch needed.
Examples:
Errors that can occur because of bad user input should be dealt with by re-thinking the UI and how the input is processed.
Errors that can occur because of a bad interaction with a remote source (think AJAX) generally have error and timeout callback options.
If there is a way for your code to throw an error and it is beyond the control of your code (network outage, server error), try/catch may be necessary. I say may because many APIs these days have error callback functions to handle these kinds of things.
In short, you should never just choose try/catch. try/catch is a language feature to be used when no other solution is available.

HTTP status codes - Backbone.js and Jquery

Backbone.js has a neat feature where you are able to sync changes back to your server using standard HTTP verbs.
For example you may have a model object and some code which executes a get:
var coolModel = Backbone.Model.extend({ url: 'mysite/mymodel' });
var myCoolModel = new coolModel();
myCoolModel.fetch({ error: processError });
In the case where the server returns a 4XX or 5XX, the error function processError is run, which is great: you are able to process the error in whichever way suits.
As Backbone.js uses jQuery to perform the GET, jQuery reports the error, which it is. But the 4XX is a valid error which should be recovered from; my client-side app is not broken, it just needs to behave slightly differently.
My question is - is it considered bad practice to have this error raised from jQuery displayed in the browsers console window or status bar? Should I be suppressing this error somehow so that users in production don't see an error reported by the browser when the error is recoverable? Or is it correct in the land of HTTP to leave it as is?
Handling errors in Backbone is a really interesting topic and one I hope to write about at some point. It's very nice to visually indicate errors to your users in a non-obtrusive manner. Some things to consider are:
Your users are not looking at the status bar or developer tools
Your users are expecting specific behavior from your application
When your application does not behave correctly, visual problem indicators are important
I'd recommend considering how the failure impacts the user's intention. For instance, if they are fetching data for the first page and that data is not returned correctly, you will need to handle the error by indicating that the data could not be retrieved (or, even better, fall back on previously loaded data from a cache... if it exists). If the intention is to save an item and the error code returned is 400, that is definitely not a success, and the user should be prompted to retry the save (or perhaps the app should attempt a re-save on an interval).
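A hedged sketch of what that branching could look like in the error callback (retrySave and showErrorBanner are hypothetical helpers; older Backbone passes the xhr as the second argument):

function processError(model, xhr) {
  if (xhr.status === 409) {
    model.fetch();            // conflict: re-sync and let the user merge
  } else if (xhr.status >= 500) {
    setTimeout(function() {   // server trouble: retry on an interval
      retrySave(model);
    }, 5000);
  } else {
    showErrorBanner('Could not save your changes. Please try again.');
  }
}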
You can silently ignore errors and not indicate them, but your users will get confused and it will lead to unexpected problems. I can't preach to use perfect error handling, because I'm still getting better at it myself.
I would say HTTP status codes are there for a reason, entirely valid if the reason for them is valid, so yes, just use them. However: 400 means Bad Request, which means the input is syntactically wrong. You should send a more appropriate status code (like 409 for a conflict, 412 for a failed precondition, etc.). I'm struggling to come up with a project with a valid use for 418 I'm a teapot, but I will succeed some day.
Anybody interested in the inner workings of your site could look at the console, but there should be no problem with this, nor should you overly pander to a clean look there; just make sure your own process flow is sound.

Garbage collector issues on spidermonkey.... JS_AnchorPtr()?

I've rolled my own JavaScript server-side language called bondi. Just recently I upgraded to the new SpiderMonkey.
Now that the JS enter-local-roots and leave-local-roots functions are gone/useless in the 1.8.5 API, is it enough to just use anchor pointer (JS_AnchorPtr(varname)) at the end of your function calls to make sure the compiler isn't removing references, to keep the garbage collector happy?
I've been testing it by removing all my references to JS_EnterLocalRootScope / JS_LeaveLocalRootScope (see here) and adding JS_AnchorPtr() to the bottom of the script.
I looked up the JS_AnchorPtr function in the SpiderMonkey source code. Guess what... it does nothing. There's no doco for it either. I'm using it just so those variables get a mention, so the garbage collector doesn't kill them.
Well, blame seems to say that bug 519949 recommends you use js::Anchor so that the conservative stack scanner will pick it up.
Note that the conservative scanner can find any GC thing that's on the stack or in registers, so the only really tricky case is where you use derived values when the "owning" GC thing may be dead, like so:
{
  JSString *str = GetMeSomeStringYo();
  const jschar *chars = str->chars();
  // Note, |str| is not "live" here, but the derived |chars| is!
  // The conservative stack scanner won't see |chars| and know
  // to keep |str| alive, so we should be anchoring |str|.
  DoSomethingThatCanCauseGC();
  return chars[0];
}
If you're using C the JS_AnchorPtr at the end of the functions should be enough. You are correct that the function has a nop implementation! The idea is that, so long as it's performing a call to a shared object symbol with the variable to keep alive as a parameter, the calling function will have to keep that value around in machine state in order to perform the do-nothing call. This is more sucky for perf than js::Anchor.
There's one potential trap in the unlikely case that you're statically linking against SpiderMonkey and have Link Time Optimization enabled: the cross-object call may be inlined with a null implementation, eliminating liveness of the variable, in which case the same GC hazards may pop back up.

How to ensure a WebSocket message is JSON string, preventing JSON.parse error?

I've looked around for a suitable method to catch or prevent invalid JSON.parse calls, specifically in the case of WebSocket messages, without a try/catch block, due to its supposed performance hit.
I've almost fully moved my RESTful API to a pure WebSocket API using JSON for communications. The only problem is, I can't figure out how to prevent JSON.parse from halting the app when a malformed message string is put through my onmessage function. All messages sent from the server are theoretically proper JSON that's been stringified, so the question also is: is this an edge case to worry about, since the server-side function that sends data stringifies the JSON before sending?
I'm using React and Redux with redux-thunk to open a WebSocket and add event listeners, so on a message the function below is being run.
function onMessage(msg) {
  const data = JSON.parse(msg.data);
  return {
    type: data.type,
    data: data.data
  };
}
But this, of course, breaks if msg.data is not a valid JSON string, halting execution of the app.
So, without a try/catch block, is the only option to (somehow) ensure valid JSON is being sent? Or is this an edge case I shouldn't be worried about?
EDIT
This may not be such a big issue for the client side, since all messages come from a centralized point (the server). On the other hand, it's quite a big issue for the server, seeing it's possible for it to receive messages that were not sent from the application.
Is try/catch really the devil it's made out to be? The only alternative I can think of is a regex check, which in itself would end up becoming quite complicated.
without a try/catch block, due to its supposed performance hit.
Forget the myths. If you want to catch an exception, like the one from JSON.parse, you use a try/catch block. It's that simple, and not a significant performance hit. Of course you could also write your own logic to validate JSON strings (not with regex!), but that would amount to a complete parser that just doesn't use exceptions to signal malformed input, and it would be much slower than the native function.
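A minimal sketch of that try/catch wrapper, applied to the onMessage handler from the question (the 'INVALID_MESSAGE' action type is an assumption):

function safeParse(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    return null; // malformed JSON
  }
}

function onMessage(msg) {
  const data = safeParse(msg.data);
  if (data === null) {
    return { type: 'INVALID_MESSAGE', data: null };
  }
  return { type: data.type, data: data.data };
}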
Is this an edge case to worry about?
On the client, hardly. You're controlling the server and making sure to send only valid JSON strings. If you don't, I'd worry much more about the server than about a few clients crashing. The users will most likely reload the page and continue.
Though on the other hand, it's quite a big issue for the server, seeing it's possible for it to receive messages that were not sent from the application.
Yes. On the server you absolutely need to worry about malformed input. If sending invalid JSON makes your server crash, that's really bad.
