Node.js reliability for large application

Node.js reliability for large application - javascript

I am new to Node.js and am currently questioning its reliability.
Based on what I've seen so far, there seems to be a major flaw: any uncaught error/exceptions crashes the server. Sure, you can try to bullet-proof your code or put try/catch in key areas, but there will almost always be bugs that slip through the crack. And it seems dangerous if one problematic request could affect all other requests. There are 2 workarounds that I found:
Use daemon or module like forever to automatically restart the server when it crashes. The thing I don't like about this is that the server is still down for a second or two (for a large site, that could be hundreds (of thousands?) of request).
Catch uncaught exceptions using process.on('uncaughtException'). The problem with this approach (as far as I know) is that there is no way to get a reference to the request that causes the exception. So that particular request is left hanging (user sees loading indicator until timeout). But at least in this case, other non-problematic requests can still be handled.
Can any Node.js veteran pitch in?

For automatic restarts and load-balancing, I'd suggest you check out Learnboost's up balancer.
It allows you to reload a worker behind the load-balancer without dropping any requests. It stops directing new requests towards the worker, but for existing requests that are already being served, it provides workerTimeout grace period to wait for requests to finish before truly shutting down the process.
You might adapt this strategy to be also triggered by the uncaughtException event.

You have got full control of the base process, and that is a feature.
If you compare Node to an Apache/PHP setup the latter is really just equivalent to having a simple Node server that sends each incoming request to it's own process which is terminated after the request has been handled.
You can make that setup in Node if you wish, and in many cases something like that is probably a good idea. The great thing about Node is that you can break this pattern, you could for instance have the main process or another permanent process do session handling before a request is passed to it's handler.
Node is a very flexible tool, that is good if you need this flexibility, but it takes some skill to handle.

Exceptions don't crash the server, they raise exceptions.
Errors in node.js that bring down the entire process are a different story.
Your best bet (which you should do with any technology), is just test it out with your application as soon as possible to see if it fits.

An uncaught exception will, if not caught, crash the server. Something like calling a misspelled function. I use process.on('uncaughtException') to capture such exceptions. If you use this then, yes, the error sent to process.on('uncaughtException') is less informative.
I usually include a module like nomnom to allow for command-line flags. I include one called --exceptions which, when set, bypasses process.on('uncaughtException'). Basically, if I see that uncaught exceptions are happening then I, in development, start up the app with --exceptions so that when that error is raised it will not be captured, which causes Node to spit out the stack trace and then die. This tells you what line it happened on, and in what file.
Capturing the exceptions is one way to deal with it. But, like you said, that means that if an error happens it may result in users not receiving responses, et cetera. I would actually recommend letting the error crash the server. (I use process.on('uncaughtException') in apps, not webservers). And using forever. The fact is that it is likely better for the webserver to crash and then expose what you need to fix.
Let's say you used PHP instead of Node. PHP does not abruptly crash the server (since it doesn't really serve). It spits out really ugly errors. Sure, it doesn't result in a whole server going down and then having to come back up. Nobody wants their clients to have any downtime. But it also means that a problem will persist and will be less noticeable. We've all seen sites that have said errors, and they don't get patched very fast. If such a bug were to take everything down for one small blip (which honestly its not all that bad in the larger picture) then it would surely call attention to itself. You would see it happen and would track that bug down.
The fact is that bugs will exist in any system, independent of language or platform. And it is arguably better for them to be fatal in order for you to know they happened. And over time it causes you to become more aware of how these error occur. I don't know about you, but I know a lot of PHP devs who make the same common mistakes time after time.

Related

Error 504, avoid it with some data passing from server to client?

I'm developing an app that should receive a .CSV file, save it, scan it, and insert data of every record into DB and at the end delete the file.
With a file with about 10000 records there aren't problems but with a larger file the PHP script is correctly runned and all data are saved into DB but is printed ERROR 504 The server didn't respond in time..
I'm scanning the .CSV file with the php function fgetcsv();.
I've already edit settings into php.ini file (max execution time (120), etc..) but nothing change, after 1 minute the error is shown.
I've also try to use a javascript function to show an alert every 10 seconds but also in this case the error is shown.
Is there a solution to avoid this problem? Is it possible pass some data from server to client every tot seconds to avoid the error?
Thank's

Its typically when scaling issues pop up when you need to start evolving your system architecture, and your application will need to work asynchronously. This problem you are having is very common (some of my team are dealing with one as I write) but everyone needs to deal with it eventually.
Solution 1: Cron Job
The most common solution is to create a cron job that periodically scans a queue for new work to do. I won't explain the nature of the queue since everyone has their own, some are alright and others are really bad, but typically it involves a DB table with relevant information and a job status (<-- one of the bad solutions), or a solution involving Memcached, also MongoDB is quite popular.
The "problem" with this solution is ultimately again "scaling". Cron jobs run periodically at fixed intervals, so if a task takes a particularly long time jobs are likely to overlap. This means you need to work in some kind of locking or utilize a scheduler that supports running the job sequentially.
In the end, you won't run into the timeout problem, and you can typically dedicate an entire machine to running these tasks so memory isn't as much of an issue either.
Solution 2: Worker Delegation
I'll use Gearman as an example for this solution, but other tools encompass standards like AMQP such as RabbitMQ. I prefer Gearman because its simpler to set up, and its designed more for work processing over messaging.
This kind of delegation has the advantage of running immediately after you call it. The server is basically waiting for stuff to do (not unlike an Apache server), when it get a request it shifts the workload from the client onto one of your "workers", these are scripts you've written which run indefinitely listening to the server for workload.
You can have as many of these workers as you like, each running the same or different types of tasks. This means scaling is determined by the number of workers you have, and this scales horizontally very cleanly.
Conclusion:
Crons are fine in my opinion of automated maintenance, but they run into problems when they need to work concurrently which makes running workers the ideal choice.
Either way, you are going to need to change the way users receive feedback on their requests. They will need to be informed that their request is processing and to check later to get the result, alternatively you can periodically track the status of the running task to provide real-time feedback to the user via ajax. Thats a little tricky with cron jobs, since you will need to persist the state of the task during its execution, but Gearman has a nice built-in solution for doing just that.
http://php.net/manual/en/book.gearman.php

How does Node.js use fewer threads to handle multiple connections?

I've got no problem with events and callbacks, synchrony/asynchrony, the call stack and the queue.
However, as I understand it, other servers make a new thread for each connection which contain both the blocking request and handler for the response of that request where as in node this handler would be passed to the main thread as a callback. The ability of this kind server to handle multiple requests is therefore limited by it's ability to create and switch between multiple threads.
When Node receives a blocking request it sends it into asynchrony land while it carries on processing the main thread. What happens in asynchrony land, doesn't a thread still need to be created to await the response for that request and then to sent the event to node event loop? If so, why isn't Node as limited by the server's ability to create and switch between threads? If not, what happens to the request?

I think there's some confusion over how the event loop actually works. NodeJS doesn't "receive a blocking request" and "send it into asynchrony land". It's asynchronous to begin with - unless you call a ...Sync() pattern function, EVERY call and EVERY operation is async. Confusingly, once you are inside your CODE, EVERY operation is synchronous.
It's a "cooperative multitasking" approach - all calls to the system are expected to "start the ball rolling" and return immediately, while your own code is suppose to do what it needs to do as quickly as possible and yield control back to the JSVM (by returning from your function).
To understand how this works when you're dealing with network communications, you need to go back in time to before threads really even existed. In the early days, if you had multiple network connections, your single-threaded process would have to put together a list of all the sockets it wanted information on (such as "has data arrived for me to read?"), and ask the OS if that was true by calling select(). This would a yes/no for each socket for each question. This was typically done in a while() loop that ran until the program was terminated. You would ask for a list of sockets with new data, read that data, do something with it, and then go back to sleep, over and over again.
NodeJS is far more sophisticated but this analogy works well for it. It has a main "event loop" that is constantly sleeping until there is work to do, then waking up and doing it.
Everything that you do comes from, or goes into, this channel. If you write data to a network socket, and ask to be notified (called back) when it's done, NodeJS passes your request to the operating system and then goes to sleep. You stop running. Your context is saved - all your local vars are saved. When the OS comes back and says "done!", NodeJS checks its list and sees you wanted to know about this, and calls your function, reloading your context so all your local vars are where you need them.
To be very clear, it is entirely possible that when the data is finished being written to the network, and the OS notification comes back for that, NodeJS is busy with other work! NodeJS won't "create a thread" to handle it - it'll ignore it completely until it gets some free time! It won't be lost... it just won't be handled "yet".
This drives programmers used to threading models nuts - it seems illogical that this constant state of never immediately responding to an incoming event "until it has a chance" could possibly be efficient. But software architectures are often deceiving. Threading models actually have fairly high overhead. CPU core counts aren't infinite - the entire computer as a whole is doing plenty of work all the time. Threads aren't free - just because you make one doesn't mean the CPU itself has time to do anything with it. And the overhead of thread creation and management often means an efficiency loss.
Old-school event-loop models eliminate this overhead. When things go badly like you have an infinite loop in your code, they can behave very badly - often locking up completely. But when things are going well they can actually be a lot faster, and many benchmarks have shown that well-written NodeJS modules can perform as well as or even better than similar modules in other languages.
In summary, the most common confusion in NodeJS is what "async" really means. A good way to think of it is that in threading models, programmers are expected to be "bad"/simplistic (write blocking code and just wait for things to return) and the core VM or OS is expected to be "good"/smart (tolerate this by making threads to handle async work). In NodeJS, programmers are expected to be "good"/sophisticated (write well-structured async code), allowing the JSVM to focus on what it does best and not need as much magic to make things work well. Well-used, NodeJS puts a lot of power in your hands.

Debugging node.js with Express Framework: Troubleshooting missing res.end

The Express node.js server application I'm currently developing will sometimes go off into the void and stops returning requests. I very much suspect somewhere in it I am doing something that is ultimately skipping a res.end that I need to be doing.
Does anyone have any tips or tricks on how to best go about looking for where I'm missing the res.end (assuming that's the problem)? Is there anyway to view the currently open requests in real time in the node event loop?
When things hang up, everything is still running in that Express will still accept incoming requests. And, these requests are logged to the console via express.logger just fine. But, the request is never returned to the client and no error is being thrown.
Thank you in advance for your thoughts.
Follow-up:
There doesn't seem to be much documentation on node's debug mode beyond the raw API docs. I did find this article which seems to provide a good overview of several options. But, there doesn't seem to be the kind of thing I was thinking of for watching the event loop, something a la vtop's process list.
More Follow-up:
Yesterday, Daniel Hood published Debugging in Node.js stating that, "The state of debugging node.js apps has always been a bit unfortunate..." So, perhaps my question is not so silly as to merit down votes as it might seem. Anyway, in his next article he promises to cover some tools for debugging asynchronous operations.

I posted the last article you mentioned. If you need to debug async stuff, I'll be going over things like long stack traces (modules such longjohn), the try-catch module, and things like zones/domains. Hopefully those links will be of some help (might be a week before I write part 2).
However, for your particular issue, have you tried setting a breakpoint at some known location and stepping through the process? This way you can just watch how the app is executed, and maybe find where some function is returning nothing when it should be returning the req/res object.
If it isn't very reproducible, maybe some simple logging would be helpful.

AJAX or Socket.IO make more sense in my situation?

I have been working on something using AJAX that sometimes requires a couple POST's per second. Each post returns a JSON object (generated by the PHP file being posted to, ~11,000 bytes) and on average the latency is between 30ms and 250ms depending on if i'm on wifi or wired, but every roughly 1/15 calls it spikes up to about 4000ms. I am trying to find a way around this, as of right now I see two options:
Throw a timeout on the AJAX call, and have it call a GET on fail (the POST should still go through, it's the return trip that always times out) or...
Cut the entire thing down, learn node.js so I can use websockets to potentially rectify this issue.
Either solution as far as I can see is based on WHY the original call is failing. If it is something wrong with the AJAX call, then a new GET should be likely to go through, and solve the issue. but if it is something with the server itself then logically the GET would just time out as well, it's an issue with the server, and I'm dead in the water.
Since I have no experience yet at all with websockets I was hoping for some feedback on the best action to take next. Thanks for the input.
If it would help, I can very likely reduce the returning payload to 1/15 the size with some sneaky coding. would that make an impact?

WebSockets are a great option! Actually working with SocketIO is pretty simple and has a shallow learning curve.
Because the connection stays open, your requests skips the DNS lookup and routing for lower latency. This mean much lower overhead for each POST request you make.
If you ever forsee pushing data to your users, WebSockets are the de facto way to do it. Ajax polling is going out of style.
That said, you would have to port your back end logic to JavaScript. You would have to change your deployment strategy to a server that supports Node apps. You would have to deal with learning a new environment -- can also have some overhead.
Before you explore Node, consider the drawbacks above. I think it is a great bit of technology, but I would also look into the following approaches, especially if you are pressed for time.
1/15 size reduction is totally worth. Actually, it is worth it in both cases.
Can you do any sort of batching with your POST requests from the client side? If subsequent requests rely on the results of the previous POST request, you cannot do this. In this case, I strongly suggest using WebSockets.
All in all, there are always tradeoffs. If you aren't pressed for time, given Node and SocketIO and whirl, they are becoming very prevalent web technologies and are worth learning.

Cut the entire thing down, learn node.js so I can use websockets to potentially rectify this issue.
There is no point doing things with wrong tools. If you need real-time communication, use servers which support it out-of-the box, like node.js (probably the simplest to get into from PHP).
Since I have no experience yet at all with websockets
Get some framework atop of the raw websockets, like primus or socket.io and good luck ;)

NodeJS & Socket.IO

I am facing a strange issue while using NodeJS and Socket.io.
Server which receive data via ZeroMQ. That work perfect.
For each message from ZeroMQ, I used sockets.volatile.emit to send that to all connected clients.
The issue arise only for large number of connected accounts (more than 100), it seems there is a queue on the sending to clients (client receive message in delay that keep increasing)
Note : Each connected client received each message from ZeroMQ, so basically for more client there is more data sent over the socket.IO.
Via Logs/Debug i know the receive from ZeroMQ has no delay and all works on that part. The emitting seems to have a queue or delay that keeps increasing.
The messages rate is 80 messages/sec for each client.
Note: NodeJS 0.10.20 and Socket.IO 0.9.16.
How can I control that to prevent client received old messages ?

Checkout this article it will tell you a lot of the basic mistakes and it's, about blocking the event loop which seems pretty similar to what your doing.
Maybe use tools such as: Debug and Blocked i think it would help solve you'r issue. Both to debug where you creating a bottleneck on performance & other basic issues.
Alternatively hook your node project up on PM2 and bind it to Keymetrics.IO this will give you a good view into your server and why it's running slow and why you make a performance bottleneck.
Its hard to solve your problem without code examples but here is 3 reasons why your app or you could create bottlenecks (maybe unknowingly):
Parsing a big json payload with the JSON.parse function.
Trying to do syntax highlighting on a big file on the backend (with something like Ace or highlight.js).
Parsing a big output in one go (such as the output of a git log command from a child process).
More info in the first article in section 2 called "Blocking the event loop"
A question related to yours, this one.
Wanna know more about the Event loop i can warmly direct you to a tread "How the single threaded non blocking IO model works in Node.js"
Here is a model of the Node.js Processing model, to see what happens on the event loop and its surroundings

If it turns out that you're not blocking the event loop in any terrible way, then you might be hitting the limits on what socket.io can handle for your specific application. If thats the case then you might consider scaling up your instances.
Check out this article for more information:
http://drewww.github.io/socket.io-benchmarking/

We Keep Coding

JavaScript is the programming language of the Web.