Performance of NodeJS with large amount of callbacks - javascript

I am working on a NodeJS application. There is a specific RESTful API (GET) that, when triggered by the user, requires the server to do about 10-20 network operations to pull information from different sources. All these network operations are async callbacks, and once they are ALL finished, the result is consolidated by the nodejs app and sent back to the client. All these operations are started in parallel via the async.map function.
I just want to understand, since nodejs is single threaded, and it does not make use of multi-core machines (at least not without clustering), how does node scale when it has many callbacks to process? Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel as well as the main thread?
The reason I ask is that I see the performance of my 20 callbacks deteriorate from the first callback to the last one. For example, the first network operation (out of the 10-20) takes 141ms to complete, whereas the last one takes about 4 seconds (measured as the time from when the function is executed until the callback of the function returns a value or an error). They are all the same network operation hitting the same data source, so the data source is not the bottleneck. I know for a fact that the data source takes no more than 200ms to respond to a single request.
I found this thread, so it looks to me that the single thread needs to address all callbacks AND new requests coming in.
So my question is, for operations that will trigger many callbacks, what is the best practice in optimizing their performance?
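Roughly, the pattern looks like this (a simplified sketch with placeholder URLs and consolidation, not my actual code):

// Sketch of the setup described above: start all requests in parallel with
// async.map, time each one from call to callback, and consolidate once every
// request has finished.
const async = require('async');
const https = require('https');

const urls = Array.from({ length: 20 }, () => 'https://example.com/'); // placeholder sources

async.map(urls, (url, cb) => {
  const started = Date.now();
  https.get(url, (res) => {
    let body = '';
    res.on('data', chunk => { body += chunk; });
    res.on('end', () => {
      console.log(`${url} took ${Date.now() - started}ms`); // per-request timing
      cb(null, body);
    });
  }).on('error', cb);
}, (err, results) => {
  if (err) throw err;
  console.log(`consolidated ${results.length} responses`); // all callbacks have run
});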

For network operations node.js is effectively single threaded. However, there is a persistent misunderstanding that handling I/O requires constant CPU resources. The core of your question boils down to:
Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel as well as the main thread?
The answer is yes and no. Yes, callbacks are only executed when the main thread is idle. No, the "processing" is not done when the thread is idle. To be specific: there is no "processing" - it takes zero CPU time for node to "process" thousands of callbacks if what you mean by "process" is waiting.
How asynchronous I/O works (in any programming language)
Hardware
If we really need to understand how node (or browser) internals work we must unfortunately first understand how computers work - from the hardware to the operating system. Yes, this is going to be a deep dive so bear with me..
It all began with the invention of interrupts..
It was a great invention, but also a Box of Pandora - Edsger Dijkstra
Yes, the quote above is from the same "Goto considered harmful" Dijkstra. From the very beginning introducing asynchronous operation to computer hardware was considered a very hard topic even for some of the legends in the industry.
Interrupts were introduced to speed up I/O operations. Rather than needing to poll some input with software (taking CPU time away from useful work), the hardware will send a signal to the CPU to tell it an event has occurred. The CPU will then suspend the currently running program and execute another program to handle the interrupt - thus we call these functions interrupt handlers. And the word "handler" has stuck all the way up the stack to GUI libraries, which call callback functions "event handlers".
If you've been paying attention you will notice that this concept of an interrupt handler is actually a callback. You configure the CPU to call a function at some later time when an event happens. So even callbacks are not a new concept - the idea is way older than C.
OS
Interrupts make modern operating systems possible. Without interrupts there would be no way for the CPU to temporarily stop your program to run the OS (well, there is cooperative multitasking, but let's ignore that for now). How an OS works is that it sets up a hardware timer in the CPU to trigger an interrupt and then it tells the CPU to execute your program. It is this periodic timer interrupt that runs your OS. Apart from the timer, the OS (or rather device drivers) sets up interrupts for I/O. When an I/O event happens the OS will take over your CPU (or one of your CPUs in a multi-core system) and check its data structures to see which process it needs to execute next to handle the I/O (this is called preemptive multitasking).
So, handling network connections is not even the job of the OS - the OS just keeps track of connections in its data structures (or rather, the networking stack does). What really handles network I/O is your network card, your router, your modem, your ISP etc. So waiting for I/O takes zero CPU resources. It just takes up some RAM to remember which program owns which socket.
Processes
Now that we have a clear picture of this we can understand what it is that node does. Various OSes have various different APIs that provide asynchronous I/O - from overlapped I/O on Windows to poll/epoll on Linux to kqueue on BSD to the cross-platform select(). Node internally uses libuv as a high-level abstraction over these APIs.
How these APIs work is similar though the details differ. Essentially they provide a function that, when called, will block your thread until the OS sends an event to it. So yes, even non-blocking I/O blocks your thread. The key here is that blocking I/O will block your thread in multiple places but non-blocking I/O blocks your thread in only one place - where you wait for events.
What this allows you to do is design your program in an event-oriented manner. This is similar to how interrupts allow OS designers to implement multitasking. In effect, asynchronous I/O is to frameworks what interrupts are to OSes. It allows node to spend exactly 0% CPU time to process (wait for) I/O. This is what makes node fast - it's not really faster but does not waste time waiting.
Callback processing
With the understanding we now have of how node handles network I/O we can understand how callbacks affect performance.
There is zero CPU penalty having thousands of callbacks waiting
Of course, node still needs to maintain data structures in RAM to keep track of all the callbacks, so callbacks do have a memory cost.
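A quick way to see this for yourself (a trivial sketch, not tied to your code): register a large number of timers and watch the process sit near 0% CPU while they are all pending.

// 100,000 callbacks waiting costs essentially no CPU - only the memory
// needed to keep track of them.
for (let i = 0; i < 100000; i++) {
  setTimeout(() => {}, 60000);
}
console.log('100000 callbacks are now pending; check CPU usage with top or htop');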
Processing the return value from callbacks is done in a single thread
This has some advantages and some drawbacks. It means node does not have to worry about race conditions and thus node does not internally use any semaphores or mutexes to guard data access. The disadvantage is that any CPU intensive javascript will block all other operations.
You mention that:
I see the performance of my 20 callbacks deteriorate from the first callback to the last one
The callbacks are all executed sequentially and synchronously in the main thread (only the waiting is actually done in parallel). Thus it could be that your callback is doing some CPU intensive calculations and the total execution time of all callbacks is actually 4 seconds.
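To make this concrete, here is a small self-contained sketch (not your actual code) where every "request" completes after the same 150ms, but each callback then burns ~200ms of CPU; the measured time of the last one balloons exactly the way you describe.

// 20 "requests" that all complete after ~150ms, each with a callback that
// then does ~200ms of synchronous (CPU-bound) work. Because callbacks run
// one after another on the single thread, the measured time of the last
// one grows even though every "network" wait was identical.
function fakeNetworkCall(i, cb) {
  setTimeout(cb, 150); // all responses arrive at roughly the same moment
}

function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // simulate CPU-heavy processing in the callback
}

for (let i = 1; i <= 20; i++) {
  const started = Date.now();
  fakeNetworkCall(i, () => {
    busyWork(200);
    console.log(`request ${i} measured as ${Date.now() - started}ms`);
  });
}
// request 1 is measured at around 350ms, request 20 at around 4 seconds,
// even though every setTimeout fired after ~150ms.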
However, I rarely see this kind of issue for that number of callbacks. It's still possible - I don't know what you're doing in your callbacks - but I think it's unlikely.
You also mention:
until the callback of the function returns a value or an error
One likely explanation is that your network resource cannot handle that many simultaneous connections. You may not think it's much since it's only 20 connections but I've seen plenty of services that would crash at 10 requests/second. The problem is all 20 requests are simultaneous.
You can test this by taking node out of the picture and use a command line tool to send 20 simultaneous requests. Something like curl or wget:
# assuming you're running bash:
for x in `seq 1 20`;do curl -o /dev/null -w "Connect: %{time_connect} Start: %{time_starttransfer} Total: %{time_total} \n" http://example.com & done
Mitigation
If it turns out that issuing the 20 requests simultaneously is stressing the other service, what you can do is limit the number of simultaneous requests.
You can do this by batching your requests:
async function fetchAllInBatches() {
  let input = [/* some values we need to process */];
  let result = [];
  while (input.length) {
    let batch = input.splice(0, 3); // make 3 requests in parallel
    let batchResult = await Promise.all(batch.map(x => {
      return fetchNetworkResource(x); // fetchNetworkResource returns a promise
    }));
    result = result.concat(batchResult);
  }
  return result;
}
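Alternatively, since you already use the async library, async.mapLimit gives the same throttling without manual batching. A minimal sketch, reusing the hypothetical fetchNetworkResource and input from above (and assuming async v3, where the iteratee can be an async function):

const async = require('async');

// `input` and `fetchNetworkResource` are the same placeholders as above.
async.mapLimit(input, 3, async (x) => fetchNetworkResource(x), (err, results) => {
  if (err) return console.error(err);
  console.log(results); // at most 3 requests were in flight at any time
});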

Related

Is there any other way to implement a "listening" function without an infinite while loop?

I've been thinking a lot about code and libraries like React that automatically, well, react to events as they happen, and was wondering about how all of that is implemented at the lower levels of C++ and machine code.
I can't seem to figure out any way something like an event listener could be implemented other than with a while loop running on another thread.
So is that all this is under the hood? Just while loops all the way down? Like, for example, RethinkDB, which advertises itself as a "realtime database" that has the repubsub library. Is the "subscribe" method just implemented using a while loop under the hood? I couldn't seem to find any information on that.
Like, sockets and stuff, too. When a computer is a "listening" on a port for a socket connection, is that computer just running something like:
while (1) {
  if (connectionFound) {
    return True;
  }
}
Or is there something I'm missing?
I've written the answer to this question as an aside in another answer. Normally I'd close this question as a duplicate and point to that answer; however, this is a very different question. The other question asked about javascript performance. In order to answer that I had to first write the answer to this question.
As such I'm going to do something that's not normally supposed to be done: I'm going to copy part of my answer to another question. So here's my answer:
The actual events that javascript and node.js wait on require no looping at all. In fact they require 0% CPU time.
How asynchronous I/O works (in any programming language)
Hardware
If we really need to understand how node (or browser) internals work we must unfortunately first understand how computers work - from the hardware to the operating system. Yes, this is going to be a deep dive so bear with me..
It all began with the invention of interrupts..
It was a great invention, but also a Box of Pandora - Edsger Dijkstra
Yes, the quote above is from the same "Goto considered harmful" Dijkstra. From the very beginning introducing asynchronous operation to computer hardware was considered a very hard topic even for some of the legends in the industry.
Interrupts were introduced to speed up I/O operations. Rather than needing to poll some input with software in an infinite loop (taking CPU time away from useful work), the hardware will send a signal to the CPU to tell it an event has occurred. The CPU will then suspend the currently running program and execute another program to handle the interrupt - thus we call these functions interrupt handlers. And the word "handler" has stuck all the way up the stack to GUI libraries, which call callback functions "event handlers".
Wikipedia actually has a fairly nice article about interrupts if you're not familiar with it and want to know more: https://en.wikipedia.org/wiki/Interrupt.
If you've been paying attention you will notice that this concept of an interrupt handler is actually a callback. You configure the CPU to call a function at some later time when an event happens. So even callbacks are not a new concept - the idea is way older than C.
OS
Interrupts make modern operating systems possible. Without interrupts there would be no way for the CPU to temporarily stop your program to run the OS (well, there is cooperative multitasking, but let's ignore that for now). How an OS works is that it sets up a hardware timer in the CPU to trigger an interrupt and then it tells the CPU to execute your program. It is this periodic timer interrupt that runs your OS.
Apart from the timer, the OS (or rather device drivers) sets up interrupts for I/O. When an I/O event happens the OS will take over your CPU (or one of your CPUs in a multi-core system) and check its data structures to see which process it needs to execute next to handle the I/O (this is called preemptive multitasking).
Everything from the keyboard and mouse to storage to network cards uses interrupts to tell the system that there is data to be read. Without those interrupts, monitoring all those inputs would take a lot of CPU resources. Interrupts are so important that they are often designed into I/O standards like USB and PCI.
Processes
Now that we have a clear picture of this we can understand how node/javascript actually handle I/O and events.
For I/O, various OSes have various different APIs that provide asynchronous I/O - from overlapped I/O on Windows to poll/epoll on Linux to kqueue on BSD to the cross-platform select(). Node internally uses libuv as a high-level abstraction over these APIs.
How these APIs work is similar though the details differ. Essentially they provide a function that, when called, will block your thread until the OS sends an event to it. So yes, even non-blocking I/O blocks your thread. The key here is that blocking I/O will block your thread in multiple places but non-blocking I/O blocks your thread in only one place - where you wait for events.
Check out my answer to this other question for a more concrete example of how this kind of API works at the C/C++ level: I know that callback function runs asynchronously, but why?
For GUI events like button clicks and mouse moves the OS just keeps track of your mouse and keyboard interrupts and translates them into UI events. This frees your software from needing to know the positions of buttons, windows, icons etc.
What this allows you to do is design your program in an event-oriented manner. This is similar to how interrupts allow OS designers to implement multitasking. In effect, asynchronous I/O is to frameworks what interrupts are to OSes. It allows javascript to spend exactly 0% CPU time to process (wait for) I/O. This is what makes asynchronous code fast - it's not really faster but does not waste time waiting.
This answer is fairly long as is, so I'll leave links to my answers to other questions that are related to this topic:
Node js architecture and performance (Note: This answer provides a bit of insight on the relationship of events and threads - tldr: the OS implements threads on top of kernel events)
Does javascript process using an elastic racetrack algorithm
how node.js server is better than thread based server
"listeners" and "subscriptions" are just ideas. Everything can be abstracted with lambdas. Here is one possible implementation -
// implement the so-called "listener"
const listen = (responder) =>
  x => (responder(x), x)

// create a new "listener",
// send any data we "hear" to console.log
const logger =
  listen(console.log)

// run it synchronously
logger(1)
logger(2)

// or asynchronously
setTimeout(_ => logger(3), 2000)

// 1
// 2
// some time later...
// 3

NodeJS: understanding nonblocking / event queue / single thread

I'm new to Node and am trying to understand the non-blocking nature of node.
In the image below I've created a high level diagram of the request.
As I understand, all processes from a single user for a single app run on a single thread.
What I would like to understand is the how the logic of the event loop fits in this diagram. Is the event loop the same as the processor pipeline where instructions are queued?
Imagine that we load an app page into RAM that creates a stream to read from by the program:
readstream.on('data', function(data) {});
Instructions for creating the readstream and waiting for data to occur: does this instruction "hang" in a register in the processor (waiting for the I/O to finish), whereas in a multithreaded environment the processor just doesn't take new instructions from RAM until the result of the previous I/O request has been returned to RAM?
Or is the way I see this entirely/partially wrong?
Just a supplementary (related, perhaps stupid) question: do different users run on different threads on the server, and isn't the single-threaded benefit only there for a single user?
I'm new to this type of detail, so excuse me if this question doesn't entirely make sense to you. But understanding this seems essential for me before moving forward.
Event-driven non-blocking I/O relies on the fact that modern operating systems have a 'select' method that performs polling at the O/S level (not wasting CPU cycles). The select method allows you to register callbacks for certain I/O events. This tends to be much more efficient than the 'thread-per-connection' model commonly used in thread enabled languages. For more info, do a 'man select' on a Unix/Linux OS.
Threads and I/O have to do with operating system implementation and services, not CPU architecture.
Operations that involve input/output devices of any kind — mass storage, networks, serial ports, etc. — are structured as requests from the CPU to an external device that, by one of several possible mechanisms, are later satisfied.
On top of that reality, operating systems provide alternative programming models. In one model, the factual nature of input/output operations is essentially disguised such that executing programs are given an API that appears to be synchronous. In a C program, a call to the write() system call will cause the entire process to delay until the operation has completed.
Another programming model more closely exposes the asynchronous reality of the system. That's what Node uses. Operating systems provide ways to initiate long-duration asynchronous operations, along with ways for a process to either check for results or to block and wait for results. In Node, the runtime system can juggle lots of separate operations because the entire model is based on code running in response to events. An event can be a synthetic thing (such as the "event" of a Node module being loaded and run initially), or it can be something that's a result of actual asynchronous external events. In the case of input/output operations, the Node runtime waits for operating system notification and translates that into an event that causes some JavaScript code to run.
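To tie that back to the readstream in the question, here is a minimal sketch (the file path is made up): registering the handler returns immediately, and the callback only runs later, when the runtime translates the OS notification into a 'data' event.

const fs = require('fs');

const readstream = fs.createReadStream('./some-file.txt'); // hypothetical path

readstream.on('data', (chunk) => {
  console.log(`received ${chunk.length} bytes`); // runs on the single thread, once per event
});
readstream.on('end', () => console.log('done'));

console.log('this line runs before any data has arrived');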

node.js I/O non-blocking - understanding when it is most beneficial

After reading about event loops and how async works in node.js, this is my understanding of node.js:
Node actually runs processes one at a time and not simultaneously.
Node really shines when multiple database I/O tasks are called.
It runs faster (than blocking I/O) because it doesn't wait for the response of one call before dealing with the next call. While dealing with the other call, when the result of the first call arrives, it "gets back to it", basically switching back and forth between calls and callbacks without leaving the OS process idle, as opposed to what blocking I/O does. Please correct me if I'm wrong.
But here's my question:
Non-blocking I/O seems to be faster than blocking I/O only if the entity (server/process/thread?) that handles the request sent by node, is not the node server itself.
What would be the cases when the server handling the request is the same server making the request? If my first bullet is correct, would blocking I/O that uses different threads for the task work faster than non-blocking in this case?
Would file compression be an example to such I/O task that works faster on multithreaded blocking I/O?
The main benefit of non-blocking operations is that a relatively heavyweight CPU thread is not kept busy while the server is waiting for something to happen elsewhere (networking, disk I/O, etc...). This means that many different requests can be "in-flight" with only the single CPU thread and no thread is stuck waiting for I/O. A burden is placed back on the developer to write async-friendly code and to use async I/O operations, but in a heavy I/O bound operation, there can be a real benefit to server scalability. The single thread model also really simplifies access to shared resources since there is far, far less opportunity for threading conflicts, deadlocks, etc... This can result in fewer hard-to-find thread synchronization bugs that tend to only nail your server at the worst time (e.g. when it's busy).
Yes, non-blocking I/O only really helps if the agent handling the I/O operation is not node.js itself, because the whole point of non-blocking I/O in node is that node is free to use its single thread to go do other things while the I/O operation is running - and if it's node that is serving the I/O operation, that wouldn't be true.
Sorry, but I don't understand the part of your question about file compression. File compression takes a certain amount of CPU, no matter who handles it and there are a bunch of different considerations if you were trying to decide whether to handle it inside of node itself or in an outside process (running a different thread). That isn't a simple question. I'd probably start with using whatever code I already had for the compression (e.g. use node code if that's what you had or an external library/process if that's what you had) and only investigate a different option if you actually ran into a performance or scalability issue or knew you had an issue.
FYI, a simple mechanism for handling compression would be to spool the uncompressed data to files in a temporary directory from your node.js app and then have another process (which could be written in any system, even include node) that just looks for files in the temporary directory to which it applies the compression and then does something more permanent with the resulting compressed data.
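A very rough sketch of what that second process could look like; the directory, polling interval and gzip choice are just assumptions for illustration, not part of the mechanism itself:

// The watcher process polls a temp directory, gzips any file it finds,
// and removes the original. The web process only has to write raw files
// into SPOOL_DIR and move on.
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');

const SPOOL_DIR = '/tmp/spool'; // where the web process writes raw files

setInterval(() => {
  for (const name of fs.readdirSync(SPOOL_DIR)) {
    if (name.endsWith('.gz')) continue; // already compressed
    const src = path.join(SPOOL_DIR, name);
    const data = fs.readFileSync(src);
    fs.writeFileSync(src + '.gz', zlib.gzipSync(data));
    fs.unlinkSync(src); // done with the original
  }
}, 1000);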

Does async programming mean multi-threading?

Let's talk about JavaScript code which has a setInterval method firing every 2 seconds.
I also have an onblur animation event for some control.
In a case where the onblur occurs (+ animation), the setInterval callback might fire at the same time.
Question:
Does async programming mean multi-threading? (in any way?)
No. It means literally what it means-- asynchronous. Understanding the difference between asynchronous programming and thread-based programming is critical to your success as a programmer.
In a traditional, non-threaded environment, when a function must wait on an external event (such as a network event, a keyboard or mouse event, or even a clock event), the program must wait until that event happens.
In a multi-threaded environment, many individual threads of programming are running at the same time. (Depending upon the number of CPUs and the support of the operating system, this may be literally true, or it may be an illusion created by sophisticated scheduling algorithms). For this reason, multi-threaded environments are difficult and involve issues of threads locking each other's memory to prevent them from overrunning one another.
In an asynchronous environment, a single process thread runs all the time, but it may, for event-driven reasons (and that is the key), switch from one function to another. When an event happens, and when the currently running process hits a point at which it must wait for another event, the javascript core then scans its list of events and delivers the next one, in a (formally) indeterminate (but probably deterministic) order, to the event manager.
For this reason, event-driven, asynchronous programming avoids many of the pitfalls of traditional, multi-threaded programming, such as memory contention issues. There may still be race conditions, as the order in which events are handled is not up to you, but they're rare and easier to manage. On the other hand, because the event handler does not deliver events until the currently running function hits an idle spot, some functions can starve the rest of the program. This happens in Node.js, for example, when people foolishly do lots of heavy math in the server - that's best shoved into a little server that node then "waits" on for the answer. Node.js is a great little switchboard for events, but anything that takes longer than 100 milliseconds should be handled in a client/server way.
In the browser environment, DOM events are treated as automatic event points (they have to be; modifying the DOM delivers a lot of events), but even there badly-written Javascript can starve the core, which is why both Firefox and Chrome have those "This script has stopped responding" interrupt handlers.
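Here is a tiny sketch of that starvation effect (illustrative numbers only): a 100ms timer cannot fire until a long synchronous loop on the single thread has finished.

// The timer is due after 100ms, but the synchronous "heavy math" keeps the
// single thread busy, so the callback cannot run until the loop finishes.
const scheduled = Date.now();
setTimeout(() => {
  console.log(`timer fired after ${Date.now() - scheduled}ms (wanted 100ms)`);
}, 100);

let sum = 0;
for (let i = 0; i < 1e9; i++) sum += i; // blocks the event loop for a while
console.log('heavy math done, sum =', sum);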
A single threaded event loop is a good example of being asynchronous in a single threaded language.
The concept here is that you attach doLater callback handlers to the eventLoop. Then the eventLoop is just a while(true) that checks whether the specific timestamp for each doLater handler is met, and if so it calls the handler.
For those interested, here is a naive (and horribly inefficient toy) implementation of a single threaded event loop in JavaScript
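Something along these lines - a toy sketch assuming a doLater registration function and the busy-waiting while(true) described above, purely for illustration:

// A toy single-threaded "event loop": handlers registered with doLater are
// stored with a target timestamp; the loop busy-waits, checking each pending
// handler and calling it once its time has arrived.
const pending = [];

function doLater(callback, delayMs) {
  pending.push({ callback, runAt: Date.now() + delayMs });
}

function eventLoop() {
  while (true) { // busy wait - wastes CPU, for illustration only
    const now = Date.now();
    for (let i = pending.length - 1; i >= 0; i--) {
      if (pending[i].runAt <= now) {
        const { callback } = pending.splice(i, 1)[0];
        callback(); // run the handler on the single thread
      }
    }
    if (pending.length === 0) break; // nothing left to wait for
  }
}

doLater(() => console.log('ran after ~100ms'), 100);
doLater(() => console.log('ran after ~200ms'), 200);
eventLoop();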
This does mean that without any kind of OS thread scheduler access for your single thread, you're forced to busy-wait on the doLater callbacks.
If you have a sleep call you could just sleep until the next doLater handler is due, which is more efficient than a busy wait since you deschedule your single thread and let the OS do other things.
No, asynchronous programming doesn't specifically mean multithreading.
To achieve multiple tasks at the same time we can use multi-threading; the other approach is the event loop architecture in Node.js.
JavaScript is synchronous and single threaded, and that is why Node.js is also single threaded, but with the event loop Node performs non-blocking I/O operations. Node.js uses the libuv library, which uses a fixed-size thread pool to handle the execution of parallel tasks.
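One way to observe that fixed-size pool (default size 4) is with crypto.pbkdf2, which Node runs on libuv's thread pool: with six parallel calls the first four finish at roughly the same time, and the last two only start once a pool thread frees up. A rough sketch:

const crypto = require('crypto');

const started = Date.now();
for (let i = 1; i <= 6; i++) {
  // each pbkdf2 call is handed to a libuv thread pool worker
  crypto.pbkdf2('secret', 'salt', 500000, 64, 'sha512', () => {
    console.log(`hash ${i} done after ${Date.now() - started}ms`);
  });
}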
A thread is a sequence of code instructions and is also a subunit of a process.
In multi-threading, a process has multiple threads.
In single threading, a process has a single thread.
Only in the sense that it executes code haphazardly and risks race conditions. You will not get any performance benefits from using timeouts and intervals.
However, HTML5's WebWorkers do allow for real multithreading in the browser:
http://www.html5rocks.com/en/tutorials/workers/basics/
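For completeness, a minimal Web Worker sketch (browser only; 'worker.js' is a hypothetical file name): the worker runs on a separate thread and the two sides communicate purely by message passing.

// main.js
const worker = new Worker('worker.js');
worker.onmessage = (e) => console.log('result from worker:', e.data);
worker.postMessage(41);

// worker.js (a separate file)
// onmessage = (e) => postMessage(e.data + 1); // CPU-heavy work could go here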
If there is a callback, something has to call it. The units of execution are threads & so, yes, some other thread has to call the callback, either directly or by queueing up some asynchronous procedure call to the initiating thread.

how node.js server is better than thread based server

A Node.js server works on an event-based model where callback functions are supported. But I am not able to understand how it is better than a traditional thread-based server where threads wait for system IO. In the case of the thread-based model, when a thread needs to wait for IO, it gets preempted, so it doesn't consume CPU cycles and hence doesn't contribute to wait time.
How does Node.js improve wait time?
when a thread needs to wait for IO, it gets preempted
Actually, it's not preempted. Preemption is something completely different. What happens is that the thread is blocked.
For an event based model something similar happens. Event based interpreters are basically state machines. Only, the state machine is abstracted away and is not visible to the user. When something is waiting for an event it passes the control back to the interpreter. When the interpreter has nothing else to process it blocks itself waiting for I/O. Only, unlike traditional threading code the interpreter waits for multiple I/O.
What's happening at the C level is that the interpreter is using something like select(), poll(), epoll() and friends (depends on the OS and library installed) to do the blocking and waiting for I/O.
Now, why does a select()/poll() based mechanism generally perform better? Actually, 'generally' here depends on what you mean. A select() based server executes all code in a single process/thread. The biggest performance gain from this is that it avoids context switching - every time the OS transfers control over from one thread to another it has to save all the relevant registers, memory map, stack pointers, FPU context etc. so that the other thread can resume execution where it left off. The overhead of doing this can be quite significant.
In fact, there is a historical example of how extreme the overhead can be. Back in the early 2000s someone started benchmarking web servers. To the surprise of everyone, tclhttpd outperformed Apache for serving static files. Now, tcl is not only an interpreted language, but back in 2000 it was a very slow interpreted language because it didn't have a separate compilation phase (it sort of does now). Tcl scripts are interpreted directly in string form, making them around 400x slower than C. Apache is obviously written in C, so what's making tclhttpd faster?
It turned out that tclhttpd is event based running only on a single thread while Apache was multithreaded. The overhead of constant thread switching turned out to give tclhttpd enough advantage to perform better than Apache.
Of course, there is always a compromise. A single threaded server like tclhttpd or node.js cannot take advantage of multiple CPUs. Back in the early 2000s multiple CPUs were uncommon. These days they are almost the default. Not to mention that most CPUs are also hyperthreaded (hyperthreading adds hardware to the CPU to make context switching cheap).
The best servers these days have learned from history and are a combination of both. Apache2 and Nginx use thread pools: they are multithreaded, but each thread serves more than a single connection. This is a hybrid of the two approaches but is more complex to manage.
Read the following article for a more in-depth discussion on this topic: The C10K problem
Threads are relatively heavy-weight objects that have a resource footprint extending all the way into the kernel. When you park a thread in a blocking syscall or on a mutex or condition variable, you are tying up all those resources but doing nothing. Now the OS has to find more resources so your program can create another thread... Then you idle them too. It doesn't take long before the OS is struggling to scavenge more resources for your program to waste.
CPU time is just one small part of the bigger picture. :-)
Simply put:
In a threaded server, no matter how many threads you have, you can always have that many threads waiting for IO.
In node, no matter how many IO operations are pending, you always have your event loop ready to do the next thing.
When you have a lot of threads you are going to have a lot of context switching, which is going to be expensive. You won't have this overhead when using node.js's event loop.
Context switch
A context switch is the computing process of storing and restoring state (context) of a CPU so that execution can be resumed from the same point at a later time.
Event loop
In computer science, the event loop, message dispatcher, message loop or message pump is a programming construct that waits for and dispatches events or messages in a program.
I think you are full of myths regarding threads and the cost of context switching.
Discover the truth for yourself.
