NodeJS: understanding nonblocking / event queue / single thread


I'm new to Node and try to understand the non-blocking nature of node.
In the image below I've created a high level diagram of the request.
As I understand, all processes from a single user for a single app run on a single thread.
What I would like to understand is how the logic of the event loop fits into this diagram. Is the event loop the same as the processor pipeline where instructions are queued?
Imagine that we load an app page into RAM that creates a stream to read from by the program:
readstream.on('data', function(data) {});
Instructions for creating the readstream and waiting for data to occur: does this instruction "hang" in a register (waiting for the I/O to finish) in the processor whereas in a multithreaded environment, the processor just doesn't take new instructions from the RAM until the result of the previous I/O request has been returned to the RAM?
Or is the way I see this entirely/partially wrong?
Just a supplementary (related, perhaps stupid) question: aren't different users run on different threads on the server, and isn't the single-threaded benefit then only for a single user?
I'm new to this type of detail, so excuse me if this question doesn't entirely make sense to you. But understanding this seems essential for me before moving forward.

Event-driven non-blocking I/O relies on the fact that modern operating systems have a 'select' call that waits at the O/S level (not wasting CPU cycles on polling). You hand select a set of file descriptors and it returns once one or more of them is ready; the runtime then runs the callbacks registered for those I/O events. This tends to be much more efficient than the 'thread-per-connection' model commonly used in thread-enabled languages. For more info, do a 'man select' on a Unix/Linux OS.

Threads and I/O have to do with operating system implementation and services, not CPU architecture.
Operations that involve input/output devices of any kind — mass storage, networks, serial ports, etc. — are structured as requests from the CPU to an external device that, by one of several possible mechanisms, are later satisfied.
On top of that reality, operating systems provide alternative programming models. In one model, the factual nature of input/output operations is essentially disguised such that executing programs are given an API that appears to be synchronous. In a C program, a call to the write() system call will cause the calling thread (in a single-threaded program, the entire process) to delay until the operation has completed.
Another programming model more closely exposes the asynchronous reality of the system. That's what Node uses. Operating systems provide ways to initiate long-duration asynchronous operations, along with ways for a process to either check for results or to block and wait for results. In Node, the runtime system can juggle lots of separate operations because the entire model is based on code running in response to events. An event can be a synthetic thing (such as the "event" of a Node module being loaded and run initially), or it can be something that's a result of actual asynchronous external events. In the case of input/output operations, the Node runtime waits for operating system notification and translates that into an event that causes some JavaScript code to run.
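To make the "code runs in response to events" model concrete, here is a minimal sketch. It assumes a reasonably recent Node.js and uses fs.stat on the temporary directory purely as a convenient asynchronous operation; the function name demoEventOrder is made up for illustration.

```javascript
const fs = require('fs');
const os = require('os');

// Records the order in which the two pieces of code run.
function demoEventOrder() {
  const order = [];
  return new Promise((resolve, reject) => {
    // Start an asynchronous request. The callback runs later, as an event,
    // once the runtime has been notified that the result is available.
    fs.stat(os.tmpdir(), (err) => {
      if (err) return reject(err);
      order.push('stat callback');
      resolve(order);
    });
    // This line runs first, even though it appears after the callback in the source.
    order.push('request issued');
  });
}

demoEventOrder().then((order) => console.log(order.join(' -> ')));
// prints "request issued -> stat callback"
```

The callback is never invoked synchronously, so the thread is free to run whatever follows the request before the result comes back.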

Related

Node.js single-thread mechanism

I learnt Node.js is single-threaded and non-blocking. Here I saw a nice explanation How, in general, does Node.js handle 10,000 concurrent requests?
But the first answer says
The seemingly mysterious thing is how both the approaches above manage to run workload in "parallel"? The answer is that the database is threaded. So our single-threaded app is actually leveraging the multi-threaded behaviour of another process: the database.
(1) which gets me confused. Take a simple express application as an example, when I
var monk = require('monk');
var db = monk('localhost:27017/databaseName');
router.get('/QueryVideo', function(req, res) {
  var collection = db.get('videos');
  collection.find({}, function(err, videos){
    if (err) throw err;
    res.render('index', { videos: videos })
  });
});
And when my router responds to multiple requests by doing a simple MongoDB query, are those queries handled by different threads? I know there is only one thread in Node to route client requests though.
(2) My second question is, how does such a single-threaded Node application ensure security? I don't know much about security, but it looks like multiple requests should be isolated (at least in different memory spaces?), even though multi-threaded applications can't ensure security either, because they still share many things. I know this may not be a reasonable question, but in today's cloud services, isolation seems to be an important topic. I am lost in topics such as serverless, wasm, and wasm-based serverless using the Node.js environment.
Thank you for your help!!

Since you asked about my answer I guess I can help clarify.
1
For the specific case of handling multiple parallel client requests which triggers multiple parallel MongoDB queries you asked:
Are those queries handled by different threads?
On node.js since MongoDB connects via the network stack (tcp/ip) all parallel requests are handled in a single thread. The magic is a system API that allows your program to wait in parallel. Node.js uses libuv to select which API to use depending on which OS at compile time. But which API does not matter. It is enough to know that all modern OSes have APIs that allow you to wait on multiple sockets in parallel (instead of the usual waiting for a single socket in multiple threads/processes). These APIs are collectively called asynchronous I/O APIs.
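As a rough illustration of "waiting in parallel" on one thread, the sketch below starts three waits at once and records how many were pending at the same time. The fake database (makeFakeDb, query) is hypothetical and uses timers as a stand-in for real network round trips; it is not the MongoDB driver's API.

```javascript
// Hypothetical stand-in for a database client: each query only waits.
function makeFakeDb() {
  const stats = { inFlight: 0, peak: 0 };
  function query(name, ms) {
    stats.inFlight += 1;
    stats.peak = Math.max(stats.peak, stats.inFlight);
    return new Promise((resolve) => {
      setTimeout(() => {
        stats.inFlight -= 1;
        resolve(name);
      }, ms);
    });
  }
  return { query, stats };
}

// Three "requests" issue their queries from the same single thread.
async function handleThreeRequests() {
  const db = makeFakeDb();
  const results = await Promise.all([
    db.query('a', 30),
    db.query('b', 20),
    db.query('c', 10),
  ]);
  return { results, peak: db.stats.peak };
}

handleThreeRequests().then((r) => console.log(r));
// prints { results: [ 'a', 'b', 'c' ], peak: 3 }
```

All three waits overlap (peak is 3) even though only one JavaScript thread exists; the thread is only needed to start each query and to run its continuation.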
On MongoDB.. I don't know much about MongoDB. Mongo may be implemented in multiple threads or it may be single-threaded like node.js. Disk I/O is itself handled in parallel by the OS without using threads, using I/O channels (e.g., PCI lanes) and DMA channels instead. Basically both threads/processes and asynchronous I/O are generally implemented by the OS (at least on Linux and Mac) using the same underlying system: OS events. And OS events are just functions that handle interrupts. Anyway, this is straying quite far from the discussion about databases..
I know that MySQL is multithreaded and that PostgreSQL uses a separate process per connection, so both can process queries in parallel (query processing in SQL is basically a set of operations that loop through rows and filter the result; this requires both I/O and CPU, which is why they run in parallel)
If you are still curious how computers can do things (like wait for I/O) without the CPU executing a single instruction you can check out my answers to the following related questions:
Is there any other way to implement a "listening" function without an infinite while loop?
What is the mechanism that allows the scheduler to switch which threads are executing?
2
Security is ensured by the language being interpreted and making sure the interpreter does not have any stack overflow or underflow bugs. For the most part this is true for all modern javascript engines. The main mechanism to inject code and execute foreign code via program input is via buffer overflow or underflow. Being able to execute foreign code then allows you to access memory. If you cannot execute foreign code being able to access memory is kind of moot.
There is a second mechanism to inject foreign code which is prevalent in some programming language cultures: code eval (I'm looking at you PHP!). Using string substitution to construct database queries in any language opens you up to an SQL code eval attack (more commonly called SQL injection) regardless of your program's memory model. Javascript itself has an eval() function. To protect against this, javascript programmers simply consider eval evil. Basically, protection against eval comes down to good programming practices, and Node.js being open source allows anyone to look at the code and report any cases where a code evaluation attack is possible. Historically Node.js has been quite good in this regard, so your main guarantee about security from code eval is Node's reputation.

(1) The big picture goes like this: in nodejs there are 2 types of thread, the event thread (single) and the workers (a pool). As long as you don't block the event loop, once nodejs has handed a blocking I/O call to a worker thread it goes on to service the next request. The worker places the completed I/O back on the event loop for the next course of action.
In short, the main thread does this: "Do something else when it needs to wait, come back and continue when the wait is over, and do this one thing at a time".
And this reactive mechanism has nothing to do with threads running in another process (i.e. the database). The database may deploy another type of thread management scheme.
(2) The 'memory space' in your question is within the same process space. A thread belongs to one process (e.g. Express app A) and never runs in another process's space (e.g. Fastify app B).
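A small sketch of the worker pool from point (1). In Node the libuv pool is used for things like file system calls, DNS lookups and some crypto functions, while network sockets are watched through OS readiness notifications on the event thread itself. The example uses crypto.pbkdf2, which runs on the pool; the iteration count and the name demoWorkerPool are arbitrary choices for illustration.

```javascript
const crypto = require('crypto');

function demoWorkerPool() {
  return new Promise((resolve, reject) => {
    const order = [];
    // The expensive key derivation runs on a libuv worker thread.
    crypto.pbkdf2('password', 'salt', 200000, 64, 'sha512', (err, key) => {
      if (err) return reject(err);
      order.push('hash finished');
      resolve({ order, keyLength: key.length });
    });
    order.push('hash requested');
    // The event thread is not blocked, so it can run this while the worker hashes.
    setImmediate(() => order.push('event loop still responsive'));
  });
}

demoWorkerPool().then((r) => console.log(r.order.join(' -> ')));
```

On a typical machine the setImmediate callback runs well before the hash completes, which is the "do something else while waiting" behaviour described above.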

Why async/await performs better than threads if it is just a wrapper around them?

This topic was on my mind for a long time.
Let's assume we have a typical web server, one in Node.js and the other in Java(or any other language with threads).
Why would Node perform better (handle more IO/network based requests per second) than a Java server just because it uses async/await? Isn't it just syntactic sugar that utilizes the same threads Java/C#/C++ use behind the scenes?

There is no reason to expect Node to be faster than a server written in Java. Why do you think it might be?
It seems the other answers here (so far) are explaining the benefits of asynchronous programming in JS compared to single-threaded synchronous operations -- that's obvious, but not the question.
The key point everyone agrees on is that certain operations are inherently slow (e.g.: waiting for network requests, waiting for disk/database access), and it's efficient to let the CPU do something else while such operations are in flight. Using several threads in your application is one well-established way to do that; but of course that's only possible in languages that give you threads. Many traditional server implementations (in Java, C, C++, ...) use one thread per request (or, equivalently, a thread pool to distribute incoming requests over). These threads can block waiting for, say, the database -- that's okay, the operating system will put the waiting thread to sleep and let the CPU work on another thread (handling another request) in the meantime. The end result is fairly similar to what you get with Node.
JavaScript, of course, doesn't make threads available to the programmer. But instead, it has this concept of scheduling requests with the JavaScript engine and providing a callback to be invoked upon completion of the request. That's how the overall system behaves similarly to a traditional threaded programming language: user code can say "do this stuff, then schedule a database access, and when the result is available, continue with this [callback] code here", and while waiting for the database request, the CPU gets to execute some other code. What you want to avoid is the CPU sitting around idly waiting while there is other work waiting for the CPU to have time for it, and both approaches (Java threads and JavaScript callbacks) accomplish that.
Finally, async/await (just like Promises) are indeed just syntactic sugar that make it easier to write callback-based code. Code using async/await isn't any faster than old-style code using callbacks directly, just prettier and less error-prone. It also isn't any faster than a (well-written) Java-based server.
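To see the "syntactic sugar" point, here is the same logic written three ways. The slow operation (delay) is a hypothetical stand-in built on a timer, and all function names are made up for this sketch.

```javascript
const { promisify } = require('util');

// Hypothetical slow operation in Node's callback style: (err, value) arrives later.
function delay(ms, value, callback) {
  setTimeout(() => callback(null, value), ms);
}
const delayAsync = promisify(delay);

// 1. Plain callbacks.
function doubleWithCallback(callback) {
  delay(10, 21, (err, value) => {
    if (err) return callback(err);
    callback(null, value * 2);
  });
}

// 2. Promises.
function doubleWithPromise() {
  return delayAsync(10, 21).then((value) => value * 2);
}

// 3. async/await: same scheduling as the promise version, just different syntax.
async function doubleWithAwait() {
  const value = await delayAsync(10, 21);
  return value * 2;
}

doubleWithCallback((err, v) => console.log('callback:', v));
doubleWithPromise().then((v) => console.log('promise:', v));
doubleWithAwait().then((v) => console.log('await:', v));
```

All three produce 42 after roughly the same wait; the choice between them affects readability and error handling, not throughput.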
Node.js is popular because it's convenient to use the same language for the client and server parts of an app. From a performance point of view, it's not better than traditional alternatives, it's just also not worse (or at least not much; in practice how efficiently you design your app matters more than whether you implement it in Java or JavaScript). Don't fall for the hype :-)

Asynchrony (async/await) is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource is sometimes slow or delayed. If such an activity blocks within a synchronous process, the entire application must wait.
In an asynchronous process, the application can continue with other work that doesn't depend on the web resource until the potentially blocking task finishes.
You should, however, understand the differences between threads and async/await programming: https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2012/hh191443(v=vs.110)?redirectedfrom=MSDN#threads

With regard to asynchronous programming, utilization is the key. For instance, one request can be split up into smaller tasks such as fetching intermediate results, reading, writing, establishing connections, etc. Much of the time is therefore spent waiting on dependent tasks. Asynchronous models use this time to handle other incoming requests: the handler registers a callback function in a queue, saves its state, and the thread becomes available for another request. Thus, they can handle more requests.
Understand more on handling requests: https://www.youtube.com/watch?v=cCOL7MC4Pl0

Performance of NodeJS with large amount of callbacks

I am working on a NodeJS application. There is a specific RESTful API endpoint (GET) that, when triggered by the user, requires the server to do about 10-20 network operations to pull information from different sources. All these network operations are async callbacks, and once they ALL are finished, the result is consolidated by the nodejs app and sent back to the client. All these operations are started in parallel via the async.map function.
I just want to understand, since nodejs is single threaded, and it does not make use of multi-core machines (at least not without clustering), how does node scale when it has many callbacks to process? Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel as well as the main thread?
The reason why I ask is, I see the performance of my 20 callbacks deteriorate from the first callback to the last one. For example, the first network operation (out of the 10-20) takes 141ms to complete, whereas the last one takes about 4 seconds (measured as the time from when the function is executed, until the callback of the function returns a value or an error). They are all the same network operation hitting the same data source (so the data source is not the bottleneck). I know for a fact that the data source takes no more than 200ms to respond to a single request.
I found this thread, so it looks to me that the one single thread needs to address all callbacks AND new requests coming up.
So my question is, for operations that will trigger many callbacks, what is the best practice in optimizing their performance?

For network operations node.js is effectively single threaded. However there is a persistent misunderstanding that handling I/O requires constant CPU resource. The core of your question boils down to:
Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel as well as the main thread?
The answer is yes and no. Yes, callbacks are only executed when the main thread is idle. No, the "processing" is not done when thread is idle. To be specific: there is no "processing" - it takes zero CPU time for node to "process" thousands of callbacks if what you mean by "process" is waiting.
How asynchronous I/O works (in any programming language)
Hardware
If we really need to understand how node (or browser) internals work we must unfortunately first understand how computers work - from the hardware to the operating system. Yes, this is going to be a deep dive so bear with me..
It all began with the invention of interrupts..
It was a great invention, but also a Box of Pandora - Edsger Dijkstra
Yes, the quote above is from the same "Goto considered harmful" Dijkstra. From the very beginning introducing asynchronous operation to computer hardware was considered a very hard topic even for some of the legends in the industry.
Interrupts were introduced to speed up I/O operations. Rather than needing to poll some input with software (taking CPU time away from useful work) the hardware will send a signal to the CPU to tell it an event has occurred. The CPU will then suspend the currently running program and execute another program to handle the interrupt - thus we call these functions interrupt handlers. And the word "handler" has stuck all the way up the stack to GUI libraries which call callback functions "event handlers".
If you've been paying attention you will notice that this concept of an interrupt handler is actually a callback. You configure the CPU to call a function at some later time when an event happens. So even callbacks are not a new concept - it's way older than C.
OS
Interrupts make modern operating systems possible. Without interrupts there would be no way for the CPU to temporarily stop your program to run the OS (well, there is cooperative multitasking, but let's ignore that for now). How an OS works is that it sets up a hardware timer in the CPU to trigger an interrupt and then it tells the CPU to execute your program. It is this periodic timer interrupt that runs your OS. Apart from the timer, the OS (or rather device drivers) sets up interrupts for I/O. When an I/O event happens the OS will take over your CPU (or one of your CPUs in a multi-core system) and check against its data structures which process it needs to execute next to handle the I/O (this is called preemptive multitasking).
So, handling network connections is not even the job of the OS - the OS just keeps track of connections in its data structures (or rather, the networking stack). What really handles network I/O is your network card, your router, your modem, your ISP etc. So waiting for I/O takes zero CPU resources. It just takes up some RAM to remember which program owns which socket.
Processes
Now that we have a clear picture of this we can understand what it is that node does. Various OSes have various different APIs that provide asynchronous I/O - from overlapped I/O on Windows to poll/epoll on Linux to kqueue on BSD to the cross-platform select(). Node internally uses libuv as a high-level abstraction over these APIs.
How these APIs work is similar though the details differ. Essentially they provide a function that when called will block your thread until the OS sends an event to it. So yes, even non-blocking I/O blocks your thread. The key here is that blocking I/O will block your thread in multiple places but non-blocking I/O blocks your thread in only one place - where you wait for events.
What this allows you to do is design your program in an event-oriented manner. This is similar to how interrupts allow OS designers to implement multitasking. In effect, asynchronous I/O is to frameworks what interrupts are to OSes. It allows node to spend exactly 0% CPU time to process (wait for) I/O. This is what makes node fast - it's not really faster but does not waste time waiting.
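The "blocks in only one place" idea can be sketched as a toy loop. This is a simplified model for illustration, not how libuv is implemented: the osEvents array stands in for what the OS would report, and the event types and handler names are invented.

```javascript
// Toy model of an event loop: one place to wait, then dispatch to handlers.
function runToyLoop(osEvents, handlers) {
  const log = [];
  const queue = osEvents.slice();
  while (queue.length > 0) {
    // In a real loop this is the single blocking call (epoll_wait, kevent, ...)
    // that sleeps until the OS reports at least one event.
    const event = queue.shift();
    const handler = handlers[event.type];
    if (handler) {
      log.push(handler(event.data));
    }
  }
  return log;
}

const log = runToyLoop(
  [
    { type: 'socketReadable', data: 'GET /a' },
    { type: 'timerExpired', data: 'cleanup' },
    { type: 'socketReadable', data: 'GET /b' },
  ],
  {
    socketReadable: (request) => 'handled ' + request,
    timerExpired: (name) => 'ran timer ' + name,
  }
);
console.log(log);
// prints [ 'handled GET /a', 'ran timer cleanup', 'handled GET /b' ]
```

Every handler runs to completion before the loop goes back to its single waiting point, which is why the handlers themselves must never block.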
Callback processing
With the understanding we now have of how node handles network I/O we can understand how callbacks affect performance.
There is zero CPU penalty having thousands of callbacks waiting
Of course, node still needs to maintain data structures in RAM to keep track of all the callbacks so callbacks do have memory penalty.
Processing the return value from callbacks is done in a single thread
This has some advantages and some drawbacks. It means node does not have to worry about race conditions and thus node does not internally use any semaphores or mutexes to guard data access. The disadvantage is that any CPU intensive javascript will block all other operations.
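The drawback is easy to reproduce. In this sketch a timer is requested for 10 ms, but a busy loop (standing in for any CPU-intensive JavaScript) holds the thread, so the callback cannot run until the loop ends. The function name and durations are arbitrary.

```javascript
function measureTimerDelay(busyMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    // Ask for a callback after 10 ms.
    setTimeout(() => resolve(Date.now() - start), 10);
    // Simulate CPU-heavy JavaScript: spin until busyMs have elapsed.
    while (Date.now() - start < busyMs) {
      // busy wait, holding the only JavaScript thread
    }
  });
}

measureTimerDelay(100).then((ms) => {
  console.log(`10 ms timer actually fired after ${ms} ms`);
});
```

The reported delay is at least the length of the busy loop, because the timer callback has to wait for the thread to become idle.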
You mention that:
I see the performance of my 20 callbacks deteriorate from the first callback to the last one
The callbacks are all executed sequentially and synchronously in the main thread (only the waiting is actually done in parallel). Thus it could be that your callback is doing some CPU intensive calculations and the total execution time of all callbacks is actually 4 seconds.
However, I rarely see this kind of issue for that number of callbacks. It's still possible, I still don't know what you're doing in your callbacks. I just think it's unlikely.
You also mention:
until the callback of the function returns a value or an error
One likely explanation is that your network resource cannot handle that many simultaneous connections. You may not think it's much since it's only 20 connections but I've seen plenty of services that would crash at 10 requests/second. The problem is all 20 requests are simultaneous.
You can test this by taking node out of the picture and using a command line tool to send 20 simultaneous requests. Something like curl or wget:
# assuming you're running bash:
for x in `seq 1 20`;do curl -o /dev/null -w "Connect: %{time_connect} Start: %{time_starttransfer} Total: %{time_total} \n" http://example.com & done
Mitigation
If it turns out that making the 20 requests simultaneously is what stresses the other service, you can limit the number of simultaneous requests.
You can do this by batching your requests:
// fetchNetworkResource is your own function that returns a promise
async function fetchInBatches () {
  let input = [/* some values we need to process */];
  let result = [];
  while (input.length) {
    let batch = input.splice(0,3); // make 3 requests in parallel
    let batchResult = await Promise.all(batch.map(x => {
      return fetchNetworkResource(x);
    }));
    result = result.concat(batchResult);
  }
  return result;
}
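A variation on batching, offered as a sketch rather than as the approach above: instead of waiting for a whole batch to finish, keep a fixed number of requests in flight and start the next one as soon as any slot frees up. The helper mapWithLimit and the fake worker are hypothetical names for illustration.

```javascript
// Runs worker(item) for every item with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each runner pulls the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const runners = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Usage with a fake network call that just waits 10 ms.
const fakeFetch = (x) => new Promise((resolve) => setTimeout(() => resolve(x * 2), 10));
mapWithLimit([1, 2, 3, 4, 5, 6, 7], 3, fakeFetch).then((r) => console.log(r));
// prints [ 2, 4, 6, 8, 10, 12, 14 ]
```

Compared with fixed batches, one slow request does not hold up the other slots, while the remote service still never sees more than `limit` simultaneous requests.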

node.js I/O non-blocking - understanding when it is most beneficial

After reading about event loops and how async works in node.js, this is my understanding of node.js:
Node actually runs processes one at a time and not simultaneously.
Node really shines when multiple database I/O tasks are called.
It runs faster (than blocking I/O) because it doesn't wait for the response of one call before dealing with the next call. And while dealing with the other call, when the result of the first call arrives, it "gets back to it", basically going back and forth crossing calls and callbacks, without leaving the OS process idle, as opposed to what blocking I/O does. Please correct me if I'm wrong.
But here's my question:
Non-blocking I/O seems to be faster than blocking I/O only if the entity (server/process/thread?) that handles the request sent by node, is not the node server itself.
What would be the cases when the server handling the request is the same server making the request? If my first bullet is correct, in this case would blocking I/O work faster than non-blocking if it uses different threads for the task?
Would file compression be an example to such I/O task that works faster on multithreaded blocking I/O?

The main benefit of non-blocking operations is that a relatively heavyweight CPU thread is not kept busy while the server is waiting for something to happen elsewhere (networking, disk I/O, etc...). This means that many different requests can be "in-flight" with only the single CPU thread and no thread is stuck waiting for I/O. A burden is placed back on the developer to write async-friendly code and to use async I/O operations, but in a heavy I/O bound operation, there can be a real benefit to server scalability. The single thread model also really simplifies access to shared resources since there is far, far less opportunity for threading conflicts, deadlocks, etc... This can result in fewer hard-to-find thread synchronization bugs that tend to only nail your server at the worst time (e.g. when it's busy).
Yes, non-blocking I/O only really helps if the agent handling the I/O operation is not node.js itself because the whole point of non-blocking I/O in node is that node is free to use its single thread to go do other things while the I/O operation is running and if it's node that is serving the I/O operation then that wouldn't be true.
Sorry, but I don't understand the part of your question about file compression. File compression takes a certain amount of CPU, no matter who handles it and there are a bunch of different considerations if you were trying to decide whether to handle it inside of node itself or in an outside process (running a different thread). That isn't a simple question. I'd probably start with using whatever code I already had for the compression (e.g. use node code if that's what you had or an external library/process if that's what you had) and only investigate a different option if you actually ran into a performance or scalability issue or knew you had an issue.
FYI, a simple mechanism for handling compression would be to spool the uncompressed data to files in a temporary directory from your node.js app and then have another process (which could be written in any system, even node) that just looks for files in the temporary directory, applies the compression to them and then does something more permanent with the resulting compressed data.
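For the "handle it inside node" option, one possibility (a sketch, not the spooling approach described above) is Node's built-in zlib module. Its callback-based functions such as zlib.gzip do the compression on libuv's worker pool, so the JavaScript thread is not tied up while the data is compressed. The function name compressRoundTrip is made up.

```javascript
const zlib = require('zlib');
const { promisify } = require('util');

const gzip = promisify(zlib.gzip);
const gunzip = promisify(zlib.gunzip);

// Compresses a string and decompresses it again without blocking the event loop.
async function compressRoundTrip(text) {
  const compressed = await gzip(Buffer.from(text, 'utf8'));
  const restored = await gunzip(compressed);
  return {
    compressedBytes: compressed.length,
    restored: restored.toString('utf8'),
  };
}

compressRoundTrip('hello '.repeat(1000)).then((r) => {
  console.log(`6000 bytes compressed to ${r.compressedBytes} bytes`);
});
```

Whether this is preferable to an external process depends on how much data is involved and how many CPU cores are available, so measure before committing to either design.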

how node.js server is better than thread based server

A Node.js server works on an event-based model where callback functions are supported. But I am not able to understand how it is better than traditional thread-based servers where threads wait for system IO. In the thread-based model, when a thread needs to wait for IO, it gets preempted, so it doesn't consume CPU cycles and hence doesn't contribute to wait time.
How Node.js improves wait time?

when a thread needs to wait for IO, it gets preempted
Actually, it's not preempted. Preemption is something completely different. What happens is that the thread is blocked.
For an event based model something similar happens. Event based interpreters are basically state machines. Only, the state machine is abstracted away and is not visible to the user. When something is waiting for an event it passes the control back to the interpreter. When the interpreter has nothing else to process it blocks itself waiting for I/O. Only, unlike traditional threading code the interpreter waits for multiple I/O.
What's happening at the C level is that the interpreter is using something like select(), poll(), epoll() and friends (depends on the OS and library installed) to do the blocking and waiting for I/O.
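From JavaScript, that single wait on multiple sockets looks like the sketch below: one thread runs a TCP echo server and several clients at once. It assumes loopback networking is available and lets the OS pick a free port; the function name runEchoDemo and the message format are invented for illustration.

```javascript
const net = require('net');

function runEchoDemo(clientCount) {
  return new Promise((resolve, reject) => {
    const replies = [];
    // One server, one thread; every connection is just another event source.
    const server = net.createServer((socket) => {
      socket.on('data', (chunk) => socket.end('echo:' + chunk.toString()));
    });
    server.on('error', reject);
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      for (let i = 0; i < clientCount; i++) {
        const client = net.connect(port, '127.0.0.1', () => client.write('client' + i));
        let received = '';
        client.on('data', (chunk) => { received += chunk; });
        client.on('end', () => {
          replies.push(received);
          if (replies.length === clientCount) {
            server.close(() => resolve(replies.sort()));
          }
        });
        client.on('error', reject);
      }
    });
  });
}

runEchoDemo(3).then((replies) => console.log(replies));
```

All connections are opened before any reply arrives, and the runtime sleeps in its select()/epoll()-style call until one of the sockets has something to report.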
Now, why does a select()/poll() based mechanism generally perform better? Actually, 'generally' here depends on what you mean. A select() based server executes all code in a single process/thread. The biggest performance gain from this is that it avoids context switching - every time the OS transfers control over from one thread to another it has to save all the relevant registers, memory map, stack pointers, FPU context etc. so that the other thread can resume execution where it left off. The overhead of doing this can be quite significant.
In fact, there is a historical example of how extreme the overhead can be. Back in the early 2000s someone started benchmarking web servers. To the surprise of everyone, tclhttpd outperformed Apache for serving static files. Now, tcl is not only an interpreted language, but back in 2000 it was a very slow interpreted language because it didn't have a separate compilation phase (it sort of does now). Tcl scripts are interpreted directly in string form making it around 400x slower than C. Apache is obviously written in C so what's making tclhttpd faster?
It turned out that tclhttpd is event based running only on a single thread while Apache was multithreaded. The overhead of constant thread switching turned out to give tclhttpd enough advantage to perform better than Apache.
Of course, there is always a compromise. A single threaded server like tclhttpd or node.js cannot take advantage of multiple CPUs. Back in the early 2000s multiple CPUs were uncommon. These days they are almost default. Not to mention that most CPUs are also hyperthreaded (hyperthreading adds hardware to the CPU to make context switching cheap).
The best servers these days have learned from history and are a combination of both. Apache2 and Nginx use pools of workers: there are several threads or processes, but each one serves more than a single connection. This is a hybrid of the two approaches but is more complex to manage.
Read the following article for a more in-depth discussion on this topic: The C10K problem

Threads are relatively heavy-weight objects that have a resource footprint extending all the way into the kernel. When you park a thread in a blocking syscall or on a mutex or condition variable, you are tying up all those resources but doing nothing. Now the OS has to find more resources so your program can create another thread... Then you idle them too. It doesn't take long before the OS is struggling to scavenge more resources for your program to waste.
CPU time is just one small part of the bigger picture. :-)

Simply put:
In a threaded server, no matter how many threads you have, you can always have that many threads waiting for IO.
In node, no matter how many IO operations are pending, you always have your event loop ready to do the next thing.
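A quick way to see the second statement in action: the sketch below leaves thousands of operations pending and counts how often the event loop still gets to run other work. Timers stand in for pending I/O here, and the function name and numbers are arbitrary.

```javascript
function demoManyPendingWaits(count, waitMs) {
  return new Promise((resolve) => {
    let completed = 0;
    let ticksWhilePending = 0;
    // Start `count` waits; none of them needs the thread until it completes.
    for (let i = 0; i < count; i++) {
      setTimeout(() => {
        completed += 1;
        if (completed === count) resolve({ completed, ticksWhilePending });
      }, waitMs);
    }
    // Meanwhile the loop keeps running other callbacks.
    function tick() {
      if (completed < count) {
        ticksWhilePending += 1;
        setImmediate(tick);
      }
    }
    setImmediate(tick);
  });
}

demoManyPendingWaits(10000, 100).then((r) => {
  console.log(`${r.completed} waits finished; loop ran ${r.ticksWhilePending} other callbacks meanwhile`);
});
```

The tick counter here deliberately keeps the loop busy to prove it is available; in a real server those turns of the loop would be spent handling other requests.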

When you have a lot of threads you are going to have a lot of context switching, which is expensive. You won't have this overhead when using node.js's event loop.
Context Switch
A context switch is the computing process of storing and restoring state (context) of a CPU so that execution can be resumed from the same point at a later time.
Event loop
In computer science, the event loop, message dispatcher, message loop or message pump is a programming construct that waits for and dispatches events or messages in a program.

I think you believe a lot of myths regarding threads and the cost of context switching.
Discover the truth for yourself.
