How does node.js implement non-blocking I/O?

From here I have found that node.js implements a non-blocking I/O model, but I don't understand how.
JavaScript is single threaded. How can a single thread perform I/O operations and simultaneously keep executing the rest of the program?

It is true that operations such as sleep block the thread, but I/O can indeed be asynchronous.
Node.js uses an event loop for this. An event loop is “an entity that handles and processes external events and converts them into callback invocations”.
Whenever data is needed, Node.js registers a callback and hands the operation to the event loop. When the data is available, the callback is called.
See http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/ for more info.
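The register-a-callback-and-carry-on behavior described above can be seen with a small sketch. It assumes Node.js with CommonJS modules; the scratch file name is made up for the example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const order = [];

// Hypothetical scratch file, created only so the example is self-contained.
const file = path.join(os.tmpdir(), 'nonblocking-demo.txt');
fs.writeFileSync(file, 'hello');

order.push('before readFile');
// readFile registers the callback and returns immediately.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  order.push('callback');
  // prints: before readFile -> after readFile -> callback | data: hello
  console.log(order.join(' -> '), '| data:', data);
});
order.push('after readFile');
```

The callback always runs after the surrounding code has finished, even if the file is tiny and already cached.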

The I/O that is handled by node.js is multithreaded internally.
It is the programming interface that is single threaded and asynchronous.

Ryan Dahl: Original Node.js presentation at JSConf 2009 (Ryan is the creator of Node.js)
This video will answer your question.
The essence of JavaScript (V8) is event callbacks (button onclick="functions()" etc). The I/O itself is carried out outside your JavaScript, and your code is called back when it completes. We still have to keep our own code non-blocking by writing callbacks; otherwise the code would wait on responses such as db.query and block before executing the next line.

Related

How does node.js schedule asynchronous and synchronous tasks?

I know how node.js executes code asynchronously without blocking the main thread of execution by using the event-loop for scheduling the asynchronous tasks, but I'm not clear on how the main thread actually decides to put aside a piece of code for asynchronous execution.
(Basically what indicates that this piece of code should be executed asynchronously and not synchronously, what are the differentiating factors?)
And also, what are the asynchronous and synchronous APIs provided by node?

There is a mistake in your assumption when you ask:
what indicates that this piece of code should be executed asynchronously and not synchronously
The mistake is thinking that some code is executed asynchronously. This is not true.
Javascript (including node.js) executes all code synchronously. There is no part of your code that is executed asynchronously.
So at first glance that is the answer to your question: there is no such thing as asynchronous code execution.
Wait, what?
But what about all the asynchronous stuff?
Like I said, node.js (and javascript in general) executes all code synchronously. However, javascript is able to wait for certain events asynchronously. There is no asynchronous code execution, but there is asynchronous waiting.
What's the difference between code execution and waiting?
Let's look at an example. For the sake of clarity I'll use pseudocode in a made-up language to remove any confusion from javascript syntax. Let's say we want to read from a file. This fake language supports both synchronous and asynchronous waiting:
Example 1. Synchronously wait for the drive to return bytes from the file
data = readSync('filename.txt');
// the line above will pause the execution of code until all the
// bytes have been read
Example2. Asynchronously wait for the drive to return bytes from the file
// Since it is asynchronous we don't want the read function to
// pause the execution of code. Therefore we cannot return the
// data. We need a mechanism to accept the returned value.
// However, unlike javascript, this fake language does not support
// first-class functions. You cannot pass functions as arguments
// to other functions. However, like Java and C++ we can pass
// objects to functions
class MyFileReaderHandler inherits FileReaderHandler {
    method callback (data) {
        // process file data in here!
    }
}
myHandler = new MyFileReaderHandler()
asyncRead('filename.txt', myHandler);
// The function above does not wait for the file read to complete
// but instead returns immediately and allows other code to execute.
// At some point in the future when it finishes reading all data
// it will call the myHandler.callback() function passing it the
// bytes from the file.
As you can see, asynchronous I/O is not special to javascript. It has existed long before javascript in C++ and even C libraries dealing with file I/O, network I/O and GUI programming. In fact it has existed even before C. You can do this kind of logic in assembly (and indeed that's how people design operating systems).
What's special about javascript is that, due to its functional nature (first-class functions), the syntax for passing some code you want to be executed in the future is simpler:
asyncRead('filename.txt', (data) => {
    // process data in here
});
or in modern javascript can even be made to look like synchronous code:
async function main () {
    const data = await asyncReadPromise('filename.txt');
    // process data in here
}
main();
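For a version you can actually run: asyncReadPromise above is a made-up name, but Node.js ships a real promise-based reader in fs/promises. A minimal sketch, assuming CommonJS and a scratch file invented for the example:

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// Hypothetical scratch file, created only so the example is self-contained.
const file = path.join(os.tmpdir(), 'await-demo.txt');

async function main() {
  await fs.writeFile(file, 'hello from disk');
  // `await` suspends main() here; the thread itself is not blocked.
  const data = await fs.readFile(file, 'utf8');
  return data;
}

const pending = main();
pending.then((data) => console.log('read:', data));
console.log('main() has returned a promise; the thread is free');
```

The last console.log prints first, because main() returns as soon as it reaches its first await.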
What's the difference between waiting and executing code? Don't you need code to check on events?
Actually, you need exactly 0% CPU time to wait for events. You just need to execute some code to register interrupt handlers, and the CPU hardware (not software) will call your interrupt handler when the interrupt occurs. All kinds of hardware are designed to trigger interrupts: keyboards, hard disks, network cards, USB devices, PCI devices etc.
Disk and network I/O are even more efficient because they also use DMA. DMA controllers are hardware memory readers/writers that can copy large chunks (kilobytes/megabytes) of data from one place (e.g. hard disk) to another (e.g. RAM). The CPU only needs to set up the DMA controller and is then free to do something else. Once the DMA controller completes the transfer it triggers an interrupt, which executes some device driver code, which informs the OS that an I/O event has completed, which informs node.js, which executes your callback or fulfills your Promise.
All the above use dedicated hardware instead of executing instructions on the CPU to transfer data. Thus waiting for data takes 0% CPU time.
So the parallelism in node.js has more to do with how many PCIe lanes your CPU supports than how many CPU cores it has.
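A rough way to see that waiting is not the same as executing: start many waits at once and compare wall-clock time with CPU time. This is only a sketch; timers stand in for real I/O here, and the counts and delays are arbitrary values chosen for the example.

```javascript
const { setTimeout: sleep } = require('node:timers/promises');

const WAITS = 1000;   // number of simultaneous waits (arbitrary)
const DELAY_MS = 100; // duration of each wait (arbitrary)

async function waitAll() {
  const start = process.hrtime.bigint();
  const cpuBefore = process.cpuUsage();
  // All waits are pending at the same time on one thread.
  await Promise.all(Array.from({ length: WAITS }, () => sleep(DELAY_MS)));
  const wallMs = Number(process.hrtime.bigint() - start) / 1e6;
  const cpu = process.cpuUsage(cpuBefore); // values are in microseconds
  return { wallMs, cpuMs: (cpu.user + cpu.system) / 1000 };
}

const result = waitAll();
result.then(({ wallMs, cpuMs }) => {
  console.log(`wall clock: ${wallMs.toFixed(0)} ms, CPU used: ${cpuMs.toFixed(1)} ms`);
});
```

The total wall-clock time should be close to one delay rather than a thousand of them, and the CPU time reflects only the bookkeeping, not the waiting.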
You can, if you want to, execute javascript asynchronously
Like many other languages, modern javascript has multi-threading support in the form of Web Workers in the browser and worker_threads in node.js. But this is regular multi-threading, as in any other language, where you deliberately start another thread to execute your code in parallel.
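As an illustration of the worker_threads route, here is a minimal sketch assuming Node.js with CommonJS. The function name sumInWorker and the summing task are made up for the example; the worker source is passed as a string only to keep everything in one file.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source as a string (used with { eval: true }) so the example is one file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData; i++) sum += i; // CPU-bound work
  parentPort.postMessage(sum);
`;

// Hypothetical helper: runs the loop on another thread and resolves with the result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const answer = sumInWorker(1000000);
answer.then((sum) => console.log('sum computed off the main thread:', sum));
```

The main thread only sees a message event when the worker is done, so from its point of view this is still the same callback model.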

Does event loops polls for event completion or kernel/os notifies back?

When Node.js starts, it initializes the event loop, processes the provided input script which may make async API calls, schedule timers, or call process.nextTick(), then begins processing the event loop.
There are several phases (timers, pending callbacks, idle/prepare, poll, check and close callbacks in the Node.js guide), and each phase has its own FIFO queue of callbacks.
So the application makes a request, and the event demultiplexer gathers those requests and pushes them to the respective event queues.
For example, if my code makes two requests, one a setTimeout() and the other some API call, the demultiplexer will push the first one to the timer queue and the other to the poll queue.
The loop watches over those queues and events, and on completion it pushes the registered callback to the call stack, where it is processed.
My question is,
1) Who hands the events in the event queue over to the OS?
2) Does the event loop poll for event completion in each event queue, or does the OS notify it?
3) Where, and by whom, is it decided whether to call a native asynchronous API or hand over to a thread pool?
I am on the verge of understanding this, but I have been struggling a lot to grasp the concepts. There is a lot of false information about the node.js event loop and how it handles asynchronous calls using one thread.
Please answer these questions if possible. Below are the references where I could get some better insight.
https://github.com/nodejs/nodejs.org/blob/master/locale/en/docs/guides/event-loop-timers-and-nexttick.md
https://dev.to/lunaticmonk/understanding-the-node-js-event-loop-phases-and-how-it-executes-the-javascript-code-1j9
how does reactor pattern work in Node.js?
https://www.youtube.com/watch?v=PNa9OMajw9w&t=3s

Who hands the events in the event queue over to the OS?
How OS events work depends upon the specific type of event. Disk I/O works one way and Networking works a different way. So, you can't ask about OS events generically - you need to ask about a specific type of event.
Does the event loop poll for event completion in each event queue, or does the OS notify it?
It depends. Timers, for example, are built into the event loop: on each pass through the loop, the head of the timer list is checked to see whether its time has come. File I/O is handled by a thread pool; when a disk operation completes, the thread inserts a completion event into the appropriate queue directly, so the event loop will just find it there the next time through.
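The phase ordering can be observed from JavaScript. The sketch below schedules everything from inside an I/O callback, where the Node.js guide says setImmediate runs before a zero-delay timer. It assumes CommonJS and uses fs.stat on the current directory purely as a convenient I/O operation.

```javascript
const fs = require('node:fs');

const order = [];

// Schedule everything from inside an I/O callback (poll phase).
fs.stat('.', () => {
  setTimeout(() => {
    order.push('setTimeout 0'); // timers phase of the next loop iteration
    console.log(order.join(' -> '));
  }, 0);
  setImmediate(() => order.push('setImmediate'));         // check phase
  process.nextTick(() => order.push('process.nextTick')); // right after this callback
  Promise.resolve().then(() => order.push('promise'));    // microtask, after nextTick
  order.push('end of I/O callback');
});
```

When scheduled from the main module instead of an I/O callback, the relative order of setTimeout 0 and setImmediate is not guaranteed, which is why the example wraps everything in fs.stat.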
Where, and by whom, is it decided whether to call a native asynchronous API or hand over to a thread pool?
This was up to the designers of nodejs and libuv and varies for each type of operation. The design is baked into nodejs and you can't change it yourself. Nodejs generally uses libuv for cross-platform OS access so, in most cases, it's up to the libuv design how different types of OS calls are handled. In general, if all the OSes that nodejs runs on offer a non-blocking, asynchronous mechanism, then libuv and nodejs will use it (as for networking). If they don't (or it's problematic to make them all work similarly), then libuv builds its own abstraction (as with file I/O and a thread pool).
You do not need to know the details of how this works to program asynchronously in nodejs. You make a call and get a callback (or resolved promise) when it's done, regardless of how it works internally. For example, nodejs offers some asynchronous crypto APIs. They happen to be implemented using a thread pool, but you don't need to know that in order to use them.
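The crypto case looks like this from the caller's side. This is a minimal sketch assuming CommonJS; the password, salt and iteration count are placeholder values for illustration only.

```javascript
const crypto = require('node:crypto');

const order = [];

order.push('start');
// The async variant does the hashing off the main thread; we only get a callback.
crypto.pbkdf2('secret', 'salt', 100000, 32, 'sha256', (err, key) => {
  if (err) throw err;
  order.push('key ready');
  // prints: start -> main thread continues -> key ready | key bytes: 32
  console.log(order.join(' -> '), '| key bytes:', key.length);
});
order.push('main thread continues');
```

Nothing in the calling code reveals whether a thread pool or an OS notification mechanism is behind the callback.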

Is javascript itself synchronous and it's the environment that is asynchronous?

Native javascript is single threaded and synchronous. Only a few objects can run asynchronously and get added to the callback queue such as HTTP requests, timers and events. These asynchronous objects are a result of the actual javascript environment and not javascript itself. setTimeout() seems to be the go to for examples for asynchronous code. That function gets moved to the web API container and then eventually the callback queue. There doesn't seem to be a way to write asynchronous code in javascript that doesn't involve using objects that get moved to the Web API container. I can write my own custom objects with callbacks, but the most that will do is structure it to run in the proper order. You can never write javascript code that runs in parallel that doesn't rely on those objects.
This is my understanding of how it works, if I am wrong please correct me.

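setTimeout and setInterval are just a convenient way to trigger some asynchronous behavior for an example. Strictly speaking they are not part of the ECMAScript language specification; they are supplied by the host environment. In practice nearly every environment (browsers, Node.js) provides them, so they are not specific to the browser.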
But all other sources of async code depend on some external process. When making an HTTP request, your javascript thread tells the browser to make a request (what headers, what url, etc). The browser, according to its own compiled internals, then formats the request, sends it, waits for a response, and eventually adds an item to javascript's event loop to be processed the next time the event loop runs. File system access and database queries are two other common examples of async code that depends on external processes (the OS and a database, respectively).
How javascript handles async code in a single-threaded process is all down to this event loop. In pseudocode, the event loop is basically this:
while (queue.waitForMessage()) {
    queue.processNextMessage();
}
setTimeout tells the environment to pop something into that queue at some point of the future. But the processing of that queue is single threaded. Only one event message can be handled at the same time, but any number can be added to that queue.
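To make the pseudocode concrete, here is a toy queue in plain JavaScript. It is only a sketch: the class name MessageQueue and its methods are invented for the example, and unlike a real event loop it stops when the queue is empty instead of sleeping until the next message arrives.

```javascript
// Toy message queue; not how any real engine implements its loop.
class MessageQueue {
  constructor() {
    this.messages = [];
  }

  // Anything may add a message at any time.
  post(callback) {
    this.messages.push(callback);
  }

  // Messages are processed strictly one at a time, in FIFO order.
  run() {
    const log = [];
    while (this.messages.length > 0) {
      const callback = this.messages.shift();
      callback(log); // runs to completion before the next message starts
    }
    return log;
  }
}

const queue = new MessageQueue();
queue.post((log) => {
  log.push('first');
  queue.post((l) => l.push('third (queued by first)'));
});
queue.post((log) => log.push('second'));

const log = queue.run();
console.log(log); // [ 'first', 'second', 'third (queued by first)' ]
```

A message posted while another is running has to wait its turn, which is the same reason a callback never interrupts running code.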
You can get true concurrency with workers, but this basically adds a new javascript process that is itself single threaded, and has a method of communicating with messages to and from the main javascript process.

javascript - event driven and concurrency issues?

Greetings,
I've been studying javascript, nodejs. And I don't understand how the concurrency issues are avoided in javascript.
Let's say I'm working on an object
var bigObject = new BigObject();
and I have a setTimeout(function(){ workOnBigObject... } ) that will also do work on bigObject.
If I have disk IO being written into bigObject, a timer working on bigObject, and regular code reading from bigObject, how are concurrency issues avoided?
In a regular language, I would use a mutex or thread-safe queue/command pattern. I also don't see much discussion about race conditions for javascript.
Am I missing something?

The whole point of node.js is that it's event-driven. All the code runs in event handlers in a single thread. There are no concurrency issues because the code doesn't run concurrently. The downside is that each event handler must exit quickly because it blocks the other events.
In your example, the code will start the disk IO and exit immediately. The node.js infrastructure will notify the program that the IO operation was completed by running an event handler. The timer event will be called before or after the IO event, but never concurrently.
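A small sketch of that situation, assuming Node.js with CommonJS. The object shape and the helper appendPair are invented for the example, and fs.stat on the current directory is used only as a convenient I/O operation.

```javascript
const fs = require('node:fs');

const bigObject = { items: [] };

// A two-step update. Because each handler runs to completion,
// no other handler can ever see the state between the two pushes.
function appendPair(source) {
  bigObject.items.push(`${source}:begin`);
  bigObject.items.push(`${source}:end`);
}

setTimeout(() => appendPair('timer'), 0); // timer event
fs.stat('.', () => appendPair('io'));     // I/O completion event
appendPair('main');                       // regular code

// Print once both events have most likely been handled.
setTimeout(() => console.log(bigObject.items), 50);
```

Whether the timer or the I/O handler runs first can vary, but each begin is always immediately followed by its own end.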

Javascript is single-threaded. If the time arrives when your function is supposed to execute (based on how you called setTimeout), and the parent code is still running, the function will not execute until the parent code has completed.
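This is easy to check with a sketch. The 10 ms and 200 ms figures are arbitrary values picked for the example.

```javascript
const start = Date.now();
let firedAfterMs = null;

// Ask for the callback after 10 ms.
setTimeout(() => {
  firedAfterMs = Date.now() - start;
  console.log(`timer requested for 10 ms, fired after ${firedAfterMs} ms`);
}, 10);

// The parent code keeps the thread busy for about 200 ms.
while (Date.now() - start < 200) {
  // busy-wait: nothing else can run on this thread meanwhile
}
const busyMs = Date.now() - start;
console.log(`parent code finished after ${busyMs} ms`);
```

The timer callback cannot run until the loop ends, so it reports roughly 200 ms rather than 10 ms.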

There is only a single thread; see: Node.js on multi-core machines
I would speculate that this is because multiple threads are not supported in the underlying V8 JavaScript engine, since JavaScript typically executes within a browser (where, in the Windows case, there is only a single UI thread) and does not support multiple threads.

There is this thing in javascript called Run-to-Completion, which ensures that once a piece of code starts executing it runs completely before any other (asynchronous) code runs; hence, no concurrency issues.
In your example, whenever the timer callback is called it will execute completely and will never be pre-empted in the middle to execute some other code.
See Why no concurrency control tool in javascript for more details.

What's an event-loop and how is it different than using other models?

I have been looking into Node.JS and all the documentation and blogs talk about how it uses an event-loop rather than a per-request model.
I am having some confusion understanding the difference. I feel like I am 80% there understanding it but not fully getting it yet.

A threaded model will spawn a new thread for every request. This means that you get quite some overhead in terms of computation and memory. An event loop runs in a single thread, which means you don't get the overhead.
The result of this is that you must change your programming model. Because all these different things are happening in the same thread, you cannot block. This means you cannot wait for something to happen because that would block the whole thread. Instead you define a callback that is called once the action is complete. This is usually referred to as non-blocking I/O.
Pseudo example for blocking I/O:
row = db_query('SELECT * FROM some_table');
print(row);
Pseudo example for non-blocking I/O:
db_query('SELECT * FROM some_table', function (row) {
    print(row);
});
This example uses lambdas (anonymous functions) like they are used in JavaScript all the time. JS makes heavy use of events, and that's exactly what callbacks are about. Once the action is complete, an event is fired which triggers the callback. This is why it is often referred to as an evented model or also asynchronous model.
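A runnable approximation of the non-blocking variant is sketched below. dbQuery is a hypothetical stand-in for a database driver; it fakes the round trip with a timer and returns a made-up row.

```javascript
// Hypothetical stand-in for a database driver: the "query" is simulated
// with a 20 ms timer and a made-up row.
function dbQuery(sql, callback) {
  setTimeout(() => callback({ id: 1, sql }), 20);
}

const order = [];

order.push('query sent');
dbQuery('SELECT * FROM some_table', (row) => {
  order.push('row received');
  console.log(order.join(' -> '), row);
});
// This line runs while the query is still "in flight".
order.push('handling other work');
```

The callback is the only place where the row is available, which is the change in programming model described above.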
The implementation of this model uses a loop that processes and fires these events. That's why it is called an event queue or event loop.
Prominent examples of event queue frameworks include:
EventMachine (Ruby)
Tornado (Python)
node.js (V8 server-side JavaScript)

Think of incoming requests or callbacks as events, that are enqueued and processed.
That is exactly what most GUI systems do. The system can't know when a user will click a button or interact in some other way. But when they do, the event is propagated to the event loop, which is basically a loop that checks for new events in the queue and processes them.
The advantage is that you don't have to wait for results yourself. Instead, you register callback functions that are executed when the event is triggered. This lets the framework handle the I/O, and you can rely on its internal efficiency for long-running actions instead of blocking your own process.
In short, everything runs in parallel except your code. There will never be two callback functions running concurrently: the event loop is a single thread. The processes that execute work externally and finally propagate events, however, can be distributed across multiple threads/processes.

An event loop lets you make use of the time it takes to talk to the hard drive or network. Take this list of latencies:
Source  | CPU Cycles
L1      | 3 Cycles
L2      | 14 Cycles
RAM     | 250 Cycles
Disk    | 41,000,000 Cycles
Network | 240,000,000 Cycles
The time you spend waiting on curl in PHP is just wasted CPU.
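To put the cycle counts in more familiar units, they can be converted to time. The sketch below assumes a 3 GHz clock, which is my assumption and not something the table states; the helper name cyclesToMs is made up.

```javascript
const CLOCK_HZ = 3e9; // assumed 3 GHz clock (not given in the table)

// Cycle counts copied from the table above.
const cycles = {
  L1: 3,
  L2: 14,
  RAM: 250,
  Disk: 41000000,
  Network: 240000000,
};

// Convert a cycle count to milliseconds at the given clock rate.
function cyclesToMs(n, clockHz = CLOCK_HZ) {
  return (n / clockHz) * 1000;
}

for (const [source, n] of Object.entries(cycles)) {
  console.log(`${source.padEnd(8)} ${cyclesToMs(n).toFixed(6)} ms`);
}
```

At that clock rate the disk figure works out to roughly 14 ms and the network figure to 80 ms, during which a blocked thread does nothing useful.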
