Node.js uses an event-driven model in which only one thread executes the events. I understand the first event executed will be the user's JS code. A simple web server example from the Node.js website is below:
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');
The first event executed will perform the above steps. Later on, the event loop will wait for events to be enqueued. My question is: which thread enqueues events? Is there a separate thread that does that? If yes, can multiple threads enqueue events?
Thank you.
My question is: which thread enqueues events? Is there a separate thread that does that? If yes, can multiple threads enqueue events?
At its core, the javascript interpreter is a loop around the select() system call. The select() system call informs the OS about the filehandles that your program is interested in, and the OS will block execution of your program until there are events on any of those filehandles. Basically, select() will return if there's data on some I/O channel your program is interested in.
In pseudocode, it looks something like this:
while(1) {
select(files_to_write,files_to_read,NULL,NULL,timeout)
process_writable(files_to_write)
process_readable(files_to_read)
timeout = process_timers()
}
This single function allows the interpreter to implement both asynchronous I/O and setTimeout/setInterval. (Technically, this is only partly true. Node.js uses either select or poll or epoll etc. based on what functions are available and what functions are more efficient on your OS)
So, who enqueues events? The OS.
There is one exception. Disk I/O on node.js is handled by a separate thread. So for this I/O it is this disk I/O thread that enqueues events to the event loop.
It could be implemented without threads. For example, the Tcl programming language (which predates javascript but also has built-in event loop) implements disk I/O in the main thread using features such as kqueue (on BSD based OS like MacOS) or aio (on Linux and several other OSes) or overlapped-i/o (Windows). But the node.js developers simply chose threading to handle disk i/o.
For more on how this works at the C level, see this answer: I know that callback function runs asynchronously, but why?
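As a rough illustration of the ordering described above, here is a minimal sketch. The `order` array and its labels are made up for illustration. Synchronous code runs to completion first; the timer callback and the file-system callback are only recorded once the event loop gets around to them.

```javascript
const fs = require('fs');

// Hypothetical log of the order in which things run.
const order = [];

order.push('sync start');

// Timer: serviced by the event loop after the current code finishes.
setTimeout(() => {
  order.push('timer');
}, 0);

// File-system call: performed off the main thread; its callback is
// queued for the event loop once the result is ready.
fs.stat('.', (err) => {
  order.push(err ? 'stat failed' : 'stat done');
});

order.push('sync end');

// Nothing asynchronous has run yet at this point.
console.log(order); // [ 'sync start', 'sync end' ]

// Print again later; the relative order of the timer and stat entries
// is not guaranteed, so no expected output is given here.
setTimeout(() => console.log(order), 100);
```

Both callbacks run on the same single JavaScript thread; only the waiting (and, for the file system, the actual work) happens elsewhere.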
Related
I want to understand the internal workings of node.js, so I am intentionally including a computation task (a for loop). But I see it is still blocking the main thread.
Here is my script
console.log("start");
for (let i = 0; i < 10; i++) {
console.log(i)
}
console.log("end")
And the o/p is :
start
0
1
2
....
9
end
But according to the node.js architecture, shouldn't heavy computation tasks be executed by a different thread picked from the thread pool while the event loop continues executing non-blocking tasks?
I am referencing the node.js internal architecture as described at this link.
Can someone please explain the architecture and behavior of the script?
By default, nodejs uses only ONE thread to run your Javascript with. That means that (unless you engage WorkerThreads which are essentially an entirely separate VM), only one piece of Javascript is ever running at once. Nodejs does not "detect" some long running piece of Javascript and move it to another thread. It has no features like that at all. If you have some long running piece of synchronous Javascript, it will block the event loop and block all other Javascript and all other event processing.
Internal to its implementation, nodejs has a thread pool that it uses for certain types of native code (internal implementations of file I/O and crypto operations). That only supports the implementation of asynchronous implementations for file I/O and crypto operations - it does not parallelize the running of Javascript.
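To make the thread pool's role a bit more concrete, here is a small sketch using `crypto.pbkdf2`, one of the crypto operations that runs in native code on the pool. The password, salt and iteration count are arbitrary values picked for illustration.

```javascript
const crypto = require('crypto');

// Ids of the key derivations that have completed so far.
const finished = [];

// Start four key derivations. Each one runs in native code on a
// thread-pool thread; the JavaScript callbacks run on the main thread.
for (let id = 0; id < 4; id++) {
  crypto.pbkdf2('secret', 'salt', 10000, 32, 'sha256', (err, key) => {
    if (err) throw err;
    finished.push(id);
    console.log(`hash ${id} done, ${key.length} bytes`);
  });
}

// This line runs before any callback, because the main thread has not
// yet returned to the event loop.
console.log('scheduled, finished so far:', finished.length); // scheduled, finished so far: 0
```

The hashing itself can proceed in parallel, but every `console.log` inside a callback still runs one at a time on the single JavaScript thread.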
So, the script you show:
console.log("start");
for (let i = 0; i < 10; i++) {
console.log(i)
}
console.log("end")
is entirely synchronous. It runs sequentially and blocks all other Javascript while it runs, because it occupies the one thread that executes Javascript.
Nodejs gets its excellent scalability from its asynchronous I/O model that does not have to use a separate thread in order to have lots of asynchronous operations in flight at the same time. But, keep in mind that these asynchronous I/O operations all have native code behind them (some of which may use threads in their native code implementations).
But, if you have long running synchronous Javascript operations (like say something like image analysis written in Javascript), then those typically need to be moved out of the main event loop thread either by shunting them off to WorkerThreads or to other processes or to a native code implementation that may use OS threads.
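For the WorkerThreads route, a minimal sketch could look like the following. The helper name `sumInWorker` and the inline worker source are made up for illustration; the worker code is passed as a string with `eval: true` only so the example fits in one file, whereas a real project would normally keep it in its own module.

```javascript
const { Worker } = require('worker_threads');

// Source of the worker, passed as a string so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 0; i < workerData.n; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Hypothetical helper: sums 0..n-1 on a separate thread and resolves
// with the result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

sumInWorker(1000).then((sum) => console.log('sum from worker:', sum));
console.log('main thread is free while the worker computes');
```

The second `console.log` prints first: the main thread keeps going while the loop runs in the worker, and the result arrives later as a message event.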
But according to the node.js architecture, shouldn't heavy computation tasks be executed by a different thread picked from the thread pool while the event loop continues executing non-blocking tasks?
No, that is not how nodejs works and is not a correct interpretation of the diagram you show. The thread pool is NOT used for running your Javascript. It is used for the internal implementation of some APIs such as file I/O and some crypto operations. There is just one main thread for running your Javascript (unless you specifically run your code in a WorkerThread).
I want to understand the internal workings of node.js, so I am intentionally including a computation task (a for loop). But I see it is still blocking the main thread.
Yes, a for loop (that does not contain an await statement that is awaiting a promise) will completely occupy the single Javascript thread and will block the event loop from processing other events while the for loop is running.
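A quick way to see this blocking for yourself is the sketch below. The 200 ms figure is arbitrary, and the variable names are only for illustration.

```javascript
const start = Date.now();
let firedAfterMs = null;

// Ask for a callback as soon as possible.
setTimeout(() => {
  firedAfterMs = Date.now() - start;
  console.log(`timer fired after ${firedAfterMs} ms`);
}, 0);

// Synchronous busy loop: occupies the only JavaScript thread for ~200 ms.
while (Date.now() - start < 200) {
  // spin
}

// The timer is overdue but has not run, because the loop never yielded.
console.log('loop finished, timer fired yet?', firedAfterMs !== null); // loop finished, timer fired yet? false
```

The timer was requested with a delay of 0, yet it can only fire after the loop ends and control returns to the event loop.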
JS executes its code synchronously. A few things appear "asynchronous", like setInterval or setTimeout for example, but that is not fully true if asynchronous is taken to mean that things get done in parallel. Take a look at setTimeout: by calling it you add the function to the task queue; later the event loop grabs it from the queue, puts it onto the stack and executes it synchronously. If you want to execute something truly in parallel, consider using a worker thread.
There are absolutely no threads in JS (unless you explicitly use worker threads). Javascript uses cooperative multi-tasking which means that a function will always complete before the next one will start. The only other way to yield control back to the scheduler is to separate a task out into another function that is called asynchronously. So in your example, e.g., you could do:
console.log("start");
setTimeout(() => {
for (let i = 0; i < 10; i++) {
console.log(i)
}}, 0);
console.log("end")
and you would get:
start
end
0
1
2
..
9
This also answers your question about heavy computations: unless you use the relatively new worker threads, you cannot run heavy computations in node.js "in the background" without the use of native code.
So if you really have heavy loads you have three options:
worker threads,
native code that is multi-threaded, e.g., written in C/C++, or
breaking your computation down into small pieces, each one yielding control back to the scheduler when done (e.g., using map/reduce).
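The third option can be sketched roughly as follows. `sumInChunks` is a hypothetical helper, and the chunk size is an arbitrary choice; `setImmediate` is used here as one way to hand control back to the event loop between slices.

```javascript
// Hypothetical helper: sums 0..n-1 in slices of `chunkSize` iterations,
// yielding to the event loop between slices via setImmediate.
function sumInChunks(n, chunkSize, done) {
  let i = 0;
  let sum = 0;

  function runChunk() {
    const end = Math.min(i + chunkSize, n);
    for (; i < end; i++) sum += i;

    if (i < n) {
      setImmediate(runChunk); // let pending events run, then continue
    } else {
      done(sum);
    }
  }

  runChunk();
}

sumInChunks(1000000, 10000, (sum) => console.log('chunked sum:', sum));

// Other callbacks, like this timer, no longer have to wait for the
// whole computation; they can run in the gaps between chunks.
setTimeout(() => console.log('timer callback ran'), 0);
```

The total work is the same, and it still runs on the one JavaScript thread; the only gain is that no single stretch of synchronous code holds the thread for long.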
What I don't understand is this: nodejs is asynchronous, so I can run multiple tasks no matter how long they take, because the interpreter continues to the other tasks and, when the long task is completed, lets me know through a callback function. So where is the blocking for CPU-intensive tasks? Even if a task takes 10 seconds, the other lines of JS code will continue to execute and start other tasks. As NodeJs uses only 4 threads for heavy tasks, I understand that all these threads will be busy, and this is the scenario that explains why heavy CPU tasks should not be used with nodejs. Am I right?
var listener = readAsync("I/O heavy calculation", function(){
console.log("I run after the I/O is done.");
})
//The parse will send the request and pass to the next line of code
console.log("I run before I/O request is done")
I expect the globally declared console.log to run before the callback function.
Nodejs programs are single threaded by design. The event loop decides which pending callback runs next. There are ways to optimize performance at scale; for example, the built-in cluster module distributes incoming connections across multiple worker processes.
https://nodejs.org/en/blog/release/v10.5.0/ Starting with node 10.5 there is multi-threading support, but it is experimental.
I know that CPU-intensive work in the main process will block the UI process. I have another question: does a long-running IO operation in the main process block the UI?
Recently, I have been using electron to develop a desktop file-management application.
Step 1:
My UI process uses asynchronous IPC (provided by Electron) to tell the main process to fetch the file list from the network (only the files' metadata, not their content).
Step 2:
The main process fetches the file list from the network, stores it in sqlite (I use TypeORM), then selects parts of the file list from sqlite and sends them back to the UI process.
Sometimes step 2 takes tens of seconds (for example, when I fetch 10000 file entries from the network), and my UI slows down.
So, I have two questions:
+ Does a long-running IO operation in the main process block the UI?
+ What's the best way to do IO operations (database or local file) in an electron application?
Potentially, I/O can block your application. Node offers blocking and non-blocking I/O operations. You'll want to use the non-blocking variants.
The Node docs have a section on blocking vs non-blocking I/O. Two code samples from that page, one blocking, one non-blocking:
// Blocking sample:
const fs = require('fs');
const data = fs.readFileSync('/file.md'); // blocks here until file is read
// Non-blocking sample (a separate script):
const fs = require('fs');
fs.readFile('/file.md', (err, data) => {
  if (err) throw err;
});
The second question ("what's the best way?") is opinionated and off-topic so I'll focus on the first:
Does a long-running IO operation in the main process block the UI?
No it does not. I/O in electron happens either from the Chromium side or the Node.js side - in both cases JavaScript's I/O execution model uses an event loop. The action is queued and then performed either on a threadpool in the background (like dns queries for example) or using native operating system async non-blocking I/O facilities (like socket writes).
The one caveat is that browsers do offer some (older) APIs that are blocking (like a synchronous XMLHttpRequest), but you are likely not using those.
For more details see our event loop and timers tutorial.
From here I have found that node.js implements a non-blocking I/O model, but I don't understand how.
Javascript is single threaded, so how can a single thread do I/O operations and simultaneously continue executing the rest of the program?
It is true that operations such as sleep will be blocking the thread. But I/O events can indeed be asynchronous.
Node.js uses an event loop for this. An event loop is “an entity that handles and processes external events and converts them into callback invocations”
Whenever data is needed nodejs registers a callback and sends the operation to this event loop. Whenever the data is available the callback is called.
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/ for more info
The I/O that is handled by node.js is multithreaded internally.
It is the programming interface that is single threaded and asynchronous.
Ryan Dahl: Original Node.js presentation at JSConf 2009 (Ryan is the creator of Node.js)
This video will answer your question.
The essence of JavaScript(v8) is event callbacks (button onclick="functions()" etc). That's how the I/O is multithreaded. We still have to write our code to not have blocking I/O by only writing callbacks; otherwise the code will wait on db.query responses and be blocking before executing the next line of code.
Behind the scenes in node, how does the http module's createServer method (and its callback) interact with the event loop? Is it possible to build functionality similar to createServer on my own in userland, or would this require a change to node's underlying system code?
That is, my general understanding of node's event loop is
Event loop ticks
Node looks for any callbacks to run
Node runs those callbacks
Event loop ticks again, process repeats ad infinitum
What I'm still a little fuzzy on is how createServer fits into the event loop. If I do something like this
var http = require('http');
// create an http server and handle with a simple hello world message
var server = http.createServer(function (request, response) {
//...
});
I'm telling node to run my callback whenever an HTTP request comes in. That doesn't seem compatible with the event loop model I understand. It seems like there's some non-userland and non-event loop that's listening for HTTP requests, and then running my callback if one comes in.
Put another way: if I think about implementing my own version of createServer, I can't think of a way to do it, since any callback I schedule will run once. Does createServer just use setTimeout or setInterval to constantly recheck for an incoming HTTP request? Or is there something lower level and more efficient going on? I understand I don't need to fully understand this to write efficient node code, but I'm curious how the underlying system was implemented.
(I tried following along in the node source, but the going is slow since I'm not familiar with the node module system, or the legacy assumptions w/r/t coding patterns deep in the system code)
http.createServer is a convenience method for creating a new http.Server() and attaching the callback as an event listener to the request event. Of course the node http library implements the protocol parsing, as well.
There is no constant polling of the event loop; node waits for the C++ TCP bindings to receive data on the socket, which then marshal that data as a buffer to your callback.
If you were to implement your own http parser, you would start with a net.Server object as your base. See node's implementation here: https://github.com/joyent/node/blob/master/lib/_http_server.js#L253
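As a rough userland sketch of that idea, the following builds a toy HTTP responder on top of `net.createServer`. The names `parseRequestLine` and `createTinyHttpServer` are made up for illustration, and the parsing is deliberately naive: it assumes the whole request head arrives in the first data chunk, which node's real parser does not assume.

```javascript
const net = require('net');

// Hypothetical helper: parse "GET /path HTTP/1.1" from the raw request text.
function parseRequestLine(raw) {
  const [method, url, version] = raw.split('\r\n')[0].split(' ');
  return { method, url, version };
}

// Hypothetical userland stand-in for http.createServer.
// onRequest receives the parsed request line and returns the body text.
function createTinyHttpServer(onRequest) {
  return net.createServer((socket) => {
    // Runs every time the socket delivers its first chunk of data.
    socket.once('data', (chunk) => {
      const request = parseRequestLine(chunk.toString('latin1'));
      const body = onRequest(request);
      socket.end(
        'HTTP/1.1 200 OK\r\n' +
        'Content-Type: text/plain\r\n' +
        `Content-Length: ${Buffer.byteLength(body)}\r\n` +
        'Connection: close\r\n\r\n' +
        body
      );
    });
  });
}

const server = createTinyHttpServer((req) => `Hello from ${req.url}\n`);
// server.listen(1337, '127.0.0.1'); // uncomment to accept connections
```

Nothing here polls: the connection and data callbacks are registered once and are invoked again for each new connection that the underlying TCP layer reports.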
The events library does the generation and handling of events, as mentioned by CrazyTrain in comments. It has the EventEmitter class, which is used for servers, sockets, streams, etc.
The event loop is, like you said, an infinite loop executing the callbacks on every tick. The callback provided to the http server is an event handler, specifically for the request event.
var server = http.createServer(function (request, response) //request handler
Event handlers can be executed multiple times. http.Server is an instance of EventEmitter. The way it handles incoming requests is that it first parses an incoming request. When parsed, it emits the request event. The EventEmitter then executes the callback for request with the parameters supplied.
You are right that EventEmitter is not part of the event loop. It needs to be implemented by the developer of the module or library, using only the handlers provided by the user of the module. But most importantly, it provides the necessary mechanism to implement events.
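To show why a handler registered once can run many times, here is a minimal sketch of a module built on EventEmitter. `TinyServer` and its `receive` method are hypothetical stand-ins for the parsing step that http.Server performs internally.

```javascript
const { EventEmitter } = require('events');

// Hypothetical module that emits 'request' each time it is fed a line.
class TinyServer extends EventEmitter {
  receive(line) {
    const [method, url] = line.split(' ');
    this.emit('request', { method, url });
  }
}

const seen = [];
const tiny = new TinyServer();

// Registered once, invoked on every 'request' event.
tiny.on('request', (req) => seen.push(`${req.method} ${req.url}`));

tiny.receive('GET /');
tiny.receive('POST /upload');

console.log(seen); // [ 'GET /', 'POST /upload' ]
```

Note that `emit` calls the listeners synchronously; the asynchronous part in a real server is only the arrival of the data that triggers the emit.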