I have read about the difference between the multi-threaded mechanism and the NodeJS single-threaded mechanism here. I know little about the concept of threads.
My question
The above article says that all non-blocking I/O is handled using a single thread in the event loop.
I have read through the questions posted in this forum, but all they give is an overview of how the single thread works, not the deeper mechanism. They say something like:
Starts processing the Client Request
If that client request does not require any blocking IO operations, then process everything, prepare the response and send it back to the client.
If there are 2 or more non-blocking requests in the event queue, the event loop takes each request and processes it.
The first request enters the event pool and starts processing; it does not wait or hold until the response, and meanwhile request 2 enters and starts processing without waiting.
Now, since the 2nd request has taken the thread for processing (and all requests are handled using a single thread), what is currently handling the 1st request? If there is thread sharing, how does it happen?
Is the first request released while the 2nd request is handled, and does the thread later come back to the 1st request? If so, how does that happen from the thread's perspective?
How does a single thread process 2 or more requests concurrently, given that a thread is basically assigned to a request until all of its processing is finished?
And how does a single thread handle both input and output operations at the same time?
Is there any topic I am missing that I should read to understand this single-threaded event loop mechanism?
First off, "single threaded" applies only to the one thread running your Javascript. node.js has other native threads for implementing some of the functions in the built-in library. For example, file I/O uses a thread pool in order to implement asynchronous file I/O. But what's most important to understanding how your own Javascript runs is that there is only one thread of your Javascript.
Let's imagine that you have a simple web server like this:
const http = require('http');
const fs = require('fs');
function sendFile(res, filename) {
    // map a URL such as "/1" to the file name "1.html"
    if (filename.startsWith("/")) {
        filename = filename.slice(1) + ".html";
    }
    // asynchronous read: the callback runs later, from the event queue
    fs.readFile(filename, (err, data) => {
        if (err) {
            res.writeHead(404);
            res.end("not found");
        } else {
            res.writeHead(200, {'Content-Type': 'text/html'});
            res.write(data);
            res.end();
        }
    });
}
const server = http.createServer((req, res) => {
    if (req.url === "/1" || req.url === "/2" || req.url === "/3") {
        sendFile(res, req.url);
    } else {
        res.writeHead(404);
        res.end("not found");
    }
});
server.listen(80);
This web server responds to requests for three URLs /1, /2 and /3.
Now imagine that three separate clients each request one of those URLs. Here's the sequence of events:
Client A requests http://myserver.com/1
Client B requests http://myserver.com/2
Client C requests http://myserver.com/3
Server receives incoming connection from client A, establishes the connection, client sends the request for /1 and the server reads and parses that request.
While the server is busy reading the request from client A, the requests from both client B and client C arrive.
The TCP stack handles incoming connections at the OS level (using other threads, i.e. kernel-level threads).
Notifications of the arriving connections are put in the node.js event queue. Because the node.js server is busy running Javascript for the client A connection, those two events sit in the event queue for now.
At the same time as those other connections are arriving, the node.js server is starting to run the request handler for /1. It finds a match in the first if statement and calls sendFile() for "/1".
sendFile() calls fs.readFile() and then returns. Because fs.readFile() is asynchronous, that file operation is started, but is handed over to the I/O system inside of node.js and then the function immediately returns. When sendFile() returns, it goes back to the http server request handler which also then returns. At this point, there's nothing else for this request to do. Control has been returned back to the interpreter to decide what to do next.
The node.js interpreter checks the event queue to see if there is anything in there to process. It finds the incoming request from client B and that request starts processing. This request goes through the same two steps as client A's request (the request handler runs and calls sendFile()) until it returns with another fs.readFile() operation initiated.
Then the same process is repeated for the incoming request from client C.
Then, some short time later, one of the three fs.readFile() operations that were previously initiated completes and places a completion callback into the Javascript event queue. As soon as the Javascript interpreter has nothing else to do, it finds that event in the event queue and begins to process it. This calls the callback that was passed to fs.readFile() with the two parameters that that function expects and the code in the callback starts to execute.
Assuming the fs.readFile() operation was successful, it calls res.writeHead(), then res.write(), then res.end(). Those three calls all send data to the underlying OS TCP stack where it is then sent back to the client.
After res.end() returns, control is returned back to the interpreter and it checks the event queue for the next event. If another fs.readFile() callback is already in the event queue, then it is pulled out of the event queue and processed like the previous one. If the event queue is empty, then the interpreter waits until something is put in the event queue.
If there are 2 or more non-blocking requests in the event queue, the event loop takes each request and processes it.
node.js only runs one at a time. But the key is that asynchronous code in the request handler allows the handler to return control back to the system so that other events can be processed while that first request is waiting for its asynchronous operation to complete. This is a form of cooperative, non-pre-emptive multi-tasking. It's not multiple threads of Javascript. The first request handler actually starts an asynchronous operation and then returns (as if it was done). When it returns, the next event in the queue can start processing. At some later time when the asynchronous operation completes, it will insert its own event into the event queue and it will get back in line to use the single thread of Javascript again.
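Here is a small sketch of that cooperative hand-off. The sleep() helper and the timings are assumptions that stand in for real asynchronous I/O; handleRequest() is a hypothetical handler, not part of any node.js API.

```javascript
const order = [];

// Hypothetical stand-in for an asynchronous I/O operation that takes `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function handleRequest(name, ioTimeMs) {
    order.push(`${name}: runs until its async operation starts`);
    await sleep(ioTimeMs); // control goes back to the event loop here
    order.push(`${name}: back on the Javascript thread`);
}

// Request 1 has the slower I/O, so request 2 gets back in line first.
const allDone = Promise.all([
    handleRequest('request 1', 50),
    handleRequest('request 2', 10),
]).then(() => console.log(order.join('\n')));
```

Request 2 starts running while request 1 is still waiting, yet at no point do two pieces of Javascript run at the same time.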
The first request enters the event pool and starts processing; it does not wait or hold until the response, and meanwhile request 2 enters and starts processing without waiting.
Most of this has already been described above. If the Javascript thread is busy when request 2 enters the event queue, that request will sit in the event queue until the Javascript thread is no longer busy. It may have to wait a short period of time. But it won't have to wait until request 1 is done, only until request 1 returns control back to the system and is, itself, waiting for some asynchronous operation to complete.
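To see an event waiting in the queue while the Javascript thread is busy, here is a minimal sketch. The 50 ms busy loop is an artificial stand-in for synchronous work done by request 1, and the timer stands in for request 2's event.

```javascript
let callbackRan = false;

// Queue an event that is ready almost immediately.
setTimeout(() => {
    callbackRan = true;
    console.log('timer callback finally runs');
}, 0);

// Keep the Javascript thread busy for about 50 ms.
const start = Date.now();
while (Date.now() - start < 50) {
    // busy: nothing from the event queue can run in the meantime
}

const ranWhileBusy = callbackRan;
console.log('callback ran while the thread was busy?', ranWhileBusy);
```

The callback has been ready for a long time by the end of the loop, but it only runs after the currently executing code returns to the event loop.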
Now, since the 2nd request has taken the thread for processing (and all requests are handled using a single thread), what is currently handling the 1st request? If there is thread sharing, how does it happen?
While the 2nd request is using the Javascript thread, the 1st request is not running any Javascript. Its native code asynchronous operations may be running in the background (all asynchronous operations require some native code), but there is only one piece of Javascript running at any given time. So if the 2nd request is running some Javascript, then the first request is either waiting for its asynchronous operation to finish, or that operation has already finished and an event is sitting in the event queue waiting for the 2nd request to be done so that event can get processed.
Is the first request released while the 2nd request is handled, and does the thread later come back to the 1st request? If so, how does that happen from the thread's perspective?
This all works through the event queue. 1st request runs until it returns. 2nd request runs until it returns. When async operation from 1st request completes it inserts an item in the event queue. When the JS interpreter is free, it pulls that event from the event queue and runs it. There may be threads involved in native code implementations of asynchronous operations, but there is still only one thread of Javascript.
How does a single thread process 2 or more requests concurrently, given that a thread is basically assigned to a request until all of its processing is finished?
It never actually runs multiple pieces of Javascript concurrently. The Javascript from each different operation runs until it returns control back to the interpreter. Asynchronous operations (such as file I/O or networking operations) can run concurrently and those are managed by native code, sometimes using additional threads and sometimes not. File I/O uses a thread pool to implement non-blocking, asynchronous file I/O. Networking uses OS event notifications (select, epoll, etc...), not threads.
And how does a single thread handle both input and output operations at the same time?
It doesn't in your Javascript. It would typically read, then write. It doesn't do both "at the same time". Now, the TCP stack may be doing some actual parallel work inside the OS, but that's all managed by the OS and even that probably gets serialized at the network interface at some point. In short, your request handlers run on the single Javascript thread, whereas the underlying input/output work is carried out by the OS and by node.js's native code.
Is there any topic I am missing that I should read to understand this single-threaded event loop mechanism?
Read everything you can find about the Javascript event queue. Here are some references to get you started:
How does JavaScript handle AJAX responses in the background?
Where is the node.js event queue?
Node.js server with multiple concurrent requests, how does it work?
Asynchronous process handler in node
How does a single thread handle asynchronous code in Node.js?
Related
I am currently learning the Node.js platform in depth. As we know, Node.js is single-threaded, and if it executes a blocking operation (for example fs.readFileSync), the thread has to wait for that operation to finish. I decided to run an experiment: I created a server that responds with a huge amount of data from a file on each request.
const { createServer } = require('http');
const fs = require('fs');
const server = createServer();
server.on('request', (req, res) => {
    // blocking read: nothing else runs until the whole file is in memory
    const data = fs.readFileSync('./big.file');
    res.end(data);
});
server.listen(8000);
Also, I launched 5 terminals in order to make parallel requests to the server. I expected that while one request was being handled, the others would have to wait for the blocking operation from the first request to finish. However, the other 4 requests were answered concurrently. Why does this behavior occur?
What you're likely seeing is one of two things. Either some asynchronous part of the implementation inside of res.end() is what actually sends your large amount of data, or all the data is sent very quickly and serially but the clients can't process it fast enough to show it serially. Because the clients are each in their own separate process, they "appear" to show it arriving concurrently just because they're too slow to reveal the actual arrival sequence.
One would have to use a network sniffer to see which of these is actually occurring or run some different tests or put some logging inside the implementation of res.end() or tap into some logging inside the client's TCP stack to determine the actual order of packet arrival among the different requests.
If you have one server and it has one request handler that is doing synchronous I/O, then you will not get multiple requests processed concurrently. If you believe that is happening, then you will have to document exactly how you measured that or concluded that (so we can help you clear up your misunderstanding) because that is not how node.js works when using blocking, synchronous I/O such as fs.readFileSync().
node.js runs your JS as single threaded and when you use blocking, synchronous I/O, it blocks that one single thread of Javascript. That's why you should never use synchronous I/O in a server, except perhaps in startup code that only runs once during startup.
What is clear is that fs.readFileSync('./big.file') is synchronous so your second request will not get started processing until the first fs.readFileSync() is done. And, calling it on the same file over and over again will be very fast (OS disk caching).
But, res.end(data) is non-blocking, asynchronous. res is a stream and you're giving the stream some data to process. It will send out as much as it can over the socket, but if it gets flow controlled by TCP, it will pause until there's more room to send on the socket. How much that happens depends upon all sorts of things about your computer, its configuration and the network link to the client.
So, what could be happening is this sequence of events:
First request arrives and does fs.readFileSync() and calls res.end(data). That starts sending data to the client, but returns before it is done because of TCP flow control. This sends node.js back to its event loop.
Second request arrives and does fs.readFileSync() and calls res.end(data). That starts sending data to the client, but returns before it is done because of TCP flow control. This sends node.js back to its event loop.
At this point, the event loop might start processing the third or fourth requests, or it might service some more events (from inside the implementation of res.end() or the writeStream from the first request) to keep sending more data. If it does service those events, it could give the appearance (from the client point of view) of true concurrency of the different requests.
Also, the clients themselves could be causing the responses to appear concurrent. Each client is reading a different buffered socket and if they are all in different terminals, then they are multi-tasked. So, if there is more data on each client's socket than it can read and display immediately (which is probably the case), then each client will read some, display some, read some more, display some more, etc... If the delay between sending each client's response on your server is smaller than the delay in reading and displaying on the client, then the clients (which are each in their own separate processes) are able to run concurrently.
When you are using asynchronous I/O such as fs.readFile(), then properly written node.js Javascript code can have many requests "in flight" at the same time. They don't actually run concurrently at exactly the same time, but one can run, do some work, launch an asynchronous operation, then give way to let another request run. With properly written asynchronous I/O, there can be an appearance from the outside world of concurrent processing, even though it's more akin to sharing of the single thread whenever a request handler is waiting for an asynchronous I/O request to finish. But, the server code you show is not this cooperative, asynchronous I/O.
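For comparison, here is a sketch of what the same handler could look like with asynchronous I/O. The 500 status and its message are assumptions added for illustration, and the listen() call is left commented out so that the snippet can be loaded without occupying a port.

```javascript
const { createServer } = require('http');
const fs = require('fs');

function handleRequest(req, res) {
    // Start the read and return right away; the callback runs later from the event queue.
    fs.readFile('./big.file', (err, data) => {
        if (err) {
            res.statusCode = 500; // assumption: report read failures as a server error
            res.end('could not read file');
            return;
        }
        res.end(data);
    });
    // At this point the handler is done and other requests can be serviced.
}

const server = createServer(handleRequest);
// server.listen(8000); // uncomment to actually serve requests
```

With this version, the Javascript thread is free while the file is being read, so a second request handler can start before the first response has been sent.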
Maybe this is not directly related to your question, but I think it is useful.
You can use a stream instead of reading the full file into memory, for example:
const { createServer } = require('http');
const fs = require('fs');
const server = createServer();
server.on('request', (req, res) => {
    const readStream = fs.createReadStream('./big.file'); // Here we create the stream.
    readStream.pipe(res); // Here we pipe the readable stream to the res writeable stream.
});
server.listen(8000);
The point of doing this is:
Looks nicer.
You don't store the full file in RAM.
This works better because it is non-blocking, and the res object is already a stream, which means the data will be transferred in chunks.
OK, so streams = chunked.
Why not read chunks from the file and send them in real time, instead of reading a really big file and dividing it into chunks afterwards?
Also, why is this really important on a real production server?
Because every time a request is received, your code is going to load that big file into RAM. Add to that the fact that the server is concurrent, so you are expecting to serve multiple files at the same time. So let's do the most advanced math my poor education allows:
1 request for a 1gb file = 1gb in ram
2 requests for a 1gb file = 2gb in ram
etc
That clearly doesn't scale nicely, right?
Streams allow you to decouple that data from the current state of the function (inside that scope), so in simple terms it's going to be roughly (with the default fs.createReadStream chunk size of 64kb):
1 request for 1gb file = 64kb in ram
2 requests for 1gb file = 128kb in ram
etc
And also, the OS already hands file data to node (fs) piece by piece, so it works with streams end to end.
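To make the chunking visible, here is a small sketch that reads a temporary 200 KiB file through a stream and counts the chunks. The temp file name and the explicit 64 KiB highWaterMark are assumptions chosen so that the numbers are predictable; they are not part of the server above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway 200 KiB file to read back (hypothetical file name).
const file = path.join(os.tmpdir(), `stream-demo-${process.pid}.bin`);
const fileSize = 200 * 1024;
fs.writeFileSync(file, Buffer.alloc(fileSize));

let chunkCount = 0;
let largestChunk = 0;
let totalBytes = 0;

const finished = new Promise((resolve, reject) => {
    fs.createReadStream(file, { highWaterMark: 64 * 1024 })
        .on('data', (chunk) => {
            // Only this chunk needs to be in memory, not the whole file.
            chunkCount += 1;
            totalBytes += chunk.length;
            largestChunk = Math.max(largestChunk, chunk.length);
        })
        .on('error', reject)
        .on('end', resolve);
}).then(() => {
    fs.unlinkSync(file); // clean up the temp file
    console.log(`${totalBytes} bytes arrived in ${chunkCount} chunks; largest was ${largestChunk} bytes`);
});
```

No single chunk is larger than the highWaterMark, which is why memory use per request stays small no matter how big the file is.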
Hope it helps :D.
PS: Never use sync operations (blocking) inside async operations (non blocking).
In general, is an event loop only for IO? And what exactly is an IO job?
For example let's say that a request comes into NodeJs which is then making an outbound http request to an API to get some data while not blocking the user in the meantime.
Is that an IO job, and how would NodeJs handle it? What if, instead of the http request, I wanted to asynchronously make a lengthy calculation and then return a value to the user? Is that handled by the event loop too, despite being CPU bound?
In general, is an event loop only for IO?
I wouldn't count timers (setTimeout, setInterval) and scheduling (setImmediate, process.nextTick) as IO, but one could generally say that the events in the event loop are coming from the outside.
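As a rough illustration of these non-IO sources of events, the sketch below records when each kind of callback runs. The names in the order array are just labels. Note that the relative order of the setTimeout(..., 0) and setImmediate callbacks is not guaranteed when they are scheduled from the main module, so the sketch makes no claim about it.

```javascript
const order = [];

setTimeout(() => order.push('setTimeout'), 0);
setImmediate(() => order.push('setImmediate'));
process.nextTick(() => order.push('nextTick'));
order.push('synchronous code');

// Print the result once everything above has had a chance to run.
setTimeout(() => console.log(order), 50);
```

The synchronous code always finishes first, process.nextTick runs right after it, and the timer and setImmediate callbacks follow once the event loop gets going.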
And what exactly is an IO job?
That depends on the context you are talking about. Every program gets a certain input from the user and generates a certain output. In a terminal, for example, the input is your keypresses and the output is what is displayed. When talking about nodejs IO, one usually refers to network / file operations, or more generally: code not written in js.
For example let's say that a request comes into NodeJs which is then making an outbound http request to an API to get some data while not blocking the user in the meantime.
Is that an IO job, and how would NodeJs handle it?
Nodejs starts the request in native code and lets it proceed in the background (for network requests this typically relies on non-blocking sockets and OS notifications rather than a dedicated thread), while the main thread continues with other stuff in the meantime (other events on the event queue). Then, when the async request is done, the native code pushes the result onto the event queue; the event loop will pull it from there and execute callbacks etc.
What if, instead of the http request, I wanted to asynchronously make a lengthy calculation and then return a value to the user?
You have to run it on another thread in nodejs (or in another process); lengthy calculations are synchronous otherwise.
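One way to do that is the built-in worker_threads module. The sketch below is a minimal example under some assumptions: sumInWorker() is a hypothetical helper, the "lengthy calculation" is just a sum from 1 to n, and the worker code is passed inline with eval: true to keep the snippet in a single file.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: run a CPU-bound sum on a separate thread.
function sumInWorker(n) {
    return new Promise((resolve, reject) => {
        const worker = new Worker(`
            const { parentPort, workerData } = require('worker_threads');
            let sum = 0;
            for (let i = 1; i <= workerData; i++) sum += i;
            parentPort.postMessage(sum);
        `, { eval: true, workerData: n });
        worker.once('message', resolve); // result arrives as an event on the main thread
        worker.once('error', reject);
    });
}

const result = sumInWorker(1e6);
result.then((sum) => console.log('sum computed off the main thread:', sum));
console.log('main thread is free to handle other events meanwhile');
```

The main thread only sees the final message event, so the event loop stays responsive while the calculation runs.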
Is that handled by the event loop too, despite being CPU bound?
Everything lands on the event loop at some point, and everything gets executed on the CPU ...
I have been working with Node.js and I am wondering what exactly the listen method does in terms of the event loop. If I had a long-running request, does it mean that the server will never listen, since it can only do one piece of work at a time?
var http = require('http');
function handleRequest(request, response) {
    response.end('Some Response at ' + request.url);
}
var server = http.createServer(handleRequest);
server.listen(8083, function() {
    console.log('Listening...');
});
Is server.listen listening to some event?
You can think of server.listen() as starting your web server so that it is actually listening for incoming requests at the TCP level. From the node.js http documentation for .listen():
Begin accepting connections on the specified port and hostname.
The callback passed to server.listen() is optional. It is only called once to indicate that the server has been successfully started and is now listening for incoming requests. It is not what is called on every new incoming request. The callback passed to .createServer() is what is called for every new incoming request.
Multiple incoming requests can be in process at the same time, though due to the single-threaded nature of node.js only one request is actually executing JS code at once.
But, a long running request is generally idle most of the time (e.g. waiting for database I/O or disk I/O or network I/O) so other requests can be processed and run during that idle time. This is the async nature of node.js and why it is important to use asynchronous I/O programming with node.js rather than synchronous I/O processing because asynchronous I/O allows other requests to run during the time when node.js is just waiting for I/O.
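To illustrate the difference between the two callbacks, here is a sketch that starts a server on a random free port (port 0), sends it two requests, and counts how often each callback runs. The counters, the get() helper and the loopback address are assumptions made for the demonstration, and the snippet assumes that local loopback connections are allowed.

```javascript
const http = require('http');

let listenCalls = 0;
let requestCalls = 0;

const server = http.createServer((req, res) => {
    requestCalls += 1; // runs for every incoming request
    res.end('ok');
});

const finished = new Promise((resolve, reject) => {
    server.on('error', reject);
    server.listen(0, '127.0.0.1', () => {
        listenCalls += 1; // runs once, when the server starts listening
        const { port } = server.address();
        // Hypothetical helper: one GET request to our own server.
        const get = () => new Promise((done, fail) => {
            http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
                res.resume();
                res.on('end', () => done());
            }).on('error', fail);
        });
        get().then(get).then(() => server.close(() => resolve()), reject);
    });
});

finished.then(() => console.log({ listenCalls, requestCalls }));
```

After two requests, the listen callback has run once and the request handler has run twice.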
Yes, it basically binds an event listener to that port; similar to how event listeners work in your own code. Going more in depth would involve sockets, etc...
https://nodejs.org/api/net.html#net_server_listen_port_host_backlog_callback
The other answers are essentially correct, but I wanted to add more detail.
When you call createServer, the handler you pass in is what gets called on every incoming HTTP request. But that is merely setting that up: it does not actually start the server or start listening for connections. That doesn't happen until you call listen.
The (optional) callback for listen is just what gets called when the server has successfully started and is now listening for connections. Most of the time, it's simply used to log to the console that the server is started. You could also use it to record server start time for uptime monitoring. That callback is NOT invoked for every HTTP request: only once on server startup.
You don't even have to supply the callback for listen; it works fine without it. Here are some common variations (note that it's a good practice to let the port be specified by an environment variable, usually PORT; if that environment variable isn't set, there is a default):
// all in one line, no startup message
var server = http.createServer(handler).listen(process.env.PORT || 8083);
// two lines, no startup message
var server = http.createServer(handler); // server NOT started
server.listen(process.env.PORT || 8083); // server started, no confirmation
// most typical variation
var server = http.createServer(handler);
server.listen(process.env.PORT || 8083, function() {
    // server started, startup confirmed - note that this only gets called once
    console.log('server started at ' + Date.now());
});
So what happens with the socket's client scope on the 'disconnect' event?
I'm trying to avoid bad race conditions and logic flaws in my node.js + mongoose + socket.io app.
What I mean by scope is:
io.on('connection', function (client) {
    ///CLIENT SCOPE///
    //define private vars, store and operate client's session info//
    //receiving and sending client's socket signals//
});
Some background:
Let's say, for example, I implement some function that operates on the db by finding a room and writing the user to this room.
BUT, at the moment when the room (to be written to) has been found, but the user has not yet been written to it, he disconnects. On the disconnect event I must pull him out of his last room in the db, but I cannot, because it's not saved in the db at this moment yet.
The only way I see is to assign a bool value on the 'disconnect' event, which I can check before saving the guy into the room, and in the case of true not save him at all.
What I'm confused about is whether this bool would survive a disconnect event, as it is saved in the client's scope.
What happens with the scope? Is it completely wiped out on disconnect? Or is it wiped out only when everything that relies on this scope is finished?
I'm using 'forceNew': true to force socket.connect(); to reconnect immediately if something goes wrong (hypothetically) and a socket error is fired without the user really leaving the site.
If the user reconnects through this 'old' socket, does he get back his scope on the server, or has this socket's previous scope been wiped out on his disconnection, or wiped out on reconnection by the 'connection' event?
The client closure will remain alive as long as there is code running that uses that closure so you generally don't have to worry about that issue. A closure is essentially an object in Javascript and it will only be garbage collected when there is no active code that has a reference to anything inside the closure.
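The following sketch shows a flag in the client closure surviving a disconnect. It uses plain EventEmitter objects as stand-ins for the socket.io server and client, and a timer as a stand-in for the database lookup; the 'join' event name and the results array are hypothetical.

```javascript
const { EventEmitter } = require('events');

const io = new EventEmitter(); // stand-in for the socket.io server
const results = [];

io.on('connection', (client) => {
    // CLIENT SCOPE: this variable lives as long as something still references it.
    let disconnected = false;

    client.on('disconnect', () => {
        disconnected = true;
    });

    client.on('join', (room) => {
        // Pretend the room lookup in the db takes 20 ms.
        setTimeout(() => {
            // The closure is still alive here, even though the client is gone.
            results.push(disconnected ? `skip saving to ${room}` : `save to ${room}`);
            console.log(results);
        }, 20);
    });
});

// Simulate: connect, ask to join a room, disconnect before the lookup finishes.
const client = new EventEmitter();
io.emit('connection', client);
client.emit('join', 'lobby');
client.emit('disconnect');
```

The pending timer callback still sees the updated flag, so the closure was not wiped out by the disconnect.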
As for your concurrency issue with a socket being disconnected while you are writing to the DB, you are correct to recognize that this is an issue to be careful with. Exactly what you need to do about it depends a bit on how your database behaves. Because node.js runs single threaded, your node.js code writing to the database will run to completion before any disconnect event gets processed. This doesn't mean that the database write will have completed before the disconnect event starts processing, but it does mean that it will have already been sent to the database. So, if your database processes multiple requests in the order it receives them, then it will likely just do the right thing and you won't have any extra code to worry about.
If your database could actually process the delete before the write finishes (which seems unlikely), then you'd have to code up some protection for that. The simplest concept there is to implement a database queue on all database operations for a given user. I'd probably create an object with common DB methods on it to implement this queue and create a separate object in each client closure so they were local to a given user. When a database operation is in process, this object would have a flag indicating that an operation was in progress. If another database operation was called while this flag was set, that second operation would go in a queue rather than being sent directly to the database. Each time a database operation finishes, it checks the queue to see if there is a next operation waiting to run. If so, it runs it.
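A minimal sketch of such a per-user queue could look like this. OperationQueue, fakeDb and the timings are hypothetical names and values chosen for illustration; real database calls would take the place of the timers.

```javascript
class OperationQueue {
    constructor() {
        this.busy = false;  // true while an operation is in progress
        this.pending = [];  // operations waiting for their turn
    }

    // operation: a function that returns a promise
    run(operation) {
        return new Promise((resolve, reject) => {
            this.pending.push({ operation, resolve, reject });
            this.next();
        });
    }

    next() {
        if (this.busy || this.pending.length === 0) return;
        this.busy = true;
        const { operation, resolve, reject } = this.pending.shift();
        Promise.resolve()
            .then(() => operation())
            .then(resolve, reject)
            .finally(() => {
                this.busy = false;
                this.next(); // start the next queued operation, if any
            });
    }
}

// Hypothetical stand-ins for database calls; the write is deliberately slower.
const finishedOrder = [];
const fakeDb = (label, ms) => () => new Promise((resolve) => {
    setTimeout(() => { finishedOrder.push(label); resolve(label); }, ms);
});

const userQueue = new OperationQueue(); // one of these per client closure
const allDone = Promise.all([
    userQueue.run(fakeDb('write user to room', 30)),
    userQueue.run(fakeDb('remove user from room', 5)),
]).then(() => console.log(finishedOrder));
```

Even though the removal is faster, it only starts after the write has finished, so the two operations complete in the order they were requested.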
FYI, I have a simple node.js app (running on a Raspberry Pi with temperature sensors) that records temperature data and every so often writes that data to disk using async writes. Because new temperature data could arrive while I'm in the middle of my async writes of the data, I have a similar issue. I abstracted all operations on the data in an object and implemented a flag that indicates whether I'm in the process of writing the data. If any method calls arrive to modify the data while the flag is set, those operations go in a queue and are processed only when the write has finished.
As to your last question, the client scope you have in the io.on('connection', ...) closure is associated only with that particular connection event. If you get another connection event and thus this code is triggered to run again, that will create a new and separate closure that has nothing to do with the prior one, even if the client object is the same. Normally, this works out just fine because your code in this function will just set things up again for a new connection.
I have been reading a pretty good book on node and I am on the topic of how Node.js is a non-blocking framework. I come from a VB background, so I am used to seeing code run in sequence. For the code below, with regard to the non-blocking asynchronous framework: what is the event here? Shouldn't the event loop pick up on the "for", meaning this is the event, and going in sequence node should not do anything until i++ = var i?
The reason I ask is that I am thinking of an SNMP server-side application and I just cannot get my head around what node.js will do if I tell it to ping 10 devices. If the 7th IP is offline, I would have to wait for the snmp timeout to occur before going to the 8th. Is this correct?
var http = require('http'),
    urls = ['shapeshed.com', 'www.bbc.co.uk', 'edition.cnn.com'];
function fetchPage(url) {
    var start = new Date();
    http.get({ host: url }, function(res) {
        console.log("Got response from: " + url);
        console.log('Request took:', new Date() - start, 'ms');
    });
}
for(var i = 0; i < urls.length; i++) {
    fetchPage(urls[i]);
}
Coming from a VB background you have an advantage: VB is event driven too! Have you ever needed to call DoEvents() in your VB code? That's telling VB to run the pending events in the event queue.
The difference is that in VB the events are typically user triggered and UI based, mouse clicks and the like. Node, being primarily server side, has events primarily around I/O.
Your code never gets interrupted or blocked (unless you do it deliberately). In the code snippet above, for example, the call to http.get means "go get this URL, and call this callback when you're done." This kicks off the http request and returns immediately. So your for loop will spin through all your URLs, kicking off all the get operations, and then be finished.
At that point you return from your function, and node goes back to the event loop. Once a request is done, node schedules that request's callback onto the event loop and the callback will eventually run.
One thing to think about: what if one of the http requests finished before the for loop did? In that case the callback would be scheduled on the event loop. But you're not back to the event loop yet, you're still running your for loop. None of the callbacks will execute until you return from whatever function is currently running.
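Applied to the question about pinging devices, the behaviour can be sketched as follows. The ping() function, the device names and the delays are all made up for the illustration; a timer stands in for the real network round trip or timeout.

```javascript
const events = [];

// Hypothetical stand-in for an asynchronous ping that answers after `ms` milliseconds.
function ping(device, ms, callback) {
    events.push(`start ${device}`);
    setTimeout(() => callback(device), ms);
}

const devices = [['device1', 10], ['device2', 60], ['device3', 20]];

for (const [name, ms] of devices) {
    ping(name, ms, (device) => events.push(`done ${device}`));
}
events.push('loop finished');

// Print the recorded sequence once all the pings have answered.
setTimeout(() => console.log(events.join('\n')), 100);
```

All three pings are started before the loop finishes, and the slow device2 does not hold up device3: the callbacks run in the order the answers arrive.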
This kind of "don't do too much in an event handler" advice should sound very familiar to a VB programmer.
No. Asynchronous means that I/O (like an HTTP request) does not block; it is handled transparently in the background by node's native code. The call to http.get returns immediately. Thus, your for loop actually completes (in real time) before a single byte goes over the wire.
In the case of the http module, requests are actually queued in the background for you via the Agent class. In old versions of node (before v0.12), the default agent would only open 5 concurrent connections per host; newer versions have no such limit by default. You can change this by using a custom Agent.
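A custom Agent could be set up along these lines. This is only a sketch: the limit of 10 is an arbitrary assumption, and the actual request is left commented out so that nothing goes over the network when the snippet is loaded.

```javascript
const http = require('http');

// Custom agent: allow at most 10 concurrent sockets per host (arbitrary value).
const agent = new http.Agent({ maxSockets: 10 });

function fetchPage(url, callback) {
    // Requests made with this agent queue up once the per-host limit is reached.
    return http.get({ host: url, agent }, callback);
}

console.log('per-host socket limit for this agent:', agent.maxSockets);
// fetchPage('shapeshed.com', (res) => res.resume()); // would issue a real request
```

Passing the agent in the options object of http.get() is what ties a request to that agent's limits.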