Poll phase in NodeJS event loop - javascript

I was going through node docs for event loop and I got very confused.
It says -
timers: this phase executes callbacks scheduled by setTimeout() and
setInterval().
I/O callbacks: executes almost all callbacks with the exception of close callbacks, the ones scheduled by timers, and setImmediate().
idle, prepare: only used internally.
poll: retrieve new I/O events; node will block here when appropriate.
check: setImmediate() callbacks are invoked here.
close callbacks: e.g. socket.on('close', ...).
Then in detailed poll phase, they say that it executes timers scheduled with timer and also process i/o events in poll queue. My confusion is taht we already have timer phase and i/o callback phase for those callbacks, then what is the work done by poll phase. It also says that thread may sleep in poll phase but I don't get it properly.
My questions are-
Why poll phase is executing scripts for timers and i/o(s) when we already have timer and i/o callback phase ?
Is it like poll phase executes callbacks on behalf of timer and i/o callback phase and timer and callback phase is only for internal processing no callbacks are executed in this phase ?
Where can we place promises in this loop ? Earlier I thought that promises can be thought simply as callbacks and we can treat them like callbacks only, but in this video, he says that promises goes into an internal event loop, but does not talk in detail.
I am very confused at this point. Any help will be appreciated.

The poll phase boils down to an asynchronous I/O wait. Libuv will use different APIs depending on the OS but they all generally have the same pattern. I'm going to use select() as an example.
The poll is basically a system call like this:
select(maxNumberOfIO, readableIOList, writableIOList, errorIOList, timeout);
This function blocks. If no timeout value is specified it blocks forever.
The result is that node.js will not be able to execute any javascript as long as there is no I/O activity. This obviously makes it impossible to execute time-based callbacks like setTimeout() or setInterval().
Therefore, what node needs to do before calling such a function is to calculate what value to pass as timeout. It generally does this by going through the list of all timers and figure out the shortest amount of time it can wait for I/O (the next nearest timer) and use that as the timeout value. It basically processes all the timers but not to execute their callbacks, it does it to figure out the wait time.

Nodejs has 5 major phases.
1) timers phase.
2) pending call back phase.
3) poll phase
4) check (set immediate).
5) close
Answer to your questions.
1)The call backs to timers and check phase are executed in their respective phases and not in poll phase.
2)All the I/o related call backs and other are executed in the poll phase. The pending call back phase is only for system level callbacks like tcp errors, none of our concern
3)After each phase, node js has an internal event loop which resolves all the process.nextTick callbacks, and another smaller event loop which executes the resolved promises then callbacks i.e Promise.resolve.then() callbacks.

I was just reading about that myself. As far as the timers are concerned the documentation about the event loop gives a decent answer in the form of an example. Say a setTimeout timer is set to trigger after 100ms but an I/O process is in progress (in the polling phase) and requires more than 100ms to execute, say 150ms. Once it is finished the polling phase will then wrap back to the timer phase and execute the setTimeout later than the expected 100ms, at 150ms.
Hope that helps answer how the polling phase relates to the timer phase. In essence the polling phase, as I understand it, can 'make the decision' to run the timer phase again if necessary.

Related

Node.js Event-Loop Mechanism

I'm learning the mechanism of Event-Loop in Node.js, and I'm doing some exercises, but have some confusions as explained bellow.
const fs = require("fs");
setTimeout(() => console.log("Timer 1"), 0);
setImmediate(() => console.log("Immediate 1"));
fs.readFile("test-file-with-1-million-lines.txt", () => {
console.log("I/O");
setTimeout(() => console.log("Timer 2"), 0);
setTimeout(() => console.log("Timer 3"), 3000);
setImmediate(() => console.log("Immediate 2"));
});
console.log("Hello");
I expected to see the following output:
Hello
Timer 1
Immediate 1
I/O
Timer 2
Immediate 2
Timer 3
but I get the following output:
Hello
Timer 1
Immediate 1
I/O
Immediate 2
Timer 2
Timer 3
Would you please clarify for me how are these lines executed step by step.
First off, I should mention that if you really want asynchronous operation A to be processed in a specific order with relation to asynchronous operation B, you should probably write your code such that it guarantees that without relying on the details of exactly what gets to run first. But, that said, I have run into issues where one type of asynchronous operation can "hog" the event loop and starve other types of events and it can be useful to understand what's really going on inside if/when that happens.
Broken down to its core, your question is really about why Immediate2 logs before Timer2 when scheduled from within an I/O callback, but not when called from top level code? Thus it is inconsistent.
This has to do with where the event loop is in its cycle through various checks it is doing when the setTimeout() and setImmediate() are called (when they are scheduled). It is somewhat explained here: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/#setimmediate-vs-settimeout.
If you look at this somewhat simplified diagram of the event loop (from the above article):
You can see that there are a number of different parts to the event loop cycle. setTimeout() is served by the "timers" block at the top of the diagram. setImmediate() is served in the "check" block near the bottom of the diagram. File I/O is served in the "poll" block in the middle.
So, if you schedule both a setImmediate(fn1) and a setTimeout(fn2, 0) from within a file I/O callback (which is your case for Intermediate2 and Timer2), then the event loop processing happens to be in the poll phase when these two are scheduled. So, the next phase of the event loop is the "check" phase and the setImmediate(fn1) gets processed. Then, after the "check" phase and the "close callbacks" phase, then it cycles back around to the "timers" phase and you get the setTimeout(fn2,0).
If, on the other hand, you call those same two setImmediate() and setTimeout() from code that runs from a different phase of the event loop, then the timer might get processed first before the setImmediate() - it will depend upon exactly where that code was executed from in the event loop cycle.
This structure of the event loop is why some people describe setImmediate() as "it runs right after I/O" because it's positioned in the loop to be processed right after the "poll" phase. If you are in the middle of processing some file I/O in an I/O callback and you want something to run as soon as the stack unwinds, you can use setImmediate() to accomplish that. It will always run after the current I/O callback finishes, but before timers.
Note: Missing from this simplified description is promises which have their own special treatment. Promises are considered microtasks and they have their own queue. They get to run a lot more often. Starting with node v11, they get to run in every phase of the event loop. So, if you have three pending timers that are ready to run and you get to the timer phase of the event loop and call the callback for the first pending timer and in that timer callback, you resolve a promise, then as soon as that timer callback returns back to the system, then it will serve that resolved promise. So, microtasks (such as promises and process.nextTick()) get served (if waiting to run) between every operation in the event loop, not just between phases of the event loop, but even between pending events in the same phase. You can read more about these specifics and the changes in node v11 here: New Changes to the Timers and Microtasks in Node v11.0.0 and above.
I believe this was done to improve the performance of promise-related code as promises became more of a central part of the nodejs architecture for asynchronous operations and there is also some standards-related work in this area too to make this consistent across different JS envrionments.
Here's another reference that covers part of this:
Nodejs Event Loop - interaction with top-level code
The reason for this output is the asynchronous nature of javascript.
You set the first 2 outputs in a sort of timeout with the execution time to be 0 this makes them still wait a tick.
Next you have the file read which takes a while to be finished and thus delays the execution of the functions in the callback
The first console.log within the callback is fired as soon as the callback is executed and the rest within the callback follows the first part of your code
Lastly you have the console.log at the bottom which gets executed at first because there is no delay for it and it does not need to wait till the next tick.
As some added help, check out this video.
https://youtu.be/cCOL7MC4Pl0
The presenter gives an amazing talk on the event loop. I think it is a great resource.
While this is particularly for the browser, many aspects are shared in Node.

JS event Loop confuse

I am pretty new to JS event loop, I wonder if anyone could give me a brief walk thru about how js engine run this:
function start(){
setTimeout(function(){
console.log("Timeout")
}, 0)
setImmediate(function(){
console.log("Immediate")
})
process.nextTick(function(){
console.log("next tick")
})
}
The result is :
next tick
Timeout
Immediate
I thought when JS engine runs this,
it parses the script, set Timer, set Immediate, set nextTick queue, then goes into POLL stage, checks if anything queued there(nothing in this case)
Before moving to CHECK stage, it runs nextTick queue, print "next tick".
Moves to CHECK stage, run immediate queue, print "Immediate"
Loops back to TIMER stage, print "Timeout"
My confuse is why setTimeout print out before Immediate?
PS,
After I set Timeout delay from 0 to 3 or more, then the order is:
next tick
Immediate
Timeout
But this still does not explain why previous order JS event loop runs.
I wonder is there anything I missed?
setImmediate queues the function behind whatever I/O, event, callbacks that are already in the event queue. So, in this case setTimeout is already in queue.
I think you should read https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/ again. It has answer to all your questions. I am quoting following lines from above mentioned document
setImmediate() vs setTimeout()
setImmediate and setTimeout() are similar, but behave in different ways depending on when they are called.
setImmediate() is designed to execute a script once the current poll phase completes.
setTimeout() schedules a script to be run after a minimum threshold in ms has elapsed.
The order in which the timers are executed will vary depending on the context in which they are called.
Understanding process.nextTick()
process.nextTick() is not technically part of the event loop. Instead, the nextTickQueue will be processed after the current operation completes, regardless of the current phase of the event loop.
Looking back at our diagram, any time you call process.nextTick() in a given phase, all callbacks passed to process.nextTick() will be resolved before the event loop continues. This can create some bad situations because it allows you to "starve" your I/O by making recursive process.nextTick() calls, which prevents the event loop from reaching the poll phase.

When does the event loop turn?

I am a bit confused about when the event loop spins in the browser.
The questions are:
Does a task and the pending microtasks, happen in the same loop iteration/turn/tick?
Which are the actual conditions that need to be met in order for the loop to turn?
Are these conditions the same in node.js event loop? - I don't know if this is a stupid question.
Let's say we have a webpage and in the front end we have JavaScript code that schedules a task and waits for a promise (which is a microtask). Is the execution of the promise considered to happen in the same turn of the event loop as the task, or in different iterations?
I currently assume that they all happen in the same iteration. Since if betting otherwise, in the case of microtasks executing while mid-task, that would mean that the task would require multiple loop iterations in order to fully complete. Which seems wired to me. Would it be correct to also say that the update rendering part, that may occur after each task, happens in the same loop turn?
Thank you in advance!
------------------------------------------------------------------------------------
I know I am supposed to add a comment, but it is going to be a long one, and I also need to write code, so I am editing the question and asking for clarification here.
#T.J. Crowder Thank you so much for your time and detailed explanation!
I had indeed misread "microtasks are processed after callbacks (as long as no other JavaScript is mid-execution)" in this great article and had gotten a bit confused.
I also had questions about the 4ms setTimout for which I couldn't find information about, so thanks for that info also.
One last thing, though... If we were to mark the loop ticks between the example code, where would we put them (assuming console.logs do not exist)?
Suppose we have a function named exampleCode, having the following body:
setTimeout(setTimeoutCallback, 0);
Promise.resolve().then(promiseCallback);
For the above code, my guess would be...
Just before executing exampleCode (macro)task:
first loop tick
setTimeoutCallback (macro)task scheduling
Promise.then microtask scheduling
promiseCallback execution
second loop tick
setTimeoutCallback execution
third loop tick
Or is there an additional loop tick between between Promise.then microtask scheduling and promiseCallback execution ?
Thank you in advance once again!
Does a task and the pending microtasks, happen in the same loop iteration/turn/tick?
The task occurs, then when it ends, any pending microtasks it scheduled are run.
Which are the actual conditions that need to be met in order for the loop to turn?
It's not clear what you mean by this. It may be easier to think in terms of jobs and a job queue (which is the ECMAScript spec's terminology): If there is a pending job and the thread servicing that queue is not doing something else, it picks up the job and runs it to completion.
Are these conditions the same in node.js event loop?
Close enough, yes.
Let's say we have a webpage and in the front end we have JavaScript code that schedules a task and waits for a promise (which is a microtask). Is the execution of the promise considered to happen in the same turn of the event loop as the task, or in different iterations?
In a browser (and in Node), it happens after the task completes, when the task's microtasks (if any) are run, before the next queued task/job gets picked up.
For instance:
// This code is run in a task/job
console.log("Scheduling (macro)task/job");
setTimeout(() => {
console.log("timeout callback ran");
}, 0);
console.log("Scheduling microtask/job");
Promise.resolve().then(() => {
console.log("promise then callback ran");
});
console.log("main task complete");
On a compliant browser (and Node), that will output:
Scheduling (macro)task/job
Scheduling microtask/job
main task complete
promise then callback ran
timeout callback ran
...because the microtask ran when the main task completed, before the next macrotask ran.
(Note that setTimeout(..., 0) will indeed schedule the timer to run immediately on compliant browsers provided it's not a nested timeout; more here. You'll see people saying there is no "setTimeout 0", but that's outdated information. It's only clamped to 4ms if the timer's nesting level is > 5.)
More to explore:
MDN Concurrency Model and Event Loop
ECMAScript Spec: Jobs and Job Queues
WHAT-WG "HTML5" Spec: Event Loops, Processing Model, Timers, and Timer Initialization Steps
Re the code and question in the edit/comment:
setTimeout(setTimeoutCallback, 0);
Promise.resolve().then(promiseCallback);
Your guess looks pretty good. Here's how I'd describe it:
Schedule the task to run that script
(When thread is next free)
Pick up the next task (from #1 above)
Run that task:
Create the task for the timer callback
In parallel, queue the task when the time comes
Queue a microtask for the promise then callback
End of task
Microtask check
Run the then callback
(When thread is next free)
Pick up the next task (the one for the timer callback)
Run the task
End of task
Microtask check (none to do in this case)
The processing model text explicitly calls out that the task ends prior to the microtask check, but I don't think that's observable in any real way.

What is there so exceptional about timers in Node.js?

I want to understand Event Loop better. I read documents, articles, Node.js' API docs. Almost all of them separate Timers:
setImmediate():
setImmediate(callback[, arg][, ...])
To schedule the "immediate"
execution of callback after I/O events callbacks and before setTimeout
and setInterval .
process.nextTick():
process.nextTick(callback[, arg][, ...])#
This is not a simple alias to setTimeout(fn, 0), it's much more
efficient. It runs before any additional I/O events (including
timers) fire in subsequent ticks of the event loop.
Why? What is so exceptional about timer functions in Node.js in the context of Event Loop?
These all relate to having really fine grained control over the asynchronous execution of the callback.
The nextTick() function is executed the soonest to the point where it's called. It got its name from event loop "tick". Tick represents full turn of the event loop where different sort of events such as timers, io, network are exhausted once. Even though the name today is confusing since it has changed semantics over node versions. The nextTick() callback is executed in the same tick as the calling code. Adding more nextTick()s in the nextTick() callback function aggregates them and they are called in the same tick.
The setImmediate() gives the second closest execution. It's almost identical to setTimeout(0) but it is called before all traditional setTimeout() and setInterval() timers. It is processed as the first thing at the beginning of the next tick. You get sort of a fast lane of asynchronous execution that is privileged over the traditional setInterval() and setTimeout(). Adding more setImmediate() callbacks in setImmediate() itself defers them to next tick and they are not executed in the same tick in contrast to how nextTick()s are. This setImmediate() functionality was originally the semantics for nextTick(), hence the name nextTick().
The setTimeout() and setInterval() work then as expected, having the third closest exeuction point (or later if the timeouts are long).
Put simply, the order they called is:nextTick()-->setImmediate()-->setTimeout(fn, 0)

Javascript internals - clearTimeout just before it fires

Let's say I do this:
var timer = setTimeout(function() {
console.log("will this happen?");
}, 5000);
And then after just less than 5 seconds, another callback (from a network event in NodeJS for example) fires and clears it:
clearTimeout(timer);
Is there any possibility that the callback from the setTimeout call is already in the queue to be executed at this point, and if so will the clearTimeout be in time to stop it?
To clarify, I am talking about a situation where the setTimeout time actually expires and the interpreter starts the process of executing it, but the other callback is currently running so the message is added to the queue. It seems like one of those race condition type things that would be easy to not account for.
Even though Node is single thread, the race condition the question describes is possible.
It can happen because timers are triggered by native code (in lib_uv).
On top of that, Node groups timers with the same timeout value. As a result, if you schedule two timers with the same timeout within the same ms, they will be added to the event queue at once.
But rest assured node internally solves that for you. Quoting code from node 0.12.0:
timer.js > clearTimeout
exports.clearTimeout = function(timer) {
if (timer && (timer[kOnTimeout] || timer._onTimeout)) {
timer[kOnTimeout] = timer._onTimeout = null;
// ...
}
}
On clearing a timeout, Node internally removes the reference to the callback function. So even if the race condition happens, it can do no harm, because those timers will be skipped:
listOnTimeout
if (!first._onTimeout) continue;
Node.js executes in a single thread.
So there cannot be any race conditions and you can reliably cancel the timeout before it triggers.
See also a related discussion (in browsers).
I am talking about a situation where the setTimeout time actually expires and the interpreter starts the process of executing it
Without having looked at Node.js internals, I don't think this is possible. Everything is single-threaded, so the interpreter cannot be "in the process" of doing anything while your code is running.
Your code has to return control before the timeout can be triggered. If you put an infinite loop in your code, the whole system hangs. This is all "cooperative multitasking".
This behavior is defined in the HTML Standard, the fired task starts with:
If the entry for handle in the list of active timers has been cleared, then abort these steps.
Therefore even if the task has been queued already, it'll be aborted.
Whether this applies to Node.js, however, is debatable, as the documentation just states:
The timer functions within Node.js implement a similar API as the timers API provided by Web Browsers but use a different internal implementation that is built around the Node.js Event Loop.

Categories