Event loop blocking and async programming in Node JS - javascript

I am newbie to Node js programming and hence want to understand the core concepts and practises very appropriately. AFAIK node js has non-blocking I/O allowing all disk and other I/O operations running async way while its JS runs in single thread managing the resource and execution paths using Event Loop. As suggested at many places developers are advised to write custom functions/methods using callback pattern e.g.
function processData(inputData, cb){
// do some computation and other stuff
if (err) {
cb(err, null);
}else{
cb(null, result);
}
}
callback = function(err, result){
// check error and handle
// if not error then grab result and do some stuff
}
processData(someData, callback)
// checking whether execution is async or not
console.log('control reached at the end of block');
If I run this code everything runs synchronously printing console message at last. What I believed was that the console message will print first of all followed by execution of processData function code which will in turn invoke the callback function. And if it happens this way, the event loop will be unblocked and would only execute the response to a request when the final response is made ready by 'callback' function.
My question is:- Is the execution sequence alright as per node's nature Or Am I doing something wrong Or missing an important concept?
Any help would be appreciate from you guys !!!

JavaScript (just as pretty much any other language) runs sequentially. If you write
var x = 1;
x *= 2;
x += 1;
you expect the result to be 3, not 4. That would be pretty bad. The same goes with functions. When you write
foo();
bar();
you expect them to run in the very same order. This doesn't change when you nest functions:
var foo = function() {};
var bar = function(clb) {
clb();
foo();
};
var callback = function() {
};
bar(callback);
The expected execution sequence is bar -> callback -> foo.
So the biggest damage people do on the internet is that they put equals sign between asynchronous programming and callback pattern. This is wrong. There are many synchronous functions using callback pattern, e.g. Array.prototype.forEach().
What really happens is that NodeJS has this event loop under the hood. You can tell it to schedule something and run later by calling multiple special functions: setTimeout, setInterval, process.nextTick to name few. All I/O also schedules some things to be done on the event loop. Now if you do
var foo = function() {};
var bar = function(clb) {
procress.nextTick(clb); // <-- async here
foo();
};
var callback = function() {
};
bar(callback);
The expected execution sequence is this bar -> (schedule callback) -> foo and then callback will fire at some point (i.e. when all other queued tasks are processed).
That's pretty much how it works.

You are writing synchronous code all the way down and expecting it to run asynchronously. Just because you are using callbacks does not mean it's an asynchronous operation. You should try using timers, api within your functions to feel node js non blocking asynchronous pattern. I hope The Node.js Event Loop will help you to get started.

Related

Difference between Javascript async functions and Web workers?

Threading-wise, what's the difference between web workers and functions declared as
async function xxx()
{
}
?
I am aware web workers are executed on separate threads, but what about async functions? Are such functions threaded in the same way as a function executed through setInterval is, or are they subject to yet another different kind of threading?
async functions are just syntactic sugar around
Promises and they are wrappers for callbacks.
// v await is just syntactic sugar
// v Promises are just wrappers
// v functions taking callbacks are actually the source for the asynchronous behavior
await new Promise(resolve => setTimeout(resolve));
Now a callback could be called back immediately by the code, e.g. if you .filter an array, or the engine could store the callback internally somewhere. Then, when a specific event occurs, it executes the callback. One could say that these are asynchronous callbacks, and those are usually the ones we wrap into Promises and await them.
To make sure that two callbacks do not run at the same time (which would make concurrent modifications possible, which causes a lot of trouble) whenever an event occurs the event does not get processed immediately, instead a Job (callback with arguments) gets placed into a Job Queue. Whenever the JavaScript Agent (= thread²) finishes execution of the current job, it looks into that queue for the next job to process¹.
Therefore one could say that an async function is just a way to express a continuous series of jobs.
async function getPage() {
// the first job starts fetching the webpage
const response = await fetch("https://stackoverflow.com"); // callback gets registered under the hood somewhere, somewhen an event gets triggered
// the second job starts parsing the content
const result = await response.json(); // again, callback and event under the hood
// the third job logs the result
console.log(result);
}
// the same series of jobs can also be found here:
fetch("https://stackoverflow.com") // first job
.then(response => response.json()) // second job / callback
.then(result => console.log(result)); // third job / callback
Although two jobs cannot run in parallel on one agent (= thread), the job of one async function might run between the jobs of another. Therefore, two async functions can run concurrently.
Now who does produce these asynchronous events? That depends on what you are awaiting in the async function (or rather: what callback you registered). If it is a timer (setTimeout), an internal timer is set and the JS-thread continues with other jobs until the timer is done and then it executes the callback passed. Some of them, especially in the Node.js environment (fetch, fs.readFile) will start another thread internally. You only hand over some arguments and receive the results when the thread is done (through an event).
To get real parallelism, that is running two jobs at the same time, multiple agents are needed. WebWorkers are exactly that - agents. The code in the WebWorker therefore runs independently (has it's own job queues and executor).
Agents can communicate with each other via events, and you can react to those events with callbacks. For sure you can await actions from another agent too, if you wrap the callbacks into Promises:
const workerDone = new Promise(res => window.onmessage = res);
(async function(){
const result = await workerDone;
//...
})();
TL;DR:
JS <---> callbacks / promises <--> internal Thread / Webworker
¹ There are other terms coined for this behavior, such as event loop / queue and others. The term Job is specified by ECMA262.
² How the engine implements agents is up to the engine, though as one agent may only execute one Job at a time, it very much makes sense to have one thread per agent.
In contrast to WebWorkers, async functions are never guaranteed to be executed on a separate thread.
They just don't block the whole thread until their response arrives. You can think of them as being registered as waiting for a result, let other code execute and when their response comes through they get executed; hence the name asynchronous programming.
This is achieved through a message queue, which is a list of messages to be processed. Each message has an associated function which gets called in order to handle the message.
Doing this:
setTimeout(() => {
console.log('foo')
}, 1000)
will simply add the callback function (that logs to the console) to the message queue. When it's 1000ms timer elapses, the message is popped from the message queue and executed.
While the timer is ticking, other code is free to execute. This is what gives the illusion of multithreading.
The setTimeout example above uses callbacks. Promises and async work the same way at a lower level — they piggyback on that message-queue concept, but are just syntactically different.
Workers are also accessed by asynchronous code (i.e. Promises) however Workers are a solution to the CPU intensive tasks which would block the thread that the JS code is being run on; even if this CPU intensive function is invoked asynchronously.
So if you have a CPU intensive function like renderThread(duration) and if you do like
new Promise((v,x) => setTimeout(_ => (renderThread(500), v(1)),0)
.then(v => console.log(v);
new Promise((v,x) => setTimeout(_ => (renderThread(100), v(2)),0)
.then(v => console.log(v);
Even if second one takes less time to complete it will only be invoked after the first one releases the CPU thread. So we will get first 1 and then 2 on console.
However had these two function been run on separate Workers, then the outcome we expect would be 2 and 1 as then they could run concurrently and the second one finishes and returns a message earlier.
So for basic IO operations standard single threaded asynchronous code is very efficient and the need for Workers arises from need of using tasks which are CPU intensive and can be segmented (assigned to multiple Workers at once) such as FFT and whatnot.
Async functions have nothing to do with web workers or node child processes - unlike those, they are not a solution for parallel processing on multiple threads.
An async function is just1 syntactic sugar for a function returning a promise then() chain.
async function example() {
await delay(1000);
console.log("waited.");
}
is just the same as
function example() {
return Promise.resolve(delay(1000)).then(() => {
console.log("waited.");
});
}
These two are virtually indistinguishable in their behaviour. The semantics of await or a specified in terms of promises, and every async function does return a promise for its result.
1: The syntactic sugar gets a bit more elaborate in the presence of control structures such as if/else or loops which are much harder to express as a linear promise chain, but it's still conceptually the same.
Are such functions threaded in the same way as a function executed through setInterval is?
Yes, the asynchronous parts of async functions run as (promise) callbacks on the standard event loop. The delay in the example above would implemented with the normal setTimeout - wrapped in a promise for easy consumption:
function delay(t) {
return new Promise(resolve => {
setTimeout(resolve, t);
});
}
I want to add my own answer to my question, with the understanding I gathered through all the other people's answers:
Ultimately, all but web workers, are glorified callbacks. Code in async functions, functions called through promises, functions called through setInterval and such - all get executed in the main thread with a mechanism akin to context switching. No parallelism exists at all.
True parallel execution with all its advantages and pitfalls, pertains to webworkers and webworkers alone.
(pity - I thought with "async functions" we finally got streamlined and "inline" threading)
Here is a way to call standard functions as workers, enabling true parallelism. It's an unholy hack written in blood with help from satan, and probably there are a ton of browser quirks that can break it, but as far as I can tell it works.
[constraints: the function header has to be as simple as function f(a,b,c) and if there's any result, it has to go through a return statement]
function Async(func, params, callback)
{
// ACQUIRE ORIGINAL FUNCTION'S CODE
var text = func.toString();
// EXTRACT ARGUMENTS
var args = text.slice(text.indexOf("(") + 1, text.indexOf(")"));
args = args.split(",");
for(arg of args) arg = arg.trim();
// ALTER FUNCTION'S CODE:
// 1) DECLARE ARGUMENTS AS VARIABLES
// 2) REPLACE RETURN STATEMENTS WITH THREAD POSTMESSAGE AND TERMINATION
var body = text.slice(text.indexOf("{") + 1, text.lastIndexOf("}"));
for(var i = 0, c = params.length; i<c; i++) body = "var " + args[i] + " = " + JSON.stringify(params[i]) + ";" + body;
body = body + " self.close();";
body = body.replace(/return\s+([^;]*);/g, 'self.postMessage($1); self.close();');
// CREATE THE WORKER FROM FUNCTION'S ALTERED CODE
var code = URL.createObjectURL(new Blob([body], {type:"text/javascript"}));
var thread = new Worker(code);
// WHEN THE WORKER SENDS BACK A RESULT, CALLBACK AND TERMINATE THE THREAD
thread.onmessage =
function(result)
{
if(callback) callback(result.data);
thread.terminate();
}
}
So, assuming you have this potentially cpu intensive function...
function HeavyWorkload(nx, ny)
{
var data = [];
for(var x = 0; x < nx; x++)
{
data[x] = [];
for(var y = 0; y < ny; y++)
{
data[x][y] = Math.random();
}
}
return data;
}
...you can now call it like this:
Async(HeavyWorkload, [1000, 1000],
function(result)
{
console.log(result);
}
);

Synchronous run of included function

I have a question regarding synchronous run of included function. For example, I have the following code.
partitions(dataset, num_of_folds, function(train, test, fold) {
train_and_test(train,test, function(err, results)
{
})
})
where partitions runs num_of_folds times, for deffierent fold it returns different train and test sets. I want to run every iteration of partitions only when train_and_test is finished. How to do so?
Given your previous question: Optimum async flow for cross validation in node.js, I understand that partition is your own function, which splits the dataset into several parts, and then runs the callback (basically train_and_test here) for each of those parts.
Your issue now is that train_and_test is asynchronous, but you want to wait for each invocation to finish (which is signalled by the its own callback being called, I assume) before running the next one.
One trivial solution is do change your code to keep state, and run the next invocation from the callback. For instance:
exports.partitions = function(dataset, numOfPartitions, callback) {
var testSetCount = dataset.length / numOfPartitions;
var iPartition=0;
var iteration = function() {
if (iPartition<numOfPartitions)
{
var testSetStart = iPartition*testSetCount;
var partition = exports.partition(dataset, testSetStart, testSetCount);
callback(partition.train, partition.test, iPartition, iteration);
iPartition++;
}
};
iteration();
};
You'll then need to pass the additional callback down to your asynchronous function:
partitions(dataset, num_of_folds, function(train, test, fold, callback) {
train_and_test(train,test, function(err, results)
{
callback();
})
});
Note that I haven't tested any of the code above.
I want to run every iteration of partitions only when train_and_test is finished. How to do so?
That's entirely up to how partitions calls the callback you're giving it. If partitions calls the callback normally, then it will wait for the callback to finish before proceeding. But if it calls is asynchronously, for instance via nextTick or setTimeout or similar, then the only way you'd be able to tell it to wait would be if it provides a means of telling it that.
If I read your question another way: If train_and_test is asynchronous, and partitions isn't designed to deal with asynchronous callbacks, you can't make it wait.

Blocking javascript functions (node.js)

I have this code:
var resources = myFunc();
myFunc2(resources);
The problem is that JavaScript calls myFunc() asynchronous, and then myFunc2(), but I don't have the results of myFunc() yet.
Is there a way to block the first call? Or a way to make this work?
The reason why this code doesn't work represents the beauty and pitfalls of async javascript. It doesn't work because it is not supposed to.
When the first line of code is executed, you have basically told node to go do something and let you know when it is done. It then moves on to execute the next line of code - which is why you don't have the response yet when you get here. For more on this, I would study the event-loop in greater detail. It's a bit abstract, but it might help you wrap your head around control flow in node.
This is where callbacks come in. A callback is basically a function you pass to another function that will execute when that second function is complete. The usual signature for a callback is (err, response). This enables you to check for errors and handle them accordingly.
//define first
var first = function ( callback ) {
//This function would do something, then
// when it is done, you callback
// if no error, hand in null
callback(err, res);
};
//Then this is how we call it
first ( function (err, res) {
if ( err ) { return handleError(err); }
//Otherwise do your thing
second(res)
});
As you might imagine, this can get complicated really quickly. It is not uncommon to end up with many nested callbacks which make your code hard to read and debug.
Extra:
If you find yourself in this situation, I would check out the async library. Here is a great tutorial on how to use it.
myFunc(), if asynchronous, needs to accept a callback or return a promise. Typically, you would see something like:
myFunc(function myFuncCallback (resources) {
myFunc2(resources);
});
Without knowing more about your environment and modules, I can't give you specific code. However, most asynchronous functions in Node.js allow you to specify a callback that will be called once the function is complete.
Assuming that myFunc calls some async function, you could do something like this:
function myFunc(callback) {
// do stuff
callSomeAsyncFunction(callback);
}
myFunc(myFunc2);

Asynchronous call to synchronous function

Lets say I have an synchronous function for example:
var sum = function(x,y){
return x+y
}
I want to call this function in asynchronously. How is that possible? Is the function below be considered an asynchronous function? If this is an asynchronous function then I second function's log should be logged before the first one? I know this is a very simple example which may not be an ideal case for async function. But I just wanted to get this fundamental clear.
function(a, b, c, d, e, f, function(){
console.log(result);
})
{
result = sum(a, b);
result = sum(result, c);
result = sum(result, d);
result = sum(result, e);
result = sum(result, f);
return
)};
function(a, b, function(){
console.log(result);
})
{
result = sum(a, b);
return
)};
Please help me. If this is not correct then please help me in as to How it should be written?
From your comment on Quentin's answer
How to create a function which has some manipulation inside it which is a normal synchronous function but the overall flow doesn't wait for this function to complete rather moves to the next function.
JavaScript, the language, doesn't have a feature to do that. So you look to the environment in which it's running to see if the environment has that feature.
NodeJS does. In fact, it has several of them: setTimeout (which is a bit like what browsers give you), setImmediate, and process.nextTick. For your use case, setImmediate is probably the best choice, but see this question and its answers for more on that. setImmediate looks like this:
setImmediate(functionToCall);
(It's a global, you don't have to require anything.)
That does exactly what your title asks for: An asynchronous call to a synchronous function.
From your comment below:
I just need to create an async function inside which I call a normal synchronous function. When I mean asynchronous function I mean that when I call this function it doesn't obstruct the flow and the flow goes to the next function and this function can do something(like logging) after completion
That's not quite the same thing your question asks. :-)
First off, there are no asynchronous functions (in JavaScript). When you call a function in JavaScript, that function is executed synchronously. So that means you have two choices:
The calling code can use setImmediate to tell Node to call the function asynchronously, or
The function can do that for itself, probably using a nested function.
Here's #1:
function foo() {
console.log("A");
setImmediate(bar);
console.log("B");
}
function bar() {
console.log("C");
}
foo();
which outputs:
A
B
C
Here's #2:
function foo() {
console.log("A");
bar("C");
console.log("B");
}
function bar(arg) {
setImmediate(run);
function run() {
console.log(arg);
}
}
foo();
...which also outputs
A
B
C
Note how "C" was passed as an argument to bar. bar does nothing but schedule the callback to its nested function run. run uses the argument.
How is that possible?
It isn't. Everything it does is a synchronous operation.
You use asynchronous code when dealing with things which are intrinsically asynchronous. This is limited, in JS, almost entirely to event handling.
Is the function below be considered an asynchronous function?
No. It has a callback, but everything it does is synchronous.
I'm not completely sure that I understand what you need, but I believe that what you're looking for is to create concurrent function calls, which means you probably need to run a potentially computationally intensive code, and you don't want it to block other code from executing while this computation runs.
Node.js is usually used in a single-threaded process, which means that only a single piece of code can run at a given time, but there are still some ways to do it.
One way to do it, as stated in #T.J. Crowder 's answer, is to queue your function call with process.nextTick so it'll run on the next cycle of the event-loop. This can help break down intensive computation into iterations and that each iteration will be queued to run on the next event-loop cycle after the last one. This can help in some cases, but not always you can break down computations into iterations that run async from each other. If you're looking for a more detailed explanation into how Javascript runs asynchronously, I suggest reading this book.
There are at least two other ways (that I know of) to achieve some sort of concurrency with Node.js- Workers/Threads, and child processes.
Child processes in short are just another Node.js process that is run by your main application. You can use the native *child_process* module of Node.js to run and manage your child processes if you want to go this way. Look at the official docs for more info.
The third way to do it is to use Workers in a very similar manner to how Web Workers are used in the browser. A Worker is a snippet of code that is run in a different OS thread than your main application. To me it seems that this is the easiest way to achieve concurrency, although some will argue it is not the most robust way of doing it, and that creating child processes is better.
There a very good explanation of how Workers function on MDN. To use Workers in Node.js you will need an external module such as webworker-threads.
Here's a code sample created from your sample with a snippet from their documentation:
var Worker = require('webworker-threads').Worker;
var worker = new Worker(function(){
onmessage = function(event) {
result = sum(event.data.a, event.data.b);
result = sum(result, event.data.c);
result = sum(result, event.data.d);
result = sum(result, event.data.e);
result = sum(result, event.data.f);
postMessage(result);
self.close();
};
});
worker.onmessage = function(event) {
console.log("The result is: " + event.data);
};
worker.postMessage({a: 1, b: 2, c: 3, d: 4, e: 5, f: 6});
// Do some other stuff that will happen while your Worker runs...
With all that said, most people will try to avoid running concurrent code in Node.js. Javascript's async nature makes it easy to write simple and concise back-end code, and usually you would want to keep it that way by not trying to introduce concurrency into your application, unless it is absolutely necessary.
In Javascript (leaving node aside for a moment) An asynchronous function should be called via setTimeout(). SetTimeout will not block the script execution while waiting:
function someThingThatTakesAWhile() {
for (var i = 0; i < someVeryLargeNumber; i++) {
//...
};
}
setTimeout(function () {
someThingThatTakesAWhile();
}, 0);
(haven't tested that, please forgive typos, &c)
But if you're wanting to do this in node, I don't think this is what you really want.

node.js and asynchronous programming palindrome

This question might be possible duplication. I am a noob to node.js and asynchronous programming palindrome. I have google searched and seen a lot of examples on this, but I still have bit confusion.
OK, from google search what I understand is that all the callbacks are handled asynchronous.
for example, let's take readfile function from node.js api
fs.readFile(filename, [options], callback) // callback here will be handled asynchronously
fs.readFileSync(filename, [options])
var fs = require('fs');
fs.readFile('async-try.js' ,'utf8' ,function(err,data){
console.log(data); })
console.log("hii");
The above code will first print hii then it will print the content of
the file.
So, my questions are:
Are all callbacks handled asynchronously?
The below code is not asynchronous, why and how do I make it?
function compute(callback){
for(var i =0; i < 1000 ; i++){}
callback(i);
}
function print(num){
console.log("value of i is:" + num);
}
compute(print);
console.log("hii");
Are all callbacks handled asynchronously?
Not necessarily. Generally, they are, because in NodeJS their very goal is to resume execution of a function (a continuation) after a long running task finishes (typically, IO operations). However, you wrote yourself a synchronous callback, so as you can see they're not always asynchronous.
The below code is not asynchronous, why and how do I make it?
If you want your callback to be called asynchronously, you have to tell Node to execute it "when it has time to do so". In other words, you defer execution of your callback for later, when Node will have finished the ongoing execution.
function compute(callback){
for (var i = 0; i < 1000; i++);
// Defer execution for later
process.nextTick(function () { callback(i); });
}
Output:
hii
value of i is:1000
For more information on how asynchronous callbacks work, please read this blog post that explains how process.nextTick works.
No, that is a regular function call.
A callback will not be asynchronous unless it is forced to be. A good way to do this is by calling it within a setTimeout of 0 milliseconds, e.g.
setTimeout(function() {
// Am now asynchronous
}, 0);
Generally callbacks are made asynchronous when the calling function involves making a new request on the server (e.g. Opening a new file) and it doesn't make sense to halt execution whilst waiting for it to complete.
The below code is not asynchronous, why and how do I make it?
function compute(callback){
for(var i =0; i < 1000 ; i++){}
callback(i);
}
I'm going to assume your code is trying to say, "I need to do something 1000 times then use my callback when everything is complete".
Even your for loop won't work here, because imagine this:
function compute(callback){
for(var i =0; i < 1000 ; i++){
DatabaseModel.save( function (err, result) {
// ^^^^^^ or whatever, Some async function here.
console.log("I am called when the record is saved!!");
});
}
callback(i);
}
In this case your for loop will execute the save calls, not wait around for them to be completed. So, in your example, you may get output like (depending on timing)
I am called when the record is saved
hii
I am called when the record is saved
...
For your compute method to only call the callback when everything is truely complete - all 1000 records have been saved in the database - I would look into the async Node package, which can do this easily for you, and provide patterns for many async problems you'll face in Node.
So, you could rewrite your compute function to be like:
function compute(callback){
var count = 0
async.whilst(
function() { return count < 1000 },
function(callback_for_async_module) {
DatabaseModel.save( function (err, result) {
console.log("I am called when the record is saved!!");
callback_for_async_module();
count++;
});
},
function(err) {
// this method is called when callback_for_async_module has
// been called 1000 times
callback(count);
);
console.log("Out of compute method!");
}
Note that your compute function's callback parameter will get called sometime after console.log("Out of compute method"). This function is now asynchronous: the rest of the application does not wait around for compute to complete.
You can put every callback call inside a timeout with one milisecond, that way they will be executed first when there are a thread free and all synchron tasks are done, then will the processor work through the stack of timeouts that want to be executet.

Categories