node.js and asynchronous programming palindrome - javascript

This question might be possible duplication. I am a noob to node.js and asynchronous programming palindrome. I have google searched and seen a lot of examples on this, but I still have bit confusion.
OK, from google search what I understand is that all the callbacks are handled asynchronous.
for example, let's take readfile function from node.js api
fs.readFile(filename, [options], callback) // callback here will be handled asynchronously
fs.readFileSync(filename, [options])
var fs = require('fs');
fs.readFile('async-try.js' ,'utf8' ,function(err,data){
console.log(data); })
console.log("hii");
The above code will first print hii then it will print the content of
the file.
So, my questions are:
Are all callbacks handled asynchronously?
The below code is not asynchronous, why and how do I make it?
function compute(callback){
for(var i =0; i < 1000 ; i++){}
callback(i);
}
function print(num){
console.log("value of i is:" + num);
}
compute(print);
console.log("hii");

Are all callbacks handled asynchronously?
Not necessarily. Generally, they are, because in NodeJS their very goal is to resume execution of a function (a continuation) after a long running task finishes (typically, IO operations). However, you wrote yourself a synchronous callback, so as you can see they're not always asynchronous.
The below code is not asynchronous, why and how do I make it?
If you want your callback to be called asynchronously, you have to tell Node to execute it "when it has time to do so". In other words, you defer execution of your callback for later, when Node will have finished the ongoing execution.
function compute(callback){
for (var i = 0; i < 1000; i++);
// Defer execution for later
process.nextTick(function () { callback(i); });
}
Output:
hii
value of i is:1000
For more information on how asynchronous callbacks work, please read this blog post that explains how process.nextTick works.

No, that is a regular function call.
A callback will not be asynchronous unless it is forced to be. A good way to do this is by calling it within a setTimeout of 0 milliseconds, e.g.
setTimeout(function() {
// Am now asynchronous
}, 0);
Generally callbacks are made asynchronous when the calling function involves making a new request on the server (e.g. Opening a new file) and it doesn't make sense to halt execution whilst waiting for it to complete.

The below code is not asynchronous, why and how do I make it?
function compute(callback){
for(var i =0; i < 1000 ; i++){}
callback(i);
}
I'm going to assume your code is trying to say, "I need to do something 1000 times then use my callback when everything is complete".
Even your for loop won't work here, because imagine this:
function compute(callback){
for(var i =0; i < 1000 ; i++){
DatabaseModel.save( function (err, result) {
// ^^^^^^ or whatever, Some async function here.
console.log("I am called when the record is saved!!");
});
}
callback(i);
}
In this case your for loop will execute the save calls, not wait around for them to be completed. So, in your example, you may get output like (depending on timing)
I am called when the record is saved
hii
I am called when the record is saved
...
For your compute method to only call the callback when everything is truely complete - all 1000 records have been saved in the database - I would look into the async Node package, which can do this easily for you, and provide patterns for many async problems you'll face in Node.
So, you could rewrite your compute function to be like:
function compute(callback){
var count = 0
async.whilst(
function() { return count < 1000 },
function(callback_for_async_module) {
DatabaseModel.save( function (err, result) {
console.log("I am called when the record is saved!!");
callback_for_async_module();
count++;
});
},
function(err) {
// this method is called when callback_for_async_module has
// been called 1000 times
callback(count);
);
console.log("Out of compute method!");
}
Note that your compute function's callback parameter will get called sometime after console.log("Out of compute method"). This function is now asynchronous: the rest of the application does not wait around for compute to complete.

You can put every callback call inside a timeout with one milisecond, that way they will be executed first when there are a thread free and all synchron tasks are done, then will the processor work through the stack of timeouts that want to be executet.

Related

Difference between Javascript async functions and Web workers?

Threading-wise, what's the difference between web workers and functions declared as
async function xxx()
{
}
?
I am aware web workers are executed on separate threads, but what about async functions? Are such functions threaded in the same way as a function executed through setInterval is, or are they subject to yet another different kind of threading?
async functions are just syntactic sugar around
Promises and they are wrappers for callbacks.
// v await is just syntactic sugar
// v Promises are just wrappers
// v functions taking callbacks are actually the source for the asynchronous behavior
await new Promise(resolve => setTimeout(resolve));
Now a callback could be called back immediately by the code, e.g. if you .filter an array, or the engine could store the callback internally somewhere. Then, when a specific event occurs, it executes the callback. One could say that these are asynchronous callbacks, and those are usually the ones we wrap into Promises and await them.
To make sure that two callbacks do not run at the same time (which would make concurrent modifications possible, which causes a lot of trouble) whenever an event occurs the event does not get processed immediately, instead a Job (callback with arguments) gets placed into a Job Queue. Whenever the JavaScript Agent (= thread²) finishes execution of the current job, it looks into that queue for the next job to process¹.
Therefore one could say that an async function is just a way to express a continuous series of jobs.
async function getPage() {
// the first job starts fetching the webpage
const response = await fetch("https://stackoverflow.com"); // callback gets registered under the hood somewhere, somewhen an event gets triggered
// the second job starts parsing the content
const result = await response.json(); // again, callback and event under the hood
// the third job logs the result
console.log(result);
}
// the same series of jobs can also be found here:
fetch("https://stackoverflow.com") // first job
.then(response => response.json()) // second job / callback
.then(result => console.log(result)); // third job / callback
Although two jobs cannot run in parallel on one agent (= thread), the job of one async function might run between the jobs of another. Therefore, two async functions can run concurrently.
Now who does produce these asynchronous events? That depends on what you are awaiting in the async function (or rather: what callback you registered). If it is a timer (setTimeout), an internal timer is set and the JS-thread continues with other jobs until the timer is done and then it executes the callback passed. Some of them, especially in the Node.js environment (fetch, fs.readFile) will start another thread internally. You only hand over some arguments and receive the results when the thread is done (through an event).
To get real parallelism, that is running two jobs at the same time, multiple agents are needed. WebWorkers are exactly that - agents. The code in the WebWorker therefore runs independently (has it's own job queues and executor).
Agents can communicate with each other via events, and you can react to those events with callbacks. For sure you can await actions from another agent too, if you wrap the callbacks into Promises:
const workerDone = new Promise(res => window.onmessage = res);
(async function(){
const result = await workerDone;
//...
})();
TL;DR:
JS <---> callbacks / promises <--> internal Thread / Webworker
¹ There are other terms coined for this behavior, such as event loop / queue and others. The term Job is specified by ECMA262.
² How the engine implements agents is up to the engine, though as one agent may only execute one Job at a time, it very much makes sense to have one thread per agent.
In contrast to WebWorkers, async functions are never guaranteed to be executed on a separate thread.
They just don't block the whole thread until their response arrives. You can think of them as being registered as waiting for a result, let other code execute and when their response comes through they get executed; hence the name asynchronous programming.
This is achieved through a message queue, which is a list of messages to be processed. Each message has an associated function which gets called in order to handle the message.
Doing this:
setTimeout(() => {
console.log('foo')
}, 1000)
will simply add the callback function (that logs to the console) to the message queue. When it's 1000ms timer elapses, the message is popped from the message queue and executed.
While the timer is ticking, other code is free to execute. This is what gives the illusion of multithreading.
The setTimeout example above uses callbacks. Promises and async work the same way at a lower level — they piggyback on that message-queue concept, but are just syntactically different.
Workers are also accessed by asynchronous code (i.e. Promises) however Workers are a solution to the CPU intensive tasks which would block the thread that the JS code is being run on; even if this CPU intensive function is invoked asynchronously.
So if you have a CPU intensive function like renderThread(duration) and if you do like
new Promise((v,x) => setTimeout(_ => (renderThread(500), v(1)),0)
.then(v => console.log(v);
new Promise((v,x) => setTimeout(_ => (renderThread(100), v(2)),0)
.then(v => console.log(v);
Even if second one takes less time to complete it will only be invoked after the first one releases the CPU thread. So we will get first 1 and then 2 on console.
However had these two function been run on separate Workers, then the outcome we expect would be 2 and 1 as then they could run concurrently and the second one finishes and returns a message earlier.
So for basic IO operations standard single threaded asynchronous code is very efficient and the need for Workers arises from need of using tasks which are CPU intensive and can be segmented (assigned to multiple Workers at once) such as FFT and whatnot.
Async functions have nothing to do with web workers or node child processes - unlike those, they are not a solution for parallel processing on multiple threads.
An async function is just1 syntactic sugar for a function returning a promise then() chain.
async function example() {
await delay(1000);
console.log("waited.");
}
is just the same as
function example() {
return Promise.resolve(delay(1000)).then(() => {
console.log("waited.");
});
}
These two are virtually indistinguishable in their behaviour. The semantics of await or a specified in terms of promises, and every async function does return a promise for its result.
1: The syntactic sugar gets a bit more elaborate in the presence of control structures such as if/else or loops which are much harder to express as a linear promise chain, but it's still conceptually the same.
Are such functions threaded in the same way as a function executed through setInterval is?
Yes, the asynchronous parts of async functions run as (promise) callbacks on the standard event loop. The delay in the example above would implemented with the normal setTimeout - wrapped in a promise for easy consumption:
function delay(t) {
return new Promise(resolve => {
setTimeout(resolve, t);
});
}
I want to add my own answer to my question, with the understanding I gathered through all the other people's answers:
Ultimately, all but web workers, are glorified callbacks. Code in async functions, functions called through promises, functions called through setInterval and such - all get executed in the main thread with a mechanism akin to context switching. No parallelism exists at all.
True parallel execution with all its advantages and pitfalls, pertains to webworkers and webworkers alone.
(pity - I thought with "async functions" we finally got streamlined and "inline" threading)
Here is a way to call standard functions as workers, enabling true parallelism. It's an unholy hack written in blood with help from satan, and probably there are a ton of browser quirks that can break it, but as far as I can tell it works.
[constraints: the function header has to be as simple as function f(a,b,c) and if there's any result, it has to go through a return statement]
function Async(func, params, callback)
{
// ACQUIRE ORIGINAL FUNCTION'S CODE
var text = func.toString();
// EXTRACT ARGUMENTS
var args = text.slice(text.indexOf("(") + 1, text.indexOf(")"));
args = args.split(",");
for(arg of args) arg = arg.trim();
// ALTER FUNCTION'S CODE:
// 1) DECLARE ARGUMENTS AS VARIABLES
// 2) REPLACE RETURN STATEMENTS WITH THREAD POSTMESSAGE AND TERMINATION
var body = text.slice(text.indexOf("{") + 1, text.lastIndexOf("}"));
for(var i = 0, c = params.length; i<c; i++) body = "var " + args[i] + " = " + JSON.stringify(params[i]) + ";" + body;
body = body + " self.close();";
body = body.replace(/return\s+([^;]*);/g, 'self.postMessage($1); self.close();');
// CREATE THE WORKER FROM FUNCTION'S ALTERED CODE
var code = URL.createObjectURL(new Blob([body], {type:"text/javascript"}));
var thread = new Worker(code);
// WHEN THE WORKER SENDS BACK A RESULT, CALLBACK AND TERMINATE THE THREAD
thread.onmessage =
function(result)
{
if(callback) callback(result.data);
thread.terminate();
}
}
So, assuming you have this potentially cpu intensive function...
function HeavyWorkload(nx, ny)
{
var data = [];
for(var x = 0; x < nx; x++)
{
data[x] = [];
for(var y = 0; y < ny; y++)
{
data[x][y] = Math.random();
}
}
return data;
}
...you can now call it like this:
Async(HeavyWorkload, [1000, 1000],
function(result)
{
console.log(result);
}
);

Event loop blocking and async programming in Node JS

I am newbie to Node js programming and hence want to understand the core concepts and practises very appropriately. AFAIK node js has non-blocking I/O allowing all disk and other I/O operations running async way while its JS runs in single thread managing the resource and execution paths using Event Loop. As suggested at many places developers are advised to write custom functions/methods using callback pattern e.g.
function processData(inputData, cb){
// do some computation and other stuff
if (err) {
cb(err, null);
}else{
cb(null, result);
}
}
callback = function(err, result){
// check error and handle
// if not error then grab result and do some stuff
}
processData(someData, callback)
// checking whether execution is async or not
console.log('control reached at the end of block');
If I run this code everything runs synchronously printing console message at last. What I believed was that the console message will print first of all followed by execution of processData function code which will in turn invoke the callback function. And if it happens this way, the event loop will be unblocked and would only execute the response to a request when the final response is made ready by 'callback' function.
My question is:- Is the execution sequence alright as per node's nature Or Am I doing something wrong Or missing an important concept?
Any help would be appreciate from you guys !!!
JavaScript (just as pretty much any other language) runs sequentially. If you write
var x = 1;
x *= 2;
x += 1;
you expect the result to be 3, not 4. That would be pretty bad. The same goes with functions. When you write
foo();
bar();
you expect them to run in the very same order. This doesn't change when you nest functions:
var foo = function() {};
var bar = function(clb) {
clb();
foo();
};
var callback = function() {
};
bar(callback);
The expected execution sequence is bar -> callback -> foo.
So the biggest damage people do on the internet is that they put equals sign between asynchronous programming and callback pattern. This is wrong. There are many synchronous functions using callback pattern, e.g. Array.prototype.forEach().
What really happens is that NodeJS has this event loop under the hood. You can tell it to schedule something and run later by calling multiple special functions: setTimeout, setInterval, process.nextTick to name few. All I/O also schedules some things to be done on the event loop. Now if you do
var foo = function() {};
var bar = function(clb) {
procress.nextTick(clb); // <-- async here
foo();
};
var callback = function() {
};
bar(callback);
The expected execution sequence is this bar -> (schedule callback) -> foo and then callback will fire at some point (i.e. when all other queued tasks are processed).
That's pretty much how it works.
You are writing synchronous code all the way down and expecting it to run asynchronously. Just because you are using callbacks does not mean it's an asynchronous operation. You should try using timers, api within your functions to feel node js non blocking asynchronous pattern. I hope The Node.js Event Loop will help you to get started.

Synchronous run of included function

I have a question regarding synchronous run of included function. For example, I have the following code.
partitions(dataset, num_of_folds, function(train, test, fold) {
train_and_test(train,test, function(err, results)
{
})
})
where partitions runs num_of_folds times, for deffierent fold it returns different train and test sets. I want to run every iteration of partitions only when train_and_test is finished. How to do so?
Given your previous question: Optimum async flow for cross validation in node.js, I understand that partition is your own function, which splits the dataset into several parts, and then runs the callback (basically train_and_test here) for each of those parts.
Your issue now is that train_and_test is asynchronous, but you want to wait for each invocation to finish (which is signalled by the its own callback being called, I assume) before running the next one.
One trivial solution is do change your code to keep state, and run the next invocation from the callback. For instance:
exports.partitions = function(dataset, numOfPartitions, callback) {
var testSetCount = dataset.length / numOfPartitions;
var iPartition=0;
var iteration = function() {
if (iPartition<numOfPartitions)
{
var testSetStart = iPartition*testSetCount;
var partition = exports.partition(dataset, testSetStart, testSetCount);
callback(partition.train, partition.test, iPartition, iteration);
iPartition++;
}
};
iteration();
};
You'll then need to pass the additional callback down to your asynchronous function:
partitions(dataset, num_of_folds, function(train, test, fold, callback) {
train_and_test(train,test, function(err, results)
{
callback();
})
});
Note that I haven't tested any of the code above.
I want to run every iteration of partitions only when train_and_test is finished. How to do so?
That's entirely up to how partitions calls the callback you're giving it. If partitions calls the callback normally, then it will wait for the callback to finish before proceeding. But if it calls is asynchronously, for instance via nextTick or setTimeout or similar, then the only way you'd be able to tell it to wait would be if it provides a means of telling it that.
If I read your question another way: If train_and_test is asynchronous, and partitions isn't designed to deal with asynchronous callbacks, you can't make it wait.

Understanding node.js asynchronicity - for loop versus nested callback

I'm new to nodejs and is trying to understand its asynchronous idea. In the following code snippet, I'm trying to get two documents from mongodb database randomly. It works fine, but looks very ugly because of the nested callback functions. If I want to get 100 documents instead of 2, that would be a disaster.
app.get('/api/two', function(req, res){
dataset.count(function(err, count){
var docs = [];
var rand = Math.floor(Math.random() * count);
dataset.findOne({'index':rand}, function(err, doc){
docs.push(doc);
rand = Math.floor(Math.random() * count);
dataset.findOne({'index':rand}, function(err, doc1){
docs.push(doc1);
res.json(docs);
});
});
});
});
So I tried to use for-loop instead, however, the following code just doesn't work, and I guess I misunderstand the asynchronous method idea.
app.get('/api/two', function(req, res){
dataset.count(function(err, count){
var docs = []
for(i = 0; i < 2 ; i++){
var rand = Math.floor(Math.random() * count);
dataset.findOne({'index':rand}, function(err, doc){
docs.push(doc);
});
}
res.json(docs);
});
});
Can anyone help me with that and explain to me why it doesn't work? Thank you very much.
Can anyone help me with that and explain to me why it doesn't work?
tl;dr -- The problem is caused by running a loop over an asynchronous function (dataset.findOne) that cannot complete before the loop completes. You need to handle this with a library like async (as suggested by the other answer) or by callbacks as in the first code example.
Looping over an synchronous function
This may sound pedantic, but it's important to understand the differences between looping in a synchronous and asynchronous world. Consider this synchronous loop:
var numbers = [];
for( i = 0 ; i < 5 ; i++ ){
numbers[i] = i*2;
}
console.log("array:",numbers);
On my system, this outputs:
array: [ 0, 2, 4, 6, 8 ]
This is because the assignment to numbers[i] happens before the loop can iterate. For any synchronous ("blocking") assignment/function, you will get results in this manner.
For illustration, let's try this code:
function sleep(time){
var stop = new Date().getTime();
while(new Date().getTime() < stop + time) {}
}
for( i = 0 ; i < 5 ; i++ ){
sleep(1000);
}
If you get your watch out or throw in some console.log messages, you'll see that "sleeps" for 5 seconds.
This is because the while loop in sleep blocks...it iterates until the time milliseconds have passed before returning control back to the for loop.
Looping over an asynchronous function
The root of your problem is that dataset.findOne is asynchronous...which means it passes control back to the loop before the database has returned results. The findOne method takes a callback (the anonymous function(err, doc)) that creates a closure.
Describing closures here is beyond the scope of this answer, but if you search this site or use your favorite search engine for "javascript closures" you'll get tons of info.
The bottom line, though, is that the asynchronous call send the query off to the database. Because the transaction will take some time and it has a callback that can accept the query results, it hands control back to the for-loop. (Important: this is where node's "event loop" and it's intersection with "asynchronous programming" comes into play. Node is providing a non-blocking environment by allowing asynchronous behavior like this.)
Let's look at an example of how async issues can trip us up:
for( i = 0 ; i < 5 ; i++ ){
setTimeout(
function(){console.log("I think I is: ", i);} // anonymous callback
,1 // wait 1ms before using the callback function
)
}
console.log("I am done executing.")
You'll get output that looks like this:
I am done executing.
I think I is: 5
I think I is: 5
I think I is: 5
I think I is: 5
I think I is: 5
This is because setTimeout gets a function to call...so even though we only said "wait ONE millisecond", that's still longer than it takes for the loop to iterate 5 times and move on to the last console.log line.
What happens, then, is that the last line fires before the first anonymous callback fires. When it does fire, the loop has finished and i is equal to 5. So what you see here is that the loop is done, has moved on, even though the anonymous function handed to setTimeout still has access to the value of i. (This is "closures" in action...)
If we take this concept and use it to consider your second "broken" code example, we can see why you aren't getting the results you expected.
app.get('/api/two', function(req, res){
dataset.count(function(err, count){
var docs = []
for(i = 0; i < 2 ; i++){
var rand = Math.floor(Math.random() * count);
// THIS IS ASYNCHRONOUS.
// findOne gets a callback...
// hands control back to the for loop...
// and later pushes info into the "doc" array...
// too late for res.json, at least...
dataset.findOne({'index':rand}, function(err, doc){
docs.push(doc);
});
}
// THE LOOP HAS ENDED BEFORE any of the findOne callbacks fire...
// There's nothing in 'docs' to be sent back to the client. :(
res.json(docs);
});
});
The reason async, promises and other similar libraries are a good tool is that they help to solve the problem you are facing. async and promises can turn the "callback hell" that is created in this situation into a relatively clean solution...it's easier to read, easier to see where async stuff is happening, and when you need to make edits you don't have to worry about which callback level you are at/editing/etc.
You could use the async module. For example:
var async = require('async');
async.times(2, function(n, next) {
var rand = Math.floor(Math.random() * count);
dataset.findOne({'index':rand}, function(err, doc) {
next(err, doc);
});
}, function(err, docs) {
res.json(docs);
});
If you want to get 100 documents, you just need to change Async.times(2, to Async.times(100,.
The async module as mentioned above is a good solution. The reason this is happening is because a regular Javascript for loop is synchronous, while your calls to the database are asynchronous. The for loop does not know that you want to wait until the data is retrieved to go onto the next iteration, so it just keeps going, and finishes faster than the data retrieval.

Infinite execution of tasks for nodejs application

Suppose I have code, like this
function execute() {
var tasks = buildListOfTasks();
// ...
}
buildListOfTask creates array of functions. Functions are async, might issue HTTP requests or/and perform db operations.
If tasks list appears empty or all tasks are executed, I need to repeat same execute routine again. And again, in say "infinite loop". So, it's daemon like application.
I could quite understand how to accomplish that in sync-world, but bit confused how to make it possible in node.js async-world.
use async.js and it's queue object.
function runTask(task, callback) {
//dispatch a single asynchronous task to do some real work
task(callback);
}
//the 10 means allow up to 10 in parallel, then start queueing
var queue = async.queue(runTask, 10);
//Check for work to do and enqueue it
function refillQueue() {
buildListOfTasks().forEach(function (task) {
queue.push(task);
});
}
//queue will call this whenever all pending work is completed
//so wait 100ms and check again for more arriving work
queue.drain = function() {
setTimeout(refillQueue, 100);
};
//start things off initially
refillQueue();
If you're already familiar with libraries like async, you can use the execute() as the final callback to restart the tasks:
function execute(err) {
if (!err) {
async.series(buildListOfTasks(), execute);
} else {
// ...
}
}
I think you have to use async.js, probably the parallel function. https://github.com/caolan/async#parallel
In the global callback, just call execute to make a recursive call.
async.parallel(tasks,
function(err, results){
if(!err) execute();
}
);

Categories