Using async.waterfall - javascript

I'm using node.js and the async package.
Here's the code I have:
async.waterfall(
[
function(callback) {
var data = getSomeData();
callback(null, data);
},
function(data, callback) {
someFunctionThatNeedsData(data);
callback(null, 'done');
}
],
function(err, result) {
}
);
getSomeData has an asynchronous HTTP request that grabs some data from a web service. I'd like to wait until I get a response, and then return that data and pass it to someFunctionThatNeedsData.
What I expected was that getSomeData -- including the callback inside of it -- would have to complete before moving on to invoke someFunctionThatNeedsData.
The problem is that, despite using the waterfall function here, data is undefined by the time it gets to someFunctionThatNeedsData.
Additionally, from console.log I can see that the end of getSomeData is reached before the callback inside of getSomeData even begins.
Am I using waterfall incorrectly, or is it just not the right tool here? If it's just not right, what can I use to achieve the desired effect?
Or do I have to resign myself to deeply nested callbacks (which, with future work, I will have) and just mitigate that by extracting inline code into named functions?

getSomeData() has an asynchronous http request that grabs some data from a web service.
This is the issue. getSomeData() returns immediately, before the HTTP response has arrived, so execution has already moved on to callback(null, data) while data is still undefined. This is how asynchronous functions work!
You have to pass the callback to getSomeData, which calls it once the HTTP request has finished. So yes: you may need to nest the callbacks.
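A minimal sketch of that fix, assuming getSomeData can be changed to accept a Node-style callback. The setTimeout stands in for the real HTTP request, and the data shape and the runSteps helper are made up for illustration:

```javascript
// Hypothetical stand-in for the HTTP request: it answers after a short delay
// and reports the result through the callback instead of returning it.
function getSomeData(callback) {
  setTimeout(function () {
    callback(null, { value: 21 });
  }, 10);
}

// Hypothetical consumer; it only runs once the data actually exists.
function someFunctionThatNeedsData(data) {
  return data.value * 2;
}

// Chains the two steps by nesting: step two starts inside step one's callback.
function runSteps(done) {
  getSomeData(function (err, data) {
    if (err) { return done(err); }
    done(null, someFunctionThatNeedsData(data));
  });
}

runSteps(function (err, result) {
  console.log(err, result); // null 42
});
```

Inside async.waterfall, the first task would then shrink to function(callback) { getSomeData(callback); }, since waterfall hands every argument after the error to the next task.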

If you have an async operation, you don't necessarily need async.waterfall. You could just do it in a promise-chain style.
getSomeData().then(function(data)
{
var changeData = changeYourData(data);
return changeData;
}).then(function(changedData)
{
// do some more work with it; you can keep forwarding results to the next `then`
}).catch(function(err)
{
// an error thrown at any point in the chain is caught here
}).finally(function()
{
// this is guaranteed to run no matter what,
// much like the final callback of async.waterfall
});
This example will work with Q, When, and any promise library that follows the standard.
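The chain above only works if getSomeData returns a promise. Here is a sketch with the native Promise, where the timer stands in for the HTTP request and the data and the body of changeYourData are invented for the example:

```javascript
// Hypothetical stand-in for the HTTP request, resolving after a short delay.
function getSomeData() {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve([1, 2, 3]); }, 10);
  });
}

// Hypothetical transformation step.
function changeYourData(data) {
  return data.map(function (n) { return n * 2; });
}

var chained = getSomeData()
  .then(changeYourData)
  .then(function (changedData) {
    // Each `then` receives whatever the previous step returned.
    return changedData.reduce(function (sum, n) { return sum + n; }, 0);
  });

chained.then(function (total) {
  console.log(total); // 12
});
```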
If you need to use async.waterfall (because you could drive it with an Array.map), you just need to call the callback inside your then:
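async.waterfall(
[
function(callback) {
// Option A: wait for the promise, then pass the resolved data on
getSomeData().then(function(data)
{
callback(null, data);
}, callback); // a rejection is forwarded as the error
// Option B (instead of A, never both): pass the promise itself
// callback(null, getSomeData());
},
function(data, callback) {
// Option A: data is already resolved
someFunctionThatNeedsData(data);
callback(null, 'done');
// Option B (instead of A): data is a promise, so resolve it first
// data.then(function(resolvedData)
// {
// someFunctionThatNeedsData(resolvedData);
// callback(null, 'done');
// });
}
],
function(err, result) {
});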
Hope this helps.

Related

Why does fs.readFile need a wrapper when it includes a callback as an argument?

I'm pretty new to Node.js and I had a confusing interaction with fs.readFile().
Initially, I thought that I could read a file using
fs.readFile("file.txt",function(err,data) {
if(err) {throw err;}
console.log(data);
});
However, this printed null. I consulted the documentation and it gives the exact same example and claims that it works fine!
Then I consulted Stack Overflow and found in this post and this post that a solution is to wrap fs.readFile() in your own function that takes a callback:
function read(file,callback) {
fs.readFile(file,function(err,data) {
if(err) {throw err;}
callback(data);
});
}
read("file.txt", function(data) {
console.log(data);
});
Alternatively, it's possible to just assign data to a new variable:
var content;
fs.readFile("file.txt",function(err,data) {
if(err) {throw err;}
content = data;
console.log(content);
});
My understanding is that when an asynchronous function completes and returns some value (here the contents of the file) then the callback runs on the returned data.
If fs.readFile(file,callback) expects to be passed a callback function, then why does it seemingly run the callback before fs.readFile() has completed?
Why does assigning the data to another variable change the way it behaves?
Thanks.
fs.readFile("file.txt",function(err,data) {
if(err) {throw err;}
console.log(data);
});
Would actually work.
What wouldn't work is:
var content;
fs.readFile("file.txt",function(err,data) {
if(err) {throw err;}
content = data;
});
console.log(content);
(which is the example that was in the post you had referenced)
The reason is that fs.readFile is asynchronous, meaning that execution carries on immediately without waiting for fs.readFile's response.
So in the case of the latter example,
console.log(content);
would execute before fs.readFile returns with an answer (i.e. before the callback is triggered).
In general, you cannot run asynchronous methods and expect them to have the answer right away. The whole idea of an asynchronous call is that it doesn't block: the program carries on before the asynchronous method returns with an answer.
The whole purpose of the callback in an asynchronous method is to give the method a way to notify you when it's done and hand you the result.
Wrapping fs.readFile doesn't solve the problem. It just provides a cleaner interface for reading files (instead of calling "fs.readFile" you would just call "read").
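To see the ordering for yourself, here is a small sketch. The temp file name is an assumption made up for the demo; the fs, os, and path calls are the standard Node ones:

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Assumption: a throwaway file in the temp directory, created just for this demo.
var file = path.join(os.tmpdir(), 'readfile-order-demo.txt');
fs.writeFileSync(file, 'hello');

var order = [];
var content;

fs.readFile(file, 'utf8', function (err, data) {
  if (err) { throw err; }
  content = data;          // assigned later, once the read has finished
  order.push('callback');
  console.log(order, content);
});

// This runs first: readFile has only been started, not completed.
order.push('after readFile call');
console.log(content); // undefined
```

The last two lines run before the callback does, which is why content is only usable inside the callback (or in code the callback calls).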

First function gets stuck in async.js

I tried to execute a series of events in my Node.js project with async as follows:
server.js:
var express = require('express');
var async = require('async');
...
app.get('/my_page', function (req, res) {
async.series([
function() { console.log("calling foo()"); },
function() { foo(); },
function() { console.log("foo() done"); },
]);
res.render('my_page', {});
});
but I only get the first console output and then it's stuck. The function()s were function(callback)s before. I thought it was waiting for a value to be returned, but removing the parameter doesn't change the situation.
What am I doing wrong? (I'm a newbie to Node.js)
Thanks for any help,
Please read the documentation. Each function should get callback as a parameter, and should call callback once it's done, thus telling async it's time to move on to the next function. So you should have something like:
function(callback) {
console.log("calling foo()");
callback();
},
If there was an error in one of the functions, call callback with the error as the 1st parameter. If you want res.render('my_page', {}); to be executed only after the last function is executed, wrap it in a function and put it as the 2nd parameter to async.series.
And of course, if none of the functions are asynchronous, you should consider not using async.
Firstly, I'm not sure why you're using async here, since none of the functions you've provided to the async.series function are asynchronous.
async.series is useful for calling multiple asynchronous functions in a row and avoiding callback hell.
var async = require("async");
function asyncA(callback){
// do something
callback(null, "value-a");
}
function asyncB(callback){
// do something else
callback(null, "value-b");
}
asyncA(function(err, valueA){
console.log(valueA); // "value-a"
asyncB(function(err, valueB){
console.log(valueB); // "value-b"
});
});
async.series([asyncA, asyncB], function(err, results){
console.log(results); // ["value-a", "value-b"]
});
I suggest doing some reading on asynchronous functions and callbacks and consider whether you really need the async library. I recommend the You Don't Know JS chapter on asynchronous javascript and the series as a whole :)

Node.JS async.parallel doesn't wait until all the tasks have completed

I am using async.parallel to run two functions in parallel. The functions request RSS feeds. Then the RSS feeds are parsed and added to my web page.
But for some reason async.parallel runs the callback method without waiting until the two functions have completed.
The documentation says:
Once the tasks have completed, the results are passed to the final
callback as an array.
My code.
require('async').parallel([ function(callback) {
fetchRss(res, bbcOpts); // Needs time to request and parse
callback();
}, function(callback) {
// Very fast.
callback();
} ], function done(err, results) {
if (err) {
throw err;
}
res.end("Done!");
});
In fact I only have "Done!" on my web page. Why?
Why do I need to call res.end()?
The Node.JS documentation says:
The method, response.end(), MUST be called on each response.
If I don't call it, my web page keeps "downloading" (I mean the progress bar in the browser's address bar never finishes).
I suppose your fetchRss function is asynchronous: it is performed later, while the callback is called immediately. You should pass the callback to fetchRss:
function fetchRss(res, bbcOpts, callback) {
doSomething();
callback();
}
require('async').parallel([ function(callback) {
fetchRss(res, bbcOpts, callback); // Needs time to request and parse
}, function(callback) {
// Very fast.
callback();
} ], function done(err, results) {
if (err) {
throw err;
}
res.end("Done!");
});
On a side note, you should call res.end() in order for Node to know that the message is complete and that everything (headers + body) has been written. Otherwise, the socket will stay open and will wait forever for another message (which is why the browser shows a progress bar: it doesn't know that the request has ended).
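For comparison, here is the same idea sketched with the native Promise.all in place of async.parallel. This is a swapped-in technique, not what the question uses. fetchRss is simulated with a timer, and the option names (host, delay) are assumptions for the demo:

```javascript
// Hypothetical fetchRss: calls back only once the (simulated) request is done.
function fetchRss(opts, callback) {
  setTimeout(function () {
    callback(null, 'feed from ' + opts.host);
  }, opts.delay);
}

// Wrap the callback API in a promise so Promise.all can wait on it.
function fetchRssPromise(opts) {
  return new Promise(function (resolve, reject) {
    fetchRss(opts, function (err, feed) {
      if (err) { reject(err); } else { resolve(feed); }
    });
  });
}

var allFeeds = Promise.all([
  fetchRssPromise({ host: 'feed-a.example', delay: 20 }),
  fetchRssPromise({ host: 'feed-b.example', delay: 5 })
]);

allFeeds.then(function (feeds) {
  // Results keep the order of the input array, not the order of completion.
  console.log(feeds);
});
```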
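You can use async.reflect or async.reflectAll to make sure every task runs to completion even if one of them fails: a reflected task never passes an error to async.parallel, so the final callback receives one result object per task, holding either a value or an error property.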
http://caolan.github.io/async/docs.html#reflect

Join thread in JavaScript

Probably asked before, but after some serious searching I'm still not able to find a proper solution. Please consider something like this:
function compute() {
asyncCall(args, function(err, result) {
});
/* 'join thread here' */
}
Even though asyncCall is asynchronous I'd like to use the result and return it from the function compute synchronously. asyncCall is a library call and I can't modify it in any way.
How to wait properly for the asynchronous result without setTimeout and watching a conditional variable? This is possible but suboptimal.
not sure how you can really use something that doesn't exist yet, but it's easy enough to return a slot where the result will be:
function compute() {
var rez=[];
asyncCall(args, function(err, result) {
rez[0]=result;
if(rez.onchange){ rez.onchange(result); }
});
/* 'join thread here' */
return rez;
}
Now you can read the [0] element of the returned array; once the callback comes in, it will hold the result. The callback also fires an onchange handler, if you have attached one to the returned array, as soon as the data arrives.
I would use something more formal like a promise or secondary callback, but that's me...
EDIT: how to integrate a callback upstream:
// sync (old and busted):
function render(){
var myView=compute();
mainDiv.innerHTML=myView;
}
//async using my re-modified compute():
function render(){
var that=compute();
that.onchange=function(e){ mainDiv.innerHTML=e; }
}
see how making it wait only added a single wrapper in the render function?
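The more formal promise-based variant could look something like this. asyncCall here is a hypothetical stand-in for the library call (simulated with a timer), and the args shape and the markup are invented for the example:

```javascript
// Hypothetical stand-in for the unmodifiable library call.
function asyncCall(args, callback) {
  setTimeout(function () {
    callback(null, args.a + args.b);
  }, 10);
}

// compute() cannot return the result itself, but it can return a promise for it.
function compute(args) {
  return new Promise(function (resolve, reject) {
    asyncCall(args, function (err, result) {
      if (err) { reject(err); } else { resolve(result); }
    });
  });
}

// The caller "joins" by chaining on the promise instead of blocking.
function render(args) {
  return compute(args).then(function (view) {
    return '<div>' + view + '</div>';
  });
}

render({ a: 1, b: 2 }).then(function (html) {
  console.log(html); // <div>3</div>
});
```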
There's no await syntax in browsers that is widely available. Your options are generally limited to Callback patterns or Promises.
NodeJS follows a callback pattern for most async methods.
function someAsyncMethod(options, callback) {
//callback = function(error, data)
// when there is an error, it is the first parameter, otherwise use null
doSomethingAsync(function(response){
callback(null, response);
});
}
....
someAsyncMethod({...}, function(err, data) {
if (err) return alert("OMG! FAilZ!");
// use data
});
Another common implementation is promises, such as jQuery's .ajax() method...
var px = $.ajax({...});
px.done(function(data, status, xhr){
//runs when data returns.
});
px.fail(function(xhr, status, err){
//runs when an error occurs
});
Promises are similar to events...
Of the two methods above, the callback syntax tends to be easier to implement and follow, but it can lead to deeply nested callback trees. Utility libraries such as async can help flatten them.

How to end on first async parallel task completion in Node?

I have a list of tasks that I want to run in parallel using https://github.com/caolan/async.
I want the program to proceed (probably through a callback) after the first of these parallel tasks is complete, not all of them. So I don't think the naive
async.parallel([task1, task2], callback)
works for me.
Alternatively I could spawn two tasks and cancel the incomplete one, but I can't figure out how to do that using async either.
Thanks!
-Charlie
Parallel Race
You can get async to initiate the final callback by returning an error that evaluates as true but isn't actually an error.
I've put together an example that uses -1 as an error code. In the final callback I check the error value and if it's not -1 then it's an actual error. If the error value is -1 then we'll have a valid value in results. At that point, we just need to remove extra elements from results of the other async functions that have not completed yet.
In the below example I've used the request module to pull html pages and the underscore module to filter the results in the final callback.
var async = require('async');
var request = require('request');
var _ = require('underscore');
exports.parallel = function(req, res) {
async.parallel([
/* Grab Google.jp */
function(callback) {
request("http://google.jp", function(err, response, body) {
if(err) { console.log(err); callback(true); return; }
callback(-1,"google.jp");
});
},
/* Grab Google.com */
function(callback) {
request("http://google.com", function(err, response, body) {
if(err) { console.log(err); callback(true); return; }
callback(-1,"google.com");
});
}
],
/* callback handler */
function(err, results) {
/* Actual error */
if(err && err!=-1) {
console.log(err);
return;
}
/* First data */
if(err===-1) {
/*
* async#parallel returns a list, one element per parallel function.
* Functions that haven't finished yet are in the list as undefined.
* use underscore to easily filter the one result.
*/
var one = _.filter(results, function(x) {
return x !== undefined;
})[0];
console.log(results);
console.log(one);
res.send(one);
}
}
);
};
Remaining Function Results
When you set up async#parallel to work like this, you won't have access to the results of the other asynchronous functions. If you're only interested in the first one to respond then this isn't a problem. However, you will not be able to cancel the other requests. That's most likely not a problem, but it might be a consideration.
The async.parallel documentation says:
If any of the functions pass an error to its callback, the main callback is immediately called
with the value of the error.
So you could return an error object from all of your parallel functors, and the first one to finish would jump you to the completion callback. Perhaps even your own special error class, so you can tell the difference between an actual error and a "hey I won" error.
Having said that, you would still have your parallel functions running, potentially waiting for callbacks to complete or whatever. Perhaps you could use async.parallelLimit to make sure you're not firing off too many tasks in parallel?
Having said all that, it's possible you are better served by trying another method from the async library for this task - firing off parallel tasks then having these tasks race each other may not be the best idea.
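If promises are an option, the native Promise.race expresses "continue after the first one finishes" directly. This is a different technique from the async.parallel error trick above. delayedTask is a hypothetical stand-in for a real task, and as with the other approaches the losing tasks keep running; their results are simply ignored:

```javascript
// Hypothetical task: resolves with its name after the given delay.
function delayedTask(name, ms) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(name); }, ms);
  });
}

// Promise.race settles as soon as the first task settles. A rejection
// from the fastest task would win the race too.
var winner = Promise.race([
  delayedTask('slow', 50),
  delayedTask('fast', 5)
]);

winner.then(function (name) {
  console.log(name); // fast
});
```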
