I'm using some functions from the async library, and want to make sure I understand how they're doing things internally; however, I'm stuck on async.waterfall (implementation here). The actual implementation uses other functions from within the library, and without much experience, I'm finding it difficult to follow.
Could someone, without worrying about optimization, provide a very simple implementation that achieves waterfall's functionality? Probably something comparable to this answer.
From the docs, waterfall's description:
Runs the tasks array of functions in series, each passing their
results to the next in the array. However, if any of the tasks pass an
error to their own callback, the next function is not executed, and
the main callback is immediately called with the error.
An example:
async.waterfall([
    function(callback) {
        callback(null, 'one', 'two');
    },
    function(arg1, arg2, callback) {
        // arg1 now equals 'one' and arg2 now equals 'two'
        callback(null, 'three');
    },
    function(arg1, callback) {
        // arg1 now equals 'three'
        callback(null, 'done');
    }
], function (err, result) {
    // result now equals 'done'
});
Well, here is a simple implementation for chaining functions by queuing them.
First of all, the function:
function waterfall(arr, cb){} // takes an array and a callback on completion
Now, we need to keep track of the array and iterate it:
function waterfall(arr, cb){
    var fns = arr.slice(); // make a copy
}
Let's start by handling being passed an empty array, and add an extra parameter called result so we can pass results around:
function waterfall(arr, cb, result){ // result is the initial result
    var fns = arr.slice(); // make a copy
    if(fns.length === 0){
        process.nextTick(function(){ // don't cause race conditions
            cb(null, result); // we're done, nothing more to do
        });
    }
}
That's nice:
waterfall([], function(err, data){
    console.log("Done!");
});
Now, let's handle actually having stuff in:
function waterfall(arr, cb, result){ // result is the initial result
    var fns = arr.slice(1); // make a copy, apart from the first element
    if(!arr[0]){ // if there is nothing in the first position
        process.nextTick(function(){ // don't cause race conditions
            cb(null, result); // we're done, nothing more to do
        });
        return;
    }
    var first = arr[0]; // get the first function
    first(function(err, data){ // invoke it
        // when it is done
        if(err) return cb(err); // early error, terminate entire call
        // perform the same call, but without the first function
        // and with its result as the result
        waterfall(fns, cb, data);
    });
}
And that's it! We work around the fact that we can't use a plain loop with callbacks by recursing instead. Here is a fiddle illustrating it.
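Note that this simplified version only threads result through to the final callback; the intermediate tasks never receive it. Here is a rough sketch of a variant in which every task gets the previous (single) result as its first argument, which is still much simpler than the real async.waterfall (that one forwards any number of arguments):

function waterfall(arr, cb, result){
    if(arr.length === 0){
        return process.nextTick(function(){
            cb(null, result); // done, hand the last result to the final callback
        });
    }
    arr[0](result, function(err, data){ // pass the previous result to the task
        if(err) return cb(err);
        waterfall(arr.slice(1), cb, data); // recurse with the remaining tasks
    });
}

waterfall([
    function(prev, callback){ callback(null, 'three'); },          // prev is undefined here
    function(prev, callback){ callback(null, prev + ' -> done'); } // prev === 'three'
], function(err, result){
    // result === 'three -> done'
});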
It's worth mentioning that if we were implementing it with promises we could have used a for loop.
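For instance, here is a rough sketch assuming every task returns a promise for its single result instead of taking a node-style callback:

async function promiseWaterfall(tasks, initial){
    let result = initial;
    for (const task of tasks) {
        result = await task(result); // a rejection stops the loop, like an early error
    }
    return result;
}

// usage:
// promiseWaterfall([async () => 'three', async prev => prev + ' -> done'])
//     .then(result => { /* result === 'three -> done' */ });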
For those who like to keep it short (note that this version mutates the array and only forwards errors, not intermediate results):

function waterfall(fn, done){
    fn.length ? fn.shift()(function(err){ err ? done(err) : waterfall(fn, done) }) : done();
}
Related
I have a list of things that I'm sending to PHP one at a time via $.post. I want to wait for each call to complete before making the next one. I want to do this in JS rather than looping in PHP, because I want to display the return value from each call.
var list = ["a", "b", "c"];

for (var i = 0; i < list.length; i++) {
    $.post(con, {
        callScript: list[i],
    }, function(data, status) {
        // Do stuff here with the data on success
    });
}
I have looked at $.when but just can't sort out how to use it. Every example assumes that there is a set number of functions rather than the same function n times. I also know that async: false is not an option.
Is there a way to get that to run?
Recursion is your good friend here. You can create a function that invokes itself for each item, calling the next one after the async operation of the current one is finished. The recursion stops when we run out of items.
function postInOrder(list, index = 0) {
    if (index < list.length) {
        $.post(con, {
            callScript: list[index],
        }, function success(data, status) {
            // do stuff here with the data on success
            // ...
            postInOrder(list, index + 1); // <-- run next script on completion
        });
    }
}
postInOrder(["a", "b", "c"])
Here's an example with a fake post method:
function postInOrder(list, index = 0) {
    if (index < list.length) {
        console.log(`Start: ${list[index]}`)
        post(function success() {
            console.log(`Finish: ${list[index]}`)
            postInOrder(list, index + 1); // <-- run next script on completion
        });
    }
}
postInOrder(["a", "b", "c"])
function post(cb) { setTimeout(cb, 1000); }
You may also reduce it to a promise queue:
list.reduce(function(prom, listEl){
    return prom.then(function(){
        return new Promise(function(next){
            $.post(con, {
                callScript: listEl,
            }, function(data, status) {
                // Do stuff here with the data on success
                next();
            });
        });
    });
}, Promise.resolve());
(I think wrapping it in a new Promise may not be necessary; someone who's sure about jQuery's syntax should feel free to edit ;))
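Indeed, since $.post returns a jqXHR that is then-able (and a fully Promises/A+-compliant promise in jQuery 3+), a sketch of the same sequential queue without the extra wrapper could look like this:

list.reduce(function(prom, listEl){
    return prom.then(function(){
        return $.post(con, { callScript: listEl }); // the returned jqXHR is awaited by the chain
    }).then(function(data){
        // Do stuff here with the data on success
    });
}, Promise.resolve());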
There is a VERY widely used library called "async" that makes this sort of thing very easy. Here are the docs for async series which is probably what you want to do here.
Here is the example from the docs:
async.series([
    function(callback) {
        // do some stuff ...
        callback(null, 'one');
    },
    function(callback) {
        // do some more stuff ...
        callback(null, 'two');
    }
],
// optional callback
function(err, results) {
    // results is now equal to ['one', 'two']
});
The pattern for the callback is: pass an error as the first parameter if something went wrong; otherwise pass null as the first parameter and the actual return value as the second. That pattern is used all over asynchronous JavaScript, on the server and in the browser.
That final function is what gets executed once the series is complete.
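To make the error-first convention concrete, here is a small sketch (readConfig is a hypothetical helper, not part of async):

var fs = require('fs');

// hypothetical helper that follows the error-first callback convention
function readConfig(path, callback) {
    fs.readFile(path, 'utf8', function (err, raw) {
        if (err) return callback(err); // something went wrong: the error goes first
        var parsed;
        try {
            parsed = JSON.parse(raw);
        } catch (parseErr) {
            return callback(parseErr); // report parse failures the same way
        }
        callback(null, parsed); // success: null error, then the value
    });
}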
You can also get tricky with this knowing that variables can be functions in JavaScript and build your queue like this:
var processMe = [];

// push functions that accept a callback; don't invoke callScript yet
processMe.push(function(callback) { callScript('two', callback); });
processMe.push(function(callback) { callScript('four', callback); });
processMe.push(function(callback) { callScript('six', callback); });

async.series(processMe,
    function(err, results) {
        // results is now equal to ['two', 'four', 'six']
    });

function callScript(value, callback) {
    // do something here
    callback(null, value);
}
You can also use waterfall if you need to pass results from one step to another.
If it truly is the same code multiple times, use the times operator to simply iterate the same function N times:
// Pretend this is some complicated async factory
var createUser = function(id, callback) {
    callback(null, {
        id: 'user' + id
    });
};

// generate 5 users
async.times(5, function(n, next) {
    createUser(n, function(err, user) {
        next(err, user);
    });
}, function(err, users) {
    // we should now have 5 users
});
I'm OK with JavaScript and callbacks, but I'm getting really annoyed at this and need to call on the world of Stack Overflow for help.
I have written a function, to be used in the following way:
var meth = lib.funcTion(a,b); // meth should hold an array of properties { c, d } once executed
So now inside lib.js, we have a structure like:
exports.funcTion = function (a,b) {
    database.connect(params, function(err, get){
        get.query(querylang, function(err, results){
            var varsIwantToReturn = { var1: results[i].foo, var2: results[i].bar };
        });
    });
    // Now how do i return 'varsIwantToReturn'?
};
I have seen some things about incorporating callback() into the function, but I'm not exactly sure how this works. I've also seen some people use exec(); again, I'm not sure how or why to use it.
Please help :) thanks in advance.
Well, it's all asynchronous, so if you attempt to return the value it will just be undefined. In JavaScript (sans the new yield keyword) functions execute from top to bottom synchronously, but when you make an I/O call such as a database query, the function does not wait for the result; it carries on and returns. In fact, by the time varsIwantToReturn gets populated, the outer function has long since run and terminated.
What is left is to do the same thing async functions like database.connect and get.query do: have your function take a callback as well:
exports.funcTion = function (a, b, callback) {
    database.connect(params, function(err, get){
        if(err) return callback(err, null); // don't suppress errors
        get.query(querylang, function(err, results){
            if(err) return callback(err, null); // don't suppress errors
            var varsIwantToReturn = { var1: results[i].foo, var2: results[i].bar };
            callback(null, varsIwantToReturn);
        });
    });
};
Then you'd call it like
myExportedFunction(myA, myB, function(err, resp){
    if(err) return recoverFromError(err);
    // varsIwantToReturn is contained in `resp`
});
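If you prefer promises over callbacks, here is a rough sketch of wrapping the same callback-style function (funcTionAsync and handleError are hypothetical names; everything else is as above):

exports.funcTionAsync = function (a, b) {
    return new Promise(function (resolve, reject) {
        exports.funcTion(a, b, function (err, vars) {
            if (err) return reject(err); // turn the error argument into a rejection
            resolve(vars);               // and the result into the fulfilment value
        });
    });
};

// usage:
// lib.funcTionAsync(a, b).then(function (vars) { /* vars.var1, vars.var2 */ }).catch(handleError);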
I have used async.waterfall to flatten out a nested function as per the below.
function convertPortfolio(trademarks, fn){
    async.waterfall([
        function(callback){
            groupByCountry(trademarks, callback)
        },
        function(TMsGroupedByCountry, callback){
            addEUTrademarks(TMsGroupedByCountry['EU'], TMsGroupedByCountry, callback)
        },
        function(revisedGroupByCountry, callback){
            groupTrademarksByStatus(revisedGroupByCountry, callback)
        }
    ], function(err, splitByStatus){
        fn(null, splitByStatus);
    })
}
Nested alternative
function convertPortfolio(trademarks, fn){
    groupByCountry(trademarks, function(err, TMsGroupedByCountry){
        addEUTrademarks(TMsGroupedByCountry['EU'], TMsGroupedByCountry, function(err, revisedGroupByCountry){
            groupTrademarksByStatus(revisedGroupByCountry, function(err, splitByStatus){
                fn(null, splitByStatus)
            });
        });
    });
}
Subsequently, when I call this function once as part of another function, it works perfectly. However, when I attempt to call it multiple times, once for each value produced by a forEach call on an array, it fails to work, whereas the nested version works fine. I am at a loss to explain why this would be the case and, in all honesty, I'm not sure where to start. My understanding is that the forEach call should simply mean that each separate value is processed and closed over accordingly, so that shouldn't be the issue.
Async function works correctly and returns value in this instance
exports.convertPortfolioAndAddToGeoJSON = function(geojson, trademarks, fn){
    convertPortfolio(trademarks, function(err, revisedTMs){
        addToGeoJson(geojson, revisedTMs, function(err, gj){
            fn(null, gj)
        });
    });
}
But in this instance the end target object is not populated:
exports.convertYearPortfolioAndAddToGeoJSON = function(geojson, trademarks, fn){
    var target = {};
    Object.keys(trademarks).forEach(function(year){
        convertPortfolio(trademarks[year], function(err, revisedTMs){
            addToGeoJson(geojson, revisedTMs, function(err, revisedGeoJSON){
                target[year] = revisedGeoJSON;
            });
        });
    });
    fn(null, target);
}
Using console.log at certain points shows that in the nested example, the return values are logged prior to the target object being logged, whereas with the async.waterfall example, the target object is logged prior to the returned data being available (so I suppose it is not surprising that logging the target results in an empty object).
I thought that in each case the presence of callbacks would mean that the logging of target would only take place once all the previous processing had completed, but although this appears to be the case with the nested alternative, it is not so with the async version, at least when it is called multiple times.
Any guidance would be appreciated.
UPDATE Out of interest here's the revised code using async.forEach:
exports.convertYearPortfolioAndAddToGeoJSON = function(geojson, trademarks, fn){
    var target = {};
    async.forEach(Object.keys(trademarks), function(year, callback){
        async.waterfall([
            async.apply(convertPortfolio, trademarks[year]),
            function(revisedTMs, callback){
                addToGeoJson(geojson, revisedTMs, callback)
            }
        ], function(err, revisedGeoJSON){
            target[year] = revisedGeoJSON;
            callback()
        })
    }, function(err){
        fn(null, target);
    });
}
Object.keys(trademarks).forEach is synchronous and does not wait for the asynchronous work started in each iteration, so fn(null, target) runs before any of the waterfalls have finished. You need to use async.forEach(Object.keys(trademarks), function(year, callback){ ... }) there, and adjust the async control flow accordingly.
Also just FYI when you have this pattern with a tiny wrapper function:
function convertPortfolio(trademarks, fn){
    async.waterfall([function(callback){
        groupByCountry(trademarks, callback)
    },
You can use async.apply for that boilerplate:
function convertPortfolio(trademarks, fn){
    async.waterfall([async.apply(groupByCountry, trademarks),
    ...
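Putting both tips together, here is a sketch of what convertPortfolio might end up looking like (same helper functions as in the question; illustrative, not tested):

function convertPortfolio(trademarks, fn){
    async.waterfall([
        async.apply(groupByCountry, trademarks),
        function(TMsGroupedByCountry, callback){
            addEUTrademarks(TMsGroupedByCountry['EU'], TMsGroupedByCountry, callback);
        },
        groupTrademarksByStatus // already has the (result, callback) signature waterfall expects
    ], fn); // passing fn directly also forwards errors instead of swallowing them
}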
So basically I am making a database query to get all posts with a certain id, then adding them to a list so I can return it. But the list is returned before the callback has finished.
How do I prevent it from being returned before the callback has finished?
exports.getBlogEntries = function(opid) {
    var list12 = [];
    Entry.find({'opid': opid}, function(err, entries) {
        if(!err) {
            console.log("adding");
            entries.forEach(function(currentEntry){
                list12.push(currentEntry);
            });
        }
        else {
            console.log("EEEERROOR");
        }
        //else {console.log("err");}
    });
    console.log(list12);
    return list12;
};
All callbacks are asynchronous, so we have no guarantee that they will run in exactly the order we wrote them.
To fix this and guarantee the order of execution, making the process behave "synchronously", you have two options:
First: do all the work in nested callbacks.
Instead of this:
MyModel1.find({}, function(err, docsModel1) {
    callback(err, docsModel1);
});

MyModel2.find({}, function(err, docsModel2) {
    callback(err, docsModel2);
});
use this:
MyModel1.find({}, function(err, docsModel1) {
    MyModel2.find({}, function(err, docsModel2) {
        callback(err, docsModel1, docsModel2);
    });
});
The snippet above guarantees that MyModel2 is queried only AFTER the MyModel1 query has completed.
Second: use a library such as Async. It is excellent and has several helper functions to execute code in series, in parallel, or in whatever way we want.
Example:
async.series({
    function1: function(callback) {
        // your first code here
        // ...
        callback(null, 'some result here');
    },
    function2: function(callback) {
        // your second code here (called only after the first one)
        callback(null, 'another result here');
    }
},
function(err, results) {
    // capture the results from function1 and function2
    // if function1 raises an error, function2 will not be called
    results.function1; // 'some result here'
    results.function2; // 'another result here'
    // do something else...
});
You could use synchronous database calls, but that would go against the whole concept of node.js.
The proper way is to pass a callback to the function that queries the database and then call the provided callback inside the database query callback.
How do I prevent it from being returned before callback has finished?
The callback is asynchronous, and you cannot avoid that. Hence, you must not return a list.
Instead, offer a callback for when it's filled. Or return a Promise for the list. Example:
exports.getBlogEntries = function(opid, callback) {
    Entry.find({'opid': opid}, callback); // yes, that's it.
    // Everything else was boilerplate code
};
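If you'd rather return a Promise for the list, as mentioned above, a sketch could look like this (assuming a Mongoose version whose Query#exec() returns a promise):

exports.getBlogEntries = function(opid) {
    return Entry.find({'opid': opid}).exec(); // a promise for the array of entries
};

// usage:
// getBlogEntries(someOpid).then(function(entries){ /* ... */ }).catch(handleError);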
There is an alternative way to handle this scenario: use the async module and, once forEachSeries has finished, invoke a callback with the list. The code snippet below shows the idea:
var async = require('async');

exports.getBlogEntries = function(opid, callback) {
    var list12 = [];
    Entry.find({'opid': opid}, function(err, entries) {
        if(!err) {
            console.log("adding");
            async.forEachSeries(entries, function(entry, done){
                list12.push(entry);
                done(); // signal that this item has been processed
            }, function(){
                console.log(list12);
                callback(null, list12);
            });
        }
        else {
            console.log("EEEERROOR");
            callback(err);
        }
    });
};
I have a function that returns an array of items from MongoDB:
var getBooks = function(callback){
    Comment.distinct("doc", function(err, docs){
        callback(docs);
    });
};
Now, for each of the items returned in docs, I'd like to execute another mongoose query, gather the count for specific fields, gather them all in a counts object, and finally pass that on to res.render:
getBooks(function(docs){
    var counts = {};
    docs.forEach(function(entry){
        getAllCount(entry, ...);
    });
});
If I put res.render after the forEach loop, it will execute before the count queries have finished. However, if I include it in the loop, it will execute for each entry. What is the proper way of doing this?
I'd recommend using the popular NodeJS package, async. It's far easier than doing the counting and the eventual error handling yourself, as the other answer requires.
In particular, I'd suggest considering each (reference):
getBooks(function(docs){
    var counts = {};
    async.each(docs, function(doc, callback){
        getAllCount(doc, ...);
        // call the `callback` with an error if one occurred, or
        // with no arguments if everything was OK.
        // store the value for each doc in counts
    }, function(err) {
        // all are complete (or an error occurred)
        // you can access counts here
        res.render(...);
    });
});
or you could use map (reference):
getBooks(function(docs){
    async.map(docs, function(doc, transformed){
        getAllCount(doc, ...);
        // call transformed(null, theCount); for each document
        // (or transformed(err); if there was an error)
    }, function(err, results) {
        // all are complete (or an error occurred)
        // you can access results here, which contains the count values
        // returned by calling transformed(null, ###) in the map function
        res.render(...);
    });
});
If there are too many simultaneous requests, you could use the mapLimit or eachLimit function to limit the amount of simultaneous asynchronous mongoose requests.
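For example, here is a rough sketch with mapLimit capped at 5 concurrent queries (it assumes getAllCount calls its callback error-first with the count; handleError and the 'books' view are hypothetical):

getBooks(function(docs){
    async.mapLimit(docs, 5, function(doc, transformed){
        getAllCount(doc, function(err, count){
            transformed(err, count); // at most 5 of these run at the same time
        });
    }, function(err, counts) {
        if (err) return handleError(err);
        res.render('books', { counts: counts });
    });
});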
forEach probably isn't your best bet here, unless you want all of your calls to getAllCount happening in parallel (maybe you do, I don't know — or for that matter, Node is still single-threaded by default, isn't it?). Instead, just keeping an index and repeating the call for each entry in docs until you're done seems better. E.g.:
getBooks(function(docs){
    var counts = {},
        index = 0,
        entry;

    loop();

    function loop() {
        if (index < docs.length) {
            entry = docs[index++];
            getAllCount(entry, gotCount);
        }
        else {
            // Done, call `res.render` with the result
        }
    }

    function gotCount(count) {
        // ...store the count, it relates to `entry`...
        // And loop
        loop();
    }
});
If you want the calls to happen in parallel (or if you can rely on this working in the single thread), just remember how many are outstanding so you know when you're done:
// Assumes `docs` is not sparse
getBooks(function(docs){
    var counts = {},
        received = 0,
        outstanding;

    outstanding = docs.length;
    docs.forEach(function(entry){
        getAllCount(entry, function(count) {
            // ...store the count, note that it *doesn't* relate to `entry` as we
            // have overlapping calls...

            // Done?
            --outstanding;
            if (outstanding === 0) {
                // Yup, call `res.render` with the result
            }
        });
    });
});
In fact, getAllCount on the first item must, in its callback, invoke getAllCount on the second item, and so on.
There are two ways: you can use a framework such as async: https://github.com/caolan/async
Or build the callback chain yourself. It's fun to write the first time.
Edit:
The goal is to have a mechanism that proceeds like this:
getAllCountFor(1, function(err1, result1) {
    getAllCountFor(2, function(err2, result2) {
        ...
        getAllCountFor(N, function(errN, resultN) {
            // res.render and so on
        });
    });
});
And that's what you would construct with async, using its series-style helpers.
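For instance, here is a rough sketch of that chain with async.eachSeries (it assumes getAllCount takes (doc, callback) and calls back error-first with the count; handleError and the 'books' view are hypothetical):

var async = require('async');

getBooks(function(docs){
    var counts = {};
    async.eachSeries(docs, function(doc, next){
        getAllCount(doc, function(err, count){
            if (err) return next(err); // stop the chain on the first error
            counts[doc] = count;       // store the count for this doc
            next();                    // move on to the next document
        });
    }, function(err){
        if (err) return handleError(err);
        res.render('books', { counts: counts });
    });
});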