I have used async.waterfall to flatten out a nested function as per the below.
function convertPortfolio(trademarks, fn){
async.waterfall([function(callback){
groupByCountry(trademarks, callback)
},
function(TMsGroupedByCountry, callback){
addEUTrademarks(TMsGroupedByCountry['EU'], TMsGroupedByCountry, callback)
},
function(revisedGroupByCountry, callback){
groupTrademarksByStatus(revisedGroupByCountry, callback)
}], function(err, splitByStatus){
fn(null, splitByStatus);
})
}
Nested alternative
function convertPortfolio(trademarks, fn){
groupByCountry(trademarks, function(err, TMsGroupedByCountry){
addEUTrademarks(TMsGroupedByCountry['EU'], TMsGroupedByCountry, function(err, revisedGroupByCountry){
groupTrademarksByStatus(revisedGroupByCountry, function(err, splitByStatus){
fn(null, splitByStatus)
});
});
});
}
Subsequently when I call this function once as part of another function, it works perfectly. However when I attempt to call the function multiple times using a separate value from a forEach call on an array, it fails to work, when the nested version works fine. I am at a loss to explain why this would be the case and in all honesty, I'm not sure where to start. My understanding is that the forEach call should simply mean that each separate value is processed accordingly and closed over so this shouldn't be the issue.
Async function works correctly and returns value in this instance
exports.convertPortfolioAndAddToGeoJSON = function(geojson, trademarks, fn){
convertPortfolio(trademarks, function(err, revisedTMs){
addToGeoJson(geojson, revisedTMs, function(err, gj){
fn(null, gj)
});
});
}
But in this instance the end target object is not populated:
exports.convertYearPortfolioAndAddToGeoJSON = function(geojson, trademarks, fn){
var target = {};
Object.keys(trademarks).forEach(function(year){
convertPortfolio(trademarks[year], function(err, revisedTMs){
addToGeoJson(geojson, revisedTMs, function(err, revisedGeoJSON){
target[year] = revisedGeoJSON;
});
});
});
fn(null, target);
}
Using console.log at certain points shows that in the nested example, the return values are logged prior to the target object being logged, whereas with the async.waterfall example, the target object is logged prior to the returned data being available (so I suppose it is not surprising that logging the target results in an empty object).
I thought that in each case the presence of callbacks would mean that target is only logged once all previous processing has completed. Although this appears to be the case with the nested alternative, it is not so with the async version, at least when it is called multiple times.
Any guidance would be appreciated.
UPDATE Out of interest here's the revised code using async.forEach:
exports.convertYearPortfolioAndAddToGeoJSON = function(geojson, trademarks, fn){
var target = {};
async.forEach(Object.keys(trademarks), function(year, callback){
async.waterfall([
async.apply(convertPortfolio, trademarks[year]),
function(revisedTMs, callback){
addToGeoJson(geojson, revisedTMs, callback)
}],
function(err, revisedGeoJSON){
target[year] = revisedGeoJSON;
callback()
})
}, function(err){
fn(null, target);
});
}
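Object.keys(trademarks).forEach is synchronous: it starts every convertPortfolio call and returns without waiting for any of their callbacks, so fn(null, target) can run before target has been populated. You need to use async.forEach(Object.keys(trademarks), function(year, callback).... there, and adjust the async control flow accordingly.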
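To see why the two versions can log in a different order, here is a minimal sketch with hypothetical stand-ins (immediateWorker, deferredWorker and collect are not from the question). A worker that invokes its callback before returning makes the plain forEach appear to work; a worker that defers its callback to a later tick, which the logging described in the question suggests is what happens once async.waterfall is involved, does not.

```javascript
// Hypothetical stand-ins for illustration only.
function immediateWorker(value, cb) {
  cb(null, value * 2); // invokes the callback before returning
}

function deferredWorker(value, cb) {
  setImmediate(function () {
    cb(null, value * 2); // invokes the callback on a later tick
  });
}

// Mirrors the shape of convertYearPortfolioAndAddToGeoJSON from the question.
function collect(worker, input, done) {
  var target = {};
  Object.keys(input).forEach(function (key) {
    worker(input[key], function (err, result) {
      target[key] = result;
    });
  });
  // Runs as soon as forEach returns, whether or not the callbacks have fired.
  done(null, Object.keys(target).length);
}

collect(immediateWorker, { a: 1, b: 2 }, function (err, count) {
  console.log('immediate:', count); // immediate: 2
});
collect(deferredWorker, { a: 1, b: 2 }, function (err, count) {
  console.log('deferred:', count); // deferred: 0
});
```

Relying on callbacks being immediate is fragile, which is why the final callback should be driven by the completion of the work rather than by the end of the loop.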
Also just FYI when you have this pattern with a tiny wrapper function:
function convertPortfolio(trademarks, fn){
async.waterfall([function(callback){
groupByCountry(trademarks, callback)
},
You can use async.apply for that boilerplate:
function convertPortfolio(trademarks, fn){
async.waterfall([async.apply(groupByCountry, trademarks),
...
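For intuition, async.apply is partial application: it returns a function that calls the original with the given arguments followed by whatever it is later called with. A minimal sketch of the same idea without the library (partial and the groupByCountry stub are hypothetical stand-ins, not the library's code):

```javascript
// Hypothetical helper mirroring the idea behind async.apply: pre-fill leading arguments.
function partial(fn) {
  var preset = Array.prototype.slice.call(arguments, 1);
  return function () {
    var rest = Array.prototype.slice.call(arguments);
    return fn.apply(null, preset.concat(rest));
  };
}

// Stub standing in for the question's groupByCountry.
function groupByCountry(trademarks, callback) {
  callback(null, { count: trademarks.length });
}

// Equivalent to: function (callback) { groupByCountry(['a', 'b'], callback); }
var task = partial(groupByCountry, ['a', 'b']);
task(function (err, grouped) {
  console.log(grouped.count); // 2
});
```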
I ran into an issue whilst attempting to create the logic to add rows to a new table I made on my MySql database. When adding a row I need to query the database 4 times to check other rows and to then add the correct value to the new row. I am using node.js and the mysql module to accomplish this. While coding I ran into a snag, the code does not wait for the 4 queries to finish before inserting the new row, this then gives the values being found a value of 0 every time. After some research I realize a callback function would be in order, looking something like this:
var n = 0;
connection.query("select...", function(err, rows){
if(err) throw err;
else{
if(rows.length === 1) ++n;
}
callback();
});
function callback(){
connection.query("insert...", function(err){
if(err) throw err;
});
}
Note: The select queries can only return one item so the if condition should not affect this issue.
A callback function with only one query to wait on is clear to me, but I become a bit lost for multiple queries to wait on. The only idea that I had would be to create another variable that increments before the callback is called, and is then passed in the callback function's arguments. Then inside the callback the query could be encapsulated by an if statement with a condition of this being the variable equaling the number of queries that need to be called, 4 for my purposes here. I could see this working but wasn't sure if this sort of situation already has a built in solution or if there are other, better, solutions already developed.
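The counter idea described above does work. A minimal sketch, using a hypothetical fakeQuery in place of connection.query so that it runs without a database (runSelectsThenInsert is likewise a made-up name):

```javascript
// Hypothetical stand-in for connection.query: calls back asynchronously with rows.
function fakeQuery(sql, cb) {
  setImmediate(function () {
    cb(null, [{ sql: sql }]); // pretend every select returns exactly one row
  });
}

function runSelectsThenInsert(selects, done) {
  var n = 0;                    // number of selects that returned one row
  var pending = selects.length; // selects still in flight
  var failed = false;

  if (pending === 0) return done(null, n);

  selects.forEach(function (sql) {
    fakeQuery(sql, function (err, rows) {
      if (failed) return;       // an earlier query already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      if (rows.length === 1) ++n;
      if (--pending === 0) done(null, n); // last one in: safe to insert now
    });
  });
}

runSelectsThenInsert(['select 1', 'select 2', 'select 3', 'select 4'], function (err, n) {
  if (err) throw err;
  console.log('ready to insert, n =', n); // ready to insert, n = 4
});
```

The insert query would go where the final callback logs; it only runs after the last of the selects has called back.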
You need async (https://github.com/caolan/async). This module lets you express quite complex control flow.
var data = {}; // One option among many: a shared object that every function can see and add to
function firstQueryFunction(callback){
//do your stuff with mysql
data.stuff = rows[0].stuff; //you can store stuff inside your data object (rows comes from your query callback)
callback(null);
}
function secondQueryFunction(callback){
//do your stuff with mysql
callback(null);
}
function thirdQueryFunction(callback){
//do your stuff with mysql
callback(null);
}
function fourthQueryFunction(callback){
//do your stuff with mysql
callback(null);
}
//These functions will be started at the same time
async.parallel([
firstQueryFunction,
secondQueryFunction,
thirdQueryFunction,
fourthQueryFunction
], function (err, result) {
//This code will be executed after all previous queries are done (the order doesn't matter).
//For example you can do another query that depends on the results of all the previous queries.
});
As per Gesper's answer I'd recommend the async library, however, I would probably recommend running in parallel (unless the result of the 1st query is used as input to the 2nd query).
var async = require('async');
function runQueries(param1, param2, callback) {
async.parallel([query1, query2, query3(param1, param2), query4],
function(err, results) {
if(err) {
callback(err);
return;
}
var combinedResult = {};
for(var i = 0; i < results.length; i++) {
combinedResult.query1 = combinedResult.query1 || results[i].query1;
combinedResult.query2 = combinedResult.query2 || results[i].query2;
combinedResult.query3 = combinedResult.query3 || results[i].query3;
combinedResult.query4 = combinedResult.query4 || results[i].query4;
}
callback(null, combinedResult);
});
}
function query1(callback) {
dataResource.Query(function(err, result) {
var interimResult = {};
interimResult.query1 = result;
callback(null, interimResult);
});
}
function query2(callback) {
dataResource.Query(function(err, result) {
var interimResult = {};
interimResult.query2 = result;
callback(null, interimResult);
});
}
function query3(param1, param2) {
return function(callback) {
dataResource.Query(param1, param2, function(err, result) {
var interimResult = {};
interimResult.query3 = result;
callback(null, interimResult);
});
}
}
function query4(callback) {
dataResource.Query(function(err, result) {
var interimResult = {};
interimResult.query4 = result;
callback(null, interimResult);
});
}
Query3 shows the use of parameters being 'passed through' to the query function.
I'm sure someone can show me a much better way of combining the result, but that is the best I have come up with so far. The reason for the use of the interim object, is that the "results" parameter passed to your callback is an array of results, and it can be difficult to determine which result is for which query.
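One way to avoid the interim objects is to key the tasks by name. async.parallel also accepts an object of tasks instead of an array, in which case the results arrive as an object with the same keys. The sketch below shows the idea with a hypothetical, dependency-free parallelNamed helper rather than the library itself; stubQuery is also made up for the example.

```javascript
// Hypothetical helper: starts every task at once and stores each result under the task's key.
function parallelNamed(tasks, done) {
  var keys = Object.keys(tasks);
  var results = {};
  var pending = keys.length;
  var finished = false;

  if (pending === 0) return done(null, results);

  keys.forEach(function (key) {
    tasks[key](function (err, value) {
      if (finished) return;
      if (err) {
        finished = true;
        return done(err);
      }
      results[key] = value;
      if (--pending === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

// Stub queries; real ones would call the database instead of setImmediate.
function stubQuery(value) {
  return function (callback) {
    setImmediate(function () { callback(null, value); });
  };
}

parallelNamed({
  query1: stubQuery('first'),
  query2: stubQuery('second')
}, function (err, results) {
  if (err) throw err;
  console.log(results.query1, results.query2); // first second
});
```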
Good luck.
I'm using some functions from the async library, and want to make sure I understand how they're doing things internally; however, I'm stuck on async.waterfall (implementation here). The actual implementation uses other functions from within the library, and without much experience, I'm finding it difficult to follow.
Could someone, without worrying about optimization, provide a very simple implementation that achieves waterfall's functionality? Probably something comparable to this answer.
From the docs, waterfall's description:
Runs the tasks array of functions in series, each passing their
results to the next in the array. However, if any of the tasks pass an
error to their own callback, the next function is not executed, and
the main callback is immediately called with the error.
An example:
async.waterfall([
function(callback) {
callback(null, 'one', 'two');
},
function(arg1, arg2, callback) {
// arg1 now equals 'one' and arg2 now equals 'two'
callback(null, 'three');
},
function(arg1, callback) {
// arg1 now equals 'three'
callback(null, 'done');
}
], function (err, result) {
// result now equals 'done'
});
Well, here is a simple implementation for chaining functions by queuing them.
First of all, the function:
function waterfall(arr, cb){} // takes an array and a callback on completion
Now, we need to keep track of the array and iterate it:
function waterfall(arr, cb){
var fns = arr.slice(); // make a copy
}
Let's start by handling the case where we are passed an empty array, adding an extra parameter called result so we can pass results around:
function waterfall(arr, cb, result){ // result is the initial result
var fns = arr.slice(); // make a copy
if(fns.length === 0){
process.nextTick(function(){ // don't cause race conditions
cb(null, result); // we're done, nothing more to do
});
}
}
That's nice:
waterfall([], function(err, data){
console.log("Done!");
});
Now, let's handle actually having stuff in:
function waterfall(arr, cb, result){ // result is the initial result
var fns = arr.slice(1); // make a copy, apart from the first element
if(!arr[0]){ // if there is nothing in the first position
process.nextTick(function(){ // don't cause race conditions
cb(null, result); // we're done, nothing more to do
});
return;
}
var first = arr[0]; // get the first function
first(function(err, data){ // invoke it
// when it is done
if(err) return cb(err); // early error, terminate entire call
// perform the same call, but without the first function
// and with its result as the result
waterfall(fns, cb, data);
});
}
And that's it! We overcome the fact we can't loop with callbacks by using recursion basically. Here is a fiddle illustrating it.
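One difference from async.waterfall is worth noting: the version above calls each function with only a callback, so result reaches the final callback but not the next function in the array. A sketch that also forwards each step's results as leading arguments (the name waterfallWithArgs is made up for this example):

```javascript
function waterfallWithArgs(tasks, done) {
  function run(index, args) {
    if (index === tasks.length) {
      // All tasks finished: hand the last results to the final callback.
      return done.apply(null, [null].concat(args));
    }
    // The task receives the previous results, then its own callback.
    tasks[index].apply(null, args.concat(function (err) {
      if (err) return done(err); // stop at the first error
      run(index + 1, Array.prototype.slice.call(arguments, 1));
    }));
  }
  run(0, []);
}

waterfallWithArgs([
  function (callback) { callback(null, 'one', 'two'); },
  function (arg1, arg2, callback) { callback(null, arg1 + '-' + arg2); },
  function (arg1, callback) { callback(null, arg1 + '-done'); }
], function (err, result) {
  console.log(result); // one-two-done
});
```

Unlike the version above, this sketch does not defer with process.nextTick, so the final callback runs synchronously whenever every task calls back synchronously.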
It's worth mentioning that if we were implementing it with promises we could have used a for loop.
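As a sketch of that last remark, assuming each step returns a promise (or a plain value) instead of taking a callback, the chain becomes an ordinary loop. waterfallAsync is a hypothetical name:

```javascript
// Each task takes the previous result and returns a promise (or plain value) for the next one.
async function waterfallAsync(tasks, initial) {
  var result = initial;
  for (var i = 0; i < tasks.length; i++) {
    result = await tasks[i](result); // a rejection here aborts the loop
  }
  return result;
}

waterfallAsync([
  function () { return Promise.resolve('one'); },
  function (prev) { return Promise.resolve(prev + '-two'); },
  function (prev) { return prev + '-done'; }
]).then(function (result) {
  console.log(result); // one-two-done
});
```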
For those who like to keep it short :
function waterfall(fn, done){
fn.length ? fn.shift()(function(err){ err ? done(err) : waterfall(fn, done) }) : done();
}
I'm ok with javascript and callbacks, but I'm getting really annoyed at this and need to call on the world of stackoverflow for help.
I have written a function, to be used in the following way:
var meth = lib.funcTion(a,b); // meth should hold an array of properties { c, d } once executed
So now inside lib.js, we have a structure like:
exports.funcTion = function (a,b) {
database.connect(params, function(err,get){
get.query(querylang, function(err, results){
var varsIwantToReturn = { var1: results[i].foo, var2: results[i].bar };
});
});
// Now how do i return 'varsIwantToReturn'?
};
I have seen some things about incorporating callback() into the function, but I'm not exactly sure how this works. I've also seen some people use exec() - again, I'm not sure how or why to use it.
Please help :) thanks in advance.
Well, it's all asynchronous, so if you attempt to return it, you'll get undefined. In JavaScript (sans the new yield keyword) functions execute from top to bottom synchronously. When you make an I/O call such as a database call, the function itself still runs synchronously to its end: the call only starts the operation and returns at once, and the result is delivered to the callback later. In fact, by the time varsIwantToReturn gets populated, the function has long since run and returned.
What is left is to do the same thing async functions like database.connect and get.query do and have the function take a callback:
exports.funcTion = function (a,b, callback) {
database.connect(params, function(err,get){
if(err) return callback(err, null); // don't suppress errors
get.query(querylang, function(err, results){
if(err) return callback(err, null); // don't suppress errors
var varsIwantToReturn = { var1: results[i].foo, var2: results[i].bar };
callback(null, varsIwantToReturn);
});
});
};
Then you'd call it like
myExportedFunction(myA,myB, function(err, resp){
if(err) return recoverFromError(err); // stop here, resp is not usable on error
// varsIWantToReturn are contained in `resp`
});
So basically I am making a database query, to get all posts with a certain id, then add them to a list, so I can return. But the list is returned, before the callback has finished.
How do I prevent it from being returned before callback has finished?
exports.getBlogEntries = function(opid) {
var list12 =[];
Entry.find({'opid' : opid}, function(err, entries) {
if(!err) {
console.log("adding");
entries.forEach( function(currentEntry){
list12.push(currentEntry);
});
}
else {
console.log("EEEERROOR");
}
//else {console.log("err");}
});
console.log(list12);
return list12;
};
Callbacks passed to I/O operations such as database queries run asynchronously, so there is no guarantee that they will run in the order in which we wrote them.
To fix this, make the process behave "synchronously" and guarantee the order of execution, you have two solutions:
First: nest the calls.
instead of this:
MyModel1.find({}, function(err, docsModel1) {
callback(err, docsModel1);
});
MyModel2.find({}, function(err, docsModel2) {
callback(err, docsModel2);
});
use this:
MyModel1.find({}, function(err, docsModel1) {
MyModel2.find({}, function(err, docsModel2) {
callback(err, docsModel1, docsModel2);
});
});
The last snippet above guarantees that the MyModel2 query only starts AFTER the MyModel1 query has completed.
Second: Use a library such as Async. It is awesome and has several helper functions to execute code in series, in parallel, or whatever way we want.
Example:
async.series(
{
function1 : function(callback) {
//your first code here
//...
callback(null, 'some result here');
},
function2 : function(callback) {
//your second code here (called only after the first one)
callback(null, 'another result here');
}
},
function(err, results) {
//capture the results from function1 and function2
//if function1 passes an error to its callback, function2 will not be called.
results.function1; // 'some result here'
results.function2; // 'another result here'
//do something else...
}
);
You could use sync database calls but that would work against the concept of node.js.
The proper way is to pass a callback to the function that queries the database and then call the provided callback inside the database query callback.
How do I prevent it from being returned before callback has finished?
The callback is asynchronous, and you cannot avoid that. Hence, you must not return a list.
Instead, offer a callback for when it's filled. Or return a Promise for the list. Example:
exports.getBlogEntries = function(opid, callback) {
Entry.find({'opid': opid}, callback); // yes, that's it.
// Everything else was boilerplate code
};
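For the promise variant mentioned above, a minimal sketch. Entry here is a hypothetical stub with a callback-style find, standing in for the model in the question:

```javascript
// Hypothetical stub for the question's model; calls back asynchronously.
var Entry = {
  find: function (query, callback) {
    setImmediate(function () {
      callback(null, [{ opid: query.opid, title: 'first post' }]);
    });
  }
};

// Returns a promise for the list instead of the list itself.
function getBlogEntries(opid) {
  return new Promise(function (resolve, reject) {
    Entry.find({ opid: opid }, function (err, entries) {
      if (err) return reject(err);
      resolve(entries);
    });
  });
}

getBlogEntries(12).then(function (entries) {
  console.log(entries.length); // 1
}).catch(function (err) {
  console.error(err);
});
```

The caller still cannot get the list synchronously; it receives it in then, just as it would in a callback.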
There is an alternative way to handle this scenario. You can use the async module and invoke a callback once the iteration has finished, since a value returned from inside a callback never reaches the caller of getBlogEntries. A code snippet for this:
var async = require('async');
exports.getBlogEntries = function(opid, callback) {
var list12 = [];
Entry.find({'opid' : opid}, function(err, entries) {
if(err) {
console.log("EEEERROOR");
return callback(err);
}
console.log("adding");
async.forEachSeries(entries, function(entry, next){
list12.push(entry);
next(); // tell async this item is done, otherwise the final callback never runs
}, function(err){
console.log(list12);
callback(err, list12);
});
});
};
I have a function that returns an array of items from MongoDB:
var getBooks = function(callback){
Comment.distinct("doc", function(err, docs){
callback(docs);
});
};
Now, for each of the items returned in docs, I'd like to execute another mongoose query, gather the count for specific fields, gather them all in a counts object, and finally pass that on to res.render:
getBooks(function(docs){
var counts = {};
docs.forEach(function(entry){
getAllCount(entry, ...){};
});
});
If I put res.render after the forEach loop, it will execute before the count queries have finished. However, if I include it in the loop, it will execute for each entry. What is the proper way of doing this?
I'd recommend using the popular NodeJS package, async. It's far easier than doing the counting, and the eventual error handling, by hand as another answer would require.
In particular, I'd suggest considering each (reference):
getBooks(function(docs){
var counts = {};
async.each(docs, function(doc, callback){
getAllCount(doc, ...);
// call the `callback` with an error if one occurred, or
// empty params if everything was OK.
// store the value for each doc in counts
}, function(err) {
// all are complete (or an error occurred)
// you can access counts here
res.render(...);
});
});
or you could use map (reference):
getBooks(function(docs){
async.map(docs, function(doc, transformed){
getAllCount(doc, ...);
// call transformed(null, theCount);
// for each document (or transformed(err); if there was an error);
}, function(err, results) {
// all are complete (or an error occurred)
// you can access results here, which contains the count value
// returned by calling: transformed(null, ###) in the map function
res.render(...);
});
});
If there are too many simultaneous requests, you could use the mapLimit or eachLimit function to limit the number of simultaneous asynchronous mongoose requests.
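To illustrate what such a limit does, here is a rough, dependency-free sketch of the idea behind mapLimit. The helper mapWithLimit and the stub getAllCount are hypothetical; with the library you would call async.mapLimit(docs, limit, iterator, callback) instead. The sketch assumes limit is at least 1.

```javascript
// Hypothetical helper: like map, but with at most `limit` iterator calls in flight.
function mapWithLimit(items, limit, iterator, done) {
  var results = new Array(items.length);
  var next = 0;       // index of the next item to start
  var running = 0;    // calls currently in flight
  var finished = false;

  function launch() {
    if (finished) return;
    if (next === items.length && running === 0) {
      finished = true;
      return done(null, results);
    }
    while (running < limit && next < items.length) {
      start(next++);
    }
  }

  function start(index) {
    running++;
    iterator(items[index], function (err, value) {
      running--;
      if (finished) return;
      if (err) {
        finished = true;
        return done(err);
      }
      results[index] = value; // keep results in input order
      launch();
    });
  }

  launch();
}

// Stub standing in for the question's getAllCount.
function getAllCount(doc, callback) {
  setImmediate(function () { callback(null, doc.length); });
}

mapWithLimit(['a', 'bb', 'ccc'], 2, getAllCount, function (err, counts) {
  if (err) throw err;
  console.log(counts); // [ 1, 2, 3 ]
});
```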
forEach probably isn't your best bet here, unless you want all of your calls to getAllCount happening in parallel (maybe you do, I don't know — or for that matter, Node is still single-threaded by default, isn't it?). Instead, just keeping an index and repeating the call for each entry in docs until you're done seems better. E.g.:
getBooks(function(docs){
var counts = {},
index = 0,
entry;
loop();
function loop() {
if (index < docs.length) {
entry = docs[index++];
getAllCount(entry, gotCount);
}
else {
// Done, call `res.render` with the result
}
}
function gotCount(count) {
// ...store the count, it relates to `entry`...
// And loop
loop();
}
});
If you want the calls to happen in parallel (or if you can rely on this working in the single thread), just remember how many are outstanding so you know when you're done:
// Assumes `docs` is not sparse
getBooks(function(docs){
var counts = {},
received = 0,
outstanding;
outstanding = docs.length;
docs.forEach(function(entry){
getAllCount(entry, function(count) {
// ...store the count; it still relates to `entry`, because each callback
// closes over its own `entry`, but the counts may arrive in any order...
// Done?
--outstanding;
if (outstanding === 0) {
// Yup, call `res.render` with the result
}
});
});
});
In effect, the callback of getAllCount for the first item must call getAllCount for the second item, and so on.
Two ways: you can use a library like async: https://github.com/caolan/async
Or build the callback chain yourself. It's fun to write the first time.
edit
The goal is to have a mechanism that proceeds as if we had written:
getAllCountFor(1, function(err1, result1) {
getAllCountFor(2, function(err2, result2) {
...
getAllCountFor(N, function(errN, resultN) {
// res.render, and so on
});
});
});
And that's what you will construct with async, using its series form.