I have a list of tasks that I want to run in parallel using https://github.com/caolan/async.
I want the program to proceed (probably through a callback) after the first of these parallel tasks is complete, not all of them. So I don't think the naive
async.parallel([task1, task2], callback)
works for me.
Alternatively I could spawn two tasks and cancel the incomplete one, but I can't figure out how to do that using async either.
Thanks!
-Charlie
Parallel Race
You can get async to initiate the final callback early by passing each task's callback an error value that is truthy but isn't actually an error.
I've put together an example that uses -1 as an error code. In the final callback I check the error value: if it's not -1 then it's an actual error; if it is -1 then we have a valid value in results. At that point, we just need to remove from results the undefined placeholders left by the async functions that haven't completed yet.
In the below example I've used the request module to pull html pages and the underscore module to filter the results in the final callback.
var async = require('async');
var request = require('request');
var _ = require('underscore');

exports.parallel = function(req, res) {
    async.parallel([
        /* Grab Google.jp */
        function(callback) {
            request("http://google.jp", function(err, response, body) {
                if (err) { console.log(err); callback(true); return; }
                callback(-1, "google.jp");
            });
        },
        /* Grab Google.com */
        function(callback) {
            request("http://google.com", function(err, response, body) {
                if (err) { console.log(err); callback(true); return; }
                callback(-1, "google.com");
            });
        }
    ],
    /* final callback handler */
    function(err, results) {
        /* Actual error */
        if (err && err !== -1) {
            console.log(err);
            return;
        }
        /* First result is in */
        if (err === -1) {
            /*
             * async.parallel hands back a list with one element per parallel
             * function. Functions that haven't finished yet appear in the
             * list as undefined, so use underscore to filter out the single
             * real result.
             */
            var one = _.filter(results, function(x) {
                return x !== undefined;
            })[0];
            console.log(results);
            console.log(one);
            res.send(one);
        }
    });
};
Remaining Function Results
When you set up async#parallel to work like this, you won't have access to the results of the other asynchronous functions. If you're only interested in the first one to respond, that isn't a problem. However, you will not be able to cancel the other requests. That's most likely not a problem either, but it might be a consideration.
The async.parallel documentation says:
If any of the functions pass an error to its callback, the main callback is immediately called
with the value of the error.
So you could pass an error object from each of your parallel functions, and the first one to finish would jump you to the completion callback. Perhaps even use your own special error class, so you can tell the difference between an actual error and a "hey I won" error.
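A minimal sketch of that sentinel idea (the Winner class name here is hypothetical, not part of async):

function Winner(value) {
    this.value = value;
}

// Inside each parallel task, "fail" with a Winner to short-circuit:
//   callback(new Winner(result));

// In the final callback, tell the two cases apart:
function finalCallback(err, results) {
    if (err instanceof Winner) {
        console.log('first result:', err.value);
    } else if (err) {
        console.log('actual error:', err);
    }
}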
Having said that, your other parallel functions would still be running, potentially waiting on callbacks to complete. Perhaps you could use async.parallelLimit to make sure you're not firing off too many tasks in parallel?
All that said, it's possible you'd be better served by another method from the async library for this task: firing off parallel tasks and having them race each other may not be the best idea.
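For what it's worth, newer releases of the async library (2.x and later) ship an async.race that implements this pattern directly, without the sentinel-error trick; a sketch under that assumption:

var async = require('async');
var request = require('request');

async.race([
    function(callback) {
        request("http://google.jp", function(err, response, body) {
            callback(err, "google.jp");
        });
    },
    function(callback) {
        request("http://google.com", function(err, response, body) {
            callback(err, "google.com");
        });
    }
],
// called as soon as the first task completes (or fails)
function(err, winner) {
    if (err) { console.log(err); return; }
    console.log(winner + " answered first");
});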
Related
Admittedly I'm a novice with node, but it seems like this should be working fine. I am using multiparty to parse a form, which returns an array. I am then using a forEach to step through the array. However, the forEach is not waiting for the inner code to execute, and I am a little confused as to why not.
var return_GROBID = function(req, res, next) {
    var form = new multiparty.Form();
    var response_array = [];
    form.parse(req, function(err, fields, files) {
        files.PDFs.forEach(function (element, index, array) {
            fs.readFile(element.path, function (err, data) {
                var newPath = __dirname + "/../public/PDFs/" + element.originalFilename;
                fs.writeFile(newPath, data, function (err) {
                    if (err) {
                        res.send(err);
                    }
                    GROBIDrequest.GROBID2js(newPath, function(response) {
                        response_array.push(response);
                        if (response_array.length == array.length) {
                            res.locals.body = response_array;
                            next();
                        }
                    });
                });
            });
        });
    });
}
If someone can give me some insight on the proper way to do this that would be great.
EDIT: The mystery continues. I ran this code on another machine and IT WORKED. What is going on? Why would one machine be inconsistent with another?
I'd guess that PDFs.forEach is just calling the built-in Array forEach function, correct?
In Javascript many things are asynchronous - meaning that given:
linea();
lineb();
lineb may be executed before linea has finished whatever operation it started (because in asynchronous programming, we don't wait around until a network request comes back, for example).
This is different from other programming languages: most languages will "block" until linea is complete, even if linea could take time (like making a network request). (This is called synchronous programming).
With that preamble done, back to your original question:
So forEach is a synchronous function. If you rewrote your code like the following, it would work (but not be useful):
PDFs.forEach(function (element, index, array) {
    console.log(element.path)
})
(console.log, unlike the I/O methods discussed below, runs synchronously.)
But in your forEach loop you have fs.readFile. Notice that last parameter, a function? Node will call that function back when the operation is complete (a callback).
Your code will currently, and as observed, hit that fs.readFile, say, "ok, next thing", and move on to the next item in the loop.
One way to fix this, with the smallest change to the code, is to use the async library.
async.forEachOf(PDFs, function(value, key, itemDoneCallback) {
    GROBIDrequest.GROBID2js(newPath, function(response) {
        response_array.push(response);
        if (response_array.length === PDFs.length) {
            ...
        }
        itemDoneCallback(null);
    });
});
With this code you go through all of an item's asynchronous work, then trigger its callback (itemDoneCallback) when it's safe to consider that item done.
Node and callbacks like this are a very common Node pattern, it should be well covered by beginner material on Node. But it is one of the most... unexpected concepts in Node development.
One resource I found on this was a lesson from a set on NodeJS For Beginners: Callbacks. That, playing around with blocking (synchronous) and non-blocking (asynchronous) functions, and hopefully this SO answer may provide some enlightenment :)
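For example, here's a tiny script you can run to watch the non-blocking ordering described above (it just reads its own source file):

var fs = require('fs');

console.log('1: before readFile');
fs.readFile(__filename, function (err, data) {
    console.log('3: the callback runs last, once the I/O has finished');
});
console.log('2: after readFile - we did not wait for the file');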
Let's imagine an asynchronous function that loads a file first and then does something asynchronously with it. The function can't continue without the file, so my assumption is that loading the file could be done synchronously (*):
function asyncFnWithSyncCode(filePath, next) {
    // Load file
    const file = fs.readFileSync(filePath)
    // Continue to process file with async functions
    // ...
    next(null, processedFile)
}
asyncFnWithSyncCode could be called several times for different files:
async.parallel([
    (done) => { asyncFnWithSyncCode('a.json', done) },
    (done) => { asyncFnWithSyncCode('b.json', done) },
    (done) => { asyncFnWithSyncCode('c.json', done) }
], next)
My question is: How does this impact the performance? Will the sync function cause the other readFileSyncs to be delayed? Will it have an impact at all?
Best-practices, resources and opinions are welcome. Thanks!
(*) I know that I could simply use the async readFile-version, but I would really like to know how it works in this special construction.
Will the sync function cause the other readFileSyncs to be delayed?
Yes. NodeJS runs all of your JavaScript code on a single thread, using an event loop (job queue), which is one of the reasons that using asynchronous system calls is strongly encouraged over synchronous ones.
readFile schedules the read operation and then lets other things happen on the JavaScript thread while the I/O layer is waiting for the data to come in; Node's I/O layer queues a task for the JavaScript thread when data is available, which is what ultimately makes your readFile callback get called.
In contrast, readFileSync holds up that one single JavaScript thread, waiting for the file data to become available. Since there's only one thread, that holds up everything else your code might otherwise be doing, including other readFileSync calls.
Your code doesn't need to use readFileSync (you almost never do); just use readFile's callback:
function asyncFnWithSyncCode(filePath, next) {
    // Load file
    fs.readFile(filePath, function(err, file) {
        if (err) {
            // ...handle error...
            // ...continue if appropriate:
            next(err, null);
        } else {
            // ...use `file`...
            // Continue to process file with async functions
            // ...
            next(null, processedFile);
        }
    });
}
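If you want to see the difference for yourself, here's a rough sketch (reusing the a.json/b.json/c.json names from the question):

var fs = require('fs');
var async = require('async');

var paths = ['a.json', 'b.json', 'c.json'];

// Synchronous: each read blocks the one JavaScript thread,
// so the total time is the sum of the individual reads.
console.time('sync reads');
paths.forEach(function (p) { fs.readFileSync(p); });
console.timeEnd('sync reads');

// Asynchronous: the reads overlap in the I/O layer.
console.time('async reads');
async.parallel(paths.map(function (p) {
    return function (done) { fs.readFile(p, done); };
}), function (err) {
    console.timeEnd('async reads');
});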
I have a phonegap application and I want it to upload multiple files to a server which is an async operation. However, because of bandwidth concerns I want to upload the files sequentially and alert the user to the progress.
However, I'm a bit stuck. Since the api is non-blocking and I am (somewhat) attempting to block, I'm not sure exactly how to do this.
I need to do something like this:
files[0].upload().done = function() {
    files[1].upload().done = function() {
        files[2].upload().done = function() {
            files[3].....files[n]
        }
    }
}
How can I do this? At this time, I don't care about failed uploads.
In JavaScript you can define a function that returns a function.
function get_callback(index){
    return function(){
        // TODO: check if files[index] exists
        files[index].upload().done = get_callback(index+1);
    }
}

files[0].upload().done = get_callback(1);
You could use a Promises/Futures library, such as FuturesJS and one of its components, Sequence.
The sequence module allows one to chain asynchronous functions through callbacks. You first need to create the sequence, then append as many callbacks as you need. Every callback to the sequence object receives at least two arguments, next and err. You need to call next when the asynchronous function ends.
Something like this should work:
var Sequence = require('sequence').Sequence,
    sequence = Sequence.create();

sequence
    .then(function (next, err) {
        if (err) {...}
        files[0].upload(next);
    })
    .then(function (next, err) {
        if (err) {...}
        files[1].upload(next);
    })
    .then(...)
    .then(function (next, err) {
        console.log('all files uploaded');
    });
Your upload function must accept a callback for the code above to work. If it doesn't, just change it in the following way:
var upload = function (callback) {
    //the rest of your code

    //at the very end
    callback();
}
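If you'd rather not hand-roll the chaining, the async library's eachSeries does the same thing; this sketch assumes upload() has been changed to accept a completion callback as just described:

var async = require('async');

async.eachSeries(files, function (file, done) {
    // the next upload starts only after this one calls done()
    file.upload(done);
}, function (err) {
    // all uploads finished (err is set if one failed)
    console.log('all files uploaded');
});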
I'm using node.js and the async package.
Here's the code I have:
async.waterfall(
    [
        function(callback) {
            var data = getSomeData();
            callback(null, data);
        },
        function(data, callback) {
            someFunctionThatNeedsData(data);
            callback(null, 'done');
        }
    ],
    function(err, result) {
    }
);
getSomeData has an asynchronous HTTP request that grabs some data from a web service. I'd like to wait until I get a response, and then return that data and pass it to someFunctionThatNeedsData.
What I expected was that getSomeData -- including the callback inside of it -- would have to complete before moving on to invoke someFunctionThatNeedsData.
The problem is that, despite using the waterfall function here, data is undefined by the time it gets to someFunctionThatNeedsData.
Additionally, from console.log I can see that the end of getSomeData is reached before the callback inside of getSomeData even begins.
Am I using waterfall incorrectly, or is it just not the right tool here? If it's just not right, what can I use to achieve the desired effect?
Or do I have to resign myself to deeply nested callbacks (which, with future work, I will have) and just mitigate it by extracting inline code into named functions?
getSomeData() has an asynchronous http request that grabs some data from a web service.
This is the issue. The execution flow already continued to the callback and executed it. This is how asynchronous functions work!
You have to pass the callback to getSomeData, which calls it once the HTTP request finished. So yes: You may need to nest the callbacks.
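A sketch of that shape (someHttpRequest and url stand in for whatever HTTP client getSomeData actually uses):

function getSomeData(callback) {
    someHttpRequest(url, function (err, data) {
        // only now, with the response in hand, does the waterfall advance
        callback(err, data);
    });
}

async.waterfall([
    getSomeData, // receives the waterfall's callback directly
    function (data, callback) {
        someFunctionThatNeedsData(data); // data is defined here
        callback(null, 'done');
    }
], function (err, result) {
});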
If you have an async operation, you don't necessarily need async.waterfall; you could just do it in a promise-chain style.
getSomeData().then(function(data) {
    var changedData = changeYourData(data);
    return changedData;
}).then(function(changedData) {
    // do some more stuff with it; you can keep forwarding to the next `then`
}).catch(function(err) {
    // any error thrown at any point in the chain gets caught here
}).finally(function() {
    // this one is guaranteed to get called no matter what,
    // exactly like async.waterfall's end-of-chain callback
});
This example will work with Q, When, and any promise lib that follows the standard.
If you need to use async.waterfall (because, say, you build the task list with an Array.map), you just need to call the waterfall callback in your then:
// (pick either Option A or Option B, consistently in both steps)
async.waterfall(
    [
        function(callback) {
            // Option A - resolve the promise here, pass along plain data
            getSomeData().then(function(data) {
                callback(null, data);
            });

            // Option B - just pass the whole promise along
            callback(null, getSomeData());
        },
        function(data, callback) {
            // Option A - data is already resolved
            someFunctionThatNeedsData(data);
            callback(null, 'done');

            // Option B - data is a promise, so wait for it to resolve
            data.then(function(resolvedData) {
                someFunctionThatNeedsData(resolvedData);
                callback(null, 'done');
            });
        }
    ],
    function(err, result) {
    });
Hope this helps.
Suppose I have code, like this
function execute() {
    var tasks = buildListOfTasks();
    // ...
}
buildListOfTasks creates an array of functions. The functions are async and might issue HTTP requests and/or perform db operations.
If the task list comes up empty, or once all the tasks have executed, I need to repeat the same execute routine again. And again, in, say, an "infinite loop". So it's a daemon-like application.
I can quite understand how to accomplish that in the synchronous world, but I'm a bit confused about how to make it work in node.js's async world.
Use async.js and its queue object.
function runTask(task, callback) {
    //dispatch a single asynchronous task to do some real work
    task(callback);
}

//the 10 means allow up to 10 in parallel, then start queueing
var queue = async.queue(runTask, 10);

//check for work to do and enqueue it
function refillQueue() {
    buildListOfTasks().forEach(function (task) {
        queue.push(task);
    });
}

//queue will call this whenever all pending work is completed,
//so wait 100ms and then check again for more arriving work
queue.drain = function() {
    setTimeout(refillQueue, 100);
};

//start things off initially
refillQueue();
If you're already familiar with libraries like async, you can use the execute() as the final callback to restart the tasks:
function execute(err) {
    if (!err) {
        async.series(buildListOfTasks(), execute);
    } else {
        // ...
    }
}
I think you have to use async.js, probably the parallel function. https://github.com/caolan/async#parallel
In the global callback, just call execute to make a recursive call.
async.parallel(tasks, function(err, results) {
    if (!err) execute();
});