queue.js: difference between array of functions & array of closures?

So, I am trying to make sure that a series of HTTP GET requests completes before I try to render the fetched data into a visualization. Typical deal, right?
I'm using queue.js, and seeing this on the queue.js github page (https://github.com/mbostock/queue):
Or, if you wanted to run a bazillion asynchronous tasks (here represented as an array of closures) serially:
var q = queue(1);
tasks.forEach(function(t) { q.defer(t); });
q.awaitAll(function(error, results) { console.log("all done!"); });
Queue.js can be run inside Node.js or in a browser.
So, what I did was make an array of functions, each of which contains a Meteor.http.get call (as I'm using Meteor.js), and then followed this line by line.
It seems like what is happening is that while my array -- which has 8 functions in it, each slot holding what looks like the right function -- gets populated (and then passed, as in the code excerpt, to defer), only one actually runs.
Here's what I'm wondering. Overall it's: why is only one function running when 8 are passed in to defer? But specifically it's this: having only a hazy understanding of closures, I suspect what I really have is an array of functions, not closures. Is that what I missed -- since the documentation specifically says closures -- and why all the functions aren't executing?
Thank you for looking at this!

Here is perhaps the most literal version of the statement you quoted, found in the test suite:
"queue of asynchronous closures, processed serially": {
topic: function() {
var tasks = [],
task = asynchronousTask(),
n = 10,
q = queue(1);
while (--n >= 0) tasks.push(task);
tasks.forEach(function(t) { q.defer(t); });
q.awaitAll(this.callback)
},
"does not fail": function(error, results) {
assert.isNull(error);
},
"executes all tasks in series": function(error, results) {
assert.deepEqual(results, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]);
}
},
https://github.com/mbostock/queue/blob/master/test/queue-test.js#L103
This runs task = asynchronousTask(), whose return value is what is pulled off the queue and invoked:
function asynchronousTask() {
  var active = 0;
  return function(callback) {
    ++active;
    process.nextTick(function() {
      try {
        callback(null, active);
      } finally {
        --active;
      }
    });
  };
}
The inner return function() {...} above is what I believe is being referenced as a closure: it retains its reference to the outer active variable in scope as each asynchronous function is pulled off the queue and called.
This is, of course, fairly powerful in terms of callbacks and handlers, since it gives you the means to maintain and manipulate a locally shared variable -- for instance, if you want to know how many functions are currently active, and when the list has been exhausted. In other words, a queue.
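As a minimal illustration (my own sketch, not code from queue.js), here is that pattern in isolation: each call to makeCounter produces a new closure over its own count variable, which persists across invocations.
function makeCounter() {
  var count = 0;
  // the returned function is a closure: it keeps access to `count`
  // long after makeCounter itself has returned
  return function () {
    return ++count;
  };
}

var next = makeCounter();
next(); // 1
next(); // 2 -- state persists between calls via the closure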
The following is not used in the example up top, but use it as a reference point to see how it differs from the asynchronousTask function above.
function synchronousTask() {
  var active = 0;
  return function(callback) {
    try {
      callback(null, ++active);
    } finally {
      --active;
    }
  };
}
https://github.com/mbostock/queue/blob/master/test/queue-test.js#L265
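For the original question, the functions-vs-closures distinction is likely a red herring: what queue.js requires is that each deferred task accept a callback and invoke it when done; with queue(1), the next task doesn't start until the previous one calls back, so a task that never invokes its callback stalls the queue after the first run. Here is a hypothetical sketch of the Meteor case (assuming Meteor.http.get's (error, result) callback signature; the URLs and rendering step are illustrative):
// Each task takes queue.js's callback and forwards Meteor.http.get's
// (error, result) pair to it. If a task never calls the callback,
// queue(1) stalls after the first task -- the symptom described above.
var urls = ["http://example.com/a", "http://example.com/b"]; // illustrative

var tasks = urls.map(function (url) {
  return function (callback) {
    Meteor.http.get(url, function (error, result) {
      callback(error, result);
    });
  };
});

var q = queue(1);
tasks.forEach(function (t) { q.defer(t); });
q.awaitAll(function (error, results) { /* render the visualization here */ });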

Related

Asynchronous code and loops within Node.js?

I've done hours of research on asynchronous programming, but I just can't seem to grasp this one single concept within Node, so I was wondering if someone here could help me.
I've written the following code sample to return / output a simple string which is a concatenation of strings from an object:
var itemCollection = {
  item1: [{ foo: "bar" }, { foo: "bar" }, { foo: "bar" }],
  item2: [{ foo: "bar" }, { foo: "bar" }, { foo: "bar" }],
  item3: [{ foo: "bar" }, { foo: "bar" }, { foo: "bar" }]
};
var aString = "";
for (var item in itemCollection) {
  for (var i = 0; i < itemCollection[item].length; i++) {
    var anItem = itemCollection[item][i];
    // someFunctionThatDoesALongIOOperation takes an item as a param, plus a callback.
    someFunctionThatDoesALongIOOperation(anItem, function(dataBackFromThisFunction) {
      // Do something with the data returned from this function
      aString += dataBackFromThisFunction.dataToAppend;
    });
  }
}
console.log(aString);
So from what I understand, languages other than JavaScript would run someFunctionThatDoesALongIOOperation synchronously, and the script would run in a "blocking" mode. This would mean that aString would be returned / output with its correct value.
However, as Node runs asynchronously, code can continue to run at anytime and tasks may not complete in order. This is because of the way the event loop works in Node. I think I get this.
So this is where my question comes in. If I wanted aString to be returned / output with its correct value, as it would be in other languages, what would I need to do to the loops in my code example? To put it in more technical terms: what is the correct approach to making aString produce the expected result, so that the long-running IO operations don't complete only after the script has finished executing and aString has already been returned?
I hope my question makes sense, if it doesn't, please let me know and I will make edits where appropriate.
Thank you
Since the function you apply to each item is asynchronous, the loop that processes them also must be asynchronous (likewise the function that consumes the result of this loop must also be async). Check out Bob Nystrom's "What Color is Your Function?" for more insight on this particular point.
There are two ways to do this (both using caolan's async library to wrap all the nasty callback logic):
Do the async operations one at a time, waiting for the previous one to finish before the next begins. This is probably most similar to the way a traditional synchronous loop runs. We can do this with async.reduce:
async.reduce(itemCollection, "", function(memo, item, callback) {
  someFunctionThatDoesALongIOOperation(item, function(dataBackFromThisFunction) {
    callback(null, memo + dataBackFromThisFunction.dataToAppend);
  });
}, function(err, result) {
  var aString = result;
});
Of course, there's little point in having async code if we don't actually reap its benefits and execute many things at once. We can do all the async operations in parallel and reduce the results all at once in a single step afterwards. I've found this great when processing each item requires some long operation such as network I/O, since we can kick off and wait on many requests at once. We use async.map to achieve this:
async.map(itemCollection, function(item, cb) {
  someFunctionThatDoesALongIOOperation(item, function(dataBackFromThisFunction) {
    cb(null, dataBackFromThisFunction.dataToAppend);
  });
}, function(err, results) {
  var aString = results.join('');
});

How can I make node wait? or perhaps a different solution?

I am using https://github.com/gpittarelli/node-ssq to query a bunch of TF2 game servers to find out if they are on, and if so, how many players are inside.
Once I find a server that is on and has fewer than 6 players in it, I want to use that server's database ID to insert into somewhere else.
Code looks like this:
for (var i = 0; i < servers.length; i++) {
  ssq.info("" + servers[i].ip, servers[i].port, function (err, data) {
    serverData = deepCopy(data);
    serverError = deepCopy(err);
  });
  if (!serverError) {
    if (serverData.numplayers < 6) {
      // its ok
      currentServer = servers[i].id;
      i = 99999;
    }
  } else {
    if (i == servers.length - 1) {
      currentServer = 666;
    }
  }
}
And then right after I insert into database with https://github.com/felixge/node-mysql .
If I put a console.log(serverData) in there, the info shows up in the console AFTER the row was inserted into the DB and a couple of other things happened.
So how do I "stop" node, or should I be looking at / doing this differently?
Update:
A simple solution here is to just move your if statements inside the callback function:
for (var i = 0; i < servers.length; i++) {
  ssq.info("" + servers[i].ip, servers[i].port, function (err, data) {
    serverData = deepCopy(data);
    serverError = deepCopy(err);
    // moving inside the function, so we have access to defined serverData and serverError
    if (!serverError) {
      if (serverData.numplayers < 6) {
        // its ok
        currentServer = servers[i].id;
        i = 99999;
        /* add an additional function here, if necessary */
      }
    } else {
      if (i == servers.length - 1) {
        currentServer = 666;
        /* add an additional function here, if necessary */
      }
    }
  });
  // serverData and serverError are undefined outside of the function
  // because node executes these lines without waiting to see if ``ssq.info``
  // has finished.
}
Any additional functions within the callback to ssq.info will have access to variables defined within that function. Do be careful with nesting too many anonymous functions.
Original (nodesque) Answer
If ssq.info is an asynchronous function (which it seems it is), Node is going to start it and immediately move on, only dealing with the callback function (which you passed as the last parameter) when ssq.info has finished. That is why your console.log statement executes immediately. This is the beauty/terror of Node's asynchronous nature : )
You can use setTimeout to make Node wait, but that will hold up every other process on your server.
The better solution, in my opinion, would be to make use of Node's Event Emitters to:
watch for an event (in this case, when a player leaves a server)
Check if the number of players is less than 6
If so, execute your query function (using a callback)
A good primer on this is: Mixu's Node Book - Control Flow. Also, see this SO post.
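A minimal sketch of that event-driven shape (the event name, the insertIntoDatabase helper, and the emitted payload are all illustrative, not part of node-ssq):
var EventEmitter = require('events').EventEmitter;
var serverEvents = new EventEmitter();

// React whenever we learn a player left a server.
serverEvents.on('playerLeft', function (server) {
  if (server.numplayers < 6) {
    insertIntoDatabase(server.id); // hypothetical DB insert
  }
});

// Illustrative only: emit this wherever the player count changes.
serverEvents.emit('playerLeft', { id: 42, numplayers: 5 });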
You should use a callback,
connection.query('INSERT INTO table', function(err, rows, fields) {
  if (err) throw err;
  // entry should be inserted here.
});
Also, the http://www.sequelizejs.com/ library is a bit more mature; it could be an implementation problem with node-mysql.

Which is a better way of writing callbacks?

Just by looking at what I've written now, I can see that one is much smaller, so in terms of code golf Option 2 is the better bet, but as far as which is cleaner, I prefer Option 1. I would really love the community's input on this.
Option 1
something_async({
  success: function(data) {
    console.log(data);
  },
  error: function(error) {
    console.log(error);
  }
});
Option 2
something_async(function(error, data) {
  if (error) {
    console.log(error);
  } else {
    console.log(data);
  }
});
They are not exactly the same. Option 2 will still log the data, whereas Option 1 will only log data on success. (Edit: at least it was that way before you changed the code.)
That said, Option 1 is more readable. Programming is not / should not be a competition to see who can write the fewest lines that do the most things. The goal should always be to create maintainable, extendable (if necessary) code, in my humble opinion.
Many people will find option #1 easier to read and to maintain: two different callback functions for two different purposes. It is commonly used by all promise libraries, where two arguments (callbacks) are passed to then. Of course, the question of multiple arguments vs. an options object is independent of that (while the object is useful in jQuery.ajax, it doesn't make sense for promise.then).
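For instance, assuming a promise-returning variant of something_async (call it somethingAsync, a hypothetical name), .then mirrors Option 1's shape by taking separate success and error callbacks:
somethingAsync().then(
  function (data) { console.log(data); },  // fulfillment handler
  function (error) { console.log(error); } // rejection handler
);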
However, option #2 is the Node.js convention (see also the NodeGuide) and is used in many libraries influenced by it, for example the famous async.js. This convention is debatable, though; the top Google results I found are WekeRoad: NodeJS Callback Conventions and Stackoverflow: What is the suggested callback style for Node.js libraries?.
The reason for the single callback function with an error argument is that it always reminds the developer to handle errors, which is especially important in server-side applications. Many beginners with client-side ajax functions simply forget about error handling, for example, and then wonder why the success callback doesn't get invoked. Promises with then-chaining, on the other hand, are built on the optionality of error callbacks, propagating errors to the next level -- where, of course, they still need to be caught.
In all honesty, I prefer to take them one step further, into Promises/Futures/Deferreds/etc...
Or (/and) go with a "custom event" queue, using a Moderator (or an observer/pub-sub, if there is good reason for one particular object to be the source for data).
This isn't a 100% percent of the time thing. Sometimes, you just need a single callback. However, if you have multiple views which need to react to a change (in model data, or to visualize user-interaction), then a single callback with a bunch of hard-coded results isn't appropriate.
moderator.listen("my-model:timeline_update", myView.update);
moderator.listen("ui:data_request", myModel.request);
button.onclick = function () { moderator.notify("ui:data_request", button.value); }
Things are now much less dependent upon one big callback and you can mix and match and reuse code.
If you want to hide the moderator, you can make it a part of your objects:
var A = function () {
    var sys = null,
        notify = function (msg, data) {
          if (sys && sys.notify) { sys.notify(msg, data); }
        },
        listen = function (msg, callback) {
          if (sys && sys.listen) { sys.listen(msg, callback); }
        },
        attach = function (messenger) { sys = messenger; };
    return {
      attach: attach
      /* ... */
    };
  },
  B = function () { /* ... */ },
  shell = Moderator(),
  a = A(),
  b = B();

a.attach(shell);
b.attach(shell);

a.listen("do something", a.method.bind(a));
b.notify("do something", b.property);
If this looks a little familiar, it's similar behaviour to, say, Backbone.js (except that they extend() the behaviour onto objects, and others will bind, where my example has simplified wrappers to show what's going on).
Promises would be the other big-win for usability, maintainable and easy to read code (as long as people know what a "promise" is -- basically it passes around an object which has the callback subscriptions).
// using jQuery's "Deferred"
var ImageLoader = function () {
  var cache = {},
      public_function = function (url) {
        if (cache[url]) { return cache[url].promise(); }
        var img = new Image(),
            loading = $.Deferred(),
            promise = loading.promise();
        img.onload = function () { loading.resolve(img); };
        img.onerror = function () { loading.reject("error"); };
        img.src = url;
        cache[url] = loading;
        return promise;
      };
  return public_function;
};
// returns promises
var loadImage = ImageLoader(),
    myImg = loadImage("//site.com/img.jpg");
myImg.done(lightbox.showImg);
myImg.done(function (img) { console.log(img.width); });
Or
var blog_comments = [ /* ... */ ],
    comments = BlogComments();
blog_comments.forEach(function (comment) {
  var el = makeComment(comment.author, comment.text),
      img = loadImage(comment.img);
  img.done(el.showAvatar);
  comments.add(el);
});
All of the cruft there is to show how powerful promises can be.
Look at the .forEach call there.
I'm using Image loading instead of AJAX, because it might seem a little more obvious in this case:
I can load hundreds of blog comments; if the same user makes multiple posts, the image is cached, and if not, I don't have to wait for images to load or write nested callbacks. Images load in any order, but still appear in the right spots.
This is 100% applicable to AJAX calls, as well.
Promises have proven to be the way to go for async code, and libraries like Bluebird embrace node-style callbacks (using the (err, value) signature). So it seems beneficial to utilize node-style callbacks.
But the examples in the question can easily be converted into either format with the functions below (untested):
function mapToNodeStyleCallback(callback) {
  return {
    success: function(data) {
      return callback(null, data);
    },
    error: function(error) {
      return callback(error);
    }
  };
}

function alterNodeStyleCallback(propertyFuncs) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    var err = args.shift();
    // note: the key is `error`, matching the success/error shape above
    if (err) return propertyFuncs.error.apply(null, [err]);
    return propertyFuncs.success.apply(null, args);
  };
}
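For example, the first helper can adapt an options-object API to a node-style consumer (a sketch; optionsStyleApi is hypothetical):
// Hypothetical API that expects an options object, like Option 1 above.
function optionsStyleApi(opts) {
  setTimeout(function () { opts.success("hello"); }, 100);
}

// Adapt a node-style callback to the success/error shape.
optionsStyleApi(mapToNodeStyleCallback(function (err, data) {
  if (err) return console.log(err);
  console.log(data); // "hello"
}));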

How to deal with async function results in JavaScript

Coming from a C# background, I'm probably looking at JavaScript from a completely wrong perspective, so please bear with me.
Leaving the advantages of async aside for a minute, let's say I simply want to retrieve a value from an SQLite database in an HTML5 page.
What I want to see is something like
var something = db.getPicture(1);
Now consider a (perhaps very naive) implementation of this:
this.getPicture = function(id)
{
  this.database.transaction(function(tx)
  {
    tx.executeSql('SELECT ......', null, function(tx, results)
    {
      if (results.rows.length == 1)
        return results.rows.item(0).Url; // This of course does not return
                                         // anything to the caller of .getPicture(id)
    });
  },
  function(error)
  {
    // do some error handling
  },
  function(tx)
  {
    // no error
  });
};
First off, it's one big mess of nested functions and second... there's no way for me to return the result I got from the database as the value of the .getPicture() function.
And this is the easy version; what if I wanted to retrieve an index from a table first, then use that index in the next query, and so on...
Is this normal for JavaScript developers, am I doing it completely wrong, is there a solution, etc...
The basic pattern to follow in JavaScript (in asynchronous environments like a web browser or Node.js) is that the work you need to do when an operation is finished should happen in the "success" callback that the API provides. In your case, that'd be the function passed in to your "executeSql()" method.
this.getPicture = function(id, whenFinished)
{
  this.database.transaction(function(tx)
  {
    tx.executeSql('SELECT ......', null, function(tx, results)
    {
      if (results.rows.length == 1)
        whenFinished(results.rows.item(0).Url);
    });
  },
  function(error)
  {
    // do some error handling
  },
  function(tx)
  {
    // no error
  });
};
In that setup, the result of the database operation is passed as a parameter to the function provided when "getPicture()" was invoked.
Because JavaScript functions form closures, they have access to the local variables in the calling context. That is, the function you pass in to getPicture() as the whenFinished parameter will have access to the local variables that were live at the point getPicture() was called.
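So a caller passes its continuation in rather than expecting a return value (a sketch; db and the element id are illustrative):
// The result arrives via the callback, not as a return value.
db.getPicture(1, function (url) {
  document.getElementById("picture").src = url;
});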

Coordinating parallel execution in node.js

The event-driven programming model of node.js makes it somewhat tricky to coordinate the program flow.
Simple sequential execution gets turned into nested callbacks, which is easy enough (though a bit convoluted to write down).
But how about parallel execution? Say you have three tasks A,B,C that can run in parallel and when they are done, you want to send their results to task D.
With a fork/join model this would be
fork A
fork B
fork C
join A,B,C, run D
How do I write that in node.js ? Are there any best practices or cookbooks? Do I have to hand-roll a solution every time, or is there some library with helpers for this?
Nothing is truly parallel in node.js since it is single threaded. However, multiple events can be scheduled and run in a sequence you can't determine beforehand. And some things like database access are actually "parallel" in that the database queries themselves are run in separate threads but are re-integrated into the event stream when completed.
So, how do you schedule a callback on multiple event handlers? Well, this is one common technique used in animations in browser side javascript: use a variable to track the completion.
This sounds like a hack and it is, and it sounds potentially messy leaving a bunch of global variables around doing the tracking and in a lesser language it would be. But in javascript we can use closures:
function fork(async_calls, shared_callback) {
  var counter = async_calls.length;
  var callback = function () {
    counter--;
    if (counter == 0) {
      shared_callback();
    }
  };
  for (var i = 0; i < async_calls.length; i++) {
    async_calls[i](callback);
  }
}

// usage:
fork([A, B, C], D);
In the example above we keep the code simple by assuming the async and callback functions require no arguments. You can of course modify the code to pass arguments to the async functions and have the callback function accumulate results and pass it to the shared_callback function.
Additional answer:
Actually, even as is, that fork() function can already pass arguments to the async functions using a closure:
fork([
  function(callback) { A(1, 2, callback); },
  function(callback) { B(1, callback); },
  function(callback) { C(1, 2, callback); }
], D);
The only thing left to do is to accumulate the results from A, B, and C and pass them on to D.
Even more additional answer:
I couldn't resist. Kept thinking about this during breakfast. Here's an implementation of fork() that accumulates results (usually passed as arguments to the callback function):
function fork(async_calls, shared_callback) {
  var counter = async_calls.length;
  var all_results = [];

  function makeCallback(index) {
    return function () {
      counter--;
      var results = [];
      // we use the arguments object here because some callbacks
      // in Node pass in multiple arguments as result.
      for (var i = 0; i < arguments.length; i++) {
        results.push(arguments[i]);
      }
      all_results[index] = results;
      if (counter == 0) {
        shared_callback(all_results);
      }
    };
  }

  for (var i = 0; i < async_calls.length; i++) {
    async_calls[i](makeCallback(i));
  }
}
That was easy enough. This makes fork() fairly general purpose and can be used to synchronize multiple non-homogeneous events.
Example usage in Node.js:
var fs = require('fs');

// Read 3 files in parallel and process them together:
function A(c) { fs.readFile('file1', c); }
function B(c) { fs.readFile('file2', c); }
function C(c) { fs.readFile('file3', c); }

function D(result) {
  var file1data = result[0][1];
  var file2data = result[1][1];
  var file3data = result[2][1];
  // process the files together here
}

fork([A, B, C], D);
Update
This code was written before the existence of libraries like async.js or the various promise-based libraries. I'd like to believe that async.js was inspired by this, but I don't have any proof of it. Anyway, if you're thinking of doing this today, take a look at async.js or promises. Just consider the answer above a good explanation/illustration of how things like async.parallel work.
For completeness' sake, the following is how you'd do it with async.parallel:
var async = require('async');
async.parallel([A,B,C],D);
Note that async.parallel works almost exactly the same as the fork function we implemented above. The main difference is that it passes an error as the first argument to D and the results as the second argument, as per node.js convention.
Using promises, we'd write it as follows:
// Assuming A, B & C return a promise instead of accepting a callback
Promise.all([A,B,C]).then(D);
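If A, B, and C still take callbacks, Node's util.promisify (available since Node 8) can bridge them into this style; a sketch, assuming each wrapped function follows the (err, value) callback convention:
var util = require('util');
var fs = require('fs');

// Wrap a callback-style function into a promise-returning one.
var readFileAsync = util.promisify(fs.readFile);

Promise.all([
  readFileAsync('file1'),
  readFileAsync('file2'),
  readFileAsync('file3')
]).then(function (results) {
  // results holds the three file contents, in call order.
});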
I believe that now the "async" module provides this parallel functionality and is roughly the same as the fork function above.
The futures module has a submodule called join that I like to use:
Joins asynchronous calls together similar to how pthread_join works for threads.
The readme shows some good examples of using it freestyle or with the future submodule using the Promise pattern. Example from the docs:
var Join = require('join'),
    join = Join(),
    callbackA = join.add(),
    callbackB = join.add(),
    callbackC = join.add();

function abcComplete(aArgs, bArgs, cArgs) {
  console.log(aArgs[1] + bArgs[1] + cArgs[1]);
}

setTimeout(function () {
  callbackA(null, 'Hello');
}, 300);

setTimeout(function () {
  callbackB(null, 'World');
}, 500);

setTimeout(function () {
  callbackC(null, '!');
}, 400);

// this must be called after all
join.when(abcComplete);
A simple solution might be possible here: http://howtonode.org/control-flow-part-ii (scroll to "Parallel actions"). Another way would be to have A, B, and C all share the same callback function, give that function a global (or at least out-of-function) counter, and once all three have called the callback, let it run D. Of course, you will have to store the results of A, B, and C somewhere as well.
Another option could be the Step module for Node: https://github.com/creationix/step
You may want to try this tiny library: https://www.npmjs.com/package/parallel-io
In addition to the popular promises and async libraries, there is a third, elegant way: using "wiring":
var l = new Wire();
funcA(l.branch('post'));
funcB(l.branch('comments'));
funcC(l.branch('links'));

l.success(function(results) {
  // results will be an object with the results:
  // { post: ..., comments: ..., links: ... }
});
https://github.com/garmoshka-mo/mo-wire
