Coordinating parallel execution in node.js


The event-driven programming model of node.js makes it somewhat tricky to coordinate the program flow.
Simple sequential execution gets turned into nested callbacks, which is easy enough (though a bit convoluted to write down).
But how about parallel execution? Say you have three tasks A,B,C that can run in parallel and when they are done, you want to send their results to task D.
With a fork/join model this would be
fork A
fork B
fork C
join A,B,C, run D
How do I write that in node.js? Are there any best practices or cookbooks? Do I have to hand-roll a solution every time, or is there some library with helpers for this?

Nothing is truly parallel in node.js since it is single threaded. However, multiple events can be scheduled and run in a sequence you can't determine beforehand. And some things like database access are actually "parallel" in that the database queries themselves are run in separate threads but are re-integrated into the event stream when completed.
So, how do you schedule a callback on multiple event handlers? One common technique, also used for animations in browser-side JavaScript, is to track completion with a variable.
This sounds like a hack, and it is. It also sounds potentially messy, leaving a bunch of global variables around to do the tracking, and in a lesser language it would be. But in JavaScript we can use closures:
function fork (async_calls, shared_callback) {
var counter = async_calls.length;
var callback = function () {
counter --;
if (counter == 0) {
shared_callback()
}
}
for (var i=0;i<async_calls.length;i++) {
async_calls[i](callback);
}
}
// usage:
fork([A,B,C],D);
In the example above we keep the code simple by assuming the async and callback functions require no arguments. You can of course modify the code to pass arguments to the async functions and have the callback function accumulate results and pass them to the shared_callback function.
Additional answer:
Actually, even as is, that fork() function can already pass arguments to the async functions using a closure:
fork([
function(callback){ A(1,2,callback) },
function(callback){ B(1,callback) },
function(callback){ C(1,2,callback) }
],D);
the only thing left to do is to accumulate the results from A,B,C and pass them on to D.
Even more additional answer:
I couldn't resist. Kept thinking about this during breakfast. Here's an implementation of fork() that accumulates results (usually passed as arguments to the callback function):
function fork (async_calls, shared_callback) {
var counter = async_calls.length;
var all_results = [];
function makeCallback (index) {
return function () {
counter --;
var results = [];
// we use the arguments object here because some callbacks
// in Node pass in multiple arguments as result.
for (var i=0;i<arguments.length;i++) {
results.push(arguments[i]);
}
all_results[index] = results;
if (counter == 0) {
shared_callback(all_results);
}
}
}
for (var i=0;i<async_calls.length;i++) {
async_calls[i](makeCallback(i));
}
}
That was easy enough. This makes fork() fairly general purpose; it can be used to synchronize multiple non-homogeneous events.
Example usage in Node.js:
// Read 3 files in parallel and process them together:
var fs = require('fs');
function A (c){ fs.readFile('file1',c) }
function B (c){ fs.readFile('file2',c) }
function C (c){ fs.readFile('file3',c) }
function D (result) {
// each entry is [err, data], as passed to the readFile callback
var file1data = result[0][1];
var file2data = result[1][1];
var file3data = result[2][1];
// process the files together here
}
fork([A,B,C],D);
Update
This code was written before the existence of libraries like async.js or the various promise based libraries. I'd like to believe that async.js was inspired by this but I don't have any proof of it. Anyway.. if you're thinking of doing this today take a look at async.js or promises. Just consider the answer above a good explanation/illustration of how things like async.parallel work.
For completeness' sake, the following is how you'd do it with async.parallel:
var async = require('async');
async.parallel([A,B,C],D);
Note that async.parallel works much like the fork function we implemented above. The main difference is that it follows the node.js convention: each task is expected to call its callback with an error as the first argument and its result as the second, and D receives an error as the first argument and the array of results as the second.
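To make the error-first convention concrete, here is a minimal sketch of the earlier fork function adapted to it. This is not the async library's source; the name parallelSketch and the two sample tasks are made up for illustration.

```javascript
// Hypothetical helper: runs every task, then calls done(err, results) once.
function parallelSketch(tasks, done) {
  var pending = tasks.length;
  var results = [];
  var finished = false;

  if (pending === 0) {
    done(null, results);
    return;
  }

  tasks.forEach(function (task, index) {
    task(function (err, value) {
      if (finished) return;        // ignore callbacks after an error was reported
      if (err) {
        finished = true;
        done(err);
        return;
      }
      results[index] = value;      // keep results in task order
      pending -= 1;
      if (pending === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

// Sample tasks (assumptions): each calls back with (err, value).
function taskA(cb) { setTimeout(function () { cb(null, 'a'); }, 20); }
function taskB(cb) { setTimeout(function () { cb(null, 'b'); }, 10); }

parallelSketch([taskA, taskB], function (err, results) {
  if (err) { console.error(err); return; }
  console.log(results); // [ 'a', 'b' ]
});
```

Unlike the accumulating fork() above, each slot in results holds a single value rather than the full arguments list, and the first error short-circuits the join.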
Using promises, we'd write it as follows:
// Assuming A, B & C are functions that return a promise instead of accepting a callback
Promise.all([A(),B(),C()]).then(D);
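A slightly fuller sketch of the promise version, runnable on its own. The delayed helper and the task functions are placeholders invented for this example; in real code they would be whatever promise-returning operations you have.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
function delayed(value, ms) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(value); }, ms);
  });
}

// Placeholder tasks; each returns a promise.
function taskA() { return delayed('A done', 30); }
function taskB() { return delayed('B done', 10); }
function taskC() { return delayed('C done', 20); }

// "join" step: receives the results in the same order as the input array.
function taskD(results) {
  return results.join(', ');
}

// All three tasks start immediately; taskD runs once every promise has resolved.
var joined = Promise.all([taskA(), taskB(), taskC()]).then(taskD);

joined.then(function (summary) {
  console.log(summary); // A done, B done, C done
}).catch(function (err) {
  // Promise.all rejects as soon as any one task rejects.
  console.error(err);
});
```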

I believe that now the "async" module provides this parallel functionality and is roughly the same as the fork function above.

The futures module has a submodule called join that I have liked to use:
Joins asynchronous calls together similar to how pthread_join works for threads.
The readme shows some good examples of using it freestyle or using the future submodule using the Promise pattern. Example from the docs:
var Join = require('join')
, join = Join()
, callbackA = join.add()
, callbackB = join.add()
, callbackC = join.add();
function abcComplete(aArgs, bArgs, cArgs) {
console.log(aArgs[1] + bArgs[1] + cArgs[1]);
}
setTimeout(function () {
callbackA(null, 'Hello');
}, 300);
setTimeout(function () {
callbackB(null, 'World');
}, 500);
setTimeout(function () {
callbackC(null, '!');
}, 400);
// this must be called after all
join.when(abcComplete);

A simple solution might be possible here: http://howtonode.org/control-flow-part-ii (scroll to "Parallel actions"). Another way would be to have A, B, and C all share the same callback function and have that function use a global, or at least out-of-the-function, counter. Once all three have called the callback, let it run D. Of course, you will have to store the results of A, B, and C somewhere as well.

Another option could be the Step module for Node: https://github.com/creationix/step

You may want to try this tiny library: https://www.npmjs.com/package/parallel-io

In addition to the popular promises and the async library, there is a third elegant way, using "wiring":
var l = new Wire();
funcA(l.branch('post'));
funcB(l.branch('comments'));
funcC(l.branch('links'));
l.success(function(results) {
// results will be an object with the results:
// { post: ..., comments: ..., links: ...}
});
https://github.com/garmoshka-mo/mo-wire

Related

How to monkey patch a recursive function

I'm using a library (playcanvas) that exposes a function clone() that is called recursively for all the nodes in a hierarchy.
If I monkey patch the function to execute some additional code, this will be executed multiple times.
Instead, I need to execute my code at the end of the whole recursive calls, but I can't find a way to do it.
pc.Entity.prototype.clone = function() {
... some code
// then for each child it calls itself
}
If I try this way I get "my stuff" executed multiple times.
pc.Entity.prototype.cloneOriginal = pc.Entity.prototype.clone;
pc.Entity.prototype.clone = function() {
var c = this.cloneOriginal();
// do my stuff
return c;
}
I need to "override" the clone method so that after all its recursive calls, I can execute my code.

You can achieve that by temporarily restoring the original function before launching it. And when it is finished, you set your trap again, and perform your post processing:
const orig_clone = pc.Entity.prototype.clone; // Save original clone
// Set trap:
pc.Entity.prototype.clone = function patched_clone(...args) {
pc.Entity.prototype.clone = orig_clone; // Restore original function
let result = this.clone(...args); // Execute it
// All is done, including recursion.
pc.Entity.prototype.clone = patched_clone; // Set trap again
// Your code comes here
console.log('post processing');
return result;
}

I'd still go with a simple flag to determine whether I'm inside a recursion or not.
//no need to store that function on the prototype itself
const cloneOriginal = pc.Entity.prototype.clone;
let inRecursion = false;
pc.Entity.prototype.clone = function() {
//just pass through the call to the original function.
if(inRecursion)
return cloneOriginal.call(this);
inRecursion = true;
var c = cloneOriginal.call(this);
inRecursion = false;
// do my stuff
return c;
}
inRecursion is a flag specific to this single implementation. You may want to wrap this code in a block or an IIFE to ensure that the variables are not accessible from outside your clone method.
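Here is a self-contained sketch of the flag approach, using a made-up TreeNode class in place of pc.Entity so it can run without PlayCanvas. It also resets the flag in a finally block, which is an addition not present in the snippet above, so that an exception inside clone() does not leave the flag stuck.

```javascript
// Stand-in for a library class whose clone() recurses into its children.
function TreeNode(name) {
  this.name = name;
  this.children = [];
}
TreeNode.prototype.clone = function () {
  var copy = new TreeNode(this.name);
  this.children.forEach(function (child) {
    copy.children.push(child.clone()); // recursive call goes through the prototype
  });
  return copy;
};

var postProcessCount = 0; // only here to show how often the extra code runs

(function () {
  var cloneOriginal = TreeNode.prototype.clone;
  var inRecursion = false;

  TreeNode.prototype.clone = function () {
    if (inRecursion) {
      return cloneOriginal.apply(this, arguments); // nested call: pass through
    }
    inRecursion = true;
    var result;
    try {
      result = cloneOriginal.apply(this, arguments);
    } finally {
      inRecursion = false; // reset even if the original throws
    }
    postProcessCount += 1; // "my stuff": runs once per top-level clone()
    return result;
  };
})();

var root = new TreeNode('root');
root.children.push(new TreeNode('left'), new TreeNode('right'));
var copy = root.clone();
console.log(copy.children.length, postProcessCount); // 2 1
```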
Could you point me to some more info about the optimization you are speaking about? What should I google?
You'll find most about V8 optimizations on Google, but most modern browsers do similar stuff. I just googled and came across this article: Performance Tips for JavaScript in V8. It's a bit older, but I think it's a good starting point for understanding the kind of optimizations that JS engines apply to your code to make it faster.
But as the article mentions, don't lose yourself in (pointless) optimizations.

Understanding callback function purpose

I realise this is more of a general question, but I've read through similar answers on here but I can't find more of an overview. I'm new to callbacks and I'm trying to understand when they should be used.
The MDN web docs have this example:
function greeting(name) {
alert('Hello ' + name);
}
function processUserInput(callback) {
var name = prompt('Please enter your name.');
callback(name);
}
processUserInput(greeting);
However I'm struggling to see how this is more beneficial than the following, where I'm not passing the greeting function as a parameter?
function greeting(name) {
alert('Hello ' + name);
}
function processUserInput() {
var name = prompt('Please enter your name.');
greeting(name);
}
processUserInput();

Because much of JavaScript's I/O is asynchronous, it is sometimes difficult to handle the response from non-blocking functions. For example, if you make an AJAX call, it is executed asynchronously and the results are returned some time later. By then the main execution flow has already passed the AJAX code and started executing the statements that follow, so it is very difficult to catch the response and process it further.
To handle those cases, callbacks come into the picture: you pass a function as a parameter to the AJAX function, and once the response is returned, the callback is called with the response data as a parameter for further processing.
More info here: http://callbackhell.com/

In simple terms you can say a callback is a way of asking a question (or requesting a task) in advance, i.e. when you're done with this do this (usually with the result). The whole point is to set aside functions that are to be done later, usually because you don't have the required inputs to do them now.
The two main differences between your implementation and the MDN one are that yours is harder to maintain and harder to reason about, and hence harder to test.
1. Maintenance / Reusability
Imagine you're a few thousand lines of code into a code base and you're required to change what processUserInput() does. It's much easier to change or write a new callback function than to change the function processUserInput() itself. This would be more evident if processUserInput were a bit more complicated. It also means the MDN version is useful in many more scenarios than your implementation. You can reuse it in different situations, like saying goodbye or capitalizing names, simply by writing different callbacks to plug into processUserInput().
2. Testing / Easier to reason about
The MDN implementation is much easier to understand. It's easier to assume that the function processUserInput(greeting) will probably produce a greeting than it is to guess what processUserInput() does. This makes it easier to test, because you can be sure the MDN implementation will always produce the same output for a given input.

Callbacks can be extremely useful depending on the circumstances; for example, when working with JavaScript for Google Chrome browser extension development, a callback can be used for intercepting web requests once it has been set up.
The purpose of a callback in general is to have the callback routine executed upon a trigger, the trigger being an event of some kind. Usually, functionality follows an interface of chained APIs. By implementing callback support, you can redirect execution flow during a stage of an operation. Callbacks are especially useful to third-party developers when dealing with someone else's library, depending on what they are trying to do. Think of them like a notification system.
Functions taking parameters is, in general, useful for flexibility and maintenance. If you use different functions for different things, the functions can simply be re-used over and over again to provide different functionality, while still preventing the source code from being bloated with more-or-less the same code over and over again. At the same time, if you use functions in your own library and a bug shows up, you can simply patch it in the one function and it will be solved everywhere.
In your example, you're passing a callback routine to the function you're calling; the function you're calling will call the callback function and pass the correct parameters. This is flexible because it allows you to have one callback routine for printing the contents of the variable, another for calculating the length of the string passed in, another for logging it somewhere, etc. It allows you to re-use the function you set up and have a different function called with the correct parameters without re-making the original function.
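As a small illustration of that reuse, the sketch below swaps prompt() and alert() for a plain argument and return values so it can run outside a browser. The function names are invented for the example.

```javascript
// Takes the input plus a callback that decides what to do with it.
function processInput(name, callback) {
  var trimmed = name.trim(); // shared preparation step
  return callback(trimmed);
}

// Three interchangeable callbacks.
function greet(name) { return 'Hello ' + name; }
function nameLength(name) { return name.length; }
function shout(name) { return name.toUpperCase(); }

console.log(processInput('  Ada ', greet));      // Hello Ada
console.log(processInput('  Ada ', nameLength)); // 3
console.log(processInput('  Ada ', shout));      // ADA
```

processInput never changes; only the function handed to it does.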

This example is not appropriate for understanding callbacks.
In simple language, callback functions are used when we have to do something after, or in response to, some other event, function, or expression,
i.e. when the parent function has done its work, the callback gets executed.
Simple example:
function hungerStatus(status,cb){
return cb(status)
}
function whatToDo(status){
return status ? "order Pizza" : "lets play"
}
hungerStatus(false,whatToDo)
Another example
// global variable
var allUserData = [];

// generic logStuff function that prints to console
function logStuff (userData) {
if ( typeof userData === "string")
{
console.log(userData);
}
else if ( typeof userData === "object")
{
for (var item in userData) {
console.log(item + ": " + userData[item]);
}

}

}

// A function that takes two parameters, the last one a callback function
function getInput (options, callback) {
allUserData.push (options);
callback (options);

}

// When we call the getInput function, we pass logStuff as a parameter.
// So logStuff will be the function that will be called back (or executed) inside the getInput function
getInput ({name:"Rich", speciality:"JavaScript"}, logStuff);
See: callback explanation

Difference between using callbacks and calling methods

I'm very new to JavaScript programming and was researching ways of dealing with asynchronous functions. I came across a really helpful resource which lists this as an example:
var fs = require('fs')
var myNumber = undefined
function addOne(callback) {
fs.readFile('number.txt', function doneReading(err, fileContents) {
myNumber = parseInt(fileContents)
myNumber++
callback()
})
}
function logMyNumber() {
console.log(myNumber)
}
addOne(logMyNumber)
However could you achieve the same result doing this:
var fs = require('fs')
var myNumber = undefined
function addOne() {
fs.readFile('number.txt', function doneReading(err, fileContents) {
myNumber = parseInt(fileContents)
myNumber++
logMyNumber()
})
}
function logMyNumber() {
console.log(myNumber)
}
addOne()
And if you can, what would be the purpose/advantage of using callbacks?
For those interested the article came from here: https://github.com/maxogden/art-of-node#callbacks

Whether to use a callback depends on the situation: you use one to make things dynamic, or to make sure that a piece of code runs after another one has completed. Your current code already demonstrates callbacks;
your first example clearly shows how a callback is defined.
In computer programming, a callback is a piece of executable code that is passed as an argument to other code, which is expected to call back (execute) the argument at some convenient time. The invocation may be immediate as in a synchronous callback, or it might happen at later time as in an asynchronous callback
var fs = require('fs')
var myNumber = undefined
You are using a callback here, which gives you the power to run different methods after number.txt has been read successfully:
function addOne(callback) {
fs.readFile('number.txt', function doneReading(err, fileContents) {
myNumber = parseInt(fileContents)
myNumber++
callback()
})
}
In your second example there is no callback; you are calling logMyNumber() directly. What if we need to run another function, something like:
function logMyNumber() {
console.log(myNumber)
}
function varifynumber() {
console.log(myNumber)
}
function somthingelse() {
console.log(myNumber)
}
addOne(logMyNumber)
addOne(somthingelse)
addOne(logMyNumber)
The other main use of callbacks in JavaScript is handling asynchronous tasks. If you noticed,
inside your function you are using fs.readFile('number.txt', callback), which is an asynchronous method. Please have a look at the example below:
console.log('start');
fs.readFile('number.txt', function doneReading(err, fileContents) {
// this section will not run until the file has been read completely
// this happens because of the callback
console.log('Reading complete');
})
console.log('End');
output :
start
End
Reading complete
I hope this helps.

It all depends on what you are trying to achieve. In the first example, the function addOne has no knowledge of what the callback parameter does; it just invokes it.
However, in the second case, the addOne function knows it will invoke logMyNumber, and therefore has a tighter coupling and concept of what exactly is going on.
The first example is preferable in most cases, e.g. if you are splitting functions across multiple files and don't want them to be tightly intertwined.
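One way to see the looser coupling in practice: the sketch below is a variation on the first example that hands the result to the callback instead of storing it in a shared variable. The file parameter, the temporary demo file and the error-first callback signature are choices made for this illustration, not part of the original article.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Reads a number from `file`, adds one, and reports via callback(err, value).
function addOne(file, callback) {
  fs.readFile(file, 'utf8', function doneReading(err, fileContents) {
    if (err) { callback(err); return; }
    var value = parseInt(fileContents, 10);
    if (isNaN(value)) { callback(new Error('not a number: ' + fileContents)); return; }
    callback(null, value + 1);
  });
}

// Demo setup: a throwaway file containing "41".
var demoFile = path.join(os.tmpdir(), 'number-demo-' + process.pid + '.txt');
fs.writeFileSync(demoFile, '41');

// The caller decides what happens with the result; addOne does not care.
addOne(demoFile, function logResult(err, value) {
  if (err) { console.error(err.message); return; }
  console.log(value); // 42
});
addOne(demoFile, function checkResult(err, value) {
  if (!err) { console.log(value > 41 ? 'incremented' : 'unchanged'); } // incremented
});
```

Because addOne only knows it has "a callback", it can live in its own file and be reused or tested with any function the caller supplies.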

Which is a better way of writing callbacks?

Just by seeing what I've written now, I can see that one is much smaller, so in terms of code golf Option 2 is the better bet, but as far as which is cleaner, I prefer Option 1. I would really love the community's input on this.
Option 1
something_async({
success: function(data) {
console.log(data);
},
error: function(error) {
console.log(error);
}
});
Option 2
something_async(function(error,data){
if(error){
console.log(error);
}else{
console.log(data);
}
});

They are not exactly the same. Option 2 will still log the (data), whereas Option 1 will only log data on success. (Edit: At least it was that way before you changed the code)
That said, Option 1 is more readable. Programming is not / should not be a competition to see who can write the fewest lines that do the most things. The goal should always be to create maintainable, extendable (if necessary) code --- in my humble opinion.

Many people will find option #1 easier to read and to maintain: two different callback functions for two different purposes. Promise libraries commonly work this way, taking two callbacks as arguments. Of course, the question of multiple arguments vs. an options object is independent of that (while the object is useful in jQuery.ajax, it doesn't make sense for promise.then).
However, option #2 is the Node.js convention (see also NodeGuide) and is used in many libraries that are influenced by it, for example the famous async.js. This convention is debatable, though; the top Google results I found are WekeRoad: NodeJS Callback Conventions and Stackoverflow: What is the suggested callback style for Node.js libraries?.
The reason for the single callback function with an error argument is that it always reminds the developer to handle errors, which is especially important in server-side applications. Many beginners writing client-side ajax functions forget about error handling, for example, and then ask themselves why the success callback doesn't get invoked. On the other hand, promises with then-chaining are based on the optionality of error callbacks, propagating errors to the next level; of course, they still need to be caught there.
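The two styles are also easy to bridge. As a minimal sketch, Node's built-in util.promisify turns a function that takes an error-first callback into one that returns a promise; the fetchUser function here is a made-up stand-in for any such API.

```javascript
var util = require('util');

// Made-up node-style function: calls back with (err, data).
function fetchUser(id, callback) {
  setTimeout(function () {
    if (id <= 0) { callback(new Error('invalid id')); return; }
    callback(null, { id: id, name: 'user' + id });
  }, 10);
}

// Same operation, promise-flavoured.
var fetchUserAsync = util.promisify(fetchUser);

fetchUserAsync(7)
  .then(function (user) { console.log(user.name); })      // user7
  .catch(function (err) { console.error(err.message); });

// An error passed to the callback becomes a rejection that propagates
// down the chain until some catch() handles it.
fetchUserAsync(-1)
  .then(function (user) { console.log('never reached', user); })
  .catch(function (err) { console.error(err.message); }); // invalid id
```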

In all honesty, I prefer to take them one step further, into Promises/Futures/Deferreds/etc...
Or (/and) go into a "custom event" queue, using a Moderator (or an observer/sub-pub, if there is good reason for one particular object to be the source for data).
This isn't a 100%-of-the-time thing. Sometimes, you just need a single callback. However, if you have multiple views which need to react to a change (in model data, or to visualize user-interaction), then a single callback with a bunch of hard-coded results isn't appropriate.
moderator.listen("my-model:timeline_update", myView.update);
moderator.listen("ui:data_request", myModel.request);
button.onclick = function () { moderator.notify("ui:data_request", button.value); }
Things are now much less dependent upon one big callback and you can mix and match and reuse code.
If you want to hide the moderator, you can make it a part of your objects:
var A = function () {
var sys = null,
notify = function (msg, data) {
if (sys && sys.notify) { sys.notify(msg, data); }
},
listen = function (msg, callback) {
if (sys && sys.listen) { sys.listen(msg, callback); }
},
attach = function (messenger) { sys = messenger; };
return {
attach : attach
/* ... */
};
},
B = function () { /* ... */ },
shell = Moderator(),
a = A(),
b = B();
a.attach(shell);
b.attach(shell);
a.listen("do something", a.method.bind(a));
b.notify("do something", b.property);
If this looks a little familiar, it's similar behaviour to, say Backbone.js (except that they extend() the behaviour onto objects, and others will bind, where my example has simplified wrappers to show what's going on).
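The Moderator() used above is not defined in this answer. A minimal sketch of what such an object might look like, assuming only the listen(msg, callback) and notify(msg, data) methods seen in the snippets, is:

```javascript
// Hypothetical mediator: keeps a list of callbacks per message name.
function Moderator() {
  var listeners = Object.create(null); // no inherited keys such as "constructor"

  return {
    listen: function (msg, callback) {
      if (!listeners[msg]) { listeners[msg] = []; }
      listeners[msg].push(callback);
    },
    notify: function (msg, data) {
      // Copy the list so listeners added during notify are not called this round.
      (listeners[msg] || []).slice().forEach(function (callback) {
        callback(data);
      });
    }
  };
}

var moderator = Moderator();
var received = [];

moderator.listen('ui:data_request', function (value) { received.push('model got ' + value); });
moderator.listen('ui:data_request', function (value) { received.push('logger got ' + value); });

moderator.notify('ui:data_request', 42);
moderator.notify('nobody:listens', 1); // no listeners: nothing happens

console.log(received); // [ 'model got 42', 'logger got 42' ]
```

The sender only knows the message name, so listeners can be added or removed without touching the code that calls notify().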
Promises would be the other big win for usable, maintainable and easy-to-read code (as long as people know what a "promise" is; basically, it passes around an object which has the callback subscriptions).
// using jQuery's "Deferred"
var ImageLoader = function () {
var cache = {},
public_function = function (url) {
if (cache[url]) { return cache[url].promise(); }
var img = new Image(),
loading = $.Deferred(),
promise = loading.promise();
img.onload = function () { loading.resolve(img); };
img.onerror = function () { loading.reject("error"); };
img.src = url;
cache[url] = loading;
return promise;
};
return public_function;
};
// returns promises
var loadImage = ImageLoader(),
myImg = loadImage("//site.com/img.jpg");
myImg.done( lightbox.showImg );
myImg.done( function (img) { console.log(img.width); } );
Or
var blog_comments = [ /* ... */ ],
comments = BlogComments();
blog_comments.forEach(function (comment) {
var el = makeComment(comment.author, comment.text),
img = loadImage(comment.img);
img.done(el.showAvatar);
comments.add(el);
});
All of the cruft there is to show how powerful promises can be.
Look at the .forEach call there.
I'm using Image loading instead of AJAX, because it might seem a little more obvious in this case:
I can load hundreds of blog comments, if the same user makes multiple posts, the image is cached, and if not, I don't have to wait for images to load, or write nested callbacks. Images load in any order, but still appear in the right spots.
This is 100% applicable to AJAX calls, as well.

Promises have proven to be the way to go as far as async is concerned, and libraries like bluebird embrace node-style callbacks (using the (err, value) signature). So it seems beneficial to use node-style callbacks.
But the examples in the question can easily be converted into either format with the functions below. (untested)
function mapToNodeStyleCallback(callback) {
return {
success: function(data) {
return callback(null, data)
},
error: function(error) {
return callback(error)
}
}
}
function alterNodeStyleCallback(propertyFuncs) {
return function () {
var args = Array.prototype.slice.call(arguments)
var err = args.shift()
if (err) return propertyFuncs.error.apply(null, [err])
return propertyFuncs.success.apply(null, args)
}
}

How to accumulate data from various AJAX calls?

Apart from making synchronous AJAX calls if you can and think it is appropriate, what is the best way to handle something like this?
var A = getDataFromServerWithAJAXCall(whatever);
var B = getDataFromServerWithAJAXCallThatDependsOnPreviousData(A);
var C = getMoreDataFromServerWithAJAXCall(whatever2);
processAllDataAndShowResult(A,B,C);
Provided that I can pass callbacks to those functions, I know I can use closures and lambdas to get the job done like this:
var A,B,C;
getDataFromServerWithAJAXCall(whatever, function(AJAXResult) {
A= AJAXResult;
getDataFromServerWithAJAXCallThatDependsOnPreviousData(A, function(AJAXResult2) {
B= AJAXResult2;
processAllDataAndShowResult(A,B,C);
});
});
getMoreDataFromServerWithAJAXCall(whatever2, function(AJAXResult) {
C= AJAXResult;
processAllDataAndShowResult(A,B,C);
});
function processAllDataAndShowResult(A,B,C) {
if(A && B && C) {
//Do stuff
}
}
But it doesn't feel right or clean enough to me. So is there a better way or at least a cleaner way to do the same thing or is it just that I'm not used to javascript functional programming?
By the way, I'm using jQuery (1.4.2) if that helps.
Thank you.

Yes, jQuery's Deferred Object is super handy.
Here's the example from the $.when() function documentation, illustrating a solution to your problem:
$.when($.ajax("/page1.php"), $.ajax("/page2.php")).done(function(a1, a2){
/* a1 and a2 are arguments resolved for the
page1 and page2 ajax requests, respectively */
var jqXHR = a1[2]; /* each argument is an array: [ data, statusText, jqXHR ] */
if ( /Whip It/.test(jqXHR.responseText) ) {
alert("First page has 'Whip It' somewhere.");
}
});
Cheers!

Have the callback function of each AJAX call check and store its results in a common local store. Then have another processing function that reads from this container, either at regular intervals or triggered by each callback. This way you keep your functions clean and focused on the AJAX call. It also makes the accumulation easy to scale to n AJAX calls, and you don't have to modify existing code when adding a new call.

If you can use jQuery 1.5, you should be able to accomplish this using the deferred object and $.when():
$.when(getDataFromServerWithAJAXCall("Call 1"), getMoreDataFromServerWithAJAXCall("Call 2")).done(function(a1, a2) {
var jqXHR = a1[2];
jqXHR.responseText;
getDataFromServerWithAJAXCallThatDependsOnPreviousData(jqXHR.responseText);
});
Simply put, when the first two functions complete, it will execute the third function.
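With native promises (not available in jQuery 1.4.2, so this assumes a modern runtime), the dependency from the question can be written directly: B is chained onto A, while C runs alongside them. The functions below are placeholders that simulate AJAX calls with timers.

```javascript
// Placeholder "AJAX" calls (assumptions): each returns a promise.
function fakeRequest(value, ms) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(value); }, ms);
  });
}
function getA() { return fakeRequest('A', 20); }
function getB(a) { return fakeRequest('B(after ' + a + ')', 10); }
function getC() { return fakeRequest('C', 15); }

function loadAll() {
  var aPromise = getA();
  var bPromise = aPromise.then(getB); // B starts only once A has arrived
  var cPromise = getC();              // C starts right away, in parallel with A

  // Resolves with [A, B, C] once all three are available.
  return Promise.all([aPromise, bPromise, cPromise]);
}

loadAll().then(function (results) {
  console.log(results); // [ 'A', 'B(after A)', 'C' ]
});
```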

Use a so-called 'countdown latch'.
Each of the functions has its own callback.
Have a variable called countdownlatch that is incremented each time a function is called and decremented when each of the callbacks is reached (be sure to count down on async error as well).
Each of the callbacks separately checks whether countdownlatch == 0 and, if so, calls the function processAllDataAndShowResult.
The beauty of JavaScript with this kind of async synchronization is that implementing a countdown latch is super easy, because JavaScript is single-threaded: there's no way countdownlatch could get funky numbers because of race conditions, since these are non-existent (in this situation).
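The latch itself can be packaged as a small helper. This is only a sketch; the name createLatch and its shape are made up here.

```javascript
// Returns a function that fires `onZero` after it has been called `count` times.
function createLatch(count, onZero) {
  var remaining = count;
  return function countDown() {
    if (remaining <= 0) { return; } // already fired (or count was not positive); ignore
    remaining -= 1;
    if (remaining === 0) { onZero(); }
  };
}

var results = {};
var done = createLatch(2, function () {
  console.log(results.first + ' ' + results.second); // one two
});

// Two independent async operations, finishing in either order.
setTimeout(function () { results.second = 'two'; done(); }, 10);
setTimeout(function () { results.first = 'one'; done(); }, 20);
```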
EDIT
Didn't see B depended on A, but the same principle applies:
var A,B,C;
var cdlatch = 2;
getDataFromServerWithAJAXCall(whatever, function(AJAXResult) {
A= AJAXResult;
getDataFromServerWithAJAXCallThatDependsOnPreviousData(A, function(AJAXResult2) {
B= AJAXResult2;
if(--cdlatch === 0){
processAllDataAndShowResult(A,B,C);
}
});
});
getMoreDataFromServerWithAJAXCall(whatever2, function(AJAXResult) {
C= AJAXResult;
if(--cdlatch === 0){
processAllDataAndShowResult(A,B,C);
}
});
function processAllDataAndShowResult(A,B,C) {
//Do stuff
}
I must admit it's not as clear as the general case I described earlier, oh well.
