I've done hours of research on asynchronous programming, but I just can't seem to grasp this one single concept within Node, so I was wondering if someone here could help me.
I've written the following code sample to return / output a simple string which is a concatenation of strings from an object:
var itemCollection = {
    item1 : [{ foo : "bar" }, { foo : "bar" }, { foo : "bar" }],
    item2 : [{ foo : "bar" }, { foo : "bar" }, { foo : "bar" }],
    item3 : [{ foo : "bar" }, { foo : "bar" }, { foo : "bar" }]
};
var aString = "";
for (var item in itemCollection) {
    for (var i = 0; i < itemCollection[item].length; i++) {
        var anItem = itemCollection[item][i];
        // someFunctionThatDoesALongIOOperation takes an item as a param, plus a callback.
        someFunctionThatDoesALongIOOperation(anItem, function(dataBackFromThisFunction) {
            // Do something with the data returned from this function
            aString += dataBackFromThisFunction.dataToAppend;
        });
    }
}
console.log(aString); // runs before any of the callbacks fire, so this prints ""
So from what I understand, languages other than JavaScript would run someFunctionThatDoesALongIOOperation synchronously and the script would run in a 'blocking' mode. This would mean that aString would get returned / outputted with its correct value.
However, as Node runs asynchronously, code can continue to run at anytime and tasks may not complete in order. This is because of the way the event loop works in Node. I think I get this.
So this is where my question comes in. If I wanted aString to be returned / outputted with its correct value, like it would be in other languages, what would I need to do to the loops within my code example? Or, to put my question in more technical words: what is the correct approach for making aString return the expected result, so that the long-running IO operations aren't still completing after the script has finished executing and aString has already been returned?
I hope my question makes sense, if it doesn't, please let me know and I will make edits where appropriate.
Thank you
Since the function you apply to each item is asynchronous, the loop that processes them also must be asynchronous (likewise the function that consumes the result of this loop must also be async). Check out Bob Nystrom's "What Color is Your Function?" for more insight on this particular point.
There are two ways to do this (both using caolan's async library to wrap all the nasty callback logic):
Do the async operations one at a time, waiting for the previous one to finish before the next begins. This is probably the most similar to the way a traditional synchronous loop runs. We can do this with async.reduce:
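One wrinkle first: async.reduce iterates over an array, but itemCollection is an object whose values are arrays of items. Here's a minimal sketch of flattening it into a single array up front (the name items is my own), which both snippets below then consume:

var items = Object.keys(itemCollection).reduce(function(acc, key) {
    // concatenate item1, item2, item3 into one flat list of items
    return acc.concat(itemCollection[key]);
}, []);

With that flat array in hand: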
async.reduce(items, "", function(memo, item, callback) {
someFunctionThatDoesALongIOOperation(item, function(dataBackFromThisFunction) {
callback(null, memo + dataBackFromThisFunction.dataToAppend);
});
}, function(err, result) {
var aString = result; // the finished string is only available here, inside the final callback
});
Of course, there's little point in having async code if we don't actually reap its benefits and execute many things at once. We can run all the async operations in parallel and then reduce the results in a single step afterwards. I've found this great when processing each item requires some long operation such as network I/O, since we can kick off and wait on many requests at once. We use async.map to achieve this:
async.map(items, function(item, cb) {
someFunctionThatDoesALongIOOperation(item, function(dataBackFromThisFunction) {
cb(null, dataBackFromThisFunction.dataToAppend);
});
}, function(err, results) {
var aString = results.join('');
});
I'm trying to use the javascript bind function to pass a file that I've required into the scope, and have it use that object's functionality to execute some code.
It's a recursive function, so to put it simply, I have an object:
var tts = require('./tts')
This object uses the Web Speech API's synthesis functionality to turn the text that was passed in into speech.
So, I have a recursive function, saySomething, that should say the next thing after the first one is done.
function saySomething(idx) {
    tts('first thing', saySomething.bind(this, 'next thing'))
}
Unfortunately, however, this does not work.
Can anyone tell me what I'm doing wrong?
My code:
tts code
recursive function code
P.S. I'm using browserify to compile the node-style code into browser-friendly code
Assuming tts() is an asynchronous function that calls its callback when it finishes, you can chain the next phrase off that callback like this:
function saySomething(idx) {
tts('first thing', function() {
tts('next thing');
})
}
This would say first thing and then say next thing and then be done. While, at first glance, this appears to be recursive, it does not accumulate a stack frame with each call the way a regular recursive call would, because of the asynchronous nature of the callback: the first call to tts() has actually returned before the second one is made.
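If you want to convince yourself of that, here's a minimal sketch, using setTimeout as a stand-in for tts (since tts itself isn't shown):

var count = 0;
function speak(text, done) {
    // stand-in for tts: returns immediately, invokes done() later
    setTimeout(done, 0);
}
function loop() {
    if (++count < 10000) {
        speak("hi", loop); // ten thousand "recursive" calls, yet the stack never grows
    }
}
loop();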
If what you really want to do is to iterate through an array of things to say, you can do that like this:
var phrases = ["one", "two", "three", "four"];
function sayPhrases(items) {
var index = 0;
function next() {
if (index < items.length) {
tts(items[index++], next);
}
}
next();
}
sayPhrases(phrases);
I found lots of similar questions, but I still don't know what's wrong with my code. It seems that I cannot read the value of the global variable (urls) in the callback function: I want to update the latestTimestamp value of urls in the callback function(err, articles). Here is the code that goes wrong:
var urls=[
{"url": "http://www.economist.com/feeds/print-sections/77/business.xml", "latestTimestamp": new Number(0)},
{"url": "http://news.sky.com/feeds/rss/home.xml", "latestTimestamp": new Number(0)},
]; // Example RSS Feeds;
// parse RssFeeds from given websites and write them into databse
function parseRssFeeds(collection){
var feed = require('feed-read'); // require the feed-read module
// loop through our list of RSS feed urls
for (var j = 0; j < urls.length; j++)
{
console.log('Original url timestamp is: '+ urls[j].latestTimestamp.toString());
// fetch rss feed for the url:
feed(urls[j], function(err, articles)
{
// loop through the list of articles returned
for (var i = 0; i < articles.length; i++)
{
var message =
{"title": articles[i].title,
"link": articles[i].link,
"content": articles[i].content,
"published": articles[i].published.getTime()};
collection.insert(message, {safe:true}, function(err, docs) {
if (err) {
console.log('Insert error: '+err);
}else{
console.log('This item timestamp is: '+ message.published);
// get the latest timestamp
if (message.published >urls[j].latestTimestamp) {
console.log('update timestamp to be: '+ message.published);
urls[j].latestTimestamp = message.published;
}
}
});// end collection insert
} // end inner for loop
}) // end call to feed method
} // end urls for loop
}
Thanks for any help. The error is:
TypeError: Cannot read property 'latestTimestamp' of undefined
at /Users/Laura/Documents/IBM/project/TestList/app.js:244:37
at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/collection/core.js:123:9
at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/db.js:1131:7
at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/db.js:1847:9
at Server.Base._callHandler (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/base.js:445:41)
at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/server.js:478:18
at MongoReply.parseBody (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/responses/mongo_reply.js:68:5)
at null.<anonymous> (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/server.js:436:20)
at emit (events.js:95:17)
at null.<anonymous> (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/connection_pool.js:201:13)
This should probably be closed as a duplicate, but I'll put an answer here because the relationships between all the duplicate questions are often hard to grasp for JavaScript programmers who haven't understood the fundamental problem.
There are two ways to solve this. One way is to change the way that you create the callback. Instead of using an inline anonymous function:
feed(urls[j], function(err, articles)
{
// loop through the list of articles returned
// ...
you'd create another function that returns the callback. You'd pass that function the url entry from the array, and that's what the returned function would use:
function makeFeedResultHandler(url) {
return function(err, articles) {
// loop through the list of articles returned
// ... code as before up to this line:
if (message.published > url.latestTimestamp) {
console.log('update timestamp to be: '+ message.published);
url.latestTimestamp = message.published;
}
// ... etc
};
}
Then you'd call "feed" like this:
feed(urls[j], makeFeedResultHandler(urls[j]));
The key difference is that each function passed to "feed" will have its own private copy of the object (well, a copy of the reference to be picky) from the "urls" array, so it won't need to refer to the variable "j" at all. That's the crux of the problem: "j" is shared by all the callbacks in your code. By the time the callbacks are invoked, the value of "j" is equal to the length of the "urls" array, so urls[j] is undefined.
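You can reproduce the effect in isolation with a few lines (a minimal sketch):

var urls = ["a", "b", "c"];
for (var j = 0; j < urls.length; j++) {
    setTimeout(function() {
        // by the time these run, the loop has finished and j === 3
        console.log(j, urls[j]); // logs "3 undefined" three times
    }, 0);
}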
The other approach would be to use the .forEach method, available in newer JavaScript implementations. That approach would get rid of "j" altogether:
urls.forEach(function(url) {
console.log('Original url timestamp is: '+ url.latestTimestamp.toString());
// fetch rss feed for the url:
feed(url, function(err, articles)
{
// loop through the list of articles returned
// ... code as previously, substituting "url" for "urls[j]" everywhere
});
});
Again, that makes sure that every callback sent to the "feed" function has its own copy of the "urls" element.
Expanding on what #Pointy said in his comment under your post:
The insert function you are using with MongoDB is async, but you are treating the callback as if it were synchronous. What essentially happens in your loop is that everything works as planned until you hit collection.insert. From there, the process breaks off and essentially says "I'm going to tell mongo to insert a record now... and eventually I'll expect a response." Meanwhile, the loop continues on to the next index and doesn't synchronously wait until the callback fires.
By the time your callback fires, your loop is already done, and j no longer represents the index, which is why it's coming up undefined. You also run the risk of getting a different index than the one you expect with this method.
I would recommend reworking your loop to support the async nature of node. There is a great library called - oddly enough - async that makes this process super simple. The async.each() function should help you accomplish what you are trying to do.
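As a rough sketch of that rework (reusing the feed module and collection from the question; error handling is kept minimal, and note that feed-read expects the URL string, so this passes entry.url rather than the whole entry object):

var async = require('async');

function parseRssFeeds(collection) {
    var feed = require('feed-read');
    // async.each gives every iteration its own `entry`,
    // so the callbacks never share a stale loop index
    async.each(urls, function(entry, done) {
        feed(entry.url, function(err, articles) {
            if (err) return done(err);
            async.each(articles, function(article, inserted) {
                var message = {
                    title: article.title,
                    link: article.link,
                    content: article.content,
                    published: article.published.getTime()
                };
                collection.insert(message, {safe: true}, function(err) {
                    if (!err && message.published > entry.latestTimestamp) {
                        entry.latestTimestamp = message.published;
                    }
                    inserted(err);
                });
            }, done);
        });
    }, function(err) {
        if (err) console.log('Feed processing error: ' + err);
    });
}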
So, I am trying to make sure that a series of HTTP GET requests happen before I try to render the data gotten into a visualization. Typical deal, right?
I'm using queue.js, and seeing this on the queue.js github page (https://github.com/mbostock/queue):
Or, if you wanted to run a bazillion asynchronous tasks (here represented as an array of closures) serially:
var q = queue(1);
tasks.forEach(function(t) { q.defer(t); });
q.awaitAll(function(error, results) { console.log("all done!"); });
Queue.js can be run inside Node.js or in a browser.
So, what I did was made an array of functions, each of which contained a Meteor.http.get call (as I'm using Meteor.js) and then followed this line by line.
It seems that while my array -- which has 8 functions in it, with what looks like the right function in each slot -- gets populated (and then passed to defer as in the code excerpt), only one actually runs.
Here's what I'm wondering. Overall, it's: why is only one function running when 8 are passed in to defer? But specifically it's this -- having only a hazy understanding of closures, I know I really have an array of functions. Is there something I missed there, given that the documentation specifically says closures? Is that why not all of the functions are executing?
Thank you for looking at this!
Here is perhaps the literal source of the statement you quoted, found in the test suite:
"queue of asynchronous closures, processed serially": {
topic: function() {
var tasks = [],
task = asynchronousTask(),
n = 10,
q = queue(1);
while (--n >= 0) tasks.push(task);
tasks.forEach(function(t) { q.defer(t); });
q.awaitAll(this.callback)
},
"does not fail": function(error, results) {
assert.isNull(error);
},
"executes all tasks in series": function(error, results) {
assert.deepEqual(results, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]);
}
},
https://github.com/mbostock/queue/blob/master/test/queue-test.js#L103
This runs task = asynchronousTask(), and the function it returns is what is pulled off the queue and invoked:
function asynchronousTask() {
var active = 0;
return function(callback) {
++active;
process.nextTick(function() {
try {
callback(null, active);
} finally {
--active;
}
});
};
}
The above, inner return function() {...} is what I believe is being referenced as a closure, which retains its reference in scope to the outer active variable as each asynchronous function is called off the queue.
This is, of course, fairly powerful in terms of callbacks and handlers, since it gives you the means to maintain and manipulate a locally shared variable, for instance if you want to know how many functions were returned, and when the list has been exhausted. In other words, a queue.
The following is not used in the example up top, but use it as a reference point to see how it differs from the asynchronousTask function above.
function synchronousTask() {
var active = 0;
return function(callback) {
try {
callback(null, ++active);
} finally {
--active;
}
};
}
https://github.com/mbostock/queue/blob/master/test/queue-test.js#L265
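Applied back to the question: each element you defer must itself be a function that accepts a callback and invokes it when the HTTP GET completes. Here's a sketch, assuming Meteor's Meteor.http.get (which takes a URL and a callback of the form callback(error, result)) and a hypothetical urls array standing in for your 8 requests:

var q = queue(1);
urls.forEach(function(url) {
    q.defer(function(callback) {
        Meteor.http.get(url, function(error, result) {
            // invoking the callback tells queue.js this task is finished,
            // which is what lets the next deferred task start
            callback(error, result);
        });
    });
});
q.awaitAll(function(error, results) {
    console.log("all done!", results);
});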
Just by looking at what I've written now, I can see that one is much smaller, so in terms of code golf Option 2 is the better bet, but as far as which is cleaner, I prefer Option 1. I would really love the community's input on this.
Option 1
something_async({
success: function(data) {
console.log(data);
},
error: function(error) {
console.log(error);
}
});
Option 2
something_async(function(error,data){
if(error){
console.log(error);
}else{
console.log(data);
}
});
They are not exactly the same. Option 2 will still log the data, whereas Option 1 will only log data on success. (Edit: at least it was that way before you changed the code.)
That said, Option 1 is more readable. Programming is not / should not be a competition to see who can write the fewest lines that do the most things. The goal should always be to create maintainable, extendable (if necessary) code -- in my humble opinion.
Many people will find option #1 easier to read and to maintain - two different callback functions for two different purposes. It is commonly used by all Promise libraries, where two arguments are passed. Of course, the question of multiple arguments vs. an options object is independent of that (while the object is useful in jQuery.ajax, it doesn't make sense for promise.then).
However, option #2 is the Node.js convention (see also the NodeGuide) and is used in many libraries influenced by it, for example the famous async.js. The convention is debatable, though; the top Google results I found are WekeRoad: NodeJS Callback Conventions and Stackoverflow: What is the suggested callback style for Node.js libraries?.
The reason for the single callback function with an error argument is that it always reminds the developer to handle errors, which is especially important in serverside applications. Many beginners using clientside ajax functions simply forget about error handling, and then wonder why the success callback doesn't get invoked. Promises with then-chaining, on the other hand, are built around the optionality of error callbacks, propagating errors to the next level - where, of course, they still need to be caught.
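In code, the node-style convention looks like this (a generic sketch; handleError and use are placeholders):

var fs = require('fs');

fs.readFile('config.json', function (err, data) {
    if (err) {
        // the err-first signature puts error handling in front of you;
        // forgetting it is much harder than with a success-only callback
        return handleError(err);
    }
    use(data);
});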
In all honesty, I prefer to take them one step further, into Promises/Futures/Deferreds/etc...
Or (/and) route things through a "custom event" queue, using a Moderator (or an observer / pub-sub, if there is good reason for one particular object to be the source of the data).
This isn't a 100%-of-the-time thing. Sometimes you just need a single callback. However, if you have multiple views that need to react to a change (in model data, or to visualize user interaction), then a single callback with a bunch of hard-coded results isn't appropriate.
moderator.listen("my-model:timeline_update", myView.update);
moderator.listen("ui:data_request", myModel.request);
button.onclick = function () { moderator.notify("ui:data_request", button.value); }
Things are now much less dependent upon one big callback and you can mix and match and reuse code.
If you want to hide the moderator, you can make it a part of your objects:
var A = function () {
var sys = null,
notify = function (msg, data) {
if (sys && sys.notify) { sys.notify(msg, data); }
},
listen = function (msg, callback) {
if (sys && sys.listen) { sys.listen(msg, callback); }
},
attach = function (messenger) { sys = messenger; };
return {
attach : attach
/* ... */
};
},
B = function () { /* ... */ },
shell = Moderator(),
a = A(),
b = B();
a.attach(shell);
b.attach(shell);
a.listen("do something", a.method.bind(a));
b.notify("do something", b.property);
If this looks a little familiar, it's similar behaviour to, say, Backbone.js (except that they extend() the behaviour onto objects, and others will bind, where my example has simplified wrappers to show what's going on).
Promises would be the other big-win for usability, maintainable and easy to read code (as long as people know what a "promise" is -- basically it passes around an object which has the callback subscriptions).
// using jQuery's "Deferred"
var ImageLoader = function () {
var cache = {},
public_function = function (url) {
if (cache[url]) { return cache[url].promise(); }
var img = new Image(),
loading = $.Deferred(),
promise = loading.promise();
img.onload = function () { loading.resolve(img); };
img.onerror = function () { loading.reject("error"); };
img.src = url;
cache[url] = loading;
return promise;
};
return public_function;
};
// returns promises
var loadImage = ImageLoader(),
myImg = loadImage("//site.com/img.jpg");
myImg.done( lightbox.showImg );
myImg.done( function (img) { console.log(img.width); } );
Or
var blog_comments = [ /* ... */ ],
comments = BlogComments();
blog_comments.forEach(function (comment) {
var el = makeComment(comment.author, comment.text),
img = loadImage(comment.img);
img.done(el.showAvatar);
comments.add(el);
});
All of the cruft there is to show how powerful promises can be.
Look at the .forEach call there.
I'm using Image loading instead of AJAX, because it might seem a little more obvious in this case:
I can load hundreds of blog comments; if the same user makes multiple posts, the image is cached, and if not, I don't have to wait for images to load or write nested callbacks. Images load in any order, but still appear in the right spots.
This is 100% applicable to AJAX calls, as well.
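And since jQuery's ajax helpers already return such a promise, the exact same pattern carries over (a minimal sketch; the URL and handlers are hypothetical):

// $.get returns a jqXHR, which is a promise; handlers attach
// just like the image loader's .done() above
var request = $.get("//site.com/api/comments");
request.done(function (data) { console.log("got", data); });
request.fail(function (xhr, status) { console.log("failed:", status); });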
Promises have proven to be the way to go as far as async is concerned, and libraries like bluebird embrace node-style callbacks (using the (err, value) signature). So it seems beneficial to utilize node-style callbacks.
But the examples in the question can easily be converted into either format with the functions below. (untested)
function mapToNodeStyleCallback(callback) {
return {
success: function(data) {
return callback(null, data)
},
error: function(error) {
return callback(error)
}
}
}
function alterNodeStyleCallback(propertyFuncs) {
    return function () {
        var args = Array.prototype.slice.call(arguments)
        var err = args.shift()
        // route to the error handler if the first argument is truthy,
        // otherwise hand the remaining arguments to the success handler
        if (err) return propertyFuncs.error.apply(null, [err])
        return propertyFuncs.success.apply(null, args)
    }
}
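Hypothetical usage, with something_async from the question in each of its two shapes:

// Option-1 style something_async, consumed through a node-style callback:
something_async(mapToNodeStyleCallback(function(err, data) {
    if (err) return console.log(err);
    console.log(data);
}));

// Option-2 style something_async, consumed through success/error handlers:
something_async(alterNodeStyleCallback({
    success: function(data) { console.log(data); },
    error: function(error) { console.log(error); }
}));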
The event-driven programming model of node.js makes it somewhat tricky to coordinate the program flow.
Simple sequential execution gets turned into nested callbacks, which is easy enough (though a bit convoluted to write down).
But how about parallel execution? Say you have three tasks A,B,C that can run in parallel and when they are done, you want to send their results to task D.
With a fork/join model this would be
fork A
fork B
fork C
join A,B,C, run D
How do I write that in node.js ? Are there any best practices or cookbooks? Do I have to hand-roll a solution every time, or is there some library with helpers for this?
Nothing is truly parallel in node.js since it is single threaded. However, multiple events can be scheduled and run in a sequence you can't determine beforehand. And some things like database access are actually "parallel" in that the database queries themselves are run in separate threads but are re-integrated into the event stream when completed.
So, how do you schedule a callback on multiple event handlers? Well, this is one common technique used in animations in browser-side JavaScript: use a variable to track completion.
This sounds like a hack, and it is; it also sounds potentially messy, leaving a bunch of global variables around to do the tracking, and in a lesser language it would be. But in JavaScript we can use closures:
function fork (async_calls, shared_callback) {
var counter = async_calls.length;
var callback = function () {
counter --;
if (counter == 0) {
shared_callback()
}
}
for (var i=0;i<async_calls.length;i++) {
async_calls[i](callback);
}
}
// usage:
fork([A,B,C],D);
In the example above we keep the code simple by assuming the async and callback functions require no arguments. You can of course modify the code to pass arguments to the async functions and have the callback function accumulate results and pass it to the shared_callback function.
Additional answer:
Actually, even as is, that fork() function can already pass arguments to the async functions using a closure:
fork([
function(callback){ A(1,2,callback) },
function(callback){ B(1,callback) },
function(callback){ C(1,2,callback) }
],D);
The only thing left to do is to accumulate the results from A, B and C and pass them on to D.
Even more additional answer:
I couldn't resist. Kept thinking about this during breakfast. Here's an implementation of fork() that accumulates results (usually passed as arguments to the callback function):
function fork (async_calls, shared_callback) {
var counter = async_calls.length;
var all_results = [];
function makeCallback (index) {
return function () {
counter --;
var results = [];
// we use the arguments object here because some callbacks
// in Node pass in multiple arguments as result.
for (var i=0;i<arguments.length;i++) {
results.push(arguments[i]);
}
all_results[index] = results;
if (counter == 0) {
shared_callback(all_results);
}
}
}
for (var i=0;i<async_calls.length;i++) {
async_calls[i](makeCallback(i));
}
}
That was easy enough. This makes fork() fairly general purpose and can be used to synchronize multiple non-homogeneous events.
Example usage in Node.js:
// Read 3 files in parallel and process them together:
function A (c){ fs.readFile('file1',c) };
function B (c){ fs.readFile('file2',c) };
function C (c){ fs.readFile('file3',c) };
function D (result) {
    // each result[i] holds the arguments readFile passed to its
    // callback, i.e. [err, data]
    var file1data = result[0][1];
    var file2data = result[1][1];
    var file3data = result[2][1];
    // process the files together here
}
fork([A,B,C],D);
Update
This code was written before the existence of libraries like async.js or the various promise-based libraries. I'd like to believe that async.js was inspired by this, but I don't have any proof of it. Anyway, if you're thinking of doing this today, take a look at async.js or promises. Just consider the answer above a good explanation/illustration of how things like async.parallel work.
For completeness sake the following is how you'd do it with async.parallel:
var async = require('async');
async.parallel([A,B,C],D);
Note that async.parallel works almost exactly the same as the fork function we implemented above. The main difference is that it passes an error as the first argument to D and the results as the second argument, as per the node.js convention.
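So with async.parallel, D from the example would be reshaped roughly like this (a sketch; each results[i] is the data argument readFile produced, since async.parallel strips the error off for you):

function D (err, results) {
    if (err) return console.error(err);
    var file1data = results[0];
    var file2data = results[1];
    var file3data = results[2];
    // process the files together here
}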
Using promises, we'd write it as follows:
// Assuming A, B & C each return a promise instead of accepting a callback
Promise.all([A(), B(), C()]).then(D);
I believe that now the "async" module provides this parallel functionality and is roughly the same as the fork function above.
The futures module has a submodule called join that I like to use:
Joins asynchronous calls together similar to how pthread_join works for threads.
The readme shows some good examples of using it freestyle or via the future submodule with the Promise pattern. Example from the docs:
var Join = require('join')
, join = Join()
, callbackA = join.add()
, callbackB = join.add()
, callbackC = join.add();
function abcComplete(aArgs, bArgs, cArgs) {
console.log(aArgs[1] + bArgs[1] + cArgs[1]);
}
setTimeout(function () {
callbackA(null, 'Hello');
}, 300);
setTimeout(function () {
callbackB(null, 'World');
}, 500);
setTimeout(function () {
callbackC(null, '!');
}, 400);
// this must be called after all
join.when(abcComplete);
A simple solution might be possible here: http://howtonode.org/control-flow-part-ii (scroll to "Parallel actions"). Another way would be to have A, B, and C all share the same callback function, give that function a global (or at least out-of-function) incrementing counter, and once all three have called the callback, let it run D. Of course, you will have to store the results of A, B, and C somewhere as well.
Another option could be the Step module for Node: https://github.com/creationix/step
You may want to try this tiny library: https://www.npmjs.com/package/parallel-io
In addition to the popular promises and async-library approaches, there is a third, elegant way - using "wiring":
var l = new Wire();
funcA(l.branch('post'));
funcB(l.branch('comments'));
funcC(l.branch('links'));
l.success(function(results) {
// result will be object with results:
// { post: ..., comments: ..., links: ...}
});
https://github.com/garmoshka-mo/mo-wire