NodeJS callback scope

As a js/node newcomer, I'm having some problems understanding how I can get around this issue.
Basically I have a list of objects that I would like to save to a MongoDB database if they don't already exist.
Here is some code:
var getDataHandler = function (err, resp, body) {
    var data = JSON.parse(body);
    for (var i = 0; i < data.length; i++) {
        var item = data[i];
        models.Entry.findOne({id: item.id}, function(err, result) {
            if (err) { }
            else if (result === null) {
                var entry = new models.Entry(item);
                entry.save(function(err, result) {
                    if (err) {}
                });
            }
        });
    }
}
The problem I have is that because it is asynchronous, once the new models.Entry(item) line is executed the value of item will be equal to the last element in the data array for every single callback.
What kind of pattern can I use to avoid this issue?
Thanks.

Two kinds of patterns are available:
1) Callbacks. That is, you keep calling functions from your functions by passing them as parameters. Callbacks are generally fine but, especially server side when dealing with databases or other asynchronous resources, you quickly end up in "callback hell" and may grow tired of looking for tricks to reduce the indentation levels of your code. You may also sometimes wonder how to really deal with exceptions. But callbacks are the basis: you must understand how to solve this problem using callbacks.
2) Promises. Using promises you may have something like this (example from my related blog post):
db.on(userId)                // get a connection from the pool
    .then(db.getUser)        // use it to issue an asynchronous query
    .then(function(user){    // then, with the result of the query
        ui.showUser(user);   // do something
    }).finally(db.off);      // and return the connection to the pool
Instead of passing the next function as a callback, you just chain with then (in fact it's a little more complex: there are other functions, for example to deal with collections, parallel resolution, or error catching in a clean way).
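For parallel resolution over a collection, for instance, standard promises offer Promise.all. A minimal sketch with plain Promises rather than the library from the blog post; userIds and a promise-returning db.getUser are assumed here, not given in the post:
Promise.all(userIds.map(function(id) { return db.getUser(id); }))
    .then(function(users) {
        // do something with each result
        users.forEach(function(user) { ui.showUser(user); });
    })
    .catch(function(err) {
        console.error(err); // an error from any of the queries lands here
    });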
Regarding your scope problem with the variable evolving before the callback is called, the standard solution is this one :
for (var i = 0; i < n; i++) {
    (function(i){
        // any function defined here (a callback) will use the value
        // of i fixed when iterating
    })(i);
}
This works because calling a function creates a scope, and the callback you create in that scope keeps a reference to that scope, where it will fetch i (that's called a closure).
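Applied to your handler, that pattern gives something like this (a sketch only, keeping the models.Entry API exactly as in your question):
var getDataHandler = function (err, resp, body) {
    var data = JSON.parse(body);
    for (var i = 0; i < data.length; i++) {
        (function(item) {
            // item is now fixed for this iteration, even though
            // findOne's callback runs much later
            models.Entry.findOne({id: item.id}, function(err, result) {
                if (err) { return; }
                if (result === null) {
                    new models.Entry(item).save(function(err) {
                        if (err) { /* handle the error */ }
                    });
                }
            });
        })(data[i]);
    }
};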

Related

Node.js not waiting for nested inner function calls to execute

Admittedly I'm a novice with node, but it seems like this should be working fine. I am using multiparty to parse a form, which returns an array. I am then using forEach to step through the array. However, the forEach is not waiting for the inner code to execute, and I am a little confused as to why.
var return_GROBID = function(req, res, next) {
    var form = new multiparty.Form();
    var response_array = [];
    form.parse(req, function(err, fields, files) {
        files.PDFs.forEach(function (element, index, array) {
            fs.readFile(element.path, function (err, data) {
                var newPath = __dirname + "/../public/PDFs/" + element.originalFilename;
                fs.writeFile(newPath, data, function (err) {
                    if (err) {
                        res.send(err);
                    }
                    GROBIDrequest.GROBID2js(newPath, function(response) {
                        response_array.push(response);
                        if (response_array.length == array.length) {
                            res.locals.body = response_array;
                            next();
                        }
                    });
                });
            });
        });
    });
}
If someone can give me some insight on the proper way to do this that would be great.
EDIT: The mystery continues. I ran this code on another machine and IT WORKED. What is going on? Why would one machine be inconsistent with another?
I'd guess PDFs.forEach is just calling the built-in forEach function, correct?
In Javascript many things are asynchronous, meaning that given:
linea();
lineb();
lineb may be executed before linea has finished whatever operation it started (because in asynchronous programming, we don't wait around until a network request comes back, for example).
This is different from other programming languages: most languages will "block" until linea is complete, even if linea could take time (like making a network request). (This is called synchronous programming).
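A minimal illustration, with setTimeout standing in for any asynchronous operation:
function linea() {
    // stands in for a network request, file read, etc.
    setTimeout(function () { console.log("linea finished"); }, 100);
}

function lineb() {
    console.log("lineb ran");
}

linea();
lineb();
// Prints "lineb ran" first; "linea finished" arrives ~100ms later.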
With that preamble done, back to your original question:
So forEach is a synchronous function. If you rewrote your code like the following, it would work (but not be useful):
PDFs.forEach(function (element, index, array) {
    console.log(element.path);
});
(console.log is a rare synchronous method in Javascript).
But in your forEach loop you have fs.readFile. Notice that last parameter, a function? Node will call that function back when the operation is complete (a callback).
Your code, as observed, currently hits that fs.readFile, says "ok, next thing", and moves on to the next item in the loop.
One way to fix this, with the least change to the code, is to use the async library.
async.forEachOf(PDFs, function(value, key, everythingAllDoneCallback) {
    GROBIDrequest.GROBID2js(newPath, function(response) {
        response_array.push(response);
        if (response_array.length == array.length) {
            ...
        }
        everythingAllDoneCallback(null);
    });
});
With this code you are going through all your asynchronous work, then triggering the callback when it's safe to move on to the next item in the list.
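If you'd rather not add a dependency, the same completion check can be done by hand with a counter. A rough sketch reusing the question's names; processOnePdf is a hypothetical helper wrapping the readFile/writeFile/GROBID chain for one file:
// Sketch: counting completed jobs by hand instead of using async.
// processOnePdf is hypothetical; it runs the readFile -> writeFile ->
// GROBID2js chain for one file and calls back with the response.
var done = 0;
files.PDFs.forEach(function (element, index, array) {
    processOnePdf(element, function (response) {
        response_array.push(response);
        done++;
        if (done === array.length) {
            res.locals.body = response_array;
            next();
        }
    });
});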
Node and callbacks like this are a very common Node pattern and should be well covered by beginner material on Node. But it is one of the most... unexpected concepts in Node development.
One resource I found on this (one from a set of lessons) is NodeJS For Beginners: Callbacks. That, plus playing around with blocking (synchronous) and non-blocking (asynchronous) functions, and hopefully this SO answer, may provide some enlightenment :)

node.js data consistency when iterating asynchronously

I have a tool who's basic idea is as follows:
//get a bunch of couchdb databases. this is an array
const jsonFile = require('jsonfile');
let dbList = getDbList();
const filePath = 'some/path/to/file';
const changesObject = {};
//iterate the db list. do asynchronous stuff on each iteration
dbList.forEach(function(db){
    let merchantDb = nano.use('db');
    //get some changes from the database. validate inside callback
    merchantDb.get("_changes", function(err, changes){
        validateChanges(changes);
        changesObject['db'] = changes.someAttribute;
        //write changes to file
        jsonFile.writeFile(filePath, changesObject, function (err) {
            if (err) {
                logger.error("Unable to write to file: ");
            }
        });
    });
});

const validateChanges = function(changes) {
    if (!validateLogic(changes)) sendAlertMail();
};
For performance improvements the iteration is not done synchronously. Therefore there can be multiple iterations running in 'parallel'. My question is can this cause any data inconsistencies and/or any issues with the file writing process?
Edit:
The same file gets written to on each iteration.
Edit 2:
The changes are stored as a JSON object with key value pairs. The key being the db name.
If you're really writing to a single file, which you appear to be (though it's hard to be sure), then yes, it can: you have a race condition in which multiple callbacks may try to write to the same file, possibly at the same time (remember, I/O isn't done on the JavaScript thread in Node unless you use the *Sync functions). At best that means the last one wins; at worst it means I/O errors because of overlap.
If you're writing to separate files for each db, then provided there's no cross-talk (shared state) amongst validateChanges, validateLogic, sendAlertMail, etc., that should be fine.
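For instance, here is a sketch of that separate-files variant inside the question's forEach (filePathFor is a hypothetical helper mapping a db name to its own output path):
// Sketch: one output file per db removes the shared-file race.
// filePathFor is hypothetical, e.g. mapping "mydb" to "out/mydb.json".
merchantDb.get("_changes", function (err, changes) {
    validateChanges(changes);
    jsonFile.writeFile(filePathFor(db), changes.someAttribute, function (err) {
        if (err) {
            logger.error("Unable to write file for " + db);
        }
    });
});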
Just for detail: the forEach loop will start the jobs that get the changes and then write them out; the callbacks of the calls to get won't run until later, after all of those jobs have been queued.
You are creating closures in loops, but the way you're doing it is okay, both because you're doing it within the forEach callback and because you're not using db in the get callback (which would be fine with the forEach callback, but not with some other ways you might loop over arrays). Details on that aspect are in this question's answers if you're interested.
This line is suspect, though:
let merchantDb = nano.use('db');
I suspect you meant (no quotes):
let merchantDb = nano.use(db);
For what it's worth, it sounds from the updates to the question and your various comments like the better solution would be not to write out the file separately each time. Instead, you want to gather up the changes and then write them out.
You can do that with the classic Node-callback APIs you're using like this:
let completed = 0;
//iterate the db list. do asynchronous stuff on each iteration
dbList.forEach(function(db) {
    let merchantDb = nano.use(db);
    //get some changes from the database. validate inside callback
    merchantDb.get("_changes", function(err, changes) {
        if (err) {
            // Deal with the fact there was an error (don't return)
        } else {
            validateChanges(changes);
            changesObject[db] = changes.someAttribute; // <=== NOTE: This line had 'db' rather than db, I assume that was meant to be just db
        }
        if (++completed === dbList.length) {
            // All done, write changes to file
            jsonFile.writeFile(filePath, changesObject, function(err) {
                if (err) {
                    logger.error("Unable to write to file: ");
                }
            });
        }
    });
});
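For comparison, the same gather-then-write flow reads more directly with promises (a sketch; getChanges is a hypothetical wrapper that promisifies the nano get call for one db):
// Sketch: gather all changes with Promise.all, then write once.
// getChanges(db) is hypothetical: a promisified merchantDb.get("_changes").
Promise.all(dbList.map(function (db) {
    return getChanges(db).then(function (changes) {
        validateChanges(changes);
        changesObject[db] = changes.someAttribute;
    });
})).then(function () {
    // Every db has been processed; write the combined object once.
    jsonFile.writeFile(filePath, changesObject, function (err) {
        if (err) {
            logger.error("Unable to write to file: ");
        }
    });
});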

Why count callbacks?

I am working on problem 10: ASYNC JUGGLING in the learnyounode tutorials.
This problem is the same as the previous problem (HTTP COLLECT) in
that you need to use http.get(). However, this time you will be
provided with three URLs as the first three command-line arguments.
You must collect the complete content provided to you by each of the
URLs and print it to the console (stdout). You don't need to print out
the length, just the data as a String; one line per URL. The catch is
that you must print them out in the same order as the URLs are
provided to you as command-line arguments.
The official solution involves counting callbacks:
var http = require('http')
var bl = require('bl')
var results = []
var count = 0

function printResults () {
    for (var i = 0; i < 3; i++)
        console.log(results[i])
}

function httpGet (index) {
    http.get(process.argv[2 + index], function (response) {
        response.pipe(bl(function (err, data) {
            if (err)
                return console.error(err)

            results[index] = data.toString()
            count++

            if (count == 3) // yay! we are the last one!
                printResults()
        }))
    })
}

for (var i = 0; i < 3; i++)
    httpGet(i)
The program must wait until all three responses have been received before printing them out, so they come out in the same order they were entered.
My attempt involved using a callback to ensure the correct order:
var http = require('http')
var bl = require('bl')
var results = []

function printResults () {
    console.log(results[0])
    console.log(results[1])
    console.log(results[2])
}

function httpGet (i) {
    http.get(process.argv[2 + i], function (response) {
        response.pipe(bl(function (err, data) {
            if (err)
                return console.error(err)

            results[i] = data.toString()
        }))
    })
}

function httpGetAll (callback) {
    httpGet(0)
    httpGet(1)
    httpGet(2)
    callback()
}

httpGetAll(printResults)
But this spits out undefined three times. So it seems as though the printResults() is being called before the three httpGet() lines are executed. Seems like I don't understand callbacks as well as I thought.
So my question is, is there any way to achieve this using a callback on httpGetAll()? Or do I have to count callbacks to httpGet()?
But this spits out undefined three times. So it seems as though the printResults() is being called before the three httpGet() lines are executed. Seems like I don't understand callbacks as well as I thought.
Yes, you are misunderstanding how asynchronous code behaves. The three httpGet() calls ARE executed first, but their asynchronous callbacks, which carry the results, are NOT executed until a later tick of the event loop. If you look in httpGet, the code indented one level runs on the first tick, which is really just that first line; the code in the nested callback function, indented two levels, does NOT execute on the same tick. That code is just scheduled on the event queue for later, after the HTTP response arrives, but node doesn't just wait, it keeps going in the interim.
So my question is, is there any way to achieve this using a callback on httpGetAll()? Or do I have to count callbacks to httpGet()?
Yes, there are ways to implement this correctly without literally counting callbacks; however, you must "keep track" of the pending calls somehow. Counting is a straightforward and efficient way to do it, but you could also keep an array as a queue of pending calls, remove an element from the queue when each response arrives, and know you are done when the queue is empty. You could also track state in a per-request object with a done property that starts false and is set to true when the response arrives, then check whether you are finished by ensuring all the done properties are true. It's not technically counting, but it is book-keeping of a similar nature.
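For instance, here is a sketch of that done-flag variant in the style of the exercise (same three URLs, no explicit counter):
// Sketch: tracking completion with a done flag per request
// instead of a counter (same bl-based collection as the exercise).
var http = require('http')
var bl = require('bl')

var requests = [0, 1, 2].map(function (index) {
    return { index: index, done: false, data: null }
})

function maybePrint () {
    if (requests.every(function (r) { return r.done })) {
        requests.forEach(function (r) { console.log(r.data) })
    }
}

requests.forEach(function (r) {
    http.get(process.argv[2 + r.index], function (response) {
        response.pipe(bl(function (err, data) {
            if (err)
                return console.error(err)

            r.data = data.toString()
            r.done = true
            maybePrint()
        }))
    })
})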

Understanding node.js asynchronicity - for loop versus nested callback

I'm new to Node.js and am trying to understand its asynchronous model. In the following code snippet, I'm trying to get two documents from a MongoDB database at random. It works fine, but looks very ugly because of the nested callback functions. If I wanted to get 100 documents instead of 2, it would be a disaster.
app.get('/api/two', function(req, res){
    dataset.count(function(err, count){
        var docs = [];
        var rand = Math.floor(Math.random() * count);
        dataset.findOne({'index': rand}, function(err, doc){
            docs.push(doc);
            rand = Math.floor(Math.random() * count);
            dataset.findOne({'index': rand}, function(err, doc1){
                docs.push(doc1);
                res.json(docs);
            });
        });
    });
});
So I tried to use a for loop instead; however, the following code just doesn't work, and I guess I've misunderstood the asynchronous idea.
app.get('/api/two', function(req, res){
    dataset.count(function(err, count){
        var docs = [];
        for (i = 0; i < 2; i++){
            var rand = Math.floor(Math.random() * count);
            dataset.findOne({'index': rand}, function(err, doc){
                docs.push(doc);
            });
        }
        res.json(docs);
    });
});
Can anyone help me with that and explain to me why it doesn't work? Thank you very much.
Can anyone help me with that and explain to me why it doesn't work?
tl;dr -- The problem is caused by running a loop over an asynchronous function (dataset.findOne) that cannot complete before the loop completes. You need to handle this with a library like async (as suggested by the other answer) or by callbacks as in the first code example.
Looping over a synchronous function
This may sound pedantic, but it's important to understand the differences between looping in a synchronous and asynchronous world. Consider this synchronous loop:
var numbers = [];
for (i = 0; i < 5; i++){
    numbers[i] = i * 2;
}
console.log("array:", numbers);
On my system, this outputs:
array: [ 0, 2, 4, 6, 8 ]
This is because the assignment to numbers[i] happens before the loop can iterate. For any synchronous ("blocking") assignment/function, you will get results in this manner.
For illustration, let's try this code:
function sleep(time){
    var stop = new Date().getTime();
    while (new Date().getTime() < stop + time) {}
}

for (i = 0; i < 5; i++){
    sleep(1000);
}
If you get your watch out or throw in some console.log messages, you'll see that it "sleeps" for 5 seconds.
This is because the while loop in sleep blocks...it iterates until the time milliseconds have passed before returning control back to the for loop.
Looping over an asynchronous function
The root of your problem is that dataset.findOne is asynchronous...which means it passes control back to the loop before the database has returned results. The findOne method takes a callback (the anonymous function(err, doc)) that creates a closure.
Describing closures here is beyond the scope of this answer, but if you search this site or use your favorite search engine for "javascript closures" you'll get tons of info.
The bottom line, though, is that the asynchronous call sends the query off to the database. Because the transaction will take some time and it has a callback that can accept the query results, it hands control back to the for loop. (Important: this is where node's "event loop" and its intersection with "asynchronous programming" comes into play. Node provides a non-blocking environment by allowing asynchronous behavior like this.)
Let's look at an example of how async issues can trip us up:
for (i = 0; i < 5; i++){
    setTimeout(
        function(){ console.log("I think I is: ", i); } // anonymous callback
        , 1 // wait 1ms before using the callback function
    );
}
console.log("I am done executing.");
You'll get output that looks like this:
I am done executing.
I think I is: 5
I think I is: 5
I think I is: 5
I think I is: 5
I think I is: 5
This is because setTimeout gets a function to call...so even though we only said "wait ONE millisecond", that's still longer than it takes for the loop to iterate 5 times and move on to the last console.log line.
What happens, then, is that the last line fires before the first anonymous callback fires. When it does fire, the loop has finished and i is equal to 5. So what you see here is that the loop is done, has moved on, even though the anonymous function handed to setTimeout still has access to the value of i. (This is "closures" in action...)
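For what it's worth, in modern JavaScript you can sidestep this particular trap by declaring the loop variable with let, which gives each iteration its own binding:
// Same setTimeout example with let: each iteration captures its own i,
// so this logs 0 through 4 (still after "I am done executing.").
for (let i = 0; i < 5; i++){
    setTimeout(function(){ console.log("I think I is: ", i); }, 1);
}
console.log("I am done executing.");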
If we take this concept and use it to consider your second "broken" code example, we can see why you aren't getting the results you expected.
app.get('/api/two', function(req, res){
    dataset.count(function(err, count){
        var docs = [];
        for (i = 0; i < 2; i++){
            var rand = Math.floor(Math.random() * count);
            // THIS IS ASYNCHRONOUS.
            // findOne gets a callback...
            // hands control back to the for loop...
            // and later pushes info into the "docs" array...
            // too late for res.json, at least...
            dataset.findOne({'index': rand}, function(err, doc){
                docs.push(doc);
            });
        }
        // THE LOOP HAS ENDED BEFORE any of the findOne callbacks fire...
        // There's nothing in 'docs' to be sent back to the client. :(
        res.json(docs);
    });
});
The reason async, promises, and other similar libraries are good tools is that they help solve exactly the problem you are facing. async and promises can turn the "callback hell" created in this situation into a relatively clean solution: it's easier to read, easier to see where the async stuff is happening, and when you need to make edits you don't have to worry about which callback level you are at.
You could use the async module. For example:
var async = require('async');

async.times(2, function(n, next) {
    var rand = Math.floor(Math.random() * count);
    dataset.findOne({'index': rand}, function(err, doc) {
        next(err, doc);
    });
}, function(err, docs) {
    res.json(docs);
});
If you want to get 100 documents, you just need to change async.times(2, ...) to async.times(100, ...).
The async module as mentioned above is a good solution. The reason this is happening is because a regular Javascript for loop is synchronous, while your calls to the database are asynchronous. The for loop does not know that you want to wait until the data is retrieved to go onto the next iteration, so it just keeps going, and finishes faster than the data retrieval.
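If you'd rather avoid the dependency, the callback-counting pattern from the earlier answers works here too; a sketch using the question's own names:
// Sketch: a plain completion counter instead of the async library.
// Note: docs will be in completion order, not request order.
app.get('/api/two', function(req, res){
    dataset.count(function(err, count){
        var docs = [];
        var pending = 2;
        for (var i = 0; i < 2; i++){
            var rand = Math.floor(Math.random() * count);
            dataset.findOne({'index': rand}, function(err, doc){
                docs.push(doc);
                if (--pending === 0) {
                    // Only respond once both lookups have finished.
                    res.json(docs);
                }
            });
        }
    });
});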

Control flow issue with node/redis and callbacks?

Please could I ask for some advice on a control flow issue with node and redis? (aka Python coder trying to get used to JavaScript)
I don't understand why client.smembers and client.get (Redis lookups) need to be callbacks rather than simply being statements - it makes life very complicated.
Basically I'd like to query a set, and then when I have the results for the set, I need to carry out a get for each result. When I've got all the data, I need to broadcast it back to the client.
Currently I do this inside two callbacks, using a global object, which seems messy. I'm not even sure if it's safe (will the code wait for one client.get to complete before starting another?).
The current code looks like this:
var all_users = [];

// Get all the users for this page.
client.smembers("page:" + current_page_id, function (err, user_ids) {
    // Now get the name of each of those users.
    for (var i = 0; i < user_ids.length; i++) {
        client.get('user:' + user_ids[i] + ':name', function(err, name) {
            var myobj = {};
            myobj[user_ids[i]] = name;
            all_users.push(myobj);
            // Broadcast when we have got to the end of the loop,
            // so all users have been added to the list -
            // is this the best way? It seems messy.
            if (i === (user_ids.length - 1)) {
                socket.broadcast('all_users', all_users);
            }
        });
    }
});
But this seems very messy. Is it really the best way to do this? How can I be sure that all lookups have been performed before calling socket.broadcast?
*scratches head* Thanks in advance for any advice.
I don't understand why client.smembers and client.get (Redis lookups) need to be callbacks rather than simply being statements - it makes life very complicated.
That's what Node is. (I'm pretty sure this topic has been discussed more than enough times here; look through other questions, it's definitely there.)
How can I be sure that all lookups have been performed before calling socket.broadcast?
That's what err is for in the callback function. This is kinda Node's standard: the first parameter in a callback is the error object (null if everything went fine). So just use something like this to be sure no errors occurred:
if (err) {
    ... // handle errors.
    return // or not, it depends.
}
... // process results
But this seems very messy.
You'll get used to it. I actually find it nice when code is well formatted and the project is cleverly structured.
Other ways are:
Using libraries to control async code-flow (Async.js, Step.js, etc.)
If spaghetti-style code is what you call a mess, define some named functions to process results and pass those as parameters instead of anonymous ones.
If you totally dislike writing stuff callback-style, you might want to try streamlinejs:
var all_users = [];
// Get all the users for this page.
var user_ids = client.smembers("page:" + current_page_id, _);
// Now get the name of each of those users.
for (var i = 0; i < user_ids.length; i++) {
    var name = client.get('user:' + user_ids[i] + ':name', _);
    var myobj = {};
    myobj[user_ids[i]] = name;
    all_users.push(myobj);
}
socket.broadcast('all_users', all_users);
Note that a disadvantage of this variant is that only one username will be fetched at a time. Also, you should still be aware of what this code really does.
Async is a great library and you should take a look. Why? Clean code, a clear process, easy to track, etc.
Also, keep in mind that all your async functions will be processed after your for loop. In your example, it may result in a wrong i value. Use a closure:
for (var i = 0; i < user_ids.length; i++) {
    (function(i) {
        client.get('user:' + user_ids[i] + ':name', function(err, name) {
            var myobj = {};
            myobj[user_ids[i]] = name;
            all_users.push(myobj);
            // Broadcast when we have got to the end of the loop,
            // so all users have been added to the list -
            // is this the best way? It seems messy.
            if (i === (user_ids.length - 1)) {
                socket.broadcast('all_users', all_users);
            }
        });
    })(i);
}
To know when everything is finished, use a completion pattern like the one the async library (I think) uses internally. It's much simpler than doing it yourself.
async.series({
    getMembers: function(callback) {
        client.smembers("page:" + current_page_id, callback);
    }
}, function(err, results) {
    var all_users = [];
    async.forEachSeries(results.getMembers, function(item, cb) {
        all_users.push(item);
        cb();
    }, function(err) {
        socket.broadcast('all_users', all_users);
    });
});
This code may not be valid, but you should be able to figure out how to do it.
The Step library is good too (and only ~30 lines of code, I think).
I don't understand why client.smembers and client.get (Redis lookups) need to be callbacks rather than simply being statements - it makes life very complicated.
Right, so everyone agrees callback hell is no bueno. As of this writing, callbacks are a dying feature of Node. Unfortunately, the Redis library does not have native support for returning Promises.
But there is a module you can require in like so:
const util = require("util");
This is a standard library that is included in the Node runtime and has a bunch of utility functions we can use, one of them being "promisify":
https://nodejs.org/api/util.html#util_util_promisify_original
Now of course when you asked this question seven years ago, util.promisify(original) did not exist (it was added in Node v8.0.0), so we can now update this question with an up-to-date answer.
So promisify is a function; we can pass it a function like client.get() and it will return a new function that takes the nasty callback behavior and wraps it up nice and neat to make it return a Promise.
So promisify takes any function that accepts a callback as the last argument and makes it return a Promise instead, and that sounds like exactly the behavior you wanted seven years ago and are afforded today.
const util = require("util");
client.get = util.promisify(client.get);
So we are passing a reference to the .get() function to util.promisify().
This takes your function and wraps it up so that, instead of taking a callback, it returns a Promise. So util.promisify() returns a new, promisified function, and you can use it to override the existing client.get().
Nowadays, you do not have to use a callback for Redis lookups. So now you can use the async/await syntax like so:
const cachedMembers = await client.get('user:' + user_ids[i]);
So we wait for this to be resolved and whatever it resolves with will be assigned to cachedMembers.
The code can be cleaned up even further by using an ES6 array helper method instead of your for loop. I hope this answer is useful for current users, even though the original question is an old one.
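Putting that together with the question's original flow, a minimal sketch (assuming a node_redis-style client and the same socket API as the question):
// Sketch: the smembers/get flow with promisified client methods.
// Key names and socket.broadcast are taken from the question above.
const util = require("util");

client.smembers = util.promisify(client.smembers);
client.get = util.promisify(client.get);

async function broadcastAllUsers(current_page_id) {
    const user_ids = await client.smembers("page:" + current_page_id);
    // Issue every name lookup in parallel and wait for all of them.
    const names = await Promise.all(
        user_ids.map(id => client.get("user:" + id + ":name"))
    );
    const all_users = user_ids.map((id, i) => {
        const myobj = {};
        myobj[id] = names[i];
        return myobj;
    });
    socket.broadcast("all_users", all_users);
}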
