MongoDB and Node js Asynchronous programming - javascript

I am trying to solve an exam problem, so I cannot post my exam code as it is. So I have simplified such that it addresses the core concept that I do not understand. Basically, I do not know how to slow down node's asynchronous execution so that my mongo code can catch up with it. Here is the code:
MongoClient.connect('mongodb://localhost:27017/somedb', function(err, db) {
if (err) throw err;
var orphans = [];
for (var i; i < 100000; i++) {
var query = { 'images' : i };
db.collection('albums').findOne(query, function(err, doc_album) {
if(err) throw err;
if (doc_album === null) {
orphans.push(i);
}
});
}
console.dir(orphans.length);
return db.close();
});
So I am trying to create an array of those images who do not match my query criteria. I end up with a orphans.length value of 0 since Node does not wait for the callbacks to finish. How can I modify the code such that the callbacks finish executing before I count the number of images in the array that did not meet my query criteria?
Thanks in advance for your time.
Bharat

I assume you want to do 100000 parallel DB calls. To "wait" 10000 calls completion in each call callback we increase finished calls counter and invoke main callback when last one finished. Note that very common mistake here is to use for loop variable as a closure inside callback. This does not work as expected as all 10000 handlers scheduled first and by the time first is executed loop variable is of the same, maximum value.
function getOrphans(cb) {
MongoClient.connect('mongodb://localhost:27017/somedb', function(err, db) {
if (err) cb(err);
var orphans = [];
var numResponses = 0;
var maxIndex = 100000
for (var i = 0; i < maxIndex; i++) {
// problem: by the time you get reply "i" would be 100000.
// closure variable changed to function argument:
(function(index) {
var query = { 'images' : index };
db.collection('albums').findOne(query, function(err, doc_album) {
numResponses++;
if(err) cb(err);
if (doc_album === null) {
orphans.push(index);
}
if (numResponses == maxIndex) {
db.close();
cb(null, orphans);
}
});
})(i); // this is "immediately executed function
}
});
}
getOrphans(function(err, o) {
if (err)
return console.log('error:', err);
console.log(o.length);
});

Im not suggesting this is the best way to handle this specific problem in Mongo, but if you need to wait to the DB to reply before continuing then just use the callback to start next request.
This is not obvious at first, but you can refer to the result processing function inside the function itself:
var i = 0;
var mycback = function(err, doc_album) {
// ... process i-th result ...
if (++i < 100000) {
db.collections("album").findOne({'images': i}, mycback);
} else {
// request is complete, "return" result
result_cback(null, res);
}
};
db.collections('album').findOne({'images': 0}, mycback);
This also means that your function itself will be async (i.e. will want a result_cback parameter to call with the result instead of using return).
Writing a sync function that calls an async one is just not possible.
You cannot "wait" for an event in Javascript... you must set up an handler for the result and then terminate.
Waiting for an event is done in event-based processing by writing a "nested event loop" and this is for example how message boxes are handled in most GUI frameworks. This is a capability that Javascript designers didn't want to give to programmers (not really sure why, though).

Since you know it does not wait for the call to come back. You can do the console.dir inside your callback function, this should work (although I haven't tested it)
db.collection('albums').findOne(query, function(err, doc_album) {
if(err) throw err;
if (doc_album === null) {
orphans.push(i);
}
console.dir(orphans.length);
});

You don't need to slow anything down. If you are simply trying to load 100,000 images from the albums collection, you could consider using the async framework. This will let you assign tasks until the job is complete.
Also, you probably don't want request 100,000 records one-by-one. Instead, you probably want to page them.

Related

Node.JS How to set a variable outside the current scope

I have some code that I cant get my head around, I am trying to return an array of object using a callback, I have a function that is returning the values and then pushing them into an array but I cant access this outside of the function, I am doing something stupid here but can't tell what ( I am very new to Node.JS )
for (var index in res.response.result) {
var marketArray = [];
(function () {
var market = res.response.result[index];
createOrUpdateMarket(market, eventObj , function (err, marketObj) {
marketArray.push(marketObj)
console.log('The Array is %s',marketArray.length) //Returns The Array is 1.2.3..etc
});
console.log('The Array is %s',marketArray.length) // Returns The Array is 0
})();
}
You have multiple issues going on here. A core issue is to gain an understanding of how asynchronous responses work and which code executes when. But, in addition to that you also have to learn how to manage multiple async responses in a loop and how to know when all the responses are done and how to get the results in order and what tools can best be used in node.js to do that.
Your core issue is a matter of timing. The createOrUpdateMarket() function is probably asynchronous. That means that it starts its operation when the function is called, then calls its callback sometime in the future. Meanwhile the rest of your code continues to run. Thus, you are trying to access the array BEFORE the callback has been called.
Because you cannot know exactly when that callback will be called, the only place you can reliably use the callback data is inside the callback or in something that is called from within the callback.
You can read more about the details of the async/callback issue here: Why is my variable unaltered after I modify it inside of a function? - Asynchronous code reference
To know when a whole series of these createOrUpdateMarket() operations are all done, you will have to code especially to know when all of them are done and you cannot rely on a simple for loop. The modern way to do that is to use promises which offer tools for helping you manage the timing of one or more asynchronous operations.
In addition, if you want to accumulate results from your for loop in marketArray, you have to declare and initialize that before your for loop, not inside your for loop. Here are several solutions:
Manually Coded Solution
var len = res.response.result.length;
var marketArray = new Array(len), cntr = 0;
for (var index = 0, index < len; index++) {
(function(i) {
createOrUpdateMarket(res.response.result[i], eventObj , function (err, marketObj) {
++cntr;
if (err) {
// need error handling here
}
marketArray[i] = marketObj;
// if last response has just finished
if (cntr === len) {
// here the marketArray is fully populated and all responses are done
// put your code to process the marketArray here
}
});
})(index);
}
Standard Promises Built Into Node.js
// make a version of createOrUpdateMarket that returns a promise
function createOrUpdateMarketAsync(a, b) {
return new Promise(function(resolve, reject) {
createOrUpdateMarket(a, b, function(err, marketObj) {
if (err) {
reject(err);
return;
}
resolve(marketObj);
});
});
}
var promises = [];
for (var i = 0; i < res.response.result.length; i++) {
promises.push(createorUpdateMarketAsync(res.response.result[i], eventObj));
}
Promise.all(promises).then(function(marketArray) {
// all results done here, results in marketArray
}, function(err) {
// an error occurred
});
Enhanced Promises with the Bluebird Promise library
The bluebird promise library offers Promise.map() which will iterate over your array of data and produce an array of asynchronously obtained results.
// make a version of createOrUpdateMarket that returns a promise
var Promise = require('bluebird');
var createOrUpdateMarketAsync = Promise.promisify(createOrUpdateMarket);
// iterate the res.response.result array and run an operation on each item
Promise.map(res.response.result, function(item) {
return createOrUpdateMarketAsync(item, eventObj);
}).then(function(marketArray) {
// all results done here, results in marketArray
}, function(err) {
// an error occurred
});
Async Library
You can also use the async library to help manage multiple async operations. In this case, you can use async.map() which will create an array of results.
var async = require('async');
async.map(res.response.result, function(item, done) {
createOrUpdateMarker(item, eventObj, function(err, marketObj) {
if (err) {
done(err);
} else {
done(marketObj);
}
});
}, function(err, results) {
if (err) {
// an error occurred
} else {
// results array contains all the async results
}
});

Node MySql Callback for Multiple Queries

I ran into an issue whilst attempting to create the logic to add rows to a new table I made on my MySql database. When adding a row I need to query the database 4 times to check other rows and to then add the correct value to the new row. I am using node.js and the mysql module to accomplish this. While coding I ran into a snag, the code does not wait for the 4 queries to finish before inserting the new row, this then gives the values being found a value of 0 every time. After some research I realize a callback function would be in order, looking something like this:
var n = 0;
connection.query("select...", function(err, rows){
if(err) throw err;
else{
if(rows.length === 1) ++n;
}
callback();
});
function callback(){
connection.query("insert...", function(err){
if(err) throw err;
});
}
Note: The select queries can only return one item so the if condition should not effect this issue.
A callback function with only one query to wait on is clear to me, but I become a bit lost for multiple queries to wait on. The only idea that I had would be to create another variable that increments before the callback is called, and is then passed in the callback function's arguments. Then inside the callback the query could be encapsulated by an if statement with a condition of this being the variable equaling the number of queries that need to be called, 4 for my purposes here. I could see this working but wasn't sure if this sort of situation already has a built in solution or if there are other, better, solutions already developed.
You need async (https://github.com/caolan/async). You can do a very complex logic with this module.
var data = {} //You can do this in many ways but one way is defining a global object so you can add things to this object and every function can see it
firstQueryFunction(callback){
//do your stuff with mysql
data.stuff = rows[0].stuff; //you can store stuff inside your data object
callback(null);
}
secondQueryFunction(callback){
//do your stuff with mysql
callback(null);
}
thirdQueryFunction(callback){
//do your stuff with mysql
callback(null);
}
fourthQueryFunction(callback){
//do your stuff with mysql
callback(null);
}
//This functions will be executed at the same time
async.parallel([
firstQueryFunction,
secondQueryFunction,
thirdQueryFunction,
fourthQueryFunction
], function (err, result) {
//This code will be executed after all previous queries are done (the order doesn't matter).
//For example you can do another query that depends of the result of all the previous queries.
});
As per Gesper's answer I'd recommend the async library, however, I would probably recommend running in parallel (unless the result of the 1st query is used as input to the 2nd query).
var async = require('async');
function runQueries(param1, param2, callback) {
async.parallel([query1, query2, query3(param1, param2), query4],
function(err, results) {
if(err) {
callback(err);
return;
}
var combinedResult = {};
for(var i = 0; i < results.length; i++) {
combinedResult.query1 = combinedResult.query1 || result[i].query1;
combinedResult.query2 = combinedResult.query2 || result[i].query2;
combinedResult.query3 = combinedResult.query3 || result[i].query3;
combinedResult.query4 = combinedResult.query4 || result[i].query4;
}
callback(null, combinedResult);
});
}
function query1(callback) {
dataResource.Query(function(err, result) {
var interimResult = {};
interimResult.query1 = result;
callback(null, interimResult);
});
}
function query2(callback) {
dataResource.Query(function(err, result) {
var interimResult = {};
interimResult.query2 = result;
callback(null, interimResult);
});
}
function query3(param1, param2) {
return function(callback) {
dataResource.Query(param1, param2, function(err, result) {
var interimResult = {};
interimResult.query3 = result;
callback(null, interimResult);
});
}
}
function query4(callback) {
dataResource.Query(function(err, result) {
var interimResult = {};
interimResult.query4 = result;
callback(null, interimResult);
});
}
Query3 shows the use of parameters being 'passed through' to the query function.
I'm sure someone can show me a much better way of combining the result, but that is the best I have come up with so far. The reason for the use of the interim object, is that the "results" parameter passed to your callback is an array of results, and it can be difficult to determine which result is for which query.
Good luck.

Populating async array with a function called right before.

var request = require('request'),
requests = [],
values = [],
request("url1", function());
function() {
.....
for (x in list){
requests.push(requestFunction(x));
}
}
requestFunction(x){
request("url2", function (e,r,b) {
....
return function(callback) {
values[i] = b
}
});
}
async.parallel(requests, function (allResults) {
// values array is ready at this point
// the data should also be available in the allResults array
console.log(values);
});
I new to node. Issue is that the request needs to be called to populate the requests callback array. But the issue is the async.parallel will run before the requests array is full and need run all the callbacks. Where do I move this async so it runs after the requests array is full?
Asynchronous programming is all about chaining blocks. This allows node to efficiently run its event queue, while ensuring that your steps are done in order. For example, here's a query from a web app I wrote:
app.get("/admin", isLoggedIn, isVerified, isAdmin, function (req, res) {
User.count({}, function (err, users) {
if (err) throw err;
User.count({"verified.isVerified" : true}, function (err2, verifiedUsers) {
if (err2) throw err2;
Course.count({}, function (err3, courses) {
// and this continues on and on — the admin page
// has a lot of information on all the documents in the database
})
})
})
})
Notice how I chained function calls inside of one another. Course.count({}, ...) could only be called once User.count({"verified.isVerified" : true}, ...) was called. This means the i/o is never blocked and the /admin page is never rendered without the required information.
You didn't really give enough information regarding your problem (so there might be a better way to fix it), but I think you could, for now, do this:
var request = require('request'),
requests = [],
values = [],
length; // a counter to store the number of times to run the loop
request("url1", function() {
length = Object.keys(list).length;
// referring to the list below;
// make sure list is available at this level of scope
for (var x in list){
requests.push(requestFunction(x));
length--;
if (length == 0) {
async.parallel(requests, function (allResults) {
console.log(values); // prints the values array
});
}
}
}
function requestFunction(x) {
request("url2", function (e,r,b) {
values[i] = b;
return b;
}
}
I am assuming that requestFunction() takes a while to load, which is why async.parallel is running before the for (var x in list) loop finishes. To force async.parallel to run after the loop finishes, you'll need a counter.
var length = Object.keys(list).length;
This returns the number of keys in the list associative array (aka object). Now, every time you run through the for loop, you decrement length. When length == 0, you then run your async.parallel process.
edit: You could also write the requests.push() part as:
requests.push(
(function () {
request("url2", function (e,r,b) {
values[i] = b;
return b;
}
})()
);
I think it's redundant to store b in both values and requests, but I have kept it as you had it.

Nodejs Running Functions in Series

So right now I'm trying to use Nodejs to access files in order to write them to a server and process them.
I've split it into the following steps:
Traverse directories to generate an array of all of the file paths
Put the raw text data from each of file paths in another array
Process the raw data
The first two steps are working fine, using these functions:
var walk = function(dir, done) {
var results = [];
fs.readdir(dir, function(err, list) {
if (err) return done(err);
var pending = list.length;
if (!pending) return done(null, results);
list.forEach(function(file) {
file = path.resolve(dir, file);
fs.stat(file, function(err, stat) {
if (stat && stat.isDirectory()) {
walk(file, function(err, res) {
results = results.concat(res);
if (!--pending) done(null, results);
});
} else {
results.push(file);
if (!--pending) done(null, results);
}
});
});
});
};
function processfilepaths(callback) {
// reading each file
for (var k in filepaths) { if (arrayHasOwnIndex(filepaths, k)) {
fs.readFile(filepaths[k], function (err, data) {
if (err) throw err;
rawdata[k] = data.toString().split(/ *[\t\r\n\v\f]+/g);
for (var j in rawdata[k]) { if (arrayHasOwnIndex(rawdata[k], j)) {
rawdata[k][j] = rawdata[k][j].split(/: *|: +/);
}}
});
}}
if (callback) callback();
}
Obviously, I want to call the function processrawdata() after all of the data has been loaded. However, using callbacks doesn't seem to work.
walk(rootdirectory, function(err, results) {
if (err) throw err;
filepaths = results.slice();
processfilepaths(processrawdata);
});
This never causes an error. Everything seems to run perfectly except that processrawdata() is always finished before processfilepaths(). What am I doing wrong?
You are having a problem with callback invocation and asynchronously calling functions. IMO I'll recommend that you use a library such as after-all to execute a callback once all your functions get executed.
Here's a example, here the function done will be called once all the functions wrapped with next are called.
var afterAll = require('after-all');
// Call `done` once all the functions
// wrapped with next() get called
next = afterAll(done);
// first execute this
setTimeout(next(function() {
console.log('Step two.');
}), 500);
// then this
setTimeout(next(function() {
console.log('Step one.');
}), 100);
function done() {
console.log("Yay we're done!");
}
I think for your problem, you can use async module for Node.js:
async.series([
function(){ ... },
function(){ ... }
]);
To answer you actual question, I need to explain how Node.js works:
Say, when you call an async operation (say mysql db query), Node.js sends "execute this query" to MySQL. Since this query will take some time (may be some milliseconds), Node.js performs the query using the MySQL async library - getting back to the event loop and doing something else there while waiting for MySQL to get back to us. Like handling that HTTP request.
So, In your case both functions are independent and executes almost in parallel.
For more information:
Async.js for use with Node.js
function processfilepaths(callback) {
// reading each file
for (var k in filepaths) { if (arrayHasOwnIndex(filepaths, k)) {
fs.readFile(filepaths[k], function (err, data) {
if (err) throw err;
rawdata[k] = data.toString().split(/ *[\t\r\n\v\f]+/g);
for (var j in rawdata[k]) { if (arrayHasOwnIndex(rawdata[k], j)) {
rawdata[k][j] = rawdata[k][j].split(/: *|: +/);
}}
});
}}
if (callback) callback();
}
Realize that you have:
for
readfile (err, callback) {... }
if ...
Node will call each readfile asynchronously, which only sets up the event and callback, then when it is done calling each readfile, it will do the if, before the callback probably even has a chance to get invoked.
You need to use either Promises, or a promise module like async to serialize it. What you would then do looks like:
async.XXXX(filepaths, processRawData,
function (err, ...) {
// function for when all are done
if (callback) callback();
}
);
Where XXXX is one of the functions from the library like series, parallel, each, etc... The only thing you also need to know is in your process raw data, async gives you a callback to call when done. Unless you really need sequential access (I don't think you do) use parallel so that you can queue up as many i/o events as possible, it should execute faster, maybe only marginally, but it'll better leverage the hardware.

How to write an asynchronous for-each loop in Express.js and mongoose?

I have a function that returns an array of items from MongoDB:
var getBooks = function(callback){
Comment.distinct("doc", function(err, docs){
callback(docs);
}
});
};
Now, for each of the items returned in docs, I'd like to execute another mongoose query, gather the count for specific fields, gather them all in a counts object, and finally pass that on to res.render:
getBooks(function(docs){
var counts = {};
docs.forEach(function(entry){
getAllCount(entry, ...){};
});
});
If I put res.render after the forEach loop, it will execute before the count queries have finished. However, if I include it in the loop, it will execute for each entry. What is the proper way of doing this?
I'd recommend using the popular NodeJS package, async. It's far easier than doing the work/counting, and eventual error handling would be needed by another answer.
In particular, I'd suggest considering each (reference):
getBooks(function(docs){
var counts = {};
async.each(docs, function(doc, callback){
getAllCount(entry, ...);
// call the `callback` with a error if one occured, or
// empty params if everything was OK.
// store the value for each doc in counts
}, function(err) {
// all are complete (or an error occurred)
// you can access counts here
res.render(...);
});
});
or you could use map (reference):
getBooks(function(docs){
async.map(docs, function(doc, transformed){
getAllCount(entry, ...);
// call transformed(null, theCount);
// for each document (or transformed(err); if there was an error);
}, function(err, results) {
// all are complete (or an error occurred)
// you can access results here, which contains the count value
// returned by calling: transformed(null, ###) in the map function
res.render(...);
});
});
If there are too many simultaneous requests, you could use the mapLimit or eachLimit function to limit the amount of simultaneous asynchronous mongoose requests.
forEach probably isn't your best bet here, unless you want all of your calls to getAllCount happening in parallel (maybe you do, I don't know — or for that matter, Node is still single-threaded by default, isn't it?). Instead, just keeping an index and repeating the call for each entry in docs until you're done seems better. E.g.:
getBooks(function(docs){
var counts = {},
index = 0,
entry;
loop();
function loop() {
if (index < docs.length) {
entry = docs[index++];
getAllCount(entry, gotCount);
}
else {
// Done, call `res.render` with the result
}
}
function gotCount(count) {
// ...store the count, it relates to `entry`...
// And loop
loop();
}
});
If you want the calls to happen in parallel (or if you can rely on this working in the single thread), just remember how many are outstanding so you know when you're done:
// Assumes `docs` is not sparse
getBooks(function(docs){
var counts = {},
received = 0,
outstanding;
outstanding = docs.length;
docs.forEach(function(entry){
getAllCount(entry, function(count) {
// ...store the count, note that it *doesn't* relate to `entry` as we
// have overlapping calls...
// Done?
--outstanding;
if (outstanding === 0) {
// Yup, call `res.render` with the result
}
});
});
});
In fact, getAllCount on first item must callback getAllCount on second item, ...
Two way: you can use a framework, like async : https://github.com/caolan/async
Or create yourself the callback chain. It's fun to write the first time.
edit
The goal is to have a mechanism that proceed like we write.
getAllCountFor(1, function(err1, result1) {
getAllCountFor(2, function(err2, result2) {
...
getAllCountFor(N, function(errN, resultN) {
res.sender tout ca tout ca
});
});
});
And that's what you will construct with async, using the sequence format.

Categories