Admittedly I'm a novice with Node, but it seems like this should be working fine. I am using multiparty to parse a form, which returns an array. I am then using forEach to step through the array. However, the forEach does not wait for the inner asynchronous code to finish, and I am a little confused as to why.
var return_GROBID = function(req, res, next) {
    var form = new multiparty.Form();
    var response_array = [];
    form.parse(req, function(err, fields, files) {
        files.PDFs.forEach(function (element, index, array) {
            fs.readFile(element.path, function (err, data) {
                var newPath = __dirname + "/../public/PDFs/" + element.originalFilename;
                fs.writeFile(newPath, data, function (err) {
                    if (err) {
                        res.send(err);
                    }
                    GROBIDrequest.GROBID2js(newPath, function(response) {
                        response_array.push(response);
                        if (response_array.length == array.length) {
                            res.locals.body = response_array;
                            next();
                        }
                    });
                });
            });
        });
    });
}
If someone can give me some insight on the proper way to do this that would be great.
EDIT: The mystery continues. I ran this code on another machine and IT WORKED. What is going on? Why would one machine be inconsistent with another?
I'd guess the PDFs.forEach is just a call to the built-in forEach function, correct?
In Javascript many things are asynchronous - meaning that given:
linea();
lineb();
lineb may be executed before linea has finished whatever operation it started (because in asynchronous programming, we don't wait around until a network request comes back, for example).
This is different from other programming languages: most languages will "block" until linea is complete, even if linea could take time (like making a network request). (This is called synchronous programming).
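For example, a minimal sketch of that ordering (setTimeout stands in here for any asynchronous operation, such as a network request):

console.log("a");          // runs first
setTimeout(function () {
    console.log("c");      // runs last, when the timer fires
}, 100);
console.log("b");          // runs second, before the timeout callback

The output is a, b, c: execution moves straight past setTimeout rather than waiting for it.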
With that preamble done, back to your original question:
So forEach is a synchronous function. If you rewrote your code like the following, it would work (but not be useful):
PDFs.forEach(function (element, index, array) {
    console.log(element.path)
})
(console.log is one of the synchronous methods in Javascript: it finishes its work before the next line runs).
But in your forEach loop you have fs.readFile. Notice that last parameter, a function? Node will call that function back when the operation is complete (a callback).
Your code currently, as observed, hits that fs.readFile, says "ok, next thing", and moves on to the next item in the loop before the file has been read.
One way to fix this, with the fewest changes to the code, is to use the async library.
async.forEachOf(PDFs, function (value, key, everythingAllDoneCallback) {
    GROBIDrequest.GROBID2js(newPath, function (response) {
        response_array.push(response);
        if (response_array.length === PDFs.length) {
            ...
        }
        everythingAllDoneCallback(null);
    });
});
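For completeness, async.forEachOf also accepts a completion callback as its final argument; a sketch of the overall shape (the names here are illustrative, not from the original code):

async.forEachOf(PDFs, function (value, key, itemDoneCallback) {
    // ... per-item asynchronous work goes here ...
    itemDoneCallback(null);   // signal this item is finished (null = no error)
}, function (err) {
    // runs once, after every item has called its callback (or on the first error)
});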
With this code you are going through all your asynchronous work, then triggering the callback when it's safe to move on to the next item in the list.
Callbacks like this are a very common Node pattern, and should be well covered by beginner material on Node. But it is one of the most... unexpected concepts in Node development.
One resource I found on this (one from a set of lessons) is NodeJS For Beginners: Callbacks. That, plus playing around with blocking (synchronous) and non-blocking (asynchronous) functions, and hopefully this SO answer, may provide some enlightenment :)
Related
I have a tool whose basic idea is as follows:
//get a bunch of couchdb databases. this is an array
const jsonFile = require('jsonfile');
let dbList = getDbList();
const filePath = 'some/path/to/file';
const changesObject = {};
//iterate the db list. do asynchronous stuff on each iteration
dbList.forEach(function(db){
    let merchantDb = nano.use(db);
    //get some changes from the database. validate inside callback
    merchantDb.get("_changes", function(err, changes){
        validateChanges(changes);
        changesObject['db'] = changes.someAttribute;
        //write changes to file
        jsonFile.writeFile(filePath, changesObject, function (err) {
            if (err) {
                logger.error("Unable to write to file: ");
            }
        });
    });
});

const validateChanges = function(changes) {
    if (!validateLogic(changes)) sendAlertMail();
}
For performance improvements the iteration is not done synchronously. Therefore there can be multiple iterations running in 'parallel'. My question is can this cause any data inconsistencies and/or any issues with the file writing process?
Edit:
The same file gets written to on each iteration.
Edit:2
The changes are stored as a JSON object with key value pairs. The key being the db name.
If you're really writing to a single file, which you appear to be (though it's hard to be sure), then no; you have a race condition in which multiple callbacks will try to write to the same file, possibly at the same time (remember, I/O isn't done on the JavaScript thread in Node unless you use the *Sync functions), which will at best mean the last one wins and will at worst mean I/O errors because of overlap.
If you're writing to separate files for each db, then provided there's no cross-talk (shared state) amongst validateChanges, validateLogic, sendAlertMail, etc., that should be fine.
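For illustration, a minimal sketch of that per-file variant (the path scheme is an assumption, not from the question):

merchantDb.get("_changes", function (err, changes) {
    validateChanges(changes);
    // one file per database, so no two callbacks ever touch the same file
    jsonFile.writeFile('some/path/to/' + db + '.json', changes.someAttribute, function (err) {
        if (err) {
            logger.error("Unable to write to file: ");
        }
    });
});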
Just for detail: the loop will start the jobs (getting the changes and then writing them out), but the callbacks of the calls to get won't run until later, after all of those jobs have been queued.
You are creating closures in loops, but the way you're doing it is okay, both because you're doing it within the forEach callback and because you're not using db in the get callback (which would be fine with the forEach callback but not with some other ways you might loop arrays). Details on that aspect in this question's answers if you're interested.
This line is suspect, though:
let merchantDb = nano.use('db');
I suspect you meant (no quotes):
let merchantDb = nano.use(db);
For what it's worth, it sounds from the updates to the question and your various comments like the better solution would be not to write out the file separately each time. Instead, you want to gather up the changes and then write them out.
You can do that with the classic Node-callback APIs you're using like this:
let completed = 0;
//iterate the db list. do asynchronous stuff on each iteration
dbList.forEach(function(db) {
    let merchantDb = nano.use(db);
    //get some changes from the database. validate inside callback
    merchantDb.get("_changes", function(err, changes) {
        if (err) {
            // Deal with the fact there was an error (don't return)
        } else {
            validateChanges(changes);
            changesObject[db] = changes.someAttribute; // <=== NOTE: This line had 'db' rather than db, I assume that was meant to be just db
        }
        if (++completed === dbList.length) {
            // All done, write changes to file
            jsonFile.writeFile(filePath, changesObject, function(err) {
                if (err) {
                    logger.error("Unable to write to file: ");
                }
            });
        }
    });
});
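For comparison, a sketch of the same gather-then-write idea using promises instead of a counter (this wrapping is my addition, not part of the original code):

Promise.all(dbList.map(function (db) {
    return new Promise(function (resolve, reject) {
        nano.use(db).get("_changes", function (err, changes) {
            if (err) { return reject(err); }
            validateChanges(changes);
            resolve({ db: db, attr: changes.someAttribute });
        });
    });
})).then(function (results) {
    results.forEach(function (r) { changesObject[r.db] = r.attr; });
    // All done, write changes to file once
    jsonFile.writeFile(filePath, changesObject, function (err) {
        if (err) {
            logger.error("Unable to write to file: ");
        }
    });
});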
This is my simple task: find images by an array of ids and render the image values into a template.
router.get('/gallery', function(req, res) {
    var images = [];
    imagesIds.forEach(function(eachImageId) {
        Images.findById(eachImageId).exec(function(findImageErr, foundImage) {
            if (foundImage) {
                images.push(foundImage);
            }
        });
    });
    res.render('gallery', {
        images: images
    });
});
The problem is that res.render does not wait for findById to finish; the images array always ends up empty ([]).
I tried to use a generator but did not know how to achieve this.
If someone can explain without a library (like q), that would be better, because I want to understand deeply how generators deal with this problem.
Generators allow you to write synchronous-looking functions, because they can stop their execution and resume it later.
I guess you already read some articles like this and know how to define generator function and use them.
Your asynchronous code can be represented as a simple iterator with the magic yield keyword. The generator function runs until it hits a yield, then stops until you resume it using the next() method.
function* loadImages(imagesIds) {
    var images = [], image;
    for (let imageId of imagesIds) {
        image = yield loadSingleImage(imageId);
        images.push(image);
    }
    return images;
}
Because there is a loop, the function will advance through it with each next() until all imagesIds have been walked. Finally the return statement executes and you get images.
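To make that concrete, here is roughly what driving the generator by hand looks like (the ids are made up for illustration):

var it = loadImages(['id1', 'id2']);     // nothing runs yet
var step = it.next();                    // runs up to the first yield
// step.value is the promise returned by loadSingleImage('id1');
// when it resolves, hand the image back in to resume the generator:
step.value.then(function (firstImage) {
    step = it.next(firstImage);          // pushes firstImage, yields the next promise
    // ...and so on, until next() returns { done: true, value: images }
});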
Now we need to describe image loading. Our generator function needs to know when the current image has loaded so that it can start loading the next one. All modern JavaScript runtimes (Node.js and the latest browsers) support the native Promise object, so we will define a function which returns a promise that is eventually resolved with the image, if it is found.
function loadSingleImage(imageId) {
    return new Promise((resolve, reject) => {
        Images.findById(imageId).exec((findImageErr, foundImage) => {
            if (foundImage) {
                resolve(foundImage);
            } else {
                reject();
            }
        });
    });
}
Well, we have two functions: one loads a single image and the second puts them all together. Now we need some dispatcher to pass control from one function to the other. Since you don't want to use libraries, we have to implement such a helper ourselves.
Below is a smaller version of the spawn function, simpler and easier to understand because we don't need to handle errors, just ignore missing images.
function spawn(generator) {
    function continuer(value) {
        var result = generator.next(value);
        if (!result.done) {
            return Promise.resolve(result.value).then(continuer);
        } else {
            return result.value;
        }
    }
    return continuer();
}
This function recursively resumes our generator from within continuer while result.done is not true. Once it is true, generation has finished successfully and we can return our value.
And finally, putting all together, you will get the following code for gallery loading.
router.get('/gallery', function(req, res) {
    var imageGenerator = loadImages(imagesIds);
    spawn(imageGenerator).then(function(images) {
        res.render('gallery', {
            images: images
        });
    });
});
Now you have somewhat pseudo-synchronous code in the loadImages function, and I hope it helps you understand how generators work.
Also note that all images will be loaded sequentially, because we wait for the asynchronous result of each loadSingleImage call and push it into the array before moving on to the next imageId. That can cause performance issues if you are going to use this approach in production.
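If that sequential loading becomes a problem, one alternative sketch (bypassing the generator entirely) is to start all loads at once with Promise.all; note that with loadSingleImage as written above, a single missing image would reject the whole batch:

function loadImagesParallel(imagesIds) {
    // every loadSingleImage call starts immediately;
    // the returned promise resolves once all of them have resolved
    return Promise.all(imagesIds.map(loadSingleImage));
}

// usage inside the route handler:
loadImagesParallel(imagesIds).then(function (images) {
    res.render('gallery', { images: images });
});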
Related links:
Mozilla Hacks – ES6 In Depth: Generators
2ality – ES6 generators in depth
Jake Archibald – ES7 async functions
It can be done without a 3rd party library, as you asked, but it would be cumbersome...
Either way, the bottom line is to do the work inside the callback function "function(findImageErr, foundImage){..}".
1) Without a 3rd party library, you need to render only after all images have been accounted for:
var images = [];
var results = 0;
imagesIds.forEach(function(eachImageId) {
    Images.findById(eachImageId).exec(function(findImageErr, foundImage) {
        results++;
        if (foundImage)
            images.push(foundImage);
        if (results == imagesIds.length)
            res.render('gallery', { images: images });
    });
});
2) I strongly recommend a 3rd party library that does the same.
I'm currently using async, but I might migrate to promises in the future.
async.map(
    imageIds,
    function (eachImageId, next) {
        Images.findById(eachImageId).exec(function (findImageErr, foundImage) {
            next(null, foundImage);
            // don't report errors to async, because it will abort
        });
    },
    function (err, images) {
        images = _.compact(images); // remove null images, i'm using lodash
        res.render('gallery', { images: images });
    }
);
Edited: following your readability remark, note that if you create a wrapper function for findById(...).exec(...) that ignores errors and just reports missing results as null (call it findIgnoreError(imageId, callback)), then you could write:
async.map(
    imageIds,
    findIgnoreError,
    function (err, images) {
        images = _.compact(images); // remove null images, i'm using lodash
        res.render('gallery', { images: images });
    }
);
In other words, it becomes a bit more readable if the reader starts to think in functions... It says "go over those imageIds in parallel, run findIgnoreError on each imageId", and the final section says what to do with the accumulated results...
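For completeness, such a wrapper might look like this (findIgnoreError is the hypothetical helper named above, not an existing API):

function findIgnoreError(imageId, callback) {
    Images.findById(imageId).exec(function (findImageErr, foundImage) {
        // swallow errors: report failures and misses as null,
        // so async.map never aborts
        callback(null, foundImage || null);
    });
}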
Instead of querying Mongo (or any DB) N times, I would just fire a single query using $in:

Images.find({ _id : { $in : imagesIds } }, function (err, images) {
    if (err) return next(err);
    res.render('gallery', { images: images });
});

This also reduces the number of IOs, and you won't have to write additional code to work out when to call res.render.
I have this code:
var resources = myFunc();
myFunc2(resources);
The problem is that JavaScript executes myFunc() asynchronously and then calls myFunc2(), but I don't have the results of myFunc() yet.
Is there a way to block the first call? Or a way to make this work?
The reason why this code doesn't work represents the beauty and pitfalls of async javascript. It doesn't work because it is not supposed to.
When the first line of code is executed, you have basically told node to go do something and let you know when it is done. It then moves on to execute the next line of code - which is why you don't have the response yet when you get here. For more on this, I would study the event-loop in greater detail. It's a bit abstract, but it might help you wrap your head around control flow in node.
This is where callbacks come in. A callback is basically a function you pass to another function that will execute when that second function is complete. The usual signature for a callback is (err, response). This enables you to check for errors and handle them accordingly.
//define first
var first = function (callback) {
    //This function would do something, then
    //when it is done, you call back
    //if no error, hand in null
    callback(err, res);
};
//Then this is how we call it
first(function (err, res) {
    if (err) { return handleError(err); }
    //Otherwise do your thing
    second(res);
});
As you might imagine, this can get complicated really quickly. It is not uncommon to end up with many nested callbacks which make your code hard to read and debug.
Extra:
If you find yourself in this situation, I would check out the async library. Here is a great tutorial on how to use it.
myFunc(), if asynchronous, needs to accept a callback or return a promise. Typically, you would see something like:
myFunc(function myFuncCallback (resources) {
    myFunc2(resources);
});
Without knowing more about your environment and modules, I can't give you specific code. However, most asynchronous functions in Node.js allow you to specify a callback that will be called once the function is complete.
Assuming that myFunc calls some async function, you could do something like this:
function myFunc(callback) {
    // do stuff
    callSomeAsyncFunction(callback);
}

myFunc(myFunc2);
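For completeness, the promise variant mentioned at the start would look something like this (a sketch, assuming callSomeAsyncFunction takes a standard (err, result) callback):

function myFunc() {
    return new Promise(function (resolve, reject) {
        callSomeAsyncFunction(function (err, resources) {
            if (err) { return reject(err); }
            resolve(resources);
        });
    });
}

myFunc().then(myFunc2);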
As a js/node newcomer, I'm having some problems understanding how I can get around this issue.
Basically I have a list of objects that I would like to save to a MongoDB database if they don't already exist.
Here is some code:
var getDataHandler = function (err, resp, body) {
    var data = JSON.parse(body);
    for (var i = 0; i < data.length; i++) {
        var item = data[i];
        models.Entry.findOne({id: item.id}, function(err, result) {
            if (err) { }
            else if (result === null) {
                var entry = new models.Entry(item);
                entry.save(function(err, result) {
                    if (err) {}
                });
            }
        });
    }
}
The problem I have is that because it is asynchronous, once the new models.Entry(item) line is executed the value of item will be equal to the last element in the data array for every single callback.
What kind of pattern can I use to avoid this issue ?
Thanks.
Two kinds of patterns are available:
1) Callbacks. That is, you go on calling functions from your functions by passing them as parameters. Callbacks are generally fine but, especially server side when dealing with databases or other asynchronous resources, you quickly end up in "callback hell" and you may grow tired of looking for tricks to reduce the indentation levels of your code. And you may sometimes wonder how you really deal with exceptions. But callbacks are the basis: you must understand how to deal with this problem using callbacks.
2) Promises. Using promises you may have something like this (example from my related blog post):
db.on(userId)             // get a connection from the pool
  .then(db.getUser)       // use it to issue an asynchronous query
  .then(function(user){   // then, with the result of the query
      ui.showUser(user);  // do something
  }).finally(db.off);     // and return the connection to the pool
Instead of passing the next function as callback, you just chain with then (in fact it's a little more complex, you have other functions, for example to deal with collections and parallel resolution or error catching in a clean way).
Regarding your scope problem with the variable evolving before the callback is called, the standard solution is this one:
for (var i = 0; i < n; i++) {
    (function(i){
        // any function defined here (a callback) will use the value of i fixed when iterating
    })(i);
}
This works because calling a function creates a scope and the callback you create in that scope retains a pointer to that scope where it will fetch i (that's called a closure).
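Applied to the loop in your question, the pattern looks like this (same code as above, just wrapped so each callback closes over its own item):

for (var i = 0; i < data.length; i++) {
    (function (item) {
        // item is fixed per iteration, so the callbacks below see the right one
        models.Entry.findOne({id: item.id}, function (err, result) {
            if (err) { }
            else if (result === null) {
                var entry = new models.Entry(item);
                entry.save(function (err, result) {
                    if (err) {}
                });
            }
        });
    })(data[i]);
}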
I have a list of tasks that I want to run in parallel using https://github.com/caolan/async.
I want the program to proceed (probably through a callback) after the first of these parallel tasks is complete, not all of them. So I don't think the naive
async.parallel([task1, task2], callback)
works for me.
Alternatively I could spawn two tasks and cancel the incomplete one, but I can't figure out how to do that using async either.
Thanks!
-Charlie
Parallel Race
You can get async to initiate the final callback early by having the first task to finish pass an "error" value that evaluates as true but isn't actually an error.
I've put together an example that uses -1 as an error code. In the final callback I check the error value and if it's not -1 then it's an actual error. If the error value is -1 then we'll have a valid value in results. At that point, we just need to remove extra elements from results of the other async functions that have not completed yet.
In the below example I've used the request module to pull html pages and the underscore module to filter the results in the final callback.
var async = require('async');
var request = require('request');
var _ = require('underscore');

exports.parallel = function(req, res) {
    async.parallel([
        /* Grab Google.jp */
        function(callback) {
            request("http://google.jp", function(err, response, body) {
                if (err) { console.log(err); callback(true); return; }
                callback(-1, "google.jp");
            });
        },
        /* Grab Google.com */
        function(callback) {
            request("http://google.com", function(err, response, body) {
                if (err) { console.log(err); callback(true); return; }
                callback(-1, "google.com");
            });
        }
    ],
    /* callback handler */
    function(err, results) {
        /* Actual error */
        if (err && err != -1) {
            console.log(err);
            return;
        }
        /* First data */
        if (err === -1) {
            /*
             * async#parallel returns a list, one element per parallel function.
             * Functions that haven't finished yet are in the list as undefined.
             * Use underscore to easily filter out the one result.
             */
            var one = _.filter(results, function(x) {
                return (x === undefined ? false : true);
            })[0];
            console.log(results);
            console.log(one);
            res.send(one);
        }
    });
};
Remaining Function Results
When you setup async#parallel to work like this you won't have access to the results of the other asynchronous functions. If you're only interested in the first one to respond then this isn't a problem. However, you will not be able to cancel the other requests. That's most likely not a problem, but it might be a consideration.
The async.parallel documentation says:
If any of the functions pass an error to its callback, the main callback is immediately called with the value of the error.
So you could return an error object from all of your parallel functors, and the first one to finish would jump you to the completion callback. Perhaps even your own special error class, so you can tell the difference between an actual error and a "hey I won" error.
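A sketch of that special-class idea (FirstDone, task1, and task2 are made up for illustration):

function FirstDone(value) {
    this.value = value;   // carry the winning result inside the "error" slot
}

// inside each parallel task, signal completion with:
//   callback(new FirstDone(result));

async.parallel([task1, task2], function (err, results) {
    if (err instanceof FirstDone) {
        console.log("winner:", err.value);   // the first task to finish
    } else if (err) {
        console.log("actual error:", err);   // a real failure
    }
});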
Having said that, you would still have your parallel functions running, potentially waiting for callbacks to complete or whatever. Perhaps you could use async.parallelLimit to make sure you're not firing off too many tasks in parallel?
Having said all that, it's possible you are better served by trying another method from the async library for this task - firing off parallel tasks then having these tasks race each other may not be the best idea.