How to prevent nested promise from hanging? - javascript

The following code works (the user object is written to the console); however, the process doesn't exit. I believe one of the promises must not be getting resolved?
var Promise = require("bluebird");
var mongodb = require('mongodb');
Promise.promisifyAll(mongodb);

mongodb.MongoClient.connectAsync("mongodb://localhost/test")
    .then(function (db) {
        var users = db.collection('users');
        return users.findOneAsync({userName: "someuser"});
    })
    .then(function (result) {
        console.log(result);
    })
    .catch(function (e) {
        // handle error
    });
What is wrong with this code?

MongoDB creates a persistent connection that you're supposed to use for the whole lifecycle of your application.
When you're done with it, close it; that is, call db.close().
If you want to write saner code, use Promise.using and a disposer to build a saner connectAsync that does the resource management for you.
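For instance, a minimal sketch of that pattern (same promisified driver, collection, and query as in the question) could look like this:

var Promise = require("bluebird");
var mongodb = require("mongodb");
Promise.promisifyAll(mongodb);

// Wrap the connection in a disposer so Promise.using always closes it,
// which lets the process exit once the work is done.
function getConnection() {
    return mongodb.MongoClient.connectAsync("mongodb://localhost/test")
        .disposer(function (db) {
            db.close();
        });
}

Promise.using(getConnection(), function (db) {
    return db.collection("users").findOneAsync({userName: "someuser"});
})
.then(function (user) {
    console.log(user);
})
.catch(function (e) {
    // handle error
});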

Related

Node.js / Mongoose - Undo Database Modifications on Error

I have quite a complex express route in my node.js server that makes many modifications to the database via mongoose.
My goal is to implement a mechanism that reverts all changes when any error occurs. My idea was to implement undo functions in the catch block.
But this is quite ugly: I have to know what the previous values were, and what if an error occurs in the catch block itself? It's especially difficult to revert changes when an error occurred during a Promise.all(array.map( /* ... */ )).
My route looks akin to this:
module.exports = async (req, res) => {
    var arr1, var2, var3
    try {
        const body = req.body
        arr1 = await fetchVar1(body)
        const _data = await Promise.all([
            Promise.all(
                arr1.map(async x => {
                    const y = await fetchSometing(x)
                    return doSometing(y)
                })
            ),
            doSomething3(arr1),
        ])
        var2 = _data[1]
        var3 = _data[2]
        return res.json({ arr1, var2, var3 })
    } catch (err) {
        /**
         * If an error occurs I have to undo
         * everything that has been done
         * in the try block
         */
    }
}
Preferably I would like to implement something that "batches" all changes and "commits" the changes if no errors occurred.
What you are looking for is transactions: https://mongoosejs.com/docs/transactions.html
Manually undoing writes after making them won't protect you from every issue, so you should not rely on that. For example, exactly as you wrote: what happens if there is a crash after a partial write (some data is written, some is not), and then another crash during your "rollback" code, so that it does not clean everything up? If your code depends on your data being absolutely clean, then you have a problem. Your code should either be able to handle partial data correctly, or you must have some way to guarantee that your data is consistent at all times.
Transactions are the way to go, because nothing is committed unless everything succeeds.
What you’re looking for is called Transactions.
Transactions are new in MongoDB 4.0 and Mongoose 5.2.0. Transactions let you execute multiple operations in isolation and potentially undo all the operations if one of them fails. This guide will get you started using transactions with Mongoose.
For more information check the link below:
https://mongoosejs.com/docs/transactions.html
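A minimal sketch of the pattern, assuming MongoDB >= 4.0 running as a replica set and a hypothetical Thing model (adapt it to your own models and route logic):

const mongoose = require('mongoose');
const Thing = require('./models/thing'); // hypothetical model

module.exports = async (req, res) => {
    const session = await mongoose.startSession();
    session.startTransaction();
    try {
        // Pass the session to every operation so it joins the transaction.
        const [doc] = await Thing.create([{ name: req.body.name }], { session });
        await Thing.updateOne({ _id: doc._id }, { $set: { processed: true } }, { session });

        // Nothing is visible outside the session until this succeeds.
        await session.commitTransaction();
        return res.json(doc);
    } catch (err) {
        // Abort rolls back every write made inside the transaction.
        await session.abortTransaction();
        return res.status(500).json({ error: err.message });
    } finally {
        session.endSession();
    }
};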

Duplicate Array Data Web Scraping

I can't seem to get the duplicate articles out of my web scraper results. This is my code:
app.get("/scrape", function (req, res) {
request("https://www.nytimes.com/", function (error, response, html) {
// Load the HTML into cheerio and save it to a variable
// '$' becomes a shorthand for cheerio's selector commands, much like jQuery's '$'
var $ = cheerio.load(html);
var uniqueResults = [];
// With cheerio, find each p-tag with the "title" class
// (i: iterator. element: the current element)
$("div.collection").each(function (i, element) {
// An empty array to save the data that we'll scrape
var results = [];
// store scraped data in appropriate variables
results.link = $(element).find("a").attr("href");
results.title = $(element).find("a").text();
results.summary = $(element).find("p.summary").text().trim();
// Log the results once you've looped through each of the elements found with cheerio
db.Article.create(results)
.then(function (dbArticle) {
res.json(dbArticle);
}).catch(function (err) {
return res.json(err);
});
});
res.send("You scraped the data successfully.");
});
});
// Route for getting all Articles from the db
app.get("/articles", function (req, res) {
// Grab every document in the Articles collection
db.Article.find()
.then(function (dbArticle) {
res.json(dbArticle);
})
.catch(function (err) {
res.json(err);
});
});
Right now I am getting five copies of each article sent to the user. I have tried db.Article.distinct and various versions of this to filter the results down to only unique articles. Any tips?
In Short:
Switching var results = [] from an Array to an Object (var results = {}) did the trick for me. I still haven't figured out the exact reason for the duplicate insertion of documents in the database; I'll update this as soon as I find out.
Long Story:
There are several mistakes and points of improvement in your code. I will try to point them out:
Let's address the mistakes first to make your code error free.
Mistakes
1. Although Mongoose's model.create (or new Model()) does seem to work with Arrays, I haven't seen such usage before and it doesn't look appropriate here.
If you intend to create documents one after another, then represent each document using an object instead of an Array. Using an array is more mainstream when you intend to create multiple documents at once.
So switch -
var results = [];
to
var results = {};
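For illustration (a hypothetical sketch, not from the original code), the intent of the two forms differs:

// One document at a time: pass a plain object.
db.Article.create({ title: "A", link: "/a", summary: "..." });

// Several documents at once: pass an array of objects.
db.Article.create([
    { title: "A", link: "/a", summary: "..." },
    { title: "B", link: "/b", summary: "..." }
]);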
2. Sending response headers after they have already been sent will cause an error. You may or may not have noticed it yet, but it is pretty clear up front: once that error pops up, the remaining documents won't get stored because of a PromiseRejection error if you haven't set up a try/catch block.
The db.Article.create() calls inside the $("div.collection").each(function (i, element) block run asynchronously, so execution won't wait for each document to be processed; it immediately runs res.send("You scraped the data successfully.");.
This effectively terminates the HTTP response to the client, and any further attempt to end the response, such as res.json(dbArticle) or res.json(err), will throw an error.
So just comment out the res.json statements inside the .create's then and catch handlers. This will terminate the response before all the articles are saved in the DB, but you need not worry: your code will still save the articles to the database behind the scenes (asynchronously).
If you want the response to be sent only after the data has been saved successfully, then change your middleware implementation to:
request('https://www.nytimes.com', (err, response, html) => {
    var $ = cheerio.load(html);
    var results = [];
    $("div.collection").each(function (i, element) {
        var ob = {};
        ob.link = $(element).find("a").attr("href");
        ob.title = $(element).find("a").text();
        ob.summary = $(element).find("p.summary").text().trim();
        results.push(ob);
    });
    db.Article.create(results)
        .then(function (dbArticles) {
            res.json(dbArticles);
        })
        .catch(function (err) {
            return res.json(err);
        });
});
After making the above changes (even just the first one), my version of your code ran fine. So you can continue with your current version if you want, or you may read on for some points of improvement.
Points of Improvements
1. The era of callbacks is long gone:
Convert your implementation to use Promises, as they are more maintainable and easier to reason about. Here are the things you can do:
Change the request library from request to axios or any library that supports Promises by default.
2. Make effective use of Mongoose's methods for insertion. You can perform bulk inserts of multiple documents in just one query. You may find the docs on creating documents in MongoDB quite helpful. (A combined sketch of points 1 and 2 follows this list.)
3. Start using a browser automation library such as puppeteer or nightmare.js for data-scraping tasks. Trust me, they make life a lot easier than cheerio or any other library for the same job. Their docs are really good and well maintained, so you won't have a hard time picking them up.
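Here is the combined sketch of points 1 and 2 mentioned above. It is a rough illustration that assumes axios is installed and reuses the app, db.Article, and cheerio setup from the question:

const axios = require("axios");
const cheerio = require("cheerio");

app.get("/scrape", async function (req, res) {
    try {
        // Point 1: promise-based HTTP client instead of the callback-style request library.
        const response = await axios.get("https://www.nytimes.com/");
        const $ = cheerio.load(response.data);

        const results = [];
        $("div.collection").each(function (i, element) {
            results.push({
                link: $(element).find("a").attr("href"),
                title: $(element).find("a").text(),
                summary: $(element).find("p.summary").text().trim()
            });
        });

        // Point 2: one bulk insert instead of one create() per article.
        const dbArticles = await db.Article.insertMany(results);
        res.json(dbArticles);
    } catch (err) {
        res.json(err);
    }
});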

Monitoring pending async operations in Node.js promised environment

I have built a very stable robot app in Node.js that basically sends requests continuously to an API. To make sure nothing can go wrong, I handle every possible error and I have set timeouts on promises that could take too long to resolve...
Now, I would like to improve the app by removing my safety nets and monitoring async operations to find any kind of "async leaking", e.g. promises pending forever or any strange outcome I'm not aware of (that's the point of my question).
Are there any tools meant to monitor the Node.js async flow? For instance, getting the total number of pending promises in the process at a given time? Or getting a warning if any promise has been pending for more than a given time, and tracking that promise?
In case it helps guide answers, here are the modules I use:
// Bluebird (promises)
var Promise = require("bluebird");
// Mongoose with promises
var mongoose = require('mongoose');
mongoose.Promise = require('bluebird');
// Rate limiter with promises
var Bottleneck = require("bottleneck");
// Promisified requests
var request = require('request-promise');
Sorry for not being able to formulate my question precisely: I have no clue as to what exactly I can expect or wish for...
EDIT : So far, my research has led me to :
Bluebird's resource management tools, but I can't figure out a way to make them useful
The amazing npm monitor and the shipped-in monitor-dashboard, but for some reason I can't yet make it work for my needs...
Since I'm still developing the app and have a life besides the app, I don't have much time to look into it, but I'm definitely going to address this question seriously at some point !
Sounds like you might need some kind of job management.
I've been trying kue lately to manage asynchronous jobs and it's pretty good.
It has an API for starting and running jobs. Each job can report its progress. It comes with a built-in job dashboard that shows you the running jobs and their progress. It has an extensive set of events so that you can monitor the status of each job.
You need to install Redis to use it, but that's not difficult.
It doesn't support promises, but you can see in my code example below that it's easy to start a job and then wrap it in a promise so that you can await job completion:
const queue = kue.createQueue();

queue.process("my-job", 1, (job, done) => {
    const result = ... some result ...
    // Run your job here.
    if (an error occurs) {
        done(err); // Job failure.
        return;
    }
    done(null, result); // Job success
});

function runMyJob() {
    return new Promise((resolve, reject) => {
        const job = queue.create("my-job").save();
        job.on('complete', result => {
            resolve(result); // Job has completed successfully, resolve the promise.
        });
        job.on("failed", err => {
            reject(err); // Job has failed, reject the promise.
        });
    });
}

runMyJob()
    .then(() => {
        console.log("Job completed.");
    })
    .catch(err => {
        console.error("Job failed.");
    });
It's fairly easy to make this work in parallel over multiple CPUs using the Node.js cluster module.
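A minimal sketch of that idea (the job name and handler are the hypothetical ones from above):

const cluster = require("cluster");
const os = require("os");
const kue = require("kue");

if (cluster.isMaster) {
    // Fork one worker process per CPU core.
    os.cpus().forEach(() => cluster.fork());
} else {
    // Each worker creates its own queue handle and processes jobs independently.
    const queue = kue.createQueue();
    queue.process("my-job", 1, (job, done) => {
        // ... same job handler as above ...
        done();
    });
}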
Read more on the kue web page.

promise pending error in mongodb and nodejs

I have written Node.js code for getting a count from a MongoDB database. This is my code for that:
MongoClient.connect('mongodb://localhost:27017/mongomart', function(err, db) {
    assert.equal(null, err);
    var numItems = db.collection('item').find({"category": category}).count();
    callback(numItems);
});
This MongoDB query runs correctly in the mongo shell, but it gives an error when used from Node.js:
Promise <Pending>
I don't know what this "promise" is. Please help.
Node.js code is asynchronous, so numItems won't contain the count of items; it contains a Promise that resolves to the count of items. You definitely have to master the basics of Node.js and asynchronous programming. Try modifying your code like this:
MongoClient.connect('mongodb://localhost:27017/mongomart', function(err, db) {
    assert.equal(null, err);
    db.collection('item').find({"category": category}).count()
        .then(function(numItems) {
            console.log(numItems); // Use this to debug
            callback(numItems);
        })
});
For native Promise check out documentation https://developer.mozilla.org/ru/docs/Web/JavaScript/Reference/Global_Objects/Promise
Also look at bluebird promises https://github.com/petkaantonov/bluebird
A promise is a temporary substitute value that is given to you while you wait for the real value. To get the real value, do
numItems.then(function (value) { callback(value) });
Or, better yet, return the promise from your function and let callers consume it using the promise pattern instead of the callback pattern.
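For instance, a minimal sketch of that promise-returning style, reusing the same query as above:

// Return the promise instead of accepting a callback.
function countItems(db, category) {
    return db.collection('item').find({"category": category}).count();
}

// Callers consume it with .then()/.catch() (or await it inside an async function).
countItems(db, 'someCategory')
    .then(function (numItems) {
        console.log(numItems);
    })
    .catch(function (err) {
        console.error(err);
    });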
Had the same problem. Don't know if it's still relevant to you, but this is what solved it for me:
var category = 'categoryToSearch';
var cursor = db.collection('item').find({'category': category});
cursor.count(function (err, num) {
    if (err) {
        return console.log(err);
    }
    return num;
});
I drove myself bananas trying to solve a similar problem, where document.save() just gave Promise{pending} no matter what I did. Here's what I did:
Change (req, res) to async (req, res).
Change var post = doc.save() to var post = await doc.save(). (A short sketch of these two changes is below.)
Finally, log in to MongoDB web and change the accessible IP addresses to 0.0.0.0 (all addresses). Not doing this can sometimes cause issues even when your IP is whitelisted.
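A short sketch of the first two changes (the route path and Post model are hypothetical placeholders):

app.post('/posts', async (req, res) => {   // first change: make the handler async
    const doc = new Post(req.body);        // hypothetical Mongoose model
    const post = await doc.save();         // second change: await the promise instead of logging Promise { <pending> }
    res.json(post);
});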
try this:
MongoClient.connect('mongodb://localhost:27017/mongomart', async (err, db) => {
    assert.equal(null, err);
    var numItems = await db.collection('item').find({"category": category}).count();
    callback(numItems);
});
(adding the await and turning the callback into an async function)

How can I promisify the MongoDB native Javascript driver using bluebird?

I'd like to use the MongoDB native JS driver with bluebird promises. How can I use Promise.promisifyAll() on this library?
The 2.0 branch documentation contains a better promisification guide: https://github.com/petkaantonov/bluebird/blob/master/API.md#promisification
It actually has a MongoDB example, which is much simpler:
var Promise = require("bluebird");
var MongoDB = require("mongodb");
Promise.promisifyAll(MongoDB);
When using Promise.promisifyAll(), it helps to identify a target prototype if your target object must be instantiated. In the case of the MongoDB JS driver, the standard pattern is:
Get a Db object, using either MongoClient static method or the Db constructor
Call Db#collection() to get a Collection object.
So, borrowing from https://stackoverflow.com/a/21733446/741970, you can:
var Promise = require('bluebird');
var mongodb = require('mongodb');
var MongoClient = mongodb.MongoClient;
var Collection = mongodb.Collection;
Promise.promisifyAll(Collection.prototype);
Promise.promisifyAll(MongoClient);
Now you can:
var client = MongoClient.connectAsync('mongodb://localhost:27017/test')
    .then(function (db) {
        return db.collection("myCollection").findOneAsync({ id: 'someId' })
    })
    .then(function (item) {
        // Use `item`
    })
    .catch(function (err) {
        // An error occurred
    });
This gets you pretty far, except it also helps to make sure that the Cursor objects returned by Collection#find() are promisified. In the MongoDB JS driver, the cursor returned by Collection#find() is not built from a prototype, so you can wrap the method and promisify the cursor each time. This isn't necessary if you don't use cursors, or don't want to incur the overhead. Here's one approach:
Collection.prototype._find = Collection.prototype.find;
Collection.prototype.find = function() {
    var cursor = this._find.apply(this, arguments);
    cursor.toArrayAsync = Promise.promisify(cursor.toArray, cursor);
    cursor.countAsync = Promise.promisify(cursor.count, cursor);
    return cursor;
}
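A hypothetical usage example, assuming db is the connected Db from the connectAsync example above and the wrapping code has already run:

db.collection("myCollection").find({ category: "books" })
    .toArrayAsync()
    .then(function (docs) {
        console.log(docs.length, "documents found");
    });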
I know this has been answered several times, but I wanted to add a little more information on this topic. Per Bluebird's own documentation, you should use 'using' to clean up connections and prevent memory leaks.
Resource Management in Bluebird
I looked all over the place for how to do this correctly, and information was scarce, so I thought I'd share what I found after much trial and error. The data I used below (restaurants) comes from the MongoDB sample data. You can get it here: MongoDB Import Data
// Using dotenv for environment / connection information
require('dotenv').load();

var Promise = require('bluebird'),
    mongodb = Promise.promisifyAll(require('mongodb')),
    using = Promise.using;

function getConnectionAsync() {
    // process.env.MongoDbUrl stored in my .env file using the require above
    return mongodb.MongoClient.connectAsync(process.env.MongoDbUrl)
        // .disposer is what handles cleaning up the connection
        .disposer(function (connection) {
            connection.close();
        });
}
// The two methods below retrieve the same data and output the same data,
// but the difference is the first one does as much as it can asynchronously
// while the 2nd one uses the blocking versions of each
// NOTE: using limitAsync seems to go away to never-never land and never come back!

// Everything is done asynchronously here with promises
using(
    getConnectionAsync(),
    function (connection) {
        // Because we used promisifyAll(), most (if not all) of the
        // methods in what was promisified now have an Async sibling
        // collection : collectionAsync
        // find : findAsync
        // etc.
        return connection.collectionAsync('restaurants')
            .then(function (collection) {
                return collection.findAsync()
            })
            .then(function (data) {
                return data.limit(10).toArrayAsync();
            });
    }
    // Before this ".then" is called, the using statement will now call the
    // .dispose() that was set up in the getConnectionAsync method
).then(
    function (data) {
        console.log("end data", data);
    }
);

// Here, only the connection is asynchronous - the rest are blocking processes
using(
    getConnectionAsync(),
    function (connection) {
        // Here because I'm not using any of the Async functions, these should
        // all be blocking requests unlike the promisified versions above
        return connection.collection('restaurants').find().limit(10).toArray();
    }
).then(
    function (data) {
        console.log("end data", data);
    }
);
I hope this helps someone else out who wanted to do things by the Bluebird book.
Version 1.4.9 of mongodb should now be easily promisifiable as such:
Promise.promisifyAll(mongo.Cursor.prototype);
See https://github.com/mongodb/node-mongodb-native/pull/1201 for more details.
We have been using the following driver in production for a while now. It's essentially a promise wrapper over the native Node.js driver, and it also adds some additional helper functions.
poseidon-mongo - https://github.com/playlyfe/poseidon-mongo
