node.js and mongodb-native: wait till a collection is non-empty - javascript

I am using node.js with the native MongoDB driver (node-mongodb-native).
My current project uses node.js + now.js + mongo-db.
The system basically sends data from the browser to node.js, where it is processed with Haskell and later fed back to the browser.
Via a form and node.js, the text is inserted into a mongo-db collection called "messages".
A Haskell thread reads the entry and stores the result in the collection "results". This works fine.
But now I need the JavaScript code that waits for the result to appear in the collection "results".
Pseudo code:
wait until the collection results is non-empty.
findOne() from the collection results.
delete the collection results.
I currently connect to the mongodb like this:
var mongo = require('mongodb'),
    Server = mongo.Server,
    Db = mongo.Db;

var server = new Server('localhost', 27017, {
  auto_reconnect: true
});
var db = new Db('test', server);
My Haskell knowledge is quite good, but my JavaScript skills are not.
I did extensive searching, but I didn't get far.

Glad you solved it; I was going to write something similar:
// `callback` is assumed to come from the enclosing function
setTimeout(function () {
  db.collection('results', function (err, coll) {
    if (err) return callback(err);
    coll.findOne({}, function (err, one) {
      if (err) return callback(err);
      coll.drop(callback); // drop() removes the whole results collection, matching the pseudo code
    });
  });
}, 1000);

The solution is to use the async library:
var async = require('async');

var globalCount = -1;

async.whilst(
  function () {
    // keep looping while the results collection is still empty
    return globalCount < 1;
  },
  function (callback) {
    console.log("inner while loop");
    // poll again after one second; wrap the call so it is deferred, not invoked immediately
    setTimeout(function () {
      db_count(callback);
    }, 1000);
  },
  function (err) {
    console.log(" || whilst loop finished!!");
  }
);
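The db_count helper is not shown above; here is a minimal sketch of what it presumably does, assuming the "results" collection and the globalCount flag from the snippet, and using the driver's count():
function db_count(callback) {
  db.collection('results', function (err, coll) {
    if (err) return callback(err);
    coll.count({}, function (err, count) {
      if (err) return callback(err);
      globalCount = count; // whilst's test sees the new value and stops once a result exists
      callback();
    });
  });
}
Once the loop finishes, the findOne() and drop() steps from the pseudo code can run in the final whilst callback.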

Related

Node JS forEach memory leak issue

I've been creating a small node js app that iterates through an array of names and queries an API for the names. The issue I have is that the array is very large (400,000+ words) and my application runs out of memory before the forEach is complete.
I've been able to diagnose the issue by researching how JS works with the call stack, web API, and callback queue. What I believe the issue to be is that the forEach loop blocks the call stack, so the HTTP requests keep clogging up the callback queue without getting resolved.
If anyone can provide a solution for unblocking the forEach loop, or an alternative way of coding this app, I would be very grateful.
Node JS App
const mongoose = require("mongoose");
const fs = require("fs");
const ajax = require("./modules/ajax.js");

// Bring in Models
let Dictionary = require("./models/dictionary.js");

//=============================
// MongoDB connection
//=============================

// Opens connection to database "bookCompanion"
mongoose.connect("mongodb://localhost/bookCompanion");
let db = mongoose.connection;

// If the database connection encounters an error, output it to the console.
db.on("error", (err) => {
    console.error("Database connection failed.");
});

db.on("open", () => {
    console.info("Connected to MongoDB database...");
}).then(() => {
    fs.readFile("./words-2.json", "utf8", (err, data) => {
        if (err) {
            console.log(err);
        } else {
            data = JSON.parse(data);
            data.forEach((word) => {
                let search = ajax.get(`API url Here?=${word}`);
                search.then((response) => {
                    let newWord = new Dictionary({
                        word: response.word,
                        phonetic: response.phonetic,
                        meaning: response.meaning
                    }).save();
                    console.log("word saved");
                }).catch((err) => {
                    console.log("Word not found");
                });
            });
        }
    });
});
Check whether the API accepts multiple query params.
Try to use async Promises.
Resolve the promises and perform the save operations on them with Promise.all, as in the sketch below.
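Following that suggestion, one possible sketch that keeps memory bounded: send the lookups in fixed-size chunks and wait for each chunk with Promise.all before starting the next. The chunkSize value and the processChunk helper are illustrative, and ajax.get / Dictionary are assumed to behave as in the question.
const chunkSize = 100; // tune to what the API and your memory budget tolerate

function processChunk(words, index) {
  if (index >= words.length) {
    console.info("All words processed.");
    return Promise.resolve();
  }
  const chunk = words.slice(index, index + chunkSize);
  const lookups = chunk.map((word) =>
    ajax.get(`API url Here?=${word}`)
      .then((response) => new Dictionary({
        word: response.word,
        phonetic: response.phonetic,
        meaning: response.meaning
      }).save())
      .catch((err) => console.log("Word not found"))
  );
  // Wait until the whole chunk has settled before queuing the next one,
  // so at most chunkSize requests are in flight at any time.
  return Promise.all(lookups).then(() => processChunk(words, index + chunkSize));
}

// Inside the fs.readFile callback, instead of data.forEach(...):
// processChunk(JSON.parse(data), 0);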

NodeJS: Readable object streams, patterns for generating the data asynchronously

I'd like to crawl data over SSH in a server cluster with NodeJS.
The remote scripts output JSON that is then parsed and split into an object stream.
My problem now is that the callback-oriented libraries I use (SSH2, MySQL) lead to a callback pattern that I find hard to reconcile with the Readable API spec. How do I implement _read(size) when the data to push is behind a bunch of callbacks?
My current implementation takes advantage of the fact that Streams are also EventEmitters. I start to populate my data upon constructing the Stream instance. When all my callbacks are done, I emit an event. I then listen for the custom event, and only then do I start to push data down the pipe chain.
// Calling code
var stream = new CrawlerStream(argsForTheStream);
stream.on('queue_completed', function() {
  stream
    .pipe(logger)
    .pipe(dbWriter)
    .on('end', function() {
      // Close db connection etc...
    });
});
A mock of the CrawlerStream would be
// Mock of the Readable stream implementation
function CrawlerStream(args) {
  // boilerplate
  // array holding the data to push
  this.data = [];
  // semi-colon separated string of commands
  var cmdQueue = getCommandQueue();
  var self = this;

  db.query(sql, function(err, sitesToCrawl, fields) {
    var servers = groupSitesByServer(sitesToCrawl);
    for (var s in servers) {
      sshConnect(getRemoteServer(s), function(err, conn) {
        sshExec({
          ssh: conn,
          cmd: cmdQueue
        }, function(err, stdout, stderr) {
          // Stdout is parsed as JSON
          // Finally I can populate self.data!
          // Check if all servers are done
          // If I'm the last callback to execute
          self.data.push(null);
          self.emit('queue_completed');
        });
      });
    }
  });
}

util.inherits(CrawlerStream, Readable);

CrawlerStream.prototype._read = function(size) {
  while (this.data.length) {
    this.push(this.data.shift());
  }
};
I'm unsure if this is the idiomatic way to accomplish this and would like to get your advice.
Please note in your answers that I'd like to retain the vanilla NodeJS style of using callbacks (no promises) and that I'm stuck with ES5.
Thanks for your time!

Using mysql node.js driver to get an entire database as JSON

I'm working on creating a JavaScript file to get a JSON dump of an entire MySQL database, running server side. I found and am using the MySQL driver for node.js (https://www.npmjs.com/package/mysql) for queries; it's been straightforward enough to start. My issue is that I need to call multiple queries and get the results from all of them into a single JSON file, and I can't quite get that to work. I'm entirely new to JavaScript (basically never touched it before now), so it's probably a relatively simple solution that I'm just missing.
Currently I do a query of 'SHOW TABLES' to get a list of all the tables (this can change, so I can't just assume a constant list). I then want to loop through the list and call 'SELECT * from table_name' for each table, combining the results as I go to get one big JSON. Unfortunately I haven't figured out how to get the code to finish all the queries before trying to combine them, thus returning 'undefined' for all the results. Here is what I currently have:
var mysql = require('mysql');
var fs = require('fs');

var connection = mysql.createConnection({
  host: 'localhost',
  user: 'root',
  password: 'pass',
  database: 'test_data'
});

connection.connect();

connection.query('SHOW TABLES;', function(err, results, fields) {
  if (err) throw err;
  var name = fields[0].name;
  var database_json = get_table(results[0][name]);
  for (i = 1; i < results.length; i++) {
    var table_name = results[i][name];
    var table_json = get_table(table_name);
    database_json = database_json.concat(table_json);
  }
  fs.writeFile('test_data.json', JSON.stringify(database_json), function (err) {
    if (err) throw err;
  });
  connection.end();
});

function get_table(table_name) {
  connection.query('select * from ' + table_name + ';', function(err, results, fields) {
    if (err) throw err;
    return results;
  });
}
This gets the table list and goes through all of it with no issue, and the information returned by the second query is correct if I just do a console.log(results) inside the query, but the for loop just keeps going before any query is completed and thus 'table_json' just ends up being 'undefined'. I really think this must be an easy solution (probably something with callbacks which I don't quite understand fully yet) but I keep stumbling.
Thanks for the help.
I'm guessing that this is for some sort of maintenance-type function and not a piece that you need for your application, so you're probably safe to do this asynchronously. The async module is available here: https://github.com/caolan/async
You can also use Q promises, available here: https://github.com/kriskowal/q
This answer describes both approaches pretty well: Simplest way to wait some asynchronous tasks complete, in Javascript? A sketch using the async module follows.
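Following the async suggestion, here is a rough sketch of how the queries could be sequenced so the file is written only after every table has been fetched. It reuses the names from the question; async.map comes from the caolan/async module linked above.
var async = require('async');

connection.query('SHOW TABLES;', function (err, results, fields) {
  if (err) throw err;
  var name = fields[0].name;
  var table_names = results.map(function (row) { return row[name]; });

  // Run one SELECT per table; async.map only calls the final callback
  // once every query has finished, passing the result sets in order.
  async.map(table_names, function (table_name, done) {
    connection.query('select * from ' + table_name + ';', function (err, rows) {
      done(err, rows);
    });
  }, function (err, tables) {
    if (err) throw err;
    // Flatten the per-table arrays into one array, as the original concat intended
    var database_json = [].concat.apply([], tables);
    fs.writeFile('test_data.json', JSON.stringify(database_json), function (err) {
      if (err) throw err;
      connection.end();
    });
  });
});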

Using Multiple Mongodb Databases with Meteor.js

Is it possible for 2 Meteor.Collections to retrieve data from 2 different MongoDB database servers?
Dogs = Meteor.Collection('dogs') // mongodb://192.168.1.123:27017/dogs
Cats = Meteor.Collection('cats') // mongodb://192.168.1.124:27017/cats
Update
It is now possible to connect to remote/multiple databases:
var database = new MongoInternals.RemoteCollectionDriver("<mongo url>");
MyCollection = new Mongo.Collection("collection_name", { _driver: database });
Where <mongo url> is a MongoDB URL such as mongodb://127.0.0.1:27017/meteor (including the database name)
There is one disadvantage with this at the moment: No Oplog
Old Answer
At the moment this is not possible. Each meteor app is bound to one database.
There are a few ways you can get around this, but it may be more complicated than it's worth:
One option - Use a separate Meteor App
In your other Meteor app (for example, running at port 6000 on the same machine), you can still have reactivity, but you need to proxy inserts, removes and updates through a method call.
Server:
Cats = new Meteor.Collection('cats');

Meteor.publish("cats", function() {
  return Cats.find();
});

Meteor.methods({
  updateCat: function(id, changes) {
    Cats.update({_id: id}, {$set: changes});
  }
});
Your current Meteor app:
var connection = DDP.connect("http://localhost:6000");
connection.subscribe("cats");
Cats = new Meteor.Collection('cats', {connection: connection});

// To update a cat, call the method over the DDP connection
connection.call("updateCat", <cat_id>, <changes>);
Another option - custom mongodb connection
This uses the node js mongodb native driver.
This connects to the database as you would in any other Node.js app.
There is no reactivity available and you can't use the new Meteor.Collection type collections.
var mongodb = Npm.require("mongodb"); // or var mongodb = Meteor.require("mongodb") if you use the npm package on Atmosphere
var Db = mongodb.Db;
var MongoClient = mongodb.MongoClient;
var Server = mongodb.Server;

var db_connection = new Db('cats', new Server("127.0.0.1", 27017, {auto_reconnect: false, poolSize: 4}), {w: 0, native_parser: false});

db_connection.open(function(err, db) {
  // Connected to db 'cats'
  db.authenticate('<db username>', '<db password>', function(err, result) {
    // Can do queries here
    db.close();
  });
});
The answer is YES: it is possible to set up multiple Meteor.Collections that retrieve data from different MongoDB database servers.
As in the answer from @Akshat, you can initialize your own MongoInternals.RemoteCollectionDriver instance, through which Mongo.Collections can be created.
But there is something more to talk about. Contrary to @Akshat's answer, I find that Oplog support is still available under these circumstances.
When initializing the custom MongoInternals.RemoteCollectionDriver, DO NOT forget to specify the Oplog url:
var driver = new MongoInternals.RemoteCollectionDriver(
"mongodb://localhost:27017/db",
{
oplogUrl: "mongodb://localhost:27017/local"
});
var collection = new Mongo.Collection("Coll", {_driver: driver});
Under the hood
As described above, it is fairly simple to activate Oplog support. If you want to know what happens beneath those two lines of code, you can continue reading the rest of the post.
In the constructor of RemoteCollectionDriver, an underlying MongoConnection will be created:
MongoInternals.RemoteCollectionDriver = function (mongo_url, options) {
  var self = this;
  self.mongo = new MongoConnection(mongo_url, options);
};
The tricky part is: if MongoConnection is created with oplogUrl provided, an OplogHandle will be initialized, and starts to tail the Oplog (source code):
if (options.oplogUrl && ! Package['disable-oplog']) {
  self._oplogHandle = new OplogHandle(options.oplogUrl, self.db.databaseName);
  self._docFetcher = new DocFetcher(self);
}
As this blog describes, Meteor.publish internally calls Cursor.observeChanges to create an ObserveHandle instance, which automatically tracks any future changes that occur in the database.
Currently there are two kinds of observer drivers: the legacy PollingObserveDriver, which takes a poll-and-diff strategy, and the OplogObserveDriver, which uses Oplog tailing to monitor data changes. To decide which one to apply, observeChanges takes the following procedure (source code):
var driverClass = canUseOplog ? OplogObserveDriver : PollingObserveDriver;
observeDriver = new driverClass({
  cursorDescription: cursorDescription,
  mongoHandle: self,
  multiplexer: multiplexer,
  ordered: ordered,
  matcher: matcher, // ignored by polling
  sorter: sorter, // ignored by polling
  _testOnlyPollCallback: callbacks._testOnlyPollCallback
});
In order to make canUseOplog true, several requirements have to be met. A bare minimum is that the underlying MongoConnection instance has a valid OplogHandle. This is the exact reason why we need to specify oplogUrl when creating the MongoConnection.
This is actually possible, using an internal interface:
var d = new MongoInternals.RemoteCollectionDriver("<mongo url>");
C = new Mongo.Collection("<collection name>", { _driver: d });

What is the proper way to manage connections to Mongo with MongoJS?

I'm attempting to use MongoJS as a wrapper for the native Mongo driver in Node. I'm modeling the documents in my collection as JavaScript classes with methods like populate(), save(), etc.
In most languages like C# and Java, I'm used to explicitly connecting and then disconnecting for every query. Most examples only give an example of connecting, but never closing the connection when done. I'm uncertain if the driver is able to manage this on its own or if I need to manually do so myself. Documentation is sparse.
Here's the relevant code:
User.prototype.populate = function(callback) {
  var that = this;
  this.db = mongo.connect("DuxDB");
  this.db.collection(dbName).findOne({email: that.email}, function(err, doc) {
    if (!err && doc) {
      that.firstName = doc.firstName;
      that.lastName = doc.lastName;
      that.password = doc.password;
    }
    if (typeof(callback) === "function") {
      callback.call(that);
    }
    that.db.close();
  });
};
I'm finding that as soon as I call the close() method on the MongoJS object, I can no longer open a new connection on subsequent calls. However, if I do not call this method, the Node process never terminates once all async calls finish, as if it is waiting to disconnect from Mongo.
What is the proper way to manage connections to Mongo with MongoJS?
You will get better performance from your application if you leave the connection(s) open, rather than disconnecting. Making a TCP connection, and, in the case of MongoDB, discovering the replica set/sharding configuration where appropriate, is relatively expensive compared to the time spent actually processing queries and updates. It is better to "spend" this time once and keep the connection open rather than constantly re-doing this work.
Don't open + close a connection for every query. Open the connection once, and re-use it.
Do something more like this, reusing your db connection for all calls:
User = function(db) {
  this.db = db;
};

User.prototype.populate = function(callback) {
  var that = this;
  this.db.collection(dbName).findOne({email: that.email}, function(err, doc) {
    if (!err && doc) {
      that.firstName = doc.firstName;
      that.lastName = doc.lastName;
      that.password = doc.password;
    }
    if (typeof(callback) === "function") {
      callback.call(that);
    }
  });
};
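A small usage sketch of that pattern, assuming the same mongo module and "DuxDB" database from the question (the email value is illustrative): connect once at startup, share the handle, and close it only when the whole process shuts down.
// Connect once when the application starts
var db = mongo.connect("DuxDB");

// Hand the same handle to every object that needs it
var user = new User(db);
user.email = "someone@example.com"; // illustrative
user.populate(function() {
  console.log(this.firstName, this.lastName);
});

// Close the connection only on process shutdown, not after each query
process.on('SIGINT', function() {
  db.close();
  process.exit();
});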
I believe it actually closes the connection after each request, but it sets {auto_reconnect:true} in the mongodb server config, so it will reopen a new connection whenever one is needed.
