Multiple DB in meteor keeping reactivity - javascript

Is there some way to use more than one MongoDB database in Meteor while keeping reactivity/oplog working? I've been reading about it (Post1), (Post2) and I still don't see a straightforward way to achieve this. Is it possible? What's the right way? Thank you.

As you say, the Default Connection isn't really an option as you can only have one DB, and DDP is a bit superfluous when you only need a DB and none of the Meteor stuff. I'd think, therefore, your best approach would be to use the MongoInternals option.
The only thing missing from that option is reactivity; a method of enabling oplog tailing for these additional DB connections is mentioned in this answer. It essentially seems to be a case of passing the oplogUrl when creating the RemoteCollectionDriver; here's the example given in that answer:
var driver = new MongoInternals.RemoteCollectionDriver(
  "mongodb://localhost:27017/db",
  {
    oplogUrl: "mongodb://localhost:27017/local"
  });
var collection = new Mongo.Collection("Coll", {_driver: driver});

I'm going to write here what I've discovered until now, but I'm not going to mark the question as answered.
Concepts
What is reactivity?
With reactivity, the data you show to your users is updated in real time. Reactivity is enabled by default, and you can disable it per query, for example inside a publication:
// enabled by default
Meteor.publish('top', function() {
  return Top.find({}, {reactive: false});
});
What is oplog?
The oplog is MongoDB's operations log (the log used for replication). When you tell Meteor to use the oplog, reactivity performance is much better, unless you have a really high volume of insert operations; in that case it may be wise to keep it disabled. Oplog tailing can speed up your reactive DB calls by roughly 5x to 20x. If you are going to use the oplog, you should optimize your DB calls. You can read more about it here.
What methods exist to connect DBs to Meteor?
Default connection:
Reactivity? yes. Oplog? yes. DB Limit? One. Description: Meteor creates a default MongoDB database when you run 'meteor'. You can point it at a different database using environment variables, or simply: MONGO_URL=mongodb://localhost:27017/db_name meteor.
DDP:
Reactivity? yes. Oplog? yes. DB Limit? no. Description: You need a separate Meteor project for every DB. That's about 600MB of memory for every DB. You can read about it here and here.
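For reference, here's a minimal sketch of the DDP option, assuming a second Meteor app on port 3030 that owns the other DB and publishes 'items.all' (the URL, collection, and publication names are illustrative):
var remote = DDP.connect('http://localhost:3030');
var RemoteItems = new Mongo.Collection('items', { connection: remote });
remote.subscribe('items.all'); // reactivity flows over the DDP connection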
MongoInternal (SOLUTION: Thanks to carlevans719):
Reactivity? yes. Oplog? yes. DB Limit? no. Description: You can specify a DB in your subscriptions file like:
var database = new MongoInternals.RemoteCollectionDriver('mongodb://user:password@localhost:27017/meteor', {oplogUrl: "mongodb://localhost:27017/local"});
var numberOfDocs = database.open('boxes').find().count();
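To keep things reactive on the client, the same driver can back a named collection that a publication returns. A minimal server-side sketch, assuming the second DB runs as a replica set so its oplog is available (collection and publication names are illustrative):
var secondDb = new MongoInternals.RemoteCollectionDriver(
  'mongodb://localhost:27017/second_db',
  { oplogUrl: 'mongodb://localhost:27017/local' }
);
var Boxes = new Mongo.Collection('boxes', { _driver: secondDb });

Meteor.publish('boxes.all', function () {
  return Boxes.find(); // reactive, observed via oplog tailing
});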
Last Words:
MongoInternal: If you are not going to use the default DB, you have to tell Meteor not to create it. To achieve this you must always run Meteor as: MONGO_URL=mongodb://localhost:27017 meteor

Related

Node js shared variables and multiple users and multiple entry-points

I have a Node.js application that I'm switching from a single-tenant database to a multi-tenant database. The application code is called from an Express API, but there are also services that run through different entry points, so req.session is not always available.
Currently I have database function calls all throughout the app like:
database.select.users.findByUserId(123, callback)
Since the app is changing to a multi-tenant database, I need to be able to send the PostgreSQL schemaName to the database functions. I know I can edit the signature of every database call to this:
database.select.users.findByUserId(schemaName, 123, callback)
But that is very labor-intensive, broad-sweeping, and is going to create a lot of bugs. I'm hoping to find a safe way to pass the Postgres schemaName to the database wrapper, without a race condition of some kind where this "global" schemaName variable is somehow overwritten by another caller, thus sending the wrong data.
Here's some pseudo-code of what I'm considering writing, but I'm worried it won't be "thread-safe" once we deploy.
// as early as possible in the app, call:
database.session.initSchema('schema123');

// session.js
let schema = null;
module.exports.initSchema = function (s) {
  schema = s;
};
module.exports.getSchema = function () {
  return schema;
};

// before I hit the database, I would call getSchema() and pass it to PostgreSQL
This approach works, but what if Caller2 calls initSchema() with different values while Caller1 hasn't finished executing? How can I distinguish which caller is asking for the data when using one variable like this? Is there any way for me to solve this problem safely without editing the signature of every database function call? Thanks for the advice.
edit
I'm leaning towards this solution:
database.session.initSchema('schema123');
//then immediately call
database.select.users.findByUserId(123, callback);
The advantage here is that nothing asynchronous happens between the two calls, which should nullify the possibility of a race condition, while keeping the original findByUserId signature.
I don't think doing what you're thinking will work because I don't see a way you're going to get around those race conditions. If you do:
app.use((request, response, next) => {
  // initialize database schema
  next()
})
It would be ideal because then you could do it only once across all routes, but another request might hit the server a millisecond later and change the schema again.
Alternatively you can do it in each separate route, which would work, but then you're doing just as much work as passing the schema in the database call in the first place: if you have to reinitialize the schema in each route, it's effectively the same as doing it in the call itself.
I thought about a solution for a while, and the best I can come up with is to do it in the connection pool itself. I have no idea what package you're using or how it creates DB connections, but something like this:
const database = connection.getInstance('schemaExample')
// continue to do database things here
Just to show an example of what I'm thinking. That way you can create multiple connection pools for the different schemas on startup and you can just query on the one with the correct schema avoiding all the race conditions.
The idea being that even if another request comes in now and uses a different schema, it will be executing on a different database connection.
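As a rough illustration of the pool-per-schema idea, here is a sketch using node-postgres; the package choice, the DATABASE_URL variable, and the search_path approach are my assumptions, not something from the question:
const { Pool } = require('pg');

const pools = {};

// one pool per tenant schema, created lazily and reused afterwards
function getInstance(schemaName) {
  if (!pools[schemaName]) {
    pools[schemaName] = new Pool({
      connectionString: process.env.DATABASE_URL,
      // every connection in this pool defaults to the tenant's schema
      // (assumes schemaName comes from a trusted, validated list)
      options: `-c search_path=${schemaName}`,
    });
  }
  return pools[schemaName];
}

// usage, mirroring connection.getInstance('schemaExample') above
async function findByUserId(schemaName, id) {
  const { rows } = await getInstance(schemaName).query(
    'SELECT * FROM users WHERE id = $1',
    [id]
  );
  return rows[0];
}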

Using NeDB for testing, while using other DBs in FeatherJS application

I'm wondering if it is common practice to use an in-memory database for a test environment, instead of MySQL (which has to be used for development/production).
If it makes sense, how do I set this up?
I think I can create config/test.json as shown in their chat example, but my app.js still requires Knex.
Should I do something along the lines of
const knex = (NODE_ENV !== 'test') ? require('./knex') : undefined;
and then configure it only if knex !== undefined?
If I do so, all my models have to be set up twice (once for Knex, once for testing without it).
What is the right/standard way to go about this?
EDIT:
As suggested below, I use a different schema for testing.
This is done by declaring a different connection string in config/test.json.
This question is solved, thank you!
if it is common practice to use an in-memory database for a test environment, instead
Sadly it is a common practice, but not a particularly good one. When you use one database for testing and another for production, your tests are not actually verifying that the application code works against the real database.
Another negative effect is that you cannot use the special features of either database; the code has to stick to the subset of DB features that both databases support.
I would run all the tests against every supported real database to actually make sure the code works on every targeted setup.
PS. Someone suggested using mocks to abstract the database... that is another bad practice. Mocks work for some small parts of testing, but in the general case you need to run tests against a real database to be sure the code works correctly. The important thing is to set up the tests so that you have a fast way of truncating old data and populating new test data.
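To illustrate that last point, here is a sketch of a per-test reset against the real MySQL test schema; the table names, the fixture row, and the Mocha-style hook are assumptions for illustration only:
const knex = require('../src/knex'); // the app's existing knex instance, configured for the test DB

beforeEach(async () => {
  // truncate old data (relaxing FK checks on MySQL so order doesn't matter)
  await knex.raw('SET FOREIGN_KEY_CHECKS = 0');
  await knex('messages').truncate();
  await knex('users').truncate();
  await knex.raw('SET FOREIGN_KEY_CHECKS = 1');

  // seed whatever fixtures the tests rely on
  await knex('users').insert({ id: 1, email: 'test@example.com' });
});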

How to mock sequelize model methods in Node.js

I have been struggling a lot to mock the database in one of my side projects while writing unit tests. Can somebody please help me out here? Below is what the scenario looks like:
Before that, the source code is here - https://github.com/sunilkumarc/track-courier
I have created a model using the Sequelize module in Node.js, and I access my DB through this model.
I want to mock the DB calls when running unit tests, for example the findOne method here, which returns a promise (https://github.com/sunilkumarc/track-courier/blob/master/models/parcels.js#L4). Basically, when exercising this particular endpoint I want to skip accessing the DB.
Any help is appreciated!
Regards, Sunil
While people have pointed out in the comments to a similar question that it's probably wise to use a test database, I'm going to answer the question directly.
Using Jest, you can do the following to mock individual calls on specific models:
const myUser = User.build({
  ...attributes,
});
jest.spyOn(User, 'findOne').mockImplementation((options) => Promise.resolve(myUser));
const mockedUser = await User.findOne({});
In my experience, I agree with the commenters. Mocking specific functions like these is not always the most useful, because the mocks may not accurately reflect the return values of the corresponding Sequelize function. For example, User#findOne may return null. If you provide rejectOnEmpty, you will have to build your own logic for handling this, which may differ from the exact logic used in the Sequelize library.
Ultimately your code is likely responsible for handling whatever Sequelize returns correctly, and with the level of integration to your data layer, this is going to be something very difficult to mock correctly - not to mention extremely tedious.
Documentation for Model#findAll, where you can see rejectOnEmpty: https://sequelize.org/api/v6/class/src/model.js~model#static-method-findAll
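Along the same lines, it's worth mocking the "not found" path too, since that's where hand-rolled mocks most often diverge from real Sequelize behaviour. A small sketch, still with Jest:
// the real User.findOne resolves to null when no row matches (unless rejectOnEmpty is set)
jest.spyOn(User, 'findOne').mockResolvedValue(null);

const missing = await User.findOne({ where: { id: 999 } });
// the code under test must handle `missing === null` just as it would with real Sequelize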
I have used Sinon.JS for mocking; in this case I did it like this:
// assume "Users" is our table
sinon.replace(Users, "create", () => console.log(`mocked "Users" table's "create" method`));

How to tell whether my firebase node is used?

Firebase offers some overall analytics in its App Dashboard; however, I need to know, on a per-node basis, whether my stored data is ever used or is just lying idle.
Why? It's simple: we are learning while developing, which makes the app evolve very fast. Not only does the logic change, but the stored data also needs to be refactored from time to time. I would like to get rid of abandoned and forgotten data. Any ideas?
In best case, I would like to know this:
When was a node used last time? (was it used at all?)
How many times was it used in 1h/24h/1w/1M?
Differentiate between read/write operations
2017 update
Cloud Functions trigger automatically and run on the server.
https://firebase.google.com/docs/functions/
https://howtofirebase.com/firebase-cloud-functions-753935e80323
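With Cloud Functions you can at least count writes server-side without touching client code. A rough sketch using the 2017-era API; the '_stats' node and the top-level wildcard are illustrative choices, and note that reads still cannot be observed this way:
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);

exports.countWrites = functions.database.ref('/{node}').onWrite(event => {
  // don't count our own bookkeeping writes
  if (event.params.node === '_stats') return null;

  const stats = admin.database().ref(`/_stats/${event.params.node}`);
  return Promise.all([
    stats.child('writes').transaction(current => (current || 0) + 1),
    stats.child('lastWrite').set(admin.database.ServerValue.TIMESTAMP)
  ]);
});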
2016 answer
So apparently Firebase itself doesn't provide any of this.
The only way I can think of right now is to create wrappers for the Firebase query and write functions, and either do the statistics in a client app or create a dedicated node for storing the statistical data.
If you store the statistics in Firebase, the wrapper for the write functions (set, update, push, remove, setWithPriority) is relatively easy. The query functions (on, once) will have to record the access in a success callback.
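A minimal sketch of such a wrapper, using the legacy (pre-3.x) web SDK that was current when this answer was written; the '_stats' node, the key encoding, and the wrapper names are illustrative:
var root = new Firebase('https://your-app.firebaseio.com');

function statsRefFor(ref) {
  // Firebase keys may not contain '.', '#', '$', '[', ']' or '/'
  var key = encodeURIComponent(ref.toString()).replace(/\./g, '%2E');
  return root.child('_stats').child(key);
}

// write wrapper: performs the write, then bumps the counters
function trackedSet(ref, value, onComplete) {
  ref.set(value, onComplete);
  statsRefFor(ref).child('writes').transaction(function (current) {
    return (current || 0) + 1;
  });
  statsRefFor(ref).child('lastWrite').set(Firebase.ServerValue.TIMESTAMP);
}

// read wrapper: the success callback records the read
function trackedOnce(ref, eventType, successCallback) {
  ref.once(eventType, function (snapshot) {
    statsRefFor(ref).child('reads').transaction(function (current) {
      return (current || 0) + 1;
    });
    successCallback(snapshot);
  });
}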

Is Mongoose not scalable with document array editing and version control?

I am developing a web application with Node.js and MongoDB/Mongoose. Our most used Model, Record, has many subdocument arrays. Some of these, for instance, include "Comment", "Bookings", and "Subscribers".
In the client side application, whenever the user hits the "delete" button it fires off an AJAX request to the delete route for that specific comment. The problem I am running into is that, when many of these AJAX calls come in at once, Mongoose fails with a "Document not found" error on some (but not all) of the calls.
This only happens when the calls are made rapidly, many at a time. I think this is due to Mongoose's document versioning causing conflicts. Our current process for a delete is:
Fetch the document using Record.findById()
Remove the subdocument from the appropriate array (using, say, comment.remove())
Call record.save()
I have found a solution where I can manually update the collection using Record.findByIdAndUpdate with the $pull operator. However, this means we can't use any of Mongoose's middleware and we lose the version control entirely. And the more I think about it, the more situations I can see where this would happen and I would have to use Mongoose's wrapper functions like findByIdAndUpdate or findAndRemove. The only other solution I can think of would be to put the removal attempt into a while loop and hope it works, which seems like a very poor fix.
Using the Mongoose wrappers doesn't really solve my problem as it won't allow me to use any sort of Middleware or hooks at all then, which is basically one of the huge benefits of using Mongoose.
Does this mean that Mongoose is essentially useless for anything with rapid editing and I might as well just use the native MongoDB driver? Am I misunderstanding Mongoose's limitations?
How could I solve this problem?
Mongoose's versioned document array editing is not scalable for the simple reason that it's not an atomic operation. As a result, the more array edit activity you have, the more likely it is that two edits will collide and you'll suffer the overhead of retry/recovery from that in your code.
For scalable document array manipulation, you have to use update with the atomic array update operators: $pull[All], $push[All], $pop, $addToSet, and $. Of course, you can also use these operators with the atomic findAndModify-based methods of findByIdAndUpdate and findOneAndUpdate if you also need the original or resulting doc.
As you mentioned, the big downside of using update instead of findOne+save is that none of your Mongoose middleware and validation is executed during an update. But I don't see that you have any choice if you want a scalable system. I'd much rather manually duplicate some middleware and validation logic for the update case than have to suffer the scalability penalties of using Mongoose's versioned document array editing. Hey, at least you still get the benefits of Mongoose's schema-based type casting on updates!
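For the delete route from the question, that approach would look roughly like this; the Express-style req/res/next and the 'comments' field name are assumptions based on the question, and error handling is kept minimal:
// atomic removal of one comment subdocument; no findById + save, so no __v conflict
Record.findByIdAndUpdate(
  req.params.recordId,
  { $pull: { comments: { _id: req.params.commentId } } },
  { new: true }, // return the updated document
  function (err, updatedRecord) {
    if (err) return next(err);
    res.json(updatedRecord);
  }
);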
I think, from our own experiences, the answer to your question is "yes". Mongoose is not scalable for rapid array-based updates.
Background
We're experiencing the same issue at HabitRPG. After a recent surge in user growth (bringing our DB to 6 GB), we started experiencing VersionError for many array-based updates (background on VersionError). ensureIndex({_id:1,__v:1}) helped a bit, but that tapered off as yet more users joined. It would appear to me that Mongoose is indeed not scalable for array-based updates. You can see our whole investigation process here.
Solution
If you can afford to move from an array to an object, do that. E.g., comments: Schema.Types.Array => comments: Schema.Types.Mixed, and sort by post.comments.{ID}.date, or even a manual post.comments.{ID}.position if necessary.
If you're stuck with arrays:
db.collection.ensureIndex({_id:1,__v:1})
Use your methods described above. You won't benefit from hooks and validations, but there are worse things.
I would strongly suggest pulling those arrays out into new collections. For example, a Comments collection where each document has a record ID to indicate where it belongs. This is a much more scalable solution.
You are correct, Mongoose's array operations are not atomic and therefore do not scale well.
I thought of another idea, which I'm not certain about but seems worth offering: soft-delete.
Mongoose is very concerned about array-structure changes because they make future changes ambiguous. But if you were to just tag a comment subdocument with comment.deleted=true then you might be able to do more such operations without encountering conflicts. Then you could have a cron task that goes through and actually removes those comments.
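In code, the soft-delete could be a plain update with the positional operator, so the array's structure (and the __v bookkeeping) never changes; the field names here are illustrative:
Record.update(
  { _id: recordId, 'comments._id': commentId },
  { $set: { 'comments.$.deleted': true, 'comments.$.deletedAt': new Date() } },
  function (err, result) {
    // handle err; a periodic job can later $pull comments where deleted is true
  }
);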
Oh, an additional idea is to use some sort of memory cache, so that if a record has been accessed/edited in the last few minutes it's available without having to pull it from the server, which means that two requests coming in at the same time are going to be modifying the same object.
Note: I'm not actually sure that either of these are good ideas in general or that they'll solve your problem, so go ahead and edit/comment/downvote if they're bad :)
