Why does observing the oplog take so much time in Meteor / Mongo? - javascript

I have a MongoLab cluster, which allows me to use oplog tailing in order to improve performance, availability and redundancy in my Meteor.js app.
The problem is: since I've been using it, all my publications take more time to finish. When it only takes around 200 ms that's not a problem, but it often takes much more, like here, where I'm subscribing to the publication I described here.
This publication already has too long a response time, and oplog observations are slowing it down further, though it's far from being the only publication where observing the oplog takes that much time.
Could anyone explain to me what's happening? Nowhere I've searched on the web have I found any explanation of why observing the oplog slows my publications down this much.
Here are some screenshots from Kadira to illustrate what I'm saying :
Here is a screenshot from another pub/sub :
And finally, one where observing oplogs takes a reasonable time (but still slows my pub/sub a bit):

Oplog tailing is very fast. Oplog tailing isn't the issue here.
You're probably doing a lot of things that you don't realize make publications slow:
One-by-one find-then-update loops: You're probably doing a document update inside the body of a Collection.forEach call. This is incredibly slow, and the origin of your poor performance in method bodies. Every time you do a single document update that's listened to by hundreds of concurrent connections, each of those needs to be updated; by doing a query followed by an update one document at a time, neither Mongo nor Meteor can optimize, and they must wait for every single user to be updated on every change, so the cost grows with both the number of documents and the number of listening connections. Solution: Think about how to do the update using {multi: true}.
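A minimal sketch of the difference (the Posts collection and processed flag here are hypothetical, just to illustrate the pattern):

// Slow: one round trip and one change notification per document
Posts.find({ processed: false }).forEach(function (post) {
  Posts.update(post._id, { $set: { processed: true } });
});

// Faster: a single update that Mongo and Meteor can handle in one pass
Posts.update({ processed: false }, { $set: { processed: true } }, { multi: true });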
Unique queries for every user: If you make a single change to a user document that has, say, 100 concurrent unique subscriptions connected to it, the connections will be notified serially. That means the first connection will be notified in 90 ms, while the last connection will be notified 90 ms * 100 users later. That's the other reason your observeChanges are slow. Solution: Think about whether you really need a unique subscription on each user's document. Meteor has optimizations for identical subscriptions shared between multiple concurrent connections.
Lots of documents: You're probably encoding each thread comment, post, chat message, etc. as its own document. Each document needs to be sent individually to each client, introducing some related overhead. This is the right schema for a relational database, and the wrong one for a document-based database. Solution: Try to hold every single thing you need to render a page to a user in a single document (de-normalization). With regards to chat, you should have a single "conversation" document that contains all the messages between two+ users.
Database isn't co-located with your host: If you're using MongoLab, your database may not be in the same datacenter as your web host (which I assume is Galaxy or Modulus). Inter-datacenter latencies can be very, very high, and this is probably the explanation for your poor collection reads. Indices, as other commenters have noted, might play a role, but my bet is that you have fewer than a hundred records in any of these collections, so it won't really matter.

Related

Save to 3 firebase locations with a slow internet connection

Sometimes I'm having issues with firebase when the user is on a slow mobile connection. When the user saves an entry to firebase I actually have to write to 3 different locations. Sometimes, the first one works, but if the connection is slow the 2nd and 3rd may fail.
This leaves me with entries in the first location that I constantly need to clean up.
Is there a way to help prevent this from happening?
var newTikiID = ref.child("tikis").push(tiki, function (error) {
  if (!error) {
    console.log("new tiki created");
    var tikiID = newTikiID.key();
    saveToUser(tikiID);
    saveToGeoFire(tikiID, tiki.tikiAddress);
  } else {
    console.log("an error occurred during tiki save");
  }
});
There is no Firebase method to write to multiple paths at once. Some tools planned by the team (e.g. Triggers) may resolve this in the future.
This topic has been explored before and the firebase-multi-write README contains a lot of discussion on the topic. The repo also has a partial solution to client-only atomic writes. However, there is no perfect solution without a server process.
It's important to evaluate your use case and see if this really matters. If the second and third writes failed to write to a geo query, chances are, there's really no consequence. Most likely, it's essentially the same as if the first write had failed, or if all writes had failed; it won't appear in searches by geo location. Thus, the complexity of resolving this issue is probably a time sink.
Of course, it does cost a few bytes of storage. If we're working with millions of records, that may matter. A simple solution for this scenario would be to run an audit report that detects broken links between the data and geofire tables and cleans up old data.
If an atomic operation is really necessary, such as gaming mechanics where fairness or cheating could be an issue, or where integrity is lost by having partial results, there are a couple options:
1) Master Record approach
Pick a master path (the one that must exist) and use security rules to ensure other records cannot be written, unless the master path exists.
".write": "root.child('master_path').child(newData.child('master_record_id').val()).exists()"
2) Server-side script approach
Instead of writing the paths separately, use a queue strategy:
Create a single event describing the change by writing one record to a queue
Have a server-side process monitor the queue and process events
The server-side process does the multiple writes and ensures they all succeed
If any fail, the server-side process handles rollbacks or retries
By using the server-side queue, you remove the risk of a client going offline between writes. The server can safely survive restarts and retry events or failures when using the queue model.
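A rough sketch of such a worker, written against the current firebase-admin SDK (the queue/newTikis, tikis, and users paths, and the userID field on the event, are assumptions, not from the original post):

var admin = require("firebase-admin");
admin.initializeApp(); // credential/databaseURL config omitted

var db = admin.database();

// Clients only push the raw event onto the queue; the server fans it out.
db.ref("queue/newTikis").on("child_added", function (snap) {
  var tiki = snap.val();
  var tikiID = snap.key;

  var fanout = {};
  fanout["tikis/" + tikiID] = tiki;
  fanout["users/" + tiki.userID + "/tikis/" + tikiID] = true;

  // Single multi-path update: either every path is written or none are.
  db.ref().update(fanout)
    .then(function () { return snap.ref.remove(); }) // done, drop the queue entry
    .catch(function (err) { console.error("fan-out failed, will retry on restart", err); });
});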
I have had the same problem and I ended up choosing to use Conditional Requests with the Firebase REST API in order to write data transactionally. See my question and answer: Firebase: How to update multiple nodes transactionally? Swift 3.
If you need to write concurrently (but not transactionally) to several paths, you can do that now as Firebase supports multi-path updates. https://firebase.google.com/docs/database/rest/save-data
https://firebase.googleblog.com/2015/09/introducing-multi-location-updates-and_86.html
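For illustration, a hedged sketch of the fan-out style described in those links, written against the legacy SDK used in the question (in the current SDK, key is a property rather than a method; the users path and userID are assumptions):

var newTikiRef = ref.child("tikis").push(); // reserve a key without writing yet
var tikiID = newTikiRef.key();

var updates = {};
updates["tikis/" + tikiID] = tiki;
updates["users/" + userID + "/tikis/" + tikiID] = true;

// One request: Firebase applies every path or rejects them all
ref.update(updates, function (error) {
  if (error) {
    console.log("multi-location update failed", error);
  } else {
    saveToGeoFire(tikiID, tiki.tikiAddress); // GeoFire still needs its own write
  }
});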

Improving Performance on massive IndexedDB Insert

We are trying to pre-cache a large amount of data into IndexedDB on load of our web application. From my performance testing the speed is decent on a desktop browser (e.g. Internet Explorer), where I can insert 10,000 records in around 2 seconds. But the exact same functionality on the iPad drops to 30 seconds. That comparison just blew my mind.
Does anyone know of any hints or tricks for inserting large data sets into IndexedDB? I don't know if it is possible at all, but could we build up a copy of an IndexedDB server-side with all the data pre-populated and then just ship it over to the client, which simply stores it down to the browser? Is anything along these lines doable?
Thanks
I had problems with massive bulk inserts (100,000 - 200,000 records). I've solved all my IndexedDB performance problems using the Dexie library. It has this important feature:
Dexie has a kick-ass performance. Its bulk methods take advantage of a not well known feature in IndexedDB that makes it possible to store stuff without listening to every onsuccess event. This speeds up the performance to a maximum.
Dexie: https://github.com/dfahlander/Dexie.js
Some pretty bad IndexedDB performance problems can be caused by a prolonged period of the browser just calling onsuccess callbacks and running into event loop overhead after the work is actually done. The performance pattern observed by my app which was doing this was that it did a bunch of work, then it just went answering thousands of callbacks very inefficiently:
The right-hand part of this image is the callbacks on every request. The solution is, of course, to not put a callback on every request, but it was previously unclear to me how to do this.
The way that Dexie.js accomplishes this (for details, see src/dbcore/dbcore-indexeddb.ts) is that it saves the last request (e.g. IDBObjectStore.put, etc) sent and sets an onsuccess callback on that one, which then collects the results from the rest of the requests. Thus, it avoids the callback hell.
Another approach from this is to use the IDBTransaction.oncomplete event, and not worry about the callbacks on the individual requests at all.
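As a minimal sketch of that second approach with the raw IndexedDB API (the database/store names and the records array are made up): queue all the puts and attach callbacks only to the transaction, not to each request:

var records = [];          // the pre-fetched data to cache (assumed)
var openReq = indexedDB.open("cacheDB", 1);

openReq.onupgradeneeded = function () {
  openReq.result.createObjectStore("records", { keyPath: "id" });
};

openReq.onsuccess = function () {
  var db = openReq.result;
  var tx = db.transaction("records", "readwrite");
  var store = tx.objectStore("records");

  records.forEach(function (record) {
    store.put(record);     // note: no onsuccess handler per request
  });

  // One callback for the whole batch instead of thousands
  tx.oncomplete = function () { console.log("bulk insert finished"); };
  tx.onerror = function () { console.error("bulk insert failed", tx.error); };
};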
(note: yes, I know how old this question is, I had this problem today and wanted to put something more useful for this question which is high in Google results)
How is your data stored in IndexedDB? Is everything in a single object store, or do you use multiple object stores? Do you need all the cached data immediately?
If you only have a single object store, you can start by storing all the data you initially need, commit that transaction, and start a new one for the rest. This way you can start retrieving the initial data while inserting the rest. IndexedDB is async, so it shouldn't block you.
If you have multiple object stores you can use the same strategy: first fill up the object store you need immediately and delay the others.
Or maybe consider using the AppCache API instead of the IndexedDB API. With it you can just cache a JavaScript file containing all the JSON objects you want to cache. This fits better when you don't need a lot of querying on the data.

When is it appropriate to use a setTimeout vs a Cron?

I am building a Meteor application that is using a mongo database.
I have a collection that could potentially have 1000s of documents that need to be updated at different times.
Do I run setTimeouts on creation or a cron job that runs every second and loops over every document?
What are the pros and cons of doing each?
To put this into context:
I am building an online tournament system. I can have 100s of tournaments running which means I could have 1000s of matches.
Each match needs to absolutely end at a specific time, and can end earlier under a condition.
Using an OS-level cron job won't work because you can only check with a 60-second resolution. So by "cron job", I think you mean a single setTimeout (or synced-cron). Here are some thoughts:
Single setTimeout
strategy: Wake up every second and check a large number of matches, updating those which are complete. If you have multiple servers, you can prevent all but one of them from doing the check via synced-cron.
The advantage of this strategy is that it's straightforward to implement. The disadvantages are:
You may end up doing a lot of unnecessary database reads.
You have to be extremely careful that your processing time does not exceed the length of the period between checks (one second).
I'd recommend this strategy if you are confident that the runtime can be controlled. For example, if you can index your matches on an endTime so only a few matches need to be checked in each cycle.
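A minimal sketch of that, assuming a hypothetical Matches collection with status and endTime fields:

// Server-side: one check per second, one multi-document update per check.
// An index on { status: 1, endTime: 1 } keeps each check cheap.
Meteor.setInterval(function () {
  Matches.update(
    { status: "live", endTime: { $lte: new Date() } },
    { $set: { status: "finished" } },
    { multi: true }
  );
}, 1000);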
Multiple setTimeouts
strategy: Add a setTimeout for each match on creation or when the server starts. As each timeout expires, update the corresponding match.
The advantage of this strategy is that it potentially removes a considerable amount of unnecessary database traffic. The disadvantages are:
It may be a little more tricky to implement. E.g. you have to consider what happens on a server restart.
The naive implementation doesn't scale past a single server (see 1).
I'd recommend this strategy if you think you will use a single server for the foreseeable future.
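A rough sketch, again using the hypothetical Matches collection; note the re-arming pass on server restart:

function scheduleMatchEnd(match) {
  var delay = Math.max(0, match.endTime.getTime() - Date.now());
  Meteor.setTimeout(function () {
    // The status guard makes this a no-op if the match already ended early
    Matches.update(
      { _id: match._id, status: "live" },
      { $set: { status: "finished" } }
    );
  }, delay);
}

// Call scheduleMatchEnd(newMatch) wherever matches are created, and
// re-arm timers for still-running matches when the server restarts:
Meteor.startup(function () {
  Matches.find({ status: "live" }).forEach(scheduleMatchEnd);
});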
Those are the trade-offs which occurred to me given the choices you presented. A more robust solution would probably involve technology outside of the meteor/mongo stack. For example, storing match times in redis and then listening for keyspace notifications.
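For completeness, a hedged sketch of that redis idea using the node_redis client; it assumes redis is configured with notify-keyspace-events including expired-key events (e.g. "Ex"), and the key naming is invented for illustration:

var redis = require("redis");

var client = redis.createClient();
var subscriber = redis.createClient(); // a subscribing connection can't run other commands

// When a match is created, set a key that expires at its end time
function armMatch(matchId, msUntilEnd) {
  client.set("match-end:" + matchId, "1", "PX", msUntilEnd);
}

subscriber.subscribe("__keyevent@0__:expired");
subscriber.on("message", function (channel, key) {
  if (key.indexOf("match-end:") === 0) {
    var matchId = key.slice("match-end:".length);
    // finish the match here, e.g. update its document in Mongo
  }
});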
This is all a matter of preference, to be honest with you.
I'm a big fan of writing small, independent programs, that each do one thing, and do it well. If you're also like this, it's probably better to write separate programs to run periodically via cron.
This way you get guaranteed OS-controlled precision for the time, and small, simple programs that are easy to debug outside the context of your webapp.
This is just a preference though.

Node.js, MongoDB, and Concurrency

I'm working on a game prototype and worried about the following case: Browser does AJAX to Node.JS, which has to do several MongoDB operations using async.series.
What prevents multiple requests at the same time causing the database issues? New events (i.e. db operations) seem like they could be run out of order or in between the async.series steps.
In other words, what happens if a user does AJAX calls very quickly, before the prior ones have finished their async.series. Hopefully that makes sense.
If this is indeed an issue, what is the proper way to handle it?
First and foremost, #fmodos's comment should be completely disregarded. It is wrong on many levels but most simply you could have any number of nodes running (say on Heroku) and there is no guarantee that subsequent requests will hit the same node.
Now, I'm going to answer your question by asking more questions. (You really didn't give me a choice here)
What are these operations doing? Inserting documents? Updating existing documents? Removing documents? This is very important, because if all you're doing is simply inserting documents, then why does it matter if one finishes before the other? If you're updating documents, you should NOT be issuing a find, grabbing a ref to the object, and then calling save. (I'm assuming you're using Mongoose; if you're not, I would.) Instead, you should be using built-in Mongo update operators like $inc, which properly handle concurrent requests.
http://docs.mongodb.org/manual/reference/operator/update/inc/
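For example, a hedged sketch with Mongoose (the Player model and score field are made up):

// Unsafe under concurrency: read, modify in memory, save
Player.findById(playerId, function (err, player) {
  player.score += 10;
  player.save();
});

// Safe: let Mongo apply the increment atomically
Player.update({ _id: playerId }, { $inc: { score: 10 } }, function (err) {
  if (err) console.error(err);
});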
Does that help at all? If not, please let me know and I will give it another shot.
Mongo has database-wide read/write locks. It gives preference to writes on the same collection first, then fulfills reads. So if, by chance, you have Bill writing to the db while Joe is reading at the same time, Bill's write will execute first while Joe waits until the write is complete, and then he is given all the data (including Bill's).

Is Mongoose not scalable with document array editing and version control?

I am developing a web application with Node.js and MongoDB/Mongoose. Our most used Model, Record, has many subdocument arrays. Some of these, for instance, include "Comment", "Bookings", and "Subscribers".
In the client side application, whenever the user hits the "delete" button it fires off an AJAX request to the delete route for that specific comment. The problem I am running into is that, when many of these AJAX calls come in at once, Mongoose fails with a "Document not found" error on some (but not all) of the calls.
This only happens when the calls are made rapidly and many at a time. I think this is due to the versioning in Mongoose causing document conflicts. Our current process for a delete is:
Fetch the document using Record.findById()
Remove the subdocument from the appropriate array (using, say, comment.remove())
Call record.save()
I have found a solution where I can manually update the collection using Record.findByIdAndUpdate with the $pull operator. However, this means we can't use any of Mongoose's middleware and we lose version control entirely. And the more I think about it, the more I realize situations where this would happen and where I would have to use Mongoose's wrapper functions like findByIdAndUpdate or findAndRemove. The only other solution I can think of would be to put the removal attempt into a while loop and hope it works, which seems like a very poor fix.
Using the Mongoose wrappers doesn't really solve my problem as it won't allow me to use any sort of Middleware or hooks at all then, which is basically one of the huge benefits of using Mongoose.
Does this mean that Mongoose is essentially useless for anything with rapid editing and I might as well just use the native MongoDB driver? Am I misunderstanding Mongoose's limitations?
How could I solve this problem?
Mongoose's versioned document array editing is not scalable for the simple reason that it's not an atomic operation. As a result, the more array edit activity you have, the more likely it is that two edits will collide and you'll suffer the overhead of retry/recovery from that in your code.
For scalable document array manipulation, you have to use update with the atomic array update operators: $pull[All], $push[All], $pop, $addToSet, and $. Of course, you can also use these operators with the atomic findAndModify-based methods of findByIdAndUpdate and findOneAndUpdate if you also need the original or resulting doc.
As you mentioned, the big downside of using update instead of findOne+save is that none of your Mongoose middleware and validation is executed during an update. But I don't see that you have any choice if you want a scalable system. I'd much rather manually duplicate some middleware and validation logic for the update case than have to suffer the scalability penalties of using Mongoose's versioned document array editing. Hey, at least you still get the benefits of Mongoose's schema-based type casting on updates!
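As a concrete sketch (the model, field, and variable names are assumptions based on the question): removing one comment without loading and re-saving the whole Record:

Record.findByIdAndUpdate(
  recordId,
  { $pull: { comments: { _id: commentId } } },
  { new: true },                       // return the updated document
  function (err, record) {
    if (err) return console.error(err);
    // record.comments no longer contains the deleted comment
  }
);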
I think, from our own experiences, the answer to your question is "yes". Mongoose is not scalable for rapid array-based updates.
Background
We're experiencing the same issue at HabitRPG. After a recent surge in user growth (bringing our DB to 6 GB), we started experiencing VersionError for many array-based updates (background on VersionError). ensureIndex({_id:1,__v:1}) helped a bit, but the benefit tapered off as yet more users joined. It would appear to me Mongoose is indeed not scalable for array-based updates. You can see our whole investigation process here.
Solution
If you can afford moving from an array to an object, do that (see the sketch after this list). Eg, comments: Schema.Types.Array => comments: Schema.Types.Mixed, and sort by post.comments.{ID}.date, or even a manual post.comments.{ID}.position if necessary.
If you're stuck with arrays:
db.collection.ensureIndex({_id:1,__v:1})
Use your methods described above. You won't benefit from hooks and validations, but there are worse things.
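A rough sketch of the object-keyed layout mentioned above (the schema and field names are illustrative only, not HabitRPG's actual code):

var mongoose = require("mongoose");
var Schema = mongoose.Schema;

var postSchema = new Schema({
  title: String,
  // { "<commentId>": { text: "...", date: Date, position: Number } }
  comments: Schema.Types.Mixed
});
var Post = mongoose.model("Post", postSchema);

// Each comment lives under its own key, so concurrent edits touch
// different fields and there is no array version to conflict on.
function addComment(postId, commentId, comment, cb) {
  var set = {};
  set["comments." + commentId] = comment;
  Post.update({ _id: postId }, { $set: set }, cb);
}

function removeComment(postId, commentId, cb) {
  var unset = {};
  unset["comments." + commentId] = 1;
  Post.update({ _id: postId }, { $unset: unset }, cb);
}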
I would strongly suggest pulling those arrays out into new collections. For example, a Comments collection where each document has a record ID to indicate where it belongs. This is a much more scalable solution.
You are correct, Mongoose's array operations are not atomic and therefore do not scale well.
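For illustration, a minimal sketch of that separate-collection layout (the names are hypothetical):

var mongoose = require("mongoose");
var Schema = mongoose.Schema;

var commentSchema = new Schema({
  recordId: { type: Schema.Types.ObjectId, ref: "Record", index: true },
  author:   String,
  text:     String,
  date:     { type: Date, default: Date.now }
});
var Comment = mongoose.model("Comment", commentSchema);

// Deleting a comment no longer touches the Record document at all,
// so there is nothing for concurrent deletes to conflict with.
Comment.findByIdAndRemove(commentId, function (err) {
  if (err) console.error(err);
});

// Fetching a record's comments
Comment.find({ recordId: recordId }).sort({ date: 1 }).exec(callback);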
I thought of another idea, which I'm not certain about but seems worth offering: soft-delete.
Mongoose is very concerned about array-structure changes because they make future changes ambiguous. But if you were to just tag a comment subdocument with comment.deleted=true then you might be able to do more such operations without encountering conflicts. Then you could have a cron task that goes through and actually removes those comments.
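A sketch of that soft-delete idea with Mongo's positional operator (field names are assumptions); reads would of course need to filter out deleted comments until the cleanup job runs:

// Mark the comment instead of pulling it out of the array
Record.update(
  { _id: recordId, "comments._id": commentId },
  { $set: { "comments.$.deleted": true } },
  function (err) { if (err) console.error(err); }
);

// Later, a periodic job actually removes the flagged comments
Record.update(
  {},
  { $pull: { comments: { deleted: true } } },
  { multi: true },
  function (err) { if (err) console.error(err); }
);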
Oh, an additional idea is to use some sort of memory cache, so if a record has been accessed/edited in the last few minutes it's available without having to pull it from the server, which means that two requests coming in at the same time are going to be modifying the same object.
Note: I'm not actually sure that either of these are good ideas in general or that they'll solve your problem, so go ahead and edit/comment/downvote if they're bad :)
