Node.js, MongoDB, and Concurrency

Node.js, MongoDB, and Concurrency - javascript

I'm working on a game prototype and worried about the following case: Browser does AJAX to Node.JS, which has to do several MongoDB operations using async.series.
What prevents multiple requests at the same time causing the database issues? New events (i.e. db operations) seem like they could be run out of order or in between the async.series steps.
In other words, what happens if a user does AJAX calls very quickly, before the prior ones have finished their async.series. Hopefully that makes sense.
If this is indeed an issue, what is the proper way to handle it?

First and foremost, #fmodos's comment should be completely disregarded. It is wrong on many levels but most simply you could have any number of nodes running (say on Heroku) and there is no guarantee that subsequent requests will hit the same node.
Now, I'm going to answer your question by asking more questions. (You really didn't give me a choice here)
What are these operations doing? Inserting documents? Updating existing documents? Removing documents? This is very important because if all you're doing is simply inserting documents then why does it matter if one finishes for before the other? If you're updating documents then you should NOT be issuing a find, grabbing a ref to the object, and then calling save. (I'm making the assumption you're using mongoose, if you're not, I would) Instead what you should be doing is using built in mongo functions like $inc which properly handle concurrent requests.
http://docs.mongodb.org/manual/reference/operator/update/inc/
Does that help at all? If not, please let me know and I will give it another shot.

Mongo has database wide read/write locks. It gives preference to writes of the same collection first then fulfills reads. So, if by chance, you have Bill writing to the db and Joe is reading at the same time, Bill's write will execute first while Joe waits until the write is complete and then he is given all the data (including Bill's).

Related

Firebase Realtime Database document ordering

I am listening for new Firebase Realtime Database documents with code something like this:
firebase.database().ref(path)
.orderByChild('timestamp')
.on('child_added', snap => {
...
});
where timestamp is set on the server with firebase.database.ServerValue.TIMESTAMP. I would like to have documents always handled in timestamp order, but I am aware that documents I add locally may arrive in the above code out of order.
I can check for and fix mis-ordered arrivals but I'd prefer not to if there is some way to have this not happen. I know about this answer (and answers that link to it) but I believe that applies to an earlier API without ordering methods like orderByChild.
I believe that I should be able to get timestamp order if I always add documents using a transaction and pass false in the applyLocally argument. I am wondering if it also works to add documents from a separate Javascript context on the same client (e.g. from a Web Worker) without a transaction.
Will either or both of these approaches guarantee timestamp ordering? Is there any other way to achieve this? Among approaches that work, is one clearly superior or are there trade-offs among them?

The local estimate/latency compensation event is only fired on the client that performs the write operation. So if you perform a write operation in a different context, the original context will only see the operation when it comes from the server.
You might even be able to accomplish this by using two FirebaseApp instances, although I couldn't get that working in a quick test here myself.

Should updates to Firstore items in AngularFire be done through the AngularFirestoreCollection?

In my app, I have a list that requires an "or" condition. But, as the docs say:
In this case, you should create a separate query for each OR condition and merge the query results in your app.
As a result, in my service, I'm managing two queries and surfacing them as a single observable list to consumers.
The problem comes in with updating. I have the choice of doing extra work to match up the item needing update to the correct collection so I can do the following:
myCollection.doc(item.id).update(item);
or I can make this much more simple and just:
angularFirestore.doc(`path/to/${item.id}`).update(item);
I'm operating under the assumption that the first method will result in faster updates as I'm using the same reference that it would optimistically update instantly. And that the latter will be slower in that it would be more round about by updating the persistence layer and then the collection referencing getting notified about later (probably still a small time).
All of the above is assumption, however. I back this just with a few random instances where I've seen it take a second or two for an update or delete to show up in an other part of the view, but I haven't been able to actually inspect the process.
Does anyone know if the above is correct? Should I be doing the extra work to write through the collection references or does angularfire(and/or firestore) handle this and make them effectively the same operation under the hood?

AngularFire2 is a thin wrapper around RxFire, which itself is a relatively thin wrapper around the Firebase JavaScript SDK.
There should be no significant performance difference between updating a document through AngularFire or updating it directly through the JavaScript SDK. In both cases the majority of the time is spent in the JavaScript SDK, and on the wire between the client and server. For this reason I typically update directly through the JavaScript SDK, since it's often a bit more direct and the AngularFire abstraction has little advantage for me in write operations. Given that AngularFire is built on top of this SDK, it picks up the changes instantly even when they're not made through AngularFire.
If you have an instance where this does not seem to be the case, I recommend creating a question with the minimal, complete/standalone code that reproduces that problem.

Database cluster - asynchronous tasks

Let's say I have a couchDB database called "products" and a frontend with a form.
Now if a user opens a document from this database in the form I want to prevent other user from editing this specific document.
Usually pretty simple:
-> read document from couchDB
-> set a variable to true like: { edit : true }
-> save (merge) document to couchDB
-> if someone else tries to open the document he will receive an error, becaus of edit:true.
BUT, what if two user open the document at the exact same time?
The function would be called twice and when the second one opens the document he would falsely receive an edit:false because the first didn't had enough time to save his edit:true. So how to prevent this behaviour?
First solution would be:
Build an array as a cue for database requests and dont allow parallel requests, so all requests would be worked off one after another. But in my opinion this is a bad solution because the system would be incredible slow at some point.
Second solution:
Store the documentIDs of the currently edited documents in an local array in the script. This would work because this is no asynchronous process and the second user would receive his error immediately.
So far so good, BUT, what if some day there are too many user and this system should run in a cluster (the node client server, not the database) - now the second solution would not work anymore because every cluster slave would have its own array of documentIDs. Sharing there would end in another asynchronous task and result in the same problem above.
Now i'm out of ideas, how do big clustered systems usually handle problems like that?

CouchDB uses MVCC to maintain consistency in your database. When a document is being updated, you must supply both the ID (_id) and revision number (_rev) otherwise your change will be rejected.
This means that if 2 clients read the document at revision 1 and both attempt to write a change using that same revision number, only the first will be accepted by the database. The 2nd client will receive an error, and it should fetch the latest revision of the document in order to proceed.
In a single-node environment, this model prevents conflicts outright. However, in cases where replication is occurring, it is still possible to get conflicts, even when using MVCC. This is because conflicting revisions can technically be written to different nodes before they have been replicated to one another. In this case, CouchDB will record the conflict and your application is responsible to resolve them.
CouchDB has stellar documentation, in particular they have an article all about conflicts and replication that I highly recommend for this subject.

Improving Performance on massive IndexedDB Insert

We are trying to pre-cache a large sum of data on load of our web application into indexed db. From my performance testing the speed is decent on a desktop browser (e.g. Internet Explorer) where I can insert 10,000 records in around 2 seconds. But comparing the exact same functionality on the iPad it drops to 30 seconds. That comparison just blew my mind.
Does anyone know of any hints or tricks to inserting large data sets into indexedDB. I dont know if it is possible at all but if we could build up a copy of an indexedDB server side with all the data prepopulated and then just shoot it over to the client and it just stores it down to the browser. Is anything along these lines doable?
Thanks

I had problems with massive bulk insert (100.000 - 200.000 records). I've solved all my IndexedDB performance problems using Dexie library. It has this important feature:
Dexie has a kick-ass performance. It's bulk methods take advantage of
a not well known feature in indexedDB that makes it possible to store
stuff without listening to every onsuccess event. This speeds up the
performance to a maximum.
Dexie: https://github.com/dfahlander/Dexie.js

Some pretty bad IndexedDB performance problems can be caused by a prolonged period of the browser just calling onsuccess callbacks and running into event loop overhead after the work is actually done. The performance pattern observed by my app which was doing this was that it did a bunch of work, then it just went answering thousands of callbacks very inefficiently:
The right hand part of this image is the callbacks on every request. The solution to doing that is, of course, to not put a callback on every request, but it was previously unclear to me how to do this.
The way that Dexie.js accomplishes this (for details, see src/dbcore/dbcore-indexeddb.ts) is that it saves the last request (e.g. IDBObjectStore.put, etc) sent and sets an onsuccess callback on that one, which then collects the results from the rest of the requests. Thus, it avoids the callback hell.
Another approach from this is to use the IDBTransaction.oncomplete event, and not worry about the callbacks on the individual requests at all.
(note: yes, I know how old this question is, I had this problem today and wanted to put something more useful for this question which is high in Google results)

How is your data stored in the indexeddb? Is everything in a single object store of do you use multiple objectstores. Do you need all the cached data immediatly?
If you only have a single object store you can start with storing all the data you initialy need, commit that transaction and start a new for all the rest. This way you can start retrieving the initial data while inserting the rest. IndexedDB is async so it should block you.
If you have multiple object stores you can use the same stratigy. First fill up the objectstore you need immediatly and delay the others.
Or maybe consider using the AppCache API instead of the indexeddb api. Using this you can just cache a javascriptfile containing all the json objects you want to cache. This is more the case when you don't need a lot of querying on the data.

Is Mongoose not scalable with document array editing and version control?

I am developing a web application with Node.js and MongoDB/Mongoose. Our most used Model, Record, has many subdocument arrays. Some of these, for instance, include "Comment", "Bookings", and "Subscribers".
In the client side application, whenever the user hits the "delete" button it fires off an AJAX request to the delete route for that specific comment. The problem I am running into is that, when many of these AJAX calls come in at once, Mongoose fails with a "Document not found" error on some (but not all) of the calls.
This only happens when the calls are made rapidly and many at a time. I think this is due to the version in Mongoose causing document conflicts. Our current process for a delete is:
Fetch the document using Record.findById()
Remove the subdocument from the appropriate array (using, say, comment.remove())
Call record.save()
I have found a solution where I can manually update the collection using Record.findByIdAndUpdate and then using the $pull operator. However, this means we can't use any of mongoose's middleware and loose the version control entirely. And the more I think about it, the more I realize situations where this would happen and I would have to use Mongoose's wrapper functions like findByIdAndUpdate or findAndRemove. The only other solution I can think of would be to put the removal attempt into a while loop and hope it works, which seems like a very poor fix.
Using the Mongoose wrappers doesn't really solve my problem as it won't allow me to use any sort of Middleware or hooks at all then, which is basically one of the huge benefits of using Mongoose.
Does this mean that Mongoose is essentially useless for anything of with rapid editing and I might as well just use native MongoDB drivers? Am I misunderstanding Mongoose's limitations?
How could I solve this problem?

Mongoose's versioned document array editing is not scalable for the simple reason that it's not an atomic operation. As a result, the more array edit activity you have, the more likely it is that two edits will collide and you'll suffer the overhead of retry/recovery from that in your code.
For scalable document array manipulation, you have to use update with the atomic array update operators: $pull[All], $push[All], $pop, $addToSet, and $. Of course, you can also use these operators with the atomic findAndModify-based methods of findByIdAndUpdate and findOneAndUpdate if you also need the original or resulting doc.
As you mentioned, the big downside of using update instead of findOne+save is that none of your Mongoose middleware and validation is executed during an update. But I don't see that you have any choice if you want a scalable system. I'd much rather manually duplicate some middleware and validation logic for the update case than have to suffer the scalability penalties of using Mongoose's versioned document array editing. Hey, at least you still get the benefits of Mongoose's schema-based type casting on updates!

I think, from our own experiences, the answer to your question is "yes". Mongoose is not scalable for rapid array-based updates.
Background
We're experiencing the same issue at HabitRPG. After a recent surge in user growth (bringing our DB to 6gb), we started experiencing VersionError for many array-based updates (background on VersionError). ensureIndex({_id:1,__v1:1}) helped a bit, but that tapered as yet more users joined. It would appear to me Mongoose is indeed not scalable for array-based updates. You can see our whole investigation process here.
Solution
If you can afford moving from an array to an object, do that. Eg, comments: Schema.Types.Array => comments: Schema.Types.Mixed, and sort by post.comments.{ID}.date, or even a manual post.comments.{ID}.position if necessary.
If you're stuck with arrays:
db.collection.ensureIndex({_id:1,__v:1})
Use your methods described above. You won't benefit from hooks and validations, but there are worse things.

I would strongly suggest pulling those arrays out into new collections. For example, a Comments collection where each document has a record ID to indicate where it belongs. This is a much more scalable solution.
You are correct, Mongoose's array operations are not atomic and therefore do not scale well.

I thought of another idea, which I'm not certain about but seems worth offering: soft-delete.
Mongoose is very concerned about array-structure changes because they make future changes ambiguous. But if you were to just tag a comment subdocument with comment.deleted=true then you might be able to do more such operations without encountering conflicts. Then you could have a cron task that goes through and actually removes those comments.
Oh, an additional idea is to use some sort of memory cache, so if an record has been accessed/edited in the last few minutes, it's available without having to pull it from the server, which means that two requests coming in at the same time are going to be modifying the same object.
Note: I'm not actually sure that either of these are good ideas in general or that they'll solve your problem, so go ahead and edit/comment/downvote if they're bad :)

We Keep Coding

JavaScript is the programming language of the Web.