Is Mongoose not scalable with document array editing and version control?

I am developing a web application with Node.js and MongoDB/Mongoose. Our most heavily used model, Record, has many subdocument arrays, including, for instance, "Comments", "Bookings", and "Subscribers".
In the client side application, whenever the user hits the "delete" button it fires off an AJAX request to the delete route for that specific comment. The problem I am running into is that, when many of these AJAX calls come in at once, Mongoose fails with a "Document not found" error on some (but not all) of the calls.
This only happens when the calls are made rapidly, many at a time. I think this is due to Mongoose's document versioning (the __v field) causing conflicts. Our current process for a delete (sketched below) is:
1. Fetch the document using Record.findById()
2. Remove the subdocument from the appropriate array (using, say, comment.remove())
3. Call record.save()
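For concreteness, a minimal sketch of that delete route (the Express handler shape and the comments field name are assumptions for illustration):

// Sketch of the fetch/modify/save delete described above.
app.delete('/records/:recordId/comments/:commentId', function (req, res) {
  Record.findById(req.params.recordId, function (err, record) {
    if (err || !record) return res.send(404);

    // Remove the comment subdocument from the array, then save the parent.
    record.comments.id(req.params.commentId).remove();
    record.save(function (err) {
      // Under concurrent deletes, this is where the
      // "No matching document found" / VersionError shows up.
      if (err) return res.send(500);
      res.send(200);
    });
  });
});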
I have found a workaround where I manually update the collection using Record.findByIdAndUpdate with the $pull operator. However, this means we can't use any of Mongoose's middleware and we lose the version control entirely. And the more I think about it, the more situations I can see where this would happen and I would have to use Mongoose's wrapper functions like findByIdAndUpdate or findAndRemove. The only other solution I can think of is to put the removal attempt into a retry loop and hope it works, which seems like a very poor fix.
Using the Mongoose wrappers doesn't really solve my problem, as it won't allow me to use any sort of middleware or hooks at all, which is basically one of the huge benefits of using Mongoose.
Does this mean that Mongoose is essentially useless for anything with rapid editing, and I might as well just use the native MongoDB driver? Am I misunderstanding Mongoose's limitations?
How could I solve this problem?

Mongoose's versioned document array editing is not scalable for the simple reason that it's not an atomic operation. As a result, the more array edit activity you have, the more likely it is that two edits will collide and you'll suffer the overhead of retry/recovery from that in your code.
For scalable document array manipulation, you have to use update with the atomic array update operators: $pull[All], $push[All], $pop, $addToSet, and $. Of course, you can also use these operators with the atomic findAndModify-based methods of findByIdAndUpdate and findOneAndUpdate if you also need the original or resulting doc.
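For illustration, a sketch of the same delete done as a single atomic operation (assuming the array field is called comments):

Record.findByIdAndUpdate(
  recordId,
  { $pull: { comments: { _id: commentId } } },
  function (err, record) {
    // One atomic server-side operation: there is no findOne+save round
    // trip, so concurrent deletes no longer race on the document version.
  }
);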
As you mentioned, the big downside of using update instead of findOne+save is that none of your Mongoose middleware and validation is executed during an update. But I don't see that you have any choice if you want a scalable system. I'd much rather manually duplicate some middleware and validation logic for the update case than have to suffer the scalability penalties of using Mongoose's versioned document array editing. Hey, at least you still get the benefits of Mongoose's schema-based type casting on updates!

I think, from our own experience, the answer to your question is "yes": Mongoose is not scalable for rapid array-based updates.
Background
We're experiencing the same issue at HabitRPG. After a recent surge in user growth (bringing our DB to 6 GB), we started seeing VersionError for many array-based updates (background on VersionError). ensureIndex({_id:1,__v:1}) helped a bit, but the benefit tapered off as yet more users joined. It would appear that Mongoose is indeed not scalable for array-based updates. You can see our whole investigation process here.
Solution
If you can afford to move from an array to an object, do that. E.g., comments: Schema.Types.Array => comments: Schema.Types.Mixed, and sort by post.comments.{ID}.date, or even by a manual post.comments.{ID}.position if necessary.
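A rough sketch of what that object-based shape could look like (schema and field names are assumptions):

var PostSchema = new Schema({
  comments: Schema.Types.Mixed // { <commentId>: { text: ..., date: ..., position: ... } }
});
var Post = mongoose.model('Post', PostSchema);

// Each comment lives under its own key, so a delete touches only that
// path and can be issued atomically, with no array structure to version:
var unset = {};
unset['comments.' + commentId] = 1;
Post.update({ _id: postId }, { $unset: unset }, callback);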
If you're stuck with arrays:
db.collection.ensureIndex({_id:1,__v:1})
Use the update-based workaround you described above. You won't benefit from hooks and validations, but there are worse things.

I would strongly suggest pulling those arrays out into separate collections. For example, a Comments collection where each document carries a record ID to indicate where it belongs (sketched below). This is a much more scalable solution.
You are correct, Mongoose's array operations are not atomic and therefore do not scale well.
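A rough sketch of that layout (the schema fields are assumptions):

var CommentSchema = new Schema({
  record: { type: Schema.Types.ObjectId, ref: 'Record' }, // points back to the parent Record
  text: String,
  date: { type: Date, default: Date.now }
});
var Comment = mongoose.model('Comment', CommentSchema);

// A delete now touches one independent document, so concurrent deletes
// don't contend on a single parent array, and document middleware still
// runs via comment.remove():
Comment.findById(commentId, function (err, comment) {
  if (err || !comment) return callback(err);
  comment.remove(callback);
});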

I thought of another idea, which I'm not certain about but seems worth offering: soft-delete.
Mongoose is very protective of array-structure changes because they make concurrent edits ambiguous. But if you were to just tag a comment subdocument with comment.deleted = true (as sketched below), you might be able to do many such operations without encountering conflicts. Then you could have a cron task that goes through and actually removes those comments.
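A sketch of what the tagging could look like with the positional $ operator (field names assumed):

// Flag the comment instead of removing it; the array structure is
// untouched, so there's no structural change to conflict on.
Record.update(
  { _id: recordId, 'comments._id': commentId },
  { $set: { 'comments.$.deleted': true } },
  callback
);

// The cron task can later physically remove all flagged comments:
Record.update(
  {},
  { $pull: { comments: { deleted: true } } },
  { multi: true },
  callback
);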
Oh, an additional idea is to use some sort of in-memory cache, so that if a record has been accessed or edited in the last few minutes it's available without having to pull it from the database, which means two requests coming in at the same time will be modifying the same object.
Note: I'm not actually sure that either of these are good ideas in general or that they'll solve your problem, so go ahead and edit/comment/downvote if they're bad :)

Related

Is there a way to get `updated_by` field when handling a webhook? (Strapi)

Problem: I'm creating an activity log in Strapi using webhooks. For the collections I created, I know there's a model option you can set to record who created and updated an entry. However, I also need to extend this to the Media Library, and as far as I can tell that should be possible, because Strapi's Media Library tables already have the created_by and updated_by attributes.
My thoughts: So I came up with a custom knex select, which you can see below:
await knex('upload_file').where('id', media.id).select();
It works just fine. However, this calls the database twice, which is a concern: I work at a really big company, and the extra queries might raise costs a lot.
Final question: So is there a solution to this? Maybe the same approach as for the collections I created? Or even a way to enable this option globally, so every model in Strapi would return these two fields? (I might extend this to all the collections I have in the future.)
If you're looking for an answer: just go for a middleware, it's easier and more reliable.
Follow this tutorial and you should be good.
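For what it's worth, a rough sketch of a Strapi v4 global middleware along those lines (the file path, the logging target, and the exact fields are assumptions):

// ./src/middlewares/activity-log.js
module.exports = (config, { strapi }) => {
  return async (ctx, next) => {
    await next(); // let the request (e.g. a Media Library update) complete first

    const user = ctx.state.user; // the authenticated user, if any
    if (user && ctx.method !== 'GET') {
      // Record who changed what, without a second database round trip.
      strapi.log.info(ctx.method + ' ' + ctx.url + ' by user ' + user.id);
    }
  };
};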

Should updates to Firestore items in AngularFire be done through the AngularFirestoreCollection?

In my app, I have a list that requires an "or" condition. But, as the docs say:
In this case, you should create a separate query for each OR condition and merge the query results in your app.
As a result, in my service, I'm managing two queries and surfacing them as a single observable list to consumers.
The problem comes in with updating. I have the choice of doing extra work to match up the item needing update to the correct collection so I can do the following:
myCollection.doc(item.id).update(item);
or I can make this much more simple and just:
angularFirestore.doc(`path/to/${item.id}`).update(item);
I'm operating under the assumption that the first method will result in faster updates, since I'm using the same reference, so it would be optimistically updated instantly. And that the latter will be slower and more roundabout: it updates the persistence layer, and the collection reference only gets notified about the change afterwards (probably still a short time).
All of the above is assumption, however. I back it up only with a few random instances where I've seen it take a second or two for an update or delete to show up in another part of the view, but I haven't been able to actually inspect the process.
Does anyone know if the above is correct? Should I be doing the extra work to write through the collection references or does angularfire(and/or firestore) handle this and make them effectively the same operation under the hood?
AngularFire2 is a thin wrapper around RxFire, which itself is a relatively thin wrapper around the Firebase JavaScript SDK.
There should be no significant performance difference between updating a document through AngularFire or updating it directly through the JavaScript SDK. In both cases the majority of the time is spent in the JavaScript SDK, and on the wire between the client and server. For this reason I typically update directly through the JavaScript SDK, since it's often a bit more direct and the AngularFire abstraction has little advantage for me in write operations. Given that AngularFire is built on top of this SDK, it picks up the changes instantly even when they're not made through AngularFire.
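For illustration, the same update issued directly through the (v8-style) Firebase JavaScript SDK that AngularFire wraps:

firebase.firestore().doc(`path/to/${item.id}`).update(item);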
If you have an instance where this does not seem to be the case, I recommend creating a question with the minimal, complete/standalone code that reproduces that problem.

Count the number of comments on a post in Meteor

If I want to count the number of comments a post has, I have to update the saved count every time a comment is either created or removed.
What is the most efficient and secure way to ensure the posts are updated with the number of comments every time a comment is either created or removed? I have tried Cursor.observe(), but it seems to cause problems sometimes. I have looked through my code and it should be OK, but sometimes changes happened when they shouldn't, so I'm afraid observe() causes problems when multiple objects are created at the same time.
I have looked at meteor-collection-hooks, and they don't use observe. I thought observe was the best choice since it is native. How do others solve this?
Don't use observe. It consumes resources and doesn't scale past one server (if N servers are observing the change, you will get N increments). I can recommend two possible options:
hooks
As you suggested, you can use collection-hooks to modify the count. Specifically, you'd probably want to use after.insert and after.remove on your Comments collection (see the sketch below). Hooks don't require extra resources - they just patch the underlying collection code to run your callback.
Recommended reading: A Look At Meteor Collection Hooks
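A minimal sketch with collection-hooks (the postId and commentsCount field names are assumptions):

// Keep a denormalized count on the post in sync with its comments.
Comments.after.insert(function (userId, doc) {
  Posts.update(doc.postId, { $inc: { commentsCount: 1 } });
});
Comments.after.remove(function (userId, doc) {
  Posts.update(doc.postId, { $inc: { commentsCount: -1 } });
});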
methods
If you use methods to insert and remove your comments, you can also modify your comment counts at the same time (see the sketch below). This has the advantage of not requiring an external package; however, it also mixes some concerns in your methods.
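A sketch of the method-based version (collection and field names are assumptions):

Meteor.methods({
  addComment: function (postId, text) {
    check(postId, String);
    check(text, String);
    Comments.insert({ postId: postId, text: text, userId: this.userId });
    Posts.update(postId, { $inc: { commentsCount: 1 } });
  },
  removeComment: function (commentId) {
    check(commentId, String);
    var comment = Comments.findOne(commentId);
    if (!comment) throw new Meteor.Error('not-found');
    Comments.remove(commentId);
    Posts.update(comment.postId, { $inc: { commentsCount: -1 } });
  }
});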

Node.js, MongoDB, and Concurrency

I'm working on a game prototype and I'm worried about the following case: the browser makes an AJAX call to Node.js, which has to perform several MongoDB operations using async.series.
What prevents multiple requests arriving at the same time from causing database issues? New events (i.e. db operations) seem like they could run out of order, or interleave between the async.series steps.
In other words, what happens if a user fires off AJAX calls very quickly, before the prior ones have finished their async.series? Hopefully that makes sense.
If this is indeed an issue, what is the proper way to handle it?
First and foremost, #fmodos's comment should be completely disregarded. It is wrong on many levels, but most simply: you could have any number of nodes running (say, on Heroku), and there is no guarantee that subsequent requests will hit the same node.
Now, I'm going to answer your question by asking more questions. (You really didn't give me a choice here)
What are these operations doing? Inserting documents? Updating existing documents? Removing documents? This is very important, because if all you're doing is inserting documents, why does it matter whether one finishes before the other? If you're updating documents, then you should NOT be issuing a find, grabbing a ref to the object, and then calling save. (I'm assuming you're using mongoose; if you're not, I would.) Instead, you should be using built-in mongo operators like $inc, which properly handle concurrent requests (see the sketch after the link below).
http://docs.mongodb.org/manual/reference/operator/update/inc/
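For example, a counter bump that is safe under concurrency (model and field names are assumptions):

// The increment happens atomically on the server, so two concurrent
// requests both land and neither overwrites the other.
Player.findByIdAndUpdate(playerId, { $inc: { score: 1 } }, function (err, player) {
  // handle result
});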
Does that help at all? If not, please let me know and I will give it another shot.
Mongo has database-wide read/write locks. It gives preference to writes, then fulfills reads on the same collection. So if, by chance, Bill is writing to the db while Joe is reading at the same time, Bill's write executes first while Joe waits, and then Joe is given all the data (including Bill's write).

Receiving notifications of child_added from that moment on

I am using Firebase both on the browser client side and on the server side (Node.js) in order to make the site SEO friendly. So I render a list of items on the server side, and then listen for new updates on the client side.
I was not able to find a proper way to chain on('child_added') so that it starts listening for child_added notifications only from a specific ID onward.
With endAt() or startAt(), these functions are endpoint inclusive, and so they return the last child as well.
Some other answers suggest using endAt().limit(1), but this seems flawed, as multiple child_added events could fall in the gap. Also, endAt() is inclusive, so endAt().limit(1) still returns the last added child.
You could do a once('value'), then start the .on('child_added') listener in the once() callback, and compare the names of new children with those read initially in the once() call (sketched below).
This is a little ugly, though, and requires keeping a bunch of child names in memory. If you describe your use case in a little more detail, it might help people think of a better solution. How is your site designed such that you need to load only future additions and can't throw out the query endpoints?
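A sketch with the Firebase JS API of that era (snapshot.name() was later renamed key(); itemsRef and renderNewItem are hypothetical):

var seen = {};
itemsRef.once('value', function (snapshot) {
  snapshot.forEach(function (child) {
    seen[child.name()] = true; // remember everything already rendered server-side
  });
  itemsRef.on('child_added', function (child) {
    if (seen[child.name()]) return; // replayed existing child: skip it
    renderNewItem(child.val());     // genuinely new child
  });
});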
You could push() your own dummy data, then startAt(null, the-name-of-the-dummy).on('child_added'), and throw out the first result.
