I am a newbie to MongoDB.
Please help me with the following query:
I am tailing the oplog to add triggers for MongoDB operations. For insert/update operations, I receive complete information about all the fields added/updated in the collection.
My problem is:
When I do a delete operation in MongoDB, the oplog entry I receive contains ONLY the object_id.
Can somebody please point me to an example where I can receive complete information about all the fields of the deleted document in the trigger?
Thanks
You have to fetch that document by its ObjectId, which will not be possible on the node you are tailing the oplog from: by the time you have received the delete operation from the oplog, the document is already gone. That, I believe, leaves you with two choices:
Make sure that all deletes are preceded by an update operation which allows you to see the document fields you require prior to deletion (this will make deletes more expensive, of course; see the sketch at the end of this answer)
Run a secondary with a slave delay and then query that node for the document that has been deleted (either directly or by using tags).
For option 2, the issue is choosing a delay that is long enough to guarantee that you can fetch the document, yet short enough to make sure you are getting an up-to-date version of it. Unless you add versioning to the document as a check (which then gets similar to option 1, since you would likely want to update the version before deleting), this would have to be essentially an optimistic, best-effort solution.
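For option 1, a rough sketch (assuming the Node.js driver; the database, collection, and field names here are illustrative) could look like the following. A full-document replace is recorded in the oplog with the complete document, so the tailer sees every field in the update entry that immediately precedes the delete:

```typescript
// Sketch of option 1 (update before delete) with the MongoDB Node.js driver.
// "app", "orders", and "_pendingDelete" are illustrative names.
import { MongoClient, ObjectId } from "mongodb";

async function deleteWithSnapshot(uri: string, id: ObjectId): Promise<void> {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const coll = client.db("app").collection("orders");

    // Read the current state and write it back as a full-document replace
    // with a tombstone marker; the oplog entry for this update carries the
    // complete document.
    const doc = await coll.findOne({ _id: id });
    if (!doc) return;
    await coll.replaceOne({ _id: id }, { ...doc, _pendingDelete: true });

    // Now perform the actual delete; the tailer has already seen the full
    // document in the preceding update entry.
    await coll.deleteOne({ _id: id });
  } finally {
    await client.close();
  }
}
```

Note that this roughly doubles the write cost of every delete, which is exactly the trade-off mentioned above.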
Related
I want to create a ticketing system where a ticket gets cancelled after a given period of time. To delete tickets after some time, I am going to use MongoDB's TTL indexing feature. But before a ticket expires (or right after its expiry), I want to retrieve it and save it in a different collection for future use. Is this possible using MongoDB?
In the current version of MongoDB, 5.0.8 as I'm writing this, it's not directly supported, but it will be in MongoDB 6.0 (cf. this Jira ticket), and there is a workaround that you can use in the meantime (keep reading!).
Let me explain. What you are trying to do is:
set up a TTL index that will automatically remove the docs in your MongoDB collection once their date field is more than X seconds old.
set up a Change Streams on this collection with a filter set to only keep delete operations.
In 5.0.8, this change stream event will only contain the _id field of the deleted document, and nothing else, as that's the only information currently available in the oplog.
In 6.0, you will be able to access the previous state of this document (i.e., its last state before being deleted).
That being said, there is a workaround that Pavel Duchovny explained in his blog post. You can easily adapt his notification system to achieve your desired behaviour.
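Once you are on 6.0, a rough sketch of the TTL-plus-change-stream setup (assuming the Node.js driver; the database, collection, and field names are illustrative) could look like this:

```typescript
// Sketch for MongoDB 6.0+: TTL index plus a change stream with pre-images.
// "ticketing", "tickets", "expiredTickets", and "expireAt" are illustrative.
import { MongoClient } from "mongodb";

async function archiveExpiredTickets(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db("ticketing");

  // Enable pre-images so delete events carry the document's last state.
  await db.command({
    collMod: "tickets",
    changeStreamPreAndPostImages: { enabled: true },
  });

  const tickets = db.collection("tickets");
  const archive = db.collection("expiredTickets");

  // TTL index: documents are removed once their `expireAt` date has passed.
  await tickets.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 });

  // Watch only delete operations and ask for the document's pre-image.
  const stream = tickets.watch(
    [{ $match: { operationType: "delete" } }],
    { fullDocumentBeforeChange: "whenAvailable" }
  );

  for await (const change of stream) {
    if (change.operationType === "delete" && change.fullDocumentBeforeChange) {
      // Save the ticket's last state into the archive collection.
      await archive.insertOne(change.fullDocumentBeforeChange);
    }
  }
}
```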
Does anyone know how to implement e.g. a post counter in MongoDB? I think I would do:
accept /post with data
get mongo collection.count
add this custom id as {id: collection.count + 1}
But now I don't really know what will happen if 2 /post requests come at the same time. Will they be queued in the DB? Or will they end up with the same fake id?
You can set the id field as unique to handle the case where 2 posts come at the same time... but you will lose one document, since MongoDB won't allow another document with the same id.
You will have to write another piece of code to handle that case, which will be much more complex than a simple create operation.
Ideally, you shouldn't use a separate field in this way, unless it is required by your application and there is no alternative.
Here are a few caveats of using this approach:
For each post received, there are 2 DB operations being performed.
To prevent the loss of documents when the id field is set to unique, you will have to add another block of code, which might need a third DB call before the document is finally stored in the database.
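For reference, a rough sketch of that extra handling code (assuming the Node.js driver, an illustrative collection name, and a unique index on `id`) might look like this:

```typescript
// Sketch of the retry logic needed when the counter-based `id` collides.
// Assumes a unique index on `id`; collection and field names are illustrative.
import { Collection, MongoServerError } from "mongodb";

async function insertPost(posts: Collection, data: Record<string, unknown>): Promise<void> {
  // Keep retrying while another concurrent insert grabs the same id.
  for (;;) {
    const count = await posts.countDocuments();          // DB call 1
    try {
      await posts.insertOne({ id: count + 1, ...data }); // DB call 2
      return;
    } catch (err) {
      // 11000 = duplicate key: another request inserted id = count + 1 first.
      if (err instanceof MongoServerError && err.code === 11000) continue;
      throw err;
    }
  }
}
```

This shows why each post can end up costing two or three DB calls, which is exactly the overhead listed in the caveats above.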
Bottom line: Always use _id unless you have a reason not to do so.
I have a large collection of documents on Cloud Firestore, and each document is quite big too. I need to download them all to the front-end application, to only read one attribute from each (location).
Retrieving all the documents would use a lot of bandwidth, and computers with a slow internet connection would take 10-30 seconds to download them. I need this to be done quicker, so I was thinking of using a SELECT query to get only the location attribute, but my question is: is the whole document still downloaded to the front end and the unwanted attributes sliced off there, or am I getting only the location from the backend?
If the latter were the case, then the time it takes to get all the documents would be less, as each document would be a lot smaller (as only the location is retrieved). Could anyone confirm how that works?
If anyone has any other ideas of how to approach this, it would be great.
Thanks,
Carlino
Is the whole document still downloaded to the front end and the unwanted attributes sliced off there, or am I getting only the location from the backend?
Yes, the entire document is downloaded, and it is not sliced in any way. Cloud Firestore listeners fire at the document level. There is no way to be triggered with just particular fields of a document, or to split the document to get only one property. It's the entire document, or nothing, so the Firestore client-side SDKs always return complete documents. Unfortunately, there is no way to request only a part of the document with the client-side SDKs, although this option does exist in the server-side SDK's select() method.
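If you have a backend (or Cloud Functions) available, a minimal sketch of that server-side select() (assuming the Node.js Admin SDK; collection and field names are illustrative) would be:

```typescript
// Sketch of the server-side select() mentioned above, using the Firebase
// Admin SDK for Node.js. "places" and "location" are illustrative names.
import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";

initializeApp();
const db = getFirestore();

async function getLocations(): Promise<unknown[]> {
  // Only the `location` field of each document is returned by the backend.
  const snapshot = await db.collection("places").select("location").get();
  return snapshot.docs.map((doc) => doc.get("location"));
}
```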
If the latter were the case, then the time it takes to get all the documents would be less, as each document would be a lot smaller (as only the location is retrieved). Could anyone confirm how that works?
It is not the case since you cannot get only a single property of a document.
If anyone has any other ideas of how to approach this, it would be great.
The common approach in this case is to denormalize the data. This means that you should create a new collection in which you store the same documents, but those documents will contain only that one property. In that case the size of each document will be very small.
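A rough sketch of that denormalization (assuming the modular web SDK; collection names are illustrative) could be a batched write that mirrors just the location whenever a place is saved:

```typescript
// Sketch of the denormalization idea: mirror only the `location` field into
// a lightweight collection that the front end can download cheaply.
// "places" and "placeLocations" are illustrative names.
import { getFirestore, doc, writeBatch } from "firebase/firestore";

async function savePlace(
  id: string,
  place: { name: string; location: string }
): Promise<void> {
  const db = getFirestore();
  const batch = writeBatch(db);

  // The full document goes into the main collection.
  batch.set(doc(db, "places", id), place);
  // Only the location is duplicated into the small, cheap-to-read collection.
  batch.set(doc(db, "placeLocations", id), { location: place.location });

  await batch.commit();
}
```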
Trying to count how many child nodes a particular node contains in my database.
I am planning on having many users, and I want the experience to be as fast as possible, so I know I don't want to download the parent node and count its children.
I've thought of simply having a counter field stored, and every time a user does something to add to that parent, also incrementing that value... however, I am pretty inexperienced with this and am worried
that somehow two users adding something at the same time will cause that value to be incorrect... which, from my reading, is what a transaction operation is created for.
I remember when I used to use Parse a while ago, there was something called CloudCode that would constantly run on the server, and in particular I would use it for maintenance operations on the database.
Would running a transaction operation be the solution here? Curious to hear how others handle stuff like this... do they have some sort of monitoring server maintaining it?
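For what it's worth, a minimal sketch of the transaction-based counter described above (assuming the Firebase Realtime Database web SDK; the path is illustrative) would look like this:

```typescript
// Sketch of a transaction-based child counter for the Realtime Database.
// "parents/{id}/childCount" is an illustrative path.
import { getDatabase, ref, runTransaction } from "firebase/database";

async function incrementChildCount(parentId: string): Promise<void> {
  const db = getDatabase();
  const counterRef = ref(db, `parents/${parentId}/childCount`);

  // runTransaction retries automatically if another client updates the
  // counter concurrently, so two simultaneous additions cannot clobber it.
  await runTransaction(counterRef, (current) => (current ?? 0) + 1);
}
```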
I'm interested in how Google Docs stores documents on the server side, because I need to create a similar application.
Does it use pure RTF/ODF files or its own database?
How do they implement versioning and the undo/redo feature?
If anybody has any knowledge regarding this question, please share it with me.
To answer your question specifically as to how Google Docs works: they use a technology called
Operational Transformation
You may be able to use one of the operational transformation engines listed at: https://en.wikipedia.org/wiki/Operational_transform#OT_software
The basic idea is that every operation has a context, e.g. "delete the fourth word in the fifth paragraph" or "add an input box after the button". The clients all send each other operations through the server. The clients and server each keep their own version of the document and apply operations as they come in.
When operations have overlapping contexts, there are a bunch of rules that kick in to resolve conflicts. Like you can't modify something that's been deleted, so the delete must come last in a sequence of concurrent operations on that context.
It's possible that the various clients and server will get out of sync, so you need a secondary algorithm to maintain consistency. One way would be to reload the data from the server whenever a conflict is detected.
--This is an answer I got from a professor when I asked the same thing a couple of years ago.
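To make the transformation idea a bit more concrete, here is a toy sketch of the rule for two concurrent text insertions; real OT engines handle many more operation types and edge cases, and all names here are illustrative:

```typescript
// Toy illustration of transforming one insert against another concurrent
// insert; this is only a sketch of the general idea, not a full OT engine.
interface Insert {
  pos: number;  // index in the document where text is inserted
  text: string;
}

// Transform `b` so it can be applied after `a` has already been applied.
function transformInsert(a: Insert, b: Insert): Insert {
  // If a's insertion happened at or before b's position, b's position
  // shifts right by the length of a's inserted text.
  if (a.pos <= b.pos) {
    return { ...b, pos: b.pos + a.text.length };
  }
  return b;
}

// Example: A inserts " there" at index 5, B concurrently inserts "!" at 5.
// After transformation, B's insert lands at index 11, after A's text.
console.log(transformInsert({ pos: 5, text: " there" }, { pos: 5, text: "!" }));
// { pos: 11, text: "!" }
```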
You should use a database. Perhaps a table storing each document revision. First, find a way to determine whether an update is significant or not. You can store minor changes client side for redo/undo, and then, either periodically or per some condition (e.g., user hits save), create a database entry per revision (you can store things like bytes changed, bytes added, bytes deleted, etc.).
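As a rough illustration of such a revision record (the field names and the way sizes are computed here are purely illustrative), something like this could back each database entry:

```typescript
// Illustrative shape of a per-revision record, along the lines suggested
// above; "bytes" here are approximated by character counts.
interface DocumentRevision {
  documentId: string;
  revision: number;   // monotonically increasing per document
  createdAt: Date;
  bytesAdded: number;
  bytesDeleted: number;
  content: string;    // or a delta against the previous revision
}

// A new revision is created when the user hits save (or periodically),
// while minor keystroke-level changes stay client side for undo/redo.
function makeRevision(prev: DocumentRevision | null, content: string): DocumentRevision {
  const prevLength = prev?.content.length ?? 0;
  return {
    documentId: prev?.documentId ?? "doc-1",
    revision: (prev?.revision ?? 0) + 1,
    createdAt: new Date(),
    bytesAdded: Math.max(0, content.length - prevLength),
    bytesDeleted: Math.max(0, prevLength - content.length),
    content,
  };
}
```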
Take a look at MediaWiki, which is open source, and essentially does what you're asking (i.e., take a look at their tables and code).
RTF/ODF would typically be generated and served when a user requests an export of the document.
Possibly, you should consider utilizing Google Drive's public API. See link for details.