I'm developing a client-server real-time program and I need my server to be up to date at all times. Until now I have been issuing a GET request to the server every X seconds that returns all the entities from MongoDB.
Now I have a large number of entities and I need to GET only the entities that have been updated since the last GET request.
I thought about keeping a sequence number in the db for each entity and checking every X seconds whether the sequence has increased.
But I would prefer a better way.
Is there any way to get only the recent changes from Mongo? Or a nicer architecture?
You can have a last-updated time in the collection. On the client side, you can maintain a last-get time.
In the subsequent requests, get all the documents from the collection where the last-updated time is greater than the last-get time. This way you will get the documents that were updated or inserted since you last got the data (i.e. the delta).
Edit:
MongoDB stores date objects in UTC. As long as the date on the client side is maintained in UTC and sent back in the subsequent request, this will retrieve the latest updated records.
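A minimal sketch of that delta query (Node.js MongoDB driver; the lastUpdated field name and the client-side bookkeeping are assumptions, not part of the original answer):

    // Hypothetical field/variable names; adjust to your schema.
    let lastGetTime = new Date(0); // epoch, so the first request returns everything

    async function fetchDelta(collection) {
      // Only documents updated or inserted since the previous fetch.
      const delta = await collection
        .find({ lastUpdated: { $gt: lastGetTime } })
        .toArray();

      lastGetTime = new Date(); // dates are stored and compared in UTC
      return delta;
    }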
I want to create a ticketing system where a ticket gets cancelled after a given period of time. For deleting it after some time I am going to use MongoDB's TTL index feature. But before it expires, or after the expiry of that particular ticket, I want to retrieve it or save it in a different collection for the future. Is this possible using MongoDB?
In the current version of MongoDB, 5.0.8 as I'm writing this, it's not directly supported, but it will be in MongoDB 6.0 (cf. this Jira ticket), and there is a workaround that you can use in the meantime (keep reading!).
Let me explain. What you are trying to do is:
set up a TTL index that will automatically remove the docs in your MongoDB collection once their time has passed by X seconds.
set up a change stream on this collection with a filter that keeps only delete operations.
In 5.0.8, this change stream event will only contain the _id field of the deleted document, and nothing else, as that's the only information currently available in the oplog.
In 6.0, you will be able to access the previous state of this document (i.e. its last state before being deleted).
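A minimal sketch of those two steps (Node.js MongoDB driver; the tickets collection, expireAt field, and connection string are placeholders):

    const { MongoClient } = require('mongodb');

    async function watchExpiredTickets() {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const tickets = client.db('app').collection('tickets');

      // 1. TTL index: a doc is removed once its `expireAt` date is in the past.
      await tickets.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 });

      // 2. Change stream filtered down to delete events only.
      const changeStream = tickets.watch([
        { $match: { operationType: 'delete' } }
      ]);

      changeStream.on('change', (event) => {
        // In 5.0.x only the _id of the deleted document is available here;
        // 6.0 can also expose the document's pre-delete state.
        console.log('Ticket removed:', event.documentKey._id);
      });
    }

    watchExpiredTickets().catch(console.error);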
That being said, there is a workaround that Pavel Duchovny explained in his blog post. You can easily adapt his notification system to achieve your desired behaviour.
I have a node app that is retrieving records from an API and upserting them to a MySQL DB.
The app will unconditionally pull all records from the last 6 months from the API (this will be run on a nightly basis). Let's say ~20k records will be downloaded every day.
Most of these 20k will probably be unaltered. Only a small amount may actually have changes in their rows.
Regardless, performing an INSERT INTO ... ON DUPLICATE KEY UPDATE ... will upsert all 20k rows in the DB (not just the altered records). I can confirm this by printing out result.affectedRows in the callback and verifying it matches the count from the API. This isn't particularly a problem, as it only takes a few seconds to upsert all rows.
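For reference, a hedged sketch of that bulk upsert (using the mysql npm package; the table and column names are placeholders, not the actual schema):

    // `rows` is the array of ~20k records pulled from the API.
    const sql = `
      INSERT INTO records (id, name, deleted)
      VALUES ?
      ON DUPLICATE KEY UPDATE
        name = VALUES(name),
        deleted = VALUES(deleted)`;

    connection.query(sql, [rows.map(r => [r.id, r.name, r.deleted])], (err, result) => {
      if (err) throw err;
      console.log('affectedRows:', result.affectedRows);
    });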
However, after this, I want to retrieve all rows where a deleted field changed from false to true. So, let's say only 1 in 50 of the upserted rows actually had this change. I then need to somehow detect those rows and delete them from another application (using this other app's REST-based API). So in a nutshell, this node app is a broker to synchronize records between 2 different apps.
I was thinking I could loop through all 20k, find their respective records in the DB, and only upsert if there was a change? This is obviously inefficient and won't scale because of the number of DB hits.
What kind of pattern can I use here to detect this? I'd rather not loop through every record.
I am doing practice interviews and specifically prepping for the Design portion. One mentions:
Design a weather widget that pulls data from a service API which makes data available every hour. Avoid pulling the data from it all the time if there are no changes. Then, what happens when you scale this to lots of users?
My first thought would obviously be to create a function that fetches the data from the GET endpoint and then parses the JSON.
The part that would throw me off, though, is: "Avoid pulling the data from it all the time if there are no changes". How can I know there are no changes without first pulling the data? My only thought would be to create an ignore flag:
Pull the data and note the temperature, say 55 degrees. Create a flag that ignores values within +/- 3 degrees of this temperature.
Next hour, pull the data and see the temperature is 56 degrees. That is within the ignore range (e.g. if (Math.abs(temperature - nextTemp) <= 3) { ignoreFor5Hours = true; }). This then stops the hourly pulling for 5 hours, or however long someone sets it to.
Does this make sense or am I thinking about this the wrong way?
Assuming the data is not updated regularly
It sounds quite confusing, as there is no way for the client side to know in advance whether the data has been updated without pulling it from the server.
One way I would suggest is to use two-way communication, such as socket.io. That is, you establish a connection to the server, and once there is an update, the server can initiate a call to your client app to fetch the data.
Another way is to use long polling, or, just like your interval fetching, to pull a hash from the server and check whether the hash has changed. This is also not ideal, as you still load your server with a hanging or repeated request, but at least the data traffic will be smaller.
These methods are obviously not optimal, but if you must follow the guideline literally, those can be your options.
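A minimal sketch of the hash-check polling mentioned above (the endpoints, the hash field, and the render function are placeholders, not a real API):

    let lastHash = null;

    async function pollWeather() {
      // Lightweight endpoint that returns only a hash/version of the weather data.
      const { hash } = await fetch('/api/weather/hash').then((res) => res.json());

      if (hash !== lastHash) {
        lastHash = hash;
        // Only pull the full payload when something actually changed.
        const weather = await fetch('/api/weather').then((res) => res.json());
        render(weather);
      }
    }

    setInterval(pollWeather, 60 * 60 * 1000); // once per hour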
If the data is updated regularly
Go with the caching option provided by Phil
I would do nothing special at all.
It should be up to the API to specify the appropriate Cache-Control headers. The browser will handle the caching, so subsequent fetches will use the cache if applicable.
The API knows how fresh its data is, and knows when it expects to be updated. It's also the case with weather that certain weather patterns change faster than others. That's why it's up to the API to decide what to do.
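A minimal sketch of what that looks like on the API side (Express-style; the route and the one-hour max-age are illustrative assumptions):

    const express = require('express');
    const app = express();

    const currentWeather = { tempF: 55 }; // placeholder payload

    app.get('/api/weather', (req, res) => {
      // Tell clients and intermediaries how long this response may be reused from cache.
      res.set('Cache-Control', 'public, max-age=3600'); // the data refreshes hourly
      res.json(currentWeather);
    });

    app.listen(3000);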
Which is the best approach for an app that needs to filter data, say 5000+ records, with response speed as the main focus?
Filter local in-memory arrays
Query the db through HTTP API request calls
For my app I use AngularJS, PHP and SQLite3. Right now I load all records from the SQLite db into my table and then filter them by search. It all works great, but when I exceed 3000 records I notice a certain slowdown. By limiting the search to two fields, I get better performance.
My doubt is whether changing the model and querying the db instead would give me better performance or not.
Local array advantages
I can use the JavaScript Array map() method
low data bandwidth consumption
I can see all records in the table before filtering
I can keep working offline once the data has loaded.
Local array disadvantages
performance slows down above roughly 2000 records.
So can you help me weigh the advantages and disadvantages of making an HTTP API call for every filter request, keeping performance in focus?
I can't speak to caching in PHP, but for the AngularJS end, there's an approach you can follow:
1. When the user searches for the first time, fetch the result(s) from the db.
2. Make 2 copies of the data: one presented to the user directly, the other stored in a local JSON with a key-value-pair approach.
3. Next time the user searches for anything, look into the local JSON first for the result. If the data is present locally, there's no need for the db query; otherwise make the db query and repeat step 2.
The idea is to not make the user wait for every search: you cannot simply pull all 5000+ records at once and store them locally, and you definitely cannot make db queries every time, since an RDBMS with that many records will run into performance issues.
So this seems best to me.
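A minimal sketch of that local-cache lookup (AngularJS-style; $http and $q are assumed to be injected, and the endpoint and parameter names are placeholders):

    // Keyed by search term; lives for the lifetime of the page.
    var searchCache = {};

    function search(term) {
      if (searchCache[term]) {
        // Cache hit: no db query needed.
        return $q.resolve(searchCache[term]);
      }
      // Cache miss: query the backend, then remember the result for next time.
      return $http.get('/api/records', { params: { q: term } })
        .then(function (response) {
          searchCache[term] = response.data;
          return response.data;
        });
    }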
I have an app where I'm updating data from various APIs and storing it locally. The data in each API is updated at different intervals...3 seconds, 15 seconds, and 30+ seconds.
At the moment, I'm updating each API and then setting a setTimeout to schedule the next update. It works...but is this optimal?
Another option I've considered is to include a field named nextUpdate in my database model that holds a Number (Unix timestamp), and then query the database once per second for any objects that are due to update, with mongoose, such as .find({ nextUpdate: { $lte: Date.now() / 1000 } }). My concern was that this would cause too many unnecessary calls (and frankly this is my first app, so I don't know how many Mongo requests per second is considered too much). I'm currently using mLab as my database host.
So would you continue using setTimeout? The database refresh option I've proposed above? Or another solution?
Thanks in advance for your time and advice.
I would keep using the first approach, though setInterval would be more fitting here.
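A minimal sketch of what that could look like (the update functions and intervals are placeholders for your actual API refresh logic):

    // One timer per API, each on its own refresh interval.
    const sources = [
      { name: 'fastApi', intervalMs: 3 * 1000, update: updateFastApi },
      { name: 'mediumApi', intervalMs: 15 * 1000, update: updateMediumApi },
      { name: 'slowApi', intervalMs: 30 * 1000, update: updateSlowApi },
    ];

    for (const source of sources) {
      setInterval(() => {
        source.update().catch((err) => console.error(source.name + ' failed:', err));
      }, source.intervalMs);
    }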
The second approach seems like it only has downsides to it, unless I'm missing something?
(I would have liked to post this as a comment but cannot post comments yet)