I have an app where I'm updating data from various APIs and storing it locally. The data in each API is updated at different intervals...3 seconds, 15 seconds, and 30+ seconds.
At the moment, I'm updating each API and then setting a setTimeout to schedule the next update. It works...but is this optimal?
Another option I've considered is to add a field named nextUpdate to my database model that stores a Number (Unix timestamp), and then query the database once per second with mongoose for any objects that are due to update, e.g. .find({ nextUpdate: { $lte: Date.now() / 1000 } }). My concern is that this would cause too many unnecessary calls (and frankly, this is my first app, so I don't know how many Mongo requests per second is considered too many). I'm currently using mLab as my database host.
So would you continue using setTimeout? The database refresh option I've proposed above? Or another solution?
Thanks in advance for your time and advice.
I would keep using the first approach, though setInterval would be more fitting here.
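For example, a minimal sketch of that approach; the three update functions are placeholders for whatever fetch-and-store logic you already have:

// Hypothetical fetch-and-store functions; replace the bodies with your own API + Mongo logic.
async function updateFastApi() { /* fetch + save */ }
async function updateMediumApi() { /* fetch + save */ }
async function updateSlowApi() { /* fetch + save */ }

// One interval per API, matching each API's refresh rate.
setInterval(() => updateFastApi().catch(console.error), 3 * 1000);
setInterval(() => updateMediumApi().catch(console.error), 15 * 1000);
setInterval(() => updateSlowApi().catch(console.error), 30 * 1000);

One thing the setTimeout version does give you is that a slow request can never overlap the next tick, so if an API is occasionally slower than its interval, rescheduling from inside the handler is still a reasonable choice.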
The second approach seems like it only has downsides to it, unless I'm missing something?
(I would have liked to post this as a comment but cannot post comments yet)
I'm building an app using Algolia and Firebase.
To use Algolia, I'm using the following Firebase extension so that every time data is added, edited, or updated in Firestore, the same record is kept in Algolia.
Everything works well, except it takes around 1-2 minutes to store the record in Algolia (the index currently has around 15,000+ records).
I'm currently reading and displaying data straight from Algolia and updating it whenever the data changes. However, it seems absurd to make the user wait 1-2 minutes before they can finally see the updated details.
I'm using Algolia because I need more flexible search options plus page-offset pagination. If I could do that with just Firestore, I would happily read data straight from Firestore.
But since that's not possible, can anyone see a better option?
I've already tried writing a custom trigger instead of the extension but the speed seems to be the same.
That seems to be the expected behaviour. Algolia's documentation has a dedicated page for that topic:
Why aren't my objects immediately available for search after being added?
When you add or update a record, our servers reply to your request as soon as they understand the operation, but the actual indexing starts a few seconds later, asynchronously.
They also say it may take a few seconds (or minutes) for the new documents to become available. You can read more in "How fast is the indexing?", which says indexing may be slower on shared clusters and that upgrading to a dedicated cluster may help. That being said, the time taken for indexing does not depend on whether you use the extension, a custom Cloud Function, or your own servers to add data to Algolia.
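If the goal is only to know when a record has actually become searchable (you can't make the indexing itself faster), the v4 JavaScript client can wait for the indexing task to finish. A minimal sketch, assuming an index named 'books' and placeholder credentials:

// algoliasearch v4; APP_ID / ADMIN_API_KEY / 'books' are placeholders.
const algoliasearch = require('algoliasearch');
const client = algoliasearch('APP_ID', 'ADMIN_API_KEY');
const index = client.initIndex('books');

async function saveAndWait(record) {
  // record must contain an objectID; .wait() resolves only once Algolia
  // reports the indexing task as done, i.e. the record is searchable.
  await index.saveObjects([record]).wait();
}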
I am doing practice interviews and specifically prepping for the Design portion. One mentions:
Design a weather widget that pulls data from a service API which makes data available every hour. Avoid pulling the data from it all the time if there are no changes. Then, what happens when you scale this to lots of users?
My first thought would obviously be to create a function that fetches the data from the GET endpoint and then parses the JSON.
The part that would throw me off, though, is: "Avoid pulling the data from it all the time if there are no changes". How can I know there are no changes without first pulling the data? My only thought would be to create an ignore flag:
Pull the data and note the temperature as 55 degrees. Create a flag that ignores values within +/- 3 degrees of this temperature.
Next hour, pull the data and see the temperature is 56 degrees. That is within the ignore range (e.g. if (Math.abs(temperature - nextTemp) <= 3) { ignoreFor5Hours = true; }). This would then stop the hourly pulling for 5 hours, or however long it is set to.
Does this make sense or am I thinking about this the wrong way?
Assuming the data is not updated regularly
It sounds quite confusing, as there is no way for the client side to actively know whether the data has been updated without pulling it from the server.
One way I would suggest is to use two-way communication, such as socket.io. That is, you establish a connection to the server, and once there is an update, the server can push a message to your client app telling it to fetch the data.
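A minimal sketch of that push model with socket.io; the event name and the onNewWeatherData hook are made up for illustration:

// server.js (socket.io v4)
const { Server } = require('socket.io');
const io = new Server(3000, { cors: { origin: '*' } });

function onNewWeatherData(data) {
  io.emit('weather:update', data); // push to every connected widget
}

// client.js (socket.io-client)
const { io: connect } = require('socket.io-client');
const socket = connect('http://localhost:3000');
socket.on('weather:update', (data) => {
  console.log('new weather', data); // re-render the widget only when told to
});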
Another way is to use long polling, or, just like your interval fetching, to pull a hash from the server and check whether it has changed. This is also not ideal, as you still load your server with a hanging request, but at least the data traffic will be smaller.
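A minimal sketch of the hash-check variant, assuming the server exposes a lightweight /weather/hash endpoint next to the full /weather one (both endpoints and renderWidget are assumptions):

let lastHash = null;

// Poll a tiny hash endpoint; only download the full payload when the hash changes.
setInterval(async () => {
  const { hash } = await (await fetch('/weather/hash')).json();
  if (hash !== lastHash) {
    lastHash = hash;
    const data = await (await fetch('/weather')).json();
    renderWidget(data); // hypothetical render function
  }
}, 60 * 60 * 1000); // the API only publishes hourly, so poll hourly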
These methods are obviously not optimal, but if you must follow the guideline literally, those can be your options.
If the data is updated regularly
Go with the caching option provided by Phil.
I would do nothing special at all.
It should be up to the API to specify the appropriate Cache-Control headers. The browser will handle the caching, so subsequent fetches will use the cache if applicable.
The API knows how fresh its data is, and knows when it expects to be updated. It's also the case with weather that certain weather patterns change faster than others. That's why it's up to the API to decide what to do.
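For illustration, if you did control the API, this is roughly what that looks like with Express (route name and max-age are assumptions):

const express = require('express');
const app = express();

// Stand-in for the real data source.
const getCurrentWeather = () => ({ tempF: 55, updatedAt: new Date().toISOString() });

app.get('/weather', (req, res) => {
  // Tell browsers (and shared caches) the response is good for an hour.
  res.set('Cache-Control', 'public, max-age=3600');
  res.json(getCurrentWeather());
});

app.listen(3000);

On the client nothing changes: repeat fetch('/weather') calls within that hour are answered from the browser's HTTP cache instead of hitting the API.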
I have a Node.js application that saves form data in a MongoDB collection. I want to run a function that changes some values of the stored object 2 days after the form data is saved. I'm not sure if it's possible to use setTimeout() for 2 days, and even if it is, I think there must be a better way to do this, because the form data is saved in response to a request.
My alternative solution is a setInterval that runs once a day to check whether there are any database items whose dates have passed, but I'm still looking for a better solution.
Thanks for your time!
It is absolutely possible with setTimeout and setInterval, but it would be better to use a Node package like node-cron or node-schedule. (A two-day setTimeout is also lost if the process restarts, which is another reason to prefer a scheduled job that checks the database.)
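A minimal sketch with node-cron and mongoose; the model, field names and schedule are assumptions:

const cron = require('node-cron');
const Submission = require('./models/submission'); // hypothetical mongoose model

// Run once a day at midnight; update every document saved 2+ days ago.
cron.schedule('0 0 * * *', async () => {
  const twoDaysAgo = new Date(Date.now() - 2 * 24 * 60 * 60 * 1000);
  await Submission.updateMany(
    { createdAt: { $lte: twoDaysAgo }, processed: { $ne: true } }, // 'processed' prevents re-running
    { $set: { processed: true, status: 'expired' } }               // whatever values you need to change
  );
});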
This is more of an architectural question. An external platform has product and price information for, let's say, books. There is an API available to get this information.
What I've read is that it should be possible to create a function in JavaScript and connect it to a page on my own website where I want to show the data. This would mean that an API call is made for each page request. Since the requested information changes at most once a day, this doesn't sound like the most efficient solution.
Can someone advise a better solution? Something in the direction of a PHP or JavaScript function that does the request in the background, schedules an update, and imports the data into MySQL? If so, what language would be most common?
I need the solution for a Joomla/PHP/MySQL environment.
Here's a simple idea: fetch and store results from the API (the ones you think aren't going to change within a day), either on disk or in the database, and later use these stored results instead of fetching from the API again.
Since storing anything in frontend JS across page reloads isn't easy, you need to make use of PHP for that. Based on what's given, you seem to have two ways of calling the API:
via the frontend JS (no-go)
via your PHP backend (good-to-go)
Now, you need to make sure your results are synced every (say) 24 hours.
Add a snippet to your PHP code that contains a variable $lastUpdated (or something similar), and assign it the "static" value of the current time (NOT using time()). Now, add a couple of statements to update the stored results if the current time is at least 24 hours greater than $lastUpdated, followed by updating $lastUpdated to current time.
This should give you what you need with one API call per day.
PS: I'm not an expert in PHP, but you can surely figure out the datetime stuff.
It sounds like you need a cache, and you're not the first person to run into that problem - so you probably don't need to reinvent the wheel and build your own.
Look into something like Redis. There's an article on it available here as well: https://www.compose.com/articles/api-caching-with-redis-and-nodejs/
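A minimal sketch of that pattern with the node-redis v4 client, following the Node.js angle of the linked article (the key name, TTL and fetchBooksFromApi are assumptions; in a Joomla setup the same idea applies from PHP):

const { createClient } = require('redis');
const redis = createClient();

// Placeholder for the call to the external platform's API.
async function fetchBooksFromApi() { return []; }

async function getBooks() {
  if (!redis.isOpen) await redis.connect();

  const cached = await redis.get('books');
  if (cached) return JSON.parse(cached); // cache hit: no API call made

  const books = await fetchBooksFromApi();
  await redis.set('books', JSON.stringify(books), { EX: 24 * 60 * 60 }); // expire after a day
  return books;
}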
I am building a project using Node.js that is integrated with 4 other systems, which keep sending sensor data every second. I'm trying to build a timeline, so I need to save that data, but I don't feel it's right to run a couple of insert statements every second.
What is the best way to save data that arrives this frequently? I was thinking about writing to log files and then inserting the data in bulk. Any suggestions?
Thank you.
That would be a premature optimization. I've benchmarked PostgreSQL under Node.js many times, and inserting several records per second takes under 10ms, i.e. less than 1% of your app's load if you do it once a second.
The only worthwhile optimization you should do from the start is to use multi-row inserts, even if you insert only 2 rows at a time. The reasons for this are as follows:
Node.js IO is a valuable resource, so the fewer round trips you do the better
Multi-row inserts are tremendously faster than separate insert queries
Separate inserts typically require a transaction, and a single multi-row insert doesn't.
You can find a good example here: Multi-row insert with pg-promise.
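A minimal sketch of such a multi-row insert with pg-promise; the connection string, table and column names are assumptions:

const pgp = require('pg-promise')();
const db = pgp('postgres://user:password@localhost:5432/sensors'); // placeholder connection string

// Describe the target table once and reuse it for every batch.
const cs = new pgp.helpers.ColumnSet(['sensor_id', 'value', 'recorded_at'], { table: 'readings' });

async function saveBatch(rows) {
  // rows = [{ sensor_id, value, recorded_at }, ...] collected over the last second
  const query = pgp.helpers.insert(rows, cs); // one INSERT with multiple VALUES tuples
  await db.none(query);
}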