architecture for get and store api request data

architecture for get and store api request data - javascript

This is more of a architectural questions. An external platform had product and price information for let's say, books. There is an API available to get this information.
What I read is that it should be possible to create a function in Javascript and connect the Javascript to a page where you want to show the data on my own website. This would mean that for each page request an API-call is made. Since the requested information only changes once a day maximum this does not sound the most efficient solution.
Can someone advise a better solution? Something into the direction of a similar php or javascript function that does the request on the background, schedule an update and import the data into mysql? If so, what language would be most common.
I need the solution for a Joomla/php/mysql environment

Here's a simple idea - fetch and store results from the API (ones you think aren't gonna change in a day), either on disk, or in the database, and later use these stored results to retrieve what you otherwise would've fetched from the API.
Since storing anything in frontend JS across page reloads isn't easy, you need to make use of PHP for that. Based on what's given, you seem to have two ways of calling the API:
via the frontend JS (no-go)
via your PHP backend (good-to-go)
Now, you need to make sure your results are synced every (say) 24 hours.
Add a snippet to your PHP code that contains a variable $lastUpdated (or something similar), and assign it the "static" value of the current time (NOT using time()). Now, add a couple of statements to update the stored results if the current time is at least 24 hours greater than $lastUpdated, followed by updating $lastUpdated to current time.
This should give you what you need with one API call per day.
PS: I'm not an expert in PHP, but you can surely figure out the datetime stuff.

It sounds like you need a cache, and you're not the first person to run into that problem - so you probably don't need to reinvent the wheel and build your own.
Look into something like Redis. There's an article on it available here as well: https://www.compose.com/articles/api-caching-with-redis-and-nodejs/

Related

Javascript: Weather API design question with hourly polling

I am doing practice interviews and specifically prepping for the Design portion. One mentions:
Design a weather widget that pull data from a service API which makes data available every hour. Avoid pulling the data from it all the time if there are no changes. Then what happens when you scale this to lots of users.
My first thought would be obviously create a function that fetches the data from the GET endpoint and then parsing the JSON.
The part that would throw me off though is: "Avoid pulling the data from it all the time if there are no changes" . How can I know there are no changes without first pulling the data? My only thought would be to create a ignore flag:
Pull the data, mark the temperature as 55 Degrees. Create a flag that ignores values of +/- 3 degrees from this temperature value.
Next hour, pull the data and see the temperature is 56 Degrees. That is within the ignore flag range: (ex: if (Math.abs(temperature - nextTemp) > 3) { ignoreFor5Hours = true; } . Then this will stop the hourly pulling for 5 hours, or however long someone set it to.
Does this make sense or am I thinking about this the wrong way?

Assuming the data is not updated regularly
It sounds quite confusing as there should be no method for the client side to actively know whether there is an update on the data before pulling it from the server.
One way I would suggest is to use two-ways communication, such as socket.io. That is, you establish a connection to the server, once there is an update, the server can initialize a call to your client app for fetching the data.
Another way is to use long pulling, or just like your interval fetching, to pull a hash from the server and check if the hash changed, which is also not ideal as you have to also load your server with a hanging request, but at least the data traffic will be smaller.
These methods are obviously not optimal, but if you must follow the guideline and it means what it means, those can be your options
If the data is updated regularly
Go with the caching option provided by Phil

I would do nothing special at all.
It should be up to the API to specify the appropriate Cache-Control headers. The browser will handle the caching, so subsequent fetches will use the cache if applicable.
The API knows how fresh its data is, and knows when it expects to be updated. It's also the case with weather that certain weather patterns change faster than others. That's why it's up to the API to decide what to do.

Should I use a cache for this?

I made a code this summer holidays and today I look for the first time at my code again, and I am strugging on one thing I did.
My system is a system with multiple types (pages, newsletters etc.) and multiple subtypes (items, archive, concepts etc.). The idea now I have an object like this:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } } } }
Another example:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } }, archive: {} }, 2: { normal: {} } }
The data originally comes from the database. I'm making a system to edit pages on the website and other things like newsletters. Because I have multiple types and subtypes.
I made a cache for the reason I don't want to get all items from the database every time. But now the problem is if I add an item, edit an item and remove an item I have to delete it from the cache / edit / add.
My question: is this a good way? I thought it is because you don't have to call an AJAX file to get the data from the database.
I'm sorry if I'm not allowed to ask this here.

My question: is this a good way? I thought it is because you don't
have to call an AJAX file to get the data from the database.
The answer is that "it depends". There is no always right and always wrong answer for caching because caching is a tradeoff between efficiency and timeliness of data.
If you want maximum efficiency, you cache like crazy, but your data may not be perfectly up to date because you're using old data from the cache.
If you want the most up-to-date data, you don't cache anything so you always get the latest data, but obviously efficiency may suffer if you are regular requesting the same data over and over.
So, it's a tradeoff and the tradeoff depends entirely upon the application, its needs, how often the data is modified and what the consequences are for having stale data or for not caching. There is no single right or wrong answer for that tradeoff. It depends entirely upon the particular situation for your application and the tradeoff may even be different for some types of data vs. others within the same application.
For example, let's supposed you were writing an online bidding site that offered some functionality like eBay. You would probably be fine caching the item description for at least several hours because that almost never changes and even if it does, the consequences of being a bit tardy on seeing a new item description are fairly low. But, you could never cache the data on the current bid because the timeliness of that information is critical. The user needs to always see the latest info on the current bid, even if you have to make some sacrifices in efficiency.
Also, remember that caching isn't completely all or none. You can set a lifetime for a cached value such that it can only be used for a certain period of time that is appropriate for the type of data. For example, you might cache an item description in the above auction for up to 2 hours. This allows you to achieve some efficiency gains, but also to eventually see the new data if it happens to change.
In general, you have to review the consequences of showing stale data. If the consequences for having data that is even minutes out of date are high (like the latest price in a live auction), then you can't cache that data at all.
If the consequences of having data that is even hours out of date are low, then you can likely cache that value for at least several hours - maybe even longer.
And, when considering what to cache, you obviously want to first look at the items that are most requested and are the most expensive on your server to retrieve. Some analysis of the usage pattern on your server would give you a prioritized list of candidates to consider for caching.

My question: is this a good way? I thought it is because you don't
have to call an AJAX file to get the data from the database.
This is fine if
1) You want to provide offline reading continuity to the user. User doesn't have to wait for internet connection to be available so that they can read at any time.
2) Your data-service is quite heavy and you want to avoid multiple/frequent visits to the server to get the same data over and over again.
3) You want your app to be bundled with a native package (like phonegap) to become a hybrid app and give a complete offline experience to the user.
This is not a comprehensive list, but just to get your started in terms of when to go for offline and when to keep totally offline
So, on the other hand, this is a bad idea if
1) Your local storage structure is going to change frequently for user to require re-install (unless you can figure out auto-upgrate of local storage)
2) All your features are transactional and require synch with other users also.

Nothing wrong with your approach, just make sure you have kept these points in mind while managing client-side cache
You have one variable 'version' maintained, this version is to be increased whenever there's any change in structure, this version will be sent to client every time, client is responsible for comparison of versions and empty client cache if server version is greater than client version.
You can implement or find any open-sources to handle your ajax responses, this one might be useful - https://github.com/SaneMethod/jquery-ajax-localstorage-cache.
you can set proper expiry tag from server, which can also help, browser to cache response for you, if it is 'get' request.
You can also implement server-side cache, which will not make calls to database, it will cache response against request-url, Note - if different users are supposed to receive different response than this approach wont work. You can delete the cache if any changes happens related to that particular data set - delete/update
In your case you can also maintain flags on server, which simply tells if data has been updated or not the time of article update, if stored version is older you can make server-request or just use local version.
I hope it helps.

How to keep a list of items updated on a website using Javascript?

Coming from Python/Java/PHP, I'm now building a website. On it I want a list of items to be updated in near-realtime: if items (server side) get added to or deleted from the list, this should be updated on the webpage. I made a simple API call which I now poll every second to update the list using jQuery. Because I need some more lists to be kept updated on the same page I'm afraid this will turn into more than 10 server calls per second from every single open browser, even if nothing gets updated.
This seems not like the logical way to do it, but I don't really know how else to do it. I looked at Meteor, but since the webpage I'm building is part of a bigger system I'm rather restricted in my choices of technology (basic LAMP setup).
Could anybody enlighten me with a tip from the world of real-time websites on how to efficiently keep a list updated?

You can use WebSocket(https://code.google.com/p/phpwebsocket/ ) technology.
but php is not the best language for implement it

A way to work this is using state variables for the different types of data you want to have updated (or not).
In order to avoid re-querying the full tables even if the data set in them has not changed in relation to what a particular client has displayed at any given time, you could maintain a state counter variable for the data type on the server (for example in a dedicated small table) and on the client in a javascript variable.
Whenever an update is done on the data tables on the server, you update the state counter there.
Your AJAX polling calls would then query this state counter, compare it to the corresponding javascript variable, and only do a data-update call if it has changed, updating the local javascript variable to what the server has.
In order to avoid having to poll for each datatype separately, you might want to use an JS object with a member for each datatype.
Note: yes this is all very theoretical, but, hey, so is the question ;)

which is better, searching in javascript or database?

I have a grid(employee grid) which has say 1000-2000 rows.
I display employee name and department in the grid.
When I get data for the grid, I get other detail for the employee too(Date of Birth, location,role,etc)
So the user has option to edit the employee details. when he clicks edit, I need to display other employee details in the pop up. since I have stored all the data in JavaScript, I search for the particular id and display all the details. so the code will be like
function getUserDetails(employeeId){
//i store all the employeedetails in a variable employeeInformation while getting //data for the grid.
for(var i=0;i<employeeInformation.length;i++){
if(employeeInformation[i].employeeID==employeeId){
//display employee details.
}
}
}
the second solution will be like pass employeeid to the database and get all the information for the employee. The code will be like
function getUserDetails(employeeId){
//make an ajax call to the controller which will call a procedure in the database
// to get the employee details
//then display employee details
}
So, which solution do you think will be optimal when I am handling 1000-2000 records.
I don't want to make the JavaScript heavy by storing a lot of data in the page.
UPDATED:
so one of my friend came up with a simple solution.
I am storing 4 columns for 500 rows(average). So I don't think there should not be rapid slowness in the webpage.
while loading the rows to the grid, under edit link, I give the data-rowId as an attribute so that it will be easy to retrieve the data.
say I store all the employee information in a variable called employeeInfo.
when someone clicks the edit link.. $(this).attr('data-rowId') will give the rowId and employeeInfo[$(this).attr('data-rowId')] should give all the information about the employee.
instead of storing the employeeid and looping over the employee table to find the matching employeeid, the rowid should do the trick. this is very simple. but did not strike me.

I would suggest you make an AJAX call to the controller. Because of two main reasons
It is not advisable to handle Database actiity in javascript due to security issues.
Javascript runs on client side machine it should have the least load and computation.
Javascript should be as light as possible. So i suggest you do it in the database itself.

Don't count on JavaScript performance, because it is heavily depend on computer that is running on. I suggest you to store and search on server-side rather than loading heavy payload of data in Browser which is quite restricted to resources of end-user.
Running long loops in JavaScript can lead to an unresponsive and irritating UI. Use Ajax calls to get needed data as a good practice.

Are you using HTML5? Will your users typically have relatively fast multicore computers? If so, a web-worker (http://www.w3schools.com/html/html5_webworkers.asp) might be a way to offload the search to the client while maintaining UI responsiveness.
Note, I've never used a Worker, so this advice may be way off base, but they certainly look interesting for something like this.

In terms of separation of concerns, and recommended best approach, you should be handling that domain-level data retrieval on your server, and relying on the client-side for processing and displaying only the records with which it is concerned.
By populating your client with several thousand records for it to then parse, sort, search, etc., you not only take a huge performance hit and diminish user experience, but you also create many potential security risks. Obviously this also depends on the nature of the data in the application, but for something such as employee records, you probably don't want to be storing that on the client-side. Anyone using the application will then have access to all of that.
The more pragmatic approach to this problem is to have your controller populate the client with only the specific data which pertains to it, eliminating the need for searching through many records. You can also retrieve a single object by making an ajax query to your server to retrieve the data. This has the dual benefit of guaranteeing that you're displaying the current state of the DB, as well as being far more optimized than anything you could ever hope to write in JS.

Ajax "Is there new content? If so, update page" - How to do this without breaking the server?

It's a simple case of a javascript that continuously asks "are there yet?" Like a four year old on a car drive.. But, much like parents, if you do this too often or, with too many kids at once, the server will buckle under pressure..
How do you solve the issue of having a webpage that looks for new content in the order of every 5 seconds and that allows for a larger number of visitors?

stackoverflow does it some way, don't know how though.
The more standard way would indeed be the javascript that looks for new content every few seconds.
A more advanced way would use a push-like technique, by using Comet techniques (long-polling and such). There's a lot of interesting stuff under that link.
I'm still waiting for a good opportunity to use it myself...
Oh, and here's a link from stackoverflow about it:
Is there some way to PUSH data from web server to browser?

In Java I used Ajax library (DWR) using Comet technology - I think you should search for library in PHP using it.
The idea is that server is sending one very long Http response and when it has something to send to the client it ends it and send new response with updated data.
Using it client doens't have to ping server every x seconds to get new data - I think it could help you.

You could make the poll time variable depending on the number of clients. Using your metaphor, the kid asks "Are we there yet?" and the driver responds "No, but maybe in an hour". Thankfully, Javascript isn't a stubborn kid so you can be sure he won't bug you until then.

You could consider polling every 5 seconds to start with, but after a while start to increase the poll interval time - perhaps up to some upper limit (1 minute, 5 minute - whatever seems optimal for your usage). The increase doesn't have to be linear.
A more sophisticated spin (which could incorporate monzee's suggestion to vary by number of clients), would be to allow the server to dictate the interval before next poll. The server could then increase the intervale over time, and you can even change the algorithm on the fly, or in response to network load.

You could take a look at the 'Twisted' framework in python. It's event-driven network programming framework that might satisfy what you are looking for. It can be used to push messages from the server.

Perhaps you can send a query to a real simple script, that doesn't need to make a real db-query, but only uses a simple timestamp to tell if there is anything new.
And then, if the answer is true, you can do a real query, where the server has to do real work !-)

I would have a single instance calling the DB and if a newer timestamp exists, put that new timestamp in a application variable. Then let all sessions check against that application variable. Or something like that. That way only one innstance are calling the sql-server and the number of clients does'nt matter.
I havent tried this and its just the first idéa on the top of the head but I think that cashe the timestamp and let the clients check the cashe is a way to do it, and how to implement the cashe (sql-server-cashe, application variable and so on) I dont know whats best.

Regarding how SO does it, note that it doesn't check for new answers continuously, only when you're typing into the "Your Answer" box.
The key then, is to first do a computationally cheap operation to weed out common "no update needed" cases (e.g., entering a new answer or checking a timestamp) before initiating a more expensive process to actually retrieve any changes.
Alternately, depending on your application, you may be able to resolve this by optimizing your change-publishing mechanism. For example, perhaps it might be feasible for changes (or summaries of them) to be put onto an RSS feed and have clients watch the feed instead of the real application. We can assume that this would be fairly efficient, as it's exactly the sort of thing RSS is designed and optimized for, plus it would have the additional benefit of making your application much more interoperable with the rest of the world at little or no cost to you.

I believe the approach shd be based on a combination of server-side sockets and client-side ajax/comet. Like:
Assume a chat application with several logged on users, and that each of them is listening via a slow-load AJAX call to the server-side listener script.
Whatever browser gets the just-entered data submits it to the server with an ajax call to a writer script. That server updates the database (or storage system) and posts a sockets write to noted listener script. The latter then gets the fresh data and posts it back to the client browser.
Now I haven't yet written this, and right now I dunno whether/how the browser limit of two concurrent connections screws up the above logic.
Will appreciate hearing fm anyone with thoughts here.
AS

We Keep Coding

JavaScript is the programming language of the Web.