Caching information from API queries - Limited to 10 per 10s

Caching information from API queries - Limited to 10 per 10s - javascript

relatively new to databases here (and dba).
I've been recently looking into Riot Games' APIs, however now realising that you're limited to 10 calls per 10 seconds, I need to change my front-end code that was originally just loading all the information with lots of and lots of API calls into something that uses a MySQL database.
I would like to collect ranked data about each player and list them (30+ players) in an ordered list of ranking. I was thinking, as mentioned in their Rate Limiting Page, "caching" data when GET-ing it, and then when needing that information again, check if it is still relevant - if so use it, if not re-GET it.
Is the idea of adding a time of 30 minutes (the rough length of a game) in the future to a column in a table, and when calling check whether server time is ahead of the saved time. Is this the right approach/idea of caching - If not, what is the best practice of doing so?
Either way, this doesn't solve the problem of loading 30+ values for the first time, when no previous calls have been made to cache.
Any advice would be welcome, even advice telling me I'm doing completely the wrong thing!
If there is more information needed I can edit it in, let me know.
tl;dr What's best practice to get around Rate-Limiting?

Generally yes, most of the large applications simply put guesstimate rate limits, or manual cache (check DB for recent call, then go to API if its an old call).
When you use large sites like op.gg or lolKing for Summoner look ups, they all give you a "Must wait X minutes before doing another DB check/Call", I also do this. So yes, giving an estimated number (like a game length) to handle your rate limit is definitely a common practice that I have observed within the Riot Developer community. Some people do go all out and implement actual caching though with actual caching layers/frameworks, but you don't need to do that with smaller applications.
I recommend building up your app's main functionality first, submit it, and get it approved for a higher rate limit as well. :)
Also you mentioned adjusting your front-end code for calls, make sure your API calls are in server-side code for security concerns.

Related

Should I use a cache for this?

I made a code this summer holidays and today I look for the first time at my code again, and I am strugging on one thing I did.
My system is a system with multiple types (pages, newsletters etc.) and multiple subtypes (items, archive, concepts etc.). The idea now I have an object like this:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } } } }
Another example:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } }, archive: {} }, 2: { normal: {} } }
The data originally comes from the database. I'm making a system to edit pages on the website and other things like newsletters. Because I have multiple types and subtypes.
I made a cache for the reason I don't want to get all items from the database every time. But now the problem is if I add an item, edit an item and remove an item I have to delete it from the cache / edit / add.
My question: is this a good way? I thought it is because you don't have to call an AJAX file to get the data from the database.
I'm sorry if I'm not allowed to ask this here.

My question: is this a good way? I thought it is because you don't
have to call an AJAX file to get the data from the database.
The answer is that "it depends". There is no always right and always wrong answer for caching because caching is a tradeoff between efficiency and timeliness of data.
If you want maximum efficiency, you cache like crazy, but your data may not be perfectly up to date because you're using old data from the cache.
If you want the most up-to-date data, you don't cache anything so you always get the latest data, but obviously efficiency may suffer if you are regular requesting the same data over and over.
So, it's a tradeoff and the tradeoff depends entirely upon the application, its needs, how often the data is modified and what the consequences are for having stale data or for not caching. There is no single right or wrong answer for that tradeoff. It depends entirely upon the particular situation for your application and the tradeoff may even be different for some types of data vs. others within the same application.
For example, let's supposed you were writing an online bidding site that offered some functionality like eBay. You would probably be fine caching the item description for at least several hours because that almost never changes and even if it does, the consequences of being a bit tardy on seeing a new item description are fairly low. But, you could never cache the data on the current bid because the timeliness of that information is critical. The user needs to always see the latest info on the current bid, even if you have to make some sacrifices in efficiency.
Also, remember that caching isn't completely all or none. You can set a lifetime for a cached value such that it can only be used for a certain period of time that is appropriate for the type of data. For example, you might cache an item description in the above auction for up to 2 hours. This allows you to achieve some efficiency gains, but also to eventually see the new data if it happens to change.
In general, you have to review the consequences of showing stale data. If the consequences for having data that is even minutes out of date are high (like the latest price in a live auction), then you can't cache that data at all.
If the consequences of having data that is even hours out of date are low, then you can likely cache that value for at least several hours - maybe even longer.
And, when considering what to cache, you obviously want to first look at the items that are most requested and are the most expensive on your server to retrieve. Some analysis of the usage pattern on your server would give you a prioritized list of candidates to consider for caching.

My question: is this a good way? I thought it is because you don't
have to call an AJAX file to get the data from the database.
This is fine if
1) You want to provide offline reading continuity to the user. User doesn't have to wait for internet connection to be available so that they can read at any time.
2) Your data-service is quite heavy and you want to avoid multiple/frequent visits to the server to get the same data over and over again.
3) You want your app to be bundled with a native package (like phonegap) to become a hybrid app and give a complete offline experience to the user.
This is not a comprehensive list, but just to get your started in terms of when to go for offline and when to keep totally offline
So, on the other hand, this is a bad idea if
1) Your local storage structure is going to change frequently for user to require re-install (unless you can figure out auto-upgrate of local storage)
2) All your features are transactional and require synch with other users also.

Nothing wrong with your approach, just make sure you have kept these points in mind while managing client-side cache
You have one variable 'version' maintained, this version is to be increased whenever there's any change in structure, this version will be sent to client every time, client is responsible for comparison of versions and empty client cache if server version is greater than client version.
You can implement or find any open-sources to handle your ajax responses, this one might be useful - https://github.com/SaneMethod/jquery-ajax-localstorage-cache.
you can set proper expiry tag from server, which can also help, browser to cache response for you, if it is 'get' request.
You can also implement server-side cache, which will not make calls to database, it will cache response against request-url, Note - if different users are supposed to receive different response than this approach wont work. You can delete the cache if any changes happens related to that particular data set - delete/update
In your case you can also maintain flags on server, which simply tells if data has been updated or not the time of article update, if stored version is older you can make server-request or just use local version.
I hope it helps.

How to build 'real time visitors on site', Google API?

I'm building a custom admin dashboard for users on our site who create posts. I want to show them the active amount of visitors on their posts only (not on the entire site).
I want it to act exactly like GA does it:
I was originally thinking of building this from scratch, but in retrospect it might be easier to use the GA API?
I've stared at the docs for forever and I'm just not groking it, so I'm coming here for help.
We have ~5,000 posts total, and I some people on our site have authored over 1000 posts, so the 'input' to GA will be anywhere from 1 to 1000+ slugs (for only their posts).
I want a combined amount of on-site traffic for their posts only.
Optionally, maybe it would have to be reversed... I'm not sure if GA can show it, but even better probably would be to get a content breakdown of the realtime visitors from the API, with 5000 max results. From there I can filter through the result set slugs (along with then number of users on each), and compare those results to each slug which belongs to that user, then just sum the totals on my end.
Is this something the Google API could help me with? which API endpoint would I need to use? Is it possible to have 5000+ max results for URLs with traffic on them from the API?
Thanks!

Yes, it is possible.
It seems that you should utilize Real Time Data: get endpoint.
Additionally, to limit results for specific pages (posts) only, you should use dimension filters (filters which will select only specific page views before calculating aggregated result), and 'ga:pagePath' looks like the one you need:
ga:pagePath
UI Name: Page
A page on your website specified by path and/or query parameters. Use in conjunction with hostname to get the full URL of the page.
Source
You might prefer using ga:pageTitle instead, if you have similar title for posts of a single author, and you haven't got common path elements in posts of the same author.
So you do something like:
GET https://www.googleapis.com/analytics/v3/data/realtime
ids=ga:<your_analytics_id>
metrics=rt:activeUsers
dimensions=rt:pagePath
filters=rt:pagePath=~/authors/123/*
Please notice that there maybe slight difference in real time and non-realtime API (e.g. use of 'rt' instead of 'ga' above), and generally realtime-API is still in beta.
Generally speaking, you should go here: Real Time Reporting API - Developer Guide and look through the links in the table of contents (left part of the page).
What about 'building from scratch' idea: it's rather simple from the developer's perspective, but it could be complex from the dev-ops perspective. I.e., it's not a problem to write code which would aggregate such metrics. But it could be a problem to make a system which will sustain required for that task amount of requests per second.

I think you will want to apply your second approach: pull down realtime visitors for all slugs and then aggregate by author on your own server.
There is a quota of 10,000 queries per profile per day. Using your first approach, it sounds like you would be performing a query for each author. Say you have 50 authors. This would leave you only 20 queries/day/author (10,000/50). Factoring in the time dimension, this would allow you only 8.33 (200/24) queries per hour for each author. Not very "realtime like".
If you have problems getting it going, check out http://www.embeddedanalytics.com - we have done many implementations such as this. In fact, we even have hat "Right Now" realtime widget.
Is there a way to determine the author based on the slug title?

Breeze.js cache limitations? Or Browser?

We are investigating using Breeze for field deployment of some tools. The scenario is this -- an auditor will visit sites in the field, where most of the time there will be no -- or very degraded -- internet access. Rather than replicate our SQL database on all the laptops and tablets (if that's even possible), we are hoping to use Breeze to cache the data and then store it locally so it is accessible when there is not a usable connection.
Unfortunately, Breeze seems to choke when caching any significant amount of data. Generally on Chrome it's somewhere between 8 and 13MB worth of entities (as measured by the HTTPResponse headers). This can change a bit depending on how many tabs I have open and such, but I have not been able to move that more than 10%. the error I get is the Chrome tab crashes and tells me to reload. The error is replicable (I download the data in 100K chunks and it fails on the same read every time and works fine if I stop it after the previous read) When I change the page size, it always fails within the same range.
Is this a limitation of Breeze, or Chrome? Or windows? I tried it on Firefox, and it handles even less data before the whole browser crashes. IE fares a little better, but none of them do great.
Looking at performance in task manager, I get the following:
IE goes from 250M memory usage to 1.7G of memory usage during the caching process and caches a total of about 14MB before throwing an out-of-memory error.
Chrome goes from 206B memory usage to about 850M while caching a total of around 9MB
Firefox goes from around 400M to about 750M and manages to cache about 5MB before the whole program crashes.
I can calculate how much will be downloaded with any selection criteria, but I cannot find a way to calculate how much data can be handled by any specific browser instance. This makes using Breeze for offline auditing close to useless.
Has anyone else tackled this problem yet? What are the best approaches to handling something like this. I've thought of several things, but none of them are ideal. Any ideas would be appreciated.
ADDED At Steve Schmitt's request:
Here are some helpful links:
Metadata
Entity Diagram (pdf) (and html and edmx)
The first query, just to populate the tags on the page runs quickly and downloads minimal data:
var query = breeze.EntityQuery
.from("Countries")
.orderBy("Name")
.expand("Regions.Districts.Seasons, Regions.Districts.Sites");
Once the user has select the Sites s/he wishes to cache, the following two queries are kicked off (used to be one query, but I broke it into two hoping it would be less of a burden on resources -- it didn't help). The first query (usually 2-3K entities and about 2MB) runs as expected. Some combination of the predicates listed are used to filter the data.
var qry = breeze.EntityQuery
.from("SeasonClients")
.expand("Client,Group.Site,Season,VSeasonClientCredit")
.orderBy("DistrictId,SeasonId,GroupId,ClientId")
var p = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p1 = breeze.Predicate("SeasonId", "==", SeasonId);
var p2 = breeze.Predicate("DistrictId", "==", DistrictId);
var p3 = breeze.Predicate("Group.Site.SiteId", "in", SiteIds);
After the first query runs, the second query (below) runs (also using some combination of the predicates listed to filter the data. At about 9MB, it will have about 50K rows to download). When the total download burden between the two queries is between 10MB and 13MB, browsers will crash.
var qry = breeze.EntityQuery
.from("Repayments")
.orderBy('SeasonId,ClientId,RepaymentDate');
var p1 = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p2 = breeze.Predicate("SeasonId", "==", SeasonId);
var p3 = breeze.Predicate("DistrictId", "==", DistrictId);
var p4 = breeze.Predicate("SiteId", "in", SiteIds);
Thanks for the interest, Steve. You should know that the Entity Relationships are inherited and currently in production supporting the majority of the organization's operations, so as few changes as possible to that would be best. Also, the hope is to grow this from a reporting application to one with which data entry can be done in the field (so, as I understand it, using projections to limit the data wouldn't work).
Thanks for the interest, and let me know if there is anything else you need.

Here are some suggestions based on my experience building on an offline capable web application using breeze. Some or all of these might not make sense for your use cases...
Identify which entity types need to be editable vs which are used to fill drop-downs etc. Load non-editable data using the noTracking query option and cache them in localStorage yourself using JSON.stringify. This avoids the overhead of coercing the data into entities, change tracking, etc. Good candidates for this approach in your model might be entity types like Country, Region, District, Site, etc.
If possible, provide a facility in your application for users to identify which records they want to "take offline". This way you don't need to load and cache everything, which can get quite expensive depending on the number of relationships, entities, properties, etc.
In conjunction with suggestion #2, avoid loading all the editable data at once and avoid using the same EntityManager instance to load each set of data. For example, if the Client entity is something that needs to be editable out in the field without a connection, create a new EntityManager, load a single client (expanding any children that also need to be editable) and cache this data separately from other clients.
Cache the breeze metadata once. When calling exportEntities the includeMetadata argument should be false. More info on this here.
To create new EntityManager instances make use of the createEmptyCopy method.
EDIT:
I want to respond to this comment:
Say I have a client who has bills and payments. That client is in a
group, in a site, in a region, in a country. Are you saying that the
client, payment, and bill information might each have their own EM,
while the location hierarchy might be in a 4th EM with no-tracking?
Then when I refer to them, I wire up the relationships as needed using
LINQs on the different EMs (give me all the bills for customer A, give
me all the payments for customer A)?
It's a bit of a judgement call in terms of deciding how to separate things out. Some of what I'm suggesting might be overkill, it really depends on the amount of data and the way your application is used.
Assuming you don't need to edit groups, sites, regions and countries while offline, the first thing I'd do would be to load the list of groups using the noTracking option and cache them in localStorage for offline use. Then do the same for sites, regions and countries. Keep in mind, entities loaded with the noTracking option aren't cached in the entity manager so you'll need to grab the query result, JSON.stringify it and then call localStorage.setItem. The intent here is to make sure your application always has access to the list of groups, sites, regions, etc so that when you display a form to edit a client entity you'll have the data you need to populate the group, site, region and country select/combobox/dropdown.
Assuming the user has identified the subset of clients they want to work with while offline, I'd then load each of these clients one at a time (including their payment and bill information but not expanding their group, site, region, country) and cache each client+payments+bills set using entityManager.exportEntities. Reasoning here is it doesn't make sense to load several clients plus their payments and bills into the same EntityManager each time you want to edit a particular client. That could be a lot of unnecessary overhead, but again, this is a bit of a judgement call.

#Jeremy's answer was excellent and very helpful, but didn't actually answer the question, which I was starting to think was unanswerable, or at least the wrong question. However #Steve in the comments gave me the most appropriate information for this question.
It is neither Breeze nor the Browser, but rather Knockout. Apparently the knockout wrapper around the breeze entities uses all that memory (at least while loading the entities and in my environment). As described above, Knockout/Breeze would crap out after reading around 5MB of data, causing Chrome to crash with over 1.7GB of memory usage (from a pre-download memory usage around 300MB). Rewriting the app in ANgularJS eliminated the problem. So far I have been able to download over 50MB from the exact same EF6 model into Breeze/Angular, total Chrome memory usage never went above 625MB.
I will be testing larger payloads, but 50 MB more than satisfies my needs for the moment. Thanks everyone for your help.

Is preloading data into session variables with AJAX upon login a good idea?

I've created a subscription-based system that deals with a large data-set. In its first iteration, it had semi-complicated joins that would execute, based on user-set filters, on every 'data view' page. Each query would fetch anywhere from a few kilobytes to several megabytes depending on the filter range. I decided this was unacceptable and so learned about APC (I had heard about its data-store features).
I moved all of the strings out of the queries into an APC preload routine that fires upon first login. In the same routine, I am running the "full set" join query to get all of the possible IDs for the data set into a $_SESSION variable. The entire set is anywhere from 100-800Kb, depending on what data the customer is subscribed to.
I convert this set into a JSON array and shuffle the data around dynamically when the user changes the filters. In creating the system I wanted it to seem as if the user was moving around lots of data very quickly, with minimal page loading (AJAX + APC when string representations are needed), as they played with the filters.
My multipart question is, is it possible for the user to effectively "cancel" the initial cache/query routine by surfing to another page after the first login? If so, can I move this process to an AJAX page for preloading, or does this carry the same problem? Or, am I just going about all of this in the wrong way? I came up with the idea on my own and I'm worried that I've created an unusable monster.
Also, I've been warned that my questions suck and I'm in danger of being banned. Every question I've asked has come from a position of intelligent wonder, written as well as I knew how at the time, and so it's really aggravating when an outsider votes me down without intelligent criticism. Just tell me what I did wrong and I will quickly fix the problem. Bichis.

Ajax "Is there new content? If so, update page" - How to do this without breaking the server?

It's a simple case of a javascript that continuously asks "are there yet?" Like a four year old on a car drive.. But, much like parents, if you do this too often or, with too many kids at once, the server will buckle under pressure..
How do you solve the issue of having a webpage that looks for new content in the order of every 5 seconds and that allows for a larger number of visitors?

stackoverflow does it some way, don't know how though.
The more standard way would indeed be the javascript that looks for new content every few seconds.
A more advanced way would use a push-like technique, by using Comet techniques (long-polling and such). There's a lot of interesting stuff under that link.
I'm still waiting for a good opportunity to use it myself...
Oh, and here's a link from stackoverflow about it:
Is there some way to PUSH data from web server to browser?

In Java I used Ajax library (DWR) using Comet technology - I think you should search for library in PHP using it.
The idea is that server is sending one very long Http response and when it has something to send to the client it ends it and send new response with updated data.
Using it client doens't have to ping server every x seconds to get new data - I think it could help you.

You could make the poll time variable depending on the number of clients. Using your metaphor, the kid asks "Are we there yet?" and the driver responds "No, but maybe in an hour". Thankfully, Javascript isn't a stubborn kid so you can be sure he won't bug you until then.

You could consider polling every 5 seconds to start with, but after a while start to increase the poll interval time - perhaps up to some upper limit (1 minute, 5 minute - whatever seems optimal for your usage). The increase doesn't have to be linear.
A more sophisticated spin (which could incorporate monzee's suggestion to vary by number of clients), would be to allow the server to dictate the interval before next poll. The server could then increase the intervale over time, and you can even change the algorithm on the fly, or in response to network load.

You could take a look at the 'Twisted' framework in python. It's event-driven network programming framework that might satisfy what you are looking for. It can be used to push messages from the server.

Perhaps you can send a query to a real simple script, that doesn't need to make a real db-query, but only uses a simple timestamp to tell if there is anything new.
And then, if the answer is true, you can do a real query, where the server has to do real work !-)

I would have a single instance calling the DB and if a newer timestamp exists, put that new timestamp in a application variable. Then let all sessions check against that application variable. Or something like that. That way only one innstance are calling the sql-server and the number of clients does'nt matter.
I havent tried this and its just the first idéa on the top of the head but I think that cashe the timestamp and let the clients check the cashe is a way to do it, and how to implement the cashe (sql-server-cashe, application variable and so on) I dont know whats best.

Regarding how SO does it, note that it doesn't check for new answers continuously, only when you're typing into the "Your Answer" box.
The key then, is to first do a computationally cheap operation to weed out common "no update needed" cases (e.g., entering a new answer or checking a timestamp) before initiating a more expensive process to actually retrieve any changes.
Alternately, depending on your application, you may be able to resolve this by optimizing your change-publishing mechanism. For example, perhaps it might be feasible for changes (or summaries of them) to be put onto an RSS feed and have clients watch the feed instead of the real application. We can assume that this would be fairly efficient, as it's exactly the sort of thing RSS is designed and optimized for, plus it would have the additional benefit of making your application much more interoperable with the rest of the world at little or no cost to you.

I believe the approach shd be based on a combination of server-side sockets and client-side ajax/comet. Like:
Assume a chat application with several logged on users, and that each of them is listening via a slow-load AJAX call to the server-side listener script.
Whatever browser gets the just-entered data submits it to the server with an ajax call to a writer script. That server updates the database (or storage system) and posts a sockets write to noted listener script. The latter then gets the fresh data and posts it back to the client browser.
Now I haven't yet written this, and right now I dunno whether/how the browser limit of two concurrent connections screws up the above logic.
Will appreciate hearing fm anyone with thoughts here.
AS

We Keep Coding

JavaScript is the programming language of the Web.