Facebook query very slow - javascript

Hi, I'm using this query to get a user's upcoming Facebook events:
FB.Data.query("select eid,name,start_time,location,venue,
pic_small,pic_big,description
from event WHERE eid IN (SELECT eid FROM event_member WHERE uid={0})
AND start_time >= " + from + " ORDER BY start_time LIMIT 10", uid);
But for users with many events this is very slow.
How can I speed it up?
Thanks

Well, I haven't used the Facebook API, but from the query I can tell it's taxing their system a bit more than a straight data grab, so they might impose an intentional wait time on this sort of query. Why not do a more basic query and then do the processing in your script? For example (I don't know what language you're using on your side), you could easily do the ordering yourself (order by start time). Maybe just play around with it, impose fewer constraints, and see if it's faster; at least that way you'll have a better idea of what you're working with, e.g. whether they impose waits.
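To illustrate the "do less in FQL, sort in JavaScript" idea, here is a minimal sketch built on the same (old, since-deprecated) FB.Data.query API the question uses; the .wait callback style and the assumption that ORDER BY is the slow part are guesses, not guarantees:

// Hedged sketch: drop ORDER BY from the FQL and sort the rows in the browser.
// Note: without ORDER BY, LIMIT may return a different set of events, so you
// may need a larger LIMIT and then trim after sorting.
var query = FB.Data.query(
    "SELECT eid, name, start_time, location, venue, pic_small, pic_big, description " +
    "FROM event WHERE eid IN (SELECT eid FROM event_member WHERE uid = {0}) " +
    "AND start_time >= " + from + " LIMIT 25", uid);

query.wait(function (rows) {
    // Order by start_time client-side (start_time may come back as a string).
    rows.sort(function (a, b) { return a.start_time < b.start_time ? -1 : 1; });
    var nextTen = rows.slice(0, 10);
    // ... render nextTen ...
});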
Also, what type of setup are you accessing this from? Is it a cloud server, a local machine, or shared hosting? Shared hosts sometimes impose odd speed limitations, so trying it from a different setup may help too.

Related

Caching information from API queries - Limited to 10 per 10s

Relatively new to databases (and DBA work) here.
I've recently been looking into Riot Games' APIs; however, now that I realise you're limited to 10 calls per 10 seconds, I need to change my front-end code, which originally loaded all the information with lots and lots of API calls, into something that uses a MySQL database.
I would like to collect ranked data about each player and list them (30+ players) in an ordered ranking list. As mentioned in their Rate Limiting page, I was thinking of "caching" data when GET-ing it, and then, when that information is needed again, checking whether it is still relevant: if so, use it; if not, re-GET it.
My idea is to store a time 30 minutes in the future (the rough length of a game) in a column of the table, and on each call check whether the server time is past the saved time. Is this the right approach/idea of caching? If not, what is the best practice?
Either way, this doesn't solve the problem of loading 30+ values for the first time, when no previous calls have been made to cache.
Any advice would be welcome, even advice telling me I'm doing completely the wrong thing!
If there is more information needed I can edit it in, let me know.
tl;dr What's the best practice for working around rate limiting?
Generally yes; most large applications simply use guesstimated rate limits, or a manual cache (check the DB for a recent call, and only go to the API if the cached call is old).
When you use large sites like op.gg or LoLKing for summoner lookups, they all tell you that you "must wait X minutes before doing another DB check/call", and I do this too. So yes, using an estimated number (like a game length) to handle your rate limit is definitely a common practice that I have observed within the Riot developer community. Some people do go all out and implement real caching layers/frameworks, but you don't need that for smaller applications.
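As a rough illustration of that manual cache (check the DB first, only hit the API when the stored row is older than a game), here is a sketch; the table, columns, and the db.query and fetchFromRiotApi helpers are hypothetical placeholders, not real Riot or MySQL APIs:

var CACHE_TTL_MS = 30 * 60 * 1000; // roughly the length of one game

// Hypothetical helpers: db.query wraps your MySQL client,
// fetchFromRiotApi wraps the actual rate-limited HTTP call.
function getPlayerRank(summonerId, callback) {
    db.query('SELECT rank, fetched_at FROM player_cache WHERE summoner_id = ?',
             [summonerId], function (err, rows) {
        if (err) return callback(err);
        var row = rows && rows[0];
        if (row && Date.now() - row.fetched_at < CACHE_TTL_MS) {
            return callback(null, row.rank); // still fresh: serve the cached value
        }
        fetchFromRiotApi(summonerId, function (err, rank) { // stale or missing: re-GET
            if (err) return callback(err);
            db.query('REPLACE INTO player_cache (summoner_id, rank, fetched_at) VALUES (?, ?, ?)',
                     [summonerId, rank, Date.now()]);
            callback(null, rank);
        });
    });
}

For the very first load of 30+ players you would still have to drip the initial calls out over a few rate-limit windows (e.g. a queue that fires at most 10 requests per 10 seconds) and fill the cache as the responses arrive.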
I recommend building your app's main functionality first, then submitting it and getting it approved for a higher rate limit as well. :)
Also, you mentioned adjusting your front-end code for the calls: make sure your API calls live in server-side code, for security reasons.

Breeze.js cache limitations? Or Browser?

We are investigating using Breeze for field deployment of some tools. The scenario is this -- an auditor will visit sites in the field, where most of the time there will be no -- or very degraded -- internet access. Rather than replicate our SQL database on all the laptops and tablets (if that's even possible), we are hoping to use Breeze to cache the data and then store it locally so it is accessible when there is not a usable connection.
Unfortunately, Breeze seems to choke when caching any significant amount of data. On Chrome it's generally somewhere between 8 and 13 MB worth of entities (as measured by the HTTP response headers). This can change a bit depending on how many tabs I have open and such, but I have not been able to move it by more than 10%. The failure is that the Chrome tab crashes and tells me to reload. The error is reproducible: I download the data in 100K chunks, it fails on the same read every time, and it works fine if I stop it after the previous read. When I change the page size, it always fails within the same range.
Is this a limitation of Breeze, or Chrome? Or Windows? I tried it on Firefox, and it handles even less data before the whole browser crashes. IE fares a little better, but none of them do great.
Looking at performance in task manager, I get the following:
IE goes from 250M of memory usage to 1.7G during the caching process and caches a total of about 14 MB before throwing an out-of-memory error.
Chrome goes from 206M of memory usage to about 850M while caching a total of around 9 MB.
Firefox goes from around 400M to about 750M and manages to cache about 5 MB before the whole program crashes.
I can calculate how much will be downloaded with any selection criteria, but I cannot find a way to calculate how much data can be handled by any specific browser instance. This makes using Breeze for offline auditing close to useless.
Has anyone else tackled this problem yet? What are the best approaches to handling something like this? I've thought of several things, but none of them are ideal. Any ideas would be appreciated.
ADDED At Steve Schmitt's request:
Here are some helpful links:
Metadata
Entity Diagram (pdf) (and html and edmx)
The first query, just to populate the tags on the page runs quickly and downloads minimal data:
var query = breeze.EntityQuery
.from("Countries")
.orderBy("Name")
.expand("Regions.Districts.Seasons, Regions.Districts.Sites");
Once the user has selected the Sites they wish to cache, the following two queries are kicked off (this used to be one query, but I broke it into two hoping it would be less of a burden on resources -- it didn't help). The first query (usually 2-3K entities and about 2 MB) runs as expected. Some combination of the predicates listed below is used to filter the data.
var qry = breeze.EntityQuery
.from("SeasonClients")
.expand("Client,Group.Site,Season,VSeasonClientCredit")
.orderBy("DistrictId,SeasonId,GroupId,ClientId")
var p = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p1 = breeze.Predicate("SeasonId", "==", SeasonId);
var p2 = breeze.Predicate("DistrictId", "==", DistrictId);
var p3 = breeze.Predicate("Group.Site.SiteId", "in", SiteIds);
After the first query runs, the second query (below) runs, also using some combination of the listed predicates to filter the data; at about 9 MB, it will have about 50K rows to download. When the total download burden between the two queries is between 10 MB and 13 MB, browsers will crash.
var qry = breeze.EntityQuery
.from("Repayments")
.orderBy('SeasonId,ClientId,RepaymentDate');
var p1 = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p2 = breeze.Predicate("SeasonId", "==", SeasonId);
var p3 = breeze.Predicate("DistrictId", "==", DistrictId);
var p4 = breeze.Predicate("SiteId", "in", SiteIds);
Thanks for the interest, Steve. You should know that the Entity Relationships are inherited and currently in production supporting the majority of the organization's operations, so as few changes as possible to that would be best. Also, the hope is to grow this from a reporting application to one with which data entry can be done in the field (so, as I understand it, using projections to limit the data wouldn't work).
Thanks for the interest, and let me know if there is anything else you need.
Here are some suggestions based on my experience building an offline-capable web application using breeze. Some or all of these might not make sense for your use cases...
1. Identify which entity types need to be editable versus which are only used to fill drop-downs etc. Load non-editable data using the noTracking query option and cache it in localStorage yourself using JSON.stringify (see the sketch after this list). This avoids the overhead of coercing the data into entities, change tracking, etc. Good candidates for this approach in your model might be entity types like Country, Region, District, Site, etc.
2. If possible, provide a facility in your application for users to identify which records they want to "take offline". This way you don't need to load and cache everything, which can get quite expensive depending on the number of relationships, entities, properties, etc.
3. In conjunction with suggestion #2, avoid loading all the editable data at once, and avoid using the same EntityManager instance to load each set of data. For example, if the Client entity is something that needs to be editable out in the field without a connection, create a new EntityManager, load a single client (expanding any children that also need to be editable) and cache this data separately from other clients.
4. Cache the breeze metadata once. When calling exportEntities the includeMetadata argument should be false. More info on this here.
5. To create new EntityManager instances, make use of the createEmptyCopy method.
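A minimal sketch of suggestion #1 under a few assumptions: the query shape, the "lookup:countries" cache key, and the manager variable are examples rather than anything from the original app:

// Load reference data without change tracking and stash the raw results
// in localStorage so they survive going offline.
var lookupQuery = breeze.EntityQuery
    .from("Countries")
    .expand("Regions.Districts")
    .noTracking();                              // skip entity coercion / change tracking

manager.executeQuery(lookupQuery).then(function (data) {
    localStorage.setItem("lookup:countries", JSON.stringify(data.results));
});

// Later, while offline:
var countries = JSON.parse(localStorage.getItem("lookup:countries") || "[]");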
EDIT:
I want to respond to this comment:
Say I have a client who has bills and payments. That client is in a group, in a site, in a region, in a country. Are you saying that the client, payment, and bill information might each have their own EM, while the location hierarchy might be in a 4th EM with no-tracking? Then when I refer to them, I wire up the relationships as needed using LINQs on the different EMs (give me all the bills for customer A, give me all the payments for customer A)?
It's a bit of a judgement call in terms of deciding how to separate things out. Some of what I'm suggesting might be overkill; it really depends on the amount of data and the way your application is used.
Assuming you don't need to edit groups, sites, regions and countries while offline, the first thing I'd do would be to load the list of groups using the noTracking option and cache them in localStorage for offline use. Then do the same for sites, regions and countries. Keep in mind, entities loaded with the noTracking option aren't cached in the entity manager so you'll need to grab the query result, JSON.stringify it and then call localStorage.setItem. The intent here is to make sure your application always has access to the list of groups, sites, regions, etc so that when you display a form to edit a client entity you'll have the data you need to populate the group, site, region and country select/combobox/dropdown.
Assuming the user has identified the subset of clients they want to work with while offline, I'd then load each of these clients one at a time (including their payment and bill information but not expanding their group, site, region, country) and cache each client+payments+bills set using entityManager.exportEntities. The reasoning here is that it doesn't make sense to load several clients plus their payments and bills into the same EntityManager each time you want to edit a particular client. That could be a lot of unnecessary overhead, but again, this is a bit of a judgement call.
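A sketch of that per-client flow under the same caveats: the entity and navigation property names are examples, and the second argument to exportEntities differs between breeze versions (a plain boolean in older releases, an options object in newer ones):

// One small EntityManager per client the user takes offline.
function cacheClientForOffline(masterManager, clientId) {
    var em = masterManager.createEmptyCopy();          // same metadata/service, empty cache
    var query = breeze.EntityQuery
        .from("Clients")
        .where("ClientId", "==", clientId)
        .expand("Payments, Bills");                    // only the children that must be editable
    return em.executeQuery(query).then(function () {
        // includeMetadata: false, since the metadata is cached once, separately
        var bundle = em.exportEntities(null, { includeMetadata: false });
        localStorage.setItem("client:" + clientId, bundle);
    });
}

// Later, offline, rehydrate just that client:
// var em = masterManager.createEmptyCopy();
// em.importEntities(localStorage.getItem("client:" + clientId));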
@Jeremy's answer was excellent and very helpful, but it didn't actually answer the question, which I was starting to think was unanswerable, or at least the wrong question. However, @Steve in the comments gave me the most appropriate information for this question.
It is neither Breeze nor the browser, but rather Knockout. Apparently the Knockout wrapper around the breeze entities uses all that memory (at least while loading the entities, and in my environment). As described above, Knockout/Breeze would crap out after reading around 5 MB of data, causing Chrome to crash with over 1.7 GB of memory usage (from a pre-download memory usage of around 300 MB). Rewriting the app in AngularJS eliminated the problem. So far I have been able to download over 50 MB from the exact same EF6 model into Breeze/Angular, and total Chrome memory usage never went above 625 MB.
I will be testing larger payloads, but 50 MB more than satisfies my needs for the moment. Thanks everyone for your help.

How to set max number of clients in sockjs connection

I am currently experimenting with sockjs. How can I set the maximum number of clients that can join a sockjs server?
I understand that I can achieve this by simply closing any new connection when the total number of connections is above x, but I don't really find that elegant. I was hoping there was some built-in way of doing this.
I am currently achieving this by:
var numConnections = currentConnections.length;
console.log('\nNumber of Connections = ' + numConnections);

// check for number of connections
if (numConnections >= 3) {
    // disconnect the client
    conn.end();
}
As a counter is a very application-specific thing, especially in horizontally scaled environments with multiple load-balanced SockJS servers, it is up to you to implement the mechanics of counting.
It is totally fine to have a minimalistic counter like you did. If you end up with multiple services, you can easily switch to storing the client count in Redis and share that value across different processes.
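For reference, a self-contained sketch of that counting approach around a sockjs-node server; the limit of 3, the port, the prefix, and the close code/reason are arbitrary examples:

var http = require('http');
var sockjs = require('sockjs');

var MAX_CLIENTS = 3;
var numConnections = 0;

var sock = sockjs.createServer();
sock.on('connection', function (conn) {
    if (numConnections >= MAX_CLIENTS) {
        conn.close(1000, 'Server full');       // or conn.end(), as in the snippet above
        return;
    }
    numConnections += 1;
    conn.on('close', function () { numConnections -= 1; });
});

var server = http.createServer();
sock.installHandlers(server, { prefix: '/sock' });
server.listen(9999, '0.0.0.0');

With several processes behind a load balancer, the local numConnections variable would be swapped for a shared counter (e.g. Redis INCR/DECR), as suggested above.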
My personal opinion: you should not worry about such simple things in the early stages of development; focus on the real challenges and problems as they occur. It is good to think ahead, but experience will guide you, so don't waste time on such little things.

Server-side highscores for a Javascript-written game

I'm implementing a simple game in Javascript, and am interested in having an online highscores table for it, so that players can compete against one another. I've two concerns about this:
What is the simplest server-side program I need for this purpose? I don't need a full-fledged "web application", just something simple that receives POST requests with high scores, updates a database, and sends back lists of scores. I'm familiar with Django. What are your suggestions?
How can I make the highscores table reasonably secure? I'm aware that making it bulletproof against competent and dedicated attackers is difficult, but I wouldn't want anyone with access to the JavaScript source code to be able to submit fictitious scores too easily. Any tools for this purpose?
It's going to be pretty hard to secure the high scores. I mean, it's not enough to ensure that it comes from your page, because if, say, the JavaScript function is submitHighScore(n) then they can always type javascript:submitHighScore(10000000) in the address bar on that page and have it work.
What comes to mind is perhaps some sort of hash function that generates specific codes that match certain levels in the game. When submitting the score it would also submit this hash, so users would have had to get to this level in order to get that equivalent score.
Another option would be for the game to pull in some kind of key that only works temporarily, so as you went along the key would change and then the score would be submitted to a central server intermittently.
Keep in mind that really determined individuals can always just inspect the data being sent to your server and decompile your code.
You could go the Broderbund route and ask the player trivia questions which are validated server-side to ensure they really did pass the level they said they did...something like "What color was the monster in the previous level?"
To submit score securely, sign it (you'd also need to ensure that the score isn't faked before it's signed and sent, but that's another problem).
Hide a secret in the JS code, and send highscore + hash(highscore + secret) to the server. The hash could be MD5/SHA1; there are easy-to-find JS implementations.
Of course it won't stand up to anyone carefully analysing the JS code, but at least nobody will be able to submit a fake high score just by tampering with HTTP traffic.
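A minimal sketch of that scheme, with hash() standing in for whichever MD5/SHA1 implementation you pull in; the endpoint name and the secret-building helper are hypothetical examples:

// The secret is assembled at run time rather than stored as a single literal
// (see the obfuscation notes below); buildSecretAtRuntime() is hypothetical.
var SECRET = buildSecretAtRuntime();

function submitHighScore(score) {
    var payload = {
        score: score,
        sig: hash(String(score) + SECRET)   // server recomputes hash(score + secret) and compares
    };
    return fetch('/highscores', {           // hypothetical endpoint
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
    });
}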
On hiding stuff in JS:
You can't really hide it (it's ultimately futile like DRM), but you can try hard to obfuscate it and make debugging painful.
Don't put the secret as a literal in the source code - compute it at run time combining results of several functions, local and global-ish variables.
Minify all code, remove sourcemaps.
Add bits of code that don't do anything, but seem to be important, to make debugging more confusing.
Don't put anything in global scope, but do rely on shared mutable state by passing closures and arrays around.
Rely on Date and timers to cause race conditions to make your code produce wrong results if it's paused in the debugger (just don't make it too tight to allow it to run on slow machines).
If the game is deterministic (like a puzzle game), then users could submit the high score in the form of a log of the steps taken to win (the user's input), which you'd replay on the server to calculate the score.
This changes the attack from finding/emulating the score-submitting function to writing an AI, or hacking the game itself to make it easier to play (but still within its basic rules).
1.) Any CGI script that can talk to a database and understand JSON, or another format of your choice, will do the job.
However, since you're familiar with Django, building your server on top of Django would be the simplest option, in the sense of what you have to learn and how much application code you have to write. A seemingly simple CGI script can turn out rather complex if you write it from scratch.
I found django-piston to be a handy Django app to quickly write a REST-style API server. It supports JSON so it should be easy to interface with your JavaScript game.
2.) The most casual cracker will go for a replay attack and its variants: peeking at the page source and executing a JavaScript function, or intercepting HTTP requests and resending them (easy with a Firefox add-on like Tamper Data).
To counteract the former, you can obfuscate the source code and the HTTP body:
Minify the JavaScript code
Encode the message you send to the server with Base64 or another encoding scheme
The latter can be prevented by requiring all update requests to include a one-time password ("session token" in the Wikipedia article) that was recently acquired from the server.
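A rough sketch of that one-time-token flow on the client; the /token and /highscores endpoints are hypothetical, and the server would have to invalidate each token as soon as it is used:

function submitScore(score) {
    return fetch('/token')                              // 1. acquire a short-lived, single-use token
        .then(function (r) { return r.json(); })
        .then(function (t) {
            return fetch('/highscores', {               // 2. spend it on exactly one update
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ score: score, token: t.token })
            });
        });
}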
I am thinking about this myself. What seems to be the most reasonable solution to me is this:
1) Sessions, to disallow tampering with the scoretable outside the game.
2) Log every action in the game and send the log to the score server. The server then calculates whether those actions actually produce that score. If you also log the time spent playing the game, you further reduce the chance that an attacker will bother to break your game. This also lets you build a replay feature, like the hi-score tables on arcade servers have, and in case of a suspicious score you can watch the replay and decide for yourself whether the score is real. The cheater would have to use a clever bot to play your game, and unless you have a game with real prizes, no one will try that hard.
If the cheater won't even analyze your code, sessions will stop him. If he reads your code, he will quickly break anything resembling hashed scores, secrets, or tokens. But if you make the game-logging script thorough enough, he will give up.
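A minimal sketch of the client side of such action logging; the event shape and the /scores endpoint are hypothetical, and the real scoring happens when the server replays the log against the game rules:

function GameLogger() {
    this.events = [];
    this.startedAt = Date.now();
}

// Record each player input, e.g. logger.record({ type: 'move', dir: 'left' })
GameLogger.prototype.record = function (action) {
    this.events.push({ t: Date.now() - this.startedAt, action: action });
};

// Send the whole log; the server replays it and computes the score itself.
GameLogger.prototype.submit = function () {
    return fetch('/scores', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            events: this.events,
            duration: Date.now() - this.startedAt
        })
    });
};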
In answer to your question:
1.) This depends on your environment and coding preference. PHP, Python, and ASP.NET are a few that come to mind. Since you already know Python (from your profile), you can use a Python CGI script to do this, or use one of the many frameworks for Python (Zope, Django, Pylons, ...).
see: http://www.python.org/doc/essays/ppt/sd99east/index.htm
for info on Python CGI.
2.) A few tricks for security (none of them foolproof):
A hidden text box in the HTML with an encoded value that the server checks against a cookie, to ensure the high score comes from your page
The server script only accepts values from a specific domain
You could use a combination of the methods above, as well as simply requiring the user to be registered in order to post high scores. Non-registered users could see how their current score compares to the existing high scores, but to post a score online they must either already be logged in with their registered account or provide it when the app goes to update the score online.
A simple message along the lines of "Your high score is X, and ranks ### in the high score table. To post this score online, please register with us first".
The best approach, I think, is to calculate the score directly in the Python files of your Django app, instead of calculating it in the JavaScript file. You send the data needed for the calculation to your backend with a POST request, then you compute the score there and store it in your database. This way the score itself never travels across the web to your servers, which is much safer because the sensitive part stays server-side.

Ajax "Is there new content? If so, update page" - How to do this without breaking the server?

It's a simple case of a JavaScript that continuously asks "are we there yet?", like a four-year-old on a car ride. But, much like parents, if you do this too often, or with too many kids at once, the server will buckle under the pressure.
How do you solve the issue of having a webpage that looks for new content in the order of every 5 seconds and that allows for a larger number of visitors?
Stack Overflow does it somehow, though I don't know how.
The more standard way would indeed be the javascript that looks for new content every few seconds.
A more advanced way would use a push-like technique, by using Comet techniques (long-polling and such). There's a lot of interesting stuff under that link.
I'm still waiting for a good opportunity to use it myself...
Oh, and here's a link from stackoverflow about it:
Is there some way to PUSH data from web server to browser?
In Java I used an Ajax library (DWR) with Comet technology; I think you should look for a PHP library that does the same.
The idea is that the server sends one very long HTTP response, and when it has something to send to the client, it ends that response and sends a new one with the updated data.
With this, the client doesn't have to ping the server every x seconds to get new data; I think it could help you.
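The browser side of such long-polling can be as simple as the following sketch; /updates is a hypothetical endpoint that holds the response open until there is news, and render() is a hypothetical UI hook:

function listen() {
    fetch('/updates')                                   // server holds this open until it has data
        .then(function (r) { return r.json(); })
        .then(function (update) { render(update); })
        .catch(function () { /* network hiccup: fall through and retry */ })
        .then(function () { listen(); });               // immediately re-open the long request
}
listen();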
You could make the poll time variable depending on the number of clients. Using your metaphor, the kid asks "Are we there yet?" and the driver responds "No, but maybe in an hour". Thankfully, Javascript isn't a stubborn kid so you can be sure he won't bug you until then.
You could consider polling every 5 seconds to start with, but after a while start to increase the poll interval time - perhaps up to some upper limit (1 minute, 5 minute - whatever seems optimal for your usage). The increase doesn't have to be linear.
A more sophisticated spin (which could incorporate monzee's suggestion to vary by the number of clients) would be to allow the server to dictate the interval before the next poll. The server could then increase the interval over time, and you can even change the algorithm on the fly, or in response to network load.
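A sketch of a growing poll interval along those lines; the numbers (5-second start, 60-second cap, 1.5x growth) are arbitrary, /changes and render() are hypothetical, and a server-dictated interval would simply replace the local back-off calculation:

var interval = 5000;
var MAX_INTERVAL = 60000;

function poll() {
    fetch('/changes')
        .then(function (r) { return r.json(); })
        .then(function (data) {
            if (data.changed) {
                interval = 5000;                                    // activity seen: poll eagerly again
                render(data);
            } else {
                interval = Math.min(interval * 1.5, MAX_INTERVAL);  // quiet: back off
            }
        })
        .catch(function () { /* ignore transient errors and keep polling */ })
        .then(function () { setTimeout(poll, interval); });
}
poll();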
You could take a look at the 'Twisted' framework in python. It's event-driven network programming framework that might satisfy what you are looking for. It can be used to push messages from the server.
Perhaps you can send a query to a really simple script that doesn't need to make a real DB query, but only uses a simple timestamp to tell whether there is anything new.
Then, if the answer is yes, you can do the real query, where the server has to do real work!
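In client code, that cheap-check-first idea might look roughly like this; /last-modified and /new-content are hypothetical endpoints and render() is a hypothetical UI hook:

var lastSeen = 0;

function checkForNews() {
    fetch('/last-modified')                              // cheap: returns only a timestamp
        .then(function (r) { return r.text(); })
        .then(function (ts) {
            ts = parseInt(ts, 10);
            if (ts > lastSeen) {                         // only now do the expensive request
                lastSeen = ts;
                return fetch('/new-content')
                    .then(function (r) { return r.json(); })
                    .then(render);
            }
        });
}
setInterval(checkForNews, 5000);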
I would have a single instance calling the DB and, if a newer timestamp exists, put that new timestamp in an application variable. Then let all sessions check against that application variable, or something like that. That way only one instance is calling the SQL server, and the number of clients doesn't matter.
I haven't tried this, and it's just the first idea off the top of my head, but I think caching the timestamp and letting the clients check the cache is the way to do it. As for how to implement the cache (SQL Server cache, application variable, and so on), I don't know what's best.
Regarding how SO does it, note that it doesn't check for new answers continuously, only when you're typing into the "Your Answer" box.
The key, then, is to first do a computationally cheap operation to weed out the common "no update needed" cases (e.g., only checking while the user is entering a new answer, or comparing a timestamp) before initiating a more expensive process to actually retrieve any changes.
Alternately, depending on your application, you may be able to resolve this by optimizing your change-publishing mechanism. For example, perhaps it might be feasible for changes (or summaries of them) to be put onto an RSS feed and have clients watch the feed instead of the real application. We can assume that this would be fairly efficient, as it's exactly the sort of thing RSS is designed and optimized for, plus it would have the additional benefit of making your application much more interoperable with the rest of the world at little or no cost to you.
I believe the approach should be based on a combination of server-side sockets and client-side Ajax/Comet. For example:
Assume a chat application with several logged on users, and that each of them is listening via a slow-load AJAX call to the server-side listener script.
Whichever browser receives the just-entered data submits it to the server with an Ajax call to a writer script. That server updates the database (or storage system) and posts a socket write to the aforementioned listener script. The latter then gets the fresh data and posts it back to the client browser.
Now, I haven't yet written this, and right now I don't know whether/how the browser limit of two concurrent connections screws up the above logic.
I will appreciate hearing from anyone with thoughts here.
AS
