Update: I am getting the impression that this is not even the right website to post this. If someone can point me in the right direction, I'd be appreciative...
I have an existing PHP+MySQL application that wasn't built to render "real-time" or similarly live-style data. But now I need to build in a way to pull nearly real-time data into the application and keep the data on the page fresh. This live data is only for 1 page in the application.
Looked at things like socket.io and PHP-based websockets libraries, but it seemed like overkill because the data is basically coming from 1 source and being delivered to 1 person (the client). Multiple other users could have this process running, but each one would bring their own data endpoint. That's... like a year down the road. But good to think about. Would ideally have hundreds, or thousands of users on the system, pulling their live-ish data. So I want this to be as streamlined and low-impact as possible.
Users must be authenticated and authorized to consume the data. This is already baked into the current system.
The API to get the data (which has already been built by another vendor) is also NOT streaming. It's set on a 20-second cron, so the new data is available every 20 seconds, which satisfies the client's needs.
My current plan is to do something like this...
Data is pulled on a cron every 20 seconds, organized, and stored into the database (complete)
Adjust #1 so it also does any additional proprietary calculations on data AND compiles + writes a JSON file on the server (unique to the user) which is the exact data needed for the front end (DB data is needed for other pages)
Create small PHP-based service which validates a client-provided JWT and reads the JSON file out
Write AJAX front end to poll endpoint from #3 every X seconds using a JWT for authorization
This all seems sort of like I might be reinventing the wheel, or missing something. The fact that this is an existing PHP based application (LAMP) does have some limiting factors, but I feel like there's got to be a more efficient way to handle this... It's pretty new to me. Also, I'm open to other technologies that'll run on the LAMP stack, if it'll make things better.
I would say go for the API solution in the beginning :) Since it fits the architecture more and is for sure the least amount of work. Also if there will be problem with the "live" feeling of the data you can fix it by polling more often or introducing long polling, assuming you change the cron job time.
I mean in the end it is all about impact for the time spent, don't start implement features that customers don't care about :)
The biggest problem to solve is to implement it in a way that fits your requirements and is somewhat future extendable. You still have to deal with issues like resolution, time outs, reducing server processing when requesting data and so on!
For me, if you need to maintain a global service state because a single client(s) request could affect all other connected client request(s) then most all server-side scripting languages are not the best choice! Also to further add, if you plan on implementing something like this with PHP, you will be setting your self up for a living nightmare! Why, because simply put, PHP(s) socket(s) implementation is that bad!
I have been working on something using AJAX that sometimes requires a couple POST's per second. Each post returns a JSON object (generated by the PHP file being posted to, ~11,000 bytes) and on average the latency is between 30ms and 250ms depending on if i'm on wifi or wired, but every roughly 1/15 calls it spikes up to about 4000ms. I am trying to find a way around this, as of right now I see two options:
Throw a timeout on the AJAX call, and have it call a GET on fail (the POST should still go through, it's the return trip that always times out) or...
Cut the entire thing down, learn node.js so I can use websockets to potentially rectify this issue.
Either solution as far as I can see is based on WHY the original call is failing. If it is something wrong with the AJAX call, then a new GET should be likely to go through, and solve the issue. but if it is something with the server itself then logically the GET would just time out as well, it's an issue with the server, and I'm dead in the water.
Since I have no experience yet at all with websockets I was hoping for some feedback on the best action to take next. Thanks for the input.
If it would help, I can very likely reduce the returning payload to 1/15 the size with some sneaky coding. would that make an impact?
WebSockets are a great option! Actually working with SocketIO is pretty simple and has a shallow learning curve.
Because the connection stays open, your requests skips the DNS lookup and routing for lower latency. This mean much lower overhead for each POST request you make.
If you ever forsee pushing data to your users, WebSockets are the de facto way to do it. Ajax polling is going out of style.
That said, you would have to port your back end logic to JavaScript. You would have to change your deployment strategy to a server that supports Node apps. You would have to deal with learning a new environment -- can also have some overhead.
Before you explore Node, consider the drawbacks above. I think it is a great bit of technology, but I would also look into the following approaches, especially if you are pressed for time.
1/15 size reduction is totally worth. Actually, it is worth it in both cases.
Can you do any sort of batching with your POST requests from the client side? If subsequent requests rely on the results of the previous POST request, you cannot do this. In this case, I strongly suggest using WebSockets.
All in all, there are always tradeoffs. If you aren't pressed for time, given Node and SocketIO and whirl, they are becoming very prevalent web technologies and are worth learning.
Cut the entire thing down, learn node.js so I can use websockets to potentially rectify this issue.
There is no point doing things with wrong tools. If you need real-time communication, use servers which support it out-of-the box, like node.js (probably the simplest to get into from PHP).
Since I have no experience yet at all with websockets
Get some framework atop of the raw websockets, like primus or socket.io and good luck ;)
I came across a site that does something very similar to Google Suggest. When you type in 2 characters in the search box (e.g. "ca" if you are searching for "canon" products), it makes 4 Ajax requests. Each request seems to get done in less than 125ms. I've casually observed Google Suggest taking 500ms or longer.
In either case, both sites are fast. What are the general concepts/strategies that should be followed in order to get super-fast requests/responses? Thanks.
EDIT 1: by the way, I plan to implement an autocomplete feature for an e-commerce site search where it 1.) provides search suggestion based on what is being typed and 2.) a list of potential products matches based on what has been typed so far. I'm trying for something similar to SLI Systems search (see http://www.bedbathstore.com/ for example).
This is a bit of a "how long is a piece of string" question and so I'm making this a community wiki answer — everyone feel free to jump in on it.
I'd say it's a matter of ensuring that:
The server / server farm / cloud you're querying is sized correctly according to the load you're throwing at it and/or can resize itself according to that load
The server /server farm / cloud is attached to a good quick network backbone
The data structures you're querying server-side (database tables or what-have-you) are tuned to respond to those precise requests as quickly as possible
You're not making unnecessary requests (HTTP requests can be expensive to set up; you want to avoid firing off four of them when one will do); you probably also want to throw in a bit of hysteresis management (delaying the request while people are typing, only sending it a couple of seconds after they stop, and resetting that timeout if they start again)
You're sending as little information across the wire as can reasonably be used to do the job
Your servers are configured to re-use connections (HTTP 1.1) rather than re-establishing them (this will be the default in most cases)
You're using the right kind of server; if a server has a large number of keep-alive requests, it needs to be designed to handle that gracefully (NodeJS is designed for this, as an example; Apache isn't, particularly, although it is of course an extremely capable server)
You can cache results for common queries so as to avoid going to the underlying data store unnecessarily
You will need a web server that is able to respond quickly, but that is usually not the problem. You will also need a database server that is fast, and can query very fast which popular search results start with 'ca'. Google doesn't use conventional database for this at all, but use large clusters of servers, a Cassandra-like database, and a most of that data is kept in memory as well for quicker access.
I'm not sure if you will need this, because you can probably get pretty good results using only a single server running PHP and MySQL, but you'll have to make some good choices about the way you store and retrieve the information. You won't get these fast results if you run a query like this:
select
q.search
from
previousqueries q
where
q.search LIKE 'ca%'
group by
q.search
order by
count(*) DESC
limit 1
This will probably work as long as fewer than 20 people have used your search, but will likely fail on you before you reach a 100.000.
This link explains how they made instant previews fast. The whole site highscalability.com is very informative.
Furthermore, you should store everything in memory and should avoid retrieving data from the disc (slow!). Redis for example is lightning fast!
You could start by doing a fast search engine for your products. Check out Lucene for full text searching. It is available for PHP, Java and .NET amongst other.
I am designing a web site in which users solve puzzles as quickly as they can. JavaScript is used to time each puzzle, and the number of milliseconds is sent to the server via AJAX when the puzzle is completed. How can I ensure that the time received by the server was not forged by the user?
I don't think a session-based authenticity token (the kind used for forms in Rails) is sufficient because I need to authenticate the source of a value, not just the legitimacy of the request.
Is there a way to cryptographically sign the request? I can't think of anything that couldn't be duplicated by a hacker. Is any JavaScript, by its exposed, client-side nature, subject to tampering? Am I going to have to use something that gets compiled, like Flash? (Yikes.) Or is there some way to hide a secret key? Or something else I haven't thought of?
Update: To clarify, I don't want to penalize people with slow network connections (and network speed should be considered inconsistent), so the timing needs to be 100% client-side (the timer starts only when we know the user can see the puzzle). Also, there is money involved so no amount of "trusting the user" is acceptable.
You can't guarantee the security of the timings cryptographically, because the client's browser can't do secure computation. Any means for encrypting to/from the server could be bypassed by adjusting the actual timings.
And timing on the server doesn't work, either - if you don't take account of latency in the round-trip-time, users with lower latency connections will have an advantage; if you do, users could thwart the compensation phase by adding extra latency there and then removing it later.
You can, of course make it difficult for the users to modify this, but security by obscurity is an unsustainable policy anyway.
So it comes down to either trusting your users somewhat (a reasonable assumption, most of the time) and designing the game so it's not trivial to circumvent the timings.
This approach obviously makes assumptions and is not invincible. All calculations are done on the client, and the server does some background checks to find out if the request could have been forged. Like any other client-based approach, this is not deterministic but makes it very hard for a lying client.
The main assumption is that long-lived HTTP connections are much faster for transmitting data, even negligible in some cases depending on the application context. It is used in most online trading systems as stock prices can change multiple times within a second, and this is the fastest way to transmit current price to users. You can read up more about HTTP Streaming or Comet here.
Start by creating a full-duplex ajax connection between the client and server. The server has a dedicated line to talk to the client, and the client can obviously talk to the server. The server sends the puzzle, and other messages to the client on this dedicated line. The client is supposed to confirm the receipt of each message to the server along with its local timestamp.
On the server generate random tokens (could be just distinct integers) after the puzzle has been sent, record the time when each token was generated, and pass it over to the client. The client sees the message, and is supposed to immediately relay this token back along with it's local time of receipt. To make it unpredictable for the client, generate these server tokens at random intervals, say between 1 and n ms.
There would be three types of messages that the client sends to the server:
PUZZLE_RECEIVED
TOKEN_RECEIVED
PUZZLE_COMPLETED
And two types of messages that the server sends to the client:
PUZZLE_SENT
TOKEN_SENT
There could be a lot of time variation in the messages send from the client to the server, but much lesser in the other direction (and that's a very fair assumption, hey - we have to start somewhere).
Now when the server receives a receipt to a message it sent, record the client time contained in that message. Since the token was also relayed back in this message, we can match it with the corresponding token on the server. At the end of the puzzle, the client sends a PUZZLE_COMPLETED message with local time to the server. The time to complete the puzzle would be:
PUZZLE_COMPLETED.time - PUZZLE_RECEIVED.time
Then double check by calculating the time difference in each message's sent vs received times.
PUZZLE_RECEIVED.time - PUZZLE_SENT.time
TOKEN_RECEIVED.time - TOKEN_SENT.time
A high variance in these times implies that the response could have been forged. Besides simple variance, there is lots of statistical analysis you can do on this data to look for odd patterns.
Even a compiled application could be forged. If the user changes their system clock halfway through timing, your application will report an incorrect time to the server. The only way to get an accurate upper-bound on the time it takes them is to start timing on the server when the puzzle is given to them, and to stop timing when they supply the answer.
As others have pointed out you can minimise the effect that slow connections have by making the load of the puzzle as small as possible. Load the entire page and "game engine" first, and then use an asynchronous request to load the puzzle itself (which should be a small amount of data) to level the playing field as much as possible.
Unfortunately you can't do latency compensation as this would be open to tampering. However, on a connection that's not being used for anything else, the latency for a request like this would be greatly overshadowed by the time it takes a human to solve a puzzle, I don't think it will be a big deal.
(Reasoning: 200ms is considered very bad lag, and that's the average human reaction time. The shortest possible "puzzle" for a human to complete would be a visual reaction speed test, in which case bad lag would have a 100% markup on their results. So as a timing solution this is 2-OPT. Any puzzle more complex will be impacted less by lag.)
I would also put a banner on the page saying to not use the internet connection for anything else while playing for the best possible speeds, possibly linking to a speed / latency tester.
It is impossible to start and stop the timer at the client-side without fear of manipulation...
Anything you perform at the client can be altered / stopped / bypassed..
encrypting/decrypting at the client is also not safe since they can alter the info before the encryption occurs..
Since it involves money, the users can not be trusted..
The timing has to start at the server, and it has to stop at the server..
Use ajax to start the timer at the server only if the puzzle contents return with the result of the ajax call. do not load the puzzle and then sent an ajax request as this could be hijacked and delayed while they review the puzzle...
..
Depending on the server side implementation you have, you could put the timing functionality on the server side. Record the time that the webpage request was made (you could put that into a database if you liked) and then when the answer is received get the current time and perform some arithmetic to get the duration of the answer. You could store the time in the session object if you liked instead of the database as well although I don't know too much about its integrity in there.
You have to use server-side time here. Here is how I would do it:
Make an AJAX request on document ready to ping the server. When server-side code receives the ping, store the server-side time as a session variable (making sure the variable does not already exist). When they finish the quiz, take the server-side time again and compare it with the session variable to determine their duration. Remove the session variable.
Why this works:
You do not start the timer before they see the quiz
The network delay is factored in, because the timer does not start until the AJAX request comes in (if they have a slow connection, the AJAX request will be slow)
Ping is not spoofable because you make sure the session variable does not exist before storing
EDIT: I wanted to add that you could continue to keep client-side time, and include it in the final post. Then you can compare it with your server-side calculated time. If they are reasonably close, then you can trust the client time.
You asked a bunch of questions in your original question, I'm only going to answer one of them:
Am I going to have to use something that gets compiled, like Flash? (Yikes.)
Yes. Given your criteria: 1) 100% accurate, and 2) No possibility of user interference, you have to use a compiled binary.
Doesn't have to be flash though - I'd suggest a java applet if the thought of Flash makes you say "Yikes".
-- Edit:
This solution is somewhat flawed, as pointed out by ZoFrex below.
-- Old:
Here is a way (but you'll need to do some profiling).
Send down a series of "problems" for the JavaScript to solve, while they are playing the puzzle. Previously, I've sufficiently-sized number N such that it is the result of: prime1 * prime2. This forces the client to factor the number (you can get code to do this in JavaScript) and this will take time (this is where profiling clients comes in, and sending down appropriately-sized primes [obviously, this opens you to degradation-attacks, but nevertheless]).
Then, you just send down say, 500, of these prime-problems (or another type), and let the JavaScript solve them in the background. It will generate a list of solutions, and when you send the completed value, you also send this list. From the total count of answers supplied, you can determine how long they spent on the puzzle.
Cons:
Requires profiling to determine capabilities of various clients (and hence difficulty of problems)
Can be downgrade-attacked
Slightly complicated
JavaScript computation may interrupt general puzzle-solving
Possible to write a bot to get solve problems faster than JS
Pros:
Calculations must be done in order to submit the form
If implemented correctly, will prevent all but non-trivial attacks
Clearly, it's attackable, (all proposed answers are), but I think it's reasonable. At least, it would be fun to work on :)
In the end, though, you need to actually install a client-side system with a bit more security. And do note that Flash certainly is not this; it's trivial to decompile. Infact, there was an IQ test here in Australia once, and it was controlled via a Flash app that was done LIVE on television. Of course, the winner was a computer programmer, I wonder why :P
-- Edit:
OP, Also, I linked it in a comment to this post, but just incase you miss it, you are kind of interested in the Hashcash, which is the aim to show that a client has completed some amount of 'Work'. Even if my implementation isn't suitable, you may find a review of that field fruitful.
It's a tricky problem because it's fundamentally unsolvable, so you need to work around the tradeoffs to do your best. There've been several good points made on the technical side including: (a) don't waste your time thinking compiling to Flash, Windows, Silverlight, JVM, or anything will actually help, (b) first transmit the encrypted real puzzle payload, then as the actual bottleneck transmit the key alone, (c) the latency even on 56k of sending a couple hundred bytes is negligible compared to human reaction time.
One thing I haven't seen mentioned is this:
Use after-the-fact auditing and tracking of users. This is how real casinos work. This is, I am told, a big part of how PayPal made their money. In both cases, rather than doing a lot of before-the-fact security, they keep very close tabs on everything about their players, use a lot of technology (statistics, pattern detection, etc) to flag suspicious players and they investigate. In both casinos and PayPal, you can't get your money right away. You have to cash in your chips or wait for the money to make it out of the PayPal system into your real bank. Your system should work the same way-- they can't actually get the money for at least a few days at minimum (longer if you are unable to set up a sufficiently speedy auditing system), giving you time to potentially impound their winnings. You have a lawyer, right? Also, casinos and PayPal know your real life identity (a must for dealing in money) so they can go after you legally-- or more importantly, deter would-be attackers since they could go after them legally.
Combined with the other tips, this may be sufficient to eliminate cheating entirely.
If you find it is not, make your goal not to totally eliminate cheating but to keep it to an acceptable level. Kind of like having 99.99% uptime. Yes, as one poster said, if even one person can compromise it everyone is screwed, but with a good auditing system the attacker won't be able to consistently cheat. If they can cheat 1 in 1000 times if they're lucky and they find they can't cheat more than once or twice before being caught, it won't be a problem since very few will cheat and any given honest user will have an extremely low chance of being affected by an extremely small amount of cheating. It'll be imperceptible. If you ever have a real cheating occurence that hurts an honest user, go out of your way to make the honest user feel satisfied with the outcome to a degree out of proportion to the value of that single customer. That way everyone will be confident in your security. They know they don't have to worry about anything.
People problems are not always solvable with technology solutions alone. You sometimes need to use people solutions. (The technology can help those people solutions work a lot better though.)
excuse me but why you can't use the time on the server? the time when you recieve the response will be the one which you use to calculate the score.
As several others have pointed out:
You must use server time, because client time is vulnerable to manipulation.
Checking the time on the server will potentially penalize people with slow network connections, or people that are far away.
The answer to the problem is to use a time synchronization protocol between the client and the server similar to the protocol that NTP uses. Working together, the client and the server determine the amount of delay caused by network latency. This is then factored into the times given to each user.
NTP's algorithms are complicated and have been developed over years. But a simple approach is below; I think that the protocol should work, but you may wish to test it.
Have the client measure the round-trip time with two successive HTTP XMLRPC pings. Each ping returns a different nonce. The second ping requires the nonce from the first ping, which assures that they are sequential. The puzzle time starts when the second HTTP ping is sent from the client. The server timestamps each request and assumes that the puzzle is displayed 1/2 way between the receipt of the first and the second request.
When the puzzle is finished the client pings twice again, following the same protocol as before. The server knows when it receives each request and it knows the time delta. Now take half the time delta and subtract that from when the first ping of the second set is received. That can be safely assumed to be the time that the puzzle was completed.
there is a very fast implementation of cryptography in js here
http://crypto.stanford.edu/sjcl/
it allows public / private key encryption all on the client and I think you can adapt it to encrypt the Ajax communication between your server and the client browser
here is a detailed explanation, which you can adapt to your needs
http://crypto.stanford.edu/sjcl/#usage
Just a quick thought: why don't you use an iFrame to include the game and it's javascripts and let them reside on the server you have your server side implementation running. Any ajax request should then be sent by the same IP as your server side IP is which would solve the problem of identifying the source. Of course you have to take further measures but already gained a lot of confidence in your "client" side requests. Remember the windowsLive services login and many more like it are based on javascript and the usage of iFrames and are considered secure enough.
I do not think there is a perfect solution. Here is an alternative that makes it harder for cheater but at the same time an unlucky honest solver may lose out.
Get many samples of roundtrip time measurements from each specific devices/location/other combination apriori for each user based on their other interaction with your site. You will also have these measurements for the entire population. You could also be very subtle get the timestamps for when a particular DNS lookup happened from their ISP's resolver (random hostname and you host the authoritative DNS server for that domain).
Once you have this, perform all measurements on the server side (puzzle returned to user, solution received) and subtract out the network time based on previous observations.
Note that even in other solutions you have server load, client load (slow processor, etc.), etc that affect timing.
Make sure you have XSRF protection on puzzle submission page :)
The way I would do this is that when the server sends the puzzle to the client, the current time is stored in a session. This ensures that the timing starts immediately after the puzzle has been sent. After the puzzle has been completed, and is sent to the server to check if the puzzle was done right, the server again checks the time and does a comparison.
Obviously slow Internet connections can make this time bigger, but there's nothing you can do about it.
I am working on a simple notification service that will be used to deliver messages to the users surfing a website. The notifications do not have to be sent in real time but it might be a better user experience if they happened more frequently than say every 5 minutes. The data being sent to and from the client is not very large and it is a straight forward database query to retrieve the data.
In reading other conversations on the topic it would appear that an AJAX push can result in higher server loads. Since I can tolerate longer server delays is it worth while to have the server push notifications or to simply poll.
It is not much harder to implement the push scenario and so I thought I would see what the opinion was here.
Thanks for your help.
EDIT:
I have looked into a simple AJAX Push and implemented a simple demo based on this article by Mike Purvis.
The client load is fairly low at around 5k for the initial version and expected to stay that way for quite some time.
Thank you everyone for your responses. I have decided to go with the polling solution but to wrap it all within a utility library so that if they want to change it later it is easier.
I'm surprised noone here has mentioned long-polling. Long polling means keeping an open connection for a longer period (say 30-60 seconds), and once it's closed, re-opening it again, and simply having the socket/connection listen for responses. This results in less connections (but longer ones), and means that responses are almost immediate (some may have to wait for a new polling connection). I'd like to add that in combination with technologies like NodeJS, this results in a very efficient, and resource-light solution, that is 100% browser compatible across all major browsers and versions, and does not require any additional tech like Comet or Flash.
I realize this is an old question, but thought it might still be useful to provide this information :)
Definitely use push its much cooler. If you just want simple notifications I would use something like StreamHub Push Server to do the heavy-lifting for you. Developing your own Ajax Push functionality is an extremely tricky and rocky road - you have to get it working in all browsers and then handle firewalls and proxies killing keep-alive connections etc... Why re-invent the wheel. Also, it has a similarly low footprint of less than 10K so it should suit if that is a priority for you.
Both have diferent requirements and address diferent scenarios.
If you need realtime updates, like in an online chat, push is a must.
But, if the refresh period is big, as it is in your case (5 minutes), then pool is the appropriate solution. Push, in this case, will require a lot of resource from both the client and the server.
Tip! try to make the page that checks the pool fast and clean, so it doesn't consumes a lot of resources in the server in each request. What I usually do is to keep a flag in memory (like in a session variable) that says if the pool is empty or not... so, I only do havy look in the pool only if it is not empty. When the pool is empty, which is most of the time, the page request runs extremely fast.
Because using a push requires an open HTTP connection to be maintained between your server and each client, I'd go for poll as well - not only is that going to consume a lot of server resources but it's also going to be significantly more tricky to implement as matt b mentioned.
My experience with polling is that if you have a frequent enough polling interval on a busy enough site your web server logs can get flooded with poll requests real quickly.
Edit (2017): I'd say your choices are now are between websockets and long polling (mentioned in another answer). Sounds like long polling might be the right choice based on the way the question mentions that the notifications don't need to be received in real time, an infrequent polling period would be pretty easy to implement and shouldn't be very taxing on your server. Websockets are cool and a great choice for many applications these days, sounds like that might be overkill in this case though.
I would implement a poll just because it sounds simpler to write, and keeping it simple is very valuable.
Not sure if you have taken a look at some of the COMET implementations out there (is that what you mean by AJAX push).
If the user is surfing the site, won't that in effect be requesting information from the server that this notification can piggy-back on?
It's impossible to say whether polling will be more expensive then pushing without knowing how many clients you'll have. I'd recommend polling because:
It sounds like you want to update data about once per minute. Unless notifications are able to arrive at a much faster rate than that, pushing would mean you're keeping an HTTP connection open but seeing very little activity on it.
Polling is built on top of existing HTTP conventions, so any server that talks to web browsers is already ready to respond to ordinary Ajax requests. A Comet– or Flash socket–based solution has different requirements; you'll need something like cometd on the server side and a client-side library that groks server-side push.
So if you needed something heavy-duty to manage a torrent of data and a crapload of clients, I'd recommend Comet. But that doesn't seem to be the case.
There's now a service http://pusherapp.com that is trying to solve this problem once and for all, in a blink. Might be worth checking out. (disclaimer: i am in no way associated with them).
I haven't tried it myself, but some say COMET works and is easier than you think. There's also a Ruby on Rails plug-in called Juggernaut that I've heard talked about highly. Again, I haven't used it, so YMMV, but my understanding is that it takes far fewer resources compared to polling. I believe (can someone confirm?) that COMET is how MacRumorsLive.com delivers live blogging of WWDC Stevenotes.