I'm developing a Facemash-like rating system. Recently I decided to switch from simple voting links (href=vote.php?v=left or v=right, with the IDs stored in $_SESSION) to AJAX. Then I realised that it's extremely vulnerable: in the browser console, a cheater can run an infinite loop that checks the picture URL and, if it matches some specified URL, votes for it; if it doesn't match, it just votes left or right. Is there any way to prevent this, besides obviously not implementing AJAX voting at all? Maybe there is some command to break the loop in the console, or something?
This is just a specific example of a general principle: You can't trust anything sent to the server from the client. You must implement checks at the server end to rate-limit (or just limit), validate, etc., because people can send you any information they like (they don't even have to be using a web browser to do it).
In this case, if you're letting people vote on things, you have to limit how many times an individual can vote on a specific item (and probably rate-limit how often they can vote, period).
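As a rough sketch of what those server-side checks might look like (Node/Express here purely for illustration; the question's PHP setup would follow the same shape, and every name below is hypothetical):

const express = require('express');
const session = require('express-session');

const app = express();
app.use(express.json());
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }));

app.post('/vote', (req, res) => {
  // 1. Only accept votes for the matchup the server itself issued
  //    (stored in the session when the page was served).
  const pair = req.session.currentPair; // e.g. { left: 17, right: 42 }
  const choice = req.body.choice;
  if (!pair || (choice !== pair.left && choice !== pair.right)) {
    return res.status(400).send('invalid vote');
  }

  // 2. Rate-limit: at most one vote every 2 seconds per session.
  const now = Date.now();
  if (req.session.lastVoteAt && now - req.session.lastVoteAt < 2000) {
    return res.status(429).send('too many votes');
  }
  req.session.lastVoteAt = now;

  recordVote(choice);             // hypothetical DB write
  req.session.currentPair = null; // each matchup can only be voted on once
  res.send('ok');
});

function recordVote(id) { /* increment the item's score in the DB */ }

A console loop can still call the endpoint, but it can no longer vote faster than the limit or vote on pairs the server never showed it.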
Related
I've created a subscription-based system that deals with a large data-set. In its first iteration, it had semi-complicated joins that would execute, based on user-set filters, on every 'data view' page. Each query would fetch anywhere from a few kilobytes to several megabytes depending on the filter range. I decided this was unacceptable and so learned about APC (I had heard about its data-store features).
I moved all of the strings out of the queries into an APC preload routine that fires upon first login. In the same routine, I am running the "full set" join query to get all of the possible IDs for the data set into a $_SESSION variable. The entire set is anywhere from 100 to 800 KB, depending on what data the customer is subscribed to.
I convert this set into a JSON array and shuffle the data around dynamically when the user changes the filters. In creating the system I wanted it to seem as if the user was moving around lots of data very quickly, with minimal page loading (AJAX + APC when string representations are needed), as they played with the filters.
My multipart question is, is it possible for the user to effectively "cancel" the initial cache/query routine by surfing to another page after the first login? If so, can I move this process to an AJAX page for preloading, or does this carry the same problem? Or, am I just going about all of this in the wrong way? I came up with the idea on my own and I'm worried that I've created an unusable monster.
Also, I've been warned that my questions suck and I'm in danger of being banned. Every question I've asked has come from a position of intelligent wonder, written as well as I knew how at the time, and so it's really aggravating when an outsider votes me down without intelligent criticism. Just tell me what I did wrong and I will quickly fix the problem. Bichis.
I'm integrating an external application to SharePoint 2010 by developing custom ribbon tabs, groups, controls and commands that are made available to editors of a SharePoint 2010 site. The ribbon commands use the dialog framework to open dialogs with custom application pages.
In order to pass a number of query string parameters to the custom application pages, I'm therefore looking for the equivalent of SPContext.Current.ListItem in the Client Object Model (ECMAScript).
Regarding available tokens (i.e. {ListItemId} or {SelectedItemId}) that can be used in the declarative XML, I have already tried emitting all tokens, but unfortunately the desired ones are either not parsed or simply null when in the context of a Publishing Page (i.e. http://domain/pages/page.aspx). Thus, none of the tokens that do render are of use in establishing the context of the calling SPListItem in the application page.
SP.ClientContext.get_current() provides a lot of information about the current SPSite, SPWeb etc., but nothing about the current SPListItem I'm positioned at (again, with the page rendered in the context of a Publishing Page).
What I've come up with so far is the idea of passing in the URL of the current page (i.e. document.location.href) and parsing that in the application page - however, it feels like I'm going in the wrong direction, and surely SharePoint should be able to provide this information.
I'm not sure this is a great answer, or even fully on-topic, but is basically something I originally intended to blog about - anyway:
It is indeed a pain that the Client OM does not seem to provide a method/property with details of the current SPListItem. However, I'd venture to say that while this seems like a simple concept, it actually has quite wide-ranging implications in SharePoint which aren't apparent until you stop to think about it.
Consider:
Although a redirect exists, a discussion post can be surfaced on 2 or 3 different URLs (e.g. Threaded.aspx/Flat.aspx)
Similarly, a blog post can exist on a couple (Post.aspx/EditPost.aspx, maybe one other)
A list item obviously has DispForm.aspx/EditForm.aspx and (sort of) NewForm.aspx
Also, even for items with an associated SPFile (e.g. a document or publishing page), consider that these URLs represent the same item:
http://mydomain/sites/someSite/someLib/Forms/DispForm.aspx?ID=x, http://mydomain/sites/someSite/someLib/Filename.aspx
Also, there could be other content types outside of this set which have a similar deal
In our case, we wanted to 'hang' data off internal and external items (e.g. likes, comments). We thought "well everything in SharePoint has a URL, so that could be a sensible way to identify an item". Big mistake, and I'm still kicking myself for falling into it. It's almost like we need some kind of 'normalizeUrl' method in the API if we wanted to use URLs in this way.
Did you ever notice the PageUrlNormalization class in Microsoft.SharePoint.Utilities? Sounds promising, doesn't it? Unfortunately it appears to do something other than what I describe above - it doesn't work across the variations of content types etc. (but does deal with extended web apps, HTTP/HTTPS etc.).
To cut a long story short, we decided the best approach was to make the server emit details which allowed us to identify the current SPListItem when passed back to the server (e.g. in an AJAX request). We hide the 'canonical' list item ID in a JavaScript variable or hidden input field (whatever really), and these are evaluated when back at the server to re-obtain the list item. Not as efficient as obtaining everything from context, but for us it's OK because we only need to resolve when the user clicks something, not on every page load. By canonical, I mean:
SiteID|WebID|ListID|ListItemID
IIRC, one of the key objects has a CanonicalId property (or maybe it's internal), which may help you build such a string.
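To illustrate (the hidden field, endpoint and function names are mine, not a SharePoint API): the server renders the canonical ID into the page, and the client simply echoes it back when the user clicks something:

<!-- server-rendered; the GUIDs come from SPContext on the server -->
<input type="hidden" id="canonicalItemId" value="siteGuid|webGuid|listGuid|42" />

function onLikeClicked() {
  var canonicalId = document.getElementById('canonicalItemId').value;
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/_layouts/MyApp/Like.aspx', true); // hypothetical page
  xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
  // the server splits the value on '|' and re-opens the exact SPListItem
  xhr.send('itemId=' + encodeURIComponent(canonicalId));
}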
So in terms of using the window.location.href, I'd avoid that if you're in vaguely the same situation as us. Suggest considering an approach similar to the one we used, but do remember that there are some locations (e.g. certain forms) where even on the server SPContext.Current.ListItem is null, despite the fact that SPContext.Current.Web (and possibly SPContext.Current.List) are populated.
In summary - IDs are your friend, URLs are not.
Can anyone share best practices for troubleshooting Google Analytics code?
Has anyone built a debugging tool? Does Google have a linter hidden somewhere? Does anybody have a good triage logic diagram?
I'll periodically set up different parts of GA and it seems like every time I do it takes 4 or 5 days to get it working.
The workflow looks like this:
Read the docs on the feature (e.g. events, custom variables).
Implement what appears to be the correct code based on the docs.
Wait a day.
See no data.
Google every version of the problem I can imagine. Find what may be a solution.
Change my code.
Wait a day.
See no data.
Loop:
Randomly move elements of the tracking code around.
Wait a day.
If other parts break, tell the CEO, get yelled at, revert changes.
If data appears, break.
Pray it continues to work/I never have to change the tracking code again.
For obvious reasons, I'm not satisfied with this workflow and hoping someone has figured out something I haven't.
Everything I do when debugging GA code starts and stops with the Google Analytics Debugger Chrome Extension. It prints to the console a summary of the data it has sent to Google Analytics which, for all purposes except testing profile filters, is all you need. It'll eliminate the "wait a day" step.
If you're not a fan of Google Chrome, you can inspect the HTTP requests yourself to see what data is being sent. You can use this guide to figure out what each parameter in the URL represents.
In terms of ensuring the features I've installed or the code itself is working, I'll open a fresh browser (cleared of cookies) and navigate to the site I'm testing via a Google search. I'll proceed to navigate to all of the pertinent pages and trigger all the pertinent events, all the while ensuring that the requests are being sent to Google and that the session isn't broken at any point (by either keeping an eye on the session count, or ensuring that the traffic source doesn't change from organic/google to direct or a self-referral).
To begin with, this answer isn't at odds with any portion of either of the two answers before mine; you could certainly implement them all without conflict.
My answer just reflects my own priority, which is the latency issue. Latency makes debugging far more difficult than it should be. Ten minutes of latency while waiting for a compiler to finish is irritating; four hours (the minimum GA latency) is painful.
So for me, the first step in building a GA debugging framework was to somehow get the GA results in real time. In other words, if I changed a regular expression filter, I needed to catch the traffic processed by that filter. So removing the 4-24 hour latency in getting results from the GA server was critical.
The easiest way I have found so far to do this is to modify the GA tracking code on each page of your site so that it sends a copy of each GIF request to your own server.
To do this, immediately before the call to trackPageview(), add this line:
pageTracker._setLocalRemoteServerMode();
This will send the entire request to your server access log, which you can parse in real time. (Specifically, your server writes one line to the access log per request, and all of the GA data is packed into that request, so there's a perfect correspondence between the two.)
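For the "parse in real time" part, something as simple as tailing the log and filtering for the GA beacon will do; a minimal Node sketch (the log path, and that you're on Apache, are assumptions about your setup):

const { spawn } = require('child_process');

// lean on `tail -f` and keep only the GA tracking-GIF requests
const tail = spawn('tail', ['-f', '/var/log/apache2/access.log']);

tail.stdout.on('data', (chunk) => {
  chunk.toString().split('\n')
    .filter((line) => line.includes('__utm.gif'))
    .forEach((line) => console.log(line)); // parse the utm* params from here
});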
yahelc's answer is great, but I'd like to add my 2c here.
Get yourself a nice sniffer to see the hits flowing.
Nice options:
Wasp
Charles
HTTPFox
Fiddler
Then implement your changes on QA.
Test this new setup on QA. Things you should keep an eye on:
Always make sure that the basic pageview fires. It should have at least a utmp value and no utmt set.
Make sure the visitor ID doesn't get overwritten. This is the second number in the __utma cookie and serves as your user ID; if it changes, things are broken.
Make sure your pageviews contain the page and session variables you set, if you set any. They are encoded in the utme parameter.
Make sure that any visitor-level custom var is fired before your basic pageview (utmt=custom variable).
Make sure the source data isn't overwritten (campaign/medium/source/content/keyword). These are set in the __utmz cookie. If it gets overwritten by direct or a referral from your own site, there's something wrong.
If you miss any event, it may be due to a required field missing or the last value being a float or string. The value of an event must be an integer.
If you're using ecommerce, double-check all your parameters. Make sure that you're firing everything as strings here and that unused parameters are empty strings.
Triple-check your account number: UA-XXXXX-X.
If you're doing something with custom JS, make sure to test on all browsers, and try to keep at least the basic tracking in a safe zone where you are sure things won't break.
Send debug info about JavaScript code that might break GA to GA itself. Check this.
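On that last point, a minimal sketch of reporting JS errors as GA events using the classic async queue (the category and label format are arbitrary choices of mine):

window.onerror = function (message, url, line) {
  window._gaq = window._gaq || [];
  // _trackEvent(category, action, label, value, nonInteraction);
  // nonInteraction=true so error beacons don't skew bounce rate
  _gaq.push(['_trackEvent', 'JS Error', String(message),
             url + ':' + line, 0, true]);
  return false; // let the browser surface the error as usual
};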
I'm implementing a simple game in Javascript, and am interested in having an online highscores table for it, so that players can compete against one another. I've two concerns about this:
What is the simplest server-side program I need for this purpose? I don't need a full-fledged "web application", just something simple that gets POST requests with highscores, updates a database and sends back lists of scores. I'm familiar with Django. What are your suggestions?
How can I make the highscores table reasonably secure? I'm aware that making it bulletproof against competent and dedicated hackers is difficult, but I wouldn't want anyone with access to the JavaScript source code to be able to submit fictitious scores too easily. Are there any tools for this purpose?
It's going to be pretty hard to secure the high scores. I mean, it's not enough to ensure that it comes from your page, because if, say, the JavaScript function is submitHighScore(n) then they can always type javascript:submitHighScore(10000000) in the address bar on that page and have it work.
What comes to mind is perhaps some sort of hash function that generates specific codes that match certain levels in the game. When submitting the score it would also submit this hash, so users would have had to get to this level in order to get that equivalent score.
Another option would be for the game to pull in some kind of key that only works temporarily, so as you went along the key would change and then the score would be submitted to a central server intermittently.
Keep in mind that really determined individuals can always just sniff the data being sent to your server and decompile your code.
You could go the Broderbund route and ask the player trivia questions which are validated server-side to ensure they really did pass the level they said they did...something like "What color was the monster in the previous level?"
To submit the score securely, sign it (you'd also need to ensure that the score isn't faked before it's signed and sent, but that's another problem).
Hide a secret in the JS code, and send highscore + hash(highscore + secret) to the server. The hash could be MD5/SHA1; there are easy-to-find JS implementations.
Of course it won't stand up to anyone carefully analysing the JS code, but at least nobody will be able to submit a fake highscore just by tampering with HTTP traffic.
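A minimal sketch of the idea, using the browser's built-in Web Crypto API for the hashing instead of a bundled MD5/SHA1 script (the secret and the endpoint are placeholders; see the notes below for why the secret shouldn't really sit in a literal like this):

const SECRET = 'not-actually-secret'; // in practice, obfuscated (see below)

async function signedScore(score) {
  // hash(highscore + secret)
  const data = new TextEncoder().encode(String(score) + SECRET);
  const digest = await crypto.subtle.digest('SHA-1', data);
  const hash = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
  return { score: score, hash: hash };
}

// the server recomputes hash(score + secret) and rejects mismatches
signedScore(12345).then((payload) =>
  fetch('/highscores', { // hypothetical endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  })
);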
On hiding stuff in JS:
You can't really hide it (it's ultimately futile like DRM), but you can try hard to obfuscate it and make debugging painful.
Don't put the secret as a literal in the source code - compute it at run time by combining the results of several functions and local and global-ish variables (see the sketch after this list).
Minify all code, remove sourcemaps.
Add bits of code that don't do anything, but seem to be important, to make debugging more confusing.
Don't put anything in global scope, but do rely on shared mutable state by passing closures and arrays around.
Rely on Date and timers to cause race conditions to make your code produce wrong results if it's paused in the debugger (just don't make it too tight to allow it to run on slow machines).
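For instance, the "compute it at run time" point might look something like this (everything here is invented for illustration; in real code the pieces should be scattered and far less obvious):

var seeds = [7, 13, 101]; // innocuous-looking globals, far apart in real code

function partA() {
  return String.fromCharCode(115, 51, 99); // "s3c", never a visible literal
}

function partB(n) {
  return ((n * 31) % 97).toString(36); // derive characters from arithmetic
}

function buildSecret() {
  // no single string in the source reveals the whole secret
  return partA() + partB(seeds[0]) + partB(seeds[1] + seeds[2]);
}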
If the game is deterministic (like a puzzle game), then users could submit the highscore in the form of a log of the steps taken to win (the user's input), which you'd replay on the server to calculate the score.
This would change the attack from finding/emulating the score-submitting function to writing an AI or hacking the game itself to make it easier to play (but still within its basic rules).
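A sketch of the shape that log might take (the field names and the endpoint are hypothetical):

var moveLog = [];
var gameStart = Date.now();

function recordMove(key) {
  // record each input and when it happened, relative to game start
  moveLog.push({ t: Date.now() - gameStart, key: key });
}

function submitGame() {
  // the server replays moveLog against the same game rules and computes
  // the score itself, so a score value faked on the client is useless
  fetch('/submit-replay', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ moves: moveLog }),
  });
}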
1.) Any CGI script that can talk to a database and understand JSON, or other format of your choice, will do the work.
However, if you're familiar with Django, building your server on top of Django would be simplest, in terms of what you have to learn and how much application code you have to write. A seemingly simple CGI script can turn out rather complex if you write it from scratch.
I found django-piston to be a handy Django app to quickly write a REST-style API server. It supports JSON so it should be easy to interface with your JavaScript game.
2.) The most casual cracker will go for a replay attack and its variants: peek at the page source and execute a JavaScript function, or intercept HTTP requests and resend them (easy with a Firefox add-on like Tamper Data).
To counteract the former, you can obfuscate the source code and the HTTP body:
Minify the JavaScript code
Encode the message you send to the server with Base64 or other encoding algorithm
The latter can be prevented by requiring all update requests to include a one-time password ("session token" in the Wikipedia article) that was recently acquired from the server.
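A sketch of that token flow (the endpoints are hypothetical; the server must tie each token to the session and invalidate it after one use):

async function submitScore(score) {
  // 1. acquire a fresh single-use token from the server
  const { token } = await (await fetch('/token')).json();

  // 2. submit the score together with the token; a replayed request
  //    carries an already-spent token and gets rejected
  await fetch('/highscore', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ score: score, token: token }),
  });
}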
I am thinking about this myself. What seems to be the most reasonable solution to me is this:
1) Sessions, to disallow tampering with the scoretable outside the game.
2) Log every action in the game and send the log to the score server. The server then calculates whether those actions could actually produce that score. If you also log the time spent playing the game, you further reduce the chance that an attacker will bother to break your game. This also enables you to build a replay script, like arcade servers with hi-score tables have, and in the case of a suspicious score you can watch the replay and decide for yourself whether it's real. A cheater would have to write a clever bot to play your game, and unless you have a game for real prizes, no one will try that hard.
If the cheater won't even analyze your code, sessions will stop him. If he reads your code, he will quickly break anything resembling hashed scores, secrets or tokens. But if you make the game-logging script thorough enough, he will give up.
In answer to your question:
1.) This depends on your environment and coding preference. PHP, Python, and ASP.NET are a few that come to mind. Since you already know Python (from your profile), you can use a Python CGI script to do this, or use one of the many frameworks for Python (Zope, Django, Pylons, ...).
see: http://www.python.org/doc/essays/ppt/sd99east/index.htm
for info on Python CGI.
2.) A few tricks for security (none are foolproof):
A hidden text box in the HTML with an encoded value that the server checks against a cookie, to ensure the high score comes from your page.
Server Script only accepts values from a specific domain
You could use a combination of one of the methods above, as well as simply requiring the user to be registered to be able to post high scores. Non registered users could view their current score compared to existing high scores, but in order to post your high score online, you must have already logged in with your registered account, or provide it when the app goes to update the score online.
A simple message along the lines of "Your high score is X, and ranks ### in the high score table. To post this score online, please register with us first".
Better, I think, is to do the calculation of the score directly in the Python files of your Django app, instead of calculating it in the JavaScript file.
You send the raw data with a POST request to compare against your database, then you calculate the score and store it there. That way the score itself never travels across the web to your servers, which is safer because everything sensitive happens locally on the server.
It's a simple case of a JavaScript that continuously asks "are we there yet?", like a four-year-old on a car trip. But, much like parents, if you do this too often, or with too many kids at once, the server will buckle under the pressure.
How do you solve the issue of having a webpage that looks for new content in the order of every 5 seconds and that allows for a larger number of visitors?
Stack Overflow does it somehow, though I don't know how.
The more standard way would indeed be JavaScript that looks for new content every few seconds.
A more advanced way would use a push-like approach, such as Comet techniques (long polling and such). There's a lot of interesting stuff under that link.
I'm still waiting for a good opportunity to use it myself...
Oh, and here's a link from stackoverflow about it:
Is there some way to PUSH data from web server to browser?
In Java I used an Ajax library (DWR) that uses Comet technology - I think you should search for a PHP library that does the same.
The idea is that the server sends one very long HTTP response, and when it has something to send to the client, it ends it and sends a new response with the updated data.
Using this, the client doesn't have to ping the server every x seconds to get new data - I think it could help you.
You could make the poll time variable depending on the number of clients. Using your metaphor, the kid asks "Are we there yet?" and the driver responds "No, but maybe in an hour". Thankfully, JavaScript isn't a stubborn kid, so you can be sure it won't bug you until then.
You could consider polling every 5 seconds to start with, but after a while start to increase the poll interval - perhaps up to some upper limit (1 minute, 5 minutes, whatever seems optimal for your usage). The increase doesn't have to be linear.
A more sophisticated spin (which could incorporate monzee's suggestion to vary by the number of clients) would be to allow the server to dictate the interval before the next poll. The server could then increase the interval over time, and you could even change the algorithm on the fly, or in response to network load.
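A sketch combining both ideas: the client starts fast, backs off while nothing changes, and lets the server override the interval (the endpoint and field names are hypothetical):

var interval = 5000;              // start by polling every 5 seconds
var MAX_INTERVAL = 5 * 60 * 1000; // never wait more than 5 minutes

function poll() {
  fetch('/updates')
    .then(function (res) { return res.json(); })
    .then(function (data) {
      if (data.hasNews) {
        render(data.items);         // hypothetical render function
        interval = 5000;            // activity: snap back to the fast rate
      } else if (data.nextPollMs) {
        interval = data.nextPollMs; // let the server dictate the pace
      } else {
        interval = Math.min(interval * 2, MAX_INTERVAL); // back off
      }
    })
    .finally(function () { setTimeout(poll, interval); });
}

function render(items) { /* insert the new content into the page */ }

setTimeout(poll, interval);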
You could take a look at the Twisted framework in Python. It's an event-driven network programming framework that might satisfy what you are looking for. It can be used to push messages from the server.
Perhaps you can send a query to a really simple script that doesn't need to make a real DB query, but only uses a simple timestamp to tell whether there is anything new.
And then, only if the answer is yes, you do a real query, where the server has to do real work!
I would have a single instance calling the DB and, if a newer timestamp exists, put that new timestamp in an application variable. Then let all sessions check against that application variable, or something like that. That way only one instance is calling the SQL server, and the number of clients doesn't matter.
I haven't tried this and it's just the first idea off the top of my head, but I think caching the timestamp and letting the clients check the cache is the way to do it; as for how to implement the cache (SQL Server cache, application variable, and so on), I don't know what's best.
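Putting those last two answers together, the client side might look like this (the URLs are hypothetical; /last-modified only reads the cached timestamp, so most polls never touch the database):

var lastSeen = 0;

function checkForUpdates() {
  fetch('/last-modified')            // cheap: returns the cached timestamp
    .then(function (res) { return res.json(); })
    .then(function (data) {
      if (data.timestamp > lastSeen) {
        lastSeen = data.timestamp;
        return fetch('/new-content') // expensive: the real DB query
          .then(function (res) { return res.json(); })
          .then(render);
      }
    });
}

function render(items) { /* add the new items to the page */ }

setInterval(checkForUpdates, 5000);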
Regarding how SO does it, note that it doesn't check for new answers continuously, only when you're typing into the "Your Answer" box.
The key then, is to first do a computationally cheap operation to weed out common "no update needed" cases (e.g., entering a new answer or checking a timestamp) before initiating a more expensive process to actually retrieve any changes.
Alternately, depending on your application, you may be able to resolve this by optimizing your change-publishing mechanism. For example, perhaps it might be feasible for changes (or summaries of them) to be put onto an RSS feed and have clients watch the feed instead of the real application. We can assume that this would be fairly efficient, as it's exactly the sort of thing RSS is designed and optimized for, plus it would have the additional benefit of making your application much more interoperable with the rest of the world at little or no cost to you.
I believe the approach should be based on a combination of server-side sockets and client-side AJAX/Comet. Like this:
Assume a chat application with several logged-on users, each of them listening via a slow-load AJAX call to the server-side listener script.
Whichever browser receives the just-entered data submits it to the server with an AJAX call to a writer script. That script updates the database (or storage system) and posts a socket write to the aforementioned listener script. The listener then gets the fresh data and posts it back to the client browsers.
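As a rough sketch, the browser side of that flow could look like this (both endpoints are hypothetical, and like the scheme itself this is unverified):

function listen() {
  fetch('/listener')              // the server holds this open until data arrives
    .then(function (res) { return res.json(); })
    .then(function (msg) {
      appendMessage(msg);         // hypothetical: render msg in the chat window
      listen();                   // immediately reconnect and wait again
    })
    .catch(function () {
      setTimeout(listen, 5000);   // brief back-off on network errors
    });
}

function sendMessage(text) {
  // the writer script stores the message and pokes the listener socket
  fetch('/writer', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: text }),
  });
}

function appendMessage(msg) { /* add msg to the chat UI */ }

listen();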
Now, I haven't yet written this, and right now I don't know whether or how the browser limit of two concurrent connections interferes with the above logic.
I'd appreciate hearing from anyone with thoughts here.
AS