I need a problem that is computationally difficult (in any language), that I can easily implement in JavaScript. I'm trying to do a CAPTCHA-like test to make it unlikely that a hacker is accessing my page mechanically.
Yes, I know that he could use Rhino or some other JS engine and do it -- that's why I want it to be computationally expensive, so it takes him a few hours to set up and his machine a few seconds to fake each access.
I'm thinking of getting a bunch of large primes on the back end, sending over the product of two of them, and demanding that the web page factor it, but if anybody has a better idea, I'm all ears. Also, does anybody have a good library for doing that factoring thing?
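You can use the same method as Bitcoin, i.e. partially reversing a secure hash: the client searches for an input whose hash meets a target, which is slow to find but fast to check.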
Explained here:
http://www.tomshardware.com/reviews/bitcoin-mining-make-money,3514-3.html
Bitcoin source
https://github.com/bitcoin/bitcoin
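A minimal sketch of that idea, in the hashcash style rather than Bitcoin's actual mining code: the server hands out a random challenge, the client searches for a nonce whose SHA-256 hash starts with a number of zero hex digits, and the server verifies with a single hash. The function names, the `challenge:nonce` format and the difficulty value are assumptions for illustration. It is written for Node; in a browser the same loop would need a JavaScript SHA-256 implementation or the Web Crypto API.

```javascript
const { createHash } = require("node:crypto");

// Hex-encoded SHA-256 of a string.
function sha256Hex(text) {
  return createHash("sha256").update(text).digest("hex");
}

// Client side: brute-force the smallest nonce whose hash starts with
// `difficulty` zero hex digits. Expected work is about 16^difficulty hashes.
function solveChallenge(challenge, difficulty) {
  const prefix = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    if (sha256Hex(challenge + ":" + nonce).startsWith(prefix)) {
      return nonce;
    }
  }
}

// Server side: checking a submitted nonce costs a single hash.
function verifySolution(challenge, nonce, difficulty) {
  return sha256Hex(challenge + ":" + nonce).startsWith("0".repeat(difficulty));
}

// Usage: the challenge string should be random and single-use in practice.
const nonce = solveChallenge("example-challenge", 3);
console.log(verifySolution("example-challenge", nonce, 3));
```

Each extra hex digit of difficulty multiplies the client's expected work by 16 while the server's cost stays at one hash, which is the asymmetry the question asks for.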
You can implement a standard captcha and do some more checking on the client side. For example, add an event listener on the captcha text input to listen for keydown/keyup events, XOR the key codes, and send the result along with the captcha. Add a hidden text input to the form named email, or something else you find on every form; robots fill those in automatically, so if you get a value for post['email'] then it's a robot, because a real user won't see that field. You can also have a piece of code in a totally unrelated JavaScript file that automatically adds a field to the form that is required for validation. So, captcha or no captcha, you can still enhance robot protection on the client side without computationally difficult processes.
The problem with this is that if it is known to be NP-Hard, it's going to be a pain in the rear for human beings to solve, as well, on non-trivial instances. Visual/auditory captchas are kind of cool in that they give people a leg up... we have very sophisticated sensory organs for processing these kinds of things, and computers are not too good at it (though they are getting better all the time!).
As such, you're probably better off coming up with a unique thing that people can do very easily, but that machines are not too good at. For instance, give some simple black and white pictures and ask the user which one doesn't belong, or show some pictures of foods and ask what kind of recipe you could make with them.
Clever approach. Whenever one-way complexity is needed it makes me think of a hash. Simply hash some aspect of their user account (not anything sensitive) and send the hash to the client. You would want to truncate/pad the string to get your desired complexity level. This isn't to secure an account so md5 or any other hashing algorithm would be fine.
Here is some sample code that you might be able to leverage for the client side.
When you pay through online payment systems (with or without 3DSecure), you fill in the form and validate, and from a strictly visual point of view, things seem pretty straightforward. But behind the scenes, there are often multiple redirections, which are handled through JavaScript.
Basically, your data is submitted, and you land on a page with a pre-filled form, which is immediately submitted through JavaScript, sometimes multiple times in a row (with a fast enough connection, you don't even see those steps in the browser).
I was wondering why they do it that way (instead of proper back-end redirections), and I can't find an answer to it.
My guess is that it's just to make it harder for scripts to follow it, but it's still possible to do it (so why bother), and in my opinion, the "dirty aspect" of it (from a coder's point of view) is not worth the constraints it imposes on scripts that would attempt an automatic validation.
Do you have any insights on this?
From my point of view, JavaScript can distinguish a bot from a human efficiently.
You have probably already seen how Google validates against bots.
It's just a simple check box, but it's quite complicated if you try to write a bot that passes the check. (I still don't know how to get past it ^)
I want to implement some anti-crawler mechanism to protect data in my site. After reading many related topics in SO, I am going to focus on "enforce running javascript".
My plan is:
Implement a special function F (eg. MD5SUM) in javascript file C
Input: cookie string of current user (the cookie changes in each response)
Output: a verification string V
Send V along with other parameters to sensitive backend interface to request valuable data
Backend server has validation function T to check whether V is correct
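The plan above can be sketched as follows, un-obfuscated, to make the data flow concrete. This is only an illustration under assumptions: SHA-256 is swapped in for the MD5SUM mentioned above, and the names `computeV` (playing the role of F) and `validateV` (playing the role of T) are hypothetical. It runs under Node; in the browser, F would need a JavaScript hash implementation.

```javascript
const { createHash, timingSafeEqual } = require("node:crypto");

// F: what the script in file C would compute from the current cookie string.
function computeV(cookie) {
  return createHash("sha256").update(cookie).digest("hex");
}

// T: the backend recomputes V from the cookie it issued and compares.
function validateV(cookie, submittedV) {
  const expected = Buffer.from(computeV(cookie), "hex");
  const given = Buffer.from(String(submittedV), "hex");
  // timingSafeEqual requires equal lengths, so check that first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// Usage: the client sends V along with its request parameters.
const v = computeV("session=abc123");
console.log(validateV("session=abc123", v)); // accepted
console.log(validateV("session=other", v));  // rejected
```

In this plain form F is a few lines that can be ported to any language, so the scheme's whole strength rests on how hard F is to read.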
The difficult part is how to obfuscate F. If crawlers can easily understand F, they will get V without C and bypass javascript.
Indeed, there are many JS obfuscators, but I am going to achieve the goal by implementing a generator function G which does not appear in C.
G(K) generates F, where K is a large integer. F should be complicated enough that crawler writers have to spend many hours to understand it. Given another K',
G(K') = F', where F' should look like a new function to some extent, so that crawler writers again have to spend hours to crack it.
A possible implementation of G might be a mapping from an integer to a digital circuit of many connected logic gates (like a maze), represented as F using JavaScript grammar. Since F must be run in JavaScript, crawlers have to run PhantomJS. Furthermore, I can insert sleeps in F to slow down crawlers, while normal users will hardly notice a 50-100ms delay.
I know there is a group of methods to detect crawlers. They will be applied. Let's only discuss "enforce running javascript" topic.
Could you give me some advice? Is there any better solution?
Using a login to prevent the whole world from seeing the data is one option.
If you do not want logged-in users to fetch all the data you make available to them, you could then limit the number of requests per minute for each user, adding a delay to your page load once the limit has been reached. Since the user is logged in, you can easily track the requests server-side even if they manage to change cookies/localStorage/IP/browser and whatnot.
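A minimal sketch of such a per-user limit, assuming a fixed one-minute window and an in-memory map keyed by the logged-in user's ID. The function name and the limits are hypothetical, and a real deployment would likely keep the counters in shared storage rather than in process memory.

```javascript
// Fixed-window limiter: at most `maxRequests` per `windowMs` for each user.
function createRateLimiter(maxRequests, windowMs) {
  const windows = new Map(); // userId -> { start, count }

  return function isAllowed(userId, now = Date.now()) {
    const entry = windows.get(userId);
    // First request, or the previous window has expired: start a new window.
    if (!entry || now - entry.start >= windowMs) {
      windows.set(userId, { start: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= maxRequests;
  };
}

// Usage: 60 requests per minute per user; delay or refuse when false.
const isAllowed = createRateLimiter(60, 60 * 1000);
console.log(isAllowed("user-42"));
```

Because the key is the account rather than the cookie or IP address, changing those on the client does not reset the counter.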
You can use images for some texts, which will force crawlers to use resource-heavy mechanics (such as OCR) to translate them into usable information.
You could add hidden text, which would even interfere with users' copy/paste: insert spans filled with 3-4 random letters after every 3-4 real letters and give them font-size 0. That way they aren't seen, but are still copied, and will most likely be picked up by a crawler.
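The hidden-text trick could be generated server-side roughly like this. It is a sketch under assumptions: the helper names are made up, the random source is injectable only so the output can be checked, and an inline `font-size:0` style is used for brevity.

```javascript
// Escape the characters that matter when inserting text into HTML.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// After every `chunkSize` real characters, insert an invisible span
// containing `chunkSize` random lowercase letters.
function withDecoys(text, chunkSize = 3, random = Math.random) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let html = "";
  for (let i = 0; i < text.length; i += chunkSize) {
    html += escapeHtml(text.slice(i, i + chunkSize));
    let junk = "";
    for (let j = 0; j < chunkSize; j++) {
      junk += letters[Math.floor(random() * letters.length)];
    }
    html += '<span style="font-size:0">' + junk + "</span>";
  }
  return html;
}

console.log(withDecoys("price: 42"));
```

A crawler that knows the trick can strip the zero-size spans again, and the junk letters may also be read out by screen readers, so this only raises the cost slightly.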
Refuse connections from known crawler HTTP header signatures, although any crawler could fake those. And Greasemonkey or some other scripting extension could even turn a regular browser into a crawler, so this has very little effect.
Now, to force using javascript
The problem is that you cannot really force any javascript execution. What the javascript does is seen by everyone who has access to the page, so if it's some kind of MD5 hash you'd accomplish, this can be implemented in any language.
That's mainly unfeasible because the crawler has access to exactly everything the client's javascript has access to.
Forcing the use of a JavaScript-enabled crawler can be circumvented, and even if it could not, with the computing power available to anyone nowadays, it is very easy to launch a PhantomJS instance... And as I said above, anyone with slight JavaScript knowledge can simply automate clicks on your website using their browser, which makes everything undetectable.
What should be done
The only bulletproof way to prevent crawlers from leeching your data, and to prevent any automation, is to ask for something that only a human could do. Captcha comes to mind.
Think about your real users
The first thing you should keep in mind is that if your website starts to get annoying to use for normal users, they will not come back. Having to type an 8-character captcha on each page request just because there MIGHT be someone who wants to pump the data will become too tedious for anyone. Also, blocking unknown browser agents might prevent legit users from accessing your website because, for X or Y reason, they are using a weird browser.
The impact on your legit users, and the time you'd spend working hard on fighting crawlers, might be too high a price compared with just accepting that some crawling will happen. So your best bet is to rewrite your TOS to explicitly forbid crawling of any sort, log every HTTP access of every user, and take action when needed.
Disclaimer:
I'm scraping over a hundred websites monthly, following external links to a total of about 3000 domains. At the time of posting, none of them are resisting, even though they employ one or more of the techniques above. When a scraping error is detected, it does not take long to fix it...
The only thing is to crawl respectfully: do not over-crawl or make too many requests in a small time frame. Just doing that will circumvent most popular anti-crawlers.
I made a kind of random lottery:
Generate a random number from 1 to 10000
If it is smaller than 5000, double the value of your coins
If not, take them away
When I tested that system, I could make my winnings far more than 2 times bigger by inspecting elements, going to the source, finding my JavaScript file, and changing bet * 2 into bet * 999.
Now I need to remake it because I don't want my website's users to cheat.
Instead of loading the script from a script file, I wrote it directly in the HTML page between <script> code </script>, and then I felt like a god because I thought I had fixed that.
Is this proper way to deal with this?
All the code (HTML, CSS, Javascript) that you send to client (browser or other) is editable.
Never rely on client side validation.
Never trust user input, even after server-side validation: escape it against XSS before writing it back to the page.
Never put your business logic in client side code.
Never trust user supplied date information as request date.
Give every form a CSRF token parameter.
These are the most basic rules I can think of.
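Applied to the lottery in the question, "never put your business logic in client side code" could look like the sketch below: the browser only sends the bet, and the server rolls the number and updates the balance. The function name is hypothetical, a win is assumed to pay back twice the bet (a net gain of one bet), and the random source is injectable only so the logic can be tested.

```javascript
// Server-side resolution of one bet. `balance` comes from the server's own
// records, never from the request; only `bet` is supplied by the client.
function resolveBet(balance, bet, random = Math.random) {
  if (!Number.isInteger(bet) || bet <= 0 || bet > balance) {
    throw new RangeError("invalid bet");
  }
  const roll = Math.floor(random() * 10000) + 1; // integer in 1..10000
  const won = roll < 5000;
  return { roll, won, balance: won ? balance + bet : balance - bet };
}

// Usage: the response tells the client what happened; it decides nothing.
console.log(resolveBet(100, 10));
```

Editing bet * 2 into bet * 999 in the browser then changes nothing, because that multiplication no longer exists on the client.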
Cheating on games is not about your source code being editable or not.
It's the same problem that the music industry had when people were freely copying and sharing their work, and they thought that DRM was the way to go to avoid that. In the end, if you can listen to music on your computer, you will always be able to copy it.
The same goes for the source code of your game: you can hide it with obfuscation, cryptography or use schemes as complicated as you see fit but as long as it runs on your user's computer, they will be able to change it.
Now, if one of your users changes the code locally and has the game report fake winnings, that's not a problem as long as it's not real money, and as long as it does not change the gaming experience of other users.
I recommend reading the following question. It's not a duplicate but most of your questions should be answered by reading all answers: What good ways are there to prevent cheating in JavaScript multiplayer games?
Imagine a space shooter with a scrolling level. What methods are there for preventing a malicious player from modifying the game to their benefit? Things he could do that are hard to limit server-side are auto-aiming, peeking outside the visible area, speed hacking and other things.
What ways are there of preventing this? Assume that the server is any language and that the clients are connected via WebSocket.
Always assume that the code is 100% hackable. Think of ways to prevent a client completely rewritten (for the purposes of cheating) from cheating. These can be things such as methods for writing a secure game protocol, server-side detection, etc.
The server is king. Clients are hackable.
What you want to do is two things with your websocket.
Send game actions to the server and receive game state from the server.
You render the game state, and you send input to the server.
auto aiming - this one is hard to solve. You have to go for realism. If a user hits 10 headshots in 10ms then you kick him. Write a clever cheat detection algorithm.
peeking outside the visible area - solved by only sending the visible area to each client
speed hacking - solved by handling input correctly. You receive an event that a user moved forward, and you control how fast he goes.
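The last two points can be sketched as server-side code. This is an illustration under assumptions: the names, the speed constant and the circular view radius are all hypothetical, and a real game loop would also account for tick timing.

```javascript
const MAX_SPEED = 5; // distance per tick, chosen by the server (hypothetical)

// Speed hacking: the client only supplies a direction. The server normalizes
// it, so sending a huge vector does not move the player any faster.
function applyMove(player, input) {
  const length = Math.hypot(input.dx, input.dy);
  if (!Number.isFinite(length) || length === 0) {
    return player; // ignore malformed or empty input
  }
  return {
    ...player,
    x: player.x + (input.dx / length) * MAX_SPEED,
    y: player.y + (input.dy / length) * MAX_SPEED,
  };
}

// Peeking: each client is only sent the entities within its view radius.
function visibleTo(player, entities, viewRadius) {
  return entities.filter(
    (e) => Math.hypot(e.x - player.x, e.y - player.y) <= viewRadius
  );
}

// Usage: a client claiming to move 1000 units still advances by MAX_SPEED.
const moved = applyMove({ id: 1, x: 0, y: 0 }, { dx: 1000, dy: 0 });
console.log(moved, visibleTo(moved, [{ x: 8, y: 0 }, { x: 500, y: 0 }], 50));
```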
You can NOT solve these problems by minifying code. Code on the client is ONLY there to handle input and display output. ALL logic has to be done on the server.
You simply need to write server-side validation. The only thing is that game input is significantly harder to validate than form input due to its complexity. It's the exact same thing you would do to make forms secure.
You need to be really careful with your "input is valid" detection though. You do not want to kick/ban highly skilled players from your game. It's very hard to strike the balance between too lax and too strict on bot detection. The whole realm of bot detection is very hard overall. For example, Quake had an auto-aim detection that kicked legitimately skilled players back in the day.
As for stopping bots from connecting to your websocket directly, set up a separate HTTP or HTTPS verification channel on your multiplayer game for added security. Use multiple HTTP/HTTPS/WS channels to validate a client as being "official", acting as some form of handshake. This will make connecting to the websocket directly harder.
Example:
Think of a simple multiplayer game: a 2D room-based racing game. Up to n users go on a flat 2D platformer map and race to get from A to B.
Let's say for argument's sake that you have a foolproof system where there's a complex authentication going over an HTTPS channel, so that users cannot access your websocket channel directly and are forced to go through the browser. You might have a Chrome extension that deals with the authentication and you force users to use that. This reduces the problem domain.
Your server is going to send all the visual data that the client needs to render the screen. You cannot obscure this data away. No matter what you try, a skilled hacker can take your code and slow it down in the debugger, editing it as he goes along until all he's left with is a primitive wrapper around your websocket. He lets you run the entire authentication, but there is nothing you can do to stop him from stripping out any JavaScript you write to prevent him from doing that. All you can achieve with that is to limit the number of hackers skilled enough to access your websocket.
So the hacker now has your websocket in a Chrome sandbox. He sees the input. Of course your race course is dynamically and uniquely generated. If you had a set number of them then the hacker could pre-engineer the optimum race route. The data you send to visualise this map can be processed faster than a human can interact with your game, and the optimum moves to win your racing game can be calculated and sent to your server.
If you were to try to ban players who reacted too fast to your map data and call them bots, then the hacker adjusts to this and adds a delay. If you try to ban players who play too perfectly, then the hacker adjusts to this and plays less than perfectly using random numbers. If you place traps in your map that only algorithmic bots fall into, then they can be avoided by learning about them, through trial and error or a machine learning algorithm. There is nothing you can do to be absolutely secure.
You have only ONE option to absolutely avoid hackers. That is to build your own browser which cannot be hacked. Build the security mechanisms into the browser. Do not allow users to edit javascript at runtime in realtime.
At the server-side, there are 2 options:
1) Full server-side game
Each client sends their "actions" to the server. The server executes them and sends relevant data back. e.g. a ship wants to move north, the server calculates its new position and sends it back. The server also sends a list of visible ships (solving maphacks), etcetera.
2) Full client-side game
Each client still sends their actions to the server. But to reduce workload on the server, the server doesn't execute the actions but forwards them to all other clients. The clients then resolve all actions simultaneously. As a result, each client should end up with an identical game. Periodically, each client sends their absolute data (ship positions, etc.) to the server and the server checks if all client data is identical. Otherwise, the games are out of sync and someone must be hacking.
Disadvantage of the second method is that some hacks remain undetected: A maphack for example. A cheater could inject code so he sees everything, but still only sends the data he should normally be able to see to the server.
--
At the client-side, there is 1 option:
A javascript component that scans the game code to see if anything has been modified (e.g. code modified to render objects that aren't visible but send different validation data to the server).
Obviously, a hacker could easily disable this component. To fix that, you could force the client to periodically reload the component from the server (The server can check if the script file was requested by the user periodically). This introduces a new problem: the hacker simply periodically requests the component via AJAX but prevents it from running. To avoid that: have the component redownload itself, but a slightly modified version of itself.
For example: have the component be located at yoursite/cheatdetect.js?control=5.
The server will generate a slightly modified cheatdetect.js so that in the next iteration, cheatdetect.js?control=22 (for example) must be downloaded. If the control mechanism is sufficiently complicated, the hacker won't be able to predict which control number to request next, and cheatdetect.js must be executed in order to continue the game.
There's nothing you can really do to prevent anyone from modifying your JS or writing a GreaseMonkey script. However you can make it hard for them by minifying your script as well as making your code as cryptic as possible. Maybe even throwing in some fake methods or variables that do nothing but are used to throw an attacker off. But given enough time, none of these methods are completely foolproof, as once your code goes to the client, it is no longer yours.
The only way I can even think of implementing this is by modifying your JavaScript to function as a client and then designing a central server mechanism to validate data sent from that client. This is probably a big change to implement and will most likely make your project more complex. However, as was said earlier, if the application runs entirely on the client, the client can pretty much do whatever they want with your script. The only way to secure it is to use a trusted machine to handle validation.
They don't have to touch your client-side code -- they could just sniff and implement your Websocket protocol and write a tiny agent that pretends to be a human player.
Update: The problem has a few parts, and I don't have answers off the top of my head, but the various options could be evaluated with these questions in mind:
1) How far are you willing to go to prevent cheating? If you only care about casual cheating, how many barriers are enough to discourage the casual cheater? The intermediate JavaScript programmer? A serious expert? Weighing this against the benefits of cheating, is there anything of real value at stake, like cash and prizes, or just reputation?
2) How do you get high confidence that a human is providing inputs to your game? For example, with a good enough computer vision library I could model your game on a separate machine and feed inputs to the computer pretending to be the mouse, but this has a high relative cost (not worth my time).
3) How can you create a chain of trust in your protocol such that knowledge of (2) can be passed to the server, and that your server is relatively confident your client code is sending the messages?
4) Sure, many of the roadblocks you throw up can be side-stepped, but what is the cost to the player and to you? See "Attrition warfare".
Some other methods that can be implemented:
Make the target elements difficult for a script to distinguish from other elements. Avoid divs with predictable class and id names if possible. Inject styling using JavaScript instead of using classes. Think like a hacker and make it hard on yourself.
Use decoys that a script will fire on. For instance, if the threat vector is a screen scraping algorithm using pixel colors, throw some common pixel colors in non-target elements. Hits on these non-targets could seem inconsequential to the cheater, but would be detectable. You don't want the cheater to know why you know.
Limit the minimum time between actions to slightly below the best human levels. The best players will hit that plateau, so it won't matter as much who's cheating, and you will immediately be able to detect anyone scripting faster than that by calling methods directly.
Random number generators are typically uniform. Human nature is not. Likely a random number generator will have values within a set limit and even distribution. Natural distribution is a Gaussian curve. If you sampled the distribution and it looks like a square wave in the x and y axis, 100% it's a cheater. This will be fairly difficult for the cheater to detect the threshold for the algorithm because it's a derivative of the random, and not the random distribution itself. You're also using aggregate data and not individual plays to detect it, so reverse engineering the algorithm would be extremely difficult without knowing your detection algorithm.
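One hedged way to turn the "square wave versus bell curve" idea into a number is kurtosis, which is about 1.8 for a uniform distribution and about 3 for a Gaussian. The sketch below computes it over an aggregate of samples, for example delays between a player's actions. The cutoff of 2.2 and the minimum sample size are arbitrary assumptions, and real human data may fit neither shape cleanly.

```javascript
// Kurtosis (fourth standardized moment) of a list of numbers.
function kurtosis(samples) {
  const n = samples.length;
  const mean = samples.reduce((sum, x) => sum + x, 0) / n;
  let m2 = 0;
  let m4 = 0;
  for (const x of samples) {
    const d = x - mean;
    m2 += d * d;
    m4 += d * d * d * d;
  }
  m2 /= n;
  m4 /= n;
  return m4 / (m2 * m2); // NaN if every sample is identical
}

// Flag aggregates whose shape is closer to flat than to bell-shaped.
// The threshold is a hypothetical cutoff between 1.8 and 3.
function looksUniform(samples, threshold = 2.2) {
  return samples.length >= 100 && kurtosis(samples) < threshold;
}

// Usage: delays drawn straight from Math.random() are flat by construction.
const delays = Array.from({ length: 1000 }, () => 200 + Math.random() * 300);
console.log(kurtosis(delays), looksUniform(delays));
```

A cheater who shapes the delays with a non-uniform generator would pass this particular check, so it is one signal among several rather than a verdict.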
Utilize entropy whenever possible. Avoid predictable game plays. Imagine a racing game on a set collection of race tracks. Each game play could have slightly differing levels of traction, horsepower, and momentum. The script would have to be extremely good to beat it. In a scrolling game, you can alter factors that are instinctual to humans, but difficult for computers, such as wind force, changes in gravity, etc. It would also make it more fun as a side benefit.
Server generated tokens can be used to validate UI elements were used and not calls to the code itself. Validation can be handled in one call at the end of the game comparing events to hashed codes of UI elements. The token should be a hash with a server private key and some value of the UI element.
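A minimal sketch of such a token, assuming it is an HMAC-SHA-256 over a session ID and the UI element's ID, keyed with a secret that never leaves the server. The function names and the `session:element` format are hypothetical.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Server: issue a token tied to one UI element in one session.
function issueToken(serverKey, sessionId, elementId) {
  return createHmac("sha256", serverKey)
    .update(sessionId + ":" + elementId)
    .digest("hex");
}

// Server: at the end of the game, check that a reported event carries the
// token that was attached to that element when the page was rendered.
function verifyToken(serverKey, sessionId, elementId, token) {
  const expected = Buffer.from(issueToken(serverKey, sessionId, elementId), "hex");
  const given = Buffer.from(String(token), "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// Usage with placeholder values.
const key = "server-private-key";
const token = issueToken(key, "session-1", "fire-button");
console.log(verifyToken(key, "session-1", "fire-button", token));
```

The token proves the value came from the server for that element, but not that a human clicked it: a script that reads the token out of the DOM can replay it.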
Decoy the cheater with data they think you're using to detect cheats, such as calls to a DetectCheat method with dummy calls to a fake backend. It's the old magician's trick: wave your hand over here, while you slip a card into the deck with the other hand. Let them waste days on end in a maze that has no exit, with lots of hair pulling.
I'd use a combination of minification and AJAX. If all of the functions and data aren't loaded into the page, it'd be more difficult to cheat.
On the other hand, modding turned out to be a very profitable tool for companies like Id Software. Perhaps allowing the system to be modded might make the game that much more enjoyable to the community at large.
Obfuscate your client exposed code as much as possible. Additionally, use some magic.
You can edit the javascript on the browser and make it work.
Some people suggest making a call to check with the server, so that the action is validated on the server. Once validated, control comes back to the client side, which performs the actions. But I think even this is not foolproof.
For example, take a basic login action: in Angular, when making a call to the server, the backend validates the username and password, and if they are valid, it comes back to the client and lets the user log in using Angular.
When I say login using Angular, it is going to store things in cookies, like user objects and other things. But the user can still remove the JS code that makes the call to the backend, return TRUE (wherever needed), insert a dummy user object into the cookies and other objects (whatever is needed), and log in. It is a very difficult thing to do, but it is doable. In many scenarios, this is not acceptable even if it takes hours to edit/hack the code.
This is possible in single-page applications, where JS files don't get reloaded for each page. To reduce the possibility of getting hacked we can use minified code. And I guess if actions like this are done in the backend (like login in Django) it is much safer.
Please correct me if I am wrong.
We're talking your average everyday spamming bots -- those which we try to protect against using captcha.
How many of them are capable of running JS in some kind of embedded-browser?
If it's a very tiny amount, then how on earth can solutions like this be useful: http://wcaptcha.wozia.pt/sample.php
Apart from the obvious usability/accessibility issues, these drag-n-drop solutions require the client to have JS. There's not even a fallback. So, assuming it is intended to protect against bots (non-humans) isn't it entirely redundant, or at least redundant to the extent of how many bots would be technically capable of attempting such a thing?
If the client has JS (which is a prerequisite for this solution to work) then isn't it safe (within reasonable measure) to assume the client is not a bot?
It isn't that redundant. If you just detect for Javascript, people can still boot up instances of Selenium and pretend to comment. The number of spam bots doing that now is in the minority, but as the spam wars evolve, you can bet spam bots will move on to other methods such as using a browser. If you detect for Javascript AND make them drag and drop something, it'll definitely prove you're a human.
But I think this implementation is just not practical because there is still a % of people that have JS off for whatever reason. I hear this % is 2 or 3%, which is still a good amount when you're talking about hundreds of thousands of visitors.
An alternative is to have a noscript option that asks the user to activate Javascript if he/she wants to comment on the blog.
Yes, very few spambots will have JavaScript enabled.
Spam is a percentages game. Only a very small percentage of spam messages will trigger any revenue for the spammer. If you can increase the cost of spam, you make it economically infeasible. Spamming in a JavaScript-enabled browser is way more expensive than spamming on the command line, so you can send out more spam at a time if you stick to curl.
Yes, it is redundant.
Rather than making users do this pointless task, you might as well automatically perform a javascript check. It could be as simple as a script that grabs the domain name of the site and inserts it into each form as a hidden field. This will stop all drive-by spammers. If your site is high-profile enough to attract custom spammers, this solution won't be enough anyway.
For those without JavaScript, just show them a regular old image CAPTCHA after their post fails.
A bigger issue is usability IMHO. Captcha is always going to decrease conversion rates, and often significantly. If your goal is to use JS as a means of deterring bots, I can tell you that it has significantly reduced bot traffic for me by more than 90%.
Just incorporate a hidden field that gets populated by JS. If it isn't filled in, they're either a bot, or one of those idiots with JS turned off, who you don't really want to cater to anyway.
Also incorporate a hidden field that is visible in the DOM. Make it fly off the screen with CSS like "position:absolute; left:9999px; top: -9999px". Don't use "display:none;" If this field is filled in, they're a bot.
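The two hidden fields can be checked on the server along these lines. This is a sketch with made-up names: `js_check` stands for the field a page script fills in (here with the site's domain name, as suggested in an earlier answer) and `website` stands for the off-screen honeypot that humans never see.

```javascript
// In the page, a script would fill the first hidden field, for example:
//   document.querySelector("[name=js_check]").value = location.hostname;

// Server: classify a parsed form submission.
function classifySubmission(form, expectedDomain) {
  if (form.website) {
    return "bot: honeypot field was filled in";
  }
  if (form.js_check !== expectedDomain) {
    return "suspect: JavaScript field missing or wrong";
  }
  return "ok";
}

// Usage with example submissions.
console.log(classifySubmission({ js_check: "example.com", website: "" }, "example.com"));
console.log(classifySubmission({ js_check: "", website: "http://spam" }, "example.com"));
```

The "suspect" case also covers real visitors with JavaScript turned off, so it may be better to fall back to an image CAPTCHA there than to reject the post outright.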
I cut down our spam more than 90% with this, so you should use it over Captcha types, unless you're a big business. If you're a big business, your only real solution is a back-end server side solution. Good luck finding that on StackOverflow. They'll close your comment quicker than people can answer it. (and it will have better Google rank than anything out there)