I used to have an interface for sending messages to my website built with HTML only (without any kind of human verification). Annoyed at getting many fake messages every day from bots that spam or flood my website, I redid this interface using JavaScript: thanks to JS, I open a popup-like box on my website, and the message is then sent through that interface.
Since doing this, I have never received any kind of fake message. What I'd like to know is: is this a real barrier against flooding on a website, or are there bots that can also use JavaScript?
You will find that typical bots will only attempt the "typical" list of weaknesses. If you have come up with some sort of system that is a little more unusual than the others, and possibly unique, then you should be OK with what you have.
That's not to say that there is no bot out there that might be able to break through your barrier.
A bot can be coded to parse the JavaScript, of course, but most bots don't do this, as they go for the easy targets.
If you'd really like to be safe, you need to implement something like a CAPTCHA, which uses images that are very hard for machines to read. But OCR has taken huge leaps forward, so these may be rendered useless in a couple of years.
http://en.wikipedia.org/wiki/Captcha
I want to implement an anti-crawler mechanism to protect data on my site. After reading many related topics on SO, I am going to focus on "enforce running JavaScript".
My plan is:
Implement a special function F (e.g. MD5SUM) in a JavaScript file C
Input: the cookie string of the current user (the cookie changes with each response)
Output: a verification string V
Send V along with other parameters to the sensitive backend interface to request valuable data
The backend server has a validation function T to check whether V is correct (a rough sketch of this flow follows the list)
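A minimal sketch of the client side of this flow, purely for illustration: a tiny stand-in hash (djb2) plays the role of F, and the endpoint /api/data and the parameter name v are invented here, not part of the plan above.

// Illustrative only: djb2 stands in for the real F (e.g. an MD5 of the cookie).
// "/api/data" and the "v" parameter are hypothetical names.
function F(cookieString) {
  let h = 5381;
  for (let i = 0; i < cookieString.length; i++) {
    h = ((h << 5) + h + cookieString.charCodeAt(i)) >>> 0;
  }
  return h.toString(16); // the verification string V
}

const v = F(document.cookie);           // the cookie changes with each response
fetch('/api/data?v=' + encodeURIComponent(v))
  .then(res => res.json())
  .then(data => console.log(data));     // the backend re-computes V with T and compares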
The difficult part is how to obfuscate F. If crawlers can easily understand F, they will compute V without C and bypass the JavaScript requirement.
Indeed, there are many JS obfuscators, but I am going to achieve the goal by implementing a generator function G which does not itself appear in C.
G(K) generates F, where K is a large integer. F should be complicated enough that crawler writers have to spend many hours understanding it. Given another K',
G(K') = F', and F' should look like a new function to some extent, so that, again, crawler writers have to spend hours cracking it.
A possible implementation of G might be a mapping from an integer to a digital circuit of many connected logic gates (like a maze), using JavaScript syntax to represent it as F. Since F must be run in JavaScript, crawlers have to run PhantomJS. Furthermore, I can insert sleeps in F to slow down crawlers, while normal users will hardly notice a 50-100 ms delay.
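A much simpler sketch of the generator idea (not the logic-gate circuit described above): a small seeded PRNG picks a chain of bit operations from K and emits them as JavaScript source, so different values of K yield different-looking functions F. All names and constants here are invented for illustration.

// G(K): derive a chain of bit operations from K and emit it as JS source.
function G(K) {
  let seed = K >>> 0;
  const rand = () => ((seed = (Math.imul(seed, 1664525) + 1013904223) >>> 0) % 3);

  let body =
    'let h = 2166136261 >>> 0;\n' +
    'for (let i = 0; i < s.length; i++) {\n' +
    '  h = (h ^ s.charCodeAt(i)) >>> 0;\n';
  for (let step = 0; step < 16; step++) {
    const op = rand();
    if (op === 0)      body += '  h = Math.imul(h, 16777619) >>> 0;\n';
    else if (op === 1) body += '  h = ((h << 7) | (h >>> 25)) >>> 0;\n';
    else               body += '  h = (h + ' + (seed % 1000) + ') >>> 0;\n';
  }
  body += '}\nreturn h.toString(16);';

  // The source text in "body" is what would be shipped inside C as F.
  return new Function('s', body);
}

// Example: two different keys give two different-looking verification functions.
const F1 = G(123456789), F2 = G(987654321);
console.log(F1('some-cookie-string'), F2('some-cookie-string'));

Whether this buys real reverse-engineering time is debatable, since the generated F can still be executed as a black box, as the answers below point out.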
I know there is a group of methods to detect crawlers; they will be applied as well. Let's only discuss the "enforce running JavaScript" topic here.
Could you give me some advice? Is there a better solution?
Using a login to prevent the whole world from seeing the data is one option.
If you do not want logged-in users to fetch all the data you make available to them, you could limit the number of requests per minute per user, adding a delay to your page load once the limit has been reached. Since the user is logged in, you can easily track the requests server-side even if they manage to change cookies/localStorage/IP/browser and whatnot.
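A minimal sketch of that per-user throttle, assuming a Node.js/Express backend where authentication middleware has already set req.user.id; the limit and delay values are arbitrary.

const counts = new Map();      // userId -> { count, windowStart }
const LIMIT = 60;              // requests allowed per window
const WINDOW_MS = 60 * 1000;   // one-minute window

function rateLimit(req, res, next) {
  const now = Date.now();
  const entry = counts.get(req.user.id) || { count: 0, windowStart: now };
  if (now - entry.windowStart > WINDOW_MS) {
    entry.count = 0;
    entry.windowStart = now;
  }
  entry.count += 1;
  counts.set(req.user.id, entry);
  if (entry.count > LIMIT) {
    setTimeout(next, 2000);    // over the limit: slow the response down
  } else {
    next();
  }
}

// app.use(rateLimit);  // attach after the authentication middleware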
You can render some of the text as images; that will force them to use resource-heavy mechanics to translate it into usable information.
You could add hidden text; this would even hinder users' copy/paste (use spans filled with 3-4 random letters after every 3-4 real letters and give them font-size 0). That way they aren't seen, but they are still copied, and will most likely be picked up by a crawler.
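A sketch of that "poisoned text" trick: the helper below interleaves invisible junk spans into a string before it is inserted into the page. All names are made up for the example.

// Returns a DocumentFragment with invisible junk spans between real characters,
// so a naive scrape or copy/paste picks up garbage along with the text.
function poison(text) {
  const frag = document.createDocumentFragment();
  for (let i = 0; i < text.length; i += 3) {
    frag.appendChild(document.createTextNode(text.slice(i, i + 3)));
    const junk = document.createElement('span');
    junk.style.fontSize = '0';                              // invisible to users
    junk.textContent = Math.random().toString(36).slice(2, 5);
    frag.appendChild(junk);
  }
  return frag;
}

// Usage: document.getElementById('listing').appendChild(poison('Sensitive text'));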
Refuse connections from known crawler HTTP header signatures, although any crawler could spoof those. And Greasemonkey or some scripting extension could even turn a regular browser into a crawler, so this has very little effect.
Now, to force the use of JavaScript
The problem is that you cannot really force any JavaScript execution. What the JavaScript does is visible to everyone who has access to the page, so if it is some kind of MD5 hash you are computing, it can be reimplemented in any language.
That is mainly what makes it unfeasible: the crawler has access to exactly everything the client's JavaScript has access to.
Forcing the use of a JavaScript-enabled crawler can be circumvented, and even if it can't, with the computing power available to anyone nowadays it is very easy to launch a PhantomJS instance... And as I said above, anyone with slight JavaScript knowledge can simply automate clicks on your website using their browser, which makes everything undetectable.
What should be done
The only bulletproof way to prevent crawlers from leeching your data, and to prevent any automation, is to ask for something that only a human can do. A CAPTCHA comes to mind.
Think about your real users
The first thing you should keep in mind is that if your website starts to get annoying for normal users, they will not come back. Having to type an 8-character captcha on each page request, just because there MIGHT be someone who wants to pump the data, will become too tedious for anyone. Also, blocking unknown browser agents might prevent legit users from accessing your website because, for whatever reason, they are using an unusual browser.
The impact on your legit users, and the time you would spend fighting crawlers, might be too high compared to just accepting that some crawling will happen. So your best bet is to rewrite your TOS to explicitly forbid crawling of any sort, log every HTTP access of every user, and take action when needed.
Disclaimer:
I'm scraping over a hundred websites monthly, following external links to reach a total of about 3000 domains. At the time of posting, none of them are resisting, even though they employ one or more of the techniques above. When a scraping error is detected, it does not take long to fix it...
The only thing is to crawl respectfully, and not over-crawl or make too many requests in a small time frame. Just doing that will circumvent the most popular anti-crawler measures.
I want to build an app, something like a mobile messenger. But I'm not quite a programmer. I mean, I know JS at an intermediate level, but I have not used it for anything serious.
Anyway, the basic idea I need implemented in the app is push messages or something like that. Let's imagine that one user of the app turns on the flashlight on another user's mobile phone. And the other user can do the same thing in the other direction, or with any contact's number that has this app. In other words, I need to send some piece of data from one client to another.
I do not want to grow a beard while learning Objective-C or Java. I just want my app. And obviously it will be built on PhoneGap with Node.js.
My question is not "please write the code for me". But if someone can tell me something about it, or give some links or keywords for googling, I will be very happy.
At the moment I have only a vague idea of the flow: identify the user, then send the message, then the server finds the other user, then delivers the message to them... and the same in the other direction.
Is there some standard technique for such things?
PS: Sorry for my English.
This question may be too broad and open to interpretation. The answers you get may be opinion-based rather than helpful solutions to a specific technical problem. It's probably best to recreate your question with more specific information: try something specific, document what you did, then post specific technical questions here.
Generally speaking, I'd create some back-end service that saves and serves up messages between each chatter/participant, and call that service with JavaScript, using JSON as the data format to pass back and forth. That's just one way; there are too many different ways to describe here. My answer is just me trying to be helpful. However, this question might get closed for being too opinion-based. But that's my 2 cents.
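A bare-bones sketch of the kind of back-end service described above, assuming Node.js with Express and an in-memory store (no persistence, authentication, or push transport; clients would simply poll):

const express = require('express');
const app = express();
app.use(express.json());

const messages = [];                       // { from, to, text, sentAt }

// One client posts a message addressed to another user.
app.post('/messages', (req, res) => {
  const { from, to, text } = req.body;
  messages.push({ from, to, text, sentAt: Date.now() });
  res.status(201).json({ ok: true });
});

// The other client polls for messages addressed to it.
app.get('/messages/:userId', (req, res) => {
  res.json(messages.filter(m => m.to === req.params.userId));
});

app.listen(3000);

For real push behavior rather than polling, the usual keywords to search for are WebSockets, Socket.IO, and push notifications.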
I need a problem that is computationally difficult (in any language) and that I can easily implement in JavaScript. I'm trying to do a CAPTCHA-like test to make it unlikely that a hacker is accessing my page mechanically.
Yes, I know that he could use Rhino or some other JS engine and do it -- that's why I want it to be computationally expensive, so that it takes him a few hours to set up and his machine a few seconds to fake each access.
I'm thinking of getting a bunch of large primes on the back end, sending over the product of two of them, and demanding that the web page factor it, but if anybody has a better idea, I'm all ears. Also, does anybody have a good library for that factoring?
You can use the same method as Bitcoin, i.e. reversing a secure hash.
Explained here:
http://www.tomshardware.com/reviews/bitcoin-mining-make-money,3514-3.html
Bitcoin source
https://github.com/bitcoin/bitcoin
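A sketch of what such a proof-of-work challenge might look like in browser JavaScript, assuming the server issues a random challenge string and later verifies the returned nonce with a single hash. The difficulty value is arbitrary, and the Web Crypto API used here requires a secure context (https).

// Find a nonce such that SHA-256(challenge + nonce) starts with "difficulty"
// zero hex digits. The client burns CPU searching; the server checks cheaply.
async function solveProofOfWork(challenge, difficulty) {
  const prefix = '0'.repeat(difficulty);
  const encoder = new TextEncoder();
  for (let nonce = 0; ; nonce++) {
    const digest = await crypto.subtle.digest('SHA-256', encoder.encode(challenge + nonce));
    const hex = Array.from(new Uint8Array(digest))
      .map(b => b.toString(16).padStart(2, '0'))
      .join('');
    if (hex.startsWith(prefix)) return nonce;   // send this back to the server
  }
}

// Example: difficulty 4 takes a noticeable moment on a typical machine.
solveProofOfWork('server-issued-challenge', 4).then(n => console.log('nonce:', n));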
You can implement a standard captcha and do some extra checking on the client side. For example, add an event listener on the captcha input to listen for key down/key up events, XOR the key codes and send them along with the captcha. Also add a hidden input in the form named "email" or something you find on every form; robots fill those in automatically, so if you get a value for post['email'], it's a robot, because a real user never sees that field. You can also have a piece of code in a totally unrelated JavaScript file that automatically adds a field to the form that is required for validation. So... captcha or no captcha, you can still enhance the robot protection on the client side without computationally difficult processes.
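A sketch of the keystroke part of that suggestion; the field names "captcha" and "captcha_keys" are invented for the example.

// XOR the key codes typed into the captcha field and submit the result too.
// A bot that never fires key events leaves the accumulator at 0.
const captchaInput = document.querySelector('input[name="captcha"]');
const keysField = document.querySelector('input[name="captcha_keys"]');
let acc = 0;
captchaInput.addEventListener('keydown', (e) => {
  acc ^= e.keyCode;
  keysField.value = acc;
});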
The problem with this is that if it is known to be NP-Hard, it's going to be a pain in the rear for human beings to solve, as well, on non-trivial instances. Visual/auditory captchas are kind of cool in that they give people a leg up... we have very sophisticated sensory organs for processing these kinds of things, and computers are not too good at it (though they are getting better all the time!).
As such, you're probably better off coming up with a unique thing that people can do very easily, but that machines are not too good at. For instance, give some simple black and white pictures and ask the user which one doesn't belong, or show some pictures of foods and ask what kind of recipe you could make with them.
Clever approach. Whenever one-way complexity is needed, it makes me think of a hash. Simply hash some aspect of the user's account (nothing sensitive) and send the hash to the client. You would want to truncate/pad the string to get your desired complexity level. This isn't securing an account, so MD5 or any other hashing algorithm would be fine.
Here is some sample code that you might be able to leverage for the client side.
We're talking about your average, everyday spamming bots -- the ones we try to protect against using CAPTCHAs.
How many of them are capable of running JS in some kind of embedded-browser?
If it's a very tiny amount, then how on earth can solutions like this be useful: http://wcaptcha.wozia.pt/sample.php
Apart from the obvious usability/accessibility issues, these drag-and-drop solutions require the client to have JS. There's not even a fallback. So, assuming they are intended to protect against bots (non-humans), aren't they entirely redundant, or at least redundant to the extent of how many bots would be technically capable of attempting such a thing?
If the client has JS (which is a prerequisite for this solution to work), then isn't it safe (within reasonable measure) to assume the client is not a bot?
It isn't that redundant. If you just detect for Javascript, people can still boot up instances of Selenium and pretend to comment. The number of spam bots doing that now is in the minority, but as the spam wars evolve, you can bet spam bots will move on to other methods such as using a browser. If you detect for Javascript AND make them drag and drop something, it'll definitely prove you're a human.
But I think this implementation is just not practical, because there is still a percentage of people who have JS turned off for whatever reason. I hear this is 2 or 3%, which is still a good number when you're talking about hundreds of thousands of visitors.
An alternative is to have a noscript option that asks the user to activate Javascript if he/she wants to comment on the blog.
Yes, very few spambots will have JavaScript enabled.
Spam is a percentages game. Only a very small percentage of spam messages will trigger any revenue for the spammer. If you can increase the cost of spam, you make it economically infeasible. Spamming in a JavaScript-enabled browser is way more expensive than spamming on the command line, so you can send out more spam at a time if you stick to curl.
Yes, it is redundant.
Rather than making users do this pointless task, you might as well automatically perform a javascript check. It could be as simple as a script that grabs the domain name of the site and inserts it into each form as a hidden field. This will stop all drive-by spammers. If your site is high-profile enough to attract custom spammers, this solution won't be enough anyway.
For those without JavaScript, just show them a regular old image CAPTCHA after their post fails.
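A sketch of that domain-stamping check; the field name "js_check" is arbitrary, and the server would simply verify that the submitted value matches its own hostname.

// Stamp every form with the current domain via JS. Drive-by bots that POST
// the form without running scripts will omit (or mismatch) the field.
document.addEventListener('DOMContentLoaded', () => {
  document.querySelectorAll('form').forEach((form) => {
    const field = document.createElement('input');
    field.type = 'hidden';
    field.name = 'js_check';
    field.value = window.location.hostname;
    form.appendChild(field);
  });
});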
A bigger issue is usability, IMHO. CAPTCHAs will always decrease conversion rates, often significantly. If your goal is to use JS as a means of deterring bots, I can tell you that it has reduced bot traffic for me by more than 90%.
Just incorporate a hidden field that gets populated by JS. If it isn't filled in, they're either a bot, or one of those idiots with JS turned off, who you don't really want to cater to anyway.
Also incorporate a field that is present in the DOM but made to fly off the screen with CSS like "position:absolute; left:9999px; top:-9999px;". Don't use "display:none;". If this field is filled in, they're a bot.
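A sketch combining the two checks above; the field names "filled_by_js" and "website_url" are invented for the example.

const form = document.querySelector('form');

// 1) Must arrive non-empty: a scriptless bot leaves it blank.
const jsField = document.createElement('input');
jsField.type = 'hidden';
jsField.name = 'filled_by_js';
jsField.value = 'ok';
form.appendChild(jsField);

// 2) Honeypot pushed off-screen: a human never sees it, so any submitted value
//    means a bot blindly filled in every input.
const honeypot = document.createElement('input');
honeypot.type = 'text';
honeypot.name = 'website_url';
honeypot.style.cssText = 'position:absolute; left:9999px; top:-9999px;';
form.appendChild(honeypot);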
I cut our spam down by more than 90% with this, so you should use it over CAPTCHA types unless you're a big business. If you're a big business, your only real solution is a back-end, server-side one. Good luck finding that on Stack Overflow; they'll close your question quicker than people can answer it (and it will have a better Google rank than anything out there).
I am creating a web site, and my client demands that users be prevented from copying the TEXT displayed on the web page. How can I do that? I am using PHP and HTML in my application.
Not trying to be rude, but why do people keep asking this? If you want people to be able to see the information, then you cannot prevent them from copying it. Any kind of javascript nonsense to prevent right-clicking or selection or whatever else will not stop determined thieves and will annoy legitimate users.
As mentioned in every previous answer, there's no way to prevent someone from copying content from your site. Even if you use methods to restrict direct copy and paste, there are always screenshots, OCR, or good old writing by hand.
Looking at it from a different perspective...if the content is sensitive and your client doesn't want it distributed, you COULD add it to a section of your site that requires registration and authentication to access. By doing this you could require that users agree to terms and conditions on registration which explicitly deny permission to reproduce any of the content from the site.
Just a thought.
As every other answer has said, there is nothing you can technically do to prevent people from copying the text of your page. For the text to be displayed to the user, you must send it to the user's computer, which means they can copy it.
However, you can pursue copying as a legal matter, with the help of a service like Copyscape, which detects copies of your content:
Copyscape is dedicated to protecting your valuable content online. We provide the world's most powerful and most popular online plagiarism detection solutions, ranked #1 by independent tests. Copyscape's products are trusted by millions of website owners worldwide to check the originality of their new content, prevent duplicate content, and search for copies of existing content online.
Copyscape provides a free service for finding copies of your web pages online, as well as two more powerful professional solutions for preventing content theft and content fraud:
Copyscape Premium provides more powerful plagiarism detection than the free service, plus a host of other features, including copy-paste originality checks, batch search, case tracking and an API.
Copysentry provides comprehensive protection for your website by automatically scanning the web daily or weekly and emailing you when new copies of your content are found.
Read more on their site.
You can force people to call a phone number to hear the text of your website -- a great solution if you do not want people to copy/paste the text of your webpage.
Basically, you cannot. Even if there were a way to stop the user from copying and pasting the text, they could always just grab the screen and somehow translate it into text.
I'd recommend not trying to restrict users in any way. It's not really friendly, and people usually hate it. If you want some private content, just make people log in, do some ACL checks, and hope that they won't copy it somewhere else. You could also consider using some kind of license to discourage people from "stealing" your content.
Even if he were to build the system in Flash, the user could still write out the content by hand if they desperately wanted it. Like everyone else said, it's impossible to stop a determined person from getting your content, unless of course you don't display it.
No, AFAIK, there is no way you can achieve that, unless you're building the whole thing in Flash or other non-HTML plugin content.
The short answer is that you can't (easily) do this - if it's visible in the browser then it is obtainable somehow. This is particularly the case if you are just displaying text.
And it all gets back to "Why"? If the information is secret, don't show it to someone in the first place. If you're concerned about copyright violation, as others have said, once someone sees the text, even if you somehow came up with a brilliant technical solution that prevented them from copying the text in any way (which I doubt is possible), they could always write it down by hand, or take a picture of the screen with a digital camera and then OCR it. In the digital age, your protection against copyright violation is more legal than technical: if somebody steals your material and resells it, sue them.
Depending on the nature of your material, you may be able to make it awkward for people to get it all on one screen. Like, if you were running an on-line phone book and you were afraid of people stealing your listings, instead of displaying some large number of listings on one giant page -- all the "A"s or whatever -- you could require people to enter search terms and only show two or three possible hits at a time. Then if someone wanted to steal your listings, they would have to spend thousands of hours entering every imaginable search term. Now that I think of it, I was using some phone book site the other day that gave me a listing of names and addresses that were possible matches, but then I had to click on each one to get the phone number. At the time I thought "dumb nuisance", but now it hits me: they probably had the same idea that I briefly thought was original. Anyway, if your material is a database of individual factoids, this could be practical. If it's an article on the economic history of Lithuania or some such, making the user search for it in tiny pieces is just going to make people abandon you and look elsewhere.
Personally, I've taken the philosophy that I just don't care. I've had many occasions when I've done Google searches on subjects that interest me and turned up articles that I've written, on sites that never asked my permission. I once even found an article that I wrote on one of those pre-written student papers web sites. (Not that any student would just paste his name on it, print it off, and hand it in, of course. They are "for research purposes only". I'm sure if they knew of students claiming this as their own work they would take down the site immediately.) So an article that I published on the web, available to anyone for free, these people were now charging dishonest students $25 to download! My reaction was, Way cool! It's one thing when others quote you, but you've really reached the big time when others plagiarize you!
This is not possible.
You cannot prevent someone from getting the information if you're sending it to them so they can see it. A user can simply view the source of the HTML and see what the text is and copy it from there and there's nothing you can do to stop them.
Implementing anything in JavaScript is completely ineffective since anyone can just disable JavaScript in their browser and get around it, and you'll only end up annoying your users.
The only way to prevent someone copying the text from a web page is to not put it on the web page in the first place.
If you presented content via images or Flash, and prevented the ability to save it, that might be a solution. I found some resources you might find useful for protecting images here, and some information on "preventing" print screen here.
Unfortunately, there is no easy solution for your question, as once the content is delivered to the user, they have ultimate control over the information (who's preventing them from taking an actual picture of the site?).
Well, the PHP has nothing to do with it, as that's server-side. You might be able to cook up something in JavaScript (it's fairly easy to disable right-click; it may also be possible to disable text highlighting), but it's fairly easy to get around this. Failing all else, the user might view source, though that can be obfuscated too:
document.write(atob('encoded string containing entire HTML document'));
This is, frankly, both annoying and pointless. Anything that's available to the user can be taken somehow. Even flash isn't immune. (There are browser plugins available to take videos out of flash.)
You may want to look at your target audience as well, to help determine how you want to make it harder (since you can't realistically prevent it).
For the simple user, just disabling right-click may be good enough to prevent it. Slightly more work would be to do as others have suggested and render the text as an image. With the image, you'd probably want to set it as a background-image on a DIV or something, since images using the IMG tag can easily be dragged straight from the page onto the desktop, or wherever. From there you could use Flash, some other RIA, or maybe even SVG/VML.
Anyone who knows how to do a screen capture really narrows down what you can feasibly implement :(
<script type="text/javascript">
// Script to discourage copying of website contents by disabling text selection
// and, in browsers exposing window.sidebar (older Firefox), the mousedown that starts it.
function killCopy(e) {
  return false;
}
function reEnable() {
  return true;
}
document.onselectstart = new Function("return false");
if (window.sidebar) {
  document.onmousedown = killCopy;
  document.onclick = reEnable;
}
</script>