Protect public API from unwanted "submissions" - javascript

I've been fiddling around with this issue for quite some time but couldn't come up with a satisfying solution so far.
We are currently in the process of creating a new public API, which will be used by widgets to get information but also to post information back to the system (like a contact form). As the widgets will be implemented as web components and can be embedded on any page, we don't have control over how the widgets are delivered.
The issue I'm facing now is: how can we protect the API from unwanted submissions (apart from general form validation) so that we can be quite sure that it's either a submission from that form or otherwise a legitimate POST to the API?
My concern is that everything in this case is spoofable (e.g. fetching a form token and submitting it as a header, validating Origin headers, ...), since any of it could easily be replayed with a tool like Postman. I'd be more than happy for any of your experiences and tips in the right direction.

I think you could try:
Rate limits based on IP (a minimal sketch follows below)
Rate limits on insertions in general
Require email validation after submission (if you have this data)
Save the sender IP and compare it with historical data, to know if someone is abusing the API (a monitoring tip, but maybe not a bad idea)
A captcha to deter malicious senders (though it won't stop everything)
Have you implemented anything similar? Maybe by seeing what you have, we can see what's missing.
It is also complicated by being public, and allowing access from any system. Perhaps it would be a good idea to evaluate an authentication system, and authenticate from the widget itself, incorporating a rate limit per key.
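For the IP-based rate limiting, something along these lines could work. This is a minimal in-memory sketch for a Node/Express backend; the /api/contact route and the thresholds are made up for illustration, and a production version would want a shared store like Redis:

// Minimal in-memory IP rate limiter for an Express app (illustrative only).
const express = require("express");
const app = express();

const WINDOW_MS = 60 * 1000; // 1-minute window
const MAX_REQUESTS = 5;      // max submissions per IP per window
const hits = new Map();      // ip -> array of request timestamps

function rateLimitByIp(req, res, next) {
  const now = Date.now();
  const recent = (hits.get(req.ip) || []).filter(t => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    return res.status(429).json({ error: "Too many requests" });
  }
  recent.push(now);
  hits.set(req.ip, recent);
  next();
}

app.post("/api/contact", rateLimitByIp, (req, res) => {
  // ...validate and store the submission...
  res.json({ ok: true });
});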

If you have a public API without authentication, all you can do is make access as hard as possible for hackers.
In other words: put more/complex locks on the door... but any lock can be picked.
"Lock" code using the URI
The method we used to keep a WebComponent "safe"
was to load the WebComponent from a long URI
(modern browsers no longer have a 2048-character URI limit):
https://domain/p1/p2/customElements/define/secure-api/HTMLElement/p7/p8/webcomponent.js
The component code then decodes the URI to
let p = ["domain","p1","p2","customElements","define","secure-api","HTMLElement","p7","p8"];
and executes JavaScript from it:
window[p[3]][p[4]](p[5], class extends window[p[6]] { /* ... */ });
// i.e. customElements.define("secure-api", class extends HTMLElement { ... })
More locks
If you throw in some btoa/atob and string-reversal conversions (sketch below):
https://domain/AH=V/CV=/==QYvRnY/aW5uZXJ/IVE1M/webcomponent.js
you will have deterred most potential hackers.
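For illustration, the round trip for a single segment might look like this; a minimal sketch (the real code chains more conversions):

// Hypothetical encode/decode pair for one URI segment.
const reverse = s => [...s].reverse().join("");
const encodeSegment = s => reverse(btoa(s)); // base64, then reversed
const decodeSegment = s => atob(reverse(s));
console.log(decodeSegment(encodeSegment("customElements"))); // "customElements"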
More locks
By generating webcomponent.js server-side to use the /domain/ part,
that long URI can be the (unique) handshake between Server and Client
More complex locks
Since all state is in the URI, it is easy to apply an address shifting mechanism,
every request can be a different URI (which makes Postman unusable, and debugging a real pain too :-)
[and we applied some other trickery I won't explain here]
It won't keep hackers out, but will delay them long enough for the majority to give up.
And a mousetrap
In our code/URI encoding we also included a reference to a unique "mousetrap" URI.
If we detect 404 activity in that subdir, we know someone is actively picking a lock.
And.. we can interactively lead/direct them to more mousetraps.
We have only had one attempt thus far.
One telephone call (because we know the buyer's domain) was enough to make them stop.
Hello IT manager of [very-big-well-known] IT company X,
if we detect hacking attempts from your company IP address nn.nn.nn.nn,
by law we would have to report this to the authorities
HTH

Related

Prevent malicious users from abusing and spamming unauthenticated open APIs

Here's a security problem I've encountered a couple of times when building small web-based projects interacting with a REST API service. For example, let's say you're building a casual JavaScript-based game where you want a leaderboard of highscores, so you need to post the scores of users to a database.
The easiest solution would be to build a simple web service, e.g. using PHP, Node.js or Python, that accepts GET requests and saves the results to a database. Let's imagine the API looks something like this:
GET https://www.example.com/api/highscore?name=SuperGoat31&score=500
Creating such an API for posting highscores has some obvious drawbacks. A malicious user could write a three-line piece of PHP code to spam the database full of false results, for example:
for ($i = 0; $i < 100; $i++) {
    file_get_contents("https://www.example.com/api/highscore?name=SuperGoat31&score=5000000");
}
So, I'm looking for a way to prevent that. This mostly relates to small hobby or hackathon projects that just need some kind of protection that will prevent the most obvious of attacks, not large enterprise applications that need strict security. A couple of things I could think of:
1. Some form of authentication
An obvious way to solve this would be to have user accounts and only allow requests from logged-in users. This unfortunately has the drawback of putting up a large barrier for users, who need to get an account first. It would also require building a whole authentication workflow with password recovery and properly encrypting passwords and the like.
2. One-time token based protection
Generate a token on the server side and serve that to the user on first load, then only allow requests that serve that specific token. Simple enough, but also very easy to circumvent by finding the requests in a browser web inspector and using that for the three-line PHP script.
3. Log IP addresses and ban when malicious use happens
This could work, but I feel it's not very privacy friendly. Also, logging IP addresses would require GDPR consent from users in Europe. It also doesn't prevent the actual spamming itself, so you might have to first clean up the mess before you start banning IP addresses.
4. Use an external service
There are services that provide solutions to this problem. For example, in the past I've used Google's reCAPTCHA to prevent malicious use. But that also means integrating an external service, making sure you keep it up to date, concerns about the privacy aspects (esp. regarding a service like reCAPTCHA), etc. It feels a bit much for a weekend project.
5. Throttle requests
I feel this is probably the easiest solution that actually works for a bit. This does require some form of IP address logging (which might give the problems stated in 3), but at least you can delete those IP addresses pretty quickly afterwards.
But I'm sure there are other methods I've missed, so I would be curious to see other ways of tackling this problem.
Taking into account all mentioned limitations, I would recommend using a combination of methods:
Simple session authentication based on one-time token
Script obfuscation
Request encryption with integrity control
Example:
let req_obj = {
  user: 'SuperGoat31',
  score: 123456,
  sessionId: '4d2NhIgMWDuzarfAY0qT3g8U2ax4HCo7',
};
req_obj.hash = someCustomHashFunc(JSON.stringify(req_obj));
// now, req_obj.hash = "y0UXBY0rYkxMrJJPdoSgypd"
let req_string = "https://www.example.com/api/cmd?name=" +
  req_obj.user +
  "&data=" +
  Buffer.from(JSON.stringify(req_obj)).toString('base64');
// now, your requests will look like this:
"https://www.example.com/api/cmd?name=SuperGoat31&data=eyJ1c2VyIjoiU3VwZXJHb2F0MzEiLCJzY29yZSI6MTIzNDU2LCJzZXNzaW9uSWQiOiI0ZDJOaElnTVdEdXphcmZBWTBxVDNnOFUyYXg0SENvNyIsImhhc2giOiJ5MFVYQlkwcllreE1ySkpQZG9TZ3lwZCJ9"
For casual players, this allows them to start playing very quickly, as no explicit registration is required. Upon generation, the token might be saved as a cookie for repeat use, but this is not necessary; single-time use would also suffice. No personal info is gathered.
However, if short-term storage of some client information is an option, the token might be not just some random bytes, but an encrypted string containing some parameters, such as random salt + IP address + nickname + agent id + etc. In this case you may start silently ignoring certain requests from fraudulent clients upon detection.
Obviously, this would be very easy to crack for a professional, but that is not our threat model. When such simple methods are mixed into several kilobytes of game logic and obfuscated, figuring out how to deal with it requires a significant amount of knowledge and time, which might serve as a sufficient barrier.
As it is all about balance between convenience and protection, you may implement some additional scoring logic to detect cheating attempts, like final score cannot end with '0', or cannot be even, etc. This would allow you to count cheating attempts (in addition to counting forged requests) and then estimate efficiency of implemented combination of methods.
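To complete the picture, the matching server-side check might look roughly like this. A sketch in Node/Express terms; someCustomHashFunc, sessionStore and saveScore are the hypothetical counterparts of the pieces above:

// Sketch: decode, verify integrity, then burn the one-time token.
const express = require("express");
const app = express();
const sessionStore = new Map(); // issued one-time tokens (hypothetical)

app.get("/api/cmd", (req, res) => {
  let req_obj;
  try {
    req_obj = JSON.parse(Buffer.from(req.query.data, "base64").toString("utf8"));
  } catch (e) {
    return res.status(400).end(); // not even valid base64/JSON
  }
  const claimedHash = req_obj.hash;
  delete req_obj.hash; // the hash was computed before it was attached
  if (claimedHash !== someCustomHashFunc(JSON.stringify(req_obj))) {
    return res.status(400).end(); // integrity check failed
  }
  if (!sessionStore.has(req_obj.sessionId)) {
    return res.status(403).end(); // unknown or already-used token
  }
  sessionStore.delete(req_obj.sessionId); // one-time: burn the token
  saveScore(req_obj.user, req_obj.score); // hypothetical persistence call
  res.end("ok");
});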
Your list of solutions is mostly mitigations, and they are good ideas if they are your only tools. The list seems pretty exhaustive.
2 major ways to actually solve this problem are:
Remove the incentive of cheating. There's no point submitting a fake score if you are the only person who can see the score. Think about the purpose of why you even want a global high-score list. Maybe there's another way you can reach your objective that makes it uninteresting (or undesirable) to cheat.
Have the server completely manage (or duplicate) the game state. You can't cheat if the server calculates the score. For example, if you're modelling a chess game the server can compute every valid move, preventing clients from submitting moves that wouldn't be possible.
It's possible that for your specific case neither are possible, but if you can't adopt either of these strategies you are stuck to imperfect detection mechanisms.
I suspect that a perfect solution will be elusive because two of your wishes are, perhaps, contradictory: "you need to post the scores of users to a database" but want to "prevent the most obvious of attacks" without "some form of authentication". The most obvious of attacks are exactly those from users without some form of authentication.
You wish this system to work without placing an undue burden on your users, and to avoid the usual login-and-password authentication, which can be cumbersome for users.
I think there is a way to accomplish what you want by creating a very simple form of authentication through the use of one-time-token-based protection. I would also incorporate IP tracking against abuse. In other words, let's combine your options 1, 2 and 3 in the following way.
You already have implied that you will maintain a database, and that within the database, user names will be unique (otherwise you couldn't record unique high scores). Let people sign up freely by submitting their requested user name, which you'll accept if not already taken. Track the sign-up requests by IP address to detect and prevent abuse: too many sign-ups from one IP address within a given timeframe. So far, the burden is all at the server end, not on the user.
When you process a valid sign-up (i.e. a new user name) into the database, you will also generate, record into the database, and return to the user a shared secret (a token) that will be used by the Time-based One-Time Password (TOTP) algorithm.
Don't reinvent this.
See:
Time-based One-Time Password
FreeOTP
OneTimePass
When you return a token to the user, it will be in the form of a QR code, which the user will scan and store with his Google Authenticator or equivalent TOTP application.
When the user returns to your web site to update his high score, he will authenticate himself using that TOTP application. These apps are usually used for second-factor authentication, 2FA (multi-factor authentication), but because of your need for less strict security, you'll be using the TOTP authentication as the primary and only form of authentication.
So we have combined a form of authentication which doesn't place a very high burden on the user (the apps are already widely available and in use), with one-time-token-based protection (provided by the TOTP app) and a little bit of IP-address-based abuse protection for the initial sign-ups.
One of the weaknesses of my proposal is that a user may share his TOTP secret with another person, who may then impersonate him. But this is no different from the risk of password sharing. And there will be no "recover my lost password" option.
I would tackle this in a slightly different way: via usernames/gamertags, depending on how frequently you find gamertags and usernames sharing the same IP. If you only accept a maximum of, say, 5 gamertags per IP, and you also throttle the frequency of updates per gamertag, you have a fairly spam-resistant system.
I would recommend a mix of code obfuscation and using web sockets to request the score rather than post it. Something like socket.io (https://socket.io/), where the server sends a request containing a code and your game responds with the score plus that code transformed in some way.
Obviously a hacker could look through your code for how your game responds to requests and rewrite it, which is where the obfuscation is important, but it does at least hide the obvious network traffic and prevents them posting scores whenever they feel like it.
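A rough server-side sketch of that challenge/response idea with socket.io; transformCode and saveScore are hypothetical, and the client would implement the same transform inside the obfuscated game code:

// Server issues a per-connection challenge; only clients running the real
// game code know how to transform it.
const { Server } = require("socket.io");
const io = new Server(3000);

io.on("connection", (socket) => {
  const code = Math.random().toString(36).slice(2); // per-connection challenge
  socket.emit("challenge", code);
  socket.on("score", ({ score, answer }) => {
    if (answer !== transformCode(code)) return; // client didn't run the real game code
    saveScore(score); // hypothetical persistence call
  });
});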
I would suggest using reCAPTCHA V2.
Admittedly, v3 provides better protection, but it is hard to implement, so go with v2.
Come on, it is just a few lines of code.
How it should work (according to me):
You are at the main page willing to play the game
You solve the reCAPTCHA
Then the app sends a one-time token with a script tag, which establishes a websocket connection with your server (using socket.io) using that one-time token; the token is then destroyed immediately (on the server as well as the client) once the connection is established
Your server validates the token and accepts the request of websocket and then it will send the HTML content
Just create a div and set the value using obj.innerHTML
You can use styles in body (I guess)
And the most important point is obfuscating your code.
Security
Websockets are harder to reverse engineer in a test environment
Even if they create a web socket, it won't respond, because they don't know the one-time token
It prevents script blocking (as the script loads everything on the page)
It provides real-time communication
The only way out is to somehow get your hands on Google's reCAPTCHA token, which is very hard, because it means going up against Google
You can't reuse any token (however quickly you try), because it was destroyed on both sides
One more last tip: set a timeout for the one-time token of about 15 seconds
How will it help? It will prevent someone (extremely malicious) from pausing the Chrome debugger, grabbing the token and using it in their own script, as 15 seconds is enough for slow networks, but not for a human doing that by hand
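For reference, a hedged sketch of the server-side token check against Google's siteverify endpoint (Node 18+ fetch assumed; RECAPTCHA_SECRET is your secret key from the reCAPTCHA admin console):

// Verify a reCAPTCHA v2 response token server-side.
async function verifyCaptcha(responseToken, remoteIp) {
  const params = new URLSearchParams({
    secret: process.env.RECAPTCHA_SECRET,
    response: responseToken,
    remoteip: remoteIp, // optional
  });
  const res = await fetch("https://www.google.com/recaptcha/api/siteverify", {
    method: "POST",
    body: params,
  });
  const body = await res.json();
  return body.success === true;
}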

Short message encryption with only javascript to generate it in a URL

I'd like to present an idea to you that I think might help the privacy of the average user. I would appreciate any comment or suggestion on this.
I've been struggling for quite some time now with the need for a simple tool that I could share and use with my contacts who are only average users and not familiar at all with any cryptographic technology or the current tools available.
I'm planning to create a solution where one can easily encrypt a text message or a file with a single password and send it via email or chat or through whatever channel to somebody else. The solution should be entirely platform independent and usable without the need to install any extra software.
There are some text encryption websites out there that run client side encryption from JavaScript entirely. I find this approach currently the only possible solution. Also, there are libs for JS that already implement encryption:
http://crypto.stanford.edu/sjcl/
http://code.google.com/p/crypto-js/
However, the mentioned approaches store the message on their server, requiring you and your contact to trust it entirely: the server might present different JS code to the user when he visits it to read the message, stealing the password and so revealing the secret.
While many think that it's not a good idea to do anything regarding cryptographic tasks in JS, I believe there is a need for a tool that is really platform independent (can be used on any tablet or PC) and still incredibly easy to use. The idea behind this is that I believe something is better than nothing. Sending information in plain text in email for decades with our current technology is wrong in most cases. There are times when we do need to share sensitive info via email and the other side might have any kind of system.
I intend to avoid the use of public key cryptography for the following reasons:
- it is very complicated to set up, including the signing of each other's keys
- it is complicated to use
- the user can lose his keys
- most of the time it needs an external software package to be installed and used too
- a single password can easily be shared in person one time with my contact, and he or she can keep it written on a piece of paper somewhere
The solution I came up with could be the following:
First of all, the browser and the operating system under it should be considered trusted.
There would be a static index.html page with embedded JavaScript. The page shows a textarea for the message and a textbox for the password. When hitting enter, the JS code generates a URL that itself contains the encrypted message in base64 encoding. After digging, I figured that 2000 bytes can be used for URLs just fine in practically every case, so 1600 or 800 characters could be enough for short messages. This still needs planning.
So the encrypted message would travel within the URL. The website serving the index.html would of course use SSL with a valid certificate. While it seems an easy task, of course it is not: the JS implementation should be carefully created to avoid easy attacks on it.
(URL shortener services could be used for it too).
Also, the question stands: How can I make sure that my contact can be certain about the origin of my message?
Well, the other side has to check that the domain is correct. Besides this, the implementation must avoid the rest of the attacks. If the URL gets changed in transit in the email, then at worst the other side won't be able to decode the message with the password. That's what I believe, and that it can be implemented this way.
About the file sharing: the solution should make it possible to browse for a file, encrypt it, then offer it back to the user for download. This is just so he can create the encrypted form of the file without the need for external tools. Then he could upload it to the cloud of his choice (Google Drive, SkyDrive etc.) and use that link in the URL of the JS solution to send it to his contact.
So if another link travels inside the link, then the file from the remote host gets downloaded, decrypted and offered for download, all in his browser. If it's an encrypted message in base64 form, then it gets printed on the page after decryption (with the user providing his password, of course).
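A minimal sketch of the encryption side using the browser's WebCrypto API; the parameter choices are illustrative, not a vetted design. One note: putting the ciphertext in the #fragment instead of the query string keeps it from being sent to any server at all:

// PBKDF2 password derivation -> AES-GCM encryption -> URL fragment.
async function encryptToFragment(message, password) {
  const enc = new TextEncoder();
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const baseKey = await crypto.subtle.importKey(
    "raw", enc.encode(password), "PBKDF2", false, ["deriveKey"]);
  const key = await crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: 100000, hash: "SHA-256" },
    baseKey, { name: "AES-GCM", length: 256 }, false, ["encrypt"]);
  const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, enc.encode(message));
  const pack = (b) => btoa(String.fromCharCode(...new Uint8Array(b)));
  // salt.iv.ciphertext, base64-packed, carried in the fragment
  return "#" + [pack(salt), pack(iv), pack(ct)].join(".");
}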
Pros compared to other solutions:
- no need to implement a storage because no message nor file will be stored on the server, so the big players' services could be used
- therefore no need to reimplement the wheel regarding the storage question
- no need to trust a 3rd party because the server could easily be ours because it would be extremely easy to set up and serve it
- easy with even a free provider to host the static index.html
- because of its simplicity, the server can be hardened much better
- easy to encrypt with it in practice
- if one needs it, he could use the index.html by clicking on it from his desktop too, but that's not part of the original idea
My questions to you all are:
Do you find any flaw in my theory above? Could this really serve the average people by providing a usable tool for them that is more than nothing in times when they do need to send sensitive info to others?
Or does anything like that exist yet? Are there any better approaches? Different technology maybe?
Thank You.

Hide urls in html/javascript file

I am using ajax in my website and in order to use the ajax, I have to write the name of the file, for example:
id = "123";
$.getJSON(jquerygetevent.php?id=" + id, function(json)
{
//do something
});
how can I protect the url? I dont want people to see it and use it...
That is a limitation of using client-side scripts. There is no real way to obfuscate it from the user; there are many ways to make it less readable (minify etc.) but in the end an end-user can still view the code.
Hi Ron and welcome to the internet. The internet was (to quote Wikipedia on the subject)
The origins of the Internet reach back to research of the 1960s, commissioned by the United States government in collaboration with private commercial interests to build robust, fault-tolerant, and distributed computer networks. The funding of a new U.S. backbone by the National Science Foundation in the 1980s, as well as private funding for other commercial backbones, led to worldwide participation in the development of new networking technologies, and the merger of many networks. The commercialization of what was by the 1990s an international network resulted in its popularization and incorporation into virtually every aspect of modern human life.
Because of these origins, and because of the way the protocols surrounding HTTP resource identification (like URLs) were designed, there's not really any way to prevent this. Had the internet been developed as a commercial venture initially (think AOL) then they might have been able to get away with preventing the browser from showing the new URL to the user.
So long as people can "view source" they can see the URLs in the page that you're referring them to visit. The best you can do is to obfuscate the links using javascript, but at best that's merely an annoyance. What can be decoded for the user can be decoded for a bot.
Welcome to the internet, may your stay be a long one!
I think the underlying issue is why you want to hide the URL. As everyone has noted, there is no way to solve the actual resolved URL. Once it is triggered, FireBug gives you everything you need to know.
However, is the purpose to prevent a user from re-using the URL? Perhaps you can generate one-time, session-relative URLs that can only be used in the given HTTP Session. If you cut/paste this URL to someone else, they would be unable to use it. You could also set it to expire if they tried to Refresh. This is done all the time.
Is the purpose to prevent the user from hacking your URL by providing a different query parameter? Well, you should be handling that on the server side anyways, checking if the user is authorized. Even before activating the link, the user can use a tool like FireBug to edit your client side code as much as they want. I've done this several times to live sites when they're not functioning the way I want :)
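A sketch of that single-use, session-bound token idea, shown in Node/Express with express-session (the route names and page markup are illustrative):

// Issue a fresh token with the page; require and burn it on the AJAX call.
const express = require("express");
const session = require("express-session");
const crypto = require("crypto");
const app = express();
app.use(session({ secret: "change-me", resave: false, saveUninitialized: true }));

app.get("/page", (req, res) => {
  req.session.ajaxToken = crypto.randomBytes(16).toString("hex");
  // embed the token in the page that issues the AJAX call
  res.send(`<div id="app" data-token="${req.session.ajaxToken}"></div>`);
});

app.get("/event", (req, res) => {
  if (req.query.token !== req.session.ajaxToken) return res.status(403).end();
  delete req.session.ajaxToken; // single use: a second call with the same token fails
  res.json({ ok: true });
});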
UPDATE: A HORRIBLE hack would be to drop an invisible Java Applet on the page. They can also trigger requests and interact with Javascript. Any logic could be included in the Applet code, which would be invisible to the user. This, however, introduces additional browser compatibility issues, etc, but can be done. I'm not sure if this would show up in Firebug. A user could still monitor outgoing traffic, but it might be less obvious. It would be better to make your server side more robust.
Why not put some form of security on your php script instead, check a session variable or something like that?
EDIT is response to comment:
I think you've maybe got the cart before the horse somehow. URLs are by nature public addresses for resources. If the resource shouldn't be publicly consumable except in specific instances (i.e. from within your page) then it's a question of defining and implementing security for the resource. In your case, if you only want the resource called once, then why not place a single use access key into the calling page? Then the resource will only be delivered when the page is refreshed. I'm unsure as to why you'd want to do this though, does the resource expose sensitive information? Is it perhaps very heavy on the server to run the script? And if the resource should only be used to render the page once, rather than update it once it's rendered, would it perhaps be better to implement it serverside?
You can't truly protect (hide) anything on the client; at best you can encrypt/encode it into a format that is complicated for a real human to read.

Dual login: One login, 2 servers

Okay, this just feels plain nasty, but I've been directed to do it, and just wanted to run it past some people who actually have a clue, so they can point out all the massive holes in it.....so here goes.....
We've got this legacy site & a new public beta-test one. Apparently it's super cereal that moving from one to the other is seamless, so in a manner of speaking, we need a single sign-on solution.
As we're not allowed to put any serious development into the legacy site (It's also in old school ASP, a language I don't care to learn.) I can't do a proper single sign-on solution, so I proposed the following: On login, the legacy site performs an AJAX post to the login controller of the new beta site, logging the user in there, it then simply proceeds with the login on the legacy site as normal. This may not be acceptable as there's code to prevent a user from being logged on twice, I'm not sure if it's been written to apply across sites.
The other idea I had was to pass a salted hash of the user's details across with their username when they try to access the 2nd site. If the hash matches the details of the user, then access is granted. This would need ASP development obviously as generating the hash on the client side would only serve to enhance the idiocy even further.
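A sketch of that salted-hash idea, shown in Node for the verifying side (the shared secret, the field layout and the 5-minute validity window are assumptions):

// HMAC-signed, expiring handoff token; the legacy site computes the same
// MAC server-side and passes it along with the username and expiry.
const crypto = require("crypto");
const SHARED_SECRET = process.env.SSO_SECRET; // shared between both sites

function makeToken(username) {
  const expires = Date.now() + 5 * 60 * 1000; // 5-minute validity
  const mac = crypto.createHmac("sha256", SHARED_SECRET)
    .update(username + "|" + expires).digest("hex");
  return { username, expires, mac };
}

function verifyToken({ username, expires, mac }) {
  if (Date.now() > expires) return false;
  const expected = crypto.createHmac("sha256", SHARED_SECRET)
    .update(username + "|" + expires).digest("hex");
  return mac.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(mac), Buffer.from(expected));
}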
Does anyone have any thoughts?
The old ASP site must have some concept of a session if it requires a logon. You will, at a minimum, need to understand how to provide the session information to the legacy site and splice some code in to keep it copacetic if both sites need to be kept up indefinitely.
"Classic" ASP isn't so bad if you can read/write VB6, VBA, VBScript or VB.net. It probably won't be difficult to graft session initialization provided the code is half way decent.
Consider creating a common logon page for both sites + either an automatic redirect based on either the requested URL (I'm guessing the old and new sites have distinct URLs) or cookies passed with the request (the old site, if it used cookies, could identify a legacy user). This common logon page could initialize session on both the legacy site (only if required by user type) and on the new site. This will allow you to keep your new logon process unencumbered by the legacy process while maintaining the old as long as required.
Bear in mind that your first approach (AJAX request from one site to the other) won't work if the sites are on different domains, because of javascript security restrictions.
You might be able to work around this by using a hidden iframe for the post like this, but it's getting a little hacky.

How do I uniquely identify computers visiting my web site?

I need to figure out a way to uniquely identify each computer which visits the web site I am creating. Does anybody have any advice on how to achieve this?
Because I want the solution to work on all machines and all browsers (within reason), I am trying to create a solution using JavaScript.
Cookies will not do.
I need the ability to basically create a GUID which is unique to a computer and repeatable, assuming no hardware changes have happened to the computer. Directions I am thinking of are getting the MAC address of the network card and other information of this nature which will identify the machine visiting the web site.
Introduction
I don't know if there is, or ever will be, a way to uniquely identify machines using a browser alone. The main reasons are:
You will need to save data on the user's computer. This data can be deleted by the user at any time. Unless you have a way to recreate this data which is unique for each and every machine, you're stuck.
Validation. You need to guard against spoofing, session hijacking, etc.
Even if there are ways to track a computer without using cookies, there will always be a way to bypass it, and software that will do this automatically. If you really need to track something based on a computer you will have to write a native application (Apple Store / Android Store / Windows program / etc.).
I might not be able to give you an answer to the question you asked but I can show you how to implement session tracking. With session tracking you try to track the browsing session instead of the computer visiting your site. By tracking the session, your database schema will look like this:
session: {
  sessionID: string,
  // Global session data goes here
  computers: [{
    BrowserID: string,
    ComputerID: string,
    FingerprintID: string,
    userID: string,
    authToken: string,
    ipAddresses: ["203.525....", "203.525...", ...]
    // Computer session data goes here
  }, ...]
}
Advantages of session based tracking:
For logged in users, you can always generate the same session id from the user's username / password / email.
You can still track guest users using the sessionID.
Even if several people use the same computer (e.g. a cybercafe) you can track them separately if they log in.
Disadvantages of session based tracking:
Sessions are browser-based, not computer-based. If a user uses 2 different browsers it will result in 2 different sessions. If this is a problem you can stop reading here.
Sessions expire if the user is not logged in. If a user is not logged in, they will use a guest session, which will be invalidated if the user deletes their cookies and browser cache.
Implementation
There are many ways of implementing this. I don't think I can cover them all, so I'll just list my favorite, which makes this an opinionated answer. Bear that in mind.
Basics
I will track the session by using what is known as a forever cookie. This is data which will automagically recreate itself even if the user deletes his cookies or updates his browser. It will not however survive the user deleting both their cookies and their browsing cache.
To implement this I will use the browsers caching mechanism (RFC), WebStorage API (MDN) and browser cookies (RFC, Google Analytics).
Legal
In order to utilize tracking ids you need to add them to both your privacy policy and your terms of use preferably under the sub-heading Tracking. We will use the following keys on both document.cookie and window.localStorage:
_ga: Google Analytics data
__utma: Google Analytics tracking cookie
sid: SessionID
Make sure you include links to your Privacy policy and terms of use on all pages that use tracking.
Where do I store my session data?
You can either store your session data in your website database or on the user's computer. Since I normally work on smaller sites (less than 10 thousand concurrent connections) that use 3rd-party applications (Google Analytics / Clicky / etc.), it's best for me to store data on the client's computer. This has the following advantages:
No database lookup / overhead / load / latency / space / etc.
User can delete their data whenever they want without the need to write me annoying emails.
and disadvantages:
Data has to be encrypted / decrypted and signed / verified which creates cpu overhead on client (not so bad) and server (bah!).
Data is deleted when user deletes their cookies and cache. (this is what I want really)
Data is unavailable for analytics when users go off-line. (analytics for currently browsing users only)
UUIDS
BrowserID: Unique id generated from the browser's user agent string. Browser|BrowserVersion|OS|OSVersion|Processor|MozillaMajorVersion|GeckoMajorVersion
ComputerID: Generated from users IP Address and HTTPS session key.
getISP(requestIP)|getHTTPSClientKey()
FingerPrintID: JavaScript based fingerprinting based on a modified fingerprint.js. FingerPrint.get()
SessionID: Random key generated when user 1st visits site. BrowserID|ComputerID|randombytes(256)
GoogleID: Generated from __utma cookie. getCookie(__utma).uniqueid
Mechanism
The other day I was watching The Wendy Williams Show with my girlfriend and was completely horrified when the host advised her viewers to delete their browser history at least once a month. Deleting browser history normally has the following effects:
Deletes history of visited websites.
Deletes cookies and window.localStorage (aww man).
Most modern browsers make this option readily available, but fear not friends, for there is a solution. The browser has a caching mechanism to store scripts / images and other things. Usually, even if we delete our history, this browser cache still remains. All we need is a way to store our data here. There are 2 methods of doing this. The better one is to use an SVG image and store our data inside its tags; that way the data can still be extracted using Flash even if JavaScript is disabled. However, since that is a bit complicated, I will demonstrate the other approach, which uses JSONP (Wikipedia)
example.com/assets/js/tracking.js (actually tracking.php)
var now = new Date();
window.__sid = "SessionID"; // Server generated
setCookie("sid", window.__sid, now.setFullYear(now.getFullYear() + 1, now.getMonth(), now.getDate() - 1));
if( "localStorage" in window ) {
  window.localStorage.setItem("sid", window.__sid);
}
Now we can get our session key any time:
window.__sid || window.localStorage.getItem("sid") || getCookie("sid") || ""
How do I make tracking.js stick in browser?
We can achieve this using Cache-Control, Last-Modified and ETag HTTP headers. We can use the SessionID as value for etag header:
setHeaders({
  "ETag": SessionID,
  "Last-Modified": new Date(0).toUTCString(),
  "Cache-Control": "private, max-age=31536000, s-maxage=31536000, must-revalidate"
})
Last-Modified header tells the browser that this file is basically never modified. Cache-Control tells proxies and gateways not to cache the document but tells the browser to cache it for 1 year.
The next time the browser requests the document, it will send If-Modified-Since and If-None-Match headers. We can use these to return a 304 Not Modified response.
example.com/assets/js/tracking.php
$sid = getHeader("If-None-Match") ?: getHeader("if-none-match") ?: getHeader("IF-NONE-MATCH") ?: "";
$ifModifiedSince = hasHeader("If-Modified-Since") ?: hasHeader("if-modified-since") ?: hasHeader("IF-MODIFIED-SINCE");
if( validateSession($sid) ) {
    if( sessionExists($sid) ) {
        continueSession($sid);
        send304();
    } else {
        startSession($sid);
        send304();
    }
} else if( $ifModifiedSince ) {
    send304();
} else {
    startSession();
    send200();
}
Now every time the browser requests tracking.js, our server will respond with a 304 Not Modified result and force execution of the local copy of tracking.js.
I still don't understand. Explain it to me
Let's suppose the user clears their browsing history and refreshes the page. The only thing left on the user's computer is a copy of tracking.js in the browser cache. When the browser requests tracking.js it receives a 304 Not Modified response, which causes it to execute the first version of tracking.js it received. tracking.js executes and restores the SessionID that was deleted.
Validation
Suppose Haxor X steals our customers cookies while they are still logged in. How do we protect them? Cryptography and Browser fingerprinting to the rescue. Remember our original definition for SessionID was:
BrowserID|ComputerID|randomBytes(256)
We can change this to:
Timestamp|BrowserID|ComputerID|encrypt(randomBytes(256), hk)|sign(Timestamp|BrowserID|ComputerID|randomBytes(256), hk)
Where hk = sign(Timestamp|BrowserID|ComputerID, serverKey).
Now we can validate our SessionID using the following algorithm:
if( getTimestamp($sid) is older than 1 year ) return false;
if( getBrowserID($sid) !== createBrowserID($_Request, $_Server) ) return false;
if( getComputerID($sid) !== createComputerID($_Request, $_Server) ) return false;
$hk = sign(getTimestamp($sid) + getBrowserID($sid) + getComputerID($sid), $_SERVER["key"]);
if( !verify(getTimestamp($sid) + getBrowserID($sid) + getComputerID($sid) + decrypt(getRandomBytes($sid), $hk), getSignature($sid), $hk) ) return false;
return true;
Now in order for Haxor's attack to work they must:
Have the same ComputerID. That means they have to have the same ISP as the victim (tricky). This will give our victim the opportunity to take legal action in their own country. Haxor must also obtain the HTTPS session key from the victim (hard).
Have the same BrowserID. Anyone can spoof the User-Agent string (annoying).
Be able to create their own fake SessionID (very hard). Volume attacks won't work, because we use a timestamp to generate the encryption / signing key, so basically it's like generating a new key for each session. On top of that we encrypt random bytes, so a simple dictionary attack is also out of the question.
We can improve validation by forwarding GoogleID and FingerprintID (via ajax or hidden fields) and matching against those.
if( GoogleID != getStoredGoogleID($sid) ) return false;
if( byte_difference(FingerPrintID, getStoredFingerprint($sid)) > 10% ) return false;
These people have developed a fingerprinting method for recognising a user with a high level of accuracy:
https://panopticlick.eff.org/static/browser-uniqueness.pdf
We investigate the degree to which modern web browsers are subject to "device fingerprinting" via the version and configuration information that they will transmit to websites upon request. We implemented one possible fingerprinting algorithm, and collected these fingerprints from a large sample of browsers that visited our test site, panopticlick.eff.org. We observe that the distribution of our fingerprint contains at least 18.1 bits of entropy, meaning that if we pick a browser at random, at best we expect that only one in 286,777 other browsers will share its fingerprint. Among browsers that support Flash or Java, the situation is worse, with the average browser carrying at least 18.8 bits of identifying information. 94.2% of browsers with Flash or Java were unique in our sample.
By observing returning visitors, we estimate how rapidly browser fingerprints might change over time. In our sample, fingerprints changed quite rapidly, but even a simple heuristic was usually able to guess when a fingerprint was an "upgraded" version of a previously observed browser's fingerprint, with 99.1% of guesses correct and a false positive rate of only 0.86%.
We discuss what privacy threat browser fingerprinting poses in practice, and what countermeasures may be appropriate to prevent it. There is a tradeoff between protection against fingerprintability and certain kinds of debuggability, which in current browsers is weighted heavily against privacy. Paradoxically, anti-fingerprinting privacy technologies can be self-defeating if they are not used by a sufficient number of people; we show that some privacy measures currently fall victim to this paradox, but others do not.
It's not possible to identify the computers accessing a web site without the cooperation of their owners. If they let you, however, you can store a cookie to identify the machine when it visits your site again. The key is, the visitor is in control; they can remove the cookie and appear as a new visitor any time they wish.
A possibility is using flash cookies:
Ubiquitous availability (95 percent of visitors will probably have flash)
You can store more data per cookie (up to 100 KB)
Shared across browsers, so more likely to uniquely identify a machine
Clearing the browser cookies does not remove the flash cookies.
You'll need to build a small (hidden) flash movie to read and write them.
Whatever route you pick, make sure your users opt IN to being tracked, otherwise you're invading their privacy and become one of the bad guys.
There is a popular method called canvas fingerprinting, described in this scientific article: The Web Never Forgets:
Persistent Tracking Mechanisms in the Wild. Once you start looking for it, you'll be surprised how frequently it is used. The method creates a unique fingerprint, which is consistent for each browser/hardware combination.
The article also reviews other persistent tracking methods, like evercookies, respawning http and Flash cookies, and cookie syncing.
More info about canvas fingerprinting here:
Pixel Perfect: Fingerprinting Canvas in HTML5
https://en.wikipedia.org/wiki/Canvas_fingerprinting
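For a feel of the mechanism, here is a minimal canvas-fingerprint sketch; the drawing commands are arbitrary, and real implementations add many more signals:

// Render text on a canvas and hash the resulting pixels. Rendering differs
// subtly across GPU / driver / font stacks, which is what makes the digest
// (fairly) stable per machine+browser combination.
async function canvasFingerprint() {
  const c = document.createElement("canvas");
  c.width = 220; c.height = 30;
  const ctx = c.getContext("2d");
  ctx.textBaseline = "top";
  ctx.font = "14px 'Arial'";
  ctx.fillStyle = "#f60";
  ctx.fillRect(10, 1, 120, 20);
  ctx.fillStyle = "#069";
  ctx.fillText("fingerprint <canvas> 1.0", 2, 15);
  const bytes = new TextEncoder().encode(c.toDataURL());
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return [...new Uint8Array(digest)].map(b => b.toString(16).padStart(2, "0")).join("");
}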
You may want to try setting a unique ID in an evercookie (it will work cross browser, see their FAQs):
http://samy.pl/evercookie/
There is also a company called ThreatMetrix that is used by a lot of big companies to solve this problem:
http://threatmetrix.com/our-solutions/solutions-by-product/trustdefender-id/
They are quite expensive and some of their other products aren't very good, but their device id works well.
Finally, there is this open source jquery implementation of the panopticlick idea:
https://github.com/carlo/jquery-browser-fingerprint
It looks pretty half baked right now but could be expanded upon.
Hope it helps!
There is only a small amount of information that you can get via an HTTP connection.
IP - But as others have said, this is not fixed for many, if not most Internet users due to their ISP's dynamic allocation policies.
Useragent String - Nearly all browsers send what kind of browser they are with every request. However, this can be set by the user in many browsers today.
Collection of request fields - There are other fields sent with each request, such as supported encodings, etc. These, if used in the aggregate can help to ID a user's machine, but again are browser dependent and can be changed.
Cookies - Setting a cookie is another way to identify a machine, or more specifically a browser on a machine, but as others have said, these can be deleted, or turned off by the users, and are only applicable on a browser, not a machine.
So, the correct response is that you cannot achieve what you would like via the HTTP-over-IP protocols alone. However, using a combination of cookies, the IP, and the fields in the HTTP request, you have a good chance at guessing, sort of, what machine it is. Users tend to use only one browser, and often from one machine, so this may be fairly reliable, but this will vary depending on the audience... techies are more likely to mess with this stuff, and use more machines/browsers. Additionally, this could even be coupled with some attempt to geo-locate the IP, and use that data as well. But in any case, there is no solution that will be correct all of the time.
There are flaws with both cookie and non-cookie approaches. But if you can forgive the shortcomings of the cookie approach, here's an idea.
If you're already using Google Analytics on your site, then you don't need to write code to track unique users yourself. Google Analytics does that for you via the __utma cookie value, as described in Google's documentation. And by reusing this value you're not creating additional cookie payload, which has efficiency benefits with page requests.
And you could write some code easily enough to access that value, or use this script's getUniqueId() function.
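For illustration only, a hedged sketch of pulling the visitor id out of the classic ga.js-era __utma cookie (its dot-separated layout starts domainhash.visitorid...; this does not apply to modern gtag/GA4 cookies):

// Extract the visitor id (second dot-separated field) from __utma.
function getUtmaVisitorId() {
  const match = document.cookie.match(/(?:^|;\s*)__utma=([^;]+)/);
  if (!match) return null;
  const fields = match[1].split(".");
  return fields.length > 1 ? fields[1] : null;
}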
As with the previous solutions, cookies are a good method; be aware that they identify browsers though. If I visited a website in Firefox and then in Internet Explorer, cookies would be stored for both attempts separately. Some users also disable cookies (but more people disable JavaScript).
Another method to consider would be I.P. and hostname identification (be aware these can vary for dial-up/non-static IP users, AOL also uses blanket IPs). However since this only identifies networks this might not work as well as cookies.
The suggestions to use cookies aside, the only comprehensive set of identifying attributes available to interrogate are contained in the HTTP request header. So it is possible to use some subset of these to create a pseudo-unique identifier for a user agent (i.e., browser). Further, most of this information is possibly already being logged in the so-called "access log" of your web server software by default and, if not, can be easily configured to do so. Then, a utility could be developed that simply scans the content of this log, creating fingerprints of each request comprised of, say, the IP address and User-Agent string, etc. The more data available, even including the contents of specific cookies, the better the quality of the uniqueness of this fingerprint. Though, as many others have stated already, the HTTP protocol doesn't make this 100% foolproof - at best it can only be a fairly good indicator.
When I use a machine which has never visited my online banking web site, I get asked for additional authentication. Then, if I go back a second time to the online banking site, I don't get asked for the additional authentication. I deleted all cookies in IE and re-logged onto my online banking site, fully expecting to be asked the authentication questions again. To my surprise I was not asked. Doesn't this lead one to believe the bank is doing some kind of PC tagging which doesn't involve cookies?
This is a pretty common type of authentication used by banks.
Say you're accessing your bank website via example-isp.com. The first time you're there, you'll be asked for your password, as well as additional authentication. Once you've passed, the bank knows that user "thatisvaliant" is authenticated to access the site via example-isp.com.
In the future, it won't ask for extra authentication (beyond your password) when you're accessing the site via example-isp.com. If you try to access the bank via another-isp.com, the bank will go through the same routine again.
So to summarize, what the bank's identifying is your ISP and/or netblock, based on your IP address. Obviously not every user at your ISP is you, which is why the bank still asks you for your password.
Have you ever had a credit card company call to verify that things are OK when you use a credit card in a different country? Same concept.
Really, what you want to do cannot be done because the protocols do not allow for this. If static IPs were universally used then you might be able to do it. They are not, so you cannot.
If you really want to identify people, have them log in.
Since they will probably be moving around to different pages on your web site, you need a way to keep track of them as they move about.
So long as they are logged in, and you are tracking their session within your site via cookies/link-parameters/beacons/whatever, you can be pretty sure that they are using the same computer during that time.
Ultimately, it is incorrect to say this tells you which computer they are using if your users are not using your own local network and do not have static IP addresses.
If what you want to do is being done with the cooperation of the users and there is only one user per cookie and they use a single web browser, just use a cookie.
You can use fingerprintjs2
new Fingerprint2().get(function(result, components) {
  console.log(result)     // a hash, representing your device fingerprint
  console.log(components) // an array of FP components
  // submit hash and JSON object to the server
})
After that, you can check new users against the existing ones and check JSON similarity, so even if their fingerprint mutates, you can still track them.
Because i want the solution to work on all machines and all browsers (within reason) I am trying to create a solution using javascript.
Isn't that a really good reason not to use javascript?
As others have said - cookies are probably your best option - just be aware of the limitations.
I guess the verdict is that I cannot programmatically, uniquely identify a computer which is visiting my web site.
I have the following question. When I use a machine which has never visited my online banking web site, I get asked for additional authentication. Then, if I go back a second time to the online banking site, I don't get asked for the additional authentication. Reading the answers to my question, I decided a cookie must be involved. Therefore, I deleted all cookies in IE and re-logged onto my online banking site, fully expecting to be asked the authentication questions again. To my surprise, I was not asked. Doesn't this lead one to believe the bank is doing some kind of PC tagging which doesn't involve cookies?
Further, after much googling today I found the following company, which claims to sell a solution that uniquely identifies machines which visit a web site: http://www.the41.com/products.asp.
I appreciate all the good information; if you could clarify further this conflicting information I found, I would greatly appreciate it.
I would do this using a combination of cookies and flash cookies. Create a GUID and store it in a cookie. If the cookie doesn't exist, try to read it from the flash cookie. If it's still not found, create it and write it to the flash cookie. This way you can share the same GUID across browsers.
I think cookies might be what you are looking for; this is how most websites uniquely identify visitors.
Cookies won't be useful for determining unique visitors. A user could clear cookies and refresh the site - he then is classed as a new user again.
I think that the best way to go about doing this is to implement a server side solution (as you will need somewhere to store your data). Depending on the complexity of your needs for such data, you will need to determine what is classed as a unique visit. A sensible method would be to allow an IP address to return the following day and be given a unique visit. Several visits from one IP address in one day shouldn't be counted as uniques.
Using PHP, for example, it is trivial to get the IP address of a visitor, and store it in a text file (or a sql database).
A server side solution will work on all machines, because you are going to track the user when he first loads up your site. Don't use javascript, as that is meant for client side scripting, plus the user may have disabled it in any case.
Hope that helps.
I will give my ideas, starting from simpler to more complex.
In all of the below you can create sessions, and the problem essentially translates to matching a session with a request.
a) (difficulty: easy) Use client hardware to explicitly store a session id/hash of some sort (there are quite some privacy/security issues, so make sure you hash anything you store). Solutions include:
cookie storage
browser storage/WebDB (and more exotic browser solutions)
extensions with permission to store things in files
The above suffer from the fact that the user can just empty his cache, in case he doesn't want to be tracked.
b) (difficulty: medium) Login-based authentication.
Most modern web frameworks provide such a solution. The core idea is that you let the user voluntarily identify himself; quite straightforward, but it adds complexity to the architecture.
The above suffers from the additional complexity and from making the content essentially non-public.
c) (difficulty: hard - R&D) Identification based on metadata: browser, IP, language and other privacy-invasive stuff (so make sure you let your users know, or you may get sued).
A non-perfect solution that can get ever more complicated (a user typing with a specific frequency, or using the mouse with specific patterns? You can even apply ML solutions).
This last one is the most powerful, since the user can be identified even without explicitly wanting to be. It is a straight invasion of privacy (see GDPR) and not perfect, e.g. an IP can change.
Assuming you don't want the user to be in control, you can't. The web doesn't work like that, the best you can hope for is some heuristics.
If it is an option to force your visitor to install some software and use TCPA you may be able to pull something off.
My post might not be a solution, but I can provide an example where this feature has been implemented.
If you visit the signup page of www.supertorrents.org for the first time from your computer, it's fine. But if you refresh the page or open the page again, it identifies that you've previously visited the page. The real beauty comes here: it identifies you even if you re-install Windows or another OS.
I read somewhere that they store the CPU ID. Although I couldn't find out how they do it, I seriously doubt it, and they might use the MAC address instead.
I'll definitely share if I find out how they do it.
A trick:
Create 2 registration pages:
First registration page: without any email or security check (just a username and password)
Second registration page: with a high security level (email verification request, a security image, etc.)
For customer satisfaction and easy registration, the default registration page should be the first registration page, but the first registration page carries a hidden restriction: IP restriction. If an IP tries to register a second time (for example within less than 1 hour), instead of showing a block page you can show the second registration page automatically.
On the first registration page you can, for example, block 2 attempts from one IP for just 1 hour or 24 hours, and after (for example) 1 hour you can re-open access from that IP automatically.
Please note: the first registration page and the second registration page should not be separate pages. Make just 1 page (for example register.php) and make it smart enough to switch between the first style and the second style, as in the sketch below.
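A sketch of that switch, in Node/Express terms rather than the PHP the answer suggests (the in-memory store, thresholds and page templates are illustrative):

// Serve the easy or strict registration form based on recent attempts per IP.
const express = require("express");
const app = express();

const signupAttempts = new Map(); // ip -> [timestamps]; illustrative in-memory store
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const EASY_LIMIT = 2;

app.get("/register", (req, res) => {
  const now = Date.now();
  const recent = (signupAttempts.get(req.ip) || []).filter(t => now - t < WINDOW_MS);
  signupAttempts.set(req.ip, recent);
  if (recent.length < EASY_LIMIT) {
    res.send(easyFormHtml);   // username + password only (hypothetical template)
  } else {
    res.send(strictFormHtml); // email verification + security image (hypothetical template)
  }
});

// In the registration POST handler, push Date.now() onto signupAttempts
// for req.ip after each completed registration.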
