I'm writing an odd kind of webapp that is designed to communicate with other sites loaded into the browser. This is fairly trivial to do via MessageChannels. Now, these applications must access protected resources and must get authorization from the user. Anything that requires these applications to talk to a server (such as OAuth) is not an option, since there is no server: support for P2P and E2EE connections is required. One idea I had for limiting access was to use Symbols: for example, if a message references a resource, it may contain the Symbol for that resource. Then, if the other application wants that resource, it can retrieve it using that Symbol.
The problem is that I'm not sure this is secure. Is there any way to deliberately create a Symbol that is not unique? If so, this could potentially be used as an attack vector in my webapp: A malicious "client" application could just keep guessing Symbols until it finds one that corresponds to something useful.
Also, if there's a better way of doing this or you see any other issues with it, feel free to let me know ;)
EDIT: To clarify: Application A creates a Symbol to give to Application B. With this Symbol, Application B can access certain resources (files, objects, etc.) which are sent back to it. Is there any way for Application C to get access to a Symbol that is equivalent to the one given to Application B without actually being given the Symbol from either Application A or B?
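For what it's worth, the uniqueness half of the question can be checked directly: `Symbol()` always produces a fresh, unforgeable value, but symbols obtained through the global registry with `Symbol.for` are shared by key, so a capability symbol must never be created that way:

```javascript
// Symbols created with Symbol() are unique, even with the same description:
const a = Symbol("resource-42");
const b = Symbol("resource-42");
console.log(a === b); // false: no way to forge an equal symbol

// But registry symbols created with Symbol.for() ARE shared by key:
const c = Symbol.for("resource-42");
const d = Symbol.for("resource-42");
console.log(c === d); // true: guessable, so unsuitable as a capability token
```

So as long as Application A uses plain `Symbol()` (and never `Symbol.for()`), Application C cannot conjure up an equal value by guessing.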
EDIT: Confirmed that Symbols cannot be transferred across MessageChannels in any way, so there's little point to the question...
Symbols cannot be transferred through MessageChannels. Tested in Firefox.
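This is easy to confirm with `structuredClone` (a global in modern browsers and Node 17+), which applies the same structured-clone algorithm that `postMessage` uses for MessageChannel traffic:

```javascript
// Attempting to clone a Symbol throws, because the structured clone
// algorithm does not support Symbol values at all:
let error;
try {
  structuredClone(Symbol("capability"));
} catch (e) {
  error = e;
}
console.log(error.name); // "DataCloneError"
```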
Today, I was starting to build a trading tool. Luckily, TradingView.com gives you free access, where you can embed charts legally on your website via iframes.
I planned to select a stock symbol within the iframe (see picture: left corner, "BTCUSDT") and read the chosen symbol name via JavaScript. For example, when I choose "BTCUSDT" (or Bitcoin), I want to fetch this value so I know which cryptocurrency to order through another API service. I found out that "for security reasons" this is not possible.
However, in the picture you can see that iframe elements can easily be inspected by hand. So why can't we read them from JavaScript as well? What kind of (effective) security breach would that be?
Well... I can understand that some people might use this for evil phishing purposes, but will this effectively stop them? They might use a proxy or some other workaround. On the other hand, reading a single value from a tool meant to be embedded cross-site becomes more complicated than it should be.
Python has a beautiful library called "BeautifulSoup". Combined with an HTTP client, you give it a URL and it reads all the DOM elements of the website. I don't understand how this is possible in Python but restricted in JavaScript.
I have found no reasonable answer or solution for this kind of scenario. If this is meant for higher security, there are many ways to read those values other than relying on JavaScript. So why restrict it?
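The asymmetry with BeautifulSoup comes down to where the request originates. The same-origin policy is enforced by the browser on behalf of the logged-in user; a plain HTTP client simply downloads text, and parsing it is trivial. A sketch (with the fetched HTML inlined as a hypothetical stand-in for the widget page):

```javascript
// Imagine this string was fetched server-side with any HTTP client
// (hypothetical markup standing in for the TradingView widget page):
const html = '<div class="chart"><span id="symbol">BTCUSDT</span></div>';

// Outside the browser there is no same-origin policy, only text parsing:
const match = html.match(/<span id="symbol">([^<]+)<\/span>/);
const symbol = match ? match[1] : null;
console.log(symbol); // "BTCUSDT"
```

What the browser protects is not the markup itself (anyone can download that) but the *authenticated view* of it: without the restriction, any page you visit could read other sites' iframes rendered with your cookies attached.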
I am working on a React-based web app that uses Tensorflow.js to run an AI model in realtime on the client in the browser. I've trained this AI model from scratch and I'd like to protect it from being intercepted and used in other projects. Are there any protections available to do this (obfuscation, DRM, etc.)?
From a business perspective, I'd only like the model to work on my web app, nowhere else.
The discussions (1 2 3) I've been able to find on this are more geared toward native apps, not web apps.
Here is an example open source web app that uses Tensorflow.js. These weights are an example of what I would like to protect in my app.
Client-side code obfuscation will never fully prevent it. Use a server instead.
Obfuscation
If your client-side application contains the model, then the user will be able to somehow extract it. You can make it harder for the user, but it will always be possible. Some techniques to make it harder are:
Obfuscating your code: That way the user will not be able to read your code and comments easily. Depending on your build tools, this might already be done for you when you produce a "production ready" build.
Obfuscating the library and its public API: Even if your code is obfuscated, the user might still be able to guess what is going on by seeing the public API calls of the library. Example: It would be rather easy to set a break point at the model.predict function and debug your code from there on. By also obfuscating libraries and their API, this will become harder.
Put "special checks" in your code: You could also check whether the page the code is running on is your page (e.g. whether the domain matches), etc. You will want to obfuscate these checks as well.
Even if your code is perfectly obfuscated and well protected, your client-side code still contains your model somewhere. With these methods it will always be possible to somehow extract your model.
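The "special checks" item above could look something like the sketch below (hypothetical hostnames; a determined user can of course patch this out, which is exactly why it needs obfuscation too):

```javascript
// A hypothetical allow-list of hostnames the app is licensed to run on:
const ALLOWED_HOSTS = ["myapp.example.com", "localhost"];

function isAllowedHost(hostname) {
  return ALLOWED_HOSTS.includes(hostname);
}

// In the browser you would gate model loading on location.hostname, e.g.:
//   if (!isAllowedHost(location.hostname)) throw new Error("Unlicensed host");
console.log(isAllowedHost("myapp.example.com")); // true
console.log(isAllowedHost("evil.example.net"));  // false
```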
Server-side approach
To make it impossible to get your model, you need a different approach. Only put your "dumb logic" on the client and exclude the part of the code that you want to protect. Instead, you offer an API on your server that executes the "protected part" of your code.
This way, instead of running model.predict on the client-side, you would make an AJAX request to your backend (with the parameters) and then return the results. That way the user only sees the input and the output and cannot extract the model itself.
Keep in mind that this means a lot more work, as you not only have to write the code for your client-side application but also for your server-side application, including the API. Depending on what your application looks like (e.g. does it have a login?), this might be a lot more code.
Another way you can protect your model is to split it into multiple blocks. Put some blocks on the server side and some on the client side. This method may also require a lot of engineering work, but once you do it you can trade off computation load and network latency between the server and the client. Users can only get the client-side blocks, which are useless without the cooperating server-side blocks.
I want to implement some anti-crawler mechanism to protect data in my site. After reading many related topics in SO, I am going to focus on "enforce running javascript".
My plan is:
Implement a special function F (e.g. MD5SUM) in javascript file C
Input: cookie string of current user (the cookie changes in each response)
Output: a verification string V
Send V along with other parameters to sensitive backend interface to request valuable data
Backend server has validation function T to check whether V is correct
The difficult part is how to obfuscate F. If crawlers can easily understand F, they will get V without C and bypass javascript.
Indeed, there are many JS obfuscators, but I am going to achieve the goal by implementing a generator function G which does not itself appear in C.
G(K) generates F, where K is a large integer. F should be complicated enough that crawler writers have to take many hours to understand it. Given another K',
G(K') = F', and F' should look like a new function to some extent, so that crawler writers again have to take hours to crack it.
A possible implementation of G might be a mapping from an integer to a digital circuit of many connected logic gates (like a maze), represented as F using JavaScript syntax. Since F must be run in JavaScript, crawlers have to run PhantomJS. Furthermore, I can insert sleeps in F to slow down crawlers, while normal users will hardly notice a 50-100 ms delay.
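A toy sketch of the G(K) idea (a parameterized mixing function in place of real MD5 or logic-gate circuits; the constants derived from K are what make each generated F "look different" per build):

```javascript
// G takes a large integer K and returns a verification function F.
// Different K values yield functions with different mixing constants,
// so each build of C ships a different-looking F. This is a toy scheme,
// not cryptography.
function G(K) {
  const c1 = (K % 0xffff) | 1;
  const c2 = ((K >> 16) % 0xffff) | 1;
  return function F(cookie) {
    let v = 0;
    for (let i = 0; i < cookie.length; i++) {
      v = (v * c1 + cookie.charCodeAt(i) * c2) >>> 0;
    }
    return v.toString(16); // the verification string V
  };
}

const F = G(123456789);
const V = F("sessionid=abc123"); // deterministic for a given K and cookie
```

A real G would of course have to emit structurally different code, not just different constants, or the pattern is reverse-engineered once and reused for every K.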
I know there is a group of methods to detect crawlers. They will be applied. Let's only discuss "enforce running javascript" topic.
Could you give me some advice? Is there any better solution?
Requiring a login to prevent the whole world from seeing the data is one option.
If you do not want logged-in users to fetch all the data you make available to them, you could then limit the number of requests per minute for the user, adding a delay to your page load if the limit has been reached. Since the user is logged in, you could easily track the requests server-side even if they manage to change cookies/localStorage/IP/browser and whatnot.
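A sketch of that per-user limit (hypothetical in-memory store and budget; a real deployment would use something shared, like Redis, across server processes):

```javascript
const WINDOW_MS = 60_000; // one-minute window
const MAX_REQUESTS = 30;  // hypothetical per-user budget

const counters = new Map(); // userId -> { windowStart, count }

// Returns true if the request is allowed, false if the user should be
// delayed (or served a captcha). `now` is injectable for testing.
function allowRequest(userId, now = Date.now()) {
  const entry = counters.get(userId);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(userId, { windowStart: now, count: 1 });
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_REQUESTS;
}
```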
You can use images for some of the text; that will force them to use some resource-heavy mechanism (such as OCR) to translate it back into usable information.
You could add hidden text; this would even prevent users from copy/pasting (you fill spans with 3-4 random letters after every 3-4 real letters and give them font-size 0). That way the junk letters aren't seen, but they are still copied, and will most likely end up in the crawler's output.
Refuse connections from known crawler HTTP header signatures, although any crawler could mock those. And Greasemonkey or some scripting extension could even turn a regular browser into a crawler, so this has very little effect.
Now, to force using javascript
The problem is that you cannot really force any JavaScript execution. What the JavaScript does is visible to everyone who has access to the page, so if it's some kind of MD5 hash you compute, it can be reimplemented in any language.
That's mainly infeasible because the crawler has access to exactly everything the client's JavaScript has access to.
Forcing the use of a JavaScript-enabled crawler can be circumvented, and even if not, with the computing power available to anyone nowadays it is very easy to launch a PhantomJS instance... And as I said above, anyone with slight JavaScript knowledge can simply automate clicks on your website using their browser, which makes everything undetectable.
What should be done
The only bulletproof way to prevent crawlers from leeching your data, and to prevent any automation, is to ask for something that only a human can do. A captcha comes to mind.
Think about your real users
The first thing you should keep in mind is that if your website starts to get annoying for normal users, they will not come back. Having to type an 8-character captcha on each page request, just because there MIGHT be someone who wants to pump the data, will become too tedious for anyone. Also, blocking unknown browser agents might prevent legit users from accessing your website just because they happen to use an unusual browser.
The impact on your legit users, and the time you'd spend working hard on fighting crawlers, might be too high compared to just accepting that some crawling will happen. So your best bet is to rewrite your TOS to explicitly forbid crawling of any sort, log every HTTP access of every user, and take action when needed.
Disclaimer:
I'm scraping over a hundred websites monthly, following external links to reach about 3000 domains in total. At the time of posting, none of them are resisting, even though they employ one or more of the techniques above. When a scraping error is detected, it does not take long to fix it...

The only thing is to crawl respectfully, without over-crawling or making too many requests in a small time frame. Just doing that will circumvent the most popular anti-crawler measures.
We have an app that sits behind a firewall and behind a CAS authentication layer. It has a feature that allows users with a special role to customize the way the app works by writing JavaScript functions that get inserted into the application at runtime, and which can be fired by events such as button clicks and page load and the like. (The JS is not "eval"'d - it is written into the page server-side.)
Needless to say, this feature raises security concerns!
Are there recommendations, beyond what's being done already, to secure this? That is, beyond a) the firewall, b) robust authentication, and c) authorization?
EDIT:
In response to questions in comments:
1. Does the injected code become part of the application, or is it executed as an independent application (separated context)?
Yes, it becomes a part of the application. It currently gets inserted, server-side, into a script tag.
2. Does the inserted JavaScript run in clients' browsers other than the original writer's?
Yes. It gets persisted, and then gets inserted into all future requests.
(The application can be thought of as an "engine" for building custom applications against a generic backend data store which is accessed by RESTful calls. Each custom application can have its own set of these custom JavaScripts.)
You really shouldn't just accept arbitrary JavaScript. Ideally, what should happen is that you tokenize whatever JavaScript is sent and ensure that every token is valid JavaScript, first and foremost (this should apply in all below scenarios).
After that, you should verify that whatever JavaScript is sent does not access sensitive information.
That last part may be extremely difficult or even impossible to verify in obfuscated code, and you may need to consider that no matter how much verification you do, this is an inherently unsafe practice. As long as you understand that, below are some suggestions for making this process a little safer than it normally is:
As #FDavidov has mentioned, you could also restrict the JavaScript from running as part of the application and sandbox it in a separate context much like Stack Snippets do.
Another option is to restrict the JavaScript to a predefined whitelist of functions (some of which you may have implemented) and globals. Do not allow it to interact directly with DOM or globals except of course primitives, control flow, and user-defined function definitions. This method does have some success depending on how robustly enforced the whitelist is. Here is an example that uses this method in combination with the method below.
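A sketch of the whitelist idea (a naive identifier scan with hypothetical API names; in practice you would walk a real AST produced by a JavaScript parser, because regex-only checks are bypassable):

```javascript
// Hypothetical whitelist of globals/functions the user script may mention:
const WHITELIST = new Set([
  "drawCircle", "drawRect", "onButtonClick", // app-provided API
  "Math", "console",                         // allowed built-ins
]);

const KEYWORDS = ["var", "let", "const", "function", "return", "if",
                  "else", "for", "while", "true", "false", "null"];
const LOCAL_DECL = /\b(?:var|let|const|function)\s+([A-Za-z_$][\w$]*)/g;
const IDENTIFIER = /[A-Za-z_$][\w$]*/g;

// Returns the list of identifiers the script uses that are neither
// whitelisted, locally declared, nor language keywords.
function checkScript(source) {
  const declared = new Set(
    [...source.matchAll(LOCAL_DECL)].map((m) => m[1])
  );
  const violations = [];
  for (const [name] of source.matchAll(IDENTIFIER)) {
    if (!WHITELIST.has(name) && !declared.has(name) &&
        !KEYWORDS.includes(name)) {
      violations.push(name);
    }
  }
  return violations;
}

console.log(checkScript("let x = 1; drawCircle(x);")); // []
console.log(checkScript("document.cookie"));           // flags "document"
```

How robustly the whitelist is enforced is the whole game here; string tricks like `window["al" + "ert"]` are exactly why a real parser, not a regex, has to do the tokenizing.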
Alternatively, if this is possible with what you had in mind, do not allow the code to run on anyone's machine other than the original author's. This would basically be moving Userscript-like functionality into the application proper (which I honestly don't see the point of), but it would definitely be safer than allowing it to run in any client's browser.
A persistent follow-up of an admittedly similar question I had asked: What security restrictions should be implemented in allowing a user to upload a Javascript file that directs canvas animation?
I like to think I know JS decently enough, and I see common characters in all the XSS examples I've come across, which I am somewhat familiar with. What I'm lacking are good XSS examples that could bypass a securely sound, rationally programmed system. I want people to upload HTML5 canvas creations onto my site. Are there any sites like this yet? People seem to get scared about this all the time, but what if you just wanted to do it for fun, for yourself? And if something happens to the server, oh well: it's just an animation site, information is spread around like wildfire anyway, so if anyone cares I'll tell them not to sign up.
If I allow a single textarea form field to act as an IDE (using JS) for my programming language written in JS, and do string replacing, filtering, and validation of the user's syntax before finally compiling it into JS to be echoed by PHP, how bad could it get for me to host that content? Please show me how you could bypass all of my combined considerations, taking the server side into account as well:
If JavaScript is disabled, preventing any POST from getting through, keeping constant track of user session.
Namespacing the Class, so they can only prefix their functions and methods with EXAMPLE.
Making instance
Storing my JS Framework in an external (immutable in the browser?) JS file, which needs to be at the top of the page for the single textarea field in the form to be accepted, as well as a server-generated key which must follow it. On the page that hosts the compiled user-uploaded canvas game/animation (1 per page ONLY), the server will verify the correct JS filename string before echoing the rest out.
No external script calls! String replacing on client and server.
Allowing ONLY alphanumeric characters, dashes and asterisks.
Removing alert, eval, window, XMLHttpRequest, prototyping, cookie, obvious stuff. No native JS reserved words or syntax.
Obfuscating and minifying another external JS file that helps to serve the IDE and recognize the programming language's uniquely named Canvas API methods.
When Window unloads, store the external JS code in to two dynamically generated form fields to be checked by the server in POST. All the original code will be cataloged in the DB thoroughly for filtering purposes.
Strict variable naming conventions ('example-square1-lengthPROPERTY', 'example-circle-spinMETHOD')
Copy/Paste Disabled, setInterval to constantly check if enabled by the user. If so, then trigger a block to the database, change window.location immediately and check the session ID through POST to confirm in case JS becomes disabled between that timeframe.
I mean, can I do it then? How can one do harm if they can't use HEX or ASCII and stuff like that?
I think there are a few other options.
Good places to go for real-life XSS tests, by the way, are the XSS Cheat Sheet and the HTML5 Security Cheatsheet (newer). The problem here, however, is that you want to allow JavaScript but disallow bad JavaScript. This is a different, and more complex, goal than the usual way of preventing XSS, which is preventing all scripts.
Hosting on a separate domain
I've seen this referred to as an "iframe jail".
The goal with XSS attacks is to be able to run code in the same context as your site - that is, on the same domain. This is because the code will be able to read and set cookies for that domain, initiate user actions or redress your design, redirect, and so forth.
If, however, you have two separate domains - one for your site, and another which only hosts the untrusted, user-uploaded content, then that content will be isolated from your main site. You could include it in an iframe, and yet it would have no access to the cookies from your site, no access to redress or alter the design or links outside its iframe, and no access to the scripting variables of your main window (since it is on a different domain).
It could, of course, set cookies as much as it likes, and even read back the ones that it set. But these would still be isolated from the cookies for your site. It would not be able to affect or read your main site's cookies. It could also include other code which could annoy/harass the user, such as pop-up windows, or could attempt to phish (you'd need to make it visually clear in your out-of-iframe UI that the content served is not part of your site). However, this is still sandboxed from your main site, where your own critical assets (your session cookies and the integrity of your overarching page design and scripts) are preserved. It would carry no less, but no more, risk than any site on the internet that you could embed in an iframe.
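In markup, the iframe jail might look like this (hypothetical domain names; the `sandbox` attribute adds a further layer on top of the origin separation):

```html
<!-- On https://example.com (your main site): -->
<iframe
  src="https://usercontent-example.com/animations/12345"
  sandbox="allow-scripts"
  title="User-submitted animation (not authored by example.com)">
</iframe>
```

With `allow-scripts` but without `allow-same-origin`, the framed content runs with an opaque origin, so it cannot even read the cookies of its own hosting domain.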
Using a subset of Javascript
Subsets of Javascript have been proposed, which provide compartmentalisation for scripts - the ability to load untrusted code and have it not able to alter or access other code if you don't give it the scope to do so.
Look into things like Google CAJA - whose aim is to enable exactly the type of service that you've described:
Caja allows websites to safely embed DHTML web applications from third parties, and enables rich interaction between the embedding page and the embedded applications. It uses an object-capability security model to allow for a wide range of flexible security policies, so that the containing page can effectively control the embedded applications' use of user data and to allow gadgets to prevent interference between gadgets' UI elements.
One issue here is that people submitting code would have to program it using the CAJA API. It's still valid Javascript, but it won't have access to the browser DOM, as CAJA's API mediates access. This would make it difficult for your users to port some existing code. There is also a compilation phase. Since Javascript is not a secure language, there is no way to ensure code cannot access your DOM or other global variables without running it through a parser, so that's what CAJA does - it compiles it from Javascript input to Javascript output, enforcing its security model.
HTML Purifier consists of thousands of regular expressions that attempt to "purify" HTML into a safe subset that is immune to XSS. This project gets bypassed every few months, because it isn't nearly complex enough to address the problem of XSS.
Do you understand the complexity of XSS?
Do you know that javascript can exist without letters or numbers?
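That claim is easy to verify: JavaScript's coercion rules let you build any string, and eventually any program, out of nothing but brackets, plus signs, exclamation marks, and parentheses (the trick behind encoders like JSFuck):

```javascript
// Coercions with no letters or digits in the source:
console.log(+[]);             // 0 (empty array coerced to a number)
console.log(![]);             // false (array is truthy, negated)
console.log(![] + []);        // "false" (boolean coerced to a string)
console.log((![] + [])[+[]]); // "f" (indexing "false" at position 0)
```

Character by character, any script can be assembled this way, which is why character blacklists alone cannot stop script injection.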
Okay, the very first thing I would try is inserting a meta tag that changes the encoding to, I don't know, let's say UTF-7, which is rendered by IE. Within this UTF-7 encoded HTML there will be JavaScript. Did you think of that? Well, guess what: there are somewhere between a hundred thousand and a few million other vectors I didn't think of.
The XSS cheat sheet is so old my grandparents are immune to it. Here is a more up-to-date version.
(Oh, and by the way, you will be hacked, because what you are trying to do is fundamentally insecure.)