We have an app that sits behind a firewall and behind a CAS authentication layer. It has a feature that allows users with a special role to customize the way the app works by writing JavaScript functions that get inserted into the application at runtime, and which can be fired by events such as button clicks, page load, and the like. (The JS is not "eval"'d - it is written into the page server-side.)
Needless to say, this feature raises security concerns!
Are there recommendations beyond what's already being done to secure this - that is, beyond a) the firewall, b) robust authentication, and c) authorization?
EDIT:
In response to questions in comments:
1. Does the injected code become part of the application, or is it executed as an independent application (separate context)?
Yes, it becomes a part of the application. It currently gets inserted, server-side, into a script tag.
2. Does the inserted JavaScript run in the browsers of clients other than the original writer?
Yes. It gets persisted, and then gets inserted into all future requests.
(The application can be thought of as an "engine" for building custom applications against a generic backend data store which is accessed by RESTful calls. Each custom application can have its own set of these custom JavaScript functions.)
You really shouldn't just accept arbitrary JavaScript. First and foremost, you should tokenize whatever JavaScript is sent and ensure that every token is valid JavaScript (this applies in all the scenarios below).
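As a minimal sketch of that validation step (assuming Node.js and a spec-compliant parser such as acorn - any real parser would do), you could reject anything that does not even parse:

const acorn = require('acorn');

function isValidJavaScript(source) {
    try {
        // acorn throws a SyntaxError for anything that isn't parseable JS
        acorn.parse(source, { ecmaVersion: 2020 });
        return true;
    } catch (err) {
        return false;
    }
}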
After that, you should verify that whatever JavaScript is sent does not access sensitive information.
That last part may be extremely difficult or even impossible to verify in obfuscated code, and you may need to consider that no matter how much verification you do, this is an inherently unsafe practice. As long as you understand that, below are some suggestions for making this process a little safer than it normally is:
As @FDavidov has mentioned, you could also restrict the JavaScript from running as part of the application and sandbox it in a separate context, much like Stack Snippets do.
Another option is to restrict the JavaScript to a predefined whitelist of functions (some of which you may have implemented yourself) and globals. Do not allow it to interact directly with the DOM or with globals, except of course for primitives, control flow, and user-defined function definitions. This method can work, depending on how robustly the whitelist is enforced. Here is an example that uses this method in combination with the method below.
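As a rough sketch of what whitelist enforcement could look like (again assuming acorn, plus the acorn-walk helper; the whitelist contents are purely illustrative), you could walk the parsed AST and reject any global identifier you haven't approved:

const acorn = require('acorn');
const walk = require('acorn-walk');

// Illustrative whitelist - a real one would list your own API's names
const ALLOWED_GLOBALS = new Set(['Math', 'JSON', 'appApi']);

function usesOnlyWhitelistedGlobals(source) {
    const ast = acorn.parse(source, { ecmaVersion: 2020 });
    let ok = true;
    walk.simple(ast, {
        Identifier(node) {
            // Naive check: reject identifiers that name an unapproved
            // global. A production version must also track local scopes.
            if (!ALLOWED_GLOBALS.has(node.name) && node.name in globalThis) {
                ok = false;
            }
        }
    });
    return ok;
}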
Alternatively, if this is possible with what you had in mind, do not allow the code to run on anyone's machine other than the original author's. This would basically be moving Userscript-like functionality into the application proper (which I honestly don't see the point of), but it would definitely be safer than allowing it to run in any client's browser.
Related
I am working on a React-based web app that uses Tensorflow.js to run an AI model in realtime on the client in the browser. I've trained this AI model from scratch and I'd like to protect it from being intercepted and used in other projects. Are there any protections available to do this (obfuscation, DRM, etc.)?
From a business perspective, I'd only like the model to work on my web app, nowhere else.
The discussions (1 2 3) I've been able to find on this are more geared toward native apps, not web apps.
Here is an example open source web app that uses Tensorflow.js. These weights are an example of what I would like to protect in my app.
Client-side code obfuscation will never fully prevent it. Use a server instead.
Obfuscation
If your client-side application contains the model, then the user will be able to somehow extract it. You can make it harder for the user, but it will always be possible. Some techniques to make it harder are:
Obfuscating your code: That way the user will not be able to read your code and comments easily. Depending on your build tools, this might already be done for you when you produce a "production ready" build.
Obfuscating the library and its public API: Even if your code is obfuscated, the user might still be able to guess what is going on by watching the public API calls of the library. Example: it would be rather easy to set a breakpoint at the model.predict function and debug your code from there. Obfuscating the libraries and their APIs as well makes this harder.
Put "special checks" in your code: You could also check if the page the code is running on is your page (e.g. if the domain matches), etc. You also want to obfuscate this code as well.
Even if your code is perfectly obfuscated and well protected, your client-side code still contains your model somewhere. Despite these methods, it will always be possible to somehow extract your model.
Server-side approach
To make it impossible to get your model, you need a different approach. Only put your "dumb logic" on the client and exclude the parts of the code that you want to protect. Instead, offer an API on your server that executes the "protected part" of your code.
This way, instead of running model.predict on the client side, you make an AJAX request to your backend (with the parameters) and return the results. That way the user only sees the input and the output and cannot extract the model itself.
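A minimal sketch of that client-side call (the /api/predict route and the payload shape are placeholders, not part of TensorFlow.js or any real library):

async function predict(features) {
    const res = await fetch('/api/predict', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ features })
    });
    return res.json(); // e.g. { prediction: ... } - the model never leaves the server
}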
Keep in mind that this means a lot more work, as you not only have to write the code for your client-side application but also for your server-side application, including the API. Depending on what your application looks like (e.g. does it have a login?), this might be a lot more code.
Another way you can protect your model is to split it into more than one block. Put some blocks on the server side and some on the client side. This method may also require a lot of engineering work, but once it is done you can trade off computation load and network latency between the server and the client. Users can only obtain some of the model blocks, which are useless without the cooperating server-side blocks.
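As a rough sketch of that split-model idea (the model URL and the route are assumptions): the client runs the first layers locally and sends only the intermediate activations to the server, which finishes the computation:

import * as tf from '@tensorflow/tfjs';

async function predictSplit(input) {
    // Client-side "head" of the model (the first layers only)
    const head = await tf.loadLayersModel('/models/head/model.json');
    const intermediate = head.predict(tf.tensor([input]));

    // The server runs the remaining layers on the activations
    const res = await fetch('/api/predict-tail', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ activations: await intermediate.array() })
    });
    return res.json();
}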
I've been reading articles online about what universal JavaScript is, but I'm still not comfortable with the definition each site gives, which is "code that can run on the client and server." Does this mean that a Node.js app is inherently universal JavaScript because it will have JavaScript running on the client side and the server side? Or does universal JavaScript have to do with server-side rendering followed by client-side rendering?
Preface: I cannot find any highly authoritative (e.g. ECMA, Microsoft, Mozilla, or Google) source that provides a strict definition of "universal JavaScript" or "isomorphic JavaScript" - at most I've found a few blog posts (albeit by influential personalities) - so I can see why a newcomer might be confused.
It seems there are two definitions going around which are similar, but with crucial differences:
1. To refer to JavaScript which runs anywhere
This definition refers to JavaScript which does not take a dependency on any specific client-side or server-side API; instead it makes use only of features present in JavaScript's built-in library (String, Array, Date, Function, Math, etc.) or of other libraries that similarly restrict their dependencies (a transitive relation).
Remember that "JavaScript" does not mean that the DOM API, AJAX, HTML5 <canvas> (and so on) are available - it just means the JavaScript scripting language is being used - that's it. JavaScript has been available outside of web-browsers for over 20 years now (Windows support JavaScript as a shell-scripting language in cscript.exe/wscript.exe and ASP 3.0 supported server-side JScript as an alternative to VBScript - and the .NET Framework has "JScript.NET" too).
So in this case, if you wrote a library that adds some useful string functions, which only references String, then that script would work without issue in a Node.js server environment or an in-browser environment.
But if your script ever used the window object (only present in browsers) or express (a library only for Node) then it loses "universal" status because it cannot "run everywhere".
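For example, a tiny helper like the one below is universal in this first sense - it depends only on the built-in String and Array APIs, so it runs unchanged under Node.js or in a browser:

function titleCase(s) {
    return s.split(' ')
        .map(function (w) { return w.charAt(0).toUpperCase() + w.slice(1); })
        .join(' ');
}

titleCase('universal javascript'); // "Universal Javascript"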
2. To refer to JavaScript which renders the same HTML whether on the server or on the client
e.g. http://isomorphic.net/
This definition is actually a strict subset of the first definition: the same script must (by definition) run in both a server/Node.js context and a browser DOM context, and when it runs it generates content (typically HTML) that is then displayed in the user's browser. In doing so it must take a dependency on both a Node API and the W3C DOM, so it cannot strictly run "anywhere" - neither is available in a cscript.exe environment, for example.
Note: There is debate if use of XMLHttpRequest or fetch makes a script universal or not - as their presence is not guaranteed (as technically they're part of the DOM, not JavaScript's built-in library).
In this 2015 blog post ( https://medium.com/@ghengeveld/isomorphism-vs-universal-javascript-4b47fb481beb ) the author argues that only the term "isomorphic JavaScript" should be used to refer to rendering code that runs in both browser and server environments, while "universal JavaScript" should refer to truly portable, environment-agnostic JavaScript (i.e. my first definition).
Nowadays Single Page Applications have become very popular, but they have problems - SEO, for example.
So, how does an SPA work? JavaScript loads in the browser and loads data from an API. Most of the rendering is done on the client side. But search engine bots have a hard time indexing the page, because without JS it doesn't contain much.
Now, a Universal/Isomorphic app comes to the rescue. At the initial page load, the original page renders on the server. After that, the app works like an SPA. It gets better SEO because when a search engine bot asks for a page, the server returns the whole rendered HTML page, with content and meta tags.
Edit
An isomorphic app can be written with JavaScript (Node.js), PHP, or some other language, but if the app is written with Node.js, then we can call it universal, as both the backend and frontend are in JavaScript.
I'll try to explain it with examples, even though the other answers already seem accurate.
A basic example
Imagine you develop an SPA that renders a "Hello World" message. This means that your browser loads an HTML file with a <script> tag (or a reference to a JS file) that actually makes this happen. You can prove that "Hello World" is generated by JavaScript in the client browser, because if you deactivate JavaScript you won't see the message.
Now isolate the code that prints the string "Hello World"; it doesn't need much adaptation to work on the server side. In fact, the server just needs to send an HTML string that contains <h1>Hello World</h1> inside its body.
So what makes it universal/isomorphic? The fact that the code can understand which environment it is running in (the browser, the server, or possibly another environment) and keep functioning. Remember: the code usually runs in only one of the two environments; the point is that you wrote common code that can run in both (universal).
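A minimal sketch of that environment check (a common convention, not the only one):

// Shared, universal part: produces the markup in any environment
function renderGreeting() {
    return '<h1>Hello World</h1>';
}

var isBrowser = typeof window !== 'undefined' && typeof document !== 'undefined';

if (isBrowser) {
    document.body.innerHTML = renderGreeting(); // client: write to the live DOM
} else {
    // server: hand the string to the HTTP response instead, e.g.
    // res.send('<html><body>' + renderGreeting() + '</body></html>');
}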
The behavior of a more complex Universal App
Imagine that you set out to develop a new universal website. The code can recognize which environment it is running in and work just fine. So you have, let's say, 80% of your code shared - it doesn't even need to know the environment - and the rest of your code is there to manage the fact that your app can be used on the client or on the server.
How does this work?
The client first contacts the server, which returns HTML to the client with all the content of the page, rendered on the server. So the server renders the application. In the meantime, the browser downloads the script file that makes your single page work on the client. The client now renders the same page again. You won't see anything, because if it is done properly it will just be the same (of course, all the animations and real-time features have to work client-side, so you will eventually see your animations start).
When the user clicks an internal link, uses an interactive feature, or fills out and submits a form, the client-side code is in use. The server doesn't get any request, especially assuming that all the interactions are abstracted behind an API that is not part of our isomorphic app.
If the user goes crazy and wants to deactivate JavaScript, how do you ensure that, for example, forms still work? Here is a trick you can use:
<form
    method="post"
    action="/api/fakeBackendRoute"
    onSubmit={this.handleSubmit}
>
    [input fields here]
</form>
When the client JS is available, handleSubmit is executed and the form's default submission is prevented. This way the server-side route will never fire.
If the client JS is disabled, then handleSubmit will never be executed, and you have to make sure that your /api/fakeBackendRoute handles the data exactly as the client would.
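A sketch of the client-side half of that trick (the handler name and endpoint are the same placeholders as in the form above):

function handleSubmit(event) {
    event.preventDefault(); // stop the browser's own POST to /api/fakeBackendRoute
    var data = new FormData(event.target);
    // Send the same data through the client-side path instead
    fetch('/api/fakeBackendRoute', { method: 'POST', body: data });
}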
Why do people use it?
In my opinion, the difficulty of undertaking the development of a universal app is often underestimated. Good reasons to use one are:
Be more SEO friendly
Support very old browsers. For example, if you want to support IE8, you could do something like this:
<!--[if gt IE 8]><!-->
<script src="yourfile.js"></script>
<!--<![endif]-->
Be more accessible for people that don't want to use JavaScript
Other reasons could be:
Performance, if it matters to your application. You can improve your response time by, for example, using Node's ability to stream your HTML string in the first request, and later doing more on the client, where things will likely be faster. But you should decide whether it is faster to render on the client or on the server, depending on the content and how you create your assets.
If someone knows other good reasons, just comment below and I will add them.
Some good reference links:
https://medium.com/airbnb-engineering/isomorphic-javascript-the-future-of-web-apps-10882b7a2ebc
https://medium.com/front-end-developers/handcrafting-an-isomorphic-redux-application-with-love-40ada4468af4
https://github.com/xgrommx/awesome-redux
Okay, the title of this topic is really stupid, but I am not able to sum it up in a better way than that. So here is a more detailed version of my problem:
I am creating a small JavaScript library that enables developers to send strings to a dedicated server (the URL is defined in the library) on custom events. Let's say the library is called "testLib"; a developer that uses this library could write something like this:
function success() {
    testLib.send("Everything OK");
}
So every time this success function is called, a REST call (POST request) is made to the server that is defined inside the library. So far, that's no problem.
But the ugly thing is that everyone with Firebug or similar could call this "testLib.send()" method too. That's really ugly, because the whole point behind this library is to track only the events that the developer has defined. Of course, the server will take care of the basic validation (origin check, API key, ...), but still: one could start Firebug and just call the "testLib.send" method.
Is there any chance to build an authorization mechanism that prevents the "Firebug user" from sending REST calls via the predefined library methods?
Nothing practical.
The library runs on the client's computer. You have no control over that. They can edit the JS to their heart's content. They can bypass it entirely and send hand-crafted HTTP requests if they want (or write a quick script to bomb the server with requests).
Any real protection you implement has to be on the server.
Writing JavaScript is like writing open source. Firebug is but one of the tools that can get into your script, modify it on the fly, invoke methods, access variables, etc. In fact, you don't have to go that far: the JavaScript console in most browsers contains a quick eval input box. Because JavaScript is interpreted, anyone can get in and do as they wish.
You have two options which might make it a tad more difficult (though certainly NOT impossible to bypass):
1) Obfuscating and/or packing the script when you are done - though most obfuscators can easily be bypassed.
2) Having your methods check who called them - have a look at arguments.callee.caller for that. That said, this will run into problems in strict mode, which forbids arguments.callee.
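A sketch of option 2 (non-strict mode only, and easily defeated; shown purely to illustrate the idea):

function send(msg) {
    // Only allow calls that originate from our own wrapper function
    if (arguments.callee.caller !== success) {
        return; // silently drop everything else
    }
    // ... perform the actual POST here ...
}

function success() {
    send("Everything OK");
}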
Your best bet is to repeat any validation on the server side, as you say. If the server-side validation fails, that actually tells you something: someone deliberately bypassed your JavaScript, and you can deal with them accordingly.
Authenticating users
If your application authenticates the user when the page loads, then every request from the client side will come along with the authentication cookie, so you will basically be able to detect who the sender is.
Obfuscation and private closures
But if you'd like to prevent programmatic access to that particular function, then your best bet is a function closure, to make the function private and inaccessible, plus some code obfuscation that prevents people from simply rewriting the whole thing. One good obfuscator is the JavaScript packer with Base62 encoding enabled.
But this kind of thing will of course only obfuscate your library; publicly accessible functions will still be accessible.
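A sketch of the closure idea: the transport function lives inside an immediately-invoked function, so the console can reach only the surface you deliberately expose (which, as noted, remains callable by anyone):

var testLib = (function () {
    // Private - unreachable from the console or from other scripts
    function post(payload) {
        // ... XHR/fetch to the hard-coded server URL ...
    }
    return {
        // The public surface is still accessible to everyone
        send: function (msg) { post({ msg: msg }); }
    };
})();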
Preventing anonymous users
However, if you'd like to prevent anonymous users from sending stuff to your server, you can't really do that, but you can identify unauthorized requests by having your functions require some sort of registration key that your developers (real users) would have but anons wouldn't.
And maybe some other resources found on Google may help just as well; this only scratches the surface. XHR, for instance, allows users to send a username and password to authenticate the request, which may be exactly what you're after. But you know best, since you have the library design, not us.
No. Because JavaScript runs on the client side, there's nothing you can do to prevent someone from reading what the client is executing and executing it themselves. There are things you can do to obfuscate your calls, but this is security through obscurity and shouldn't be relied on. If you want to make sure that ONLY the developer is making calls to your API, the calls would need to be made on the server side.
A persistent follow-up of an admittedly similar question I had asked: What security restrictions should be implemented in allowing a user to upload a Javascript file that directs canvas animation?
I like to think I know JS decently well, and I see common characters in all the XSS examples I've come across, which I am somewhat familiar with. I am lacking good XSS examples that could bypass a securely sound, rationally programmed system. I want people to upload HTML5 canvas creations onto my site. Are there any sites like this yet? People seem to get scared about this all the time, but what if you just wanted to do it for fun, for yourself? If something happens to the server, then oh well - it's just an animation site, and information spreads around like wildfire anyway, so if anyone cares I'll tell them not to sign up.
If I allow a single textarea form field to act as an IDE, using JS, for my programming language written in JS, and do string replacement, filtering, and validation of the user's syntax before finally compiling it into JS to be echoed by PHP, how bad could it get for me to host that content? Please show me how you could bypass all of my combined considerations, taking the server side into account as well:
If JavaScript is disabled, preventing any POST from getting through, and keeping constant track of the user session.
Namespacing the Class, so they can only prefix their functions and methods with EXAMPLE.
Making instance
Storing my JS Framework in an external (immutable in the browser?) JS file, which needs to be at the top of the page for the single textarea field in the form to be accepted, as well as a server-generated key which must follow it. On the page that hosts the compiled user-uploaded canvas game/animation (1 per page ONLY), the server will verify the correct JS filename string before echoing the rest out.
No external script calls! String replacing on client and server.
Allowing ONLY alphanumeric characters, dashes and asterisks.
Removing alert, eval, window, XMLHttpRequest, prototyping, cookie, obvious stuff. No native JS reserved words or syntax.
Obfuscating and minifying another external JS file that helps to serve the IDE and recognize the programming language's uniquely named Canvas API methods.
When the window unloads, store the external JS code into two dynamically generated form fields to be checked by the server in POST. All the original code will be cataloged thoroughly in the DB for filtering purposes.
Strict variable naming conventions ('example-square1-lengthPROPERTY', 'example-circle-spinMETHOD')
Copy/paste disabled, with setInterval constantly checking whether it has been re-enabled by the user. If so, trigger a block in the database, change window.location immediately, and check the session ID through POST to confirm, in case JS becomes disabled in that timeframe.
I mean, can I do it then? How can one do harm if they can't use HEX or ASCII and stuff like that?
I think there are a few other options.
Good places to go for real-life XSS tests, by the way, are the XSS Cheat Sheet and the HTML5 Security Cheatsheet (newer). The problem, however, is that you want to allow JavaScript but disallow bad JavaScript. This is a different, and more complex, goal than the usual way of preventing XSS, which is preventing all scripts.
Hosting on a separate domain
I've seen this referred to as an "iframe jail".
The goal of XSS attacks is to be able to run code in the same context as your site - that is, on the same domain. This is because the code will be able to read and set cookies for that domain, initiate user actions, redress your design, redirect, and so forth.
If, however, you have two separate domains - one for your site, and another which only hosts the untrusted, user-uploaded content, then that content will be isolated from your main site. You could include it in an iframe, and yet it would have no access to the cookies from your site, no access to redress or alter the design or links outside its iframe, and no access to the scripting variables of your main window (since it is on a different domain).
It could, of course, set cookies as much as it likes, and even read back the ones that it set. But these would still be isolated from the cookies for your site: it would not be able to affect or read your main site's cookies. It could also include other code to annoy or harass the user, such as pop-up windows, or it could attempt to phish (you'd need to make it visually clear, in your out-of-iframe UI, that the content served is not part of your site). However, this is still sandboxed away from your main site, where your own critical assets (your session cookies and the integrity of your overarching page design and scripts) are preserved. It would carry no less, but no more, risk than any site on the internet that you could embed in an iframe.
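A minimal sketch of embedding such content (the user-content domain is a placeholder; the HTML5 sandbox attribute adds a second layer on top of the domain isolation):

var frame = document.createElement('iframe');
frame.src = 'https://usercontent.example.net/animations/123';
frame.sandbox = 'allow-scripts'; // scripts may run, but with no same-origin access
document.body.appendChild(frame);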
Using a subset of Javascript
Subsets of JavaScript have been proposed which provide compartmentalisation for scripts: the ability to load untrusted code without it being able to alter or access other code, unless you give it the scope to do so.
Look into things like Google Caja, whose aim is to enable exactly the type of service that you've described:
Caja allows websites to safely embed DHTML web applications from third parties, and enables rich interaction between the embedding page and the embedded applications. It uses an object-capability security model to allow for a wide range of flexible security policies, so that the containing page can effectively control the embedded applications' use of user data and to allow gadgets to prevent interference between gadgets' UI elements.
One issue here is that people submitting code would have to program against the Caja API. It's still valid JavaScript, but it won't have direct access to the browser DOM, as Caja's API mediates access. This would make it difficult for your users to port some existing code. There is also a compilation phase. Since JavaScript is not a secure language, there is no way to ensure code cannot access your DOM or other global variables without running it through a parser, so that's what Caja does: it compiles from JavaScript input to JavaScript output, enforcing its security model.
HTML Purifier consists of thousands of regular expressions that attempt to "purify" HTML into a safe subset that is immune to XSS. This project is bypassed every few months, because it isn't nearly complex enough to address the problem of XSS.
Do you understand the complexity of XSS?
Do you know that JavaScript can exist without letters or numbers?
Okay, the very first thing I would try is inserting a meta tag that changes the encoding to, I don't know, let's say UTF-7, which is rendered by IE. Within this UTF-7-encoded HTML there will be JavaScript. Did you think of that? Well, guess what: there are somewhere between a hundred thousand and a few million other vectors I didn't think of.
The XSS cheat sheet is so old my grandparents are immune to it. Here is a more up-to-date version.
(Oh, and by the way, you will be hacked, because what you are trying to do is fundamentally insecure.)
I understand the term sandbox. But my limited skills in JS is unable to help me understand what is sandboxing in JS. So, what actually is sandboxing? Apart from security, why do we need to sandbox JS?
The JavaScript sandbox does exactly what you've said: it limits the scope of what a script can do. There are also benefits in terms of virtualising the resources the script can call on. This allows the sandbox host to marshal those resources for better performance and, say, stop an endlessly looping script from bringing the whole browser crashing down.
Sandboxing is the act of creating a scope in which no other part of the application can operate (unless given an opportunity to). More specifically, this is usually a function scope that exposes a limited subset of what's actually going on within it.
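A framework-free sketch of that idea, using an immediately-invoked function as the sandbox scope:

var counterSandbox = (function () {
    var count = 0; // invisible outside this scope
    return {
        // Only these two operations are exposed
        increment: function () { count += 1; },
        value: function () { return count; }
    };
})();
// Other code can call counterSandbox.increment(), but cannot touch `count`.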
One library that's founded on the idea of sandboxes is YUI3. The basic unit of the application is a YUI instance sandbox:
var Y = YUI(); // creates a configurable YUI instance

// Creates a sandbox for one part of your application,
// including the 'node' module.
Y.use('node', function (Z) {
    // Z is a YUI instance that's specific to this sandbox.
    // Operations inside it are protected from outside code
    // unless exposed explicitly. Any modules you request in
    // the use statement will be separately instanced just for
    // this sandbox (in this case, the 'node' module).
    //
    // That way, if another part of your application decides
    // to delete Z.Node (or worse, replace it with a
    // malicious proxy of Z.Node), the code you've written
    // here won't be affected.
});
The advantages of sandboxes are primarily to reduce application complexity: since sandboxes are immutable, they're much easier to reason about and verify. They also improve runtime security, since a well-designed sandbox operates as a black box to other scripts running on the page. It does not protect against all possible attacks, but it defends against many of the simple ones.
Sandboxing creates a limited scope for the script to use. Assuming you're coding for a website, it's worth sandboxing to avoid making edits to a live site when you are uncertain whether they will work exactly as you expect - and it's impossible to be truly certain without testing. Even if it works properly, if there's a chance of you making a series of alterations to the JS until you've got it tweaked the way you like, you could easily disrupt anyone attempting to use the site while you're updating it.
It's also much easier to tell what's broken when you break things, because of the limited nature of the sandbox.