Secure, sandboxable user exposed programming language / environment?

Secure, sandboxable user exposed programming language / environment? - javascript

Beyond offering an API for my website, I'd like to offer users the ability to write simple scripts that would run on my servers . The scripts would have access to objects owned by the user and be able to manipulate, modify, and otherwise process their data.
I'd like to be able to limit resources taken by these scripts at a fine level (eg. max execution time should be 100ms). I'd also like to ensure a secure sandbox such that each user will have access to only a limited set of data and resources, and be prevented from accessing disk, other people's data, etc.
Generally the scripts will be very simple (eg. create the sum or average of the values that match certain criteria), and they'll often be used in templates (eg. fill in the value of this cell or html element with the average or sum).
Ideally I'd like to use a sandboxed subset of a well know, commonly available programming language so it's easy for users to pick up. The backend is written in Python, so a Python based language could have benefits, but I'm open to other languages and technologies. Javascript is also attractive due to its simple nature and common availability.
The languages should support creation of DSLs and libraries.
The target audience is a general user base for a web based application, not necessarily very technical. In other words, it's not targeted at a base with particular knowledge of any particular programming language. My expectation is a subset of users will create scripts that will be used by the larger majority.
Any ideas or recommendations for the language and technology? Any examples of others trying this and the successes and failures they encountered?

I use Lua for this, but it's directed at a Lua capable community. So my answer would be who are your users?
If your users are internal, like my case, and proficient with Python use Python. However if this is something for the world wide web, I'd probably choose javascript, because its the lingua franca, (every developer knows it, and its easy to pickup). As for an Engine... well V8 would be nice, but its not 100% thread safe, in that you can't run several engine within the same process in a lock free manner, as you can with SpiderMonkey. So You might want to use that. Also since javascript is sandboxed by default you won't have to worry about implementing much on your side.

Related

Is there any difference between making DOM on the server/client side? (speed perspective) [duplicate]

I've done some web-based projects, and most of the difficulties I've met with (questions, confusions) could be figured out with help. But I still have an important question, even after asking some experienced developers: When functionality can be implemented with both server-side code and client-side scripting (JavaScript), which one should be preferred?
A simple example:
To render a dynamic html page, I can format the page in server-side code (PHP, python) and use Ajax to fetch the formatted page and render it directly (more logic on server-side, less on client-side).
I can also use Ajax to fetch the data (not formatted, JSON) and use client-side scripting to format the page and render it with more processing (the server gets the data from a DB or other source, and returns it to the client with JSON or XML. More logic on client-side and less on server).
So how can I decide which one is better? Which one offers better performance? Why? Which one is more user-friendly?
With browsers' JS engines evolving, JS can be interpreted in less time, so should I prefer client-side scripting?
On the other hand, with hardware evolving, server performance is growing and the cost of sever-side logic will decrease, so should I prefer server-side scripting?
EDIT:
With the answers, I want to give a brief summary.
Pros of client-side logic:
Better user experience (faster).
Less network bandwidth (lower cost).
Increased scalability (reduced server load).
Pros of server-side logic:
Security issues.
Better availability and accessibility (mobile devices and old browsers).
Better SEO.
Easily expandable (can add more servers, but can't make the browser faster).
It seems that we need to balance these two approaches when facing a specific scenario. But how? What's the best practice?
I will use client-side logic except in the following conditions:
Security critical.
Special groups (JavaScript disabled, mobile devices, and others).

In many cases, I'm afraid the best answer is both.
As Ricebowl stated, never trust the client. However, I feel that it's almost always a problem if you do trust the client. If your application is worth writing, it's worth properly securing. If anyone can break it by writing their own client and passing data you don't expect, that's a bad thing. For that reason, you need to validate on the server.
Unfortunately if you validate everything on the server, that often leaves the user with a poor user experience. They may fill out a form only to find that a number of things they entered are incorrect. This may have worked for "Internet 1.0", but people's expectations are higher on today's Internet.
This potentially leaves you writing quite a bit of redundant code, and maintaining it in two or more places (some of the definitions such as maximum lengths also need to be maintained in the data tier). For reasonably large applications, I tend to solve this issue using code generation. Personally I use a UML modeling tool (Sparx System's Enterprise Architect) to model the "input rules" of the system, then make use of partial classes (I'm usually working in .NET) to code generate the validation logic. You can achieve a similar thing by coding your rules in a format such as XML and deriving a number of checks from that XML file (input length, input mask, etc.) on both the client and server tier.
Probably not what you wanted to hear, but if you want to do it right, you need to enforce rules on both tiers.

I tend to prefer server-side logic. My reasons are fairly simple:
I don't trust the client; this may or not be a true problem, but it's habitual
Server-side reduces the volume per transaction (though it does increase the number of transactions)
Server-side means that I can be fairly sure about what logic is taking place (I don't have to worry about the Javascript engine available to the client's browser)
There are probably more -and better- reasons, but these are the ones at the top of my mind right now. If I think of more I'll add them, or up-vote those that come up with them before I do.
Edited, valya comments that using client-side logic (using Ajax/JSON) allows for the (easier) creation of an API. This may well be true, but I can only half-agree (which is why I've not up-voted that answer yet).
My notion of server-side logic is to that which retrieves the data, and organises it; if I've got this right the logic is the 'controller' (C in MVC). And this is then passed to the 'view.' I tend to use the controller to get the data, and then the 'view' deals with presenting it to the user/client. So I don't see that client/server distinctions are necessarily relevant to the argument of creating an API, basically: horses for courses. :)
...also, as a hobbyist, I recognise that I may have a slightly twisted usage of MVC, so I'm willing to stand corrected on that point. But I still keep the presentation separate from the logic. And that separation is the plus point so far as APIs go.

I generally implement as much as reasonable client-side. The only exceptions that would make me go server-side would be to resolve the following:
Trust issues
Anyone is capable of debugging JavaScript and reading password's, etc. No-brainer here.
Performance issues
JavaScript engines are evolving fast so this is becoming less of an issue, but we're still in an IE-dominated world, so things will slow down when you deal with large sets of data.
Language issues
JavaScript is weakly-typed language and it makes a lot of assumptions of your code. This can cause you to employ spooky workarounds in order to get things working the way they should on certain browsers. I avoid this type of thing like the plague.
From your question, it sounds like you're simply trying to load values into a form. Barring any of the issues above, you have 3 options:
Pure client-side
The disadvantage is that your users' loading time would double (one load for the blank form, another load for the data). However, subsequent updates to the form would not require a refresh of the page. Users will like this if there will be a lot of data fetching from the server loading into the same form.
Pure server-side
The advantage is that your page would load with the data. However, subsequent updates to the data would require refreshes to all/significant portions of the page.
Server-client hybrid
You would have the best of both worlds, however you would need to create two data extraction points, causing your code to bloat slightly.
There are trade-offs with each option so you will have to weigh them and decide which one offers you the most benefit.

One consideration I have not heard mentioned was network bandwidth. To give a specific example, an app I was involved with was all done server-side and resulted in 200Mb web page being sent to the client (it was impossible to do less without major major re-design of a bunch of apps); resulting in 2-5 minute page load time.
When we re-implemented this by sending the JSON-encoded data from the server and have local JS generate the page, the main benefit was that the data sent shrunk to 20Mb, resulting in:
HTTP response size: 200Mb+ => 20Mb+ (with corresponding bandwidth savings!)
Time to load the page: 2-5mins => 20 secs (10-15 of which are taken up by DB query that was optimized to hell an further).
IE process size: 200MB+ => 80MB+
Mind you, the last 2 points were mainly due to the fact that server side had to use crappy tables-within-tables tree implementation, whereas going to client side allowed us to redesign the view layer to use much more lightweight page. But my main point was network bandwidth savings.

I'd like to give my two cents on this subject.
I'm generally in favor of the server-side approach, and here is why.
More SEO friendly. Google cannot execute Javascript, therefor all that content will be invisible to search engines
Performance is more controllable. User experience is always variable with SOA due to the fact that you're relying almost entirely on the users browser and machine to render things. Even though your server might be performing well, a user with a slow machine will think your site is the culprit.
Arguably, the server-side approach is more easily maintained and readable.
I've written several systems using both approaches, and in my experience, server-side is the way. However, that's not to say I don't use AJAX. All of the modern systems I've built incorporate both components.
Hope this helps.

I built a RESTful web application where all CRUD functionalities are available in the absence of JavaScript, in other words, all AJAX effects are strictly progressive enhancements.
I believe with enough dedication, most web applications can be designed this way, thus eroding many of the server logic vs client logic "differences", such as security, expandability, raised in your question because in both cases, the request is routed to the same controller, of which the business logic is all the same until the last mile, where JSON/XML, instead of the full page HTML, is returned for those XHR.
Only in few cases where the AJAXified application is so vastly more advanced than its static counterpart, GMail being the best example coming to my mind, then one needs to create two versions and separate them completely (Kudos to Google!).

I know this post is old, but I wanted to comment.
In my experience, the best approach is using a combination of client-side and server-side. Yes, Angular JS and similar frameworks are popular now and they've made it easier to develop web applications that are light weight, have improved performance, and work on most web servers. BUT, the major requirement in enterprise applications is displaying report data which can encompass 500+ records on one page. With pages that return large lists of data, Users often want functionality that will make this huge list easy to filter, search, and perform other interactive features. Because IE 11 and earlier IE browsers are are the "browser of choice"at most companies, you have to be aware that these browsers still have compatibility issues using modern JavaScript, HTML5, and CSS3. Often, the requirement is to make a site or application compatible on all browsers. This requires adding shivs or using prototypes which, with the code included to create a client-side application, adds to page load on the browser.
All of this will reduce performance and can cause the dreaded IE error "A script on this page is causing Internet Explorer to run slowly" forcing the User to choose if they want to continue running the script or not...creating bad User experiences.
Determine the complexity of the application and what the user wants now and could want in the future based on their preferences in their existing applications. If this is a simple site or app with little-to-medium data, use JavaScript Framework. But, if they want to incorporate accessibility; SEO; or need to display large amounts of data, use server-side code to render data and client-side code sparingly. In both cases, use a tool like Fiddler or Chrome Developer tools to check page load and response times and use best practices to optimize code.
Checkout MVC apps developed with ASP.NET Core.

At this stage the client side technology is leading the way, with the advent of many client side libraries like Backbone, Knockout, Spine and then with addition of client side templates like JSrender , mustache etc, client side development has become much easy.
so, If my requirement is to go for interactive app, I will surely go for client side.
In case you have more static html content then yes go for server side.
I did some experiments using both, I must say Server side is comparatively easier to implement then client side.
As far as performance is concerned. Read this you will understand server side performance scores.
http://engineering.twitter.com/2012/05/improving-performance-on-twittercom.html

I think the second variant is better. For example, If you implement something like 'skins' later, you will thank yourself for not formatting html on server :)
It also keeps a difference between view and controller. Ajax data is often produced by controller, so let it just return data, not html.
If you're going to create an API later, you'll need to make a very few changes in your code
Also, 'Naked' data is more cachable than HTML, i think. For example, if you add some style to links, you'll need to reformat all html.. or add one line to your js. And it isn't as big as html (in bytes).
But If many heavy scripts are needed to format data, It isn't to cool ask users' browsers to format it.

As long as you don't need to send a lot of data to the client to allow it to do the work, client side will give you a more scalable system, as you are distrubuting the load to the clients rather than hammering your server to do everything.
On the flip side, if you need to process a lot of data to produce a tiny amount of html to send to the client, or if optimisations can be made to use the server's work to support many clients at once (e.g. process the data once and send the resulting html to all the clients), then it may be more efficient use of resources to do the work on ther server.

If you do it in Ajax :
You'll have to consider accessibility issues (search about web accessibility in google) for disabled people, but also for old browsers, those who doesn't have JavaScript, bots (like google bot), etc.
You'll have to flirt with "progressive enhancement" wich is not simple to do if you never worked a lot with JavaScript. In short, you'll have to make your app work with old browsers and those that doesn't have JavaScript (some mobile for example) or if it's disable.
But if time and money is not an issue, I'd go with progressive enhancement.
But also consider the "Back button". I hate it when I'm browsing a 100% AJAX website that renders your back button useless.
Good luck!

2018 answer, with the existence of Node.js
Since Node.js allows you to deploy Javascript logic on the server, you can now re-use the validation on both server and client side.
Make sure you setup or restructure the data so that you can re-use the validation without changing any code.

Can I whitelist libraries available to user-provided Javascript?

To learn node.js, I am writing a web site that allows users to play the online game Mafia. For those unfamiliar with Mafia, it is a game most commonly played on forums, and pits an uninformed majority (the "Town") against an informed minority (the "Mafia"). However, although this is an accurate brief overview, in fact every game session can exhibit widely varying house rules that can dramatically change the game mechanics.
I want my website to be able to handle all of these variations. At first I planned for my website to implement a comprehensive framework that could run all Mafia variants itself. However, after going over a ton of rule sets for finished games archived on several different forums, I realized that the space of reasonable rules and gameplay mechanics is so huge that I would essentially have to create a new domain-specific programming language to allow all possible variants. Inventing a new language for a an otherwise straightforward personal project is rather silly and not something I'm interested in at the moment, especially given I have a perfectly good language at hand, namely JavaScript.
Therefore, I decided to let variant authors to upload a JavaScript file containing the variant code that my website will call into at the appropriate points. Essentially, JavaScript modules implementing Mafia variant game logic (which my website code will require()) will act as a scripting language to my web site's "game engine". Think Lua for C++ games. Unfortunately, this introduces a severe security problem. Unlike in the browser, node-run JavaScript has access to the file system, the network, etc. etc. So it would be trivial for a malicious user to upload a variant file that deletes the contents of my hard drive, or starts Bitcoin mining, or whatever.
My first thought was to do a replace() over each user's uploaded code for dangerous libraries such as 'fs' and 'http' into invalid strings, and catch the consequent exceptions when I try to load the file. However, this ad-hoc blacklisting technique feels like the kind of approach that one of the many people smarter/more knowledgeable that me will be able to overcome in a heartbeat. What I really need is a way to whitelist the libraries that user-uploaded code has access to. Is there a way to do this using JavaScript in node.js? If not, how would you recommend I secure the computer my node server will be running on as much as possible?
My current strategy is to require myself and a small number of trusted users to review and then vote unanimously in favor of user-uploaded JavaScript variant code before it is brought into the system, but I'm hoping there is a more automatic way of doing it.

You need to use vm module for this. Basically it allows to run scripts in customized contexts, so you can put whatever globals you want, define your own require etc.
You should also remember that in node.js it's possible to harm your app without any libraries — a user can simply add something like while (true) {} which will stop the whole process. So you need to run all untrusted code in separate processes and be ready to kill them, when they start to abuse cpu or memory.

How do you mitigate risk when introducing multivariate testing into a dynamic web application?

My company is engaging with a multivariate testing vendor (I can't disclose which one just yet), and I am being asked to integrate their system into our flagship B2C commerce site.
Usually, the word "integrate" is a heavy-duty term. However, here it means to add a <script> tag on several views, and then be out of the loop from that point forward. The mulitvariate experiments will be set up by our marketing department (and/or the vendor). Our development team will not be involved in that, and may not even be aware when it's happening.
Essentially, I am being told to deliberately add a JavaScript-injection attack vector to our flagship application... and hope for the best. My concerns are not so much with malicious code as they are with inadvertent screw-ups. Our CSS and basic page layout is a mess already, and I anticipate catiching the heat when someone else's multivariate experiment blows up the home page look-n-feel, etc. Much more importantly, there is a lot of ugly AJAX and server-side cruft that depends on predictable HTML elements, id attributes, etc... and so if an experiment changes HTML elements too much, core commerce functionality will simply break altogether in unpredictable ways.
My concerns are amplified by the lack of a real test environment. The vendor does filtering, so that only AJAX calls from our production URL are processed. Calls from our Dev or QA environments are ignored. However, that's not the same thing as having a test environment. Rather, it means that production is now our test environment for each experiment.
Given all of the above, are there any practices available to mitigate some of these risks? Multivariate testing is such a new field, the first several pages of a Google search consists of shady consultants selling buzzwords to CIO's. There's not much written for the in-house technical audience ultimately responsible for keeping an eye on risk. Are there any ways to properly QA under the above restraints, or is there simply no alternative but to push back(*) in such a scenario?
(*) Note: Given the politics of the vendor relationship, we would need to build an extraordinarily airtight case for push-back, and it may be ignored anyway.

You need to remember that multi-variate testing is not testing in the sense of what you think (or, it shouldn't be). You are not supposed to be testing the code or content. You are testing visitor reactions to that content.
Meaning... the content to be tested by your vendor should still pass through some quality assurance process before being deployed in their platform. This should suggest, then, that the best way to mitigate the risks you are concerned about is to get yourself injected into their testing/pre-release cycle to ensure the security and stability of their code before it goes out. You don't need to (and don't want to) belabor their efforts with full regression tests or anything like that. But you can take the initiative to set up a highly-specific set of tests for the dynamic content to pass before it goes to prod.
If the vendor is worth their weight in [insert whatever material here] then they should already have accounted for this in their processes, or at least be able to accommodate such a request. It's not too much to ask that you get some birds-eye sign-off before any intrusion-risk code is injected.
Or... get a waiver signed relieving you of any security or performance responsibility during the vendor's exercises. :)

Advantages of All-Javascript Webpages

I have noticed that many big(huge) sites like Google and Facebook when looking to the page source 99% of the source is JavaScript.
Does anybody know the advantages to this approach versus regular HTML+JavaScript pages?
Is it just to add some security or does it have benefits in terms of performance or maintainability?

One reason why I have implemented pages in this pattern is because I wanted to have a client-agnostic server that just serves data packaged in an easily-parseable format (such as JSON) so that the same server could be used to drive a traditional webapp as well as other things such as native Android and iPhone applications without needing any special modifications to server code.
A JavaScript-heavy page allows you to work with such a setup by having the JavaScript request the required data from the server and then use it to construct an interface in HTML. Given that most of the major players have similar concerns with wanting a single server architecture to power an application across a large number of platforms, that may be a contributing factor with respect to why they have chosen to implement their webpages primarily in JavaScript.

Advantages of All-Javascript Webpages
There are many disadvantages, one being that accessibility is destroyed. Another is that you very often end up completely rewriting the client UI, which then leads to cross-browser issues and clunky performance since native browser functions are replaced with DOM equivalents.
Try any of those sites with an older or non-mainstream browser and older PC. You probably won't like the experience.
Lastly, search bots won't index your site unless they are clever enough to understand the script and data - I don't think many do.

None of those in my opinion. It's just more interactive and easier to do if you want to show differently on different screen sizes and so on.
And some perfomances for some parts (I saw a thing from msdn where they were storing chunks on localstorage with JavaScript and therefore incredibly decreasing the number of http requests but it means droping browsers without JS).

I would definitely not recommend writing pages predominantly with javascript 'manually'. I think the most likely reason their pages look like this is that they are using libraries such as JSF etc. that generate javascript dynamically for them. Javascript does have the performance benefit of running directly on the client instead of needing to request the server to do the work (which implies a round trip to the server) although this is usually limited to trivial tasks that do not require access to server resources. As for maintainability though, I would say a page written with too much 'manual' javascript would be harder to maintain. For better maintainability you should include javascript in a separate js file.

Server Side Javascript: Why?

Is the use of server side javascript prevalent? Why would one use it as opposed the any other server side scripting? Is there a specific use case(s) that makes it better than other server side languages?
Also, confused on how to get started experimenting with it, I'm on freeBSD, what would I need installed in order to run server side javascript?

It goes like this:
Servers are expensive, but users will give you processing time in their browsers for free. Therefore, server-side code is relatively expensive compared to client-side code on any site big enough to need to run more than one server. However, there are some things you can't leave to the client, like data validation and retrieval. You'd like to do them on the client, because it means faster response times for the users and less server infrastructure for yourself, but security and accessibility concerns mean server-side code is required.
What typically happens is you do both. You write server-side logic because you have to, but you also write the same logic in javascript in hopes of providing faster responses to the user and saving your servers a little extra work in some situations. This is especially effective for validation code; a failed validation check in a browser can save an entire http request/response pair on the server.
Since we're all (mostly) programmers here we should immediately spot the new problem. There's not only the extra work involved in developing two sets of the same logic, but also the work involved in maintaining it, the inevitable bugs resulting from platforms don't match up well, and the bugs introduced as the implementations drift apart over time.
Enter server-side javascript. The idea is you can write code once, so the same code runs on both server and client. This would appear to solve most of the issue: you get the full set of both server and client logic done all at once, there's no drifting, and no double maintenance. It's also nice when your developers only need to know one language for both server and client work.
Unfortunately, in the real world it doesn't work out so well. The problem is four-fold:
The server view of a page is still very different from the client view of a page. The server needs to be able to do things like talk directly to a database that just shouldn't be done from the browser. The browser needs to do things like manipulate a DOM that doesn't match up with the server.
You don't control the javascript engine of the client, meaning there will still be important language differences between your server code and your client code.
The database is normally a bigger bottleneck than the web server, so savings and performance benefit ends up less than expected.
While just about everyone knows a little javascript, not many developers really know and understand javascript well.
These aren't completely unassailable technical problems: you constrain the server-supported language to a sub-set of javascript that's well supported across most browsers, provide an IDE that knows this subset and the server-side extensions, make some rules about page structure to minimize DOM issues, and provide some boiler-plate javascript for inclusion on the client to make the platform a little nicer to use. The result is something like Aptana Studio/Jaxer, or more recently Node.js, which can be pretty nice.
But not perfect. In my opinion, there are just too many pitfalls and little compatibility issues to make this really shine. Ultimately, additional servers are still cheap compared to developer time, and most programmers are able to be much more productive using something other than javascript.
What I'd really like to see is partial server-side javascript. When a page is requested or a form submitted the server platform does request validation in javascript, perhaps as a plugin to the web server that's completely independent from the rest of it, but the response is built using the platform of your choice.

I think a really cool use of server-side Javascript that isn't used nearly often enough is for data validation. With it, you can write one javascript file to validate a form, check it on the client side, then check it again on the server side because we shouldn't trust anything on the client side. It lets you keep your validation rules DRY. Quite handy.
Also see:
Will server-side Javascript take off? Which implementation is most stable?
When and how do you use server side JavaScript?

Javascript is just a language. Because it is just a language, you can use it anywhere you want... in your browser, on the server, embedded in other applications, stand-alone applications, etc.
That being said, I don't know that there is a lot of new development happening with "Server-Side Javascript"

Javascript is a perfectly good language with a self / scheme prototype style base and a C style syntax. There are some problems, see Javascript the Good Parts, but in general it's a first rate language. The problem is that most javascript programmers are terrible programmers because it's very accessible to get started.
One team at google built out Rhino on Rails, which is an MVC framework like Ruby on Rails which is written in javascript and runs on Rhino a javascript interpreter for the Java VM. In this case they had a requirement to use the Java VM, but wanted to get a language which was fast (javascript is fast), supported duck typing, and was flexible.
Another example is something like CouchDB, a document oriented database which uses json as it's transport format and javascript as it's query & index language. They wanted the database to be as web native as possible.
Javascript is good at string and dom (xml) manipulation, being sandboxed, networking, extending itself, etc... Those kind of features are the thing you often do when developing web applications.
All that said, i don't actually develop server side javascript. It's not a bad idea, but definitely less common.

We use javascript on the client because it is there, not because from a list of languages it was our choice. I wouldn't choose it for any chore on the server.
You can run any language you like on the server, in fact, as many as you like.
javascript is reliable and easy to use, but it is just too labor intensive for common tasks on the server.

I have used both Javascript (NodeJS) and compiled languages (such as Java or C#.NET). There are huge discussions on the internet about which is preferable. This question is very old (2009). Since then the Javascript world has changed considerably, whereas honestly the Java world has not changed much (relatively).
There have been huge advancements in the Javascript world with Typescript, amazing frameworks (such as Next.JS), reactive programming, functional programming, GraphQL, SSR etc.
When I look at compiled code, especially Java code it all still seems to be the same old tools - Spring (maybe SpringBoot) and Jackson. .NET has advanced server side, but not to the extent of the JS world.
Sure my list there can be used with several languages, but I believe Javascript has improved the software engineering world considerably.
Server side development with Javascript, Typescript and NodeJs is engaging, fun and productive. Use it and enjoy it. Like millions are today.

We Keep Coding

JavaScript is the programming language of the Web.