Can eval() be made safe by pre-parsing the passed expression?

I understand at a high level why one would not want to allow arbitrary code to execute in a web browser via the JS eval() function.
But I wonder if there are any practical approaches to preventing attacks by parsing the code that is passed to eval() to check that it is safe. For example:
disallowing any flow control functions, e.g. for, while. (Should stop infinite loops)
disallowing any variable names / function calls that don't match a whitelist. (Should stop any access to the DOM, built-in APIs, or malicious functions)
If you don't think this can be done safely, could you describe the predicted pitfalls? It's valuable to me if somebody says "this isn't practical because X" rather than just some blanket statement. Trust me - if I can't convince myself with certainty that it can be done safely, I won't do it.
I know that I can write my own expression evaluator or use a 3rd-party library that does the same. And I may do that. But I remain interested in using eval() because of certain advantages: native implementation performance and language consistency.

Yes, in general this is possible - basically you develop your own programming language that you know does only safe operations, you write your own parser for it, you write your own interpreter for it, and then you optimise that interpreter into a compiler targeting JavaScript that runs the result through eval.
However, using JavaScript as the base language and then stripping away unsafe parts, or even whitelisting some things, is not a good approach. The "whitelisting" would need to be sophisticated enough that starting to develop your own language is generally simpler. The two example restrictions you've presented in your question fail to reach their goals: to avoid infinite execution you also need to prevent recursion, and to prevent access to builtins you more or less also need to prevent dynamic property access. A lot of work has been done to define a proven "safe" subset of ECMAScript that one could fearlessly evaluate, and believe me, it is far from trivial.
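To make the two pitfalls concrete, here is a minimal sketch. The checker naiveIsSafe and its whitelist are hypothetical stand-ins for the kind of pre-parsing filter the question describes, not a recommended design. Both probes pass the filter, yet one recurses without bound and the other reaches the Function constructor through property access on an array literal.

```javascript
// Hypothetical naive filter: rejects loop keywords and any identifier that
// is not on a whitelist. String literals are skipped because they look
// like harmless data.
const WHITELIST = new Set(["x", "y"]);

function naiveIsSafe(src) {
  if (/\b(for|while|do)\b/.test(src)) return false;
  const withoutStrings = src.replace(/"[^"]*"|'[^']*'/g, '""');
  const identifiers = withoutStrings.match(/[A-Za-z_$][\w$]*/g) || [];
  return identifiers.every((name) => WHITELIST.has(name));
}

// Pitfall 1: unbounded recursion without any loop keyword.
const recursionProbe = "(x => x(x))(x => x(x))";

// Pitfall 2: dynamic property access. [] -> Array -> Function constructor,
// which compiles arbitrary source text.
const escapeProbe = '[]["constructor"]["constructor"]("return 1 + 1")()';

console.log(naiveIsSafe(recursionProbe)); // true
console.log(naiveIsSafe(escapeProbe));    // true
console.log(eval(escapeProbe));           // 2
```

Closing these two holes means banning function definitions that can call themselves and banning computed member access, at which point very little of JavaScript is left.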
So no, this is not a practical approach.

Related

Is `Object.freeze(Object.prototype)` only the hazard for extending `Object.prototype` with Symbols?

Fundamentally, this question is not opinion-based. I am seriously pursuing this issue objectively, setting aside the feelings that mostly arise from the predominant opinion: Why is extending native objects a bad practice?
This question is related to the following unanswered questions:
If Object.prototype is extended with Symbol property, is it possible to break code without specifically designed APIs to access Symbol in JavaScript?
Should the extension of built-in Javascript prototypes through symbols also be avoided?
The first question is already closed as opinion-based, and as you might know, once a question is closed in this community, moderators will not bother to reopen it however much we modify it. That is how things work here.
The second question, for some unknown reason, has been taken more seriously and was not considered opinion-based, although the context is identical.
There are two parts to the "don't modify something you don't own" rule:
You can cause name collisions and you can break their code.
By touching something you don't own, you may accidentally overwrite
something used by some other library. This will break their code in
unexpected ways.
You can create tight dependencies and they can break your code.
By binding your code so tightly to some other object, if they make
some significant change (like removing or renaming the class, for
example), your code might suddenly break.
Using symbols will avoid #1, but you still run into #2. Tight dependencies between classes like that are generally discouraged. If the other class is ever frozen, your code will still break. The answers on this question still apply, just for slightly different reasons.
Also, I've read opinions (how can we discuss such a thing here without an "opinion" base?) that claim:
a) Library code using Symbols exists, and it may tweak the Symbol API (such as Object.getOwnPropertySymbols())
b) Extending an object with a Symbol property is not fundamentally different from extending it with a non-Symbol property.
The major rationale for not touching Object.prototype is #1; almost all the answers I saw claimed that, and there is nothing to discuss if Symbols are not used.
However, using Symbols will avoid #1, as they say, so most of the traditional wisdom no longer applies.
Then, as #2 says,
By binding your code so tightly to some other object, if they make some significant change (like removing or renaming the class, for example), your code might suddenly break.
Well, in principle, any fundamental API version upgrade can break any code. That well-known fact has nothing to do with this specific question, so #2 does not answer it.
The only part worth considering is that Object.freeze(Object.prototype) can be the remaining problem. However, this is essentially the same as someone else unexpectedly upgrading the basic API.
From the point of view of API users, not API providers, Object.prototype is expected not to be frozen.
If someone else touches the basic API and freezes it, it is they who broke the code. They upgraded the basic API without notice.
For instance, in Haskell, there are many language extensions. Probably they solve the collision issue well, and most importantly, they won't "freeze" the basic API because freezing the basic API would break their ecosystem.
Therefore, I observe that Object.freeze(Object.prototype) is an anti-pattern. It cannot be justified as a matter of course as a way to prevent extending Object.prototype with Symbols.
So here is my question. Given these observations, is it safe to say:
Provided that Object.freeze(Object.prototype) is not performed (which is an anti-pattern and detectable), it is safe to extend Object.prototype with Symbols?
If you don't think so, please provide a concrete example.
Is it safe? Yes, if you are aware of all the hazards that come with it, and either choose to ignore them, shrug them off, or invest effort to ensure that they don't occur (testing, clear documentation of compatibility requirements), then it is safe. Feel free to do it in your own code where you can guarantee these things.
Is it a good idea? Still no. Don't introduce this in code that other people will (have to) work with.
If someone else touches the basic API and freezes it, it is they who broke the code. Therefore, I observe that Object.freeze(Object.prototype) is an anti-pattern.
Not quite. If you both did something you shouldn't have done, you're both to blame - even if doing only one of these things gets away with working code. This is exactly what point #2 is about: don't couple your code tightly to global objects that are shared with others.
However, the difference between those things is that freezing the prototype is an established practice to harden an application against prototype pollution attacks and generally works well (except for one bit), whereas extending the prototype with your own methods is widely discouraged as a bad practice (as you already found out).
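As a sketch of how point #2 plays out once a shared prototype has been hardened, the snippet below uses a stand-in object (sharedProto, a hypothetical name) rather than the real Object.prototype so that it is safe to run. Object.defineProperty is used because it throws on a frozen target in both sloppy and strict mode, whereas a plain assignment fails silently in sloppy code.

```javascript
// Stand-in for a shared prototype that some other script has frozen.
const sharedProto = Object.freeze({});
const tag = Symbol("tag"); // a Symbol key avoids name collisions (#1)...

let extensionError = null;
try {
  // ...but the extension still fails once the prototype is frozen (#2).
  Object.defineProperty(sharedProto, tag, { value: () => "hello" });
} catch (err) {
  extensionError = err;
}

console.log(extensionError instanceof TypeError); // true
console.log(tag in sharedProto);                  // false
```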
In Haskell, there are many language extensions. Probably they solve the collision issue well, and most importantly, they won't "freeze" the basic API because freezing the basic API would break their ecosystem.
Haskell doesn't have any global, shared, mutable object, so the whole problem is a bit different. The only collision issue is between identifiers from "star-imported" modules, including the prelude from the base API. However, this is per module, not global, so it doesn't break composability as you can resolve the same identifier to different functions in separate modules.
Also yes, their base API is frozen and versioned, so they can evolve it without breaking old applications (who can continue using old dependencies and old compilers). This is a luxury that JavaScript doesn't have.
Is it safe to extend Object.prototype with a pipe symbol so that something[pipe](f) does f(something), like something |> f in F# or the previous proposal of pipe-operator?
No, it's not safe, not for arbitrary values of something. Some obvious values where this doesn't work are null and undefined.
However, it doesn't even work for all objects: there are objects that don't have Object.prototype on their prototype chain. One example is Object.create(null) (also done for security purposes); another is objects from other realms (e.g. iframes). This is also the reason why you shouldn't expect .toString() to work on all objects.
So for your pipe operator, better use a static standalone method, or just use a transpiler to get the syntax you actually want. An Object.prototype method is only a bad approximation.
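The following sketch illustrates the gap described above. The names pipe and pipeTo are taken from, or invented for, this example and are not an established API. It installs the symbol method, shows a value it cannot reach, compares it with a standalone function, and then removes the global modification again.

```javascript
const pipe = Symbol("pipe");

// Installed as non-enumerable, like the built-in prototype methods.
Object.defineProperty(Object.prototype, pipe, {
  value(f) {
    "use strict"; // keep primitive receivers unboxed
    return f(this);
  },
  writable: true,
  configurable: true,
  enumerable: false,
});

const double = (n) => n * 2;
const viaPrototype = (21)[pipe](double); // 42: numbers inherit via Number.prototype

// An object without Object.prototype on its chain never sees the method.
const bare = Object.create(null);
const bareHasPipe = typeof bare[pipe] === "function"; // false

// Standalone alternative (hypothetical name) that works for every value,
// including null and undefined.
const pipeTo = (value, f) => f(value);
const viaFunction = pipeTo(null, (v) => v === null); // true

delete Object.prototype[pipe]; // undo the global modification
```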
Extending the Object prototype is a dangerous practice.
You have obviously done some research and found that the community of javascript developers overwhelmingly considers it to be a very bad practice.
If you're working on a personal project all by yourself, and you think the rest of us are all cowards for being unwilling to take the risk, then by all means: go ahead and modify your Object prototype! Nobody can stop you. It's up to you whether you will be guided by our advice. Sometimes the conventional wisdom is wrong. (Spoiler: in this case, the conventional wisdom is right.)
But if you are working in a shared repository, especially in any kind of professional setting, do not modify the Object prototype. Whatever you want to accomplish by this technique, there will be alternative approaches that avoid the dangers of modifying the base prototypes.
The number one job of code is to be understood by other developers (including yourself in the future), not just to work. Even if you manage to make this work, it is counterintuitive, and nobody who comes after you will expect to find this. That makes it unacceptable by definition, because what matters here is reasonable expectations, NOT what the language supports. Any person who fails to recognize that professional software development is a team effort has no place writing software professionally.
You are not going to find a technical limitation to extending the Object prototype. Javascript is a very flexible language -- it will give you plenty of rope with which to hang yourself. That does not mean it's a good idea to place your head in the noose.

Implement a javascript sandbox using new Harmony direct proxies

I found this gist to implement a sandbox for 3rd-party code using with and the Harmony direct proxies. How useful is it? Would it be possible to implement a proper javascript sandbox using proxies? What are the quirks and / or downsides of this approach?
(I'm looking for a javascript-only solution in this question, so no Caja and similar server-side projects)
In principle, that approach should probably work. However, a couple of things to note:
Clearly, this requires putting all untrusted code into the with-scope. In practice, that might become rather unwieldy.
Moreover, it subtly changes the meaning of outermost var/function declarations contained in that code, which now become local instead of being properties on the global object. Undeclared variables, on the other hand, will still end up on the global object. This may break some programs.
Because of the insane semantics of 'with', modern JavaScript VMs give up most attempts to optimise code in its scope. Generated code can easily be two orders of magnitude slower for something that has a 'with'.
So overall, I wouldn't recommend this approach. You are far better off with SES or Caja (not sure in which sense you call those server-side).
(It's also worth noting that ES6's module loaders will provide a cleaner way to sandbox the global object. But it is hard to tell when those will become available. Not soon.)
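For readers who have not seen the technique, here is a minimal sketch of the with-plus-proxy idea, written against the standardised Proxy API rather than the gist's exact code. The helper name evalInScope is hypothetical. The proxy claims to contain every name, so free identifiers resolve against the allowed object instead of the real globals.

```javascript
// Hypothetical helper: evaluates an expression with `allowed` as its only
// visible scope. This is an illustration, not a hardened sandbox.
function evalInScope(expression, allowed) {
  const scope = new Proxy(allowed, {
    // Claim to have every name, so identifier lookups stop at the proxy
    // instead of falling through to the real global object.
    has: () => true,
    // Expose own properties only; everything else reads as undefined.
    get: (target, key) =>
      Object.prototype.hasOwnProperty.call(target, key) ? target[key] : undefined,
  });
  // The Function constructor produces sloppy-mode code, which `with` requires.
  const run = new Function("scope", "with (scope) { return (" + expression + "); }");
  return run(scope);
}

console.log(evalInScope("x + 1", { x: 41 }));      // 42
console.log(evalInScope("typeof globalThis", {})); // "undefined"

// Limit of this sketch: literals still carry their constructors, so code
// can get back to the global scope without using any free identifier.
const leak = '[]["constructor"]["constructor"]("return typeof globalThis")()';
console.log(evalInScope(leak, {}));                // "object"
```

The last line is why a scope proxy alone is not enough in this sketch: the built-in prototypes and constructors also have to be locked down, which is the part SES and Caja take care of.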

Resources on variable type prediction for Javascript?

Quoting from this academic paper, Syntactic and Semantic Prediction in Dynamic Languages:
IntelliSense is practically based on the knowledge of variables type.
Hence the symbols describing variables have to contain their type if
it's known and if it could be resolved. Moreover, in dynamic languages
one variable can carry different type instance in different parts of
the code and in different program executions. Also the variable can
be initiated dynamically, and its type can be known only at runtime.
Therefore the symbol should contain some list of possible types
resolved within semantic analysis. But in general it cannot be
resolved definitely in dynamic languages; even single program
execution won't help.
Where can I find more resources on this? I am especially interested in the emphasized statements in the block quote above, and possibly in some statistics on the prediction rate of static code analysis.
What this says is essentially the famous Turing incomputability result: in general, you can't know the answer to what a computation does (or generates as a type).
While this is true in general, it says nothing about specific circumstances. A little thought should convince you that if programmers didn't have some idea what the type of some identifier was, they probably couldn't write any code that used it. So the original coders know (unless they wrote buggy code). But they know by making assumptions about the rest of the system, and then enforcing those assumptions (often elsewhere).
A static analyzer doesn't know what assumptions the programmers made, so it can't be as precise. But in many specific circumstances a static analysis can infer the type. The question is, how much of the code does it have, and can it interpret that code using the deep semantics that make the languages behind the code?
I'm always bothered by the concept that "analyzers can't be (as) good (as humans)". If an analyzer has access to the same information the human has, it should be at least as good, and often better; it can keep track of interactions in ways people cannot. More importantly, if the static analyzer hasn't got access to key assumptions behind your code, how can you expect other programmers to work on that code? What a priori reason is there to think they know all the background and assumptions used in some block of code?
So I think much of the limitation of static analyzers, even applied to dynamic languages, is caused by our unwillingness or inability to write down the assumptions that we use when we write the code. (After that, there's the problem of the energy to engineer a suitably strong analyzer).
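A small sketch of the three situations discussed above; all function names are made up for illustration. In the first, the result type follows from the code alone. In the second, it depends on runtime data. In the third, the programmer's assumption is written down as a check, which makes it visible to an analyzer and to other programmers alike.

```javascript
// Inferable without running anything: Number() returns a number (or throws),
// and `*` on two numbers yields a number.
function area(width, height) {
  const w = Number(width);
  const h = Number(height);
  return w * h;
}

// Not inferable in general: the type depends on data seen only at runtime.
function parseSetting(text) {
  return JSON.parse(text);
}

// The assumption "this setting is a number" is stated explicitly.
function parsePort(text) {
  const value = JSON.parse(text);
  if (typeof value !== "number") {
    throw new TypeError("expected a JSON number");
  }
  return value; // past the check, the type is known to be number
}

console.log(typeof area("2", 3));         // "number"
console.log(typeof parseSetting("1"));    // "number"
console.log(typeof parseSetting('"a"'));  // "string"
console.log(parsePort("8080"));           // 8080
```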

In Browser Javascript Editor and Execution

I am developing an Enyo web application and would like to allow users to write their Javascript code in the browser and execute it.
I can do this by using window.eval. However, I have read about the evils of eval.
Could anyone shed some light on how sites like http://learn.knockoutjs.com/, http://jsfiddle.net, etc. do in-browser execution safely, and on what the best practices are?
Eval is considered evil for all but one specific case, which is your case of generating programs at runtime (or metaprogramming). The only alternative would be to write your own parser/interpreter (which can be done relatively easily in JavaScript, though more readily for a language simpler than JavaScript itself; I did it and it was fun). Thus using eval() here is legitimate (to build a browser-side compiler that produces reasonably fast code, you need to run the generated JavaScript through eval anyway).
However, the problem with eval is security, because evaluated code has the same privileges and access to its environment as the script that runs it. This has been quite a hot topic recently, and ECMAScript 5 was designed to partially address the issue by introducing strict mode, because strict-mode code can be statically analyzed for dangerous operations.
This is usually not enough (or is problematic for backward-compatibility reasons), so there are approaches like Caja that address security by analyzing the code on a server and allowing only a strict, safe subset of JavaScript to be used.
Another frequently used approach protects the user from accidents but not from malicious attacks: running the user-generated JavaScript in an <iframe> element embedded in the parent page (as done by sites like jsfiddle). But it is not secure, because an iframe served from the same origin can access its parent page and get to its content.
Even with this iframe approach there has been some recent progress, e.g. in Chrome, to make it less vulnerable by using the sandbox attribute:
<iframe src="sandboxedpage.html" sandbox="allow-scripts"></iframe>
where you can even specify different privileges.
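As a rough sketch of how an editor page might produce such a frame for code typed by the user, the helper below builds the markup as a string and places the user's script in the srcdoc attribute. The function names are hypothetical, and a real editor would also need a channel such as postMessage to report results back to the parent.

```javascript
// Escape text for use inside a double-quoted HTML attribute value.
function escapeAttribute(text) {
  return text.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: returns iframe markup that runs `userCode` sandboxed.
function buildSandboxedFrame(userCode) {
  // Keep the user's code from closing the <script> element early.
  const body = userCode.replace(/<\/script/gi, "<\\/script");
  const page = "<script>" + body + "</" + "script>";
  // Only allow-scripts is granted. Without allow-same-origin the frame runs
  // in an opaque origin and cannot read the parent's DOM, cookies or storage.
  return '<iframe sandbox="allow-scripts" srcdoc="' + escapeAttribute(page) + '"></iframe>';
}

console.log(buildSandboxedFrame('alert("hi")'));
// <iframe sandbox="allow-scripts" srcdoc="<script>alert(&quot;hi&quot;)</script>"></iframe>
```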
Hopefully, we will have an easy way to use safe and easy metaprogramming soon, but we are not there yet.

What's low level Javascript?

I've seen the term "Low level Javascript" come up a few times but I've no idea what it means. Google shows no results surprisingly. Can someone shed some light on it?
As of 2012, someone saying "low level JavaScript" could be referring to LLJS. It's a subset of JavaScript that compiles to JavaScript code that is garbage collector friendly but unreadable (making heavy usage of WebGL typed arrays to manage memory).
I would say it is "javascript without using cross-browser frameworks" such as jQuery or YUI.
Can be particularly tricky when it comes to supporting multiple browsers.
Sometimes people make up their own terms when they shouldn't. "Low level Javascript" is one of them. There's nothing "low level" about Javascript. It's interpreted at run-time inside an environment of high-level abstractions, like the DOM.
It's a very specialized tool that allows you to write CPU- and memory-optimized JavaScript code (explicit memory management, not GC), using binary data rather than standard JS objects and types.
Why? Because in some cases you need top performance.
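To give a feel for the style, here is a hand-written sketch of memory managed explicitly on top of a typed array. It is plain JavaScript, not LLJS syntax or actual LLJS compiler output, and the allocator is a deliberately simple bump pointer with made-up names.

```javascript
// One preallocated buffer acts as the "heap"; no per-point objects are
// created, so there is nothing for the garbage collector to trace.
const heap = new ArrayBuffer(1024);
const f64 = new Float64Array(heap); // view the bytes as 64-bit floats

let nextSlot = 0; // bump pointer, measured in Float64 slots

// "Allocate" a 2D point: reserve two slots and return the index of the first.
function allocPoint(x, y) {
  const ptr = nextSlot;
  if (ptr + 2 > f64.length) throw new RangeError("heap exhausted");
  f64[ptr] = x;
  f64[ptr + 1] = y;
  nextSlot += 2;
  return ptr;
}

function distanceSquared(a, b) {
  const dx = f64[a] - f64[b];
  const dy = f64[a + 1] - f64[b + 1];
  return dx * dx + dy * dy;
}

// "Free" everything at once by resetting the pointer.
function resetHeap() {
  nextSlot = 0;
}

const p = allocPoint(0, 0);
const q = allocPoint(3, 4);
console.log(distanceSquared(p, q)); // 25
```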
I suppose it means Javascript without any framework such as prototype/jQuery/YUI, which help with cross-browser compatibility, and generally provide a lot of useful functions, so you don't have to spend your time re-inventing the wheel.
Also, maybe it has something to do with the "new" way of doing Javascript, i.e. object-oriented, using frameworks, and so on, as opposed to the crappy code we used to see a couple of years ago.
low level JS is concise, precise code that executes efficiently, usually taking advantage of the language's intricacies
bitwise operation, type conversions / short circuit logical operators, prototype chaining, context binding, ternary assignation, event bubbling / propagation, object referencing, using the GPU, etc.
