My only requirement for executing this arbitrary js code is that certain global variables/functions (e.g. setInterval) are not exposed.
My current strategy involves parsing through the js code and making a var declaration (at the beginning of the enclosing closure) for every global reference.
I'm wondering if there's any other obvious way to solving this.
Also just to clarify, this arbitrary code is not being run with things like eval. Rather, it's being wrapped inside a closure and appended to the base code.
One of the options is to override the globals by supplying your own function or variables. Example:
window.alert = function() {
// your code goes here
// Optionally call window.alert if needed
}
This should be manageable if you have a small finite list of things you wish to hide. This will make them globally unavailable.
Related
Loading a module from a source dynamically:
var src="HERE GOES MY SOURCE"
var Module = module.constructor;
var m = new Module();
m._compile(src, 'a-path-that-does-not-exist');
Need to achieve following:
Pass some variables/functions so that they can be used inside the src script globally. Can set them in "m.foo", but want the script to use "foo" without using "module.foo". "global.foo" works, but see the point 2.
How to restrict the src script from accessing global scope?
How to restrict the src from loading other modules using require or other means.
How to restrict the src from running async operations?
All, I can think of is to wrap the script in its own function, kind of like nodejs already does for commonJS modules. This is the regular wrapper.
(function(exports, require, module, __filename, __dirname) {
// Module code actually lives in here
});
If you wrap that user code with your own wrapper and then when you call it to execute it, you can define your own values for require, module and any other semi-global symbols.
If you also put 'use strict'; as the very first line of your wrapper function (before any of the user code), then that will eliminate default assignment to the global object with just something like x = 4 because that will be an error without explicitly defining x first. If you then also make your own global object and pass it as an argument, that can keep anyone from assigning to the real global object. I don't think you can prevent implicit read access to pre-existing globals.
So, your wrapper could look like this:
(function(exports, require, module, __filename, __dirname, global) {
'use strict';
// insert user code here before evaluating it with eval()
// and getting the function which you can then call and pass the desired arguments
});
Then, when you call this function, you pass it the values you want to for all the arguments (something other than the real ones).
Note, it's hard to tell how leak-proof this type of scheme really is. Any real security should likely be run in a resource restricted VM.
Another idea, you could run in a Worker Thread which has it's own virgin set of globals. So, you do all of the above and run it in a Worker Thread.
Addressing your questions in comments:
Does the 'use strict'; need to go inside the wrapper function or outside?
It needs to be the first line of code inside the wrapper function, right before where you insert the user code. The idea is to force that function scope (where the user code lives) inside that wrapper to be in strict mode to limit some of the things it can do.
Could you explain the "I don't think you can prevent implicit read access to pre-existing globals."? If i provide my own object as global, how can the inner script access preexisting globals?
Any code, even strict mode code can access pre-existing globals without the global prefix. While you can prevent the code from creating new globals by shadowing it with your own global in the wrapper function arguments and by forcing it into strict mode, you can't prevent strict mode code from reading existing globals because they can do so without the global prefix. So, if there's a pre-existing global called "foo", then existing code can reference that like:
console.log(foo);
or
foo = 12;
If there is no foo in a closer scope, the interpreter will find the foo on the global object and use that.
Note that strict mode prevents the automatic creation of a new global with something like:
greeting = "happy birthday"
Could you elaborate more no "resource restricted VM"?
I was talking about real hardware/OS level VMs that allow you to fully control what resources a process may use (disk access, sockets, memory, hardware, etc...). It's essentially a virtual computer environment separate from any other VMs on the same system. This is a more rigorous level of control.
WorkerThread is a very interesting concept. will take a look! My understanding was that WorkerThreads provide memory isolation and the only way to share data is by sending messages (effectively creating copies)?
Yes, Worker Threads provide pretty good isolation as they start up a whole new JS engine and have their own globals. They can shared ArrayBuffers in certain ways (if you choose to do that), but normal JS variables are not accessible across thread boundaries. They would normally communicate via messaging (which is automatically synchronized through the event queue), but you could also communicate via sockets if you wanted.
In web environment think this code:
window.alert('foo'); //code1
alert('foo'); //code2
Why code2 is higher than code1 in performance? Why does this happened?
First, remember that many globals (including all of the traditional predefined globals) are properties of the global object, and that on browsers that global object is the window object. window isn't special, it's just a predefined global that refers to the global object. alert is also a predefined global.
window.alert(...) is slower than alert(...) because in window.alert(...) the JavaScript engine first has to:
Go look up the identifier window, which it ultimately finds as a property of the global object, then
Look up alert in that object's properties, then
Call the function it found
In alert(...), it only has to:
Go look up the identifier alert, which it ultimately finds as a property of the global object, then
Call the function it found
So instead of looking for two things, it only has to look for one.
Less work = faster performance.
Having said that, two observations:
The speed with which you show an alert is...not usually important. :-) alert does, after all, seize up the JavaScript engine until the user dismisses it.
Even with something that doesn't seize up the JavaScript engine, the difference between window.xyz(...) and xyz(...) will be trivially small in most cases. Typically you see window.xyz(...) for other reasons (such as some other xyz being in scope when you want to access the global one). (I'm not saying they're good reasons; typically you can avoid that, and of course one tries to avoid globals wherever possible anyway...)
If performance calling a global function matters (by definition, this means you're calling it a lot), you probably don't want to use the global reference to it anyway; grab a local reference to it in your inner scope (var f = theGlobalFunction;) and then use f(); to call it repeatedly. This avoids both lexical environment traversal and lookup on the global object. And again: Only matters in very rare cases; don't do it until/unless you've identified an observable performance problem and narrowed it down to the global function call. :-)
The time for that first lookup (the global) depends on how deeply nested the code is (and thus how many lexical environment objects need searching before the global one is reached) and how many properties the global object has. The second part, looking up the property, is only dependent on the second one. A truly hideously massive mass of globals (such as is created if you use ids on lots of elements, since those get added as automatic globals) can conceivably slow down global lookup time.
So this is puzzling to me. Are variables expensive?
Let's take a look at this code:
var div1 = document.createElement('div');
var div2 = document.createElement('div');
div2.appendChild(div1);
Now this one
var div2 = document.createElement('div');
div2.appendChild(document.createElement('div'));
So, on the second example, I'm doing the same thing as on the first one. Is it more expensive to declare a variable than not? Do I save memory by just creating and using an element on the fly?
EDIT: this is a sample code to illustrate the question. I know that in this specific case, the memory saved (or not) in minimal.
Variables are extremely cheap in JavaScript, yet, no matter how cheap they are, everything has a cost.
Having said that, it's not a noticable difference, and you should be thinking more about making your code readable rather than performing micro-optimizations.
You should be using variables to save repeated operations. In your example above, it's likely you need the variable pointing to the newly created div, or you'll be able to do nothing to it... unless you end up retrieving it from the DOM; which will prove many times more costly than the variable, as DOM manipulation is one of the slowest parts of JavaScript.
In theory, there is a tiny little amount of memory you save by not doing the variable declaration, because a variable declaration requires that the JavaScript engine creates a property on the variable binding object for the execution context in which you declare it. (See the specification, Section 10.5 and associated sections.)
In reality, you will never notice the difference, not even on a (relatively) slow engine like IE's. Feel free to use variables wherever they make the code clearer. Creating a property on a variable binding object (e.g., declaring a variable) is a very, very, very quick operation. (Avoid it at global scope, of course, but not because of memory use.)
Caveat: If the variable refers to a big memory structure that you only hold temporarily, and you create closures within that function context, the big memory structure may (in some engines) be kept in memory longer than necesary. Example:
function foo(element) {
var a = /* ...create REALLY BIG structure here */;
element.addEventListener("event", function() {
// Do something **not** referencing `a` and not doing an `eval`
}, false);
}
In an unsophisticated engine, since the event handler is a closure over the context of the call to foo, in theory a is available to the handler and so what it references remains "referencible" and not available for garbage collection until/unless that event handler function is no longer referencible. In any decent engine (like the V8 engine in Chrome), because you neither refer to a nor use eval within the closure (event handler), it's fine. If you want to defend yourself from mediocre engines, set a to undefined before foo returns, so the engine (no matter how mediocre) knows that the event handler won't be referencing that structure even if it were to reference a.
Yes, you save memory, no it is in no way an amount that matters.
JavaScript: The Good Parts defines these kinds of declarations as bad:
foo = value;
The book says "JavaScript’s policy of making forgotten variables global creates
bugs that can be very difficult to find."
What are some of the problems of these implied global variables other than the usual dangers of typical global variables?
As discussed in the comments on this answer, setting certain values can have unexpected consequences.
In Javascript, this is more likely because setting a global variable actually means setting a property of the window object. For instance:
function foo (input) {
top = 45;
return top * input;
}
foo(5);
This returns NaN because you can't set window.top and multiplying a window object doesn't work. Changing it to var top = 45 works.
Other values that you can't change include document. Furthermore, there are other global variables that, when set, do exciting things. For instance, setting window.status updates the browser's status bar value and window.location goes to a new location.
Finally, if you update some values, you may lose some functionality. If, for instance, you set window.frames to a string, for instance, you can't use window.frames[0] to access a frame.
Global variable make it very difficult to isolate your code, and to reuse it in new contexts.
Point #1:
If you have a Javascript object that relies on a global var. you will not be able to create several instances of this object within your app because each instance will change the value of the global thereby overwriting data the was previously written by the another instance.
(Unless of course this variable holds a value that is common to all instances - but more often than not you'll discover that such an assumption is wrong).
Point #2:
Globals make it hard to take existing pieces of code and reuse them in new apps. Suppose you have a set of functions defined in one file and you want to use them in some other file (in another app). So you extract them to a new file and have that file included in the new app. If these function rely on a global your 2nd app will fail at runtime because the global variable is not there. The dependency on globals is not visible in the code so forgetting these variables (when moving functions to new files) is a likely danger.
They're global variables, so yes, all of the "usual dangers" apply. The main thing that distinguishes them from global variables in other languages is that:
You don't explicitly declare them in a global scope. If you mistakenly omit var in a variable declaration, you've accidentally declared a global variable. JavaScript makes it a little too easy to unintentionally declare global variables; contrast this with Scheme, which raises an error if a variable is not defined in the global scope.
Global variables, at least in the browser, are aliased by window[variable_name]. This is potentially worrisome. For example, some of your code might access window['foo'] (with the intention of accessing a global variable). Then, if you accidentally type foo instead of var foo elsewhere in the program, you have declared a reference to window['foo'], which you meant to keep separate.
One issue is that you may be trampling on already defined variables and not know it, causing weird side effects in other parts of the code that can be a bear to track down.
Another is that it is just sloppy code. You should not be creating variable with more scope than they need since at the very least it keeps more variables in memory and at worst it can create data scenarios you didn't intend.
The bottom line is that when you do that you don't know for sure that you are not messing up other functions that use a global variable of the same name. Sometimes it isn't even your fault, a lazy programmer of another plugin left something global that was meant to have scope inside of a function. So it is a very practical safeguard for writing better and less buggy code.
The problems of typical global variables is that they are, well, global - there is no scope to enclose them, and any code that you are executing / interacting with (such as a library that you call down the road) could modify the variable without a warning.
However, these problems are compounded in Javascript by two things:
You can define a global variable anywhere - the only requirement for that is to forget the var keyword.
It is extremely easy to define a global variable when you had no intent to do so. That is the problem that "implied" globals have over "typical" globals - you will create them without even knowing you did.
In a reasonably-designed language that includes truly global variables (ok, so not that reasonably-designed), you would have a limited handful of places to define globals, and it would require a special keyword to do so.
This is perhaps a dumb question, but I am new to Javascript and desperate for help.
If the Javascript engine will look for global variables outside of a function, then what is the point of passing parameters to it? What do you gain?
I understand that global variables are generally frowned upon, but I still don't understand the purpose of passing variables. Does it have something to do with data encapsulation?
There's a few magic words that are used by programmers to describe different kinds of functions. Here's a few:
Re-entrant
ThreadSafe
Referentially Transparent
Idempotent
Pure
Side-Effects
You can look some of them up if you want a headache. The point is that Computer science and engineering progress has always been about reducing complexity. We have spent quite a lot of time thinking about the best way to write a function to achieve that goal. Hopefully, you can stuff tiny bits of your program into your head at a time, and understand those bits, without having to also understand the overall functioning of the entire program simultaneously, or the detailed implementation of the insides of all the other functions. A function that uses global variables can't do that very well because:
You can't guarantee that the global variables exist
You can't guarantee that the global variables are what you think they are
You can't guarantee that other parts of the program haven't modified those variables in a way you didn't expect.
You can't easily generalise to use the function multiple times on multiple sets of variables.
You can't easily verify that the function works as advertised without first setting up the function's external environment and its dependencies.
If the global variables have changed in a way you didn't expect, it's really hard to track down which part of the program is the culprit. It could be any of 500 different functions that write to that variable!
On the other hand, if you explicitly pass in all the data a function needs to operate, and explicitly return all the results:
If something goes wrong with any of those variables, it's easy to find the source of the problem
It's easier to add code to verify the "domain" of your inputs. Is it really a string? Is it over a certain length, is it under a certain length? Is it a positive number? is it whole, or fractional? All these assumptions that your code needs to operate correctly can be explicit at the start of the function, instead of just crossing your fingers and hoping nothing goes wrong.
It's easier to guess what a particular function will actually do, if its output depends only on its input.
a function's parameters are not dependant on the naming of any external variables.
And other advantages.
if you were only going to use global variables that the functions worked on then you'd always have to know the inner workings of the functions and what your global variable names had to be for them to work.
also, something like Math.abs(n) would be hard to call twice in one line if using global variables.
Functions are reusable components of your code, that executes a particular snippet on the provided variable exhibiting varying behavior.
Encapsulation comes from being Object Oriented. Functions are more for giving structure to your program.
Also, you shouldn't undermine the execution time of a method, if the variable it access exists in the context rather than being global.
If you don't need parameters to be passed to functions, then, you really don't need functions.
Functions are usually (and should be) used to provide code re-use -- use the same function on different variables. If a function accesses global variables, then every time I use it it will perform the same action. If I pass parameters, I can make it perform a different action (based on those different parameters) every time I use it.
One of the main benefits is that it keeps all the information the function needs, nearby. It becomes possible to look at just the function itself and understand what its input is, what it does, and what its output will be. If you are using global variables instead of passing arguments to the function, you’ll have to look all over your code to locate the data on which the function operates.
That’s just one benefit, of many, but an easy one to understand.
Global variables (in any language) can sometimes become stale. If there is a chance of this it is good to declare, initialise and use them locally. You have to be able to trust what you are using.
Similarly, if something/someone can update your global variables then you have to be able to trust the outcome of what will happen whenyou use them.
Global variables are not always needed by everything so why keep them hanging around?
That said, variables global to a namesapce can be useful especially if you are using something like jquery selectors and you want to cache for performance sake.
Is this really a javascript question? I haven't encountered a language that doesn't have some form of global variables yet.