JavaScript experts: why does `with` nullify the compiler's scope-related optimizations?

While reading Kyle Simpson's You Don't Know JS: Scope & Closures, I came across his argument that you should stay away from both the eval() function and the with keyword. Paraphrasing: whenever the compiler sees either of them, it skips some optimizations related to lexical scope and to storing the location of identifiers, because these two features can modify the lexical scope at run time and would make those optimizations incorrect. (I am assuming the optimizations are something like the compiler storing the location of each identifier so that it can provide the identifier's value at run time without searching for it.)
Now I understand why that would happen with eval(): your eval could be evaluating user input, and that input could declare a new variable that shadows another variable you access later in, say, the currently executing function. If the compiler had stored the static location, the access would return the value of the wrong identifier (it should have returned the value of the identifier declared by eval(), but it returned the value of the variable whose location the compiler stored to optimize the look-ups). So I am assuming this is why the compiler doesn't perform its scope-related optimizations whenever it spots an eval() in your code.
But why does the compiler do the same thing for the with keyword? The book says it does so because with creates a new lexical scope at run time and uses the properties of the object passed to with to declare new identifiers. I have no idea what this means, and I have a very hard time visualizing it, since all the compiler-related material in this book is theory.
I know that I could be on the wrong track; in that case, please kindly correct my misunderstandings :)

The optimization referred to here is based on this fact: the variables declared within a function can always be determined via simple static analysis of the code (i.e., by looking at var/let and function declarations), and the set of declared variables within a function never changes.
eval violates this assumption by introducing the ability to mutate a local binding (by introducing new variables to a function's scope at run time). with violates this assumption by introducing a new non-lexical binding within a function whose properties are computed at runtime. Static code analysis cannot always determine the properties of a with object, so the analyzer cannot determine what variables exist within a with block. Importantly, the object supplied to with may change between executions of the function, meaning that the set of variables within that lexical section of the function can never be guaranteed to be consistent.
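To make the eval half of that claim concrete, here is a small sketch. The name probe and the strings passed to it are made up for illustration, and the function is built with the Function constructor only so that its body runs in sloppy mode wherever the snippet is pasted (in strict code a direct eval gets its own variable scope and would not show the effect).

```javascript
// Hypothetical helper: its body is sloppy-mode code containing a direct eval.
const probe = new Function(
  "code",
  "var a = 'declared statically'; eval(code); return typeof b;"
);

const withoutDeclaration = probe("");         // eval adds nothing to the scope
const withDeclaration = probe("var b = 42;"); // eval declares `b` at run time

console.log(withoutDeclaration); // undefined
console.log(withDeclaration);    // number
```

The same function body ends up with a different set of local variables depending on the string it receives, which is exactly what static analysis cannot predict.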
Consider the simple function:
function foo() {
var a, b;
function c() { ... }
...
}
All points in foo have three locally scoped variables: a, b and c. An optimizer can attach a permanent kind of "note" to the function that says, "This function has three variables: a, b, and c. This will never change."
Now consider:
function bar(egg) {
var a, b;
function c() { ... }
with(egg) {
...
}
}
In the with block, there is no knowing what variables will or won't exist. If there is an a, b or c in the with, we don't know until run time if that refers to a variable of bar or one created by the with(egg) lexical scope.
To show a semi-practical example of how this is a problem, finally consider:
function baz(egg) {
with(egg) {
return function() { return whereami; }
}
}
When the inner function executes (e.g., baz({...})()), the execution engine will look up the scope chain to find whereami. If the optimizer had been allowed to attach a permanent scope-note to baz, then the execution engine would immediately know to look in baz's closure for the value of whereami, because that would be guaranteed to be the home of whereami (any similarly-named variable up the scope chain would be shadowed by the closest one). However, it doesn't know if whereami exists in baz or not, because it could be conditionally created by the contents of egg on the particular run of baz that created that inner function. Therefore, it has to check, and the optimization is not used.
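A runnable variant of the same idea, as a rough sketch: it adds a local var whereami to baz (not in the example above) so that both outcomes are visible, and it builds baz from a string because with is a syntax error in strict mode and in ES modules.

```javascript
// Function-constructor bodies are sloppy by default, so `with` is allowed here.
const baz = new Function(
  "egg",
  "var whereami = 'local to baz';" +
    "with (egg) { return function () { return whereami; }; }"
);

// Same source code, two different resolutions of `whereami`:
const fromEgg = baz({ whereami: "property of egg" });
const fromLocal = baz({ unrelated: true });

console.log(fromEgg());   // property of egg
console.log(fromLocal()); // local to baz
```

Which binding the identifier refers to is only decided by the object passed in on each call, so the engine cannot record a fixed location for it ahead of time.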

Take this example:
{
let a = 1; //stored at 123
{
let b = 2; //stored at 124
console.log(a/*123*/,b/*124*/);
}
}
And now this:
{
let a = 1;//stored at 123
with({a:3}){
console.log(a /*123 ??*/);
}
}


Should closures be used to structure code even if they do not depend on lexical scope? How about redeclaration inside loops?

2 questions:
Closures have the advantage of being able to access the outer scopes and therefore are a great tool in our toolbox.
Is it frowned upon just using them to structure the program if scoping is not needed?
foo = () => {
closure = (_)=> {
...
}
if(...){
closure(bar);
}else{
closure(baz);
}
}
In this case the function does not depend on the scope and could be moved one level higher without change in functionality. Semantically it makes sense to place it there since it will only be used inside foo.
How do closures behave if they are declared inside loops? Does redeclaration hurt performance?
foo.forEach( x => {
closure = () => ...
})
Is it frowned upon just using them to structure the program if scoping is not needed?
There is no one way to write JavaScript code (or any other code). This part of the question calls for opinion, which is off-topic for SO. :-)
There are a couple of objective observations that can be made about doing that:
It keeps the functions private, they can only be used in the function they're created in (assuming you don't return them out of it or assign them to variables declared in an outer scope). That could be argued as being good (for encapsulation) and as bad (limited reuse).
Modules probably reduce the desire to do this a bit (though not entirely).
How do closures behave if they are declared inside loops?
A couple of things I need to call out about your example relative to the question you've asked there:
You haven't declared a function at all. You've created one, via a function expression, but you haven't declared one. (This matters to the answer; we'll come back to it in a moment.)
Your example doesn't create a function in a loop, it creates it inside another function — forEach's callback. That function is called several times, but it isn't a loop per se.
This code creates a function in a loop:
for (const value of something) {
closure = () => {
// ...
};
}
It works just like creating a function anywhere else: A new function object is created each time, closing over the environment where it was created (in this case, the environment for each iteration of the loop). This can be handy if it's using something specific to the loop iteration (like value above).
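A small sketch of that behavior (the array contents and the names closures and results are just for illustration):

```javascript
const something = ["a", "b", "c"];
const closures = [];

for (const value of something) {
  // A new function object is created on every iteration, and each one
  // closes over that iteration's own `value` binding.
  closures.push(() => value);
}

const results = closures.map((fn) => fn());
console.log(results);                      // [ 'a', 'b', 'c' ]
console.log(closures[0] === closures[1]);  // false
```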
Declaring a function in a loop looks like this:
for (const value of something) {
function foo() {
// ...
}
}
Never do that in loose-mode code, only do it in strict mode (or better yet, avoid doing it entirely). The loose-mode semantics for it aren't pretty because it wasn't specified behavior for a long time (but was an "allowed extension") and different implementations handled it in different ways. When TC39 specified the behavior, they could only specify a subset of situations that happened to be handled the same way across all major implementations.
The strict mode semantics for it are fairly reasonable: A new function object is created every time and the function's identifier exists only in the environment of the loop iteration. Like all function declarations, it's hoisted (to the top of the block, not the scope enclosing the loop):
"use strict";
const something = [1, 2, 3];
console.log(typeof foo); // Doesn't exist here
for (const value of something) {
foo();
function foo() {
console.log(value);
}
}
console.log(typeof foo); // Doesn't exist here
Does redeclaration hurt performance?
Not really. The JavaScript engine only has to parse the code once, creating the bytecode or machine code for the function, and then can reuse that bytecode or machine code when creating each of the function objects and attaching them to the environment they close over. Modern engines are very good at that. If you're creating millions of temporary objects, that might cause memory churn, but only worry about it if and when you have a performance problem that you've traced to a place where you've done it.

Is there block scope really in JavaScript (as a first-class language concept)?

As we know, JS has traditionally lacked block scope: it had only function scope up until recently.
In recent versions of JS though, we can have let and const variables which are visible only in the scope where they are defined.
But... deep down... how is this done/implemented really? Is there really now in the language a first-class notion of block scope in JS, or is the block scope thing just some simulation to make certain variables visible only in the block where they are defined?
I mean is block scope in recent JS versions a first-class thing, just like function scope is, or is block scope just some sort of simulation while we actually still have just the old good function scope?
But... deep down... how is this done/implemented really? Is there really now in the language a first-class notion of block scope in JS...?
Yes, there is. A new block¹ creates a new lexical environment in the same way that creating a function does (without, obviously, all the other aspects of creating a function). You can see that in the Evaluation section of blocks in the spec.
It's a first-class language construct.
¹ I originally wrote "...containing a let, const, or class declaration..." but the specification doesn't actually make that distinction, though I expect JavaScript engines do (since there's no need for a new environment if there are no lexically-declared bindings).
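One way to observe that a block has its own environment, sketched with made-up names (label, readInner): a function created inside the block keeps seeing the block's binding after the block has finished, while the outer binding is untouched.

```javascript
let label = "outer";
let readInner;

{
  // This `label` lives in the block's own lexical environment.
  let label = "inner";
  readInner = () => label;
}

console.log(label);       // outer
console.log(readInner()); // inner
```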
In a comment you've asked:
What about hoisting? I read that block scoped-variables are hoisted to the top of the block they are defined in... but then also... You get an error if you try to access a block-scoped variable before the line/statement where it is declared in its block? This sounds contradictory, no? If they are hoisted we would not be getting this error but we would be getting undefined. What is the truth here?
In my book I describe them as half-hoisted: The creation of the variable (more generally, the "binding") is hoisted to the top of the scope in which its declaration (let x or whatever) appears (the block, in this case), but the binding isn't initialized until the declaration is reached in the step-by-step execution of the code. The time between creation and initialization is called the Temporal Dead Zone. You can't use the binding (at all) within the TDZ.
This only applies to let, const, and class declarations. var is handled differently in two ways: 1. Obviously, var is hoisted to the top of the function (or global) scope, not just block scope. 2. Less obviously, var bindings are both created and initialized (with the value undefined) upon entry to the scope. They're fully hoisted. (The declaration of them is; any initializer on the var statement is actually an assignment, and done when that statement is reached in the step-by-step execution of the code.)
Here's an example:
function foo(n) {
// `l1` and `l2` don't exist here at all
// `v` is created and initialized with `undefined` here, so we can happily
// do this:
console.log(`v = ${v}`);
if (n > 10) {
// `l1` and `l2` are created here but not initialized; if we tried to
// use them here, we'd get an error; uncomment this line to see it:
// console.log(`l1 = ${l1}`);
console.log("n is > 10");
var v = "a"; // `v` is assigned the value `"a"` here, replacing the
// value `undefined`
let l1 = "b"; // `l1` is initialized with the value `"b"` here
console.log(`l1 = ${l1}`);
let l2; // `l2` is initialized with the value `undefined` here
console.log(`l2 = ${l2}`);
l2 = "c"; // `l2` is assigned the value `"c"` here, replacing the
// value `undefined`
console.log(`l2 = ${l2}`);
}
}
foo(20);
Just for completeness, function declarations are also fully-hoisted, but even more so than var: The function is actually created and assigned to the binding upon entry to the scope (unlike var, which gets the value undefined).
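A quick sketch of that difference between var and function declarations (hoistingDemo and the property names are hypothetical):

```javascript
function hoistingDemo() {
  // Both identifiers are usable before the lines that declare them:
  const seen = {
    typeOfDeclared: typeof declared, // the function object already exists
    valueOfVar: v,                   // the binding exists and holds undefined
  };

  var v = "assigned when this line runs";
  function declared() {}

  seen.valueOfVarAfter = v;
  return seen;
}

const seen = hoistingDemo();
console.log(seen.typeOfDeclared);  // function
console.log(seen.valueOfVar);      // undefined
console.log(seen.valueOfVarAfter); // assigned when this line runs
```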
In a comment you've observed:
Then... I don't see what's the difference between no hoisting and half-hoisting...
Good point, I didn't explain that. The difference relates to shadowing identifiers in the outer scope. Consider:
function foo() {
let a = 1;
if (/*...*/) {
console.log(`a = ${a}`);
let a = 2;
// ...
}
}
What should the log show?
Sorry, that was a trick question; the log doesn't show anything, because you get an error trying to use a there, because the inner a declaration shadows (hides) the outer a declaration, but the inner a isn't initialized yet, so you can't use it yet. It's in the TDZ.
It would have been possible to make the outer a accessible there, or to make the inner a accessible there with the value undefined (e.g., fully hoisting it like var, but just within the block), but both of those have problems the TDZ helps solve. (Specifically: Using the outer a would have been confusing for programmers [a means one thing at the beginning of the block but something else later?!] and would have meant JavaScript engines had to create new lexical environments all over the place, basically every let or const or class would introduce a new one. And pre-initializing with undefined is confusing for programmers, as var has shown us over the years...)
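Here is a runnable version of that trick question, as a sketch: the elided if condition is replaced by a plain block, and the try/catch and the outcome variable are added only to capture what happens.

```javascript
function shadowDemo() {
  let a = 1;
  let outcome;
  {
    try {
      // `a` here is the inner `a`, which exists but is still in its TDZ.
      outcome = `a = ${a}`;
    } catch (error) {
      outcome = error.name;
    }
    let a = 2; // the inner `a` is initialized only now
    outcome += `, then a = ${a}`;
  }
  return outcome;
}

console.log(shadowDemo()); // ReferenceError, then a = 2
```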

Are functions set before variables in the javascript 'creation phase'?

I am doing the Udemy course Javascript: Understanding the Weird Parts right now, and I just learned about the creation phase and the execution phase that occurs when the interpreter interprets the JS.
I have a question, but I will first show you the code I am playing with:
http://codepen.io/rsf/pen/bEgpNY
b();
function b () {
console.log(a);
}
var a = 'peas';
b();
If I understand correctly, in the creation phase, the variables and functions are 'set', meaning they are given spots in memory. The variables are all given the placeholder value of undefined. Then in the execution phase, the engine executes the lines starting at the top. When b() is first called, 'a' still has the placeholder value of undefined, then 'a' is given its initial value of 'peas', b() is called again and this time 'a' has the value of 'peas'.
In my mind, one of two things has to be happening here. Alternative 1: In the creation phase, all variables are set before functions. This means that when a memory space for the function b() is created, the function includes a's value of undefined (because the 'a' memory space was already created with the value of 'undefined'). Alternative 2: the functions and variables are set in the lexical order they are in (in this case, b is created before a), and when b is created, the 'a' reference somehow means that the function is listening for any possible creation of an 'a' memory location, and when later the 'a' location is actually created, the reference refers to that spot.
Am I on the right track with either of these scenarios?
You can think of it like this.
Your original code:
b();
function b () {
console.log(a);
}
var a = 'peas';
b();
is actually executed like this:
var a;
function b () {
console.log(a);
}
b(); // log undefined because a doesn't have a value yet
a = 'peas';
b(); // log peas because a has a value
Basically all the variable and function declarations are hoisted to the top of the enclosing scope.
The order doesn't really matter because the code inside the b function doesn't get executed until you actually call the function.
If I understand correctly, in the creation phase, the variables and functions are 'set', meaning they are given spots in memory.
I would not use the term set for this--it usually is used to refer to a variable being set to (assigned) a particular value. I also would not use the term "spot" or "memory"--we don't need to worry about these internals. It's clearer just to say declared.
I also don't really like the use of the term "creation phase", which is both non-standard and confusing--what is being created, exactly? I would prefer the term "compilation".
The variables are all given the placeholder value of undefined.
To be precise, I would not say they "have the value of undefined", but rather "have no value", or "are not defined". undefined is not a value held by a variable which has not been assigned to yet; rather it's a state, which causes the variable to evaluate to the undefined value when accessed.
Alternative 1: In the creation phase, all variables are set before functions.
Yes, although again it's going to be confusing to use the word "set". Say, "all variables are declared before functions". This is the process of hoisting.
Alternative 2: the functions and variables are set in the lexical order they are in (in this case, b is created before a), and when b is created, the 'a' reference somehow means that the function is listening for any possible creation of an 'a' memory location, and when later the 'a' location is actually created, the reference refers to that spot.
No. The function does not "listen" to anything. It just executes when you tell it to.
Is this important?
Not really. It falls into the category of arcana. So we clog up our brains with rules like, variables hoist this way, function declarations hoist some other way, let has yet some other hoisting behavior. In practice, almost all style guides will call for you to declare variables at the top of the function, and linters will warn you if you don't (or can be configured to do so). This immediately eliminates all variable hoisting issues.
Some people like to put internal functions at the bottom of their function, and that works fine since, if it's a function declaration (ie function foo() { }) the whole thing (including the definition) is hoisted. If it's a function expression being assigned to a variable (ie var foo = function() { }), then it's a variable, and we already decided to put those at the top of our function--see paragraph above.
In general, if your program depends on hoisting behavior, it's written badly. If you need to understand hoisting behavior to understand how the program works, it's written badly.
To summarize, all you really need to learn is one rule: put variable declarations (and their initializations) at the top of your function. Then you don't have to worry about hoisting at all.
(There are some exceptions, such as declaring a variable inside a for statement, as in for (var i...), which is fine, assuming i is not being used for anything other than the index of the loop.)
For some reason, people learning JS sometimes seem to focus on these oddities, such as "why does " " == false?" or something. I would suggest instead focusing on how to think about your problems and break them down, and on writing nice clean code that just works and that you and other people can maintain without worrying about the arcana. I've been writing JS for many years, and cannot remember the last time I encountered a problem related to hoisting.

Does a (JS) Closure Require a Function Inside a Function

I'm having a little difficulty with the inherent concept of a closure. I get the basic idea, but here's the thing: I thought that, technically, there "is a closure" inside every Javascript function. To quote wikipedia:
In computer science, a closure (also lexical closure, function closure
or function value) is a function together with a referencing
environment for the nonlocal names (free variables) of that function.
Such a function is said to be "closed over" its free variables.
So since you can define variables inside a function, they are "closed off" to the rest of your code, and so I see that as a closure. Thus, as I understand it:
(function(){var a = 1;}())
Is a (not very useful) example of a closure. Or heck, even just this:
function(){var a = 1;}
But, I think my understanding might be wrong. Others are telling me that for something to be a closure it has to persist a state, and so since nothing persists beyond that code it's not really a closure. That suggests that you need to have:
(function(foo){foo.a = 1;}(bar)); // bar.a = 1
or even (to ensure un-modifiability):
(function(foo){var a = 1; bar.baz = function() { return a}}(bar)); // bar.baz() = 1
So, technically speaking (I know several of the examples are practically pointless), which of the above examples are actually closures? And does a closure just have to be a space (i.e., inside a JS function) where variables can be stored that can't be accessed from outside, or is persistence a key part of a closure's definition?
EDIT
Just noticed the wiki definition for the "closures" tag on Stack Overflow:
A closure is a first-class function that refers to (closes over)
variables from the scope in which it was defined. If the closure still
exists after its defining scope ends, the variables it closes over
will continue to exist as well.
While the SO wiki is certainly no final authority, the first sentence does seem to correlate with my understanding of the term. The second sentence then suggests how a closure can be used, but it doesn't seem like a requirement.
EDIT #2
In case it isn't clear from the varying answers here, the wikipedia answer, and the tag answer, there does not seem to be a clear consensus on what the word "closure" even means. So while I appreciate all the answers so far, and they all make sense if you go with the author's definition of closure, what I guess I'm really looking for is ... is there any actual "authoritative" definition of the word (and then if so, how does it apply to all of the above)?
You're being led astray by a wrong assumption of where the word "closure" comes from.
In a language-theoretic context, the point of a closure is that the function can refer to variables declared outside its own definition. It is immaterial whether it has internal variables, or that the internal variables are not visible from outside. In other words it is about seeing out from the function to its definition environment, not about seeing in from outside the function.
Why the weird word, then? Look at the function in your last example:
bar.baz = function() { return a }
This function contains a mention of the variable a which is not defined in the function body itself. It is a "free" variable of the function body, sort of a "hole" in the definition. We cannot execute the function without knowing, by some extraneous means, what variable the identifier a in the body refers to. Forming a closure at run-time pairs this "open" function body with a reference to the appropriate variable, thereby closing the hole in the definition. And that's where the name comes from.
(If you want the completely technical explanation, the underlying concept is that of a "closed" term in the lambda-calculus, which means one that has no free variables. Only closed terms have independent meanings. A closure is then the combination of a (usually compiled) non-closed piece of source code, together with the contextual information that lets it behave like it was a closed term, and therefore be executable.)
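As a rough illustration of "closing the hole" (makeReader, read and bump are hypothetical names): the inner functions mention a, which is free in their own bodies, and each call to makeReader pairs those bodies with a fresh a.

```javascript
function makeReader() {
  var a = 1;
  return {
    // `a` is a free variable of both function bodies; the closure supplies it.
    read: function () { return a; },
    bump: function () { a += 1; },
  };
}

const first = makeReader();
const second = makeReader();
first.bump();

console.log(first.read());  // 2
console.log(second.read()); // 1
```

The same source text yields two closures that refer to two different variables, which is why the function body alone is not enough to execute it.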
Addendum: In the common idiom
(function() {
var blah;
// some code here
})();
the point is not to get a closure (you will get one, of course, but it doesn't do anything interesting for you), but to create a local scope for the blah variable. A local scope is conceptually quite a different thing from a closure -- in fact most C-lookalikes other than Javascript will create them at every {} block, whereas they may or may not have closures at all.
None of your samples are closures, technically speaking. (But the fourth sample can be classified as such in some circumstances; see below.)
A closure is a data structure that combines a reference to a function with a non-empty list of call frames (or scopes) active at the moment of declaration.
A closure is created by executing code that contains the declaration of a function that uses variables from outer scopes. In this case the runtime, while executing the code, has to create not just a reference to the function but a closure structure: a function reference plus a reference to its current environment, the list of call frames that hold the outer variables it uses.
For example, in my TIScript, call frames are placed on the stack: when you exit from a function, its call frame, which includes the collection of variables it uses, is purged from the stack. Closure creation in my case happens when the VM meets a function declaration instruction and that function is marked (by the compiler) as one that uses outer variables. In this case the current chain of call frames that hold the used variables is moved from the stack to the heap (converted to GCable data objects), and the function together with its call chain is stored as a reference.
Your fourth case physically does not require closure to be created - no need to store call frames for later use - bar.baz contains just a number - not a reference to function.
But this:
function(foo){
var a = 1;
bar.baz = function() { return a; };
}
creates a closure in the bar.baz field. When you later invoke bar.baz(), the function's code is executed and the value of the variable a is taken from the reference to the outer call frame stored in the closure.
Hope that clears things up for you.
Closures in JavaScript (and other languages) are used to control and define scope. There's no requirement that you define a function within a function for it to "qualify" as a closure. The body of a function is a Closure. One of the more common uses is to declare a local scope variable that becomes a Private or Hidden member of some other object or function you'll return, but that's not a hard-and-fast rule.

What is the scope of a function in Javascript/ECMAScript?

Today I had a discussion with a colleague about nested functions in Javascript:
function a() {
function b() {
alert('boo')
}
var c = 'Bound to local call object.'
d = 'Bound to global object.'
}
In this example, trials show that b is not reachable outside the body of a, much like c. However, d is, after executing a(). Looking for the exact definition of this behaviour in the ECMAScript v.3 standard, I didn't find the exact wording I was looking for; what Sec. 13 (p. 71) does not say is which object the function object created by the function declaration statement is to be bound to. Am I missing something?
This is static scoping. Statements within a function are scoped within that function.
Javascript has a quirky behavior, however, which is that without the var keyword, you've implied a global variable. That's what you're seeing in your test. Your "d" variable is available because it is an implied global, despite being written within the body of a function.
Also, to answer the second part of your question: A function exists in whatever scope it is declared, just like a variable.
Sidenote:
You probably don't want global variables, especially not implied ones. It's recommended that you always use the var keyword, to prevent confusion and to keep everything clean.
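A sketch of the implied-global quirk, plus one way to have the engine catch it. Strict mode is not mentioned above and is brought in here as an extra; all names are made up, and the sloppy function is built from a string so the snippet behaves the same when pasted into a module.

```javascript
// Sloppy-mode function: assigning to an undeclared name creates a global.
const sloppyAssign = new Function(
  "impliedGlobalDemo = 'created on the global object';"
);

// Strict-mode function: the same kind of assignment throws instead.
function strictAssign() {
  "use strict";
  try {
    anotherUndeclaredDemo = "never stored";
    return "no error";
  } catch (error) {
    return error.name;
  }
}

sloppyAssign();
const strictResult = strictAssign();

console.log(globalThis.impliedGlobalDemo); // created on the global object
console.log(strictResult);                 // ReferenceError
```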
Sidenote:
The ECMA Standard probably isn't the most helpful place to find answers about Javascript, although it certainly isn't a bad resource. Remember that javascript in your browser is just an implementation of that standard, so the standards document will be giving you the rules that were (mostly) followed by the implementors when the javascript engine was being built. It can't offer specific information about the implementations you care about, namely the major browsers. There are a couple of books in particular which will give you very direct information about how the javascript implementations in the major browsers behave. To illustrate the difference, I'll include excerpts below from both the ECMAScript specification, and a book on Javascript. I think you'll agree that the book gives a more direct answer.
Here's from the ECMAScript Language Specification:
10.2 Entering An Execution Context
Every function and constructor call enters a new execution context, even if a function is calling itself recursively. Every return exits an execution context. A thrown exception, if not caught, may also exit one or more execution contexts.
When control enters an execution context, the scope chain is created and initialised, variable instantiation is performed, and the this value is determined.
The initialisation of the scope chain, variable instantiation, and the determination of the this value depend on the type of code being entered.
Here's from O'Reilly's Javascript: The Definitive Guide (5th Edition):
8.8.1 Lexical Scoping
Functions in JavaScript are lexically
rather than dynamically scoped. This
means that they run in the scope in
which they are defined, not the scope
from which they are executed. When a
function is defined, the current scope
chain is saved and becomes part of
the internal state of the function.
...
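A minimal sketch of what "lexically rather than dynamically scoped" means in practice (scopeLabel, show and caller are hypothetical names):

```javascript
const scopeLabel = "where show() was defined";

function show() {
  // Resolved through the scope chain saved when `show` was defined.
  return scopeLabel;
}

function caller() {
  // This local binding is invisible to `show`, even though `show` is called here.
  const scopeLabel = "where show() was called";
  return show();
}

console.log(caller()); // where show() was defined
```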
Highly recommended for covering these kinds of questions is Douglas Crockford's book:
Javascript, The Good Parts, also from O'Reilly.
As I understand it, these are equivalent as far as scoping is concerned:
function a() { ... }
and
var a = function() { ... }
It seems important to note that while d is being created as a "global", it is in reality being created as a property of the window object. This means that you could inadvertently be overwriting something that already exists on the window object or your variable might actually fail to be created at all. So:
function a() {
d = 'Hello World';
}
a(); // the implied global is only created once a() actually runs
alert(window.d); // shows 'Hello World'
But you cannot do:
function a() {
document = 'something';
}
because you cannot overwrite the window.document object.
For all practical purposes you can imagine that all of your code is running in a giant with(window) block.
Javascript has two scopes. Global, and functional. If you declare a variable inside a function using the "var" keyword, it will be local to that function, and any inner functions. If you declare a variable outside of a function, it has global scope.
Finally, if you omit the var keyword when first declaring a variable, javascript assumes you wanted a global variable, no matter where you declare it.
So, you're calling function a, and function a is declaring a global variable d.
...
function a() {
function b() {
alert('boo')
}
var c = 'Bound to local call object.'
d = 'Bound to global object.'
}
Without being preceded by var, d is global. Do this to make d private:
function a() {
function b() {
alert('boo')
}
var c = 'Bound to local call object.'
var d = 'Bound to local object.'
}
