as ECMAScriptv5, each time when control enters a code, the enginge creates a LexicalEnvironment(LE) and a VariableEnvironment(VE), for function code, these 2 objects are exactly the same reference which is the result of calling NewDeclarativeEnvironment(ECMAScript v5 10.4.3), and all variables declared in function code are stored in the environment record componentof VariableEnvironment(ECMAScript v5 10.5), and this is the basic concept for closure.
What confused me is how Garbage Collect works with this closure approach, suppose I have code like:
function f1() {
var o = LargeObject.fromSize('10MB');
return function() {
// here never uses o
return 'Hello world';
}
}
var f2 = f1();
after the line var f2 = f1(), our object graph would be:
global -> f2 -> f2's VariableEnvironment -> f1's VariableEnvironment -> o
so as from my little knowledge, if the javascript engine uses a reference counting method for garbage collection, the object o has at lease 1 refenrence and would never be GCed. Appearently this would result a waste of memory since o would never be used but is always stored in memory.
Someone may said the engine knows that f2's VariableEnvironment doesn't use f1's VariableEnvironment, so the entire f1's VariableEnvironment would be GCed, so there is another code snippet which may lead to more complex situation:
function f1() {
var o1 = LargeObject.fromSize('10MB');
var o2 = LargeObject.fromSize('10MB');
return function() {
alert(o1);
}
}
var f2 = f1();
in this case, f2 uses the o1 object which stores in f1's VariableEnvironment, so f2's VariableEnvironment must keep a reference to f1's VariableEnvironment, which result that o2 cannot be GCed as well, which further result in a waste of memory.
so I would ask, how modern javascript engine (JScript.dll / V8 / SpiderMonkey ...) handles such situation, is there a standard specified rule or is it implementation based, and what is the exact step javascript engine handles such object graph when executing Garbage Collection.
Thanks.
tl;dr answer: "Only variables referenced from inner fns are heap allocated in V8. If you use eval then all vars assumed referenced.". In your second example, o2 can be allocated on the stack and is thrown away after f1 exits.
I don't think they can handle it. At least we know that some engines cannot, as this is known to be the cause of many memory leaks, as for example:
function outer(node) {
node.onclick = function inner() {
// some code not referencing "node"
};
}
where inner closes over node, forming a circular reference inner -> outer's VariableContext -> node -> inner, which will never be freed in for instance IE6, even if the DOM node is removed from the document. Some browsers handle this just fine though: circular references themselves are not a problem, it's the GC implementation in IE6 that is the problem. But now I digress from the subject.
A common way to break the circular reference is to null out all unnecessary variables at the end of outer. I.e., set node = null. The question is then whether modern javascript engines can do this for you, can they somehow infer that a variable is not used within inner?
I think the answer is no, but I can be proven wrong. The reason is that the following code executes just fine:
function get_inner_function() {
var x = "very big object";
var y = "another big object";
return function inner(varName) {
alert(eval(varName));
};
}
func = get_inner_function();
func("x");
func("y");
See for yourself using this jsfiddle example. There are no references to either x or y inside inner, but they are still accessible using eval. (Amazingly, if you alias eval to something else, say myeval, and call myeval, you DO NOT get a new execution context - this is even in the specification, see sections 10.4.2 and 15.1.2.1.1 in ECMA-262.)
Edit: As per your comment, it appears that some modern engines actually do some smart tricks, so I tried to dig a little more. I came across this forum thread discussing the issue, and in particular, a link to a tweet about how variables are allocated in V8. It also specifically touches on the eval problem. It seems that it has to parse the code in all inner functions. and see what variables are referenced, or if eval is used, and then determine whether each variable should be allocated on the heap or on the stack. Pretty neat. Here is another blog that contains a lot of details on the ECMAScript implementation.
This has the implication that even if an inner function never "escapes" the call, it can still force variables to be allocated on the heap. E.g.:
function init(node) {
var someLargeVariable = "...";
function drawSomeWidget(x, y) {
library.draw(x, y, someLargeVariable);
}
drawSomeWidget(1, 1);
drawSomeWidget(101, 1);
return function () {
alert("hi!");
};
}
Now, as init has finished its call, someLargeVariable is no longer referenced and should be eligible for deletion, but I suspect that it is not, unless the inner function drawSomeWidget has been optimized away (inlined?). If so, this could probably occur pretty frequently when using self-executing functions to mimick classes with private / public methods.
Answer to Raynos comment below. I tried the above scenario (slightly modified) in the debugger, and the results are as I predict, at least in Chrome:
When the inner function is being executed, someLargeVariable is still in scope.
If I comment out the reference to someLargeVariable in the inner drawSomeWidget method, then you get a different result:
Now someLargeVariable is not in scope, because it could be allocated on the stack.
There is no standard specifications of implementation for GC, every engine have their own implementation. I know a little concept of v8, it has a very impressive garbage collector (stop-the-world, generational, accurate). As above example 2, the v8 engine has following step:
create f1's VariableEnvironment object called f1.
after created that object the V8 creates an initial hidden class of f1 called H1.
indicate the point of f1 is to f2 in root level.
create another hidden class H2, based on H1, then add information to H2 that describes the object as having one property, o1, store it at offset 0 in the f1 object.
updates f1 point to H2 indicated f1 should used H2 instead of H1.
creates another hidden class H3, based on H2, and add property, o2, store it at offset 1 in the f1 object.
updates f1 point to H3.
create anonymous VariableEnvironment object called a1.
create an initial hidden class of a1 called A1.
indicate a1 parent is f1.
On parse function literal, it create FunctionBody. Only parse FunctionBody when function was called.The following code indicate it not throw error while parser time
function p(){
return function(){alert(a)}
}
p();
So at GC time H1, H2 will be swept, because no reference point that.In my mind if the code is lazily compiled, no way to indicate o1 variable declared in a1 is a reference to f1, It use JIT.
if the javascript engine uses a reference counting method
Most javascript engine's use some variant of a compacting mark and sweep garbage collector, not a simple reference counting GC, so reference cycles do not cause problems.
They also tend to do some tricks so that cycles that involve DOM nodes (which are reference counted by the browser outside the JavaScript heap) don't introduce uncollectible cycles. The XPCOM cycle collector does this for Firefox.
The cycle collector spends most of its time accumulating (and forgetting about) pointers to XPCOM objects that might be involved in garbage cycles. This is the idle stage of the collector's operation, in which special variants of nsAutoRefCnt register and unregister themselves very rapidly with the collector, as they pass through a "suspicious" refcount event (from N+1 to N, for nonzero N).
Periodically the collector wakes up and examines any suspicious pointers that have been sitting in its buffer for a while. This is the scanning stage of the collector's operation. In this stage the collector repeatedly asks each candidate for a singleton cycle-collection helper class, and if that helper exists, the collector asks the helper to describe the candidate's (owned) children. This way the collector builds a picture of the ownership subgraph reachable from suspicious objects.
If the collector finds a group of objects that all refer back to one another, and establishes that the objects' reference counts are all accounted for by internal pointers within the group, it considers that group cyclical garbage, which it then attempts to free. This is the unlinking stage of the collectors operation. In this stage the collector walks through the garbage objects it has found, again consulting with their helper objects, asking the helper objects to "unlink" each object from its immediate children.
Note that the collector also knows how to walk through the JS heap, and can locate ownership cycles that pass in and out of it.
EcmaScript harmony is likely to include ephemerons as well to provide weakly held references.
You might find "The future of XPCOM memory management" interesting.
Related
I read that every function has its own stack, which means that when the function ends its variables are no longer kept into the memory. I read also that objects are returned by reference.
Consider this example:
function getObject() {
var obj = {name: "someName"};
return obj;
} //At the end the locals should disappear
var newObj = getObject();// Get reference to something that is no longer kept in the stack,
console.log(newObj);//
So if the returned value by the function is a reference to an object that no longer exists (in the stack), how are we still getting the right value? In C (language) to return a pointer to a local variable is insane.
First, a quick disclaimer: There's a difference here between the theory and the optimized process that modern JavaScript engines actually perform.
Let's start with the theory:
The object isn't on the stack, just the variable that contains the reference to the object. The object exists separately, and is created in the heap. So in your example, just before the function returns, we have this in memory:
+−−−−−−−−−−−−−−−−−−+
| (object) |
+−−−−−−−−−−−−−−−−−−+
[obj:Ref11254]−−−−−>| name: "someName" |
+−−−−−−−−−−−−−−−−−−+
obj, the variable, contains a reference to the object (Ref11254, though of course that's just a conceptual number, we never actually see these). When your function returns, the content of obj (Ref11254) is returned, and obj (the variable) disappears along with the rest of the stack. But the object, which exists separately, continues to exist as long as something has a reference to it — in your case, the newObj variable.
In practice:
As an optimization, modern JavaScript engines frequently allocate objects on the stack (subject to engine-specific size constraints). Then, if the object is going to continue to live after the function terminates and the stack is cleaned up, they copy it into the heap. But if the object isn't going to survive (that is, no reference to it has been kept or is being returned), they can let it get cleaned up along with the stack. (Why? Because stack allocations and cleanup are really fast, and a lot of objects are only used locally in functions.)
It's worth noting that the same question arises for variables if the function creates a function:
function getFunction() {
var x = 0;
return function() {
return x++;
};
}
var f = getFunction();
console.log(f()); // 0
console.log(f()); // 1
console.log(f()); // 2
Now it's not just that the function getFunction returns (which is an object) survives, but so does x! Even though variables are "on the stack," and the stack gets reclaimed when getFunction returns. How is that possible?!
The answer is that conceptually, local variables aren't on the stack. Putting them on the stack is an optimization (a common one). Conceptually, local variables (along with a few other things) are stored in an object called a LexicalEnvironment object which is created for the call to getFunction. And conceptually, the function getFunction returns has a reference to that LexicalEnvironment object, and so it (and the variables it contains) continues to live on even after getFunction returns. (The function getFunction returns is called a closure because it closes over the environment where it was created.)
And sure enough, in practice modern JavaScript engines don't really do that. They create variables on the stack where it's quick and easy to clean them up. But then if a variable is going to need to live on (like x needs to in our example above), the engine puts it (and any others that it needs to keep around) into a container in the heap and has the closure refer to that container.
More about closures in this question's answers and on my anemic little blog (using slightly outdated terminology).
Because the garbage collector is clever than that! Simply put, if the object has active references then it isn't removed. Variables created within the function and not returned are lost when the function ends (unless you've created async calls which use them) but because they then don't have any active references.
See, for example, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Memory_Management
I'm reading this article (http://javascript.info/tutorial/memory-leaks#memory-leak-size) about memory leaks which mentions this as a memory leak:
function f() {
var data = "Large piece of data";
function inner() {
return "Foo";
}
return inner;
}
JavaScript interpreter has no idea which variables may be required by
the inner function, so it keeps everything. In every outer
LexicalEnvironment. I hope, newer interpreters try to optimize it, but
not sure about their success.
The article suggests we need to manually set data = null before we return the inner function.
Does this hold true today? Or is this article outdated? (If it's outdated, can someone point me to a resource about current pitfalls)
Modern engines would not maintain unused variables in the outer scope.
Therefore, it doesn't matter if you set data = null before returning the inner function, because the inner function does not depend on ("close over") data.
If the inner function did depend on data--perhaps it returns it--then setting data = null is certainly not what you want to, because then, well, it would be null instead of having its original value!
Assuming the inner function does depend on data, then yes, as long as inner is being pointed to (referred to by) something, then the value of data will have to be kept around. But, that's what you are saying you want! How can you have something available without having it be available?
Remember that at some point the variable which holds the return value of f() will itself go out of scope. At that point, at least until f() is called again, data will be garbage collected.
The general rule is that you don't need to worry about memory and leaks with JavaScript. That's the whole point of GC. The garbage collector does an excellent job of identifying what is needed and what is not needed, and keeping the former and garbage collecting the latter.
You may want to consider the following example:
function foo() {
var x = 1;
return function() { debugger; return 1; };
}
function bar() {
var x = 1;
return function() { debugger; return x; };
}
foo()();
bar()();
And examine its execution in Chrome devtools variable window. When the debugger stops in the inner function of foo, note that x is not present as a local variable or as a closure. For all practical purposes, it does not exist.
When the debugger stops in the inner function of bar, we see the variable x, because it had to be preserved so as to be accessible in order to be returned.
Does this hold true today? Or is this article outdated?
No, it doesn't, and yes, it is. The article is four years old, which is a lifetime in the web world. I have no way to know if jQuery still is subject to leaks, but I'd be surprised if it were, and if so, there's an easy enough way to avoid them--don't use jQuery. The leaks the article's author mentions related to DOM loops and event handlers are not present in modern browsers, by which I mean IE10 (more likely IE9) and above. I'd suggest finding a more up-to-date reference if you really want to understand about memory leaks. Actually, I'd suggest you mainly stop worrying about memory leaks. They occur only in very specialized situations. It's hard to find much on the topic on the web these days for that precise reason. Here's one article I found: http://point.davidglasser.net/2013/06/27/surprising-javascript-memory-leak.html.
Just in addition to to #torazaburo's excellent answer, it is worth pointing out that the examples in that tutorial are not leaks. A leak what happens when a program deletes a reference to something but does not release the memory it consumes.
The last time I remember that JS developers had to really worry about genuine leaks was when Internet Explorer (6 and 7 I think) used separate memory management for the DOM and for JS. Because of this, it was possible to bind an onclick event to a button, destroy the button, and still have the event handler stranded in memory -- forever (or until the browser crashed or was closed by the user). You couldn't trigger the handler or release it after the fact. It just sat on the stack, taking up room. So if you had a long-lived webapp or a webpage that created and destroyed a lot of DOM elements, you had to be super diligent to always unbind events before destroying them.
I've also run into a few annoying leaks in iOS but these were all bugs and were (eventually) patched by Apple.
That said, a good developer needs to keep resource management in mind when writing code. Consider these two constructors:
function F() {
var data = "One megabyte of data";
this.inner = new function () {
return data;
}
}
var G = function () {};
G.prototype.data = "One megabyte of data";
G.prototype.inner = function () {
return this.data;
};
If you were to create a thousand instances of F, the browser would have to allocate an extra gigabyte of memory for all those copies of the huge string. And every time you deleted an instance, you might get some onscreen jankiness when the GC eventually recovered that ram. On the other hand, if you made a thousand instances of G, the huge string would be created once and reused by every instance. That is a huge performance boost.
But the advantage of F is that the huge string is essentially private. No other code outside of the constructor would be able to access that string directly. Because of that, each instance of F could mutate that string as much as it wanted and you'd never have to worry about causing problems for other instances.
On the other hand, the huge string in G is out there for anyone to change. Other instances could change it, and any code that shares the same scope as G could too.
So in this case, there is a trade-off between resource use and security.
I've read many questions on closures and JavaScript on SO, but I can't find information about what happens to a function when it gets closured in. Usually the examples are string literals or simple objects. See the example below. When you closure in a function, the original function is preserved even if you change it later on.
What technically happens to the function preserved in a closure? How is it stored in memory? How is it preserved?
See the following code as an example:
var makeFunc = function () {
var fn = document.getElementById; //Preserving this, I will change it later
function getOriginal() {
return fn; //Closure it in
}
return getOriginal;
};
var myFunc = makeFunc();
var oldGetElementById = myFunc(); //Get my preserved function
document.getElementById = function () { //Change the original function
return "foo"
};
console.log(oldGetElementById.call(document, "myDiv")); //Calls the original!
console.log(document.getElementById("myDiv")); //Returns "foo"
Thanks for the comments and discussion. After your recommendations, I found the following.
What technically happens to the function preserved in a closure?
Functions as objects are treated no differently than any other simple object as closured variables (such as a string or object).
How is it stored in memory? How is it preserved?
To answer this, I had to dig through some texts on programming languages. John C. Mitchell's Concepts in Programming Languages explains that closured variables usually end up in the program heap.
Therefore, the variables defined in nesting subprograms may need lifetimes that are of the entire program, rather than just the time during which the subprogram in which they were defined is active. A variable whose lifetime is that of the whole program is said to have unlimited extent. This usually means they must be heap-dynamic, rather than stack-dynamic.
And more specific to JavaScript runtimes, Dmitry Soshnikov describes
As to implementations, for storing local variables after the context is destroyed, the stack-based implementation is not fit any more (because it contradicts the definition of stack-based structure). Therefore in this case closured data of the parent context are saved in the dynamic memory allocation (in the “heap”, i.e. heap-based implementations), with using a garbage collector (GC) and references counting. Such systems are less effective by speed than stack-based systems. However, implementations may always optimize it: at parsing stage to find out, whether free variables are used in function, and depending on this decide — to place the data in the stack or in the “heap”.
Further, Dmitry shows a varied implementation of the parent scope in function closures:
As we mentioned, for optimization purpose, when a function does not use free variables, implementations may not to save a parent scope chain. However, in ECMA-262-3 specification nothing is said about it; therefore, formally (and by the technical algorithm) — all functions save scope chain in the [[Scope]] property at creation moment.
Some implementations allow access to the closured scope directly. For example in Rhino, the [[Scope]] property of a function corresponds to a non-standard property __parent__.
If I do the following (in the global scope):
var myObject = {name: "Bob"};
I have a way to point to that object in memory (i.e. the string identifier "myObject)". I can open the console and type: myObject.name and the console will respond with:
"Bob"
Now, if I just type:
{name: "Jane"};
I'm creating that object somewhere and I'm guessing it goes on living in some scope. Is there any way I can find it? Does it exists under window somewhere in some generic store?
Edit: Some people are saying it will just get garbage collected.
So how about this example:
var MyObject = function(){
$("button").click(this.alert);
}
MyObject.prototype.alert = function(){
alert("I heard that!")
}
new MyObject();
It can't be garbage collected because its callback is bound to a DOM event. Where does the resulting object live and can it be accessed?
If there is no reference pointing to this object (that is you didn't assign it to any variable or to the value of any property), then there is no way to access it and in fact it's no living as the garbage collector can reclaim this memory immediately.
The short answer is no, the object isn't kept alive in memory somewhere you can't reach. Bit life is live: the truth is a bit more complicated, but not by much, if you grasp the basics.
Update:
In response to your update: you're right, in a way. The callback is a reference to MyObject.prototype.alert, which you accessed using this.alert, but that function object is being referenced by a constructors prototype, and can't be GC'ed anyway. The instance itself is not involved in the alert function itself, so it can be GC'ed safely.
Think of it like this:
MyConstructor.prototype.alert = 0x000123;//some memory address
||
\/
0x000123 = [object Function];
The function object itself isn't directly attached to anything, it floats there in memory, and is being referenced by the prototype. When you create an instance:
new MyConstructor().alert;
Is resolved as follows:
[new MyConstructor instance] allocated at e.g 0x000321
||
\\
\=>[alert] check for alert #instance -> not found
\\
\=> check prototype, yield 0x000123 <-- memory address, return this value
So upon executing the statement:
$("button").click(this.alert);
this.alert is an expression that is resolved to 0x000123. In other words, jQ's click method (function object) only receives the memory address of the alert function object. The instance, or indeed the constructor isn't involved at all. That's why this, or the call-context, can change depending on how and where a function is invoked. see here for more on ad-hoc context determination
I'll even do you one better:
/*assume your code is here*/
new MyConstructor().alert = function(){ alert('I am deaf');};
MyConstructor.prototype.alert = 'foo';
$('#button').click();
Guess what, the click event alert "I heard that" all the same, the prototype isn't even involved, let alone the instance.
If MyConstructor were to go out of scope, the click event will still work fine, because the GC still sees a reference to the alert function object that isn't out of scope yet. everything else is available for GC'ing, though...
The JS garbage collector (GC) is a flag-and-swipe GC. When the JS engine encounters your statement, it does allocate the memory required to store your object. When the next statement is reached, that object will probably still be in memory.
Every now and then, the GC X-checks all the objects it sees in memory, and tries to locate all references to that object that are still accessible. When it comes across the object literal that was just created in that statement, but no reference to was assigned, the object is flagged for garbage collection.
The next time the GC gets about its business of swiping flagged object, that object will be removed from memory.
Of course, this isn't entirely true for all engines. suppose your statement was written in an IIFE, that returned a function:
var foo = function()
{
{name: 'bar'};
return function()
{
return 'foobar';
};
}());
Some engines just keep the entire scope of the IIFE in memory, and only deallocate the memory for that scope when the return value of the IIFE goes out of scope (is flagged for GC). Other engines, like V8 last time I checked, will in fact flag those objects/vars of the outer scope that aren't referenced by its return value.
Though, come to think about it, it might not apply to this case, because the GC might even kick in even before the IIFE returns... But on the whole, that's just nit-picking.
There's also the question of logical OR's to consider:
var name = (mayNotExist || {name:'default'}).name;
In this case, if mayNotExist does exist, the object literal will never even be created, thanks to JS's short-circuit evaluation of expressions.
A couple of links on the matter:
Objects and functions in javascript
javascript - How to make this code work?
What makes my.class.js so fast?
Properties of Javascript function objects
I need some help here to undestand how this works (or does not, for that matter).
In a web page, I create a click event listener for a node.
Within the listener, I create an instance of some random class, which sets the node as property within itself. So, if var classInstance is the instance, I can access the node as something like classInstance.rootNode.
When the listener fires, I setup an ajax request, keep classInstance in closure and pass along the ajax response to classInstance and use it to perhaps modify the rootNode's style or content or whatever.
My question is, once I'm done with classInstance, assuming nothing else references it and by itself, it holds nothing else in its own closure, will the garbage collector dispose of it? If not, how do I mark it for disposal?
In response to #Beetroot-Beetroot's doubts (which, admittedly, I have, too), I did some more digging. I set up this fiddle, and used the chrome dev-tools' timeline and this article as a guideline. In the fiddle, two almost identical handlers create a closure with 2 date objects. The first only references a, the second references both a and b. Though in both cases only a can ever really be exposed (hard-coded values), the first closure uses significantly less memory. Whether or not this is down to JIC (just in time compilation) or V8's JS optimization wizardry, I can't say for sure. But from what I've read, I'd say it's V8's GC that deallocates b when the tst function returns, which it can't in the second case (bar references b when tst2 returns). I get the feeling that this isn't that outlandish, and I wouldn't at all be surprised to find out that FF and even IE would work similarly. Just added this, perhaps irrelevant, update for completeness' sake and because I feel as though a link to google's documentation of the dev-tools is an added value of sorts.
It sort of depends, as a simple rule of thumb: as long as you can't reference the classInstance variable anymore, it should be GC'ed, regardless of its own circular references. I've tested quite a lot of constructions, similar to the one you describe here. Perhaps it's worth a lookI've found that closures and mem-leaks aren't that common or easy to get by (at least, not anymore).
But as the accepted answer says: it's nigh impossible to know when what code will leak. Reading over your question again, I'd say: no, you're not going to leak memory: the classInstance variable wasn't created in the global scope, but it's being passed to various functions (and therefore various scopes). These scopes disintegrate each time the function returns. classInstance won't be GC'ed if it's been passed to another function/scope. But as soon as the last function that references classInstance returns, the object is marked for GC. Sure it might be a circular reference, but it's a reference that cannot be accessed from anywhere but its own scope. You can't really call that a closure, either: closures happen when there is some form of exposure to the outer scope, which is not happening in your example.
I'm rubbish at explaining stuff like this, but just to recap:
var foo = (function()
{
var a, b, c, d;
return function()
{
return a;
}
})();
The GC will deallocate the mem b,c and d reference: they've gone out of scope, there's no way to access them...
var foo = (function()
{
var a, b, c, d;
return function()
{
a.getB = function()
{
return b;
}
a.getSelf = function()
{
return a;//or return this;
}
return a;
}
})();
//some code
foo = new Date();//
In this case, b won't get GC'ed either, for obvious reasons. foo exposes a and b, where a is an object that contains a circular reference. Though as soon as foo = new Date(), foo loses any reference to a. Sure, a still references itself, but a is no longer exposed: it can reference whatever it bloody well likes. Most browsers won't care and will GC a and b. In fact, I've checked and Chrome, FF, and IE8 all GC the code above perfectly... no worries, then.
I'm not an expert on this issue, but I'm pretty sure the GC will not dispose of it. In fact, it probably never will, because you've created a circular reference between the event listener and the DOM node. To allow it to be garbage-collected, you should set one or both of those references (the event listener and/or rootNode) to undefined or null.
Still, I'd only worry about this if you're creating many of these classInstances or if it can be created several times over the page's lifetime. Otherwise it's an unnecessary optimization.