Trying to examine the intricacies of JavaScript GC, I got deep into the weeds (that is, into the ECMAScript spec). I found that an object should not be collected as long as it is deemed "live". Liveness itself is defined as follows:
At any point during evaluation, a set of objects S is considered
live if either of the following conditions is met:
Any element in S is included in any agent's [[KeptAlive]] List.
There exists a valid future hypothetical WeakRef-oblivious execution with respect to S that observes the Object value of any
object in S.
An object is appended to the [[KeptAlive]] list once a WeakRef that (weakly) refers to it is created, and the list is emptied after the current synchronous job ceases.
However, as for WeakRef-oblivious execution, I can't wrap my head around what it is:
For some set of objects S, a hypothetical WeakRef-oblivious
execution with respect to S is an execution whereby the abstract
operation WeakRefDeref of a WeakRef whose referent is an element
of S always returns undefined.
WeakRefDeref of a WeakRef returns undefined when its referent has already been collected. Am I getting it right that the implication here is that all objects that make up S should be collected? So the notion of a future hypothetical WeakRef-oblivious execution is that there is still an object, an element of S, which is not collected yet and is observed by some WeakRef.
It all still makes little sense to me. I would appreciate some examples.
Let's ignore the formalised, but incomplete, definitions. We find the actual meaning in the non-normative notes of that section.1
What is Liveness in JavaScript?
Liveness is the lower bound for guaranteeing which WeakRefs an engine must not empty (note 6). So live (sets of) objects are those that must not be garbage-collected because they will still be used by the program.
However, the liveness of a set of objects does not mean that all the objects in the set must be retained. It means that there are some objects in the set that will still be used by the program, and the live set (as a whole) must not be garbage-collected. This is because the definition is used in its negated form in the garbage collector Execution algorithm2: At any time, if a set of objects S is not live, an ECMAScript implementation may3 […] atomically [remove them]. In other words, if an implementation chooses a non-live set S in which to empty WeakRefs, it must empty WeakRefs for all objects in S simultaneously (note 2).
Looking at individual objects, we can say they are not live (garbage-collectable) if there is at least one non-live set containing them; conversely, an individual object is live if every set of objects containing it is live (note 3). It's a bit weird, as a "live set of objects" is basically defined as "a set of objects where any of them is live"; however, individual liveness is always "with respect to the set S", i.e. it is about whether these objects can be garbage-collected together.
1: This definitely appears to be the section with the highest notes-to-content ratio in the entire spec.
2: emphasis mine
3: From the first paragraph of the objectives: "This specification does not make any guarantees that any object will be garbage collected. Objects which are not live may be released after long periods of time, or never at all. For this reason, this specification uses the term "may" when describing behaviour triggered by garbage collection."
Now, let's try to understand the definition.
At any point during evaluation, a set of objects S is considered
live if either of the following conditions is met:
Any element in S is included in any agent's [[KeptAlive]] List.
There exists a valid future hypothetical WeakRef-oblivious execution
with respect to S that observes the Object value of any object in S.
The first condition is pretty clear. The [[KeptAlive]] list of an agent represents the list of objects to be kept alive until the end of the current Job. It is cleared after a synchronous run of execution ends, and the note on WeakRef.prototype.deref4 provides further insight into the intention: If [WeakRefDeref] returns a target Object that is not undefined, then this target object should not be garbage collected until the current execution of ECMAScript code has completed.
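For instance (a minimal sketch; the variable names are mine), once deref() has returned the target, a second deref() in the same job cannot come back empty:
let someObj = { data: 1 };
const w = new WeakRef(someObj);
someObj = null; // drop the only strong reference
if (w.deref() !== undefined) {
  // deref() put the referent on the agent's [[KeptAlive]] list, so it stays
  // alive at least until the current synchronous job completes:
  console.log(w.deref()); // never undefined at this point of the same job
}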
The second condition, however, oh well. It is not well defined what "valid", "future execution" and "observing the Object value" mean. The intuition the second condition above intends to capture is that an object is live if its identity is observable via non-WeakRef means (note 2), aha. From my understanding, "an execution" is the execution of JavaScript code by an agent and the operations occurring during it. It is "valid" if it conforms to the ECMAScript specification. And it is "future" if it starts from the current state of the program.
An object's identity may be observed by observing a strict equality comparison between objects or observing the object being used as a key in a Map (note 4), whereby I assume that the note only gives examples and "the Object value" means "identity". What seems to matter is whether the code does or does not care if the particular object is used, and all of that only if the result of the execution is observable (i.e. cannot be optimised away without altering the result/output of the program)5.
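A minimal sketch of observing identity (the names are mine):
const a = {};
const map = new Map([[a, "found"]]);
console.log(map.get(a));         // "found": only this exact object matches
console.log(map.get({}));        // undefined: a look-alike has a different identity
console.log(a === a, {} === {}); // true false: strict equality observes identity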
Determining the liveness of objects by these means would require testing all possible future executions until the objects are no longer observable. Therefore, liveness as defined here is undecidable6. In practice, engines use conservative approximations such as reachability7 (note 6), but note that research on more advanced garbage collectors is under way.
Now for the interesting bit: what makes an execution "hypothetical WeakRef-oblivious with respect to a set of objects S"? It means an execution under the hypothesis that all WeakRefs to objects in S are already cleared8. We assume that during the future execution, the abstract operation WeakRefDeref of a WeakRef whose referent is an element of S always returns undefined (def), and then work back whether it still might observe an element of the set. If none of the objects can be observed after all weak references to them are cleared, they may be garbage-collected. Otherwise, S is considered live, the objects cannot be garbage-collected and the weak references to them must not be cleared.
4: See the whole note for an example. Interestingly, the new WeakRef(obj) constructor also adds obj to the [[KeptAlive]] list.
5: Unfortunately, "the notion of what constitutes an "observation" is intentionally left vague" according to this very interesting es-discourse thread.
6: While it appears to be useless to specify undecidable properties, it actually isn't. Specifying a worse approximation, e.g. said reachability, would preclude some optimisations that are possible in practice, even if it is impossible to implement a generic 100% optimiser. The case is similar for dead code elimination.
7: Specifying the concept of reachability would actually be much more complicated than describing liveness. See Note 5, which gives examples of structures where objects are reachable through internal slots and specification type fields but should be garbage-collected nonetheless.
8: See also issue 179 in the proposal and the corresponding PR for why sets of objects were introduced.
Example time!
It is hard for me to see how the liveness of several objects can affect each other.
WeakRef-obliviousness, together with liveness, capture[s the notion] that a WeakRef itself does not keep an object alive (note 1). This is pretty much the purpose of a WeakRef, but let's see an example anyway:
{
  const o = {};
  const w = new WeakRef(o);
  t = setInterval(() => {
    console.log(`Weak reference was ${w.deref() ? "kept" : "cleared"}.`);
  }, 1000);
}
(You can run this in the console, then force garbage collection, then clearInterval(t);)
[The second notion is] that cycles in liveness does not imply that an object is live (note 1). This one is a bit tougher to show, but see this example:
{
  const o = {};
  const w = new WeakRef(o);
  setTimeout(() => {
    console.log(w.deref() && w.deref() === o ? "kept" : "cleared");
  }, 1000);
}
Here, we clearly do observe the identity of o. So it must be live? Only if the w that holds o is not cleared, as otherwise … === o is not evaluated. So the liveness of (the set containing) o depends on itself, which is circular reasoning, and a clever garbage collector is actually allowed to collect it regardless of the closure.
To be concrete, if determining obj's liveness depends on determining the liveness of another WeakRef referent, obj2, obj2's liveness cannot assume obj's liveness, which would be circular reasoning (note 1). Let's try to make an example with two objects that depend on each other:
{
  const a = {}, b = {};
  const wa = new WeakRef(a), wb = new WeakRef(b);
  const lookup = new WeakMap([[a, "b kept"], [b, "a kept"]]);
  setTimeout(() => {
    console.log(wa.deref() ? lookup.get(b) : "a cleared");
    console.log(wb.deref() ? lookup.get(a) : "b cleared");
  }, 1000);
}
The WeakMap primarily serves as something that would observe the identity of the two objects. Here, if a is kept, so that wa.deref() returns it, b is observed; and if b is kept, so that wb.deref() returns it, a is observed. Their liveness depends on each other, but we must not use circular reasoning. A garbage collector may clear both wa and wb at the same time, but not only one of them.
Chrome currently checks for reachability through the closure, so the above snippet doesn't work; but we can remove those references by introducing a circular dependency between the objects:
{
  const a = {}, b = {};
  a.b = b; b.a = a;
  const wa = new WeakRef(a), wb = new WeakRef(b);
  const lookup = new WeakMap([[a, "b kept"], [b, "a kept"]]);
  t = setInterval(() => {
    console.log(wa.deref() ? lookup.get(wa.deref().b) : "a cleared");
    console.log(wb.deref() ? lookup.get(wb.deref().a) : "b cleared");
  }, 1000);
}
To me, note 2 (WeakRef-obliviousness is defined on sets of objects instead of individual objects to account for cycles. If it were defined on individual objects, then an object in a cycle will be considered live even though its Object value is only observed via WeakRefs of other objects in the cycle.) seems to say the exact same thing. The note was introduced to fix the definition of liveness to handle cycles; that issue also includes some interesting examples.
Related
The Question
With FinalizationRegistry, it's possible to get notified after an object has been garbage collected. However, is it possible to get notified before, so I still have access to the data and can do something with it?
What I'm trying to achieve
I want to implement a CompressedMap<K, V> where data is internally stored either deflated in a Map<K, Buffer> or inflated in a Map<K, WeakRef<V>>. It's up to the user to define the deflate and inflate functions.
As with a classic Map<K, V>, if the user holds a reference to a value present in the map and updates it, it should also be automatically updated in the map (because it's the same object). That's why I need to keep the values in a Map<K, WeakRef<V>> and compress and move them to the Map<K, Buffer> only when they're about to be garbage collected.
What I've already considered
SO question: Can I get a callback when my object is about to get collected by GC in Node?
The accepted answer shows how to use FinalizationRegistry, which fires a callback AFTER the object has been garbage collected and is no longer available.
Moving the value to the deflated map after each modification
It would require wrapping each field of the object in a getter/setter, and it has a lot of implications:
It's more computationally intensive to update the deflated map after EACH modification.
Modifications on new fields (not wrapped in a getter/setter) would be ignored.
Wrapping each field of each object could have a big memory impact on a large map, which would defeat the purpose of a "compressed map".
It would modify the user's objects.
It raises the question of where the boundary of the object is. Maybe we should wrap all the fields, even deep ones, maybe not. It depends on the user's use case.
Writing a Node.js addon and using Node-API
I didn't dig deeply into it, but it would be a last-resort solution, because my implementation would only be compatible with Node.js. Even though I'm focused on Node.js, browser support would be nice to have. Also, I've never written a Node.js addon, and I'm not even sure it would allow me to implement a PreFinalizationRegistry.
References
FinalizationRegistry
Map
WeakRef
Developers shouldn't rely on cleanup callbacks for essential program logic. Cleanup callbacks may be useful for reducing memory usage across the course of a program, but are unlikely to be useful otherwise.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/FinalizationRegistry#notes_on_cleanup_callbacks
Instead of waiting until the object is finalized, I would recommend using setTimeout() to deflate it when it hasn't been used for a certain period of time.
To do so, you'll want to return an object that behaves like a Map<K, WeakRef<V>> and wraps the actual map, instead of returning the map itself. This way you can start throwing exceptions if it is used after the timeout.
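Here is a minimal sketch of that idea (all names are mine; deflate and inflate are the user-supplied functions from the question; instead of throwing after the timeout, this variant re-inflates stale entries on demand):
class CompressedMap {
  constructor(deflate, inflate, ttlMs = 60000) {
    this.deflate = deflate;
    this.inflate = inflate;
    this.ttlMs = ttlMs;
    this.live = new Map();     // K -> { value, timer } while recently used
    this.deflated = new Map(); // K -> deflated payload, strongly held
  }
  set(key, value) {
    this._arm(key, value);
    this.deflated.delete(key);
    return this;
  }
  get(key) {
    const entry = this.live.get(key);
    if (entry) {
      this._arm(key, entry.value); // accessing resets the inactivity timer
      return entry.value;
    }
    if (this.deflated.has(key)) {  // idle too long: re-inflate on demand
      const value = this.inflate(this.deflated.get(key));
      this.set(key, value);
      return value;
    }
    return undefined;
  }
  _arm(key, value) {
    const old = this.live.get(key);
    if (old) clearTimeout(old.timer);
    const timer = setTimeout(() => { // deflate after ttlMs without access
      this.deflated.set(key, this.deflate(value));
      this.live.delete(key);
    }, this.ttlMs);
    this.live.set(key, { value, timer });
  }
}
Note that the caveat from the question still applies: while an entry sits deflated, a user-held reference to the old value and the copy in the map can diverge.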
Imagine an unreferenced object {a: 1, b: {c: 2}} for which there is still a reference to its b subobject. If the object was compressed and then garbage-collected, the holder of the reference to b might say ref_to_b.c = 3, and this change would not automatically be reflected in the compressed version. So when the compressed version is later re-inflated, it still has b.c = 2.
This means that you can only compress those members of an object that are not themselves objects, that is, the primitive-valued members. And this could be done with a setter whenever such a value is changed. The deflated values would be kept with strong references, so that an object can always be recreated from them if only its key is known, even if its earlier incarnation has been garbage-collected.
class DeflatableObject {
  static deflated = {
    primitive: new Map(),
    subobject: new Map()
  };

  static recreate(key) {
    var obj = new DeflatableObject();
    obj.key = key;
    obj._primitive = inflate(DeflatableObject.deflated.primitive.get(key));
    var subobj = DeflatableObject.deflated.subobject.get(key);
    if (subobj)
      obj._subobject = subobj.object.deref() || DeflatableSubobject.recreate(subobj.key);
    return obj;
  }

  set primitive(value) {
    this._primitive = value;
    DeflatableObject.deflated.primitive.set(this.key, deflate(value));
  }
  get primitive() {
    return this._primitive;
  }

  set subobject(value) {
    this._subobject = value;
    DeflatableObject.deflated.subobject.set(this.key, {
      object: new WeakRef(value),
      key: value.key
    });
  }
  get subobject() {
    return this._subobject;
  }
}
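Hypothetical usage of the above, assuming deflate/inflate round-trip primitive values and that a DeflatableSubobject class is built along the same lines:
const obj = new DeflatableObject();
obj.key = 42;
obj.primitive = "hello"; // kept live on obj, and mirrored deflated into the static map
// ...later, even if the original instance has been garbage-collected:
const again = DeflatableObject.recreate(42);
console.log(again.primitive); // "hello", rebuilt from the deflated copy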
I have been grinding LeetCode and I encountered this question https://leetcode.com/problems/intersection-of-two-linked-lists/ where it asks you to find the intersection of two linked lists. One solution (not the best one, I know) is to use a hash set to keep track of the first linked list while traversing it, and then traverse the second list. When we find a duplicate node, that is the intersection.
For example, E is the intersection
A -> B
      \
       E -> F
      /
C -> D
The way I solved it is to use a WeakSet as the hash set to store the reference of the first linked list.
Here is the code
var getIntersectionNode = function(headA, headB) {
  let hashSet = new WeakSet();
  while (headA) {
    hashSet.add(headA);
    headA = headA.next;
  }
  while (headB) {
    if (hashSet.has(headB)) return headB;
    headB = headB.next;
  }
  return null;
};
My question is this: WeakSet has the nice feature that when no other references to an object stored in the WeakSet exist, the object can be garbage collected. If we go back to the example here, while iterating through A -> B -> E -> F we add every node to the hash set, but we don't preserve a reference to every node (headA = headA.next). So after I add one node to the hash set and advance to the next one, the reference to the previous node is gone, and it should be garbage collected from the hash set, right? Then how come the solution passes?
For example, when we are at A, we store A in the hash set and advance to B; now there is no way to reference back to A, so with a WeakSet it should have been garbage collected. But clearly, if that were the case, the solution wouldn't work. Can someone point out where my understanding is wrong?
There are a couple of issues here:
The original objects passed into getIntersectionNode from the caller will still exist at least until the function finishes. If you do
someFn({ foo: 'bar' })
The object won't get garbage collected until synchronous JS processing has finished; the GC only runs once JS is idle, and even then it'll often take a few seconds. If you added an element to a WeakSet and were somehow able to observe exactly when it gets removed due to there no longer being any references to it, you would see that it takes some time.
Even then, even if unreferenceable objects were GC'd immediately, in this case, all that's needed is for the one intersection node to remain referenceable. If there's an intersection, that intersection node will be a child of headA somewhere, and that node will also exist somewhere nested inside headB; a reference still exists to it inside headB even after iterating through headA.
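To make this concrete, here is a sketch with plain object literals standing in for list nodes, reusing getIntersectionNode from above:
// E -> F is the shared tail; both lists strongly reference it.
const e = { val: 'E', next: { val: 'F', next: null } };
const headA = { val: 'A', next: { val: 'B', next: e } };
const headB = { val: 'C', next: { val: 'D', next: e } };
// Even after the first loop has walked headA to its end, e is still
// reachable through headB, so its WeakSet entry cannot be cleared.
console.log(getIntersectionNode(headA, headB) === e); // true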
Unless your script carries out asynchronous tasks (like wait for user input, or a setTimeout), there's no benefit to using a WeakSet over a Set (or a WeakMap over a Map), since the garbage collector won't run in time for it to be of any use.
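For completeness, a sketch of where a WeakSet does pay off (the names are mine): across asynchronous gaps the GC gets a chance to run, and WeakSet entries do not pin their objects the way a Set would:
const seen = new WeakSet();
function process(node) {
  if (seen.has(node)) return; // skip nodes handled earlier
  seen.add(node);
  // ... do some work with node ...
}
// In a long-running app, nodes the rest of the program has dropped can be
// collected even though they were added to seen; a plain Set would keep
// them alive for the lifetime of the set.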
I came across the question How to break on reduce, which was not tagged functional, yet contained a lot of discussion regarding the mutation of the array being a functional no-no.
The main answer mutated the array to break out of the iterator early, but the array could easily be restored to its original state by pushing back the spliced items, a somewhat dubious solution and arguably not at all functional.
However, many algorithms gain a significant advantage if items can be modified in place (mutated).
In regard to JavaScript (single-threaded (no workers), and no proxies), is it considered mutation if the modification only exists temporarily? Or is mutation only a side effect after the function has returned?
Is the following function a mutator?
function mutateAndRepair(arr) { // arr is an array of numbers
  arr[0]++;
  arr[0]--;
}
The array contains 1 or more items.
The array's first item (index 0) is a number within the max safe integer range.
The array is not a shared buffer
The array is not being watched by a proxy
I consider this as not mutating, as the mutation only exists while the function is executing, and since JS is blocking, no other code will ever be able to see the mutation; hence there are no side effects.
Considering the constraints, does this comply with the common functional paradigm used by JavaScript coders?
The ++ and -- operators are mutating and they do not exactly reverse each other. Quoting the 2017 standard:
12.4.4.1 Runtime Semantics: Evaluation
UpdateExpression : LeftHandSideExpression ++
Let lhs be the result of evaluating LeftHandSideExpression.
Let oldValue be ? ToNumber(? GetValue(lhs)).
Let newValue be the result of adding the value 1 to oldValue, using the same rules as for the + operator (see 12.8.5).
Perform ? PutValue(lhs, newValue).
Return oldValue.
It's that second step that's important: it converts the value to a number primitive, and there's a subtle difference between a number primitive and a Number object as returned by the Number constructor.
var arr = [new Number(1234)];

function mutateAndRepair(arr) {
  console.log(`the value before is ${arr[0]}`);
  arr[0]++;
  arr[0]--;
  console.log(`the value after is ${arr[0]}`);
}

arr[0].foo = 'bar';
console.log(`foo before is ${arr[0].foo}`);
mutateAndRepair(arr);
console.log(`foo after is ${arr[0].foo}`);
Now, I'm being a little cheeky here by loosely interpreting your requirement that the first item of arr is a "number". And for sure, you can add another stipulation that the values of arr must be "number primitives" to exclude this exact form of mutation.
How about another, more subtle point. -0 and 0 are treated as the same value in virtually all ways except Object.is:
var arr = [-0];

function mutateAndRepair(arr) {
  console.log(`the value before is ${arr[0]}`);
  arr[0]++;
  arr[0]--;
  console.log(`the value after is ${arr[0]}`);
}

console.log(`is zero before ${Object.is(0, arr[0])}`);
mutateAndRepair(arr);
console.log(`is zero after ${Object.is(0, arr[0])}`);
Okay, you can add a requirement that the first item of arr is not -0. But all of that kind of misses the point. You could argue that virtually any method is non-mutating if you simply declare that you're going to ignore any case in which mutation would be observed.
Considering the constraints, does this comply with the common functional paradigm used by JavaScript coders?
I would not consider this code to follow functional coding principles, and would perhaps even reject it in a code review if that were a goal of the project. It's not even so much about the nitty-gritty of how or whether immutability is assured by all code paths, but the fact that it depends upon mutation internally that makes this code non-functional in my view. I've seen a number of bugs arise in pseudo-functional code where an exception occurs between the mutate and repair steps, which of course leads to clear and unexpected side-effects, and even if you have a catch/finally block to try to restore the state, an exception could also occur there. This is perhaps just my opinion, but I think of immutability as a part of a larger functional style, rather than just a technical feature of a given function.
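To illustrate the hazard (a sketch; process stands in for any callback that may throw):
function mapFirst(arr, process) {
  arr[0]++;                    // mutate
  const result = process(arr);
  arr[0]--;                    // repair; never runs if process throws
  return result;
}

const arr = [1];
try {
  mapFirst(arr, () => { throw new Error("oops"); });
} catch (e) {
  console.log(arr[0]); // 2: the "temporary" mutation has leaked
}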
Getting the size of an Object usually consists of iterating and counting, or Object.keys(obj).length, which is also O(n). If I switch to Map, can I assume Map.size runs in O(1)?
I'm pretty new to JavaScript, coming from the C++ world, and I was shocked that I couldn't find a standard that specifies the time complexity of all functions provided by the language.
You can't rely on it except on implementations where you've examined the source code or proved it empirically. The specification shows that Map.prototype.size is a getter with looping logic:
get Map.prototype.size
Map.prototype.size is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
Let M be the this value.
If Type(M) is not Object, throw a TypeError exception.
If M does not have a [[MapData]] internal slot, throw a TypeError exception.
Let entries be the List that is the value of M's [[MapData]] internal slot.
Let count be 0.
For each Record {[[Key]], [[Value]]} p that is an element of entries,
    If p.[[Key]] is not empty, set count to count+1.
Return count.
But implementations are free to optimize provided the semantics of size are unchanged.
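One way to check empirically on a given engine (a rough micro-benchmark sketch, not a proof):
// If size were O(n), the timings would grow with the length of the map.
for (const n of [1e4, 1e5, 1e6]) {
  const m = new Map(Array.from({ length: n }, (_, i) => [i, i]));
  let total = 0; // accumulate so the reads can't be optimized away
  console.time(`size with ${n} entries`);
  for (let i = 0; i < 1e6; i++) total += m.size;
  console.timeEnd(`size with ${n} entries`);
  console.log(total);
}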
Objects in JavaScript have unique identities. Every object you create via an expression such as a constructor or a literal is considered different from every other object.
What is the reason behind this?
{} === {} // output: false
For what reason are they treated differently? What makes them different from each other?
{} creates a new object.
When you try and compare two separate new objects (references), they will never be equal.
Laying it out:
var a = {}; // New object, new reference in memory, stored in `a`
var b = {}; // New object, new reference in memory, stored in `b`
a === b; // Compares (different) references in memory
If it helps, {} is a "shortcut" for new Object(), so more explicitly:
var a = new Object();
var b = new Object();
a === b; // Still false
Maybe the explicitness of new makes it clearer that the comparison is between two different objects.
On the other hand, references can be equal, if they point to the same object. For example:
var a = {};
var b = a;
a === b; // TRUE
They are different instances of objects, and can be modified independently. Even if they (currently) look alike, they are not the same. Comparing them by their (property) values can sometimes be useful, but in stateful programming languages object equality is usually based on identity.
The fact that they're different is important in this scenario:
a = {};
b = {};
a.some_prop = 3;
At this point you'll obviously know that b.some_prop will be undefined.
The == or === operators thus allow you to check whether two variables refer to the same object, so you can be sure that you're not changing some object's properties that you don't want changed.
This question is quite old, but I think the actual solution does not pop out clearly enough in the given answers, so far.
For what reason they are treated differently? What makes them
different to each other?
I understand your pain; many sources on the internet do not come straight to the fact:
Object variables (the complex JS types: objects, arrays and functions) store only references (the addresses of the instances in memory) as their value. Object identity is recognized by reference identity.
You expected something like an ID or reference inside the object, which you could use to tell them apart (maybe that's actually done transparently, under the hood). But every time you instantiate an object, a new instance is created in memory and only the reference to it is stored in the variable.
So, when the description of the ===-operator says that it compares the values, it actually means it compares the references (not the properties and their values), which are only equal if they point to the exactly same object.
This article explains it in detail: https://codeburst.io/explaining-value-vs-reference-in-javascript-647a975e12a0
Both of the objects are created as separate entities in memory. To be precise, both are created as separate entities on the heap (JavaScript engines use heap and stack memory models for managing running scripts). So both objects may look the same (structure, properties etc.), but under the hood they have two separate addresses in memory.
Here is some intuition for you. Imagine a new neighborhood where all houses look the same. You've decided to build another two identical buildings; after finishing the construction, both buildings look the same, and they even "sit" next to each other, but still they are not the same building. They have two separate addresses.
I think that the simplest answer is "they are stored in different locations in memory". Although it is not always clear in languages that hide pointers (if you know C, C++ or assembly language, you know what pointers are; if not, it is useful to learn a low-level language), each "object" is actually a pointer to a location in memory where the object exists. In some cases, two variables will point to the same location in memory. In others, they will point to different locations that happen to have similar or identical content. It's like having two different URLs, each of which points to an identical page. The web pages are equal to each other, but the URLs are not.