Getting the size of an Object usually consists of iterating and counting, or using Object.keys(obj).length, which is also O(n). If I switch to Map, can I assume Map.size runs in O(1)?
I'm pretty new to JavaScript, coming from the C++ world, and I was shocked that I couldn't find a standard that specifies the time complexity of all the functions provided by the language.
You can't rely on it except on implementations where you've examined the source code or proved it empirically. The specification shows that Map.prototype.size is a getter with looping logic:
get Map.prototype.size
Map.prototype.size is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
Let M be the this value.
If Type(M) is not Object, throw a TypeError exception.
If M does not have a [[MapData]] internal slot, throw a TypeError exception.
Let entries be the List that is the value of M's [[MapData]] internal slot.
Let count be 0.
For each Record {[[Key]], [[Value]]} p that is an element of entries
If p.[[Key]] is not empty, set count to count+1.
Return count.
But implementations are free to optimize provided the semantics of size are unchanged.
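If the constant-time behaviour matters to you, one option is to check it empirically on the engines you target. A rough sketch (sizes and iteration counts are arbitrary, and this is not a rigorous benchmark): time a large number of .size reads on maps of very different sizes and see whether the cost grows with the number of entries. It assumes performance.now() is available (browsers and modern Node).

function timeSize(n) {
  const m = new Map();
  for (let i = 0; i < n; i++) m.set(i, i);

  const start = performance.now();
  let total = 0;
  for (let i = 0; i < 1e6; i++) total += m.size; // read .size many times
  const elapsed = performance.now() - start;

  console.log(`n=${n}: ${elapsed.toFixed(1)}ms (checksum ${total})`);
}

[1e2, 1e4, 1e6].forEach(timeSize);

On engines that track the count internally, the timings should be roughly flat across the three map sizes.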
Related
Trying to examine the intricacies of JavaScript GC, I got deep into the weeds (that is, into the ECMAScript spec). I found that an object must not be collected as long as it is deemed "live". Liveness itself is defined as follows:
At any point during evaluation, a set of objects S is considered
live if either of the following conditions is met:
Any element in S is included in any agent's [[KeptAlive]] List.
There exists a valid future hypothetical WeakRef-oblivious execution with respect to S that observes the Object value of any
object in S.
An object is appended to the [[KeptAlive]] list once a WeakRef that (weakly) refers to it is created, and the list is emptied after the current synchronous job finishes.
However, as for WeakRef-oblivious execution, I fail to get my mind around what it is:
For some set of objects S, a hypothetical WeakRef-oblivious
execution with respect to S is an execution whereby the abstract
operation WeakRefDeref of a WeakRef whose referent is an element
of S always returns undefined.
WeakRefDeref of a WeakRef returns undefined when its referent has already been collected. Am I getting it right that this implies all objects that make up S should be collected? So the notion of a future hypothetical WeakRef-oblivious execution is that there is still an object, an element of S, which is not collected yet and is observed by some WeakRef.
It all still makes little sense to me. I would appreciate some examples.
Let's ignore the formalised, but incomplete, definitions. We find the actual meaning in the non-normative notes of that section.1
What is Liveness in JavaScript?
Liveness is the lower bound for guaranteeing which WeakRefs an engine must not empty (note 6). So live (sets of) objects are those that must not be garbage-collected because they still will be used by the program.
However, the liveness of a set of objects does not mean that all the objects in the set must be retained. It means that there are some objects in the set that still will be used by the program, and the live set (as a whole) must not be garbage-collected. This is because the definition is used in its negated form in the garbage collector Execution algorithm2: At any time, if a set of objects S is not live, an ECMAScript implementation may3 […] atomically [remove them]. In other words, if an implementation chooses a non-live set S in which to empty WeakRefs, it must empty WeakRefs for all objects in S simultaneously (note 2).
Looking at individual objects, we can say they are not live (garbage-collectable) if there is at least one non-live set containing them; and conversely we say that an individual object is live if every set of objects containing it is live (note 3). It's a bit weird as a "live set of objects" is basically defined as "a set of objects where any of them is live", however the individual liveness is always "with respect to the set S", i.e. whether these objects can be garbage-collected together.
1: This definitely appears to be the section with the highest notes-to-content ratio in the entire spec.
2: emphasis mine
3: From the first paragraph of the objectives: "This specification does not make any guarantees that any object will be garbage collected. Objects which are not live may be released after long periods of time, or never at all. For this reason, this specification uses the term "may" when describing behaviour triggered by garbage collection."
Now, let's try to understand the definition.
At any point during evaluation, a set of objects S is considered
live if either of the following conditions is met:
Any element in S is included in any agent's [[KeptAlive]] List.
There exists a valid future hypothetical WeakRef-oblivious execution
with respect to S that observes the Object value of any object in S.
The first condition is pretty clear. The [[KeptAlive]] list of an agent represents the list of objects to be kept alive until the end of the current Job. It is cleared after a synchronous run of execution ends, and the note on WeakRef.prototype.deref4 provides further insight into the intention: If [WeakRefDeref] returns a target Object that is not undefined, then this target object should not be garbage collected until the current execution of ECMAScript code has completed.
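A small sketch of what that note means in practice: once deref() has returned the target inside a job, a second deref() in the same callback must return it too, because the object sits on the [[KeptAlive]] list until that job completes.

{
  const w = new WeakRef({ answer: 42 });
  setInterval(() => {
    const target = w.deref();
    if (target !== undefined) {
      // target is now on the agent's [[KeptAlive]] list, so this second
      // dereference within the same job cannot observe a collected object.
      console.log(w.deref().answer);
    } else {
      console.log("referent was collected");
    }
  }, 1000);
}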
The second condition however, oh well. It is not well defined what "valid", "future execution" and "observing the Object value" mean. The intuition the second condition above intends to capture is that an object is live if its identity is observable via non-WeakRef means (note 2), aha. From my understanding, "an execution" is the execution of JavaScript code by an agent and the operations occurring during that. It is "valid" if it conforms to the ECMAScript specification. And it is "future" if it starts from the current state of the program.
An object's identity may be observed by observing a strict equality comparison between objects or observing the object being used as key in a Map (note 4), whereby I assume that the note only gives examples and "the Object value" means "identity". What seems to matter is whether the code does or does not care if the particular object is used, and all of that only if the result of the execution is observable (i.e. cannot be optimised away without altering the result/output of the program)5.
To determine liveness of objects by these means would require testing all possible future executions until the objects are no longer observable. Therefore, liveness as defined here is undecidable6. In practice, engines use conservative approximations such as reachability7 (note 6), but notice that research on more advanced garbage-collectors is under way.
Now for the interesting bit: what makes an execution "hypothetical WeakRef-oblivious with respect to a set of objects S"? It means an execution under the hypothesis that all WeakRefs to objects in S are already cleared8. We assume that during the future execution, the abstract operation WeakRefDeref of a WeakRef whose referent is an element of S always returns undefined (def), and then work back whether it still might observe an element of the set. If none of the objects can be observed after all weak references to them are cleared, they may be garbage-collected. Otherwise, S is considered live, the objects cannot be garbage-collected and the weak references to them must not be cleared.
4: See the whole note for an example. Interestingly, also the new WeakRef(obj) constructor adds obj to the [[KeptAlive]] list.
5: Unfortunately, "the notion of what constitutes an "observation" is intentionally left vague" according to this very interesting es-discourse thread.
6: While it appears to be useless to specify undecidable properties, it actually isn't. Specifying a worse approximation, e.g. said reachability, would preclude some optimisations that are possible in practice, even if it is impossible to implement a generic 100% optimiser. The case is similar for dead code elimination.
7: Specifying the concept of reachability would actually be much more complicated than describing liveness. See Note 5, which gives examples of structures where objects are reachable through internal slots and specification type fields but should be garbage-collected nonetheless.
8: See also issue 179 in the proposal and the corresponding PR for why sets of objects were introduced.
Example time!
It is hard for me to recognize how the liveness of several objects may affect each other.
WeakRef-obliviousness, together with liveness, capture[s the notion] that a WeakRef itself does not keep an object alive (note 1). This is pretty much the purpose of a WeakRef, but let's see an example anyway:
{
  const o = {};
  const w = new WeakRef(o);
  t = setInterval(() => {
    console.log(`Weak reference was ${w.deref() ? "kept" : "cleared"}.`);
  }, 1000);
}
(You can run this in the console, then force garbage collection, then clearInterval(t);)
[The second notion is] that cycles in liveness does not imply that an object is live (note 1). This one is a bit tougher to show, but see this example:
{
  const o = {};
  const w = new WeakRef(o);
  setTimeout(() => {
    console.log(w.deref() && w.deref() === o ? "kept" : "cleared");
  }, 1000);
}
Here, we clearly do observe the identity of o. So it must be alive? Only if the w that holds o is not cleared, as otherwise … === o is not evaluated. So the liveness of (the set containing) o depends on itself, with circular reasoning, and a clever garbage collector is actually allowed to collect it regardless of the closure.
To be concrete, if determining obj's liveness depends on determining the liveness of another WeakRef referent, obj2, obj2's liveness cannot assume obj's liveness, which would be circular reasoning (note 1). Let's try to make an example with two objects that depend on each other:
{
  const a = {}, b = {};
  const wa = new WeakRef(a), wb = new WeakRef(b);
  const lookup = new WeakMap([[a, "b kept"], [b, "a kept"]]);
  setTimeout(() => {
    console.log(wa.deref() ? lookup.get(b) : "a cleared");
    console.log(wb.deref() ? lookup.get(a) : "b cleared");
  }, 1000);
}
The WeakMap primarily serves as something that would observe the identity of the two objects. Here, if a is kept so wa.deref() would return it, b is observed; and if b is kept so wb.deref() would return it, a is observed. Their liveness depends on each other, but we must not do circular reasoning. A garbage-collector may clear both wa and wb at the same time, but not only one of them.
Chrome currently checks for reachability through the closure, so the above snippet doesn't work there, but we can remove those references by introducing a circular dependency between the objects:
{
  const a = {}, b = {};
  a.b = b; b.a = a;
  const wa = new WeakRef(a), wb = new WeakRef(b);
  const lookup = new WeakMap([[a, "b kept"], [b, "a kept"]]);
  t = setInterval(() => {
    console.log(wa.deref() ? lookup.get(wa.deref().b) : "a cleared");
    console.log(wb.deref() ? lookup.get(wb.deref().a) : "b cleared");
  }, 1000);
}
To me, note 2 (WeakRef-obliviousness is defined on sets of objects instead of individual objects to account for cycles. If it were defined on individual objects, then an object in a cycle will be considered live even though its Object value is only observed via WeakRefs of other objects in the cycle.) seems to say the exact same thing. The note was introduced to fix the definition of liveness to handle cycles; that issue also includes some interesting examples.
I recently started a new job working on a browser-based application and am still learning the finer points of JavaScript. I've got a question about what a JS interpreter following the spec should do with the following code snippet.
var node_list_foo = document.getElementsByName("bar");
// node_list_foo is now a node list with a length of, let's say, 3.
var element_from_item = node_list_foo.item(4);
// element_from_item is now null according to https://dom.spec.whatwg.org/#interface-nodelist
var element_from_index = node_list_foo[4];
// my browsers are setting element_from_index to undefined in this case, but what should it be?
https://dom.spec.whatwg.org/#interface-nodelist appears to defer defining the behavior of out of range indices to the definition of supported-property-indices, am I reading that right?
https://heycam.github.io/webidl/#dfn-supported-property-indices has this to say about using array style syntax to access objects with a getter. NodeList has a relevant getter - item(index).
If an indexed property getter was specified using an operation with an identifier, then the value returned when indexing the object with a given supported property index is the value that would be returned by invoking the operation, passing the index as its only argument. If the operation used to declare the indexed property getter did not have an identifier, then the interface definition must be accompanied by a description of how to determine the value of an indexed property for a given index.
If I'm reading that right, the spec says the behavior of the two access methods should be the same? So both should return undefined?
Curiously, https://developer.mozilla.org/en-US/docs/Web/API/NodeList, documents the observed behavior and not the behavior that I think the spec calls for.
What's the correct interpretation of the spec and is there another approach I should take to finding out how these kinds of things work?
Finally, if my interpretation of the spec is correct, nobody seems to actually follow the spec for array-style syntax, so as a paranoid and perfectionist programmer, should I be using nodelist.item(x) as the preferred approach?
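For reference, here is a minimal snippet that shows both access paths side by side, assuming a page that actually contains three elements with name="bar":

var node_list_foo = document.getElementsByName("bar"); // assume length is 3

console.log(node_list_foo.length);   // 3
console.log(node_list_foo.item(4));  // null (item() is declared to return Node?)
console.log(node_list_foo[4]);       // undefined in current browsers
console.log(4 in node_list_foo);     // false: 4 is not a supported property index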
While messing around with JavaScript I found that comparing an array element with undefined was very interesting. Considering:
L = [1, 2, 3];
if (L[1] == undefined)
  console.log('no element for key 1');
else
  console.log('Value for key 1: ' + L[1]);
I think that's an awesome way to check for values in sequences in JavaScript, instead of iterating over sequences or other containers, but my question is: is that error-prone or inefficient? What's the cost of such a comparison?
The code does not test if a particular value exists; it tests if an [Array] index was assigned a non-undefined value. (It will also incorrectly detect some false-positive values like null due to using ==, but that's for another question ..)
Consider this:
L = ["hello","world","bye"]
a = L["bye"]
b = L[1]
What is the value of a and what does it say about "bye"? What is the value of b and how does 1 relate to any of the values which may (or may not) exist as elements of L?
That is, iterating an Array (to find a value at an unknown index, or to perform an operation on multiple values) and accessing an element by a known index are two different operations and cannot generally be interchanged.
On the flip side, object properties can be used to achieve a similar (but useful) effect:
M = {hello: 1, world: 1, bye: 1}
c = M["hello"]
What is the value of c now? How does the value used as the key relate to the data?
In this case the property name (used as a lookup key) relates to the data being checked and can say something useful about it - yes, there is a "hello"! (This can detect some false-positives without using hasOwnProperty, but that's for another question ..)
And, of course .. for a small sequence, or an infrequent operation, iterating (or using a handy method like Array.indexOf or Array.some) to find existence of a value is "just fine" and will not result in a "performance impact".
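For completeness, a few idiomatic existence checks, sketched with arbitrary values:

var L = ["hello", "world", "bye"];

L.indexOf("bye") !== -1;                       // true (works everywhere)
L.includes("bye");                             // true (ES2016+)
L.some(function (x) { return x === "bye"; });  // true, with an arbitrary predicate

// versus index access, which answers a different question entirely:
L[1] !== undefined;                            // true, but only says that index 1 holds a value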
In V8, accessing an array out of bounds to elicit undefined can be unspeakably slow: if it happens in optimized code, for instance, the optimized code is thrown away and deoptimized. In other languages an exception would be thrown, which is also very slow, and in an unmanaged language your program would have undefined behavior and, if you're lucky, crash.
So always check the .length of the collection to ensure you don't do out of bounds accesses.
Also, for performance, prefer void 0 over undefined, as it is a compile-time constant rather than a runtime variable lookup.
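A minimal sketch combining both suggestions (the helper name is made up):

// Hypothetical helper: read an element only after checking the bounds.
function getItem(arr, i) {
  if (i >= 0 && i < arr.length) {
    return arr[i];
  }
  return void 0; // compile-time constant, per the advice above
}

getItem([1, 2, 3], 1); // 2
getItem([1, 2, 3], 5); // undefined, without an out-of-bounds read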
I ran this code and got the results below. I'm curious to know why [] is faster.
console.time('using []');
for (var i = 0; i < 200000; i++) { var arr = []; }
console.timeEnd('using []');

console.time('using new');
for (var i = 0; i < 200000; i++) { var arr = new Array; }
console.timeEnd('using new');
using []: 299ms
using new: 363ms
Thanks to Raynos, here is a benchmark of this code and some more possible ways to define a variable.
Further expanding on previous answers...
From a general compilers perspective and disregarding VM-specific optimizations:
First, we go through the lexical analysis phase where we tokenize the code.
By way of example, the following tokens may be produced:
[]: ARRAY_INIT
[1]: ARRAY_INIT (NUMBER)
[1, foo]: ARRAY_INIT (NUMBER, IDENTIFIER)
new Array: NEW, IDENTIFIER
new Array(): NEW, IDENTIFIER, CALL
new Array(5): NEW, IDENTIFIER, CALL (NUMBER)
new Array(5,4): NEW, IDENTIFIER, CALL (NUMBER, NUMBER)
new Array(5, foo): NEW, IDENTIFIER, CALL (NUMBER, IDENTIFIER)
Hopefully this should provide you a sufficient visualization so you can understand how much more (or less) processing is required.
Based on the above tokens, we know for a fact that ARRAY_INIT will always produce an array. We therefore simply create an array and populate it. As for ambiguity, the lexical analysis stage has already distinguished ARRAY_INIT from an object property accessor (e.g. obj[foo]) or brackets inside strings/regex literals (e.g. "foo[]bar" or /[]/).
This is minuscule, but we also have more tokens with new Array. Furthermore, it's not entirely clear yet that we simply want to create an array. We see the "new" token, but "new" what? We then see the IDENTIFIER token, which signifies we want a new "Array," but JavaScript VMs generally do not distinguish an IDENTIFIER token from tokens for "native global objects." Therefore...
We have to look up the scope chain each time we encounter an IDENTIFIER token. JavaScript VMs contain an "Activation object" for each execution context, which may contain the "arguments" object, locally defined variables, etc. If we cannot find it in the Activation object, we begin looking up the scope chain until we reach the global scope. If nothing is found, we throw a ReferenceError.
Once we've located the variable declaration, we invoke the constructor. new Array is an implicit function call, and the rule of thumb is that function calls are slower during execution (hence why static C/C++ compilers allow "function inlining" - which JS JIT engines such as SpiderMonkey have to do on-the-fly)
The Array constructor is overloaded. It is implemented as native code, so it provides some performance enhancements, but it still needs to check the arguments' length and act accordingly. Moreover, in the event only one argument is supplied, we need to further check the type of that argument: new Array("foo") produces ["foo"], whereas new Array(1) produces an array of length 1 with no elements (empty slots).
So to simplify it all: with array literals, the VM knows we want an array; with new Array, the VM needs to use extra CPU cycles to figure out what new Array actually does.
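A short illustration of that overload, which is exactly the argument checking described above:

console.log([3]);             // [3]: a literal with one element
console.log(new Array(3));    // an array of length 3 with empty slots, no elements
console.log(new Array("3"));  // ["3"]: a single non-number argument becomes an element
console.log(new Array(3, 4)); // [3, 4]: two or more arguments become elements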
One possible reason is that new Array requires a name lookup on Array (you can have a variable with that name in scope), whereas [] does not.
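That lookup is observable. A contrived sketch: if something in scope shadows Array, new Array resolves to the shadowing binding, while the literal is unaffected:

(function () {
  function Array() { return { fake: true }; } // shadows the global Array
  console.log(new Array()); // { fake: true }, found via the scope chain
  console.log([]);          // a real empty array; the literal never does the lookup
})();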
Good question.
The first example is called an array literal. It is the preferred way to create arrays among many developers. It could be that the performance difference is caused by checking the arguments of the new Array() call and then creating the object, while the literal creates an array directly.
The relatively small difference in performance supports this point I think. You could do the same test with the Object and object literal {} by the way.
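That test would look much like the one in the question (a sketch; numbers will vary by engine):

console.time('using {}');
for (var i = 0; i < 200000; i++) { var obj = {}; }
console.timeEnd('using {}');

console.time('using new Object');
for (var i = 0; i < 200000; i++) { var obj = new Object(); }
console.timeEnd('using new Object');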
Also, interestingly, if the length of the array is known in advance (elements will be added just after creation), using the array constructor with a specified length is much faster on recent Google Chrome 70+.
"new Array( %ARR_LENGTH% )" – 100% (faster)!
"[]" – 160-170% (slower)
The test can be found here - https://jsperf.com/small-arr-init-with-known-length-brackets-vs-new-array/2
Note: this result was tested on Google Chrome v.70+; in Firefox v.70 and IE both variants are almost equal.
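The two variants being compared are roughly these (a sketch; the length is whatever you know in advance):

var ARR_LENGTH = 1000; // assumed to be known ahead of time

// Variant 1: preallocate with the constructor, then fill by index.
var a = new Array(ARR_LENGTH);
for (var i = 0; i < ARR_LENGTH; i++) { a[i] = i; }

// Variant 2: start from a literal and grow as elements are added.
var b = [];
for (var j = 0; j < ARR_LENGTH; j++) { b[j] = j; }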
This would make some sense
Object literals enable us to write code that supports lots of features yet still make it relatively straightforward for the implementers of our code. No need to invoke constructors directly or maintain the correct order of arguments passed to functions, etc.
http://www.dyn-web.com/tutorials/obj_lit.php
In Javascript, if I do something like
var alpha = [];
alpha[1000000] = 2;
does this waste memory somehow? I remember reading something about JavaScript arrays still setting values for unspecified indices (maybe setting them to undefined?), but I think that may have had something to do with delete. I can't really remember.
See this topic: Are JavaScript arrays sparse?
In most implementations of JavaScript (probably all modern ones), arrays are sparse. That means no, it's not going to allocate memory up to the maximum index.
If it's anything like a Lua implementation there is actually an internal array and dictionary. Densely populated parts from the starting index will be stored in the array, sparse portions in the dictionary.
This is an old myth. The other indexes on the array will not be assigned.
When you assign a property name that is an "array index" (e.g. alpha[10] = 'foo', a name that represents an unsigned 32-bit integer) and it is greater than the current value of the length property of an Array object, two things will happen:
The "index named" property will be created on the object.
The length will be incremented to be that index + 1.
Proof of concept:
var alpha = [];
alpha[10] = 2;
alpha.hasOwnProperty(0); // false, the property doesn't exist
alpha.hasOwnProperty(9); // false
alpha.hasOwnProperty(10); // true, the property exists
alpha.length; // 11
As you can see, the hasOwnProperty method returns false when we test the presence of the 0 or 9 properties, because they don't physically exist on the object, whereas it returns true for 10, since that property was created.
This misconception probably comes from popular JS consoles, like Firebug, because when they detect that the object being printed is an array-like one, they will simply make a loop, showing each of the index values from 0 to length - 1.
For example, Firebug detects array-like objects simply by looking for a length property whose value is an unsigned 32-bit integer (less than 2^32 - 1) and a splice property that is a function:
console.log({length:3, splice:function(){}});
// Firebug will log: `[undefined, undefined, undefined]`
In the above case, Firebug will internally make a sequential loop to show each of the property values, but none of the indexes really exist, and showing [undefined, undefined, undefined] gives you the false sensation that those properties exist, or that they were "allocated", but that's not the case...
This has always been the case; it's specified even in the ECMAScript 1st Edition specification (from 1997), so you shouldn't worry about implementation differences.
About a year ago, I did some testing on how browsers handle arrays (obligatory self-promotional link to my blog post.) My testing was aimed more at CPU performance than at memory consumption, which is much harder to measure. The bottom line, though, was that every browser I tested with seemed to treat sparse arrays as hash tables. That is, unless you initialized the array from the get-go by putting values in consecutive indexes (starting from 0), the array would be implemented in a way that seemed to optimize for space.
So while there's no guarantee, I don't think that setting array[100000] will take any more room than setting array[1] -- unless you also set all the indexes leading up to those.
I don't think so, because JavaScript treats arrays kind of like dictionaries, but with integer keys.
alpha[1000000] is the same property as alpha["1000000"].
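A quick way to see this is to check which own properties the array actually has after such an assignment:

var alpha = [];
alpha[1000000] = 2;

console.log(alpha.length);                        // 1000001
console.log(Object.keys(alpha));                  // ["1000000"]: only one own property exists
console.log(alpha[1000000] === alpha["1000000"]); // true, same property
console.log(500000 in alpha);                     // false: intermediate indexes were never created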
I don't really know JavaScript, but it would be pretty odd behaviour if it DIDN'T allocate space for the entire array. Why would you think it wouldn't take up space? You're asking for a huge array. If it didn't give it to you, that would be a specific optimisation.
This obviously ignores OS optimisations such as memory overcommit and other kernel and implementation specifics.