I was looking at this V8 design doc, which has a section on Constant Pool Entries.
It says:
Constant pools are used to store heap objects and small integers that are referenced as constants in generated bytecode.
and
... Small integers and the strong referenced oddball type’s have bytecodes to load them directly and do not go into the constant pool.
So I am confused: are small integers pooled or not?
My understanding is that it is not worth pooling small integers when sizeof(int) < sizeof(int *), because it is cheaper to copy the integer itself than to copy a pointer into the constant pool and then dereference it. Also, variables that hold integers can be optimised to live directly in CPU registers, skipping a memory allocation entirely.
Also, are they located on the V8 heap or on the stack? My understanding had always been that smis are immediate values allocated on the stack, rather than a pointer plus an integer allocated on the heap. Moreover, if you take a heap snapshot with Chrome DevTools, you cannot find smis in the snapshot; only heap numbers, such as big integers or doubles like 3.14, appear. That was my mental model until I saw this article: https://v8.dev/blog/pointer-compression#value-tagging-in-v8
JavaScript values in V8 are represented as objects and allocated on the V8 heap, no matter if they are objects, arrays, numbers or strings. This allows us to represent any value as a pointer to an object.
Now I am just baffled - are smis also allocated on the heap?
V8 developer here.
are small integers pooled or not?
They are not (at least not right now). That said, this is a small implementation detail and could be done either way: it would totally be possible to use the constant pool for Smis. I suppose the decision to build special machinery for Smis (instead of reusing the general-purpose constant pool) was made because things turned out to be more efficient that way.
it is not worth it pooling small integers if sizeof(int) < sizeof(int *)
The details are different (a Smi is not an int, and constant pool slots are referenced by index rather than C++ pointer), but this reasoning does go in the right direction: avoiding indirections can save time and memory.
are smis also allocated on the heap?
Yes, everything is allocated on the heap. The stack is only useful for temporary (and sufficiently small) things; that's largely unrelated to the type of thing.
The "trick" of Smis is that they're not stored as separate objects: when you have an object that refers to a Smi, such as let foo = {smi: 42}, then the value 42 can be smi-encoded and stored directly inside the "foo" object (whereas if the value was 42.5, then the object would store a pointer to a separate "HeapNumber"). But since the object is on the heap, so is the Smi.
@DanielCruz:
What I understand [...] is that constant small integers are pooled. Variable small integers are not.
Nope. Any literal that occurs in source code is "constant". Whether you use let or const for your variables has nothing to do with this.
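For illustration, both of these declarations contain the same integer literal; the keyword only affects whether the binding can be reassigned:

let a = 42;   // the literal 42 is a constant in the generated bytecode here...
const b = 42; // ...and here; let vs. const changes nothing about how 42 is stored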
Related
I have found explanations that primitives and references get stored directly on the stack (static memory - size of the values is known), whereas objects, functions, etc get allocated on the heap (dynamic memory - to be able to grow).
Source:
https://felixgerschau.com/javascript-memory-management/
Now I've read a few articles where the wording suggests that everything in JS is accessed by reference.
https://daveceddia.com/javascript-references/
So this would mean that primitives are also stored as reference. Is any value stored directly on the stack after all? Another indication is that if you write something like
// no prior variable definition
console.log(a);
// ReferenceError: a is not defined
it will actually give you a ReferenceError, although it could be any type (including primitives).
So, it seems to me like everything in JS is a reference. Is that correct? If yes - where is a referenced primitive value stored? On the heap? On the stack (as it is a primitive)? Can a reference point to the stack?
I want to share the results of my research in case someone else finds it interesting as well.
While the ECMA specification does not seem to be specific about this (it is also hard to read from a user's point of view), there is plenty of information on how V8 (Chromium / Node.js) handles the matter:
It basically puts everything on the heap and references it with a pointer, except for small integers. Small integers are baked into the pointer itself with a technique called pointer tagging (the value is encoded in the pointer's payload bits, with a tag in the low bits distinguishing it from a real pointer).
Here's what the V8 developer blog says about this topic:
https://v8.dev/blog/pointer-compression#value-tagging-in-v8
JavaScript values in V8 are represented as objects and allocated on the V8 heap, no matter if they are objects, arrays, numbers or strings. This allows us to represent any value as a pointer to an object.
Many JavaScript programs perform calculations on integer values, such as incrementing an index in a loop. To avoid us having to allocate a new number object each time an integer is incremented, V8 uses the well-known pointer tagging technique to store additional or alternative data in V8 heap pointers.
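A minimal sketch of what that tagging looks like, modelled on the classic 32-bit layout the blog post describes (a smi carries a 0 in its lowest bit, a heap pointer a 1; the exact layout differs on 64-bit builds and with pointer compression):

// Model of 32-bit V8 value tagging, for illustration only.
function tagSmi(n)   { return n << 1; }         // 31-bit payload, low bit = 0
function isSmi(v)    { return (v & 1) === 0; }  // low bit distinguishes smi from pointer
function untagSmi(v) { return v >> 1; }         // arithmetic shift preserves the sign

console.log(isSmi(tagSmi(42)));    // true
console.log(untagSmi(tagSmi(-7))); // -7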
I'm working with (network) graphs in the browser and was wondering if I can significantly reduce memory usage by changing the way I represent them.
I was thinking of changing the representation of the edges from:
{from: string, to: string, weight: number, bi: boolean}
To:
[string, string, number, boolean]
I recognize that the serialized Object sizes and Array sizes for this particular use case are not that different, particularly if I shorten my keys. But, I'm curious if there is a significant difference between the size of Objects and Arrays on the heap in most browsers. If I'm most concerned about browsers running V8, is there a way to test this in Node?
Note: without the graph portion of my application, my JS heap was already a little large (15 MB for my logic + 50 MB for a couple of frames). So there is a bit of justification beyond my curiosity for trying something like this out.
(V8 developer here.)
What I'd expect is that the objects are slightly smaller than the arrays in this case. I'd also expect that the difference only matters if you have a lot of objects.
The reason is V8's "hidden classes" system, where the names of the properties are stored only once and shared by all objects with the same "shape". Each object therefore needs 3 pointers (object header) + 4 pointers for the properties; with each pointer being 4 bytes, that's a total of 7 * 4 = 28 bytes. The length of the property names does not affect the per-object memory requirements. Arrays have one more property (.length), and their elements are stored in a separate backing store, so the total memory consumption for each array should be 3 (object header) + 1 (length) + 2 (backing store header) + 4 (elements) = 10 pointers, or 40 bytes.
In a simpler JavaScript engine where all objects are implemented as dictionaries, arrays might indeed save some memory, because each object would store the property names, at least as a pointer to a shared string -- unless, of course, arrays are also implemented as dictionaries in such an engine, in which case they'd again be a bit bigger because they have the additional length property.
Depending on what else you do with your objects, you could make V8 migrate them to dictionary mode (because hidden classes have many benefits and also some disadvantages in some cases), but it's more likely you'll encounter the situation described above.
So, as always for performance and memory questions, the only way to be sure is to implement both approaches in your real app (not in a simplified microbenchmark) and measure the impact. If you can't measure a difference, then there is no difference to worry about.
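To answer the "is there a way to test this in Node?" part of the question: here is a rough measurement sketch, assuming you run Node with --expose-gc so that global.gc exists; absolute numbers will vary with the V8 version and with pointer compression:

// Run with: node --expose-gc measure.js
function bytesPerItem(makeItem, count = 1_000_000) {
  global.gc();
  const before = process.memoryUsage().heapUsed;
  const items = new Array(count);
  for (let i = 0; i < count; i++) items[i] = makeItem(i);
  global.gc();
  const after = process.memoryUsage().heapUsed;
  console.log(((after - before) / count).toFixed(1), "bytes per item");
  return items; // keep a reference so the GC can't reclaim what we measured
}

const objs = bytesPerItem(i => ({ from: "a", to: "b", weight: i, bi: true }));
const arrs = bytesPerItem(i => ["a", "b", i, true]);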
This question comes out of curiosity, but is there a limit to how many nested Objects a given Object can have (or similarly, since typeof [] === "object", how many Arrays can be nested in an Array) in JS?
I thought about creating an array that could have thousands of nested arrays, with thousands of numbers inside them as well. Setting aside possible best practice issues, I was wondering whether I could run into a cap at some point (that is, supposing it's not just a problem with, e.g., too much recursion or nested for-loops, or hardware limitations).
Related: how many nested object should i define in javascript?
When you create an object or array in Javascript, it allocates a small block of memory to store the value. All values in the object or array, including other objects or arrays, are allocated their own separate block of memory, with a reference to the address in the parent object or array.
Therefore, how many nested objects or arrays (or how many objects or arrays, period) you can have initialized in a script at one time is entirely dependent on how much RAM your computer has. Once you start to reach the limits of your RAM capacity, processor speed comes into play: the faster your computer can locate, create, and reallocate address space, the longer the system can run before it crashes.
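To make the point concrete: nesting depth itself isn't capped by the language, although recursive operations over a deeply nested value can still overflow the call stack (exact behaviour varies by engine):

let deep = [];
for (let i = 0; i < 100_000; i++) deep = [deep]; // 100k levels deep: fine, it's just memory
// A recursive consumer can still give up long before memory runs out, e.g.
// JSON.stringify(deep) typically throws "Maximum call stack size exceeded" in V8.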
The title says it all. I'm going to be parsing a very large JSON string and was curious what the complexity of this built in method was.
I would hope that it's Θ(n), where n is the number of characters in the string, since the parser has to look at every character to determine whether there is a syntax error or not.
I tried searching but couldn't come up with anything.
JSON is a very simple grammar that does not even require lookahead. As long as GC is not involved, it is purely O(n).
I do not know the implementations in the browsers, but your assumption is correct to a certain point. If the JSON consists mainly of strings, parsing will be straightforward and very linear. If it contains many floating-point numbers, converting them takes a bit of time, but it is again quite linear (numbers with more digits take slightly longer, but compared to a long string, very similar).
Since in most cases arrays and objects are implemented as maps, the memory allocation grows as required and will generally be linear. Many (if not most) implementations run in a garbage-collected environment, which makes it quite impossible to know for sure how much time will be needed to transform all the data: that depends on things such as the size of the memory model on the target computer and how often the garbage collector runs. Still, the structures should simply grow as items are added to them, so the work will mostly look linear as well. I would not expect an implementation to use a realloc()-style strategy, which would mean copying data and thus getting slower and slower as an array or object grows bigger.
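A crude way to eyeball the linearity yourself, with the usual caveats about microbenchmarks, warm-up, and GC noise:

for (const n of [1e5, 1e6, 1e7]) {
  const json = JSON.stringify(Array.from({ length: n }, (_, i) => i));
  const t0 = performance.now();
  JSON.parse(json);
  console.log(`n = ${n}: ${(performance.now() - t0).toFixed(1)} ms`); // roughly 10x per step
}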
I was curious, so to add more info: I believe this is the "high-level" implementation of JSON.parse. I tried to find out whether Chromium has its own source for it, and I am not sure whether this is it. This is going off of the source from GitHub.
Things to note:
- The worst case is probably the scenario of handling objects, which requires O(N) time, where N is the number of characters.
- In the case a reviver function is passed, it has to re-walk the entire object after it's created, but that only happens once, so it's fairly negligible. It also depends on what the reviver function does, and you will have to account for its own time complexity (see the example after this list).
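For reference, the reviver is the optional second argument to JSON.parse; it is called once per parsed key/value pair, which is where that extra walk comes from:

const doubled = JSON.parse('{"a": 1, "b": {"c": 2}}', (key, value) =>
  typeof value === "number" ? value * 2 : value
);
console.log(doubled); // { a: 2, b: { c: 4 } }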
I've been playing around with Typed Arrays in JavaScript.
var buffer = new ArrayBuffer(16);
var int32View = new Int32Array(buffer);
I imagine normal arrays ([1, 257, true]) in JavaScript have poor performance because their values could be of any type, so reaching an offset in memory is not trivial.
I originally thought that JavaScript array subscripts worked the same as objects (as they have many similarities), and were hash map based, requiring a hash based lookup. But I haven't found much credible information to confirm this.
So I'd assume the reason Typed Arrays perform so well is that they work like normal arrays in C, where they're always typed. Given the initial code example above, and wishing to get the value at index 10 in the typed array...
var value = int32View[10];
The type is Int32, so each value must consist of 32 bits or 4 bytes.
The subscript is 10.
So the location in memory of that value is <array offset> + (4 * 10); read 4 bytes from there to get the value.
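That arithmetic can be spelled out with a DataView over a large enough buffer (this just restates the calculation; what the engine actually compiles is internal to it):

const buf = new ArrayBuffer(64);              // room for 16 Int32 values
const stride = Int32Array.BYTES_PER_ELEMENT;  // 4 bytes per element
const view = new DataView(buf);
view.setInt32(10 * stride, 1234, true);       // write at index 10, little-endian
console.log(view.getInt32(10 * stride, true)); // 1234: offset = base + 4 * 10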
I basically just want to confirm my assumptions. Are my thoughts on this correct? If not, please elaborate.
I checked out the V8 source to see if I could answer it myself, but my C is rusty and I'm not too familiar with C++.
Typed Arrays were designed by the WebGL standards committee, for performance reasons. Typically Javascript arrays are generic and can hold objects, other arrays and so on - and the elements are not necessarily sequential in memory, like they would be in C. WebGL requires buffers to be sequential in memory, because that's how the underlying C API expects them. If Typed Arrays are not used, passing an ordinary array to a WebGL function requires a lot of work: each element must be inspected, the type checked, and if it's the right thing (e.g. a float) then copy it out to a separate sequential C-like buffer, then pass that sequential buffer to the C API. Ouch - lots of work! For performance-sensitive WebGL applications this could cause a big drop in the framerate.
On the other hand, like you suggest in the question, Typed Arrays use a sequential C-like buffer already in their behind-the-scenes storage. When you write to a typed array, you are indeed assigning to a C-like array behind the scenes. For the purposes of WebGL, this means the buffer can be used directly by the corresponding C API.
Note your memory address calculation isn't quite enough: the browser must also bounds-check the array, to prevent out-of-range accesses. This has to happen with any kind of Javascript array, but in many cases clever Javascript engines can omit the check when it can prove the index value is already within bounds (such as looping from 0 to the length of the array). It also has to check the array index is really a number and not a string or something else! But it is in essence like you describe, using C-like addressing.
BUT... that's not all! In some cases clever Javascript engines can also deduce the type of ordinary Javascript arrays. In an engine like V8, if you make an ordinary Javascript array and only store floats in it, V8 may optimistically decide it's an array of floats and optimise the code it generates for that. The performance can then be equivalent to typed arrays. So typed arrays aren't actually necessary to reach maximum performance: just use arrays predictably (with every element the same type) and some engines can optimise for that as well.
So why do typed arrays still need to exist?
- Optimisations like deducing the type of arrays are really complicated. If V8 deduces that an ordinary array has only floats in it, and then you store an object in an element, it has to de-optimise and regenerate code that makes the array generic again (see the sketch after this list). It's quite an achievement that all this works transparently. Typed Arrays are much simpler: they're guaranteed to be one type, and you just can't store other things like objects in them.
- Optimisations are never guaranteed to happen; you may store only floats in an ordinary array, but the engine may decide for various reasons not to optimise it.
- The fact that they're much simpler means other, less sophisticated JavaScript engines can easily implement them. They don't need all the advanced de-optimisation support.
- Even with really advanced engines, proving that optimisations can be used is extremely difficult and can sometimes be impossible. A typed array significantly simplifies the level of proof the engine needs in order to optimise around it. A value returned from a typed array is certainly of a certain type, and engines can optimise for the result being that type. A value returned from an ordinary array could in theory have any type, and the engine may not be able to prove it will always have the same type of result, so it generates less efficient code. Therefore, code around a typed array is more easily optimised.
- Typed arrays remove the opportunity to make a mistake. You just can't accidentally store an object and suddenly get far worse performance.
So, in short, ordinary arrays can in theory be equally fast as typed arrays. But typed arrays make it much easier to reach peak performance.
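A hedged illustration of the de-optimisation point from the list above (the internal representations involved, called "elements kinds" in V8, are engine details and can change):

const arr = [1.5, 2.5, 3.5]; // V8 can store these as unboxed doubles
arr[0] = {};                 // a non-number arrives: the array must go generic,
                             // and code specialised for doubles gets thrown away
const floats = new Float64Array([1.5, 2.5, 3.5]);
floats[0] = {};              // coerced to NaN instead: the typed array stays homogeneous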
Yes, you are mostly correct. With a standard JavaScript array, the JavaScript engine has to assume that the data in the array is all objects. It can still store this as a C-like array/vector, where the access to the memory is still like you described. The problem is that the data is not the value, but something referencing that value (the object).
So, performing a[i] = b[i] + 2 requires the engine to:
1. access the object in b at index i;
2. check what type the object is;
3. extract the value out of the object;
4. add 2 to the value;
5. create a new object with the newly computed value from step 4;
6. assign the new object from step 5 into a at index i.
With a typed array, the engine can:
1. access the value in b at index i (including placing it in a CPU register);
2. increment the value by 2;
3. assign the new value from step 2 into a at index i.
NOTE: These are not the exact steps a JavaScript engine will perform, as that depends on the code being compiled (including surrounding code) and the engine in question.
This allows the resulting computations to be much more efficient. Also, the typed arrays have a memory layout guarantee (arrays of n-byte values) and can thus be used to directly interface with data (audio, video, etc.).
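The two step lists above describe something like the following loop; with typed arrays the engine knows every element is an Int32 and can keep the values unboxed throughout (a sketch of the usage, not of the engine's actual code generation):

const a = new Int32Array(1000);
const b = new Int32Array(1000);
for (let i = 0; i < a.length; i++) {
  a[i] = b[i] + 2; // load 4 bytes, add, store 4 bytes: no boxing, no per-element type checks
}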
When it comes to performance, things can change fast. As AshleysBrain says, it comes down to whether the VM can deduce that a normal array can be implemented as a typed array quickly and accurately. That depends on the particular optimizations of the particular JavaScript VM, and it can change in any new browser version.
This Chrome developer comment provides some guidance that worked as of June 2012:
- Normal arrays can be as fast as typed arrays if you do a lot of sequential access. Random access outside the bounds of the array causes the array to grow.
- Typed arrays are fast for access, but slow to be allocated. If you create temporary arrays frequently, avoid typed arrays. (Fixing this is possible, but it's low priority.)
- Micro-benchmarks such as JSPerf are not reliable for real-world performance.
If I might elaborate on the last point, I've seen this phenomenon with Java for years. When you test the speed of a small piece of code by running it over and over again in isolation, the VM optimizes the heck out of it. It makes optimizations which only make sense for that specific test. Your benchmark can get a hundredfold speed improvement compared to running the same code inside another program, or compared to running it immediately after running several different tests that optimize the same code differently.
I'm not really a contributor to any JavaScript engine and have only done some reading on V8, so my answer might not be completely true:
Values in arrays (only normal arrays with no holes/gaps, not sparse ones; sparse arrays are treated as objects) are all either pointers or numbers with a fixed length (in V8 they are 32 bits: if the value is a 31-bit integer, it's tagged with a 0 bit at the end; otherwise it's a pointer).
So I don't think finding the memory location is any different from a typed array, since the number of bytes is the same all over the array. But the difference is that if it's an object, you have to add one unboxing layer, which doesn't happen for typed arrays.
And of course, accessing typed arrays definitely doesn't involve the type checks that a normal array has (though those might be removed in highly optimised code, which is only generated for hot code).
For writing, if it's the same type, it shouldn't be much slower. If it's a different type, the JS engine might generate polymorphic code for it, which is slower.
You can also try making some benchmarks on jsperf.com to confirm.