Why are we allowed to create sparse arrays in JavaScript? - javascript

I was wondering what the use-cases for code like var foo = new Array(20), var foo = [1,2,3]; foo.length = 10 or var foo = [,,,] were (also, why would you want to use the delete operator instead of just removing the item from the array). As you may know already, all these will result in sparse arrays.
But why are we allowed to do the above thnigs ? Why would anyone want to create an array whose length is 20 by default (as in the first example) ? Why would anyone want to modify and corrupt the length property of an array (as in the second example) ? Why would anyone want to do something like [, , ,] ? Why would you use delete instead of just removing the element from the array ?
Could anyone provide some use-cases for these statements ?
I have been searching for some answers for ~3 hours. Nothing. The only thing most sources (2ality blog, JavaScript: The Definitive Guide 6th edition, and a whole bunch of other articles that pop up in the Google search results when you search for anything like "JavaScript sparse arrays") say is that sparse arrays are weird behavior and that you should stay away from them. No sources I read explained, or at least tried to explain, why we were allowed to create sparse arrays in the first place. Except for You Don't Know JS: Types & Grammar, here is what the book says about why JavaScript allows the creation of sparse arrays:
An array that has no explicit values in its slots, but has a length property that implies the slots exist, is a weird exotic type of data structure in JS with some very strange and confusing behavior. The capability to create such a value comes purely from old, deprecated, historical functionalities ("array-like objects" like the arguments object).
So, the book implies that the arguments object somehow, somewhere, uses one of the examples I listed above to create a sparse array. So, where and how does arguments use sparse arrays ?
Something else that is confusing me is this part in the book "JavaScript: The Definitive Guide 6th Edition":
Arrays that are sufficiently sparse are typically implemented in a slower, more memory-efficient way than dense arrays are`.
"more memory-efficient" appears like a contradiction to "slower" to me, so what is the difference between the two, in the context of sparse arrays especially ? Here is a link to that specific part of the book.

I was wondering what the use-cases for code like var foo = new Array(20), var foo = [1,2,3]; foo.length = 10 or var foo = [,,,] were
in theory, for the same reason people usually use sparse data structure ( not necessarily in order of importance ): memory usage ( var x = []; x[0]=123;x[100000]=456; won't consume 100000 'slots' ), performance ( say, take the avarage of the aforementioned x, via for-in or reduce() ) and convenience ( no 'hard' out of bound errors, no need to grow/shrink explicitly );
that said, semantically, a js array is just a special associative collection with index keys and a special property 'length' satisfying the invariant of being greater than all its index properties. While being a pretty elegant definition, it has the drawback of rendering sparsely defined arrays somewhat confusing and error prone as you noticed.
But why are we allowed to do the above thnigs ?
even if we were not allowed to define sparse arrays, we could still put undefined elements into arrays, resulting in basically the same usability problems you see with sparse arrays.
So, say, having [0,undefined,...,undefined,1,undefined] the same as [0,...,1,] would buy you nothing but more memory consuming arrays and slower iterations.
Arrays that are sufficiently sparse are typically implemented in a slower, more memory-efficient way than dense arrays are. more memory-efficient and slower appear like a contradiction to me
"dense arrays" used for general purpose data are typically implemented as a contiguous block of memory filled with elements of the same size; if you add more elements, you continue filling the memory block allocating a new block if exhausted. Given that reallocation implies moving all elements to the new memory block, said memory is typically allocated in abundance to minimize chances of reallocation ( something like the golden ratio times the last capacity ).
Hence, such data structures are typically the fastest for ordered/local traversal ( being more CPU/cache friendly ), the slowest for unpredicatble insertions/deletions ( for sufficiently big N ) and have high memory overhead ~ sizeof(elem) * N + extra space for future elems.
Conversely, "sparse arrays/matrices/..." are implemented by 'linking' together smaller memory blocks spreaded in memory or by using some 'logically compressed' form of a dense data structure or both; in either case, memory consumption is reduced for obvious reasons, but traversing them comparatively requires more work and less local memory access patterns.
So, if compared relative to the same effectively traversed elements sparse arrays consume much less memory but are much slower than dense arrays. However, given that you use sparse arrays with sparse data and algorithms acting trivially on 'zeros', sparse arrays can turn out much faster in some scenarios ( eg. multiply very big matrices with few non zero elements ... ).

Because a JS Array is a very curious kind of data type that do not obey the time complexity rules as you would probably expect when the right tool is used. I mean the for in loop or the Object.keys() method. Despite being a very functional man i would swing towards the for in loop here since it is breakable.
There are some very beneficial use cases of sparse arrays in JS such as inserting and deleting items into a sorted array in O(1) without disturbing the sorted structure if your values are numerals like a Limit Order Book. Or in another words, if you could establish a direct numerical correlation among your keys and values.

Related

Are native map, filter, etc. methods optimized to operate on a single intermediary array when possible?

Consider the below snippet, which converts an array of objects to an array of numbers, with negative values filtered out, and then doubled by 2:
var objects = (new Array(400)).fill({
value: Math.random() * 10 - 5
});
var positiveObjectValuesDoubled = objects.map(
item => item.value
).filter(
value => value > 0
).map(
value => value * 2
);
When chained together like this, how many actual Array objects are created in total? 1, or 3? (excluding the initial objects array).
In particular, I'm talking about the intermediary Array objects created by filter, and then by the second map call in the chain: considering these array objects are not explicitly referenced per se, are Javascript runtimes smart enough to optimize where possible in this case, to use the same memory area?
If this cannot be answered with a clear yes-or-no, how could I determine this in various browsers? (to the best of my knowledge, the array constructor can no longer be overridden, so that's not an option)
Good commentary so far, here's a summary answer: an engine might optimize for memory usage across chained method calls, but you should never count on an engine to do optimization for you.
As your example of chained methods is evaluated, the engine's memory heap is affected in the same order, step by step (MDN documentation on the event loop). But, how this works can depend on the engine...for some Array.map() might create a new array and garbage collect the old one before the next message executes, it might leave the old one hanging around until the space is needed again, it might change an array in place, whatever. The rabbithole for understanding this is very deep.
Can you test it? Sometimes! jQuery or javascript to find memory usage of page, this Google documentation are good places to start. Or you can just look at speed with something like http://jsperf.com/ which might give you at least an idea of how space-expensive something might be. But you could also use that time doing straightforward optimization in your own code. Probably a better call.

Which way is best to define a JavaScript array? [duplicate]

I want to create an array in javascript and remember two ways of doing it so I just want to know what the fundamental differences are and if there is a performance difference in these two "styles"
var array_1 = new Array("fee","fie","foo","fum");
var array_2 = ['a','b','c'];
for (let i=0; i<array_1.length; i++){
console.log(array_1[i])
}
for (let i=0; i<array_2.length; i++){
console.log(array_2[i])
}
They do the same thing. Advantages to the [] notation are:
It's shorter.
If someone does something silly like redefine the Array symbol, it still works.
There's no ambiguity when you only define a single entry, whereas when you write new Array(3), if you're used to seeing entries listed in the constructor, you could easily misread that to mean [3], when in fact it creates a new array with a length of 3 and no entries.
It may be a tiny little bit faster (depending on JavaScript implementation), because when you say new Array, the interpreter has to go look up the Array symbol, which means traversing all entries in the scope chain until it gets to the global object and finds it, whereas with [] it doesn't need to do that. The odds of that having any tangible real-world impact in normal use cases are low. Still, though...
So there are several good reasons to use [].
Advantages to new Array:
You can set the initial length of the array, e.g., var a = new Array(3);
I haven't had any reason to do that in several years (not since learning that arrays aren't really arrays and there's no point trying to pre-allocate them). And if you really want to, you can always do this:
var a = [];
a.length = 3;
There's no difference in your usage.
The only real usage difference is passing an integer parameter to new Array() which will set an initial array length (which you can't do with the [] array-literal notation). But they create identical objects either way in your use case.
This benchmark on JSPerf shows the array literal form to be generally faster than the constructor on some browsers (and not slower on any).
This behavior is, of course, totally implementation dependent, so you'll need to run your own test on your own target platforms.
I believe the performance benefits are negligible.
See http://jsperf.com/new-array-vs-literal-array/4
I think both ways are the same in terms of performance since they both create an "Array object" eventually. So once you start accessing the array the mechanism will be the same. I not too sure about how different the mechanisms to construct the arrays be (in terms of performance) though it shouldn't be any noticeable gains using one way to the other.

defining array in javascript

I want to create an array in javascript and remember two ways of doing it so I just want to know what the fundamental differences are and if there is a performance difference in these two "styles"
var array_1 = new Array("fee","fie","foo","fum");
var array_2 = ['a','b','c'];
for (let i=0; i<array_1.length; i++){
console.log(array_1[i])
}
for (let i=0; i<array_2.length; i++){
console.log(array_2[i])
}
They do the same thing. Advantages to the [] notation are:
It's shorter.
If someone does something silly like redefine the Array symbol, it still works.
There's no ambiguity when you only define a single entry, whereas when you write new Array(3), if you're used to seeing entries listed in the constructor, you could easily misread that to mean [3], when in fact it creates a new array with a length of 3 and no entries.
It may be a tiny little bit faster (depending on JavaScript implementation), because when you say new Array, the interpreter has to go look up the Array symbol, which means traversing all entries in the scope chain until it gets to the global object and finds it, whereas with [] it doesn't need to do that. The odds of that having any tangible real-world impact in normal use cases are low. Still, though...
So there are several good reasons to use [].
Advantages to new Array:
You can set the initial length of the array, e.g., var a = new Array(3);
I haven't had any reason to do that in several years (not since learning that arrays aren't really arrays and there's no point trying to pre-allocate them). And if you really want to, you can always do this:
var a = [];
a.length = 3;
There's no difference in your usage.
The only real usage difference is passing an integer parameter to new Array() which will set an initial array length (which you can't do with the [] array-literal notation). But they create identical objects either way in your use case.
This benchmark on JSPerf shows the array literal form to be generally faster than the constructor on some browsers (and not slower on any).
This behavior is, of course, totally implementation dependent, so you'll need to run your own test on your own target platforms.
I believe the performance benefits are negligible.
See http://jsperf.com/new-array-vs-literal-array/4
I think both ways are the same in terms of performance since they both create an "Array object" eventually. So once you start accessing the array the mechanism will be the same. I not too sure about how different the mechanisms to construct the arrays be (in terms of performance) though it shouldn't be any noticeable gains using one way to the other.

If I set only a high index in an array, does it waste memory?

In Javascript, if I do something like
var alpha = [];
alpha[1000000] = 2;
does this waste memory somehow? I remember reading something about Javascript arrays still setting values for unspecified indices (maybe sets them to undefined?), but I think this may have had something to do with delete. I can't really remember.
See this topic:
are-javascript-arrays-sparse
In most implementations of Javascript (probably all modern ones) arrays are sparse. That means no, it's not going to allocate memory up to the maximum index.
If it's anything like a Lua implementation there is actually an internal array and dictionary. Densely populated parts from the starting index will be stored in the array, sparse portions in the dictionary.
This is an old myth. The other indexes on the array will not be assigned.
When you assign a property name that is an "array index" (e.g. alpha[10] = 'foo', a name that represents an unsigned 32-bit integer) and it is greater than the current value of the length property of an Array object, two things will happen:
The "index named" property will be created on the object.
The length will be incremented to be that index + 1.
Proof of concept:
var alpha = [];
alpha[10] = 2;
alpha.hasOwnProperty(0); // false, the property doesn't exist
alpha.hasOwnProperty(9); // false
alpha.hasOwnProperty(10); // true, the property exist
alpha.length; // 11
As you can see, the hasOwnProperty method returns false when we test the presence of the 0 or 9 properties, because they don't exist physically on the object, whereas it returns true for 10, the property was created.
This misconception probably comes from popular JS consoles, like Firebug, because when they detect that the object being printed is an array-like one, they will simply make a loop, showing each of the index values from 0 to length - 1.
For example, Firebug detects array-like objects simply by looking if they have a length property whose its value is an unsigned 32-bit integer (less than 2^32 - 1), and if they have a splice property that is a function:
console.log({length:3, splice:function(){}});
// Firebug will log: `[undefined, undefined, undefined]`
In the above case, Firebug will internally make a sequential loop, to show each of the property values, but no one of the indexes really exist and showing [undefined, undefined, undefined] will give you the false sensation that those properties exist, or that they were "allocated", but that's not the case...
This has been like that since ever, it's specified even of the ECMAScript 1st Edition Specification (as of 1997), you shouldn't worry to have implementation differences.
About a year ago, I did some testing on how browsers handle arrays (obligatory self-promotional link to my blog post.) My testing was aimed more at CPU performance than at memory consumption, which is much harder to measure. The bottom line, though, was that every browser I tested with seemed to treat sparse arrays as hash tables. That is, unless you initialized the array from the get-go by putting values in consecutive indexes (starting from 0), the array would be implemented in a way that seemed to optimize for space.
So while there's no guarantee, I don't think that setting array[100000] will take any more room than setting array[1] -- unless you also set all the indexes leading up to those.
I dont think so because javascript treats arrays kinda like dictionaries, but with integer keys.
alpha[1000000] = alpha["1000000"]
I don't really know javascript, but it would be pretty odd behaviour if it DIDN'T allocate space for the entire array. Why would you think it wouldn't take up space? You're asking for a huge array. If it didn't give it to you, that would be a specific optimisation.
This obviously ignores OS optimisations such as memory overcommit and other kernel and implementation specifics.

How do arrays in JavaScript work ? (ie without a starting dimension size)

I was reading this post the other night about the inner workings of the Array, and learned a lot from the answers posted, especially from Jonathan Holland's one.
So the reason you give a size to an array beforehand is so that space will need to be reserved beforehand, so that elements in the array will be placed next each other in memory, and thus providing O(1) access time, because of the pointer + offset traversal.
But in JavaScript, you can initialize an array like such:
var anArray = []; //Initialize an empty array, without a dimension
So my question is, since in JavaScript you can initialize an array Without specifying a dimension before hand, how is memory allocated for an array to still provide a O(1) access time since the 'amount' of memory locations is not specified beforehand ?
Hmm. You should distinguish between arrays and associative arrays.
arrays:
A=[0,1,4,9,16];
associative arrays:
B={a:'ha',b:27,c:30};
The former has a length, the latter does not. When I run this in a javascript shell, I get:
js>A=[0,1,4,9,16];
0,1,4,9,16
js>A instanceof Array
true
js>A.length
5
js>B={a:'ha',b:27,c:30};
[object Object]
js>B instanceof Array
false
js>B.length
js>
How arrays "work" in Javascript is implementation-dependent. (Firefox and Microsoft and Opera and Google Chrome would all use different methods) My guess is they (arrays, not associative arrays) use something like STL's std::vector. Your question:
how is memory allocated for an array
to still provide a O(1) access time
since the 'amount' of memory locations
is not specified beforehand ?
is more along the lines of how std::vector (or similar resizable arrays) works. It reallocates to a larger array as necessary. Insertions at the end take amortized O(1) time, namely if you insert N elements where N is large, the total time takes N*O(1). Those individual inserts where it does have to resize the array may take longer, but on the average it takes O(1).
Arrays in Javascript are "fake". They are implemented as hash maps. So in the worst case their access time is not O(1). They also need more memory and you can use any string as an array index. You think that's weird? It is.
As I understand, it's like this:
There are two different things in JavaScript: Arrays and Objects. They both act as hashtables, although the underlying implementation is specific to each runtime. The difference between the two is that an Array has an implicit length property, while an object does not. Otherwise you can use [] or . syntaxes for both of them. Yes, that means that objects can have numerical properties and arrays have string indices. No problem. Although the length property might not be what you expect when using such tricks or sparse arrays. You should rely on it only if the array is not sparse and indices start from 0.
As for the performance - sorry, it's not the O(1) you'd expect. As stated before - it's actually implementation specific. But in general case it's not possible to ensure that there will be O(1) performance for all operations in a hashtable. That said, I'd expect that decent implementations should have quite a few optimizations in place for standard cases, which would make the performance quite close to O(1) under most scenarios. But at any rate - storing huge volumes of data in JavaScript is not a wise idea.
It is the same in PHP. I came from a PHP/Javascript background and dimensionalizing arrays really got me when i moved onto other languages.
javascript has no real arrays.
Elements are allocated as you define them.
It is a flexible tool. You can use the for many purposes but as a general purpose tool is not as efficient as special purpose arrays.
As Jason said, unless it is explicitly specified by the ECMAScript standard (unlikely), it is implementation dependent. The article shown by Feet shows that IE's implementation was poor (until IE8?), which is confirmed by JavaScript loop performance.
Other JS engines probably takes a more pragmatic approach. For example, they can do like in Lua, having a true array part and an associative array part: if the array is dense, it lives in the true array (which can still be extended at the cost of re-allocation), and you can still have sparse high indices living in the associative part.
Thus you have the best of two worlds, speed of access to dense parts and low memory use for sparse parts.
You can even do:
var asocArray = {key: 'val', 0: 'one', 1: 'two'}
and
var array = []; array['key'] = 'val'; array[0] = 'one'; array[1] = 'two';
When looping, you can use them in the same way using a for i in object loop, instead of using that for the asocArray and using a for var i=0; i<array.length; i++ loop for "array". The only difference here is that if you're expecting the indices in "array" to be a typeof i === 'number' or (i).constructor === Number then it will only be in the "for var i=0; i<array.length; i++ loop"; the for i in object loop makes all the keys (even indices) a String.

Categories