What use cases are there in JavaScript for Sparse Arrays?

What possible programming use could you have where a sparse array would be better than a (regular) object?
By sparse array I mean one where:
arr = []; //Initialize
arr[0] = 'W';
arr[1] = 'T';
arr[3] = 'F';
console.log(arr[0] !== undefined) //true
console.log(arr[1] !== undefined) //true
console.log(arr[2] === undefined) //true
console.log(arr[3] !== undefined) //true
Or more formally:
An object, O, is said to be sparse if the following algorithm returns true:
1. Let len be the result of calling the [[Get]] internal method of O with argument
"length".
2. For each integer i in the range 0≤i<ToUint32(len)
a. Let elem be the result of calling the [[GetOwnProperty]] internal method of O
with argument ToString(i).
b. If elem is undefined, return true.
3. Return false.
ECMA 262 5.1 - 15.4 Array Objects
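A minimal sketch of that algorithm in JavaScript, to make the definition concrete. The function name isSparse is hypothetical (it is not part of any standard API), and `>>> 0` is used here as an approximation of ToUint32:

```javascript
// Hypothetical helper: returns true if any index below length is not an own property.
function isSparse(o) {
  const len = o.length >>> 0; // approximates ToUint32(len)
  for (let i = 0; i < len; i++) {
    // A missing own property at index i means there is a hole
    if (!Object.prototype.hasOwnProperty.call(o, String(i))) {
      return true;
    }
  }
  return false;
}

const arr = [];
arr[0] = 'W';
arr[1] = 'T';
arr[3] = 'F';

console.log(isSparse(arr));             // true: index 2 was never assigned
console.log(isSparse(['W', 'T', 'F'])); // false
console.log(isSparse([undefined]));     // false: the slot exists and holds undefined
```

The last case is worth noting: a hole and a stored undefined both read back as undefined, but only the hole makes the array sparse under this definition.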
Moreover, the ECMA 262 5.1 Standard further defines length specifically as:
The length property of this Array object is a data property whose value is always numerically greater than the name of every deletable property whose name is an array index.
So in the example above, arr.length === 4 despite there only being three elements defined.
In fact, according to the standard, any Number greater than 3 is a valid length for arr, including Math.PI.
Consequently, does this mean that no one should use:
for(var i=0; i<arr.length; i++)
//Cannot trust arr[i] exists
and instead it would be more appropriate to use
for(key in arr)
//Always exists
I've never encountered an intentional one in the wild, and really only began thinking about it while reading an odd Q&A here, and now I'm a little unsettled.
I've long known that there's not a neat way to remove an element from an Array, but now I'm even more confused as to why you would intentionally leave a hole, let alone define a standard where length can be any number greater than the last defined element.
If I wanted random key value pairs, I'd use an Object. If I want to be able to cleanly iterate, I use an Array. Am I missing something?
Note, I'm looking for a specific use case, or a class of generalized use cases not a reference to the standards, or an opinion. I know it's allowed, and I already have an opinion. :)
To add a test or see some of the ones I've seen where the Array standard works in unexpected ways, check out this fiddle
Sorry if this is a bit abstract. Been thinking about it all night.

One possible use-case for sparse arrays that I've come across in real usage is for a heat-map.
Start with your map being an empty array of X × Y elements. Load your data, and populate it into the map by incrementing the array elements at the relevant co-ords.
Another similar example might be a battleship game, where boats are placed into an empty grid by populating the array elements at the appropriate co-ordinates.
That's not to say this is the only way to do this, or even the best way -- both examples can quite easily be achieved without using a sparse array -- but the question was asking for use cases, so there you go.
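A rough sketch of the heat-map idea, assuming a grid flattened into one array with index y * WIDTH + x. The names WIDTH, HEIGHT, heat and record are made up for illustration:

```javascript
// Hypothetical grid size; new Array(n) creates n holes, not n stored values.
const WIDTH = 1000;
const HEIGHT = 1000;
const heat = new Array(WIDTH * HEIGHT);

// Increment the counter at (x, y), treating a hole as zero.
function record(x, y) {
  const i = y * WIDTH + x;
  heat[i] = (heat[i] || 0) + 1;
}

record(10, 20);
record(10, 20);
record(999, 999);

// forEach skips holes, so only the populated cells are visited.
const touched = [];
heat.forEach(function (count, i) {
  touched.push([i % WIDTH, Math.floor(i / WIDTH), count]);
});

console.log(touched); // [[10, 20, 2], [999, 999, 1]]
```

Whether this is faster or smaller than an object keyed by coordinates depends on the engine, so it is worth measuring before relying on it.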

Implementing the mathematical idea of a 'sparse array' with a JavaScript plain object ({}):
source={}
source[1000000] = 1; source[2000000]=2; source[3000000]=3
start = new Date();
target = (function(){
var ret = {};
Object.keys(source).forEach(function(key){
ret[key] = source[key]*source[key]
});
return ret
})();
end = new Date();
document.getElementById('runtime').textContent = end - start;
document.getElementById('result').textContent = JSON.stringify(target);
........................ result ..........................
<div id="result"></div>
...................... run time ..........................
<div id="runtime"></div>
Implementing the mathematical idea of a 'sparse array' with a JavaScript array ([]):
source=[]
source[1000000] = 1; source[2000000]=2; source[3000000]=3
start = new Date();
target = source.map(function(u){ return u*u; });
end = new Date();
document.getElementById('runtime').textContent = end - start;
//document.getElementById('result').textContent = JSON.stringify(target);
console.log(target)
........................ result ..........................
<div id="result">see console</div>
...................... run time ..........................
<div id="runtime"></div>
The runtime is shown to be zero in the first snippet (below what can be properly measured unless you run it repeatedly). In the second snippet the runtime is about 100 milliseconds on my machine.
Would you like to "pay" for a little less typing with such differences in runtime? And if you avoid the operations that have this horrible runtime (do you know which ones they are?), what then remains of the saved typing? The answer to the last question is: nothing.
{} is the correct choice. [] is the wrong choice. IMHO.
related question

Related

How does native sort method deal with sparse arrays

When a sparse array is sorted, [5, 2, 4, , 1].sort() -> [1, 2, 4, 5, empty], the empty value is always last, even with a callback, no matter what the callback returns.
I'm building my own sort method as a challenge, and I solved this problem by using the filter method, since filter skips empty values. I then iterate over the filtered array and set the original array's elements to the filtered array's values. Then I shorten the original array's length, since the remaining items will be duplicates, and I can finally feed it into my sorting algorithm. Once that's done, I set its length back to the original, which adds the appropriate number of empty items at the end. Here's a snippet of code, but here's a link of the entire code
const collectUndefined = [];
// remove empty items and collect undefined
const removeSparse = this.filter(el => {
if (el === undefined) {
collectUndefined.push(el);
}
return el !== undefined;
});
const tempLength = this.length;
// reset values but will contain duplicates at the end
for (let i = 0; i < removeSparse.length; i++) {
this[i] = removeSparse[i];
}
// shorten length which will remove extra duplicates
this.length = removeSparse.length;
// sort algorithm ...
// place undefineds back into the array at the end
this.push(...collectUndefined);
// restores original length and adds empty elements at the end
this.length = tempLength;
return this
Is the native sort implemented in a similar fashion when dealing with sparse arrays, or not?

When it comes to implementation of Array.sort you have to also ask which engine? They are not all equal in terms of how they end up getting to the final sorted version of the array. For example V8 has a pre-processing and post-processing step before it does any sorting:
V8 has one pre-processing step before it actually sorts anything and
also one post-processing step. The basic idea is to collect all
non-undefined values into a temporary list, sort this temporary list
and then write the sorted values back into the actual array or object.
This frees V8 from caring about interacting with accessors or the
prototype chain during the sorting itself.
You can find pretty detailed explanation of the entire process V8 goes through here
The actual source code for the V8 sort (using Timsort) can be found here and is now in Torque language.
The js tests for V8 Array.sort can be seen here
Bottom line, however, is that nothing is actually removed from the original array, since it should not be. Sort reorders the array in place, but it is not supposed to change its length. It would be super weird if you called myArray.sort() and all of a sudden it had 5 elements fewer than its 8 total (for example). That is not something you would find in any Array.sort spec.
Also, without a comparator, Array.sort compares elements by their string representations, which produces a specific ordering across mixed types. Example:
let arr = [4,2,5,,,,3,false,{},undefined,null,0,function(){},[]]
console.log(arr.sort())
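Notice in the output above how the array comes first, followed by the numeric values, the object literal, the Boolean, the function, null and then undefined / empty. That order comes from comparing string conversions ("" for the empty array, then "0", "2", "3", "4", "5", "[object Object]", "false", "function(){}", "null"), with undefined values and then holes always placed last. So if you want to really match the spec, you would have to consider how different types are converted and compared.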
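A small sketch of the observable behaviour (not of V8's internals): after sorting, defined values come first, then undefined, then the holes, and the length is unchanged. The variable names are only for illustration:

```javascript
const sparse = [5, 2, 4, , 1];
sparse.sort();

console.log(sparse.length); // 5: sorting never changes the length
console.log(sparse[0]);     // 1
console.log(4 in sparse);   // false: the hole has moved to the end

// A comparator does not change where undefined values and holes end up.
const mixed = [3, undefined, , 1];
mixed.sort(function (a, b) { return b - a; });

console.log(mixed[0], mixed[1], mixed[2]); // 3 1 undefined
console.log(2 in mixed);                   // true: a stored undefined
console.log(3 in mixed);                   // false: the hole is last
```

A custom sort that wants to mimic this would need to distinguish holes (index not in the array) from stored undefined values, which a check like el === undefined on its own cannot do.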

Accessing Array beyond its size in javascript

I read in a particular book that an array in JavaScript can hold 4,294,967,295 items and would throw exception if the number reaches beyond that.
I tried out the functionality using the following code:
var a = ["a","b","c"];
a[4294967300] = "d";
console.log(a[4294967300]);
It shows the output "d" and no exception or error. Am I missing something here? Can someone put some light on the topic and share some knowledge regarding max array items in JavaScript and various scenarios related to it?

An array doesn't have to hold all the items from 0 to N to contain one with index N.
That's because arrays in JavaScript engines can switch to a dictionary mode when the holes are too big; those arrays are called sparse arrays (as opposed to dense arrays).
It's important to know this distinction because the implementation leaks in one respect: performance. You should read this on the topic: http://www.html5rocks.com/en/tutorials/speed/v8/
But regarding indexes from 2³² − 1 upward, sebcap26 is right: there's a distinction because such an index is handled as a plain string property key rather than as an array index. This distinction is important and can be verified by logging a.length: you'll see the length isn't affected by such an element. There's no exception or error per se, but it makes it impossible to use normal array operations such as iterating up to the length or using array functions like map or filter (elements whose index is beyond the numeric index limit are ignored by those functions).

If I understand the ECMAScript specification correctly, an index which is not in [0 .. 2^32-2] is converted into a String and used as an Object key, not as an Array index.
A property name P (in the form of a String value) is an array index if and only if ToString(ToUint32(P)) is equal to P and ToUint32(P) is not equal to 2^32−1.
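A short sketch of what that rule means in practice, using the array from the question plus a second, hypothetical array b to show the boundary at 2^32 − 2:

```javascript
const a = ["a", "b", "c"];
a[4294967300] = "d"; // not an array index: stored as the property "4294967300"

console.log(a.length);       // 3
console.log(Object.keys(a)); // the keys "0", "1", "2" and "4294967300"
console.log(a.map(function (s) { return s.toUpperCase(); })); // only A, B, C

const b = [];
b[4294967294] = "last";   // 2^32 - 2, the largest valid array index
console.log(b.length);    // 4294967295

b[4294967295] = "beyond"; // 2^32 - 1 is treated as an ordinary property name
console.log(b.length);    // still 4294967295
```

So no exception is thrown for an out-of-range index on assignment; the value is simply stored outside the part of the array that length and the array methods know about.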

Try running this code: fiddle : http://jsfiddle.net/vXtfE/
var a = ["a","b","c"];
a[4294967300] = "d";
console.log(a.length);
console.log(a);
console.log(a[4294967300]);
You will see this output:
3
["a", "b", "c", 4294967300: "d"]
d
The initial items get stored as array elements, but for a large index, the storage changes to a hash-based sparse representation. Hence, it is a mix of both in your case.
Good explanation of this :
Why is array.push sometimes faster than array[n] = value?

What are the speed differences between object's property access and normal variable access?

Before fully defining my question, I must say that >>this question/answer<< doesn't answer my problem, and I have proven it to myself that the given answer doesn't match at all with the actual effect of property vs. variable or cached property (see below).
I have been using HTML5 canvas, and I write raw pixel blocks many times in a second in a 640x480 area.
As advised by some tutorials, it is good to cache the .data property of an ImageData variable (in this case, it would be _SCimgData).
If I cache that property in SC_IMG_DATA, I can putImageData repeatedly in the Canvas with no problem; but if I repeatedly access it directly with _SCimgData.data, the slowdown of the code is noticeable (taking nearly 1 second to fill a single 640x480 Canvas):
var SomeCanvas = document.getElementById("SomeCanvas");
var SCContext = SomeCanvas.getContext("2d");
var _SCimgData = SCContext.getImageData(0, 0, 640, 400); // getImageData is a method of the 2D context, not of the canvas element
var SC_IMG_DATA = _SCimgData.data;
Now I have the following doubt:
Would my code be as slow for other kinds of similar accesses?
I need an array of objects for a set of functions that can have several "instances" of an object (created by a regular utility function), and that need the index of the instance in an array of objects, either to create/initialize it, or to update its properties.
My concrete example is this:
var objArray=new Array();
objArray[0]=new Object(); // no "var" here: objArray is already declared
objArray[0].property1="some string property";
for(var x=0; x<65536; x++)
doSomething(objArray[0].property1, objIDX=0);
Would that code become as unacceptably slow as in the Canvas case, if the properties and functions contained in some properties are called very intensively (several times in a single millisecond, of course using setInterval and several "timer threads" to avoid locking the browser)?
If so, what other alternative is there to speed up access for the different properties of several objects in the main object array?
EDIT 1 (2012-08-27)
Thanks for the suggestions. I have up-voted them since I suspect they will be useful for the project I'm working on.
I am thinking of a combination of methods, using mainly Arrays instead of Objects to build an actual array of "base objects", and addressing array elements by numbers (arr[0]) instead of string array keys (arr["zero"]).
var OBJECTS_SIZE=10
var Obj_Instances=new Array();
Obj_Instances[0]="property or array 1 of Object 0";
Obj_Instances[1]=new Array();
Obj_Instances[1][0]=new ArrayBuffer(128);
Obj_Instances[1][1]=new DataView(Obj_Instances[1][0]);
Obj_Instances[2]="property or array 3 of Object 0";
Obj_Instances[3]=function(){alert("some function here")};
Obj_Instances[4]="property or array 5 of Object 0";
Obj_Instances[5]="property or array 6 of Object 0";
Obj_Instances[6]=3;
Obj_Instances[7]="property or array 8 of Object 0";
Obj_Instances[8]="property or array 9 of Object 0";
Obj_Instances[9]="property or array 10 of Object 0";
Obj_Instances[10]="property or array 1 of Object 1";
Obj_Instances[11]=new Array();
Obj_Instances[11][0]=new ArrayBuffer(128);
Obj_Instances[11][1]=new DataView(Obj_Instances[11][0]);
Obj_Instances[12]="property or array 3 of Object 1";
Obj_Instances[13]=function(){alert("some function there")};
Obj_Instances[14]="property or array 5 of Object 1";
Obj_Instances[15]="property or array 6 of Object 1";
Obj_Instances[16]=3;
Obj_Instances[17]="property or array 8 of Object 1";
Obj_Instances[18]="property or array 9 of Object 1";
Obj_Instances[19]="property or array 10 of Object 1";
function do_Something_To_Property_Number_6(objIdx)
{
//Fix the index to locate the base address
//of the object instance:
///
objIdx=(objIdx*OBJECTS_SIZE);
Obj_Instances[objIdx+6]++; //Point to "Property" 6 of that object
}
I would have, say an "instance" of an "object" that takes up the first 10 array elements; the next "instance" would take the next 10 array elements, and so on (creating the initialization in a custom "constructor" function to add the new block of array elements).
I will also try to use jsPerf and JSHint to see which combination performs better.
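For illustration, the flat layout and custom "constructor" described in this edit could be sketched roughly as follows. The names instances, PROP_COUNTER, createInstance and incrementCounter are hypothetical, and only slot 6 is given a meaningful value here:

```javascript
const OBJECTS_SIZE = 10; // number of array slots per "instance"
const PROP_COUNTER = 6;  // hypothetical name for "property 6" of each instance
const instances = [];

// Appends one block of OBJECTS_SIZE slots and returns the instance index.
function createInstance() {
  const objIdx = instances.length / OBJECTS_SIZE;
  for (let slot = 0; slot < OBJECTS_SIZE; slot++) {
    instances.push(null); // filling every slot keeps the array dense
  }
  instances[objIdx * OBJECTS_SIZE + PROP_COUNTER] = 3;
  return objIdx;
}

// Locates the base offset of the instance, then updates its counter slot.
function incrementCounter(objIdx) {
  instances[objIdx * OBJECTS_SIZE + PROP_COUNTER]++;
}

const first = createInstance();
const second = createInstance();
incrementCounter(second);

console.log(instances.length);                               // 20
console.log(instances[first * OBJECTS_SIZE + PROP_COUNTER]);  // 3
console.log(instances[second * OBJECTS_SIZE + PROP_COUNTER]); // 4
```

Whether this beats an array of ordinary objects is something only a benchmark on the target engines can answer.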

To answer your "doubts", I suggest using JSPerf to benchmark your code. One can't really tell by code alone if the procedure is faster than another unless tested.
Also, I suggest you use the literal notation for arrays and objects instead of the new notation during construction:
var objArray=[
{
property : 'some string property'
}, {
...
},
];
Also, based on your code, it's better to have this since you are using the same object per iteration:
var obj = objArray[0].property1,
objIDX = 0;
for(var x=0; x<65536; x++){
doSomething(obj,objIDX);
}

I realise this is not quite answering your question (as it has already been answered). However, as you seem to be looking for speed improvements for function calls that happen thousands of times (as others who find this might also be doing), I thought I'd include this here, as it goes against assumptions:
An example function:
var go = function (a,b,c,d,e,f,g,h) {
return a+b+c+d+e+f+g+h;
}
The following is how you would normally call a repetitive function:
var i=500000; while(i--){
go(1,2,3,4,5,6,7,8);
}
However, if none (or a few) of those arguments ever change for this particular usage of the function, then it's far better to do this (from a speed pov - obviously not an asynchronous pov):
var i=500000; go.args = [1,2,3,4,5,6,7,8];
while(i--){
go();
}
In order for the above to work you only need a slight modification to the original function:
var go = function (a,b,c,d,e,f,g,h, i) {
if ( go.args ) {
i = go.args;
a = i[0]; b = i[1];
c = i[2]; d = i[3];
e = i[4]; f = i[5];
g = i[6]; h = i[7];
}
return a+b+c+d+e+f+g+h;
}
This second function runs significantly faster because you are not passing in any arguments (a function called with no args is very quick to initiate). Pulling the values from the .args array doesn't seem to be that costly either (unless you involve strings). Even if you update one or two of the args it's still far faster, which makes it perfect for pixel or imagedata manipulations because you are normally only shifting x & y:
var i=500000; go.args = [1,2,3,4,5,6,7,8];
while(i--){
go.args[2] = i;
go();
}
So in a way this is an example of where an object property can be faster than local vars - if a little convoluted and off topic ;)

Possible browser optimizations notwithstanding, accessing a property of an object is more expensive than accessing a local variable (but not necessarily a global variable or a variable of a parent function).
The deeper the property, the more of a performance hit you take. In other words,
for(var x=0; x<65536; x++)
doSomething(objArray[0].property1, objIDX=0);
would be improved by caching objArray[0].property1, and not repeatedly assigning to objIDX:
var prop = objArray[0].property1;
objIDX = 0;
for(var x=0; x<65536; x++)
doSomething(prop, 0);

What values will make this function crash?

I am thinking to use the following function:
function delDups(arr){
var out=[],obj={};
for(var i=0,len=arr.length;i<len;i++){
obj[arr[i]]=0;
}
for(i in obj){
out.push(i);
}
return out;
}
The function is slightly modified; the original can be found here
However, I am sure there are some values that will make it crash, and I want to know exactly which values will (so I can do something about it).

Well, if arr isn't an array-like object (e.g. with a length and indexed properties) it will crash.
It may however also not do what you expect whenever the data in arr isn't an array of strings. In that case you'd get an array back with strings only, the original objects and their data type will be lost. But it will not crash...
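A quick sketch of that second point, using the function from the question unchanged. The input values are made up for illustration:

```javascript
function delDups(arr){
  var out=[],obj={};
  for(var i=0,len=arr.length;i<len;i++){
    obj[arr[i]]=0;
  }
  for(i in obj){
    out.push(i);
  }
  return out;
}

// Numbers, booleans and objects are all converted to string keys,
// so 3 and "3" collapse together, as do the two different objects.
const result = delDups([3, "3", 1, {}, { a: 1 }, true, "true"]);

console.log(result);           // the strings "1", "3", "[object Object]", "true"
console.log(typeof result[0]); // "string"
```

Note also that integer-like keys are enumerated in ascending numeric order, so the original ordering of the input is not preserved either.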

Crashing is not the only inconvenient outcome from executing code. Also of (perhaps greater) interest are cases where the code runs but returns an incorrect result. Your analysis might be something like:
for(var i=0,len=arr.length;i<len;i++){
In the above, it is assumed that the value of arr.length is a numeric value greater than or equal to zero. If arr does not have a length property, or its value isn't a non-negative number, the for loop will not behave as expected (e.g. it may result in an error or an infinite loop).
obj[arr[i]]=0;
In this line, the result of evaluating arr[i] is used as a property name, so wherever that expression returns something that isn't suitable as a property name, you will get an error, e.g. if arr[i] is an ActiveX object, you can expect unexpected outcomes. If it's a native Object the value will be the result of calling its toString method, which might provide the same value for different objects, or an error, or "just work".
for(i in obj){
will iterate over all enumerable properties of obj, including those it inherits. If an enumerable property is added to Object.prototype, it will turn up in the loop so it's common to use a hasOwnProperty test to filter out inherited properties.
How much you test for errors depends on the environment you expect the code to be used in. If you have reasonable control and have documented the values that are expected to be passed to the function (e.g. array of primitive values) then it is reasonable to do minimal (or no) testing of input values. If someone passes in an ActiveX object instead of an array and it goes belly up, you respond with "RTFM".
On the other hand, if it is known that the code will be used in a library in uncontrolled and widely varying situations, testing that the input has a non-negative, numeric length property seems sensible, as does adding a hasOwnProperty test to the for..in loop.
How much time and effort you put into making your code robust is a function of where you expect it to run, but adding some sensible and obvious checks up front may well save some grief later. So I'd do something like:
function delDups(arr) {
  var out=[],obj={};
  var len = arr && arr.length;
  if (len && len > 0) {
    for(var i=0; i<len; i++) {
      obj[arr[i]] = 0;
    }
    // only collect own properties, not ones inherited from Object.prototype
    for (i in obj) {
      if (obj.hasOwnProperty(i)) {
        out.push(i);
      }
    }
  }
  return out;
}

Woohoo! I crashed it, where is my prize?
var arr3 = delDups(eval());

Using the push method or .length when adding to array?

What are the downsides to doing:
var myArray = [];
myArray[myArray.length] = val1;
myArray[myArray.length] = val2;
instead of:
var myArray = [];
myArray.push(val1);
myArray.push(val2);
I'm sure the push method is much more "acceptable", but are there any differences in functionality?

push is way faster, almost 300% faster.
Proof: http://jsperf.com/push-vs-length-test

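Since the length of a JavaScript array is always one greater than its highest index (even when the array has holes), myArray[myArray.length] always refers to the slot just past the end, so the functionality of those two methods is equal. And yes, using .push() is much cleaner (and shorter).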
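A small sketch comparing the two forms, including on an array that was given a gap by assigning to a distant index. The variable names are only for illustration:

```javascript
const viaIndex = [];
viaIndex[5] = "x";                // creates holes at indexes 0 to 4
viaIndex[viaIndex.length] = "y";  // length is 6, so this writes index 6

const viaPush = [];
viaPush[5] = "x";
viaPush.push("y");                // also writes index 6

console.log(viaIndex.length, viaPush.length); // 7 7
console.log(viaIndex[6], viaPush[6]);         // y y
console.log(0 in viaIndex, 0 in viaPush);     // false false: the holes stay holes
```

In both cases the new element lands immediately after the highest existing index, so the choice between the two is about readability and measured performance rather than behaviour.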

I've generally thought length assignment was faster. Just found Index vs. push performance which backs that up; for my Chrome 14 browser anyway, over a single test run. However there is not much in it in Chrome.

There seems to be a discrepancy among the various JavaScript engines as to which test is faster. The differences in speed may be negligible (unless an unholy number of pushes is needed). In that case, the prudent developer should always err on the side of readability. Here, in my opinion and that of #TheifMaster, [].push() is cleaner and easier to read. Maintenance of code is the most expensive part of coding.

In my tests the first way is faster; I'm not sure why and am still researching. Also, the ECMA standard doesn't mention which one is better, so I think it depends on how the browser vendor implements this.
var b = new Array();
var bd1 = new Date().getTime();
for(var i =0;i<1000000; i++){
b[b.length] = i;
};
alert(new Date().getTime()- bd1);
var a = new Array();
var ad1 = new Date().getTime();
for(var i =0;i<1000000; i++){
a.push(i);
};
alert(new Date().getTime()- ad1);

In JS there are 3 different ways you can add an element to the end of an array. All three have their different use cases.
1) a.push(v), a.push(v1,v2,v3), a.push(...[1,2,3,4]), a.push(..."test")
Push is not a very well-thought-out function in JS. It returns the length of the resulting array. How silly. So you can never chain push() in functional programming unless you want to return the length at the very end. It should have returned a reference to the object it's called upon. Then it would still be possible to get the length if needed, like a.push(..."idiot").length. Forget about push if you intend to do something functional.
2) a[a.length] = "something"
This is the biggest rival of a.push("something"). People fight over this. To me the only two differences are that
This one returns the value added to the end of the array
Only accepts single value. It's not as clever as push.
You shall use it if the returned value is of use to you.
3) a.concat(v), a.concat(v1,v2,v3), a.concat(...[1,2,3,4]), a.concat([1,2,3,4])
Concat is unbelievably handy. You can use it much like push, with one important difference: it does not modify the array it's called upon but returns a new array, so you have to use the result. If you pass the arguments in an array, it will spread them onto the end of the new array. If you pass them as separate arguments, it will do the same, and you can even mix the two, like a = a.concat([1,2,3],4,5,6); //returns [1, 2, 3, 4, 5, 6] for an empty a. However, don't do this; it's not so reliable. It's better to pass all arguments in an array literal.
The best thing about concat is that it returns a reference to the resulting array. So it's perfect for functional programming and chaining.
Array.prototype.concat() is my preference.
4) A new push() proposal
Actually one other thing you can do is to overwrite the Array.prototype.push() function like;
Array.prototype.push = function(...args) {
return args.reduce(function(p,c) {
p[p.length] = c;
return p
}, this)
};
so that it perfectly returns a reference to the array it's called upon.
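To make the differences in return values concrete, here is a small sketch using the built-in (unmodified) methods. The variable names are only for illustration:

```javascript
const a = [1, 2];

const fromPush = a.push(3);              // returns the new length
const fromAssign = (a[a.length] = 4);    // an assignment evaluates to the assigned value
const fromConcat = a.concat([5, 6], 7);  // returns a new array; a itself is unchanged

console.log(fromPush);         // 3
console.log(fromAssign);       // 4
console.log(fromConcat);       // 1, 2, 3, 4, 5, 6, 7
console.log(a);                // 1, 2, 3, 4
console.log(fromConcat === a); // false
```

Because concat copies the array on every call, using it to append inside a tight loop can cost more than push; that trade-off is worth measuring for the case at hand.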

I have an updated benchmark here: jsbench.me
Feel free to check which is faster for your current engine. arr[arr.length] was about 40% faster than arr.push() on Chromium 86.
