Javascript: What's the algorithmic performance of 'splice'? - javascript

That is, would I be better suited to use some kind of tree or skip list data structure if I need to be calling this function a lot for individual array insertions?

You might consider whether you want to use an object instead; all JavaScript objects (including Array instances) are (highly-optimized) sets of key/value pairs with an optional prototype. An implementation should (note I don't say "does") have a hashing algorithm with reasonable performance. (Update: That was written in 2010. Here in 2018, objects are highly optimized on all significant JavaScript engines.)
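If keyed access fits your problem better than ordered, index-based access, a minimal sketch of that approach might look like the following (the key scheme is purely illustrative, not from the question):

// Keyed storage: insertion and removal are amortized O(1),
// at the cost of losing array ordering and index semantics.
var store = {};                                // or `new Map()` in modern engines

store["item-42"] = { id: 42, label: "foo" };   // insert
var hit = store["item-42"];                    // lookup
delete store["item-42"];                       // remove

// If you also need ordering, you have to maintain it separately,
// which is exactly the trade-off against splice-based insertion.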
Aside from that, the performance of splice is going to vary a lot between implementations (e.g., vendors). This is one reason why "don't optimize prematurely" is even more appropriate advice for JavaScript applications that will run in multiple vendor implementations (web apps, for instance) than it is even for normal programming. Keep your code well modularized and address performance issues if and when they occur.

Here's a good rule of thumb, based on tests done in Chrome, Safari and Firefox: Splicing a single value into the middle of an array is roughly half as fast as pushing/shifting a value to one end of the array. (Note: Only tested on an array of size 10,000.)
http://jsperf.com/splicing-a-single-value
That's pretty fast. So, it's unlikely that you need to go so far as to implement another data structure in order to squeeze more performance out.
Update: As eBusiness points out in the comments below, the test performs an expensive copy operation along with each splice, push, and shift, which means that it understates the difference in performance. Here's a revised test that avoids the array copying, so it should be much more accurate: http://jsperf.com/splicing-a-single-value/19
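For a quick local check without jsperf, a rough console.time sketch along the same lines might look like this (the sizes and iteration counts are arbitrary, and this is not the original test):

// Compare inserting into the middle via splice with appending via push,
// keeping the array length constant so the work per iteration is comparable.
var N = 10000, ITER = 100000;

var a = new Array(N).fill(0);
console.time("splice into middle");
for (var i = 0; i < ITER; i++) {
  a.splice(N >> 1, 0, i);   // insert at the midpoint
  a.pop();                  // restore the original length
}
console.timeEnd("splice into middle");

var b = new Array(N).fill(0);
console.time("push onto end");
for (var j = 0; j < ITER; j++) {
  b.push(j);
  b.pop();
}
console.timeEnd("push onto end");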

Move single value
// Equivalent to:
//   tmp = arr[1][i];
//   arr[1].splice(i, 1);           // splice is slow in FF
//   arr[1].splice(end0_1, 0, tmp);
// but done with a manual element-by-element shift instead of two splice calls.
tmp = arr[1][i];
ii = i;
while (ii < end0_1) {
  arr[1][ii] = arr[1][++ii];  // copy the next element one slot down
  cycles++;
}
arr[1][end0_1] = tmp;         // place the saved value at its destination

Related

Why does Math.random() (in Chrome) allocate memory that needs cleanup by the Garbage Collector (gc)?

Story
During some tests of performance-critical code, I observed a side effect of Math.random() that I do not understand. I am looking for:
some deep technical explanation
a falsification for my test (or expectation)
link to a V8 problem/bug ticket
Problem
It looks like calling Math.random() allocates some memory that needs to be cleaned up by the Garbage Collector (GC).
Test: With Math.random()
const numberOfWrites = 100;
const obj = {
  value: 0
};
let i = 0;

function test() {
  for (i = 0; i < numberOfWrites; i++) {
    obj.value = Math.random();
  }
}

window.addEventListener('DOMContentLoaded', () => {
  setInterval(() => {
    test();
  }, 10);
});
Observation 1: Chrome profile
Chrome: 95.0.463869, Windows 10, Edge: 95.0.1020.40
Running this code in the browser and recording a performance profile will result in a classic memory zig-zag:
Memory profile of Math.random() test
Observation 2: Firefox
Firefox Developer: 95, Windows 10
No Garbage Collection (CC/GCMinor) detected - memory quite linear
Workarounds
crypto.getRandomValues()
Replace Math.random() with a large enough array of pre-calculated random numbers using self.crypto.getRandomValues.
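A minimal sketch of that workaround (the pool size and refill policy below are arbitrary choices, not part of the question):

// Pre-compute a pool of random values with crypto.getRandomValues,
// then hand them out one by one instead of calling Math.random().
const POOL_SIZE = 4096;
const pool = new Uint32Array(POOL_SIZE);
let poolIndex = POOL_SIZE;               // force a refill on first use

function pooledRandom() {
  if (poolIndex >= POOL_SIZE) {
    self.crypto.getRandomValues(pool);   // refill the whole pool in one call
    poolIndex = 0;
  }
  // Map the 32-bit integer into [0, 1), like Math.random().
  return pool[poolIndex++] / 4294967296;
}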
(V8 developer here.)
Yes, this is expected. It's a (pretty fundamental) design decision, not a bug, and not strictly related to Math.random(). V8 "boxes" floating-point numbers as objects on the heap. That's because it uses 32 bits per field in an object, which obviously isn't enough for a 64-bit double, and a layer of indirection solves that.
There are a number of special cases where this boxing can be avoided:
in optimized code, for values that never leave the current function.
for numbers whose values are sufficiently small integers ("Smis", signed 31-bit integer range).
for elements in arrays that have only ever seen numbers as elements (e.g. [1, 2.5, NaN], but not [1, true, "hello"]); see the sketch just after this list.
possibly other cases that I'm not thinking of right now. Also, all these internal details can (and do!) change over time.
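A small illustration of that third case, with the caveat that the exact internal representation is an engine detail and can change (inspecting it requires V8's --allow-natives-syntax flag, which is a debugging aid, not standard JavaScript):

// Numbers-only element storage can stay unboxed ("double elements" in V8 terms),
// while a mixed array forces generic element storage in which doubles are boxed.
const onlyNumbers = [1, 2.5, NaN];        // all numbers: can be stored as raw doubles
const mixedValues = [1, true, "hello"];   // mixed types: numeric values may be boxed

onlyNumbers.push(3.14);                   // still numbers-only, fast path preserved
// onlyNumbers.push("oops");              // would transition the array to generic
                                          // elements, and that transition is one-way

// With `node --allow-natives-syntax` (or d8), %DebugPrint(onlyNumbers) shows the
// current elements kind, if you want to verify this on your own build.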
Firefox uses a fundamentally different technique for storing internal references. The benefit is that it avoids having to box numbers, the drawback is that it uses more memory for things that aren't numbers. Neither approach is strictly better than the other, it's just a different tradeoff.
Generally speaking you're not supposed to have to worry about this, it's just your JavaScript engine doing its thing :-)
Problem: Running this code in the browser and recording a performance profile will result in a classic memory zig-zag
Why is that a problem? That's how garbage-collected memory works. (Also, just to put things in perspective: the GC only spends ~0.3ms every ~8s in your profile.)
Workaround: Replace Math.random() with a large enough array of pre-calculated random numbers using self.crypto.getRandomValues.
Replacing tiny short-lived HeapNumbers with a big and long-lived array doesn't sound like a great way to save memory.
If it really matters, one way to avoid boxing of numbers is to store them in arrays instead of as object properties. But before going through hard-to-maintain contortions in your code, be sure to measure whether it really matters for your app. It's easy to demonstrate huge effects in a microbenchmark; it's rare to see much impact in real apps.
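As a hedged sketch of that suggestion (the names are invented for the example; a plain numbers-only array would work as well as the typed array used here):

// Instead of writing a fresh double into an object property on every iteration...
const state = { value: 0 };
// ...keep hot numeric values in a slot that always holds raw 64-bit doubles:
const values = new Float64Array(1);

function hotLoop(numberOfWrites) {
  for (let i = 0; i < numberOfWrites; i++) {
    values[0] = Math.random();   // stored as a raw double, no HeapNumber per write
  }
}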

javascript performance boost using pointer?

Let's say I have a javascript object.
var hash = { a : [ ] };
Now I want to edit the array hash.a in two ways: first by accessing hash.a every time, and second by making a "pointer" var arr = hash.a that stores a reference to hash.a. Is the second way faster, or are they the same?
Example:
// first way
hash.a.push(1);
hash.a.push(2);
hash.a.push(3);
hash.a.push(4);
hash.a.push(5);
//second way
var arr = hash.a;
arr.push(1);
arr.push(2);
arr.push(3);
arr.push(4);
arr.push(5);
Thanks a lot!
I don't think there would be any real performance gain, and even if there were a slight one, it wouldn't be worth it: you hamper the legibility and maintainability of the code and use more memory (creation and garbage collection of another variable, arr, plus more chances for memory leaks if you don't handle it properly). I wouldn't recommend it.
In a typical software project, only about 20% of the time goes into development; the remaining 80% goes into testing and maintenance.
You're doing the compiler's job in this situation: any modern JavaScript engine will optimize this code, so both cases should be the same in terms of performance.
Even if these two cases weren't optimized by the engine, the performance gains would be negligible. Focus on making your code as readable as possible, and let the engine handle these referencing optimizations.

Why is a frozen "enum" slower?

In order to access the data in an array, I created an enum-like variable to have human-readable identifiers for the fields.
var columns = { first: 0, second: 1 };
var array = ['first', 'second'];
var data = array[columns.first];
When I found out about Object.freeze I wanted to use this for the enum so that it can not be changed, and I expected the VM to use this info to its advantage.
As it turns out, the tests get slower on Chrome and Node, but slightly faster on Firefox (compared to direct access by number).
The code is available here: http://jsperf.com/array-access-via-enum
Here are the benchmarks from Node (corresponding to the JSPerf tests):
fixed Number: 12ms
enum: 12ms
frozenEnum: 85ms
Does V8 just not yet have a great implementation, or is there something suboptimal with this approach for my use-case?
I tried your test in Firefox 20, which is massively faster across the board, and in IE 10, which is slightly faster and more consistent.
So my answer is: no, V8 does not yet have a great implementation.
According to this bug report, freezing an object currently puts it in "dictionary mode", which is slow.
So instead of improving the performance, it becomes a definite slowdown for "enums"/small arrays.
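A rough console.time sketch of the comparison, for anyone who wants to re-run it locally (this is not the original jsperf code, and the iteration count is arbitrary):

// Compare indexing via a literal number, a plain enum, and a frozen enum.
var columns = { first: 0, second: 1 };
var frozenColumns = Object.freeze({ first: 0, second: 1 });
var array = ['first', 'second'];
var ITER = 1e7, sink = 0;

console.time('fixed number');
for (var i = 0; i < ITER; i++) sink += array[0].length;
console.timeEnd('fixed number');

console.time('enum');
for (var j = 0; j < ITER; j++) sink += array[columns.first].length;
console.timeEnd('enum');

console.time('frozen enum');
for (var k = 0; k < ITER; k++) sink += array[frozenColumns.first].length;
console.timeEnd('frozen enum');

// `sink` exists only so the engine cannot dead-code-eliminate the loops.
console.log(sink);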

How can I reliably subsort arrays using DOM methods?

UP-FRONT NOTE: I am not using jQuery or another library here because I want to understand what I’ve written and why it works (or doesn’t), so please don’t answer this with libraries or plugins for libraries. I have nothing against libraries, but for this project they’re inimical to my programming goals.
That said…
Over at http://meyerweb.com/eric/css/colors/ I added some column sorting using DOM functions I wrote myself. The problem is that while it works great for, say, the simple case of alphabetizing strings, the results are inconsistent across browsers when I try to sort on multiple numeric terms—in effect, when I try to do a sort with two subsorts.
For example, if you click “Decimal RGB” a few times in Safari or Firefox on OS X, you get the results I intended. Do the same in Chrome or Opera (again, OS X) and you get very different results. Yes, Safari and Chrome diverge here.
Here’s a snippet of the JS I’m using for the RGB sort:
sorter.sort(function(a, b) {
  return a.blue - b.blue;
});
sorter.sort(function(a, b) {
  return a.green - b.green;
});
sorter.sort(function(a, b) {
  return a.red - b.red;
});
(sorter being the array I’m trying to sort.)
The sort is done in the tradition of another StackOverflow question “How does one sort a multi dimensional array by multiple columns in JavaScript?” and its top answer. Yet the results are not what I expected in two of the four browsers I initially tried out.
I sort (ha!) of get that this has to do with array sorts being “unstable”—no argument here!—but what I don’t know is how to overcome it in a consistent, reliable manner. I could really use some help both understanding the problem and seeing the solution, or at least a generic description of the solution.
I realize there are probably six million ways to optimize the rest of the JS (yes, I used a global). I’m still a JS novice and trying to correct that through practice. Right now, it’s array sorting that’s got me confused, and I could use some help with that piece of the script before moving on to cleaning up the code elsewhere. Thanks in advance!
UPDATE
In addition to the great explanations and suggestions below, I got a line on an even more compact solution:
function rgbSort(a, b) {
  return (a.red - b.red || a.green - b.green || a.blue - b.blue);
}
Even though I don’t quite understand it yet, I think I’m beginning to grasp its outlines and it’s what I’m using now. Thanks to everyone for your help!
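For what it's worth, the one-liner works because || returns its first non-zero operand, so the first channel that differs decides the comparison; a tiny usage sketch:

// The red difference wins unless it is 0, then green is tried, then blue.
sorter.sort(rgbSort);

// Example: comparing {red: 10, green: 5, blue: 7} with {red: 10, green: 9, blue: 1}
// evaluates 0 || -4, which is -4, so the first colour sorts before the second.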
OK. So, as you've discovered, your problem is that the default JavaScript sort is not guaranteed to be stable. Specifically, I think that in your mind it works like this: I'll sort by blueness, and then when I sort by greenness the sorter will just move entries in my array up and down but keep them ordered by blueness. Sadly, the universe is not so conveniently arranged; the built-in JS sort is allowed to do the sort how it likes. In particular, it's allowed to just throw the contents of the array into a big bucket and then pull them out sorted by what you asked for, completely ignoring how it was arranged before, and it looks like at least some browsers do precisely that.
There are a couple of ways around this, for your particular example. Firstly, you could still do the sort in three separate calls, but make sure those calls do the sort stably: this would mean that after sorting by blueness, you'd stably sort by greenness and that would give you an array sorted by greenness and in blueness order within that (i.e., precisely what you're looking for). My sorttable library does this by implementing a "shaker sort" or "cocktail sort" method (http://en.wikipedia.org/wiki/Cocktail_sort); essentially, this style of sorting walks through the list a lot and moves items up and down. (In particular, what it does not do is just throw all the list items into a bucket and pull them back out in order.) There's a nice little graphic on the Wikipedia article. This means that "subsorts" stay sorted -- i.e., that the sort is stable, and that will give you what you want.
However, for this use case, I wouldn't worry about doing the sort in three different calls and ensuring that they're stable and all that; instead, I'd do all the sorting in one go. We can think of an rgb colour indicator (255, 192, 80) as actually being a big number in some strange base: to avoid too much math, imagine it's in base 1000 (if that phrase makes no sense, ignore it; just think of this as converting the whole rgb attribute into one number encompassing all of it, a bit like how CSS computes precedences in the cascade). So that number could be thought of as actually 255,192,080. If you compute this number for each of your rows and then sort by this number, it'll all work out, and you'll only have to do the sort once: so instead of doing three sorts, you could do one: sorter.sort(function(a,b) { return (a.red*1000000 + a.green*1000 + a.blue) - (b.red*1000000 + b.green*1000 + b.blue); }) and everything falls out of that single comparison.
Technically, this is slightly inefficient, because you have to compute that "base 1000 number" every time your sort function is called, which may be (is very likely to be) more than once per row. If that's a big problem (which you can work out by benchmarking it), then you can use a Schwartzian transform (sorry for all the buzzwords here): basically, you work out the base-1000 number for each row once, put them all in a list, sort the list, and then go through the sorted list. So, create a list which looks like [ [255192080, <table row 1>], [255255255, <table row 2>], [192000000, <table row 3>] ], sort that list (with a function like mylist.sort(function(a,b) { return a[0]-b[0]; })), and then walk through that list and appendChild each of the table rows onto the table, which will sort the whole table in order. You probably don't need this last paragraph for the table you've got, but it may be useful and it certainly doesn't hurt to know about this trick, which sorttable.js also uses.
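A minimal sketch of that decorate/sort/undecorate idea, assuming each entry in sorter exposes numeric red, green and blue properties as in the question:

// Decorate: compute the combined "base 1000" key once per entry.
var decorated = sorter.map(function(item) {
  return [item.red * 1000000 + item.green * 1000 + item.blue, item];
});

// Sort by the precomputed key only.
decorated.sort(function(a, b) { return a[0] - b[0]; });

// Undecorate: pull the entries back out in sorted order.
var sorted = decorated.map(function(pair) { return pair[1]; });

// For a table, you would then appendChild each row's element in this order.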
I would approach this problem in a different manner. It appears you're trying to reconstruct all the data by extracting it from the markup, which can be a perilous task; a more straightforward approach would be to represent all the data you want to render out to the page in a format your programs can understand from the start, and then simply regenerate the markup first on page load and then on each subsequent sort.
For instance:
var colorsData = [
  {
    keyword: 'mediumspringgreen',
    decimalrgb: {
      r: 0,
      g: 250,
      b: 154
    },
    percentrgb: {
      r: 0,
      g: 98,
      b: 60.4
    },
    hsl: {
      h: 157,
      s: 100,
      l: 49
    },
    hex: '00FA9A',
    shorthex: undefined
  },
  {
    //next color...
  }
];
That way, you can run sorts on this array in whatever way you'd like, and you're not trying to rip data out from markup and split it and reassign it and all that.
But really, it seems you're maybe hung up on the sort functions. Running multiple sorts one after the other will give unintended results; you have to run a single sort function that compares the next 'column' when the previous one is found to be equal. An RGB sort could look like:
var decimalRgbForwards = function(a, b) {
  var a = a.decimalrgb,
      b = b.decimalrgb;
  if (a.r === b.r) {
    if (a.g === b.g) {
      return a.b - b.b;
    } else {
      return a.g - b.g;
    }
  } else {
    return a.r - b.r;
  }
};
So two colors with matching r and g values would return for equality on the b value, which is just what you're looking for.
Then, you can apply the sort:
colorsData.sort(decimalRgbForwards);
..and finally iterate through that array to rebuild the markup inside the table.
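A hedged sketch of that final rebuild step (the table id and the choice of cells are invented for the example):

// Rebuild the table body from the sorted data.
var tbody = document.querySelector('#colors tbody');
tbody.innerHTML = '';   // drop the old rows

colorsData.forEach(function(color) {
  var row = document.createElement('tr');
  [color.keyword,
   color.decimalrgb.r + ', ' + color.decimalrgb.g + ', ' + color.decimalrgb.b,
   color.hex].forEach(function(text) {
    var cell = document.createElement('td');
    cell.textContent = String(text);
    row.appendChild(cell);
  });
  tbody.appendChild(row);
});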
Hope it helps, sir-

What is the performance of Objects/Arrays in JavaScript? (specifically for Google V8)

Performance associated with Arrays and Objects in JavaScript (especially Google V8) would be very interesting to document. I find no comprehensive article on this topic anywhere on the Internet.
I understand that some Objects use classes as their underlying data structure. If there are a lot of properties, it is sometimes treated as a hash table?
I also understand that Arrays are sometimes treated like C++ Arrays (i.e. fast random indexing, slow deletion and resizing). And, other times, they are treated more like Objects (fast indexing, fast insertion/removal, more memory). And, maybe sometimes they are stored as linked lists (i.e. slow random indexing, fast removal/insertion at the beginning/end)
What is the precise performance of Array/Object retrievals and manipulations in JavaScript? (specifically for Google V8)
More specifically, what is the performance impact of:
Adding a property to an Object
Removing a property from an Object
Indexing a property in an Object
Adding an item to an Array
Removing an item from an Array
Indexing an item in an Array
Calling Array.pop()
Calling Array.push()
Calling Array.shift()
Calling Array.unshift()
Calling Array.slice()
Any articles or links for more details would be appreciated, as well. :)
EDIT: I am really wondering how JavaScript arrays and objects work under the hood. Also, in what context does the V8 engine "know" to "switch-over" to another data structure?
For example, suppose I create an array with...
var arr = [];
arr[10000000] = 20;
arr.push(21);
What's really going on here?
Or... what about this...???
var arr = [];
//Add lots of items
for (var i = 0; i < 1000000; i++)
  arr[i] = Math.random();

//Now I use it like a queue...
while (arr.length > 0) {
  var item = arr.shift();
  //Do something with item...
}
For conventional arrays, the performance would be terrible; whereas, if a LinkedList was used... not so bad.
I created a test suite precisely to explore these issues (and more) (archived copy).
And in that sense, you can see the performance issues in this 50+ test-case tester (it will take a long time to run).
Also, as its name suggests, it explores the use of the native linked-list nature of the DOM structure.
(Currently down; a rebuild is in progress.) More details on my blog regarding this.
The summary is as follows:
V8 Array is Fast, VERY FAST
Array push / pop / shift is ~approx 20x+ faster than any object equivalent.
Surprisingly, Array.shift() is fast: only ~approx 6x slower than an array pop, but ~approx 100x faster than an object attribute deletion.
Amusingly, Array.push(data) is faster than Array[nextIndex] = data by almost 20x (dynamic array) to 10x (fixed array).
Array.unshift(data) is slower, as expected, and is ~approx 5x slower than adding a new property.
Nulling the value (array[index] = null) is ~approx 4x+ faster than deleting it (delete array[index], leaving undefined).
Surprisingly, nulling a value in an object (obj[attr] = null) is ~approx 2x slower than just deleting the attribute (delete obj[attr]).
Unsurprisingly, mid-array Array.splice(index, 0, data) is slow, very slow.
Surprisingly, Array.splice(index, 1, data) has been optimized (no length change) and is 100x faster than the inserting splice Array.splice(index, 0, data).
Unsurprisingly, the divLinkedList is inferior to an array in all areas, except dll.splice(index, 1) removal (where it broke the test system).
BIGGEST SURPRISE of it all [as jjrv pointed out]: V8 array writes are slightly faster than V8 reads =O
Note: These metrics apply only to large arrays/objects which V8 does not "entirely optimise out". There can be very isolated optimised performance cases for array/object sizes below some arbitrary threshold (24?). More details can be seen extensively across several Google I/O videos.
Note 2: These wonderful performance results are not shared across browsers, especially *cough* IE. Also, the test is huge, hence I have yet to fully analyze and evaluate the results: please edit it in =)
Updated Note (Dec 2012): Google representatives have videos on YouTube describing the inner workings of Chrome itself (like when it switches from a linked-list array to a fixed array, etc.), and how to optimize for them. See GDC 2012: From Console to Chrome for more.
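One of those comparisons is easy to re-run as a rough console.time microbenchmark (this is a sketch, not the original test suite, and the array size is arbitrary):

// Clearing array slots: nulling the value versus deleting the element.
var SIZE = 1000000;

var a = new Array(SIZE).fill(1);
console.time('null out');
for (var i = 0; i < SIZE; i++) a[i] = null;
console.timeEnd('null out');

var b = new Array(SIZE).fill(1);
console.time('delete');
for (var j = 0; j < SIZE; j++) delete b[j];   // punches holes; may force slower element storage
console.timeEnd('delete');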
At a basic level that stays within the realms of JavaScript, properties on objects are much more complex entities. You can create properties with setters/getters, with differing enumerability, writability, and configurability. An item in an array isn't able to be customized in this way: it either exists or it doesn't. At the underlying engine level this allows for a lot more optimization in terms of organizing the memory that represents the structure.
In terms of identifying an array from an object (dictionary), JS engines have always made explicit lines between the two. That's why there's a multitude of articles on methods of trying to make a semi-fake Array-like object that behaves like one but allows other functionality. The reason this separation even exists is because the JS engines themselves store the two differently.
Properties can be stored on an array object but this simply demonstrates how JavaScript insists on making everything an object. The indexed values in an array are stored differently from any properties you decide to set on the array object that represents the underlying array data.
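A small illustration of that split, using plain JavaScript only:

var arr = [10, 20, 30];
arr.note = "named property";        // stored with the object's properties, not its elements

console.log(arr.length);            // 3: the named property does not count as an element
console.log(Object.keys(arr));      // ["0", "1", "2", "note"]
console.log(JSON.stringify(arr));   // "[10,20,30]": only the indexed values are serialized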
Whenever you're using a legit array object and using one of the standard methods of manipulating that array you're going to be hitting the underlying array data. In V8 specifically, these are essentially the same as a C++ array so those rules will apply. If for some reason you're working with an array that the engine isn't able to determine with confidence is an array, then you're on much shakier ground. With recent versions of V8 there's more room to work though. For example, it's possible to create a class that has Array.prototype as its prototype and still gain efficient access to the various native array manipulation methods. But this is a recent change.
Specific links to recent changes to array manipulation may come in handy here:
http://code.google.com/p/v8/source/detail?r=10024
http://code.google.com/p/v8/source/detail?r=9849
http://code.google.com/p/v8/source/detail?r=9747
As a bit of extra, here's Array Pop and Array Push directly from V8's source, both implemented in JS itself:
function ArrayPop() {
  if (IS_NULL_OR_UNDEFINED(this) && !IS_UNDETECTABLE(this)) {
    throw MakeTypeError("called_on_null_or_undefined",
                        ["Array.prototype.pop"]);
  }

  var n = TO_UINT32(this.length);
  if (n == 0) {
    this.length = n;
    return;
  }
  n--;
  var value = this[n];
  this.length = n;
  delete this[n];
  return value;
}

function ArrayPush() {
  if (IS_NULL_OR_UNDEFINED(this) && !IS_UNDETECTABLE(this)) {
    throw MakeTypeError("called_on_null_or_undefined",
                        ["Array.prototype.push"]);
  }

  var n = TO_UINT32(this.length);
  var m = %_ArgumentsLength();
  for (var i = 0; i < m; i++) {
    this[i+n] = %_Arguments(i);
  }
  this.length = n + m;
  return this.length;
}
I'd like to complement the existing answers with an investigation into the question of how implementations behave regarding growing arrays: if they implement them the "usual" way, one would see many quick pushes with rare, interspersed slow pushes, at which point the implementation copies the internal representation of the array from one buffer to a larger one.
You can see this effect very nicely, this is from Chrome:
16: 4ms
40: 8ms 2.5
76: 20ms 1.9
130: 31ms 1.7105263157894737
211: 14ms 1.623076923076923
332: 55ms 1.5734597156398105
514: 44ms 1.5481927710843373
787: 61ms 1.5311284046692606
1196: 138ms 1.5196950444726811
1810: 139ms 1.5133779264214047
2731: 299ms 1.5088397790055248
4112: 341ms 1.5056755767118273
6184: 681ms 1.5038910505836576
9292: 1324ms 1.5025873221216042
Even though each push is profiled, the output contains only those that take time above a certain threshold. For each test I customized the threshold to exclude all the pushes that appear to represent the fast pushes.
So the first number says which element has been inserted (the first line is for the 17th element), the second is how long it took (the benchmark is run on many arrays in parallel), and the last value is the first number divided by the corresponding number in the previous line.
All lines that have less than 2ms execution time are excluded for Chrome.
You can see that Chrome increases array size in powers of 1.5, plus some offset to account for small arrays.
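In fact, the measured sizes are consistent with a growth rule of "old capacity plus half of it plus a small constant"; the exact constant below is inferred from the numbers above, not something the profile reports directly:

// Model of the observed Chrome growth pattern: grow by ~50% plus a constant.
function nextCapacity(oldCapacity) {
  return oldCapacity + (oldCapacity >> 1) + 16;
}

var c = 0;
for (var step = 0; step < 6; step++) {
  c = nextCapacity(c);
  console.log(c);   // 16, 40, 76, 130, 211, 332 ... matching the first column above
}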
For Firefox, it's a power of two:
126: 284ms
254: 65ms 2.015873015873016
510: 28ms 2.0078740157480315
1022: 58ms 2.003921568627451
2046: 89ms 2.0019569471624266
4094: 191ms 2.0009775171065494
8190: 364ms 2.0004885197850513
I had to put the threshold up quite a bit in Firefox, that's why we start at #126.
With IE, we get a mix:
256: 11ms 256
512: 26ms 2
1024: 77ms 2
1708: 113ms 1.66796875
2848: 154ms 1.6674473067915691
4748: 423ms 1.6671348314606742
7916: 944ms 1.6672283066554338
It's a power of two at first, and then it moves to growing by a factor of five thirds.
So all common implementations use the "normal" way for arrays (instead of going crazy with ropes, for example).
Here's the benchmark code and here's the fiddle it's in.
var arrayCount = 10000;

var dynamicArrays = [];
for (var j = 0; j < arrayCount; j++)
  dynamicArrays[j] = [];

var lastLongI = 1;

for (var i = 0; i < 10000; i++) {
  var before = Date.now();
  for (var j = 0; j < arrayCount; j++)
    dynamicArrays[j][i] = i;
  var span = Date.now() - before;
  if (span > 10) {
    console.log(i + ": " + span + "ms" + " " + (i / lastLongI));
    lastLongI = i;
  }
}
While running under node.js 0.10 (built on v8) I was seeing CPU usage that seemed excessive for the workload. I traced one performance problem to a function that was checking for the existence of a string in an array. So I ran some tests.
loaded 90,822 hosts
loading config took 0.087 seconds (array)
loading config took 0.152 seconds (object)
Loading 91k entries into an array (with validate & push) is faster than setting obj[key]=value.
In the next test, I looked up every hostname in the list one time (91k iterations, to average the lookup time):
searching config took 87.56 seconds (array)
searching config took 0.21 seconds (object)
The application here is Haraka (an SMTP server); it loads the host_list once at startup (and after changes) and subsequently performs this lookup millions of times during operation. Switching to an object was a huge performance win.
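A minimal sketch of the two lookup strategies being compared (the hostnames are made up; a Set would be the modern equivalent of the object approach):

// Array lookup: linear scan, O(n) per query.
var hostArray = ['example.com', 'mail.example.org' /* ... ~91k entries ... */];
var foundInArray = hostArray.indexOf('mail.example.org') !== -1;

// Object (hash) lookup: O(1) per query on average.
var hostObject = Object.create(null);
hostArray.forEach(function(host) { hostObject[host] = true; });
var foundInObject = hostObject['mail.example.org'] === true;

// In modern code a Set expresses the same idea directly:
// var hostSet = new Set(hostArray);  hostSet.has('mail.example.org');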
