Which algorithm does the JavaScript Array#sort() function use? I understand that it can take all manner of arguments and functions to perform different kinds of sorts; I'm simply interested in which algorithm the vanilla sort uses.
I've just had a look at the WebKit (Chrome, Safari …) source. Depending on the type of array, different sort methods are used:
Numeric arrays (or arrays of primitive type) are sorted using the C++ standard library function std::qsort which implements some variation of quicksort (usually introsort).
Contiguous arrays of non-numeric type are stringified and sorted using mergesort, if available (to obtain a stable sorting) or qsort if no merge sort is available.
For other types (non-contiguous arrays and presumably for associative arrays) WebKit uses either selection sort (which they call “min” sort) or, in some cases, it sorts via an AVL tree. Unfortunately, the documentation here is rather vague so you’d have to trace the code paths to actually see for which types which sort method is used.
And then there are gems like this comment:
// FIXME: Since we sort by string value, a fast algorithm might be to use a
// radix sort. That would be O(N) rather than O(N log N).
– Let’s just hope that whoever actually “fixes” this has a better understanding of asymptotic runtime than the writer of this comment, and realises that radix sort has a slightly more complex runtime description than simply O(N).
(Thanks to phsource for pointing out the error in the original answer.)
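To make the point about radix sort's runtime concrete, here is a minimal sketch of an LSD radix sort. It is not WebKit's code: it assumes non-negative integers below 2^32 rather than strings, and the name radixSort is made up for illustration. The outer loop runs once per byte of the key, so the cost is roughly (number of key bytes) × (N + 256), not simply O(N). For strings the number of passes would grow with the key length.

```javascript
// LSD radix sort for non-negative integers below 2^32, one byte per pass.
// Illustrative sketch only; the function name and the integer-only
// restriction are assumptions, not anything taken from WebKit.
function radixSort(values) {
  var input = values.slice();            // leave the caller's array untouched
  var output = new Array(input.length);
  for (var shift = 0; shift < 32; shift += 8) {
    // counts[d + 1] first holds how many keys have digit d in this byte.
    var counts = new Array(257).fill(0);
    for (var i = 0; i < input.length; i++) {
      counts[((input[i] >>> shift) & 0xff) + 1]++;
    }
    // Prefix sums: counts[d] becomes the first output slot for digit d.
    for (var b = 0; b < 256; b++) {
      counts[b + 1] += counts[b];
    }
    // Stable scatter into the output buffer.
    for (var j = 0; j < input.length; j++) {
      var digit = (input[j] >>> shift) & 0xff;
      output[counts[digit]++] = input[j];
    }
    var tmp = input;
    input = output;
    output = tmp;
  }
  return input;
}

console.log(radixSort([170, 45, 75, 90, 802, 24, 2, 66]));
// [2, 24, 45, 66, 75, 90, 170, 802]
```

The four passes are fixed here only because the keys are fixed-width; that fixed width is exactly the term the quoted comment leaves out.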
If you look at this bug 224128, it appears that MergeSort is being used by Mozilla.
There is no draft requirement for JS to use a specific sorting algorithm. As many have mentioned here, Mozilla uses merge sort. However, in Chrome's V8 source code, as of today, it uses QuickSort, with InsertionSort for smaller arrays.
V8 Engine Source
From Lines 807 - 891
var QuickSort = function QuickSort(a, from, to) {
var third_index = 0;
while (true) {
// Insertion sort is faster for short arrays.
if (to - from <= 10) {
InsertionSort(a, from, to);
return;
}
if (to - from > 1000) {
third_index = GetThirdIndex(a, from, to);
} else {
third_index = from + ((to - from) >> 1);
}
// Find a pivot as the median of first, last and middle element.
var v0 = a[from];
var v1 = a[to - 1];
var v2 = a[third_index];
var c01 = comparefn(v0, v1);
if (c01 > 0) {
// v1 < v0, so swap them.
var tmp = v0;
v0 = v1;
v1 = tmp;
} // v0 <= v1.
var c02 = comparefn(v0, v2);
if (c02 >= 0) {
// v2 <= v0 <= v1.
var tmp = v0;
v0 = v2;
v2 = v1;
v1 = tmp;
} else {
// v0 <= v1 && v0 < v2
var c12 = comparefn(v1, v2);
if (c12 > 0) {
// v0 <= v2 < v1
var tmp = v1;
v1 = v2;
v2 = tmp;
}
}
// v0 <= v1 <= v2
a[from] = v0;
a[to - 1] = v2;
var pivot = v1;
var low_end = from + 1; // Upper bound of elements lower than pivot.
var high_start = to - 1; // Lower bound of elements greater than pivot.
a[third_index] = a[low_end];
a[low_end] = pivot;
// From low_end to i are elements equal to pivot.
// From i to high_start are elements that haven't been compared yet.
partition: for (var i = low_end + 1; i < high_start; i++) {
var element = a[i];
var order = comparefn(element, pivot);
if (order < 0) {
a[i] = a[low_end];
a[low_end] = element;
low_end++;
} else if (order > 0) {
do {
high_start--;
if (high_start == i) break partition;
var top_elem = a[high_start];
order = comparefn(top_elem, pivot);
} while (order > 0);
a[i] = a[high_start];
a[high_start] = element;
if (order < 0) {
element = a[i];
a[i] = a[low_end];
a[low_end] = element;
low_end++;
}
}
}
if (to - high_start < low_end - from) {
QuickSort(a, high_start, to);
to = low_end;
} else {
QuickSort(a, from, low_end);
from = high_start;
}
}
};
Update
As of 2018 V8 uses TimSort, thanks @celwell. Source
The ECMAScript standard does not specify which sort algorithm is to be used. Indeed, different browsers feature different sort algorithms. For example, Mozilla/Firefox's sort() is not stable (in the sorting sense of the word) when sorting a map. IE's sort() is stable.
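If your code depends on equal elements keeping their relative order, one way to avoid relying on a particular engine is to break ties on the original index. The sketch below is one possible approach, and stableSort is a hypothetical helper name, not a built-in. (Editions of the specification from ES2019 onward do require Array#sort to be stable, but older engines may not be.)

```javascript
// Hypothetical helper: produces a stable ordering on top of any Array#sort
// by remembering each element's original position and using it as a tie-break.
function stableSort(items, compare) {
  return items
    .map(function (value, index) { return { value: value, index: index }; })
    .sort(function (a, b) {
      var order = compare(a.value, b.value);
      return order !== 0 ? order : a.index - b.index;
    })
    .map(function (entry) { return entry.value; });
}

var people = [
  { name: "Ann", age: 30 },
  { name: "Bob", age: 25 },
  { name: "Cid", age: 30 },
  { name: "Dee", age: 25 }
];
var byAge = stableSort(people, function (a, b) { return a.age - b.age; });
console.log(byAge.map(function (p) { return p.name; }).join(","));
// Bob,Dee,Ann,Cid
```

The cost is one extra wrapper object per element, which is usually acceptable when correctness depends on stability.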
I think that would depend on which browser implementation you are referring to.
Every browser type has its own JavaScript engine implementation, so it depends.
You could check the source code repos for Mozilla and WebKit/KHTML for different implementations.
IE is closed source, however, so you may have to ask somebody at Microsoft.
Google Chrome uses TimSort, Python's sorting algorithm, as of version 70 released on September 13, 2018.
See the post on the V8 dev blog (V8 is Chrome's JavaScript engine) for details about this change. You can read the source code or patch 1186801 specifically.
After some more research, it appears, for Mozilla/Firefox, that Array.sort() uses Merge Sort. See the code here.
I am trying to optimize a function. I believe this nested for loop is quadratic, but I'm not positive. I have recreated the function below
const bucket = [["e","f"],[],["j"],[],["p","q"]]
let totalLettersIWantBack = 4;
//I'm starting at the end of the bucket
function produceLetterArray(bucket, limit){
let result = [];
let countOfLettersAccumulated = 0;
let i = bucket.length - 1;
while(i > 0){
if(bucket[i].length > 0){
bucket[i].forEach( (letter) =>{
if(countOfLettersAccumulated === totalLettersIWantBack){
return;
}
result.push(letter);
countOfLettersAccumulated++;
})
}
i--;
}
return result;
}
console.log(produceLetterArray(bucket, totalLettersIWantBack));
Here is a trick for such questions. For the code whose complexity you want to analyze, just write the time that it would take to execute each statement in the worst case, assuming no other statement exists. Note the comments beginning with #operations worst case:
For the given code:
while(i > 0){ //#operations worst case: bucket.length
  if(bucket[i].length > 0){ //#operations worst case: 1
    bucket[i].forEach( (letter) =>{ //#operations worst case: max(len(bucket[i])) for all i
      if(countOfLettersAccumulated === totalLettersIWantBack){ //#operations worst case: 1
        return;
      }
      result.push(letter); //#operations worst case: 1
      countOfLettersAccumulated++; //#operations worst case: 1
    })
  }
  i--; //#operations worst case: 1
}
We can now multiply all the worst case times (since they all can be achieved in the worst case, you can always set totalLettersIWantBack = 10^9) to get the O complexity of the snippet:
Complexity = O(bucket.length * 1 * max(len(bucket[i])) * 1 * 1 * 1 * 1)
= O(bucket.length * max(len(bucket[i])))
If the length of each bucket[i] were a constant, K, then your complexity reduces to:
O(K * bucket.length ) = O(bucket.length)
Note that the complexity of the push operation may not remain constant as the number of elements grows (ultimately, the runtime will need to allocate space for the added elements, and all the existing elements may have to be moved).
Whether or not this is quadratic depends on what you consider N and how bucket is organized. If N is the total number of letters, then the runtime is bounded either by the number of bins in your bucket, if that is larger than N, or by the number of letters in the bucket, if N is larger. In either case, the search time increases linearly with the larger bound, so with N taken as whichever of the two dominates, the time complexity is O(N). This is effectively a linear search with "turns" in it; scrunching a linear search together or spacing it out does not change the time complexity. The existence of multiple loops in a piece of code does not alone make it non-linear. Take the linear search example again: we search a list until we've found the largest element.
//12 elements
var array = [0,1,2,3,4,5,6,7,8,9,10,11];
var rows = 3;
var cols = 4;
var largest = -1;
for(var i = 0; i < rows; ++i){
for(var j = 0; j < cols; ++j){
var checked = array[(i * cols) + j];
if (checked > largest){
largest = checked;
}
}
}
console.log("found largest number (eleven): " + largest.toString());
Despite this using two loops instead of one, the runtime complexity is still O(N), where N is the number of elements in the input. Scrunching this down so each index is actually an array of multiple elements, or separating relevant elements by empty bins, doesn't change the fact that the runtime complexity is bound linearly.
This is technically linear, with n being the total number of elements in your matrix. This is because the exit condition is the length of bucket, and for each array in bucket you check whether countOfLettersAccumulated is equal to totalLettersIWantBack, so you keep looking at every value.
It gets a lot more complicated if you are looking for an answer matching the dimensions of your matrix because it looks like the dimensions of bucket are not fixed.
You can make this bit of code stop as soon as enough letters have been collected by adding an additional check outside the bucket forEach: if countOfLettersAccumulated is equal to totalLettersIWantBack, then you do a break.
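The early-exit idea can be sketched as follows. Note that return inside a forEach callback only ends the current callback, so this version uses plain for loops whose conditions stop once the limit is reached. The function name is hypothetical, and using i >= 0 (so that bucket[0] is visited too) is an assumption about the intent, since the code in the question stops at i > 0.

```javascript
// Sketch with assumed names; not the asker's original function.
function produceLetterArrayEarlyExit(bucket, limit) {
  var result = [];
  // Both loop conditions check the limit, so no bucket is scanned
  // after enough letters have been collected.
  for (var i = bucket.length - 1; i >= 0 && result.length < limit; i--) {
    for (var j = 0; j < bucket[i].length && result.length < limit; j++) {
      result.push(bucket[i][j]);
    }
  }
  return result;
}

console.log(produceLetterArrayEarlyExit([["e", "f"], [], ["j"], [], ["p", "q"]], 4));
// ["p", "q", "j", "e"]
```

With this shape the work is bounded by the number of buckets visited plus the limit, rather than by the total number of letters stored.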
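I like @axiom's explanation of the complexity analysis.
I would just like to add a possible optimized solution.
UPD: .push (amortized O(1) per element) is faster than .concat, which copies the whole accumulated array on every call (O(n) per call, so O(n^2) in total when used inside a loop).
Also, here is a test: Array push vs. concat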
const bucket = [["e","f"],[],["j", 'm', 'b'],[],["p","q"]]
let totalLettersIWantBack = 4;
//I'm starting at the end of the bucket
function produceLetterArray(bucket, limit){
  let result = [];
  //i > 0 mirrors the loop in the question, so bucket[0] is never visited
  for(let i = bucket.length-1; i > 0 && result.length < limit; i--){
    //previous version
    //result = result.concat(bucket[i].slice(0, limit-result.length));
    //faster version of merging array
    Array.prototype.push.apply(result, bucket[i].slice(0, limit-result.length));
  }
  return result;
}
console.log(produceLetterArray(bucket, totalLettersIWantBack));
I'm studying for an interview and have been working through some practice questions. The question is:
Find the most repeated integer in an array.
Here is the function I created and the one they created. They are appropriately named.
var arr = [3, 6, 6, 1, 5, 8, 9, 6, 6]
function mine(arr) {
arr.sort()
var count = 0;
var integer = 0;
var tempCount = 1;
var tempInteger = 0;
var prevInt = null
for (var i = 0; i < arr.length; i++) {
tempInteger = arr[i]
if (i > 0) {
prevInt = arr[i - 1]
}
if (prevInt == arr[i]) {
tempCount += 1
if (tempCount > count) {
count = tempCount
integer = tempInteger
}
} else {
tempCount = 1
}
}
console.log("most repeated is: " + integer)
}
function theirs(a) {
var count = 1,
tempCount;
var popular = a[0];
var temp = 0;
for (var i = 0; i < (a.length - 1); i++) {
temp = a[i];
tempCount = 0;
for (var j = 1; j < a.length; j++) {
if (temp == a[j])
tempCount++;
}
if (tempCount > count) {
popular = temp;
count = tempCount;
}
}
console.log("most repeated is: " + popular)
}
console.time("mine")
mine(arr)
console.timeEnd("mine")
console.time("theirs")
theirs(arr)
console.timeEnd("theirs")
These are the results:
most repeated is: 6
mine: 16.929ms
most repeated is: 6
theirs: 0.760ms
What makes my function slower than theirs?
My test results
I get the following results when I test (JSFiddle) it for a random array with 50 000 elements:
mine: 28.18 ms
theirs: 5374.69 ms
In other words, your algorithm seems to be much faster. That is expected.
Why is your algorithm faster?
You sort the array first, and then loop through it once. Firefox uses merge sort and Chrome uses a variant of quick sort (according to this question). Both take O(n*log(n)) time on average. Then you loop through the array, taking O(n) time. In total you get O(n*log(n)) + O(n), that can be simplified to just O(n*log(n)).
Their solution, on the other hand, has a nested loop where both the outer and inner loops iterate over all the elements. That should take O(n^2). In other words, it is slower.
Why do your test results differ?
So why do your test results differ from mine? I see a number of possibilities:
You used too small a sample. If you just used the nine numbers in your code, that is definitely the case. When you use short arrays in the test, overheads (like running the console.log as suggested by Gundy in comments) dominate the time it takes. This can make the result appear completely random.
neuronaut suggests that it is related to the fact that their code operates on the array that is already sorted by your code. While that is a bad way of testing, I fail to see how it would affect the result.
Browser differences of some kind.
A note on .sort()
A further note: You should not use .sort() for sorting numbers, since it sorts things alphabetically. Instead, use .sort(function(a, b){return a-b}). Read more here.
A further note on the further note: In this particular case, just using .sort() might actually be smarter. Since you do not care about the ordering, only the grouping, it doesn't matter that it sorts the numbers wrong. It will still group elements with the same value together. If it is faster without the comparison function (I suspect it is), then it makes sense to sort without one.
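To make the difference between the two orderings concrete, here is a small illustration; the sample values are made up. Without a comparison function the elements are compared as strings, so "100" sorts before "25".

```javascript
var numbers = [10, 9, 1, 100, 25];

// Default sort: elements are converted to strings and compared as text.
var asStrings = numbers.slice().sort();

// Numeric sort: the comparison function returns a negative, zero,
// or positive number depending on the order of a and b.
var asNumbers = numbers.slice().sort(function (a, b) { return a - b; });

console.log(asStrings); // [1, 10, 100, 25, 9]
console.log(asNumbers); // [1, 9, 10, 25, 100]
```

Either way, equal values end up next to each other, which is all the counting loop in the question needs.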
An even faster algorithm
You solved the problem in O(n*log(n)), but you can do it in just O(n). The algorithm to do that is quite intuitive. Loop through the array, and keep track of how many times each number appears. Then pick the number that appears the most times.
Let's say there are m different numbers in the array. Looping through the array takes O(n) and finding the max takes O(m). That gives you O(n) + O(m), which simplifies to O(n) since m <= n.
This is the code:
function anders(arr) {
  //Instead of an array we use an object and properties.
  //It works like a dictionary in other languages.
  var counts = {};
  //Count how many of each number there is.
  for (var i = 0; i < arr.length; i++) {
    //Make sure the property is defined.
    if (typeof counts[arr[i]] === 'undefined')
      counts[arr[i]] = 0;
    //Increase the counter.
    counts[arr[i]]++;
  }
  var max; //The number with the largest count.
  var max_count = -1; //The largest count.
  //Iterate through all of the properties of the counts object
  //to find the number with the largest count.
  for (var num in counts) {
    if (counts.hasOwnProperty(num)) {
      if (counts[num] > max_count) {
        max_count = counts[num];
        max = num;
      }
    }
  }
  //Property names are strings, so convert the result back to a number.
  //(For an empty array max stays undefined.)
  return max === undefined ? undefined : Number(max);
}
Running this on a random array with 50 000 elements between 0 and 49 takes just 3.99 ms on my computer. In other words, it is the fastest. The downside is that you need O(m) memory to store how many times each number appears.
It looks like this isn't a fair test. When you run your function first, it sorts the array. This means their function ends up using already sorted data but doesn't suffer the time cost of performing the sort. I tried swapping the order in which the tests were run and got nearly identical timings:
console.time("theirs")
theirs(arr)
console.timeEnd("theirs")
console.time("mine")
mine(arr)
console.timeEnd("mine")
most repeated is: 6
theirs: 0.307ms
most repeated is: 6
mine: 0.366ms
Also, if you use two separate arrays you'll see that your function and theirs run in the same amount of time, approximately.
Lastly, see Anders' answer -- it demonstrates that larger data sets reveal your function's O(n*log(n)) + O(n) performance vs their function's O(n^2) performance.
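A fairer comparison gives every candidate its own unsorted copy of the same data. The harness below is only a sketch: randomIntegers, timeOnCopy, and the two stand-in functions are hypothetical names, and the array size is arbitrary. Date.now() is coarse, so very short runs will still be noisy.

```javascript
// Hypothetical harness: every name here is illustrative.
function randomIntegers(count, maxValue) {
  var data = [];
  for (var i = 0; i < count; i++) {
    data.push(Math.floor(Math.random() * (maxValue + 1)));
  }
  return data;
}

// Runs fn on a private copy so an in-place sort cannot leak into later runs.
function timeOnCopy(label, fn, data) {
  var copy = data.slice();
  var start = Date.now();
  fn(copy);
  var elapsed = Date.now() - start;
  console.log(label + ": " + elapsed + " ms");
  return elapsed;
}

// Stand-ins for the two approaches being compared.
function sortThenScan(a) {
  a.sort(function (x, y) { return x - y; });
}
function nestedLoops(a) {
  var best = 0;
  for (var i = 0; i < a.length; i++) {
    var n = 0;
    for (var j = 0; j < a.length; j++) {
      if (a[i] === a[j]) n++;
    }
    if (n > best) best = n;
  }
  return best;
}

var sample = randomIntegers(5000, 49);
timeOnCopy("sort then scan", sortThenScan, sample);
timeOnCopy("nested loops", nestedLoops, sample);
```

Because each call works on data.slice(), the order in which the candidates run no longer changes what input they see.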
Other answers here already do a great job of explaining why theirs is faster, and also how to optimize yours. Yours is actually better with large datasets (@Anders). I managed to optimize their solution; maybe there's something useful here.
I can get consistently faster results by employing some basic JS micro-optimizations. These optimizations can also be applied to your original function, but I applied them to theirs.
Preincrementing is slightly faster than postincrementing, because the value does not need to be read into memory first
Reverse-while loops are massively faster (on my machine) than anything else I've tried, because JS is translated into opcodes, and guaranteeing >= 0 is very fast. For this test, my computer scored 514,271,438 ops/sec, while the next-fastest scored 198,959,074.
Cache the result of length - for larger arrays, this would make better() more noticeably faster than theirs()
Code:
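function better(a) {
  var top = a[0],
      count = 0,
      len = a.length, //cache the length once; declared with var so it is not an implicit global
      i = len;
  //starting at len (not len - 1) so that the last element is visited too
  while (i--) {
    var j = len,
        temp = 0;
    while (j--) {
      if (a[j] == a[i]) ++temp;
    }
    if (temp > count) {
      count = temp;
      top = a[i];
    }
  }
  console.log("most repeated is " + top);
}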
[fiddle]
It's very similar, if not the same, to theirs, but with the above micro-optimizations.
Here are the results for running each function 500 times. The array is pre-sorted before any function is run, and the sort is removed from mine().
mine: 44.076ms
theirs: 35.473ms
better: 32.016ms
I'm studying for a technical interview right now, and writing quick JavaScript implementations of different sorts. The random-array benchmark results for most of the elementary sorts make sense, but the selection sort is freakishly fast, and I don't know why.
Here is my implementation of the Selection Sort:
Array.prototype.selectionSort = function () {
for (var target = 0; target < this.length - 1; target++) {
var min = target;
for (var j = target + 1; j < this.length - 1; j++) {
if (this[min] > this[j]) {
min = j;
}
}
if (min !== target) {
this.swap(min, target);
}
}
}
Here are the results of the same randomly generated array with 10000 elements:
BubbleSort => 148ms
InsertionSort => 94ms
SelectionSort => 91ms
MergeSort => 45ms
All the sorts are using the same swap method. So why is Selection Sort faster? My only guess is that Javascript is really fast at array traversal but slow at value mutation, since SelectionSort uses the least in value mutation, it's faster.
** For Reference **
Here is my Bubble Sort implementation
Array.prototype.bubbleSort = function () {
for (var i = this.length - 1; i > 1; i--) {
var swapped = false;
for (var j = 0; j < i; j++) {
if (this[j + 1] < this[j]) {
this.swap(j, j+1);
swapped = true;
}
}
if ( ! swapped ) {
return;
}
}
}
Here is the swap Implementation
Array.prototype.swap = function (index1, index2) {
var val1 = this[index1],
val2 = this[index2];
this[index1] = val2;
this[index2] = val1;
};
First let me point out two flaws:
The code for your selection sort is faulty. The inner loop needs to be
for (var j = target + 1; j < this.length; j++) {
otherwise the last element is never selected.
Your jsperf tests sort, as you say, the "same randomly generated array" every time. That means that the successive runs in each test loop will try to sort an already sorted array, which would favour algorithms like bubblesort that have a linear best-case performance.
Luckily, your test array is so freakishly large that jsperf runs only a single iteration of its test loop at once, calling the setup code that initialises the array before every run. This would haunt you for smaller arrays, though. You need to shuffle the array inside the "timed code" itself.
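One common way to reshuffle before each timed run is a Fisher-Yates shuffle. This is a minimal sketch; the function name shuffle is my own and is not part of jsperf or of the asker's code.

```javascript
// Fisher-Yates shuffle: rearranges the array in place and returns it.
function shuffle(array) {
  for (var i = array.length - 1; i > 0; i--) {
    // Pick a random index from 0..i (inclusive) and swap it into position i.
    var j = Math.floor(Math.random() * (i + 1));
    var tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
  }
  return array;
}

var data = [1, 2, 3, 4, 5, 6, 7, 8];
shuffle(data);
console.log(data); // same eight values, in a random order
```

The shuffle itself is O(n), so its cost is included in every timed run but is small next to an O(n²) sort on the same array.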
Why is Selection Sort faster? My only guess is that Javascript is really fast at array traversal but slow at value mutation.
Yes. Writes are generally more expensive than reads, and they have negative effects on cached values as well.
SelectionSort uses the least in value mutation
Yes, and that is quite significant. Both selection and bubble sort have an O(n²) runtime, which means that both execute on the order of 100000000 loop condition checks, index increments, and comparisons of two array elements.
However, while selection sort does only O(n) swaps, bubble sort does O(n²) of them. That means not only mutating the array, but also the overhead of a method call, and it happens much, much more often than in the selection sort. Here are some example logs:
> swaps in .selectionSort() of 10000 element arrays
9989
9986
9992
9990
9987
9995
9989
9990
9988
9991
> swaps in .bubbleSort() of 10000 element arrays
24994720
25246566
24759007
24912175
24937357
25078458
24918266
24789670
25209063
24894328
Ooops.
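Swap counts like these can be collected with a small instrumented version of each sort. The sketch below uses standalone functions with assumed names rather than the asker's prototype methods, and it inlines the swap so it is self-contained.

```javascript
// Selection sort that returns how many swaps it performed.
function selectionSortSwaps(a) {
  var swaps = 0;
  for (var target = 0; target < a.length - 1; target++) {
    var min = target;
    for (var j = target + 1; j < a.length; j++) {
      if (a[min] > a[j]) min = j;
    }
    if (min !== target) {
      var t = a[min]; a[min] = a[target]; a[target] = t;
      swaps++;
    }
  }
  return swaps;
}

// Bubble sort that returns how many swaps it performed.
function bubbleSortSwaps(a) {
  var swaps = 0;
  for (var i = a.length - 1; i > 0; i--) {
    var swapped = false;
    for (var j = 0; j < i; j++) {
      if (a[j + 1] < a[j]) {
        var t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
        swaps++;
        swapped = true;
      }
    }
    if (!swapped) break;
  }
  return swaps;
}

var reversed = [5, 4, 3, 2, 1];
console.log(selectionSortSwaps(reversed.slice())); // 2
console.log(bubbleSortSwaps(reversed.slice()));    // 10
```

Selection sort can never swap more than n - 1 times, while bubble sort performs one swap per out-of-order pair, which is where the gap in the logs comes from.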
This is related to this question.
I have heard that the while pattern with a decrement and a greater than test is faster than any other loop pattern. Given that, is this the fastest possible array copy in js?
function arrayCopy(src,sstart,dst,dstart,length) {
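dstart += length; // one past the last destination index (must use the original length)
length += sstart; // one past the last source index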
while(--length >= sstart) {
dst[--dstart] = src[length];
}
}
Other test functions
function slowCopy(src,sstart,dst,dstart,length) {
for(var i = sstart; i < sstart+length;i+=1 ) {
dst[dstart++] = src[i];
}
}
function aCopy(src,sstart,dst,dstart,length) {
Array.prototype.splice.apply(dst,[dstart, length].concat(src.slice(sstart,sstart+length)));
}
Test Results http://jsperf.com/fastest-js-arraycopy
arrayCopy - 2,899 ±5.27% fastest
slowCopy (WINNER) - 2,977 ±4.86% fastest
aCopy - 2,810 ±4.61% fastest
I want to add some more of the suggested functions below to the jsPerf tests, but none of them incorporate source start offset, destination start offset or length of copy. Anyway, I was somewhat surprised by these results, which appear to be the opposite of what I expected.
Who says you need a loop?
var myArrayCopy = JSON.parse(JSON.stringify(myArray));
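This method makes a deep clone of the array, provided its contents survive a round trip through JSON (numbers, strings, booleans, null, and nested arrays or plain objects of those). Here it is in a function:
function arrayCopy(src,sstart,dst,dstart,length) {
  //Assigning to dst would only rebind the local parameter, so fill the caller's array in place.
  var clone = JSON.parse(JSON.stringify(src));
  dst.length = 0;
  for (var i = 0; i < clone.length; i++) dst.push(clone[i]);
}
Keep in mind the other variables (besides src and dst) are there just to maintain your original code structure in case you have pre-existing calls to this function. They aren't used and can be removed.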
Slow Copy is, surprisingly, the winner, by a narrow margin:
function slowCopy(src,sstart,dst,dstart,length) {
for(var i = sstart; i < sstart+length;i+=1 ) {
dst[dstart++] = src[i];
}
}
I think this is the fastest way:
var original = [1, 2, 3];
var copy = original.slice(0);
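The slice() call above copies the whole array into a new one. If the source offset, destination offset, and length from the question are needed, slice can still do the extraction step. The following is a sketch under that assumption; sliceCopy is a hypothetical name and no claim is made about its speed relative to the other candidates.

```javascript
// Hypothetical variant of the slice() suggestion that honours the offsets
// used by the functions in the question.
function sliceCopy(src, sstart, dst, dstart, length) {
  // slice() clamps to the end of src, so chunk may be shorter than length.
  var chunk = src.slice(sstart, sstart + length);
  for (var i = 0; i < chunk.length; i++) {
    dst[dstart + i] = chunk[i];
  }
  return dst;
}

var source = [1, 2, 3, 4, 5];
var target = [0, 0, 0, 0, 0, 0];
sliceCopy(source, 1, target, 2, 3);
console.log(target); // [0, 0, 2, 3, 4, 0]
```

Like slice(0), this makes a shallow copy: nested arrays or objects are shared between source and destination.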