JavaScript: add the same element N times to an array

Suppose I have a map like this:
var map = {"a": 100, "b": 200, "c": 700};
And I want an array consisting of "a" 100 times, "b" 200 times and "c" 700 times:
map_array = [a, a, a, a, ... a, b, b, b, ... b, c, c, c, ... c]
A simple solution is to loop over each key, pushing it into the array as many times as its frequency:
var map_array = [];
for (var key in map)
{
    for (var i = 1; i <= map[key]; i++)
    {
        map_array.push(key);
    }
}
But this will obviously take time to process with large data. Can we rework the above function to make it more efficient?

It seems to me that the real problem here is constructing the sub-arrays of repeated "a"'s, "b"'s, and "c"'s. Once you have them, you can just concat them to make your final array.
So, what we really want is a function f(x, n) which creates an array filled with n x's.
So, as a standard testbed, I'm going to define a pair of clock functions. The first measures the time it takes some array-filling function to create 500000 arrays, each containing 2187 "a"'s. The second measures the time it takes some array-filling function to create 500 arrays, each containing 1594323 "a"'s. I chose powers of three because some of my algorithms are binary-based, and I wanted to avoid any coincidences. Regardless, all of the algorithms will work for any n.
var clock1 = function(f)
{
    var m, t;
    m = 500000;
    t = Date.now();
    while (m--)
    {
        f("a", 2187);
    }
    t = Date.now() - t;
    return t;
};
var clock2 = function(f)
{
    var m, t;
    m = 500;
    t = Date.now();
    while (m--)
    {
        f("a", 1594323);
    }
    t = Date.now() - t;
    return t;
};
I'm running this test on my local machine running plain v8 in strict mode. Below are some candidates for f:
Linear Method
As already suggested by Alex, you can do this using a linear loop. Simply define an array and run a loop which executes n times, each time adding one x to our array.
var f = function(x, n)
{
    var y;
    y = Array(n);
    while (n--)
    {
        y[n] = x;
    }
    return y;
};
We can optimize by using the counting variable n to avoid calling push or reading y.length, as well as by pre-initializing the array to the desired length. (Both suggested by Alex.) My backwards while loop is just an old habit that may boost performance slightly.
This function takes 2200ms to pass clock1, and 90658ms to pass clock2.
Partial Binary Method
We can also try constructing it using binary concatenation. The idea is that you start out with a single-element array, then, if its length is significantly less than the target length, you concat it with itself, effectively doubling it. When you get it close to the target size, switch back to adding elements one at a time until it reaches its target size:
var f = function(x, n)
{
    var y, m;
    y = [x];
    m = 1;
    while (m < n)
    {
        if (m * 2 <= n)
        {
            y = y.concat(y);
            m *= 2;
        }
        else
        {
            y[m] = x;
            m++;
        }
    }
    return y;
};
Here, m is just a counting variable to keep track of the size of y.
This function takes 3630ms to pass clock1, and 42591ms to pass clock2, making it 65% slower than the linear method for small arrays, but 112% faster for large ones.
Full Binary Method
We can boost performance still further, however, by using full binary construction. The partial binary method suffers because it is forced to switch to element-by-element addition when it approaches its target length (on average, about 75% of the way through). We can fix this:
First, convert the target size into binary and save it to an array. Now, define y to be a single-element array and z to be an empty array. Then, loop (backwards) through the binary array, concatenating y with itself on each step. In each iteration, if the respective binary digit is 1, save y into z. Finally, concat all of the elements of z together. The result is your complete array.
So, in order to fill an array of length 700, it first converts 700 to binary:
[1,0,1,0,1,1,1,1,0,0]
Looping backwards across it, it performs 9 concat's and 6 element-additions, generating a z which looks like this:
[0,0,4,8,16,32,128,512]
// I've written the lengths of the sub-arrays rather than the arrays themselves.
When it concat's everything in z together, it gets a single array of length 700, our result.
var f = function(x, n)
{
    var y, z, c;
    c = 0;
    y = [x];
    z = [];
    while (n > 0)
    {
        if (n % 2)
        {
            z[c++] = y;
            n--;
        }
        if (n === 0)
        {
            break;
        }
        n /= 2;
        y = y.concat(y);
    }
    return z.concat.apply([], z);
};
To optimize, I've compressed the binary conversion step and the loop together here. z.concat.apply([],z) uses a bit of apply magic to flatten z, an array of arrays, into a single array. For some reason, this is faster than concating to z on the fly. The second if statement prevents it from doubling y one last time after the computation is complete.
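For clarity, here is a small standalone illustration (my example, not from the answer above) of what that apply-based flattening does:
// [].concat.apply([], z) spreads the sub-arrays of z as arguments to concat,
// flattening one level: equivalent to [].concat(z[0], z[1], ...)
var z = [["a", "a"], ["a"], ["a", "a", "a"]];
var flat = [].concat.apply([], z);
console.log(flat); // ["a", "a", "a", "a", "a", "a"]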
This function takes 3157ms to pass clock1 and 26809ms to pass clock2, making it 15% faster than the partial binary method for small arrays and 59% faster for large ones. It is still 44% slower than the linear method for small arrays.
Binary String Method
The concat function is weird. The larger the arrays to be concatenated, the more efficient it becomes, relatively speaking. In other words, combining two arrays of length 100 is significantly faster than combining four arrays of length 50 using concat. As a result, as the involved arrays get larger, concat becomes more efficient than push, or direct assignment. This is one of the main reasons why the binary methods are faster than the linear method for large arrays. Unfortunately, concat also suffers because it copies the involved arrays each time. Because arrays are objects, this gets quite costly. Strings are less complex than arrays, so perhaps using them would avoid this drain? We can simply use string addition (analogous to concatenation) to do the construction, and then split the resulting string into an array.
A full binary method based on strings would look like this:
var f = function(x, n)
{
    var y, z;
    y = "" + x;
    z = "";
    while (n > 0)
    {
        if (n % 2)
        {
            z += y;
            n--;
        }
        if (n === 0)
        {
            break;
        }
        y += y;
        n /= 2;
    }
    return z.split("");
};
This function takes 3484ms to pass clock1 and 14534ms to pass clock2, making it 10% slower than the array-based full binary method at computing small arrays, but 85% faster for large arrays.
So, overall, it's a mixed bag. The linear method gets very good performance on smaller arrays and is extremely simple. The binary string method, however, is a whopping 524% faster on large arrays, and is actually slightly less complex than the binary array method.
Hope this helps!

There is a new feature in ES6 (ECMAScript 2015) called String.prototype.repeat().
It solves your issue like magic: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat
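For example, a minimal sketch of how repeat() could be applied to the original map (my example, not from the answer; repeat() returns a string, so it still has to be split into individual array elements, which only works this simply for single-character keys):
var map = {"a": 100, "b": 200, "c": 700};
var map_array = [];
for (var key in map) {
    // "a".repeat(100) builds "aaa...a"; split("") turns it into 100 array elements
    // (splitting on "" assumes single-character keys like "a", "b", "c")
    map_array = map_array.concat(key.repeat(map[key]).split(""));
}
console.log(map_array.length); // 1000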

You can do something like this:
const map = {"a" : 10, "b" : 20, "c": 7};
const keys = Object.keys(map);
let finalArr = [];
keys.forEach(key => {
    finalArr = [...finalArr, ...((key + " ").repeat(map[key]).trim().split(" "))];
});
console.log(finalArr);

Defining the array length up front may be more performant; at least your garbage collector will be happier. Note that map is a plain object, so it has no length property; you have to add up the frequencies yourself:
// pre-size the array to the total number of entries
var total = 0;
for (var key in map) {
    total += map[key];
}
var map_array = new Array(total);
var c = 0;
for (key in map) {
    var max = map[key];
    for (var i = 1; i <= max; i++) {
        map_array[c] = key;
        c++;
    }
}
That's more performant than using map():
http://jsperf.com/map-vs-forin/3

EDIT: I don't recommend this solution, but check the comments on this answer in order to get the most performant answer.
var arrays = Object.keys(map).map(function(obj) {
    var i = 0, l = map[obj], s = "";
    for (; i < l; ++i) {
        s += obj + ",";
    }
    // drop the trailing comma, otherwise split() produces an extra empty string
    return s.slice(0, -1).split(",");
});
It actually returns three arrays with values, but you can flatten them later with:
map_array = map_array.concat.apply(map_array, arrays);
http://jsperf.com/map-vs-forin

Related

Is the space complexity O(1) or O(N) for the following code snippet?

This is one of the LeetCode problems. In this problem we are to build an array from a permutation, and the challenge is to solve it in O(1) space complexity; does the solution below fulfill that criterion or not? One more thing: if we are manipulating the same array but increasing its length by appending elements at the end, does that mean we are allocating new space and hence causing O(N) space complexity during the execution of the program?
var buildArray = function(nums) {
    let len = nums.length;
    for (let i = 0; i < len; i++) {
        nums.push(nums[nums[i]]);
    }
    nums = nums.splice(len, nums.length);
    return nums;
};
This is O(n) space complexity; you can't "cheat" by pushing your data onto the input array, because you are using extra space anyway.
This code would be the equivalent of storing your data in a new array and then returning that.
I would guess the point of this space complexity limitation is for you to reach a solution that mutates the input array in place, as sketched below.
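For illustration only (this is not the answerer's code), a common way to meet an O(1) extra-space requirement for this particular problem is to encode both the old and the new value of each slot in a single number, relying on all values being in the range [0, n):
var buildArrayInPlace = function(nums) {
    var n = nums.length;
    // first pass: pack the new value into the "high part" of each slot;
    // nums[j] % n still recovers the original value of any slot
    for (var i = 0; i < n; i++) {
        nums[i] += n * (nums[nums[i]] % n);
    }
    // second pass: keep only the new values
    for (var j = 0; j < n; j++) {
        nums[j] = Math.floor(nums[j] / n);
    }
    return nums;
};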
The space complexity is still O(n). The input array has length n. When you push into the array, it basically allocates one more memory location and then updates the array.
By pushing the elements into the array you are still using n extra space.
In C++, to resize the array code would be written as:
int size = 10;
int* arr = new int[size];
// allocate a block twice as large and copy the old elements over
int* resize_arr = new int[size * 2];
for (int i = 0; i < size; i++)
    resize_arr[i] = arr[i];
delete[] arr;      // free the old block, not the new one
arr = resize_arr;
size *= 2;
After using all the allocated space, to add more elements you need to create a new array and then copy the elements over.
All these steps are done in one line in a language like Python or JavaScript, but that does not mean you are not using more space.
Since for any given nums you are looping over the entire array at least once and then doing an O(1) operation (key lookup & push), it would be safe to say this is an O(n) solution.

Why are negative array indices much slower than positive array indices?

Here's a simple JavaScript performance test:
const iterations = new Array(10 ** 7);
var x = 0;
var i = iterations.length + 1;
console.time('negative');
while (--i) {
    x += iterations[-i];
}
console.timeEnd('negative');
var y = 0;
var j = iterations.length;
console.time('positive');
while (j--) {
    y += iterations[j];
}
console.timeEnd('positive');
The first loop counts from 10,000,000 down to 1 and accesses an array with a length of 10 million using a negative index on each iteration. So it goes through the array from beginning to end.
The second loop counts from 9,999,999 down to 0 and accesses the same array using a positive index on each iteration. So it goes through the array in reverse.
On my PC, the first loop takes longer than 6 seconds to complete, but the second one only takes ~400ms.
Why is the second loop faster than the first?
Because iterations[-1] will evaluate to undefined (which is slow, as it has to go up the whole prototype chain and can't take a fast path). Also, doing math with NaN (adding undefined to a number yields NaN) will always be slow, as it is the uncommon case.
Also initializing iterations with numbers will make the whole test more useful.
Pro Tip: If you try to compare the performance of two pieces of code, they should both end up doing the same operation...
Some general words about performance tests:
Performance is the compiler's job these days; code optimized by the compiler will always be faster than code you are trying to optimize through some "tricks". Therefore you should write code that the compiler is likely to optimize, and that is, in every case, the code that everyone else writes (your coworkers will also love you if you do so), since optimizing it is the most beneficial from the engine's point of view. So I'd write:
let acc = 0;
for(const value of array) acc += value;
// or
const acc = array.reduce((a, b) => a + b, 0);
However, in the end it's just a loop; you won't waste much time if the loop performs badly, but you will if the whole algorithm performs badly (time complexity of O(n²) or more). Focus on the important things, not the loops.
To elaborate on Jonas Wilms' answer, JavaScript does not support negative array indices (unlike languages like Python).
iterations[-1] is the same as iterations["-1"], which looks for a property named "-1" on the array object. That's why iterations[-1] evaluates to undefined.
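A quick illustration of that behaviour (my example, not from the answer):
var arr = [10, 20, 30];
console.log(arr[-1]);     // undefined: there is no property named "-1"
arr[-1] = 99;             // creates an ordinary property, not an array element
console.log(arr.length);  // still 3
console.log(arr["-1"]);   // 99: same property, accessed via the string key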

calculation on each array member

Given an array of arrays representing points, I want to find the minimum distance between points and return that distance and that starting point. I am using lodash and would like to be as functional as possible.
I have an array of arrays:
var all = [[1,2], [3,4], [4,5]];
I also have an object which has the current minimum distance and current array:
var cur_min = {'current_min': 10, 'point': [9,10]};
I want to find the minimum distance between all of the points in my array and if that distance is less than the current_min in my cur_min variable it will be updated. I have come up with the following:
function find_new_min(current, arr) {
    return _.transform(arr, function(result, a) {
        _.forEach(arr, function(b) {
            if (!_.isEqual(a, b)) {
                var d = get_distance(a, b);
                if (d < result.current_min) {
                    result.current_min = d;
                    result.point = a;
                }
            }
        });
    }, _.clone(current));
}
I am expecting 6 different pairs of arrays since getting the distance between a point and itself is 0.
I cannot imagine looping over the same array twice is an efficient way to solve this problem. I've tried to rewrite this function using various lodash functions like _.forEach and _.reduce but I cannot find a way to not loop twice over the same array. Is there a faster way to solve this problem?
An example output for the above code is:
{ current_min: 1.222450611061632, point: [ 1, 2 ] }
Your problem is one of the oldest: how to most efficiently sort an array of values based on some functional key. However, you also want to record the shortest distance, and for that you need to compare each value to every other value. Because of this, you can't efficiently sort; you need to run over the entire array array.length times.
Considering you're expecting only six pairs of points, this seems like a severe case of premature optimization.
Readability-wise, unfortunately lodash doesn't offer functionality to produce a Cartesian product which you could then _.sortBy, nor does it offer a sort function that allows you to compare manually left and right and modify the values. I think your current implementation currently is as good as it gets with lodash.
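That said, you can at least halve the number of distance calculations by comparing each unordered pair only once (the inner index starts at i + 1). A plain-loop sketch, assuming the same get_distance helper and lodash's _.clone from the question:
function find_new_min(current, arr) {
    var result = _.clone(current);
    for (var i = 0; i < arr.length; i++) {
        // only look at pairs (i, j) with j > i, so each pair is measured once
        for (var j = i + 1; j < arr.length; j++) {
            var d = get_distance(arr[i], arr[j]);
            if (d < result.current_min) {
                result.current_min = d;
                result.point = arr[i];
            }
        }
    }
    return result;
}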

Why use sorting maps on arrays? How is it better in some instances?

I'm trying to learn about array sorting. It seems pretty straightforward. But on the mozilla site, I ran across a section discussing sorting maps (about three-quarters down the page).
The compareFunction can be invoked multiple times per element within the array. Depending on the compareFunction's nature, this may yield a high overhead. The more work a compareFunction does and the more elements there are to sort, the wiser it may be to consider using a map for sorting.
The example given is this:
// the array to be sorted
var list = ["Delta", "alpha", "CHARLIE", "bravo"];
// temporary holder of position and sort-value
var map = [];
// container for the resulting order
var result = [];
// walk original array to map values and positions
for (var i = 0, length = list.length; i < length; i++) {
    map.push({
        // remember the index within the original array
        index: i,
        // evaluate the value to sort
        value: list[i].toLowerCase()
    });
}
// sorting the map containing the reduced values
map.sort(function(a, b) {
    return a.value > b.value ? 1 : -1;
});
// copy values in right order
for (var i = 0, length = map.length; i < length; i++) {
    result.push(list[map[i].index]);
}
// print sorted list
print(result);
I don't understand a couple of things. To wit: What does it mean, "The compareFunction can be invoked multiple times per element within the array"? Can someone show me an example of that? Secondly, I understand what's being done in the example, but I don't understand the potential "high[er] overhead" of the compareFunction. The example shown here seems really straightforward, and mapping the array into an object, sorting its values, then putting it back into an array seems like it would take much more overhead at first glance. I understand this is a simple example, probably not intended for anything other than to show the procedure. But can someone give an example of when it would be lower overhead to map like this? It seems like a lot more work.
Thanks!
When sorting a list, an item isn't just compared to one other item, it may need to be compared to several other items. Some of the items may even have to be compared to all other items.
Let's see how many comparisons there actually are when sorting an array:
var list = ["Delta", "alpha", "CHARLIE", "bravo", "orch", "worm", "tower"];
var o = [];
for (var i = 0; i < list.length; i++) {
    o.push({
        value: list[i],
        cnt: 0
    });
}
o.sort(function(x, y) {
    x.cnt++;
    y.cnt++;
    return x.value == y.value ? 0 : x.value < y.value ? -1 : 1;
});
console.log(o);
Result:
[
{ value="CHARLIE", cnt=3},
{ value="Delta", cnt=3},
{ value="alpha", cnt=4},
{ value="bravo", cnt=3},
{ value="orch", cnt=3},
{ value="tower", cnt=7},
{ value="worm", cnt=3}
]
(Fiddle: http://jsfiddle.net/Guffa/hC6rV/)
As you see, each item was compared to several other items. The string "tower" even had more comparisons than there are other strings, which means that it was compared to at least one other string at least twice.
If the comparison needs some calculation before the values can be compared (like the toLowerCase method in the example), then that calculation will be done several times. By caching the values after that calculation, it will be done only once for each item.
The primary time saving in that example is gotten by avoiding calls to toLowerCase() in the comparison function. The comparison function is called by the sort code each time a pair of elements needs to be compared, so that's a savings of a lot of function calls. The cost of building and un-building the map is worth it for large arrays.
That the comparison function may be called more than once per element is a natural implication of how sorting works. If only one comparison per element were necessary, it would be a linear-time process.
edit — the number of comparisons that'll be made will be roughly proportional to the length of the array times the base-2 log of the length. For a 1000 element array, then, that's proportional to 10,000 comparisons (probably closer to 15,000, depending on the actual sort algorithm). Saving 20,000 unnecessary function calls is worth the 2000 operations necessary to build and un-build the sort map.
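If you want to check that estimate yourself, one quick way (my example, not from the answer) is to count how often the comparator actually runs; the exact number varies by engine and input:
// count comparator calls when sorting 1000 random numbers
var data = [];
for (var i = 0; i < 1000; i++) {
    data.push(Math.random());
}
var calls = 0;
data.sort(function(a, b) {
    calls++;
    return a - b;
});
console.log(calls); // typically on the order of 10,000 comparisons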
This is called the “decorate - sort - undecorate” pattern (you can find a nice explanation on Wikipedia).
The idea is that a comparison-based sort will have to call the comparison function at least n times (where n is the number of items in the list), as this is roughly the number of comparisons you need just to check that the array is already sorted. Usually the number of comparisons will be larger than that (O(n ln n) if you are using a good algorithm), and according to the pigeonhole principle, there is at least one value that will be passed to the comparison function twice.
If your comparison function does some expensive processing before comparing the two values, then you can reduce the cost by first doing the expensive part and storing the result for each value (since you know that even in the best scenario you'll have to do that processing). Then, when sorting, you use a cheaper comparison function that only compares those cached outputs.
In this example, the "expensive" part is converting the string to lowercase.
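A compact sketch of the same decorate-sort-undecorate idea (my example, written in the same style as the snippet quoted from the Mozilla page):
var list = ["Delta", "alpha", "CHARLIE", "bravo"];
// decorate: compute the expensive key (toLowerCase) once per element
var decorated = list.map(function(s) {
    return { key: s.toLowerCase(), value: s };
});
// sort: compare only the cached keys
decorated.sort(function(a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});
// undecorate: extract the original values in sorted order
var result = decorated.map(function(d) {
    return d.value;
});
console.log(result); // ["alpha", "bravo", "CHARLIE", "Delta"]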
Think of this like caching. It's simply saying that you should not do lots of calculation in the compare function, because you will be calculating the same value over and over.
What does it mean, "The compareFunction can be invoked multiple times per element within the array"?
It means exactly what it says. Let's say you have three items, A, B and C. They need to be sorted by the result of the compare function. The comparisons might be done like this:
compare(A) to compare(B)
compare(A) to compare(C)
compare(B) to compare(C)
So here, we have 3 values, but the compare() function was executed 6 times. Using a temporary array to cache things ensures we do a calculation only once per item, and can compare those results.
Secondly, I understand what's being done in the example, but I don't understand the potential "high[er] overhead" of the compareFunction.
What if compare() does a database fetch (comparing the counts of matching rows)? Or a complex math calculation (factorial, recursive Fibonacci, or iteration over a large number of items)? These are the sorts of things you don't want to do more than once.
I would say most of the time, it's fine to leave really simple/fast calculations inline. Don't over-optimize. But if you need to do anything complex or slow in the comparison, you have to be smarter about it.
To respond to your first question, why would the compareFunction be called multiple times per element in the array?
Sorting an array almost always requires more than N passes, where N is the size of the array (unless the array is already sorted). Thus, for every element in your array, it may be compared to another element in your array up to N times (bubble sort requires at most N^2 comparisons). The compareFunction you provide will be used every time to determine whether two elements are less/equal/greater and thus will be called multiple times per element in the array.
A simple response to your second question: why would there be potentially higher overhead for a compareFunction?
Say your compareFunction does a lot of unnecessary work while comparing two elements of the array. This can cause sort to be slower, and thus using a compareFunction could potentially cause higher overhead.

Is it correct to use JavaScript Array.sort() method for shuffling?

I was helping somebody out with his JavaScript code and my eyes were caught by a section that looked like that:
function randOrd() {
    return (Math.round(Math.random()) - 0.5);
}
coords.sort(randOrd);
alert(coords);
My first thought was: hey, this can't possibly work! But then I did some experimenting and found that it indeed at least seems to provide nicely randomized results.
Then I did some web searching and almost at the top found an article from which this code was most certainly copied. Looked like a pretty respectable site and author...
But my gut feeling tells me that this must be wrong. Especially as the sorting algorithm is not specified by the ECMA standard. I think different sorting algorithms will result in different non-uniform shuffles. Some sorting algorithms may even loop infinitely...
But what do you think?
And as another question... how would I now go and measure how random the results of this shuffling technique are?
update: I did some measurements and posted the results below as one of the answers.
Since Jon has already covered the theory, here's an implementation:
function shuffle(array) {
    var tmp, current, top = array.length;
    if (top) while (--top) {
        current = Math.floor(Math.random() * (top + 1));
        tmp = array[current];
        array[current] = array[top];
        array[top] = tmp;
    }
    return array;
}
The algorithm is O(n), whereas sorting should be O(n log n). Depending on the overhead of executing JS code compared to the native sort() function, this might lead to a noticeable difference in performance which should increase with array sizes.
In the comments to bobobobo's answer, I stated that the algorithm in question might not produce evenly distributed probabilities (depending on the implementation of sort()).
My argument goes along these lines: A sorting algorithm requires a certain number c of comparisons, e.g. c = n(n-1)/2 for Bubblesort. Our random comparison function makes the outcome of each comparison equally likely, i.e. there are 2^c equally probable results. Now, each result has to correspond to one of the n! permutations of the array's entries, which makes an even distribution impossible in the general case. (This is a simplification, as the actual number of comparisons needed depends on the input array, but the assertion should still hold.)
As Jon pointed out, this alone is no reason to prefer Fisher-Yates over using sort(), as the random number generator will also map a finite number of pseudo-random values to the n! permutations. But the results of Fisher-Yates should still be better:
Math.random() produces a pseudo-random number in the range [0;1[. As JS uses double-precision floating point values, this corresponds to 2^x possible values where 52 ≤ x ≤ 63 (I'm too lazy to find the actual number). A probability distribution generated using Math.random() will stop behaving well if the number of atomic events is of the same order of magnitude.
When using Fisher-Yates, the relevant parameter is the size of the array, which should never approach 2^52 due to practical limitations.
When sorting with a random comparison function, the function basically only cares if the return value is positive or negative, so this will never be a problem. But there is a similar one: Because the comparison function is well-behaved, the 2^c possible results are, as stated, equally probable. If c ~ n log n then 2^c ~ n^(a·n) where a = const, which makes it at least possible that 2^c is of the same magnitude as (or even less than) n!, thus leading to an uneven distribution, even if the sorting algorithm were to map onto the permutations evenly. Whether this has any practical impact is beyond me.
The real problem is that the sorting algorithms are not guaranteed to map onto the permutations evenly. It's easy to see that Mergesort does as it's symmetric, but reasoning about something like Bubblesort or, more importantly, Quicksort or Heapsort, is not.
The bottom line: As long as sort() uses Mergesort, you should be reasonably safe except in corner cases (at least I'm hoping that 2^c ≤ n! is a corner case), if not, all bets are off.
It's never been my favourite way of shuffling, partly because it is implementation-specific as you say. In particular, I seem to remember that the standard library sorting from either Java or .NET (not sure which) can often detect if you end up with an inconsistent comparison between some elements (e.g. you first claim A < B and B < C, but then C < A).
It also ends up as a more complex (in terms of execution time) shuffle than you really need.
I prefer the shuffle algorithm which effectively partitions the collection into "shuffled" (at the start of the collection, initially empty) and "unshuffled" (the rest of the collection). At each step of the algorithm, pick a random unshuffled element (which could be the first one) and swap it with the first unshuffled element - then treat it as shuffled (i.e. mentally move the partition to include it).
This is O(n) and only requires n-1 calls to the random number generator, which is nice. It also produces a genuine shuffle - any element has a 1/n chance of ending up in each space, regardless of its original position (assuming a reasonable RNG). The sorted version approximates to an even distribution (assuming that the random number generator doesn't pick the same value twice, which is highly unlikely if it's returning random doubles) but I find it easier to reason about the shuffle version :)
This approach is called a Fisher-Yates shuffle.
I would regard it as a best practice to code up this shuffle once and reuse it everywhere you need to shuffle items. Then you don't need to worry about sort implementations in terms of reliability or complexity. It's only a few lines of code (which I won't attempt in JavaScript!)
The Wikipedia article on shuffling (and in particular the shuffle algorithms section) talks about sorting a random projection - it's worth reading the section on poor implementations of shuffling in general, so you know what to avoid.
I did some measurements of how random the results of this random sort are...
My technique was to take a small array [1,2,3,4] and create all (4! = 24) permutations of it. Then I would apply the shuffling function to the array a large number of times and count how many times each permutation is generated. A good shuffling algorithm would distribute the results quite evenly over all the permutations, while a bad one would not create that uniform result.
Using the code below I tested in Firefox, Opera, Chrome, IE6/7/8.
Surprisingly for me, the random sort and the real shuffle both created equally uniform distributions. So it seems that (as many have suggested) the main browsers are using merge sort. This of course doesn't mean that there can't be a browser out there that does it differently, but I would say it means that this random-sort method is reliable enough to use in practice.
EDIT: This test didn't really measure the randomness (or lack thereof) correctly. See the other answer I posted.
But on the performance side the shuffle function given by Christoph was a clear winner. Even for small four-element arrays the real shuffle performed about twice as fast as random-sort!
// The shuffle function posted by Christoph.
var shuffle = function(array) {
    var tmp, current, top = array.length;
    if (top) while (--top) {
        current = Math.floor(Math.random() * (top + 1));
        tmp = array[current];
        array[current] = array[top];
        array[top] = tmp;
    }
    return array;
};
// the random sort function
var rnd = function() {
    return Math.round(Math.random()) - 0.5;
};
var randSort = function(A) {
    return A.sort(rnd);
};
var permutations = function(A) {
    if (A.length == 1) {
        return [A];
    }
    else {
        var perms = [];
        for (var i = 0; i < A.length; i++) {
            var x = A.slice(i, i + 1);
            var xs = A.slice(0, i).concat(A.slice(i + 1));
            var subperms = permutations(xs);
            for (var j = 0; j < subperms.length; j++) {
                perms.push(x.concat(subperms[j]));
            }
        }
        return perms;
    }
};
var test = function(A, iterations, func) {
    // init permutations
    var stats = {};
    var perms = permutations(A);
    for (var i in perms) {
        stats["" + perms[i]] = 0;
    }
    // shuffle many times and gather stats
    var start = new Date();
    for (var i = 0; i < iterations; i++) {
        var shuffled = func(A);
        stats["" + shuffled]++;
    }
    var end = new Date();
    // format result
    var arr = [];
    for (var i in stats) {
        arr.push(i + " " + stats[i]);
    }
    return arr.join("\n") + "\n\nTime taken: " + ((end - start) / 1000) + " seconds.";
};
alert("random sort: " + test([1,2,3,4], 100000, randSort));
alert("shuffle: " + test([1,2,3,4], 100000, shuffle));
Interestingly, Microsoft used the same technique in their pick-random-browser-page.
They used a slightly different comparison function:
function RandomSort(a, b) {
    return (0.5 - Math.random());
}
Looks almost the same to me, but it turned out to be not so random...
So I ran some tests again with the same methodology used in the linked article, and indeed it turned out that the random-sorting method produced flawed results. New test code here:
function shuffle(arr) {
    arr.sort(function(a, b) {
        return (0.5 - Math.random());
    });
}
function shuffle2(arr) {
    arr.sort(function(a, b) {
        return (Math.round(Math.random()) - 0.5);
    });
}
function shuffle3(array) {
    var tmp, current, top = array.length;
    if (top) while (--top) {
        current = Math.floor(Math.random() * (top + 1));
        tmp = array[current];
        array[current] = array[top];
        array[top] = tmp;
    }
    return array;
}
var counts = [
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0]
];
var arr;
for (var i = 0; i < 100000; i++) {
    arr = [0,1,2,3,4];
    shuffle3(arr);
    arr.forEach(function(x, i) { counts[x][i]++; });
}
alert(counts.map(function(a) { return a.join(", "); }).join("\n"));
I have placed a simple test page on my website showing the bias of your current browser versus other popular browsers using different methods to shuffle. It shows the terrible bias of just using Math.random()-0.5, another 'random' shuffle that isn't biased, and the Fisher-Yates method mentioned above.
You can see that on some browsers there is as high as a 50% chance that certain elements will not change place at all during the 'shuffle'!
Note: you can make the implementation of the Fisher-Yates shuffle by Christoph slightly faster for Safari by changing the code to:
function shuffle(array) {
    for (var tmp, cur, top = array.length; top--;) {
        cur = (Math.random() * (top + 1)) << 0;
        tmp = array[cur]; array[cur] = array[top]; array[top] = tmp;
    }
    return array;
}
Test results: http://jsperf.com/optimized-fisher-yates
I think it's fine for cases where you're not picky about distribution and you want the source code to be small.
In JavaScript (where the source is transmitted constantly), small makes a difference in bandwidth costs.
It's been four years, but I'd like to point out that the random comparator method won't be correctly distributed, no matter what sorting algorithm you use.
Proof:
For an array of n elements, there are exactly n! permutations (i.e. possible shuffles).
Every comparison during a shuffle is a choice between two sets of permutations. For a random comparator, there is a 1/2 chance of choosing each set.
Thus, for each permutation p, the chance of ending up with permutation p is a fraction with denominator 2^k (for some k), because it is a sum of such fractions (e.g. 1/8 + 1/16 = 3/16).
For n = 3, there are six equally-likely permutations. The chance of each permutation, then, is 1/6. 1/6 can't be expressed as a fraction with a power of 2 as its denominator.
Therefore, the coin flip sort will never result in a fair distribution of shuffles.
The only sizes that could possibly be correctly distributed are n=0,1,2.
As an exercise, try drawing out the decision tree of different sort algorithms for n=3.
There is a gap in the proof: If a sort algorithm depends on the consistency of the comparator, and has unbounded runtime with an inconsistent comparator, it can have an infinite sum of probabilities, which is allowed to add up to 1/6 even if every denominator in the sum is a power of 2. Try to find one.
Also, if a comparator has a fixed chance of giving either answer (e.g. (Math.random() < P)*2 - 1, for constant P), the above proof holds. If the comparator instead changes its odds based on previous answers, it may be possible to generate fair results. Finding such a comparator for a given sorting algorithm could be a research paper.
It is a hack, certainly. In practice, an infinitely looping algorithm is not likely.
If you're sorting objects, you could loop through the coords array and do something like:
for (var i = 0; i < coords.length; i++)
    coords[i].sortValue = Math.random();
coords.sort(useSortValue);
function useSortValue(a, b)
{
    return a.sortValue - b.sortValue;
}
(and then loop through them again to remove the sortValue)
Still a hack though. If you want to do it nicely, you have to do it the hard way :)
If you're using D3 there is a built-in shuffle function (using Fisher-Yates):
var days = ['Lundi','Mardi','Mercredi','Jeudi','Vendredi','Samedi','Dimanche'];
d3.shuffle(days);
And here is Mike going into details about it:
http://bost.ocks.org/mike/shuffle/
No, it is not correct. As other answers have noted, it will lead to a non-uniform shuffle and the quality of the shuffle will also depend on which sorting algorithm the browser uses.
Now, that might not sound too bad to you, because even if theoretically the distribution is not uniform, in practice it's probably nearly uniform, right? Well, no, not even close. The following charts show heat-maps of which indices each element gets shuffled to, in Chrome and Firefox respectively: if the pixel (i, j) is green, it means the element at index i gets shuffled to index j too often, and if it's red then it gets shuffled there too rarely.
These screenshots are taken from Mike Bostock's page on this subject.
As you can see, shuffling using a random comparator is severely biased in Chrome and even more so in Firefox. In particular, both have a lot of green along the diagonal, meaning that too many elements get "shuffled" somewhere very close to where they were in the original sequence. In comparison, a similar chart for an unbiased shuffle (e.g. using the Fisher-Yates algorithm) would be all pale yellow with just a small amount of random noise.
Here's an approach that uses a single array:
The basic logic is:
Starting with an array of n elements
Remove a random element from the array and push it onto the array
Remove a random element from the first n - 1 elements of the array and push it onto the array
Remove a random element from the first n - 2 elements of the array and push it onto the array
...
Remove the first element of the array and push it onto the array
Code:
for (var i = a.length; i--;) a.push(a.splice(Math.floor(Math.random() * (i + 1)), 1)[0]);
Can you use the Array.sort() function to shuffle an array – Yes.
Are the results random enough – No.
Consider the following code snippet:
/*
* The following code sample shuffles an array using Math.random() trick
* After shuffling, the new position of each item is recorded
* The process is repeated 100 times
* The result is printed out, listing each item and the number of times
* it appeared on a given position after shuffling
*/
var array = ["a", "b", "c", "d", "e"];
var stats = {};
array.forEach(function(v) {
    stats[v] = Array(array.length).fill(0);
});
var i, clone;
for (i = 0; i < 100; i++) {
    clone = array.slice();
    clone.sort(function() {
        return Math.random() - 0.5;
    });
    clone.forEach(function(v, i) {
        stats[v][i]++;
    });
}
Object.keys(stats).forEach(function(v, i) {
    console.log(v + ": [" + stats[v].join(", ") + "]");
});
Sample output:
a: [29, 38, 20, 6, 7]
b: [29, 33, 22, 11, 5]
c: [17, 14, 32, 17, 20]
d: [16, 9, 17, 35, 23]
e: [ 9, 6, 9, 31, 45]
Ideally, the counts should be evenly distributed (for the above example, all counts should be around 20). But they are not. Apparently, the distribution depends on what sorting algorithm is implemented by the browser and how it iterates the array items for sorting.
There is nothing wrong with it.
The function you pass to .sort() usually looks something like
function sortingFunc(first, second)
{
    // example:
    return first - second;
}
Your job in sortingFunc is to return:
a negative number if first goes before second
a positive number if first should go after second
and 0 if they are completely equal
The above sorting function puts things in order.
If you return negatives and positives randomly, as the code above does, you get a random ordering.
Like in MySQL:
SELECT * from table ORDER BY rand()
