Best ways to get random items from an array in JavaScript

Considering performance, what's the best way to get a random subset from an array?
Say we have an array with 90000 items and I want to get 10000 random items from it.
One approach I'm thinking about is to get a random index from 0 to array.length (exclusive) and then remove the selected item from the original array using Array.prototype.splice. Then get the next random item from the rest.
But splice reindexes all the items after the one we just selected, moving each of them forward one step. Doesn't that affect performance?
Items may contain duplicate values, but the positions we select must not repeat. Say we've selected index 0; then we should only pick from the remaining indices 1~89999.

If you want a subset of the shuffled array, you do not need to shuffle the whole array. You can stop the classic Fisher-Yates shuffle once you have drawn your 10000 items, leaving the other 80000 indices untouched.
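A minimal sketch of that early-stopping shuffle follows. The function name sampleWithoutReplacement is illustrative and not from the answer, and the sketch assumes the input should stay untouched, so it works on a copy:

```javascript
// Partial Fisher-Yates shuffle: only the first `size` positions are shuffled.
// `sampleWithoutReplacement` is a hypothetical name used for illustration.
function sampleWithoutReplacement(source, size) {
  const pool = source.slice(); // copy so the caller's array is not modified
  const n = Math.min(size, pool.length);
  for (let i = 0; i < n; i++) {
    // pick a random index from the not-yet-chosen part [i, pool.length)
    const j = i + Math.floor(Math.random() * (pool.length - i));
    // swap the chosen item into position i
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n); // the first n positions hold the sample
}

const big = Array.from({ length: 90000 }, (_, i) => i);
const picked = sampleWithoutReplacement(big, 10000);
console.log(picked.length); // 10000
```

Each draw is a constant-time swap, so the cost of the sampling loop grows with the sample size rather than with the number of splice-induced shifts.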

I would first randomize the whole array and then splice off 10000 items.
The question "How to randomize (shuffle) a JavaScript array?" explains a good way to randomize an array in JavaScript.

A selection sampling algorithm (a close relative of reservoir sampling) can do this.
Here's an attempt at implementing Knuth's "Algorithm S" from TAOCP Volume 2 Section 3.4.2:
function sample(source, size) {
  var chosen = 0,
      srcLen = source.length,
      result = new Array(size);
  for (var seen = 0; chosen < size; seen++) {
    var remainingInput = srcLen - seen,
        remainingOutput = size - chosen;
    // select this item with probability remainingOutput / remainingInput
    if (remainingInput * Math.random() < remainingOutput) {
      result[chosen++] = source[seen];
    }
  }
  return result;
}
Basically it makes one pass over the input array, choosing or skipping items based on a function of a random number, the number of items remaining in the input, and the number of items still required in the output.
There are three potential problems with this code: 1. I may have mucked it up, 2. Knuth calls for a random number "between zero and one" and I'm not sure if this means the [0, 1) interval JavaScript provides or the fully closed or fully open interval, 3. it's vulnerable to PRNG bias.
The performance characteristics should be very good. It's O(srcLen). Most of the time we finish before going through the entire input. The input is accessed in order, which is a good thing if you are running your code on a computer that has a cache. We don't even waste any time reading or writing elements that don't ultimately end up in the output.
This version doesn't modify the input array. It is possible to write an in-place version, which might save some memory, but it probably wouldn't be much faster.

Related

Understanding a JavaScript pagination math problem

I am trying to understand how to approach math problems such as the following excerpt, which was demonstrated in a pagination section of a tutorial I was following.
const renderResults = (arrayOfItems, pageNum = 1, resultsPerPage = 10) => {
  const start = (pageNum - 1) * resultsPerPage;
  const end = pageNum * resultsPerPage;
  arrayOfItems.splice(start, end).forEach(renderToScreenFunction);
};
In the tutorial this solution was just typed out and not explained, which got me thinking: had I not seen the solution, I would not have been able to think of it in such a way.
I understood the goal of the problem, and how splice works to break the array into parts. But it was not obvious to me how to obtain the start and end values for using the splice method on an array of indefinite length. How should I have gone about thinking to solve this problem?
Please understand that I am learning programming in my spare time, and although this might seem simple to most, I have always been afraid of math and struggled with it. I am posting this question in the hope of getting better.
I would really appreciate it if anyone could explain how one goes about solving such problems in theory, and what area of mathematics or programming I should study to get better at them. Any pointers would be a huge help. Many thanks.
OK, so what you're starting with is
a list of things to display that's, well, it's as long as it is.
a page number, such that the first page is page 1
a page size (number of items per page)
So to know which elements in the list to show, you need to think about what the page number and page size say about how many elements you have to skip. If you're on page 1, you don't need to skip any elements. What if you're on page 5?
Well, the first page skips nothing. The second page will have to skip the number of elements per page. The third page will have to skip twice the number of elements per page, and so on. We can generalize that and see that for page p, you need to skip p - 1 times the number of elements per page. Thus for page 5 you need to skip 4 times the number of elements per page.
To show that page after skipping over the previous pages is easy: just show the next elements-per-page elements.
Note that there are two details that the code you posted does not appear to address. These details are:
What if the actual length of the list is not evenly divisible by the page size?
What if a page far beyond the actual length of the list is requested?
For the first detail, you just need to test for that situation after you've figured out how far to skip forward.
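The skip-then-take reasoning and both edge cases can be sketched as follows. This is one possible way to write it, not the tutorial's code: getPage is a hypothetical name, and it uses slice (which takes a start and an end index and leaves the source array intact) rather than splice:

```javascript
// Return the items for one page; `getPage` is a hypothetical helper name.
function getPage(items, pageNum = 1, resultsPerPage = 10) {
  // number of pages, rounding up so a partial last page still counts
  const totalPages = Math.ceil(items.length / resultsPerPage);
  // a page before the first or beyond the last has nothing to show
  if (pageNum < 1 || pageNum > totalPages) return [];
  // skip (pageNum - 1) full pages
  const start = (pageNum - 1) * resultsPerPage;
  // slice stops at the end of the array, so a short last page just works
  return items.slice(start, start + resultsPerPage);
}

const letters = ['a','b','c','d','e','f','g','h','i','j','k','l','m'];
console.log(getPage(letters, 2, 5)); // [ 'f', 'g', 'h', 'i', 'j' ]
console.log(getPage(letters, 3, 5)); // [ 'k', 'l', 'm' ]
console.log(getPage(letters, 9, 5)); // []
```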
Your function has an error in the call to the splice method:
arrayOfItems.splice(start, end).forEach(renderToScreenFunction);
The second argument must be the number of elements to extract, not the final index. You don't need to calculate the end index; use resultsPerPage instead.
I've rewritten the code without the error, removing the function wrapper for better understanding and adding some comments:
// set the initial variables
const arrayOfItems = ['a','b','c','d','e','f','g','h','i','j','k','l','m'];
const pageNum = 2;
const resultsPerPage = 5;
// calculate start index
const start = (pageNum - 1) * resultsPerPage; // (2-1)*5=5
// extract resultsPerPage elements starting at index 5 (indices 5 to 9);
// note that splice also removes them from arrayOfItems
const itemsToShow = arrayOfItems.splice(start, resultsPerPage);
// done! output the results iterating the resulting array
itemsToShow.forEach(x => console.log(x));
Code explanation:
Sets the initial parameters.
Calculates the start index in the array corresponding to the page you are trying to get: (pageNum - 1) * resultsPerPage.
Generates a new array by extracting resultsPerPage items from arrayOfItems, starting at the start index (an empty array is returned if the page does not exist).
Iterates over the generated array (itemsToShow) to output the results.
Sometimes the best way to understand code is to run it and observe its behavior and results.

Need to write an algorithm for getting sum of values from Array 1 values for each Array 2 value

I am creating an algorithm that matches any combination of cells of the first array to each value of the second array, giving priority to the values of the second array in order. For example, in JavaScript:
var arr=[10,20,30,40,50,60,70,80,90];
var arr2=[100,120,140];
What I want is to derive the following mapping automatically (the cells of the second array are served in order of priority). Please help me find pseudocode for the algorithm:
100 = 10+20+30+40 //arr2[0] = arr1[0] + arr1[1] + arr1[2] + arr1[3]
120 = 50+70 //arr2[1] = arr1[4] + arr1[6]
140 = 60+80 //arr2[2] = arr1[5] + arr1[7]
90 = 90 //remaining arr1[8]
The values are for demonstration and can change dynamically.
A solution is possible if you sort both arrays and then start adding elements from the end of the first array (array1), which are the greatest because the array is sorted. If the sum matches the element of array2 you are checking, proceed to the next one. If the sum is less than that element, add a third element from array1. If the sum is greater than that element, drop one of the array1 elements you used in the sum and replace it with the previous element of array1. Repeat these steps. You need to think about how to do this correctly, or else share some of your work or the logic you have in mind, so that we can help.
As the matter is quite complex, besides giving a pseudocode-style explanation I have also coded a practical implementation that you may find at this link.
I advise you to refrain from looking at the solution and to first try implementing the algorithm yourself, as there is a lot of scope for further improvement.
Here is, in broad lines, an explanation of the way I decided to tackle the algorithm:
The problem presented by the OP is related to a classic example of distributing n unique elements over k unique boxes.
In this case here, arr has 9 unique elements that need to be distributed over three distinct spots, represented by the container: arr2.
So the first step in tackling this problem is to figure out how you can implement a function that given n and k, is able to calculate all the possible distributions that apply.
The closest that I could come up with was the Stirling Numbers of the Second Kind, which is defined as:
The number of ways of partitioning a set of n elements into m nonempty sets (i.e., m set blocks), also called a Stirling set number. For example, the set {1,2,3} can be partitioned into three subsets in one way: {{1},{2},{3}}; into two subsets in three ways: {{1,2},{3}}, {{1,3},{2}}, and {{1},{2,3}}; and into one subset in one way: {{1,2,3}}.
If you pay close attention to the example provided, you will realize that it pertains to the enumeration of all the distribution combinations possible over INDISTINGUISHABLE partitions as order doesn't matter.
Since in our case, each spot in the container arr2 represents a UNIQUE spot and order therefore does matter, we will thus be required to enumerate all the Stirling Combinations over every possible combination of arr2.
Practically speaking, this means that for our example, where arr2.length === 3, we will be required to apply all of the Stirling Combinations obtained to [100,120,140], [120,140,100], [140,100,120], etc. (6 permutations in total).
The main challenging part here is to implement the Stirling Function, but luckily somebody has already done so:
http://blogs.msdn.com/b/oldnewthing/archive/2014/03/24/10510315.aspx
After copy and pasting the Stirling Function and using it to distribute arr over 3 unique spots, you now need to filter out the distributions that don't sum up to the designated spots encompassed by arr2.
This will then leave you with all the possible solutions that apply. In your case, for
var arr=[10,20,30,40,50,60,70,80,90];
var arr2=[100,120,140];
no solutions apply at all.
A quick workaround is to expand the distribution target arr2 from [100,120,140] to [100,120,140,90]. A better workaround is, when zero solutions are found, to take away one element from the list arr at a time until you obtain a solution. You can later expand your solution sets by including this element as a mapping of it onto itself.
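For small inputs, a plain backtracking search is a simpler alternative to the Stirling-based enumeration described above. The sketch below is not that answer's linked implementation. It assumes all values are positive, serves the targets in order (the priority rule from the question), and allows leftover elements, as in the question's example where 90 stays unmatched. The name findGroups is hypothetical:

```javascript
// For each target, in order, find unused values that sum to it.
// Returns an array of index groups (one per target), or null if impossible.
// Assumes every value is a positive number.
function findGroups(values, targets) {
  const used = new Array(values.length).fill(false);
  const groups = [];

  function fillTarget(t) {
    if (t === targets.length) return true; // every target has a group
    const group = [];

    function pick(start, remaining) {
      if (remaining === 0) {
        groups.push(group.slice());
        if (fillTarget(t + 1)) return true;
        groups.pop(); // later targets failed, try another group
        return false;
      }
      for (let i = start; i < values.length; i++) {
        if (used[i] || values[i] > remaining) continue;
        used[i] = true;
        group.push(i);
        if (pick(i + 1, remaining - values[i])) return true;
        group.pop();
        used[i] = false;
      }
      return false;
    }

    return pick(0, targets[t]);
  }

  return fillTarget(0) ? groups : null;
}

const arr = [10, 20, 30, 40, 50, 60, 70, 80, 90];
const arr2 = [100, 120, 140];
console.log(JSON.stringify(findGroups(arr, arr2)));
// [[0,1,2,3],[4,6],[5,7]]  (index 8, the value 90, is left over)
```

The search is exponential in the worst case, so it is only reasonable for short arrays like the one in the question.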

Random Selections from Arrays

I am creating a game in Unity. I'm in the planning stage of it right now, but I'm trying to work out a problem I've come to. The game involves randomly selected objects from three different categories falling and the player has to catch the particular objects in particular bins.
So here's what needs to happen:
One or two of the arrays must be randomly chosen, one or two of the objects within that particular array must be chosen, no more than four objects can fall at once, the different objects must fall from different places and fall at different times.
Now I have a clip of code from another project I did, written in JavaScript (which is what I've been using, but I could also do it in Boo or C++), that solves part of the last point. It chooses a random location along the x axis and then has the object fall until y=0, and then it resets.
function Update()
{
    transform.position.y -= 50 * Time.deltaTime;
    if(transform.position.y < 0)
    {
        transform.position.y = 50;
        transform.position.x = Random.Range(0,60);
        transform.position.z = -16;
    }
}
I'm going to rewrite part of it so that the object resets after it hits a particular collider, yields for a short time period, and then finds a new random object and drops that instead. But what I'm having problems with is the actual randomizing of the objects. I have six objects in each of the three arrays, and I've looked for code where something is chosen from an array by numerical value, but found nothing about randomly choosing one of the arrays and then choosing something within that array. Neither have I found anything about random selection in JavaScript, Boo, or C++.
Any information on this code would be helpful, thanks in advance!
To select one object at random from one of three arrays chosen at random, you had better work with an array of arrays. You will then need to generate two random numbers and use them as indexes into the array of arrays.
So instead of three different arrays, initialize a single array
var a = [];
a.push([1,2,3]);
a.push([10,20]);
a.push([100,200,300,400]);
and then
var i = Math.floor(Math.random()*a.length);
var j = Math.floor(Math.random()*a[i].length);
var o = a[i][j];
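Extending the same idea to the rule in the question (one or two arrays, then one or two objects from each, so never more than four objects in total) might look like the sketch below. It is plain JavaScript rather than Unity-specific code, and the helper names randomInt, pickDistinct and pickFallingObjects are hypothetical:

```javascript
// Random integer in [min, max], both ends included.
function randomInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Pick up to `count` different items from `array` without modifying it.
function pickDistinct(array, count) {
  const pool = array.slice();
  const picked = [];
  for (let k = 0; k < count && pool.length > 0; k++) {
    const idx = Math.floor(Math.random() * pool.length);
    picked.push(pool.splice(idx, 1)[0]); // remove so it cannot repeat
  }
  return picked;
}

// Choose 1 or 2 categories, then 1 or 2 objects from each chosen category.
function pickFallingObjects(categories) {
  const chosenCategories = pickDistinct(categories, randomInt(1, 2));
  const objects = [];
  for (const category of chosenCategories) {
    objects.push(...pickDistinct(category, randomInt(1, 2)));
  }
  return objects; // between 1 and 4 objects
}

const categories = [
  ["a1", "a2", "a3", "a4", "a5", "a6"],
  ["b1", "b2", "b3", "b4", "b5", "b6"],
  ["c1", "c2", "c3", "c4", "c5", "c6"],
];
console.log(pickFallingObjects(categories));
```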

Why use sorting maps on arrays? How is it better in some instances?

I'm trying to learn about array sorting. It seems pretty straightforward. But on the mozilla site, I ran across a section discussing sorting maps (about three-quarters down the page).
The compareFunction can be invoked multiple times per element within
the array. Depending on the compareFunction's nature, this may yield a
high overhead. The more work a compareFunction does and the more
elements there are to sort, the wiser it may be to consider using a
map for sorting.
The example given is this:
// the array to be sorted
var list = ["Delta", "alpha", "CHARLIE", "bravo"];
// temporary holder of position and sort-value
var map = [];
// container for the resulting order
var result = [];
// walk original array to map values and positions
for (var i=0, length = list.length; i < length; i++) {
  map.push({
    // remember the index within the original array
    index: i,
    // evaluate the value to sort
    value: list[i].toLowerCase()
  });
}
// sorting the map containing the reduced values
map.sort(function(a, b) {
  return a.value > b.value ? 1 : -1;
});
// copy values in right order
for (var i=0, length = map.length; i < length; i++) {
  result.push(list[map[i].index]);
}
// print sorted list
print(result);
I don't understand a couple of things. To wit: What does it mean, "The compareFunction can be invoked multiple times per element within the array"? Can someone show me an example of that? Secondly, I understand what's being done in the example, but I don't understand the potential "high[er] overhead" of the compareFunction. The example shown here seems really straightforward, and at first glance I'd think that mapping the array into objects, sorting by their values, and then putting the results back into an array would take much more overhead. I understand this is a simple example, probably intended only to show the procedure. But can someone give an example of when it would be lower overhead to map like this? It seems like a lot more work.
Thanks!
When sorting a list, an item isn't just compared to one other item, it may need to be compared to several other items. Some of the items may even have to be compared to all other items.
Let's see how many comparisons there actually are when sorting an array:
var list = ["Delta", "alpha", "CHARLIE", "bravo", "orch", "worm", "tower"];
var o = [];
for (var i = 0; i < list.length; i++) {
  o.push({
    value: list[i],
    cnt: 0
  });
}
o.sort(function(x, y){
  x.cnt++;
  y.cnt++;
  return x.value == y.value ? 0 : x.value < y.value ? -1 : 1;
});
console.log(o);
Result:
[
{ value="CHARLIE", cnt=3},
{ value="Delta", cnt=3},
{ value="alpha", cnt=4},
{ value="bravo", cnt=3},
{ value="orch", cnt=3},
{ value="tower", cnt=7},
{ value="worm", cnt=3}
]
(Fiddle: http://jsfiddle.net/Guffa/hC6rV/)
As you can see, each item was compared to several other items. The string "tower" even had more comparisons than there are other strings, which means that it was compared to at least one other string at least twice.
If the comparison needs some calculation before the values can be compared (like the toLowerCase method in the example), then that calculation will be done several times. By caching the values after that calculation, it will be done only once for each item.
The primary time saving in that example is gotten by avoiding calls to toLowerCase() in the comparison function. The comparison function is called by the sort code each time a pair of elements needs to be compared, so that's a savings of a lot of function calls. The cost of building and un-building the map is worth it for large arrays.
That the comparison function may be called more than once per element is a natural implication of how sorting works. If only one comparison per element were necessary, it would be a linear-time process.
edit — the number of comparisons that'll be made will be roughly proportional to the length of the array times the base-2 log of the length. For a 1000 element array, then, that's proportional to 10,000 comparisons (probably closer to 15,000, depending on the actual sort algorithm). Saving 20,000 unnecessary function calls is worth the 2000 operations necessary to build and un-build the sort map.
This is called the “decorate - sort - undecorate” pattern (you can find a nice explanation on Wikipedia).
The idea is that a comparison-based sort has to call the comparison function at least n - 1 times (where n is the number of items in the list), as this is the number of comparisons you need just to check that the array is already sorted. Usually the number of comparisons is larger than that (O(n ln n) if you are using a good algorithm), and by the pigeonhole principle at least one value is passed more than once to the comparison function.
If your comparison function does some expensive processing before comparing the two values, then you can reduce the cost by first doing the expensive part and storing the result for each value (since you know that even in the best scenario you'll have to do that processing). Then, when sorting, you use a cheaper comparison function that only compares those cached outputs.
In this example, the "expensive" part is converting the string to lowercase.
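To see the saving concretely, the sketch below counts how often the "expensive" key function runs with and without the pattern. The names sortKey, sortNaive and sortDecorated are illustrative only. The exact count for the naive version depends on the engine's sort algorithm, so no figure is claimed for it beyond being larger:

```javascript
let keyCalls = 0;

// Stand-in for an expensive per-item computation; counts its own calls.
function sortKey(s) {
  keyCalls++;
  return s.toLowerCase();
}

function compareKeys(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Naive: the key is recomputed for both arguments on every comparison.
function sortNaive(list) {
  return list.slice().sort((a, b) => compareKeys(sortKey(a), sortKey(b)));
}

// Decorate, sort, undecorate: the key is computed once per item.
function sortDecorated(list) {
  return list
    .map((item) => ({ item, key: sortKey(item) })) // decorate
    .sort((a, b) => compareKeys(a.key, b.key))     // sort on cached keys
    .map((entry) => entry.item);                   // undecorate
}

const words = ["Delta", "alpha", "CHARLIE", "bravo"];

keyCalls = 0;
console.log(sortNaive(words), "key calls:", keyCalls);

keyCalls = 0;
console.log(sortDecorated(words), "key calls:", keyCalls); // key calls: 4
```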
Think of this like caching. It's simply saying that you should not do lots of calculation in the compare function, because you will be calculating the same value over and over.
What does it mean, "The compareFunction can be invoked multiple times per element within the array"?
It means exactly what it says. Let's say you have three items, A, B and C. They need to be sorted by the result of the compare function. The comparisons might be done like this:
compare(A) to compare(B)
compare(A) to compare(C)
compare(B) to compare(C)
So here, we have 3 values, but the compare() function was executed 6 times. Using a temporary array to cache things ensures we do a calculation only once per item, and can compare those results.
Secondly, I understand what's being done in the example, but I don't understand the potential "high[er] overhead" of the compareFunction.
What if compare() does a database fetch (comparing the counts of matching rows)? Or a complex math calculation (factorial, recursive Fibonacci, or iteration over a large number of items)? These are the sorts of things you don't want to do more than once.
I would say most of the time it's fine to leave really simple, fast calculations inline. Don't over-optimize. But if you need to do anything complex or slow in the comparison, you have to be smarter about it.
To respond to your first question: why would the compareFunction be called multiple times per element in the array?
Sorting an array requires at least N - 1 comparisons, where N is the size of the array, and usually many more. Thus each element in your array may be compared to other elements up to N times (bubble sort requires at most N^2 comparisons). The compareFunction you provide is used every time to determine whether one element is less than, equal to, or greater than another, and thus is called multiple times per element in the array.
A simple response to your second question: why would there be potentially higher overhead for a compareFunction?
Say your compareFunction does a lot of unnecessary work while comparing two elements of the array. This can make the sort slower, and thus using such a compareFunction can cause higher overhead.

Maximum size of an Array in Javascript

Context: I'm building a little site that reads an rss feed, and updates/checks the feed in the background. I have one array to store data to display, and another which stores ID's of records that have been shown.
Question: How many items can an array hold in JavaScript before things start getting slow or sluggish? I'm not sorting the array, but I am using jQuery's inArray function to do a comparison.
The website will be left running and updating, and it's unlikely that the browser will be restarted or refreshed very often.
If I should think about clearing some records from the array, what is the best way to remove some records after a limit, like 100 items?
The maximum length until "it gets sluggish" is totally dependent on your target machine and your actual code, so you'll need to test on that (those) platform(s) to see what is acceptable.
However, the maximum length of an array according to the ECMA-262 5th Edition specification is bound by an unsigned 32-bit integer due to the ToUint32 abstract operation, so the longest possible array could have 2^32 - 1 = 4,294,967,295 (about 4.29 billion) elements.
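The limit is easy to observe directly. In this small sketch the array is sparse (it has a length but no stored elements), so nothing close to 4 billion slots is actually allocated:

```javascript
const MAX_LENGTH = 2 ** 32 - 1; // 4294967295

// A sparse array of the maximum length is accepted.
const sparse = new Array(MAX_LENGTH);
console.log(sparse.length); // 4294967295

// One more than the maximum is rejected with a RangeError.
let error = null;
try {
  new Array(MAX_LENGTH + 1);
} catch (e) {
  error = e;
}
console.log(error instanceof RangeError); // true
```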
No need to trim the array, simply address it as a circular buffer (index % maxlen). This will ensure it never goes over the limit (implementing a circular buffer means that once you get to the end you wrap around to the beginning again - not possible to overrun the end of the array).
For example:
var container = [];
var maxlen = 100;
var index = 0;
// 'store' 1538 items (only the last 'maxlen' items are kept)
for (var i = 0; i < 1538; i++) {
  container[index++ % maxlen] = "storing" + i;
}
// once the buffer has wrapped, index % maxlen is the slot of the oldest
// item, so this reads the item at offset 11 from the oldest one
var eleventh = container[(index + 11) % maxlen];
// likewise, the item at offset 35 from the oldest one
var thirtyfifth = container[(index + 35) % maxlen];
// print the stored elements, oldest first; the loop runs to 200 to show
// that it doesn't matter if we address past 100: the circular buffer
// simply wraps around to the beginning, so each element is printed twice
for (i = 0; i < 200; i++) {
  document.write(container[(index + i) % maxlen] + "<br>\n");
}
Like @maerics said, your target machine and browser will determine performance.
But for some real world numbers, on my 2017 enterprise Chromebook, running the operation:
console.time();
Array(x).fill(0).filter(x => x < 6).length
console.timeEnd();
x=5e4 takes 16ms, good enough for 60fps
x=4e6 takes 250ms, which is noticeable but not a big deal
x=3e7 takes 1300ms, which is pretty bad
x=4e7 takes 11000ms and allocates an extra 2.5GB of memory
So around 30 million elements is a hard upper limit, because the javascript VM falls off a cliff at 40 million elements and will probably crash the process.
EDIT: In the code above, I'm actually filling the array with elements and looping over them, simulating the minimum of what an app might want to do with an array. If you just run Array(2**32-1) you're creating a sparse array that's closer to an empty JavaScript object with a length, like {length: 4294967295}. If you actually tried to use all those 4 billion elements, you'll definitely crash the javascript process.
You could try something like this to test and trim the length:
http://jsfiddle.net/orolo/wJDXL/
var longArray = [1, 2, 3, 4, 5, 6, 7, 8];
if (longArray.length >= 6) {
  longArray.length = 3;
}
alert(longArray); //1, 2, 3
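Setting length as above keeps the first items. If new records are appended with push and the oldest ones should be dropped instead, a variation along these lines may fit better (a sketch; pushWithLimit is a hypothetical helper name):

```javascript
// Append an item and drop the oldest entries once the limit is exceeded.
function pushWithLimit(list, item, limit) {
  list.push(item);
  if (list.length > limit) {
    // remove the surplus from the front, where the oldest items are
    list.splice(0, list.length - limit);
  }
  return list;
}

const shownIds = [];
for (let id = 0; id < 250; id++) {
  pushWithLimit(shownIds, id, 100);
}
console.log(shownIds.length); // 100
console.log(shownIds[0], shownIds[99]); // 150 249
```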
I have built a performance framework that manipulates and graphs millions of datasets, and even then, the javascript calculation latency was on order of tens of milliseconds. Unless you're worried about going over the array size limit, I don't think you have much to worry about.
It will be very browser dependent. 100 items doesn't sound like a large number; I expect you could go a lot higher than that. Thousands shouldn't be a problem. What may be a problem is the total memory consumption.
I have shamelessly pulled some pretty big datasets into memory, and although it did get sluggish, it took maybe 15 MB of data or more, with pretty intense calculations on the dataset. I doubt you will run into problems with memory unless you have intense calculations on the data and many, many rows. Profiling and benchmarking with different mock result sets will be your best bet for evaluating performance.
