I want to pick a random item from an array.
Math.floor(Math.random() * array.length);
is the way to go, but as far as I know this produces a uniform distribution, which means that the average is (lowerbound + upperbound) / 2. Translated to an array with 10 elements, where the lower bound is the first index (0) and the upper bound is the last (9), that gives an average of 4.5, which does not seem random to me.
Therefore, I looked at the frequency distribution of this way of picking a random item, using 10 elements and the code above. Each picked index is pushed into an array; after 10,000 picks, the frequency of each index is counted.
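The test looked roughly like this (a sketch; the variable names are mine):

// Pick 10,000 random indices from a 10-element array and tally them
const counts = new Array(10).fill(0);
for (let i = 0; i < 10000; i++) {
  counts[Math.floor(Math.random() * 10)]++;
}
console.log(counts);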
This has the following results:
Index: Frequency
0: 1083
1: 996
2: 1022
3: 966
4: 958
5: 962
6: 1044
7: 1045
8: 972
9: 952
Of course, this is only one run of 10,000 numbers, but it shows that index 0 came up 10.8% of the time while index 9 came up only 9.5% of the time. That difference of 1.3 percentage points seems like quite a lot to me.
Are there methods that do better, for example getting the difference down to 0.05%? Ideally every index would come up exactly 10% of the time (equally distributed).
If you can precompute the result (i.e. you need a finite number of results, not an infinite stream) and the number of results is divisible by the number of items, you can get a perfect distribution:
Generate an array that repeats the items until you've got enough, i.e. [1, 2, 3, 1, 2, 3, 1, 2, 3, ...]. The array is thus guaranteed to have exactly as many instances of each item.
Shuffle the array with a fair shuffle algorithm, e.g. Fisher-Yates. The array still has exactly as many instances of each item.
If you do need an infinite stream, you could use something like an "item bag" model (which, btw, is how the blocks in Tetris are chosen):
Fill a "bag" with your items ([1, 2, 3]). Shuffle it (as above).
When you need an item, pop the first one from the shuffled bag.
If the bag is empty, re-fill it according to step 1.
The only case where this doesn't have a perfect distribution is if you stop "mid-bag".
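A minimal JavaScript sketch of both ideas (all function names are my own):

// Fisher-Yates shuffle (in place, fair)
function shuffle(arr) {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

// Precomputed case: repeat the items until there are enough, then shuffle once
function perfectSequence(items, count) { // count should be divisible by items.length
  const out = [];
  while (out.length < count) out.push(...items);
  return shuffle(out);
}

// Infinite stream: the "item bag" model
function makeBag(items) {
  let bag = [];
  return function next() {
    if (bag.length === 0) bag = shuffle(items.slice()); // re-fill and shuffle
    return bag.pop();
  };
}

const next = makeBag([1, 2, 3]);
console.log(next(), next(), next()); // one full bag: each item appears exactly once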
Here is another method: count how many times each item has already been sampled, and draw values from a categorical distribution with probability INVERSE to that count. The more frequently an item has appeared, the less likely it is to be sampled next time.
Some code (in C#, using the MathNet.Numerics package):
using System;
using MathNet.Numerics.Distributions;

static void Main() {
    const int N = 4;
    var counts = new int[N] { 1, 1, 1, 1 };
    var weights = new double[N] { 1.0, 1.0, 1.0, 1.0 };
    var rng = new Random();
    while (true) {
        // sample one value in [0, N); the weights need not sum to 1
        int v = Categorical.Sample(rng, weights);
        // update counts and weights
        counts[v] += 1;
        weights[v] = 1.0 / counts[v];
        // use v here for something
        ...
    }
}
Actually, the inverse of any monotonically growing function of the count will do, e.g.
weights[v] = 1.0/(1.0 + .5*(double)counts[v]);
might work, or
Func<double, double> squared = x => x * x;
weights[v] = 1.0 / (7.0 + 0.25 * squared(counts[v]));
or
weights[v] = 1.0/(3.0 + Math.Sqrt((double)counts[v]));
What you have shown in your question is simply the fact that the JavaScript random number generator is simulating not just uniformly distributed, but also independent random numbers; each chosen number behaves as though it were independent of any other choice. Because of this independence, each number "doesn't care" how often each other number was chosen, as long as with each choice, each possible outcome is as likely as any other (according to the JavaScript generator).
If you want a distribution that "feels" more uniform, you will have to adjust the chances of each outcome, so that the chances depend on previous outcomes. A previous answer showed some ways how this can be done. Here is another, which I gave as an answer to a similar question.
1. Give each item the same weight, specified as a positive integer. For example, give a weight of 20 to each item.
2. Use a weighted-choice-with-replacement algorithm. Perhaps the simplest is rejection sampling, described as follows. Assume that the highest weight is max and each weight is 0 or greater. To choose an integer in the interval [1, weights.length] using rejection sampling:
a. Choose a uniform random integer i in [1, weights.length].
b. With probability weights[i]/max, return i. Otherwise, go to substep (a). (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max]; if that number is weights[i] or less, return i, otherwise go to substep (a).)
There are many other ways to make a weighted choice besides rejection sampling; see my note on weighted choice algorithms.
3. As each item is chosen, reduce its weight by 1 to make it less likely to be chosen.
4. If all the weights are 0, assign each item the same weight chosen in step 1 (in this example, 20).
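A rough JavaScript sketch of these four steps (the initial weight of 20 and all names are just for illustration):

const INITIAL_WEIGHT = 20; // step 1: the same positive integer weight per item

function makeChooser(items) {
  let weights = items.map(() => INITIAL_WEIGHT);
  return function choose() {
    // step 4: once every weight reaches 0, restore the initial weights
    if (weights.every(w => w === 0)) {
      weights = items.map(() => INITIAL_WEIGHT);
    }
    const max = Math.max(...weights);
    // step 2: rejection sampling
    while (true) {
      const i = Math.floor(Math.random() * items.length);
      if (Math.random() * max < weights[i]) { // accept with probability weights[i]/max
        weights[i]--; // step 3: a chosen item becomes less likely
        return items[i];
      }
    }
  };
}

const choose = makeChooser(['a', 'b', 'c']);
console.log(choose(), choose(), choose());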
You didn't specify the kind of application you had in mind, but I see this desire for a "more uniform" distribution come up most often in games that wish to control which random numbers appear, to make the random outcomes appear "fairer" to players. In that case, however, you should also consider whether it may be better to make an (ordinary) independent uniform random choice instead, especially if you care whether players could gain an unfair advantage by predicting the random outcomes.
Related
I am curious how Stake.com managed to create the game "Limbo", where the chance of hitting a given multiplier matches a probability they've calculated. Here's the game: https://stake.com/casino/games/limbo
For example:
Multiplier -> x2
Probability -> 49.5% chance.
What it means is you have a 49.5% chance of winning because those are the odds that the multiplier will actually hit a number above x2.
If you set the multiplier all the way up to x1,000,000, you have a 0.00099% chance of actually hitting 1,000,000.
It's not a project I'm working on but I'm just extremely curious how we could achieve this.
Example:
Math.floor(Math.random()*1000000)
is not as random as we think, since Math.random() generates a number between 0 and 1. When paired with a huge multiplier like 1,000,000, it will actually generate a 6-figure number most of the time, so it's not as random as we thought.
I've read that we have to convert it into a power-law distribution, but I'm not sure how that works. I'd love more material to read up on it.
It sounds like you need to define some function that gives the probability of winning for a given multiplier N. These probabilities don't have to add up to 1, because they are not part of the same random variable; there is a unique random variable for each N chosen and two events, win or lose; we can subscript them as win(N) and lose(N). We really only need to define win(N) since lose(N) = 1 - win(N).
Something like an exponential function would make sense here. Consider win(N) = 2^(1 - N). Then we get the following probabilities of winning:
n win(n)
1 1
2 1/2
3 1/4
4 1/8
etc
Or we could use just an inverse function: win(N) = 1/N
n win(n)
1 1
2 1/2
3 1/3
...
Then to actually see whether you win or lose for a given N, just choose a random number in some range - [0.0, 1.0) works fine for this purpose - and see whether that number is less than win(N). If so, it's a win; if not, it's a loss.
Yes, technically speaking, it is probably true that the floating point numbers are not really uniformly distributed over [0, 1) when calling standard library functions. If you really need that level of precision then you have a much harder problem. But, for a game, regular rand() type functions should be plenty uniform for your purposes.
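For instance, a minimal sketch using the inverse rule win(N) = 1/N, scaled by a house-edge factor (the factor is my assumption, though 0.99/2 = 49.5% happens to match the x2 odds quoted in the question):

// win(N) = (1 - houseEdge) / N, e.g. 0.99 / 2 = 0.495 for x2
function winProbability(multiplier, houseEdge = 0.01) {
  return (1 - houseEdge) / multiplier;
}

// One round: win if a uniform draw in [0, 1) lands below win(N)
function play(multiplier) {
  return Math.random() < winProbability(multiplier);
}

console.log(play(2)); // true roughly 49.5% of the time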
const obj =
[
{noir: 1},
{croch: 0.5},
{doubleCroch: 1},
{triollet: 1.5},
{quatreDouble: 2}
];
desired output example:
2, 2, 0.5, 0.5, 1, 2
for (var i = 0 ; i < obj.length ; i++ ){
var randomItem = obj[Math.floor(Math.random()*obj.length )];
}
That brings me five random items. But I want to have elements whose sum of values is equal to eight.
It seems like I need some kind of condition inside the random selection.
I think you have a couple options here:
1. The foolproof approach
You calculate and collect all the possible combinations that sum up to 8. Then you pick one combination at random. This might be computationally expensive, especially since you seem to allow for duplicates. Not trivial to implement.
2. The happy-go-lucky approach
You keep picking elements at random from your object as you're doing and keep track of the sum. If the sum hits 8, you have a winning combination. If it surpasses 8, you start over from scratch. A lot easier to implement, but could in theory retry forever (see the sketch after this list).
3. The over-engineered approach
You start out the same as in the previous approach, but if the sum surpasses 8, you discard the last selected element and pick another one. This would require you to keep track of all the elements you've already tried for every picked element, because you might be required to backtrack further than the last selected element. This would be the hardest to implement.
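A minimal sketch of approach 2, assuming the obj array from the question and reading each element's single value via Object.values:

function pickSum(items, target) {
  while (true) {
    const picked = [];
    let sum = 0;
    while (sum < target) {
      const item = items[Math.floor(Math.random() * items.length)];
      const value = Object.values(item)[0]; // each element holds one value
      picked.push(value);
      sum += value;
    }
    if (sum === target) return picked; // hit the target exactly
    // overshot the target: discard everything and start over
  }
}

console.log(pickSum(obj, 8)); // e.g. [2, 2, 0.5, 0.5, 1, 2]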
This is a simpler version of knapsack, which I am having trouble wrapping my head around.
In my version I don't care how valuable the items are. I just want to get as close to the weight capacity as possible, and order doesn't matter because I'm doing it multiple times and shuffling in between.
So to be clear:
I have an array of values like weights = [44, 52, 100, 33, 33, 22, 25, 4, 6, 77, 88, 45] and a capacity of, for example, capacity = 204.
I want the closest combination of array values to that capacity number without repeating any. I'm not super great at math, and the Wikipedia article has completely lost me.
Can someone explain how to get this?
Naive approach: cycle through all subsets of N numbers, and check the sum of weights. Running time is O(2^N*N)
You can try dynamic programming.
For each item, the problem divides into two subproblems that check whether some subset's sum is equal to or less than the capacity:
1) Include the current element in subset, and recur for the remaining items with remaining sum.
2) Exclude the current element from the subset, recur remaining items.
The base case of the recursion would be when no items are left. Finally, we output the items included in the subset.
Running time is O(n * capacity); with a one-dimensional table of reachable sums, the space is O(capacity) (as in the sketch below).
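A sketch of that DP in JavaScript, returning the best reachable sum not exceeding the capacity (the function name is mine):

// dp[s] is true if some subset of the weights sums to exactly s
function closestSubsetSum(weights, capacity) {
  const dp = new Array(capacity + 1).fill(false);
  dp[0] = true; // the empty subset
  for (const w of weights) {
    // iterate downwards so each weight is used at most once
    for (let s = capacity; s >= w; s--) {
      if (dp[s - w]) dp[s] = true;
    }
  }
  for (let s = capacity; s >= 0; s--) {
    if (dp[s]) return s; // greatest reachable sum <= capacity
  }
}

const weights = [44, 52, 100, 33, 33, 22, 25, 4, 6, 77, 88, 45];
console.log(closestSubsetSum(weights, 204)); // 204 (e.g. 77 + 88 + 33 + 6)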
I'm implementing a purpose-built regular expression engine using finite automata. I will have to store thousands of states, each state with its own transition table from unicode code points (or UTF-16 code units; I haven't decided) to state IDs.
In many cases, the table will be extremely sparse, but in other cases it will be nearly full. In most cases, most of the entries will fall into several contiguous ranges with the same value.
The simplest implementation would be a lookup table, but each such table would take up a great deal of space. A list of (range, value) pairs would be much smaller, but slower. A binary search tree would be faster than a list.
Is there a better approach, perhaps leveraging built-in functionality?
Unfortunately, JavaScript's built-in data-types - especially Map - are not of great help in accomplishing this task, as they lack the relevant methods.
"In most cases, most of the entries will fall into several contiguous ranges with the same value."
We can however exploit this and use a binary search strategy on sorted arrays, assuming the transition tables won't be modified often.
Encode contiguous input ranges leading to the same state by storing each input range's lowest value in a sorted array. Keep the states at corresponding indices in a separate array:
let inputs = [0, 5, 10]; // Input ranges [0,4], [5,9], [10,∞)
let states = [0, 1, 0 ]; // Inputs [0,4] lead to state 0, [5,9] to 1, [10,∞) to 0
Now, given an input, you need to perform a binary search on the inputs array similar to Java's floorEntry(k):
// Returns the index of the greatest element less than or equal to
// the given element, or -1 if there is no such element:
function floorIndex(sorted, element) {
let low = 0;
let high = sorted.length - 1;
while (low <= high) {
let mid = low + high >> 1;
if (sorted[mid] > element) {
high = mid - 1;
} else if (sorted[mid] < element) {
low = mid + 1;
} else {
return mid;
}
}
return low - 1;
}
// Example: Transition to 1 for emoticons in range 1F600 - 1F64F:
let transitions = {
inputs: [0x00000, 0x1F600, 0x1F650],
states: [0, 1, 0 ]
};
let input = 0x1F60B; // 😋
let next = transitions.states[floorIndex(transitions.inputs, input)];
console.log(`transition to ${next}`);
This search completes in O(log n) steps where n is the number of contiguous input ranges. The transition table for a single state then has a space requirement of O(n). This approach works equally well for sparse and dense transition tables as long as our initial assumption - the number of contiguous input ranges leading to the same state is small - holds.
Sounds like you have two very different cases ("in many cases, the table will be extremely sparse, but in other cases it will be nearly full").
For the sparse case, you could possibly have a separate sparse index (or several layers of indexes), then your actual data could be stored in a typed array. Because the index(es) would be mapping from integers to integers, they could be represented as typed arrays as well.
Looking up a value would look like this:
1. Binary search the index. The index stores pairs as consecutive entries in the typed array - the first element is the search value, the second is the position in the data set (or the next index).
2. If you have multiple indexes, repeat step 1 as necessary.
3. Start iterating your dataset at the position given by the last index. Because the index is sparse, this position might not be the one where the value is stored, but it is a good starting point, as the correct value is guaranteed to be nearby.
4. The dataset itself is represented as a typed array where consecutive pairs hold the key and the value.
I cannot think of anything better to use in JavaScript. Typed arrays are pretty fast and having indexes should increase the speed drastically. That being said, if you only have a couple thousand entries, don't bother with indexes, do a binary search directly on the typed array (described in 4. above).
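For the couple-thousand-entries case, a rough sketch of that direct binary search (the key/value pair layout is my own choice):

// Layout: [key0, value0, key1, value1, ...] with keys sorted ascending
const data = new Uint32Array([5, 100, 9, 200, 42, 300, 77, 400]);

function get(key) {
  let low = 0, high = data.length / 2 - 1; // pair indices
  while (low <= high) {
    const mid = (low + high) >> 1;
    const k = data[2 * mid];
    if (k < key) low = mid + 1;
    else if (k > key) high = mid - 1;
    else return data[2 * mid + 1];
  }
  return undefined; // key not present
}

console.log(get(42)); // 300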
For the dense case, I am not sure. If the dense case happens to be a case where repeated values across ranges of keys are likely, consider using something like run-length encoding – identical consecutive values are represented simply as their number of occurrences and then the actual value. Once again, use typed arrays and binary search, possibly even indexes to make this faster.
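And a sketch of the run-length idea for the dense case (again, the layout is my own: one typed array of inclusive run ends, one of run values):

// Runs: keys [0..9] -> state 7, [10..99] -> state 3, [100..max] -> state 7
const runEnds = new Uint32Array([9, 99, 0xFFFFFFFF]);
const runValues = new Uint8Array([7, 3, 7]);

// Binary search for the first run whose end is >= key
function lookup(key) {
  let low = 0, high = runEnds.length - 1;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (runEnds[mid] < key) low = mid + 1;
    else high = mid;
  }
  return runValues[low];
}

console.log(lookup(42)); // 3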
Is this correct (using http://en.wikipedia.org/wiki/Binomial_probability)?
Looks like values are from .0000000000000000 to .9999999999999999
Probability of happening twice = p^2 = (1/9999999999999999)^2 = 1.0e-32
I think I am missing something here?
Also, how does being a pseudo random number generator change this calculation?
Thank You.
In an ideal world Math.random() would be absolutely random, with one output being completely independent from another, which (assuming p = the probability of any given number being produced) results in a probability of p^2 for any value being repeated immediately after another (as others have already said).
In practice people want Math.random to be fast which means pseudo-random number generators are used by the engines. There are many different kinds of PRNG but the most basic is a linear congruential generator, which is basically a function along the lines of:
s(n + 1) = (some_prime * s(n) + some_value) mod some_other_prime
If such a generator is used, then you won't see a value repeated until you've called random() some_other_prime times; you're guaranteed of that.
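For illustration, a toy generator of that form in JavaScript (the constants are a classic textbook example, not what any engine actually ships):

// s(n + 1) = (a * s(n) + c) mod m -- a toy linear congruential generator
const m = 2147483647; // 2^31 - 1, a prime modulus
const a = 48271;      // multiplier from the MINSTD generator
const c = 0;
let state = 42;       // any seed in [1, m - 1]

function lcgRandom() {
  state = (a * state + c) % m; // exact: a * state stays below 2^53
  return state / m;            // scale into [0, 1)
}

console.log(lcgRandom(), lcgRandom());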
Relatively recently, however, it's become apparent that this kind of behaviour (coupled with seeding the PRNGs with the current time) could be used for some forms of tracking, which has led to browsers making a number of changes, so you can't assume anything about subsequent random() calls.
I think the probability of getting the same number twice in a row is 1 divided by the size of the generator's range, assuming that it has a good distribution.
The reason for this is that the first number can be anything, and the second number needs to just be that number again, which means we don't care about the first number at all. The probability of getting the same number twice in a row is the same as the probability of getting any particular number once.
Getting some particular number twice in a row, e.g. two 0.5s in a row, would be p^2; however, if you just care about any number twice in a row, it's just p.
If the numbers were truly random, you'd expect each value, indeed, to appear with probability 1/p, so a given value appearing twice in a row would have probability 1/p^2.
The value for p is not exactly the one you have, though, because the numbers are represented internally in binary. Figure out how many bits of mantissa the numbers have in JavaScript (they are IEEE 754 doubles, so 52 bits) and use that for your combinatoric count.
The "pseudorandom" part is more interesting, because the properties of pseudorandom number generators vary. Knuth does some lovely work with that in Seminumerical Algorithms, but basically most usual PN generators have at least some spectral distributiuon. Cryptograp0hic PN generators are generally stronger.
Update: the amount of time shouldn't be significant. Whether it's a millisecond or a year, as long as you don't update the state, the probabilities will stay the same.
The probability that you would get 2 given numbers is (1/p)^2, but the probability that you get the same number twice in a row (any number) is 1/p. That is because the first number can be anything, and the second just needs to match it.
You can kind of find out, just let it run a few days :)
var last = -1; // sentinel; Math.random() never returns -1
var count = 0;

function rand() {
  ++count;
  var num = Math.random();
  if (num === last) {
    console.log('count: ' + count + ' num: ' + num);
  }
  last = num;
}

// Warning: this blocks the thread until a repeat is found
while (true) rand();