Binary Search in JS: trying to find a consistent mental model - javascript

I am grinding LeetCode these days and I encountered the challenge 162. Find Peak Element:
A peak element is an element that is strictly greater than its neighbors.
Given an integer array nums, find a peak element, and return its index. If the array contains multiple peaks, return the index to any of the peaks.
You may imagine that nums[-1] = nums[n] = -∞.
You must write an algorithm that runs in O(log n) time.
Constraints:
1 <= nums.length <= 1000
-2^31 <= nums[i] <= 2^31 - 1
nums[i] != nums[i + 1] for all valid i
This question is about using binary search to find a peak element in an array.
I know we can think of the array as alternating ascending and descending sequences. Here is my solution
var findPeakElement = function(nums) {
    if(nums.length <= 1) return 0
    let left = 0, right = nums.length - 1
    while(left <= right) {
        const mid = left + right >>> 1
        if(nums[mid] > nums[mid + 1]) {
            right = mid - 1
        } else {
            left = mid + 1
        }
    }
    return left === nums.length ? left - 1 : left
};
If nums[mid] is bigger than the next element in the array, then we know we are in a descending sub-array and the peak element must lie to the left, and vice versa if nums[mid] is smaller than the next element. So far so good. But what confused me is which index I should return eventually - left or right? To figure this out I had to go through a bunch of trial and error.
And if I slightly tweak the question to find the valley element instead, e.g. [1, 3, 20, 4, 1, 0]'s valley element should be 0, I can reason about how we narrow the window, but I still cannot seem to figure out which index I should return at the end of the binary search.
Here is my attempt for returning the valley element in the array by mirroring what I did for findPeakElement
var findValleyElement = function (nums) {
    if (nums.length <= 1) return 0
    let left = 0,
        right = nums.length - 1
    while (left <= right) {
        const mid = (left + right) >>> 1
        if (nums[mid] > nums[mid + 1]) {
            left = mid + 1
        } else {
            right = mid - 1
        }
    }
    return right
}
But this time I cannot use right as the returned index. I need to use left instead. I cannot seem to think of a consistent way of thinking through this without going through a bunch of examples, which is really not ideal since you still might miss some edge cases.
So my question is, is there some consistent mental model we can adopt when thinking about these binary search problems, specifically which index we should return to satisfy the requirements.

When the following condition is true:
if(nums[mid] > nums[mid + 1]) {
...then it could be that mid is a solution, maybe even the only one. So that means you shouldn't exclude it from the range, yet with right = mid - 1 you do exclude it. You should set right = mid. To then avoid a potentially endless loop, the loop condition should be left < right. This will ensure the loop will always end: the range is guaranteed to become smaller in each iteration*
* Let's for instance assume right == left + 1 at a certain moment. Then mid will become equal to left (since the odd bit in the sum is dropped with >>>). Now either we do right = mid or we do left = mid + 1. In either case we get that left == right. In all other cases where left < right, we get a mid that is strictly between those two limits, and then surely the range will become smaller.
Once the loop exits, left has become equal to right. The only possible index in that range (of 1) is that index.
There is now no more need to check whether left is nums.length, as this cannot happen: with our chosen while condition, left can never become greater than right, ... only equal to it. And since right is a valid index, no such out-of-range check is needed.
Also the case of array size 1 does not need special treatment now.
So:
var findPeakElement = function(nums) {
    let left = 0,
        right = nums.length - 1;
    while (left < right) {
        const mid = (left + right) >>> 1;
        if (nums[mid] > nums[mid + 1]) {
            right = mid;
        } else {
            left = mid + 1;
        }
    }
    return left;
};
Valleys instead of Peaks
Here is my attempt for returning the valley element
If you want to find the valley element, it will not always work unless the following assumption in the question is changed from this:
You may imagine that nums[-1] = nums[n] = -∞
...to this:
You may imagine that nums[-1] = nums[n] = ∞
Once you have that agreed upon, you only have to change the comparison in the above code block from nums[mid] > nums[mid + 1] to nums[mid] < nums[mid + 1].
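For reference, a minimal sketch of what that looks like put together (the same loop shape as the peak version above, with only the comparison flipped, and assuming the nums[-1] = nums[n] = ∞ convention just described):
var findValleyElement = function(nums) {
    let left = 0,
        right = nums.length - 1;
    while (left < right) {
        const mid = (left + right) >>> 1;
        if (nums[mid] < nums[mid + 1]) {
            right = mid; // mid could itself be the valley, so keep it in range
        } else {
            left = mid + 1;
        }
    }
    return left;
};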

Since the array is capped at 1000 elements, a simple scan is constant time. If we're imagining that n (size of array) and k (values limited to range -k to +k) can grow, then use trincot's answer, modifying the initial selection of right to be capped at min(2k, n) since the max increasing streak is of size 2k+1.
def f(arr)
  0.upto(arr.length - 2) do |i|
    return arr[i] if arr[i + 1] < arr[i]
  end
  arr[arr.length - 1] # if we reach this, the array is increasing all the way and the (virtual) element past the end is -infinity
end
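For completeness, the same linear scan as a JavaScript sketch (returning the index, as the LeetCode version of the problem asks for, rather than the value):
// O(n) scan; acceptable here because n is capped at 1000.
// The first position where the next element is smaller is a peak;
// if we never find one, the array is strictly increasing and the last index is the peak.
function findPeakLinear(nums) {
    for (let i = 0; i < nums.length - 1; i++) {
        if (nums[i + 1] < nums[i]) return i;
    }
    return nums.length - 1;
}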

A peak is defined as any element whose neighbours are both less than the element. In the example below there are two peak elements, 5 and 4 -
[the original answer drew the example array here as a small ASCII chart, with each element placed at a height matching its value, so the two peaks (5 and 4) stand out visually]
So we can take three elements off the input, a, b, and c and -
if any of a, b, or c is null, a valid comparison cannot be made and therefore there is no peak; stop the program
otherwise if a < b and b > c, a peak has been found. output the peak
finally drop a, and recur on the same input to search for additional peaks
That would look something like this -
function* peaks ([ a, b, c, ...more ]) {
    if (a == null || b == null || c == null) return // 1
    if (a < b && b > c) yield b                     // 2
    yield *peaks([ b, c, ...more ])                 // 3
}
for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
console.log("found peak", p)
found peak 2
found peak 5
found peak 7
If you have a significantly large input, which I'm sure LeetCode will give you, handling arrays like this will create an enormous amount of wasteful intermediate values. A better approach would be to use an index, i -
function* peaks (t, i = 0) {
    let a = t[i], b = t[i + 1], c = t[i + 2]
    if (a == null || b == null || c == null) return // 1
    if (a < b && b > c) yield b                     // 2
    yield *peaks(t, i + 1)                          // 3
}
for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
console.log("found peak", p)
found peak 2
found peak 5
found peak 7
And finally the use of recursion will limit the size of input that this program can handle. We can use a for loop to avoid any recursion limits -
function* peaks (t) {
    let a, b, c
    for (let i = 0; i < t.length; i++) {
        a = t[i], b = t[i + 1], c = t[i + 2]
        if (a == null || b == null || c == null) return // 1
        if (a < b && b > c) yield b                     // 2
    }
}
for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
console.log("found peak", p)
found peak 2
found peak 5
found peak 7
In the last two examples we perform three array lookups per step, t[i], t[i + 1], and t[i + 2]. As an optimization we can reduce this to just a single lookup -
function* peaks (t) {
    let a, b, c
    for (let i = 0; i < t.length; i++) {
        a = b, b = c, c = t[i]
        if (a == null || b == null || c == null) continue
        if (a < b && b > c) yield b
    }
}
for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
console.log("found peak", p)
found peak 2
found peak 5
found peak 7
This works because our program effectively shifts the elements through a, b, and c in a "leftward" direction. Note the peaks in the b column -
a      b          c    ...
----   --------   --   -----------------------
null   null       1    2,1,3,4,5,4,2,1,5,6,7,4
null   1          2    1,3,4,5,4,2,1,5,6,7,4
1      2 (peak)   1    3,4,5,4,2,1,5,6,7,4
2      1          3    4,5,4,2,1,5,6,7,4
1      3          4    5,4,2,1,5,6,7,4
3      4          5    4,2,1,5,6,7,4
4      5 (peak)   4    2,1,5,6,7,4
5      4          2    1,5,6,7,4
4      2          1    5,6,7,4
2      1          5    6,7,4
1      5          6    7,4
5      6          7    4
6      7 (peak)   4
With our optimised program, we can drop several other unnecessary actions. Index i is no longer needed and we can skip having to worry about off-by-one errors caused by i++ and comparisons of i<t.length. Additionally, we can skip the c == null check as c will always represent an element of the input array -
function* peaks (t) {
    let a, b, c
    for (const v of t) {
        a = b, b = c, c = v
        if (a == null || b == null) continue
        if (a < b && b > c) yield b
    }
}
for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
console.log("found peak", p)
found peak 2
found peak 5
found peak 7
If you want to collect all peaks, you can use Array.from to convert any iterable into an array -
const allPeaks = Array.from(peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
console.log(allPeaks)
[2, 5, 7]
Generators are a good fit for this kind of problem because they can be paused/canceled at any time, ie after the first peak is found -
const firstPeak = (t) => {
    for (const p of peaks(t))
        return p // <- immediately stops `peaks`
}
firstPeak([1,2,1,3,4,5,4,2,1,5,6,7,4])
2
If however you want to write firstPeak without using a generator, there's nothing stopping you from doing so. Instead of using yield you can simply return -
function firstPeak (t) {
    let a, b, c
    for (const v of t) {
        a = b, b = c, c = v
        if (a == null || b == null) continue
        if (a < b && b > c) return b // <-
    }
}
console.log("first peak", firstPeak([1,2,1,3,4,5,4,2,1,5,6,7,4]))
first peak 2

Related

Hackerank "Find the Factor" (in javascript) is failing because of time limit?

I had to find all the factors of a positive number (the integers that evenly divide it) and then return the p-th element of that list, sorted ascending.
If there is no p-th element, return 0.
I tried almost every answer I could work out or find online:
Example:
function pthFactor(n, k) {
    let arr = [];
    for (let i = 1; i <= n; i++) {
        if (n % i === 0) {
            arr.push(i);
        }
        if (arr.length === k) {
            return arr[arr.length - 1];
        }
    }
    if (arr.length !== k) {
        return 1;
    }
};
or
var kthFactor = function(n, k) {
    let factors = [1]
    for (let i = 2; i <= Math.floor(n/2); i++) {
        if (n % i == 0) factors.push(i)
    }
    factors.push(n)
    return factors.length < k ? -1 : factors[k-1]
};
but it's failing the 10 sec time limit.
What am I doing wrong?
By the way, I also tried Math.sqrt etc. in order not to loop n times. That didn't work either.
Do I need to know more than for loops? Like dynamic programming etc. to solve this?
I couldn't find this challenge on HackerRank, but the 1492. The kth Factor of n challenge on LeetCode seems to be the same as you describe:
Given two positive integers n and k.
A factor of an integer n is defined as an integer i where n % i == 0.
Consider a list of all factors of n sorted in ascending order, return the kth factor in this list or return -1 if n has less than k factors.
It is strange that you used the name pthFactor in your first code block, while the name of the relevant argument is k and not p. That is quite confusing.
I also tried Math.sqrt etc in order not to loop n times
You cannot only consider factors up to the square root of 𝑛. For instance, 6 is a factor of 12, and even 12 is a factor of 12.
However, about half of all factors are in the range up to the square root. And those factors that are greater are equal to the quotients found by performing the divisions with those smaller factors. So in the end, you can stop at the square root, provided that you collect all those quotients, and pick the right one from those if you didn't arrive at the kth factor yet.
So here is how that idea could be implemented:
var kthFactor = function(n, k) {
    let bigFactors = [];
    let quotient = n + 1;
    for (let factor = 1; factor < quotient; factor++) {
        if (n % factor === 0) {
            quotient = n / factor;
            k--;
            if (k <= 0) return factor;
            if (factor >= quotient) break;
            bigFactors.push(quotient);
        }
    }
    return bigFactors[bigFactors.length - k] ?? -1;
};
// A few test cases:
console.log(kthFactor(12, 3)); // 3
console.log(kthFactor(7, 2)); // 7
console.log(kthFactor(16, 5)); // 16
console.log(kthFactor(27, 5)); // -1
console.log(kthFactor(927, 1)); // 1
This code uses the same algorithm as in trincot's answer, but I think it's expressed more simply with recursion:
const kthFactor = (n, k, f = 1, r = []) =>
    f * f > n
        ? r[k - 1] ?? -1
        : n % f == 0
            ? k == 1 ? f : kthFactor(n, k - 1, f + 1, f * f == n ? r : [n / f, ...r])
            : kthFactor(n, k, f + 1, r)

console.log(kthFactor(12, 3));  //=> 3
console.log(kthFactor(7, 2));   //=> 7
console.log(kthFactor(16, 5));  //=> 16
console.log(kthFactor(27, 5));  //=> -1
console.log(kthFactor(927, 1)); //=> 1
Here we track the factor we're testing (f) and the remaining numbers (r) still to be returned. We decrease k whenever we find a new factor, adding the quotient n / f to our remaining numbers, and increase f on every step. We stop when we've gotten too big (f * f > n) and then look up the answer by index in the ordered list of remaining numbers; we return the factor directly when k reaches 1, and simply recur with an incremented factor otherwise.
The only tricky bit is the handling of perfect squares, where the square root should only be counted once, so we don't prepend it to the remaining numbers if f * f == n.
Because of the recursion, this is only suitable up to a certain input size. While we could make this tail recursive in a simple manner, it doesn't matter much as of this writing, since tail call optimization has still not landed in most engines.
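For what it's worth, a loop-based rewrite (essentially what tail-call elimination would turn the recursion into) is a straightforward sketch; the name below is illustrative:
// Iterative sketch of the same idea: walk f up to sqrt(n), counting small
// factors and collecting the large quotients in ascending order for later.
const kthFactorLoop = (n, k) => {
    const big = [];                          // quotients n / f, kept in ascending order
    for (let f = 1; f * f <= n; f++) {
        if (n % f !== 0) continue;
        if (--k === 0) return f;             // the kth factor is below sqrt(n)
        if (f * f !== n) big.unshift(n / f); // don't count a square root twice
    }
    return big[k - 1] ?? -1;                 // otherwise it's among the large factors
};

console.log(kthFactorLoop(12, 3)); //=> 3
console.log(kthFactorLoop(16, 5)); //=> 16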

Codility Leader Algorithm in JavaScript

I am attempting the Codility 'Leader' question in JavaScript.
The question is as follows:
An array A consisting of N integers is given. The dominator of array A is the value that occurs in more than half of the elements of A.
For example, consider array A such that
A[0] = 3 A[1] = 4 A[2] = 3
A[3] = 2 A[4] = 3 A[5] = -1
A[6] = 3 A[7] = 3
The dominator of A is 3 because it occurs in 5 out of 8 elements of A (namely in those with indices 0, 2, 4, 6 and 7) and 5 is more than a half of 8.
Write a function: function solution(A); that, given an array A consisting of N integers, returns index of any element of array A in which the dominator of A occurs. The function should return -1 if array A does not have a dominator.
For example, given array A such that
A[0] = 3 A[1] = 4 A[2] = 3
A[3] = 2 A[4] = 3 A[5] = -1
A[6] = 3 A[7] = 3
the function may return 0, 2, 4, 6 or 7, as explained above.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [0..100,000];
and each element of array A is an integer within the range [-2,147,483,648..2,147,483,647].
My answer is below:
function solution(A) {
    const length = A.length
    if (length > 100000) return -1
    const counters = new Array(length).fill(0)
    const negativeCounters = new Array(length).fill(0)
    for (i = 0; i < length; i++){
        if (A[i] < -2147483648 || A[i] > 2147483647) return -1
        if (A[i] > -1){
            counters[A[i]] = counters[A[i]] + 1
            if (counters[A[i]] > (length / 2)) return i
        } else {
            negativeCounters[A[i] * -1] = negativeCounters[A[i] * -1] + 1
            if (negativeCounters[A[i] * -1] > (length / 2)) return i
        }
    }
    return -1
}
This is failing the correctness tests although I have tried a variety of inputs with which it succeeds. The Codility evaluation doesn't list the test input so I can't find input that is breaking the algorithm.
Can anyone spot the problem?
Contrary to some comments and one answer, in fact using an array of counters in JavaScript works fine. This is because arrays in JavaScript are associative objects. One problem with your code is that counters whose key/index is greater than or equal to the initial counters array length are never initialised to zero.
Here is code that gets 100% on all of Codility's measures:
function solution(A) {
    const arr = []
    for (let i = 0; i < A.length; i++){
        if (!arr[A[i]])
            arr[A[i]] = 1
        else
            arr[A[i]]++
        if (arr[A[i]] > A.length/2)
            return i
    }
    return -1
}
You shouldn't use an array with that big range of values (the array for the positive values would need a length of 2,147,483,648). You should use a map or in js you can use an object. I hacked something together, not very elegant, but it passes all tests:
function solution(A) {
    const map = {}
    for (let i = 0; i < A.length; i++) {
        const key = '' + A[i]
        map[key] = key in map ? map[key] + 1 : 1
        if (map[key] > A.length / 2)
            return i
    }
    return -1
}

Randomly split up elements from a stream of data without knowing the total number of elements

Given a "split ratio", I am trying to randomly split up a dataset into two groups. The catch is that I do not know beforehand how many items the dataset contains. My library receives the data one by one from an input stream and is expected to return the data to two output streams. The resulting two datasets should ideally be exactly split into the given split ratio.
Illustration:
                             ┌─► stream A
input stream ──► LIBRARY ────┤
                             └─► stream B
For example, given a split ratio of 30/70, stream A would be expected to receive 30% of the elements from the input stream and stream B the remaining 70%. The order must remain.
My ideas so far:
Idea 1: "Roll the dice" for each element
The obvious approach: for each element the algorithm randomly decides if the element should go into stream A or B. The problem is that the resulting data sets might be far off from the expected split ratio. Given a split ratio of 50/50, the resulting data split might be something far off (could even be 100/0 for very small data sets). The goal is to keep the resulting split ratio as close as possible to the desired split ratio.
Idea 2: Use a cache and randomize the cached data
Another idea is to cache a fixed number of elements before passing them on. This would mean caching, say, 1000 elements, shuffling the data (or their corresponding indices to keep the order stable), splitting them up and passing the resulting data sets on. This should work very well, but I'm unsure if the randomization is really random for large data sets (I imagine there will be patterns when looking at the distribution).
Both algorithms are not optimal, so I hope you can help me.
Background
This is about a layer-based data science tool, where each layer receives data from the previous layer via a stream. This layer is expected to split the data (vectors) up into a training and test set before passing them on. The input data can range from just a few elements to a never ending stream of data (hence, the streams). The code is developed in JavaScript, but this question is more about the algorithm than the actual implementation.
You could adjust the probability as it shifts away from the desired rate.
Here's an example along with tests for various levels of adjusting the probability. As we increase the adjustment, we see the stream splitter deviates less from the ideal ratio, but it also means it's less random (knowing the previous values, you can predict the next values).
// rateStrictness = 0 will lead to "rolling the dice" for each invocation
// higher values of rateStrictness will lead to stronger "correcting" forces
function* splitter(desiredARate, rateStrictness = .5) {
    let aCount = 0, bCount = 0;
    while (true) {
        // guard against 0/0 (NaN) before the first element has been routed
        let total = aCount + bCount;
        let actualARate = total === 0 ? desiredARate : aCount / total;
        let aRate = desiredARate + (desiredARate - actualARate) * rateStrictness;
        if (Math.random() < aRate) {
            aCount++;
            yield 'a';
        } else {
            bCount++;
            yield 'b';
        }
    }
}
let test = (desiredARate, rateStrictness) => {
    let s = splitter(desiredARate, rateStrictness);
    let values = [...Array(1000)].map(() => s.next().value);
    let aCount = values.map((_, i) => values.reduce((count, v, j) => count + (v === 'a' && j <= i), 0));
    let aRate = aCount.map((c, i) => c / (i + 1));
    let deviation = aRate.map(a => a - desiredARate);
    let avgDeviation = deviation.reduce((sum, dev) => sum + dev, 0) / deviation.length;
    console.log(`inputs: desiredARate = ${desiredARate}; rateStrictness = ${rateStrictness}; average deviation = ${avgDeviation}`);
};
test(.5, 0);
test(.5, .25);
test(.5, .5);
test(.5, .75);
test(.5, 1);
test(.5, 10);
test(.5, 100);
How about rolling the dice twice: first decide whether the stream should be chosen randomly or whether the ratio should be taken into account. Then, in the first case, roll the dice; in the second case, take the ratio. Some pseudocode:
const toA =
    Math.random() > 0.5 // 1 -> totally random, 0 -> totally equally distributed
        ? Math.random() > 0.7
        : (numberA / (numberA + numberB) > 0.7);
That's just an idea I came up with, I haven't tried that ...
Here is a way that combines both of your ideas: it uses a cache. As long as the elements in the cache can compensate (so that, even if the stream ended now, we could still approach the target distribution), we just roll a dice for the current element; if not, we add it to the cache. When the input stream ends, we shuffle the elements in the cache and send them out, trying to approach the distribution. In terms of randomness, I am not sure there is any gain in this over simply forcing an element to go to x whenever the distribution strays off too much.
Beware that this approach does not preserve the order of the original input stream. A few other things could be added, such as a cache limit and a relaxed distribution error (it is 0 here). If you need to preserve order, that can be done by sending a cached value and pushing the current one onto the cache, instead of just sending the current one, whenever there are still elements in the cache.
let shuffle = (array) => array.sort(() => Math.random() - 0.5);

function* generator(numElements) {
    for (let i = 0; i < numElements; i++) yield i;
}

function* splitter(aGroupRate, generator) {
    let cache = [];
    let sentToA = 0;
    let sentToB = 0;
    let bGroupRate = 1 - aGroupRate;
    let maxCacheSize = 0;
    let sendValue = (value, group) => {
        sentToA += group == 0;
        sentToB += group == 1;
        return {value: value, group: group};
    }
    function* retRandomGroup(value, expected) {
        while (Math.random() > aGroupRate != expected) {
            if (cache.length) {
                yield sendValue(cache.pop(), !expected);
            } else {
                yield sendValue(value, !expected);
                return;
            }
        }
        yield sendValue(value, expected);
    }
    for (let value of generator) {
        if (sentToA + sentToB == 0) {
            yield sendValue(value, Math.random() > aGroupRate);
            continue;
        }
        let currentRateA = sentToA / (sentToA + sentToB);
        if (currentRateA <= aGroupRate) {
            // can we handle current value going to b group?
            if ((sentToA + cache.length) / (sentToB + sentToA + 1 + cache.length) >= aGroupRate) {
                for (const val of retRandomGroup(value, 1)) yield val;
                continue;
            }
        }
        if (currentRateA > aGroupRate) {
            // can we handle current value going to a group?
            if (sentToA / (sentToB + sentToA + 1 + cache.length) <= aGroupRate) {
                for (const val of retRandomGroup(value, 0)) yield val;
                continue;
            }
        }
        cache.push(value);
        maxCacheSize = Math.max(maxCacheSize, cache.length)
    }
    shuffle(cache);
    let totalElements = sentToA + sentToB + cache.length;
    while (sentToA < totalElements * aGroupRate) {
        yield {value: cache.pop(), group: 0}
        sentToA += 1;
    }
    while (cache.length) {
        yield {value: cache.pop(), group: 1}
    }
    yield {cache: maxCacheSize}
}
function test(numElements, aGroupRate) {
    let gen = generator(numElements);
    let sentToA = 0;
    let total = 0;
    let cacheSize = null;
    let split = splitter(aGroupRate, gen);
    for (let val of split) {
        if (val.cache != null) cacheSize = val.cache;
        else {
            sentToA += val.group == 0;
            total += 1
        }
    }
    console.log("required rate for A group", aGroupRate, "actual rate", sentToA / total, "cache size used", cacheSize);
}
test(3000, 0.3)
test(5000, 0.5)
test(7000, 0.7)
Let's say you have to maintain a given ratio R for data items going to stream A, e.g. R = 0.3 as per your example. Then, on receiving each data item, count the total number of items and the number of items passed on to stream A, and decide for each item whether it goes to A based on which choice keeps you closer to your target ratio R.
That should be about the best you can do for any size of the data set. As for randomness the resulting streams A and B should be about as random as your input stream.
Let's see how this plays out for the first couple of iterations:
Example: R = 0.3
N : total number of items processed so far (initially 0)
A : numbers passed on to stream A so far (initially 0)
First Iteration
N = 0 ; A = 0 ; R = 0.3
if next item goes to stream A then
n = N + 1
a = A + 1
r = a / n = 1
else if next item goes to stream B
n = N + 1
a = A
r = a / n = 0
So first item goes to stream B since 0 is closer to 0.3
Second Iteration
N = 1 ; A = 0 ; R = 0.3
if next item goes to stream A then
n = N + 1
a = A + 1
r = a / n = 0.5
else if next item goes to stream B
n = N + 1
a = A
r = a / n = 0
So second item goes to stream A since 0.5 is closer to 0.3
Third Iteration
N = 2 ; A = 1 ; R = 0.3
if next item goes to stream A then
n = N + 1
a = A + 1
r = a / n = 0.66
else if next item goes to stream B
n = N + 1
a = A
r = a / n = 0.33
So third item goes to stream B since 0.33 is closer to 0.3
Fourth Iteration
N = 3 ; A = 1 ; R = 0.3
if next item goes to stream A then
n = N + 1
a = A + 1
r = a / n = 0.5
else if next item goes to stream B
n = N + 1
a = A
r = a / n = 0.25
So fourth item goes to stream B since 0.25 is closer to 0.3
So this here would be the pseudo code for deciding each data item:
if abs(((A + 1) / (N + 1)) - R) < abs((A / (N + 1)) - R) then
    put the next data item on stream A
    A = A + 1
    N = N + 1
else
    put the next data item on B
    N = N + 1
As discussed in the comments below, that is not random in the sense intended by the OP. So once we know the correct target stream for the next item we flip a coin to decide if we actually put it there, or introduce an error.
if abs(((A + 1) / (N + 1)) - R) < abs((A / (N + 1)) - R) then
    target_stream = A
else
    target_stream = B

if random() < 0.5 then
    if target_stream == A then
        target_stream = B
    else
        target_stream = A

if target_stream == A then
    put the next data item on stream A
    A = A + 1
    N = N + 1
else
    put the next data item on B
    N = N + 1
Now that could lead to an arbitrarily large error overall. So we have to set an error limit L and check how far off the resulting ratio is from the target R when errors are about to be introduced:
if abs(((A + 1) / (N + 1)) - R) < abs((A / (N + 1)) - R) then
    target_stream = A
else
    target_stream = B

if random() < 0.5 then
    if target_stream == A then
        if abs((A / (N + 1)) - R) < L then
            target_stream = B
    else
        if abs(((A + 1) / (N + 1)) - R) < L then
            target_stream = A

if target_stream == A then
    put the next data item on stream A
    A = A + 1
    N = N + 1
else
    put the next data item on B
    N = N + 1
So here we have it: Processing data items one by one we know the correct stream to put the next item on, then we introduce random local errors and we are able to limit the overall error with L.
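Translating that last pseudocode into JavaScript might look roughly like this (a sketch; the function name and the default error limit are just illustrative):
// R: target ratio for stream A, L: allowed deviation from R when injecting random errors
function makeSplitter(R, L = 0.05) {
    let A = 0; // items sent to stream A so far
    let N = 0; // total items seen so far
    return function route() {
        // which stream keeps the running ratio closest to R?
        let target = Math.abs((A + 1) / (N + 1) - R) < Math.abs(A / (N + 1) - R) ? 'A' : 'B';
        // flip a coin to possibly introduce a local error,
        // but only if the overall ratio stays within L of the target
        if (Math.random() < 0.5) {
            if (target === 'A' && Math.abs(A / (N + 1) - R) < L) target = 'B';
            else if (target === 'B' && Math.abs((A + 1) / (N + 1) - R) < L) target = 'A';
        }
        N += 1;
        if (target === 'A') A += 1;
        return target;
    };
}

// usage sketch: const route = makeSplitter(0.3); route() returns 'A' or 'B' for each incoming item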
Looking at the two numbers you wrote (chunk size of 1000, probability split of 0.7) you might not have any problem with the simple approach of just rolling the dice for every element.
Talking about probability and high numbers, you have the law of large numbers.
This means that you do have a risk of splitting the streams very unevenly into 0 and 1000 elements, but in practice this is veeery unlikely to happen. As you are talking about testing and training sets, I also do not expect your probability split to be far off of 0.7. And in case you are allowed to cache, you can still use a cache for the first 100 elements, so that you are sure to have enough data for the law of large numbers to kick in.
This is the binomial distribution for n=1000, p=.7
In case you want to reproduce the image with other parameters
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 1000, 0.7
index = np.arange(binom.ppf(0.01, n, p), binom.ppf(0.99, n, p))
pd.Series(index=index, data=binom.pmf(index, n, p)).plot()
plt.show()

Find sequential smallest sums of the multiples of three different numbers, javascript

On this post I found an algorithm to determine the luminance of an RGB color:
Luminance (standard for certain colour spaces): (0.2126*R + 0.7152*G + 0.0722*B)
I want to use this equation, starting at rgb(0,0,0), to generate all RGB colors in order from lowest to highest luminance and then draw them to a 4096x4096 canvas.
My issue is that with 16.7 million different combinations I can't generate them all and then sort them without either crashing my browser or taking multiple days to complete the render. So I want to find a way to find the multiples of each number that will summate to the next lowest number.
For instance, starting at an rgb of 0,0,0, the luminance would be 0 (0.2126*0 + 0.7152*0 + 0.0722*0 = 0), the next least luminous rgb value would be 0,0,1 because 0.2126*0 + 0.7152*0 + 0.0722*1 = .0722, and there's no set of multiples that would summate to a smaller number.
The first 19 sequential luminance values would be as follows (I may have missed one or two, because I calculated them manually, but hopefully it helps to make the point):
RGB => Luminance
0,0,0 => 0
0,0,1 => .0722
0,0,2 => .1444
1,0,0 => .2126
0,0,3 => .2166
1,0,1 => .2848
0,0,4 => .2888
1,0,2 => .357
0,0,5 => .361
2,0,0 => .4252
1,0,3 => .4292
0,0,6 => .4332
2,0,1 => .4974
1,0,4 => .5014
0,0,7 => .5054
2,0,2 => .5696
1,0,5 => .5736
0,0,8 => .5776
3,0,0 => .6378
I can't seem to find any pattern so I was hoping that maybe there was an equation or a coding trick out there that would allow me to find the smallest sum, higher than the previous sum, of the multiples of three numbers, without brute forcing it and checking every possible value.
EDIT: I did some extra research and it looks like the solution may lie in using linear diophantine equations. If I take each decimal and multiply by 10,000, I get 2126, 7152, & 722. Then, counting 1-by-1 up to 2,550,000 (2126*255 + 7152*255 + 722*255), I can check each number to see if it's a solution to the equation 2126r + 7152g + 722b = n, where n is the current number counted to, and r, g, & b are unknowns. If I could do this, I could figure out all possible rgb values at the next sequential luminance value, without even having to double over any values for duplicate luminance values, and I'd only have to do 2.55 million calculations instead of 16.77+ million (one for each color). If anyone has any idea how to code this equation, or if anyone has any better solution, I'd be extremely grateful. Thanks!
Here's an algorithm (have forgotten its name) for your problem:
The algorithm can list all color tuples {R,G,B} sorted in some order. In your case it's by luminance ascending: color1 < color2 <==> f(color1) < f(color2), where f(color) = 0.2126*R + 0.7152*G + 0.0722*B
Initialize: arr = [{r:0, g:0, b:0}] (the minimum color)
Repeat:
Select min(iR): arr[iR] = {rR < 255, gR, bR}, and cR = {rR + 1, gR, bR} > arr[i] for every i. (Select the first color in arr such that if we add 1 to its r component, we get a new color that is greater than every color currently in arr)
Similar for iG and iB => also get cG = {rG, gG + 1, bG} and cB = {rB, gB, bB + 1}
Among cR, cG and cB select the minimum color c
Append c to the array arr
The algorithm stops when no such iR, iG, or iB could be found.
Notes:
arr is always in sorted (ascending) order, because every time a new color is appended to arr, it is greater than every element currently in arr.
Because arr is in ascending order, we only have to compare cR/cG/cB with the last element of arr to check whether the new color is greater than every element of arr.
iR, iG and iB only ever increase throughout the algorithm.
The complexity is O(N), with N the number of colors (2^24 ~ 16M). With a heap-based algorithm the complexity would be about O(N log N).
Here is my implementation (Tested in nodejs 6)
// use integer to avoid floating point inaccuracy
const lumixOf = {r: 2126, g: 7152, b: 722};
const maxValue = 256;
const components = ['r', 'g', 'b'];

class Color {
    constructor(r, g, b, lum) {
        this.r = r;
        this.g = g;
        this.b = b;
        this.lum = lum;
    }
    add(component) {
        const ans = new Color(this.r, this.g, this.b, this.lum);
        if (++ans[component] >= maxValue) return null; // exceed 255
        ans.lum += lumixOf[component];
        return ans;
    }
    greater(color2) {
        // return this.lum > color2.lum;
        if (this.lum !== color2.lum) return this.lum > color2.lum;
        if (this.r !== color2.r) return this.r > color2.r;
        if (this.g !== color2.g) return this.g > color2.g;
        return this.b > color2.b;
    }
}
let a = [new Color(0, 0, 0, 0)]; // R, G, B, lumix
let index = {r: 0, g: 0, b: 0};
console.log('#0:', a[0]);

// Test: print the first 100 colors
for (let count = 1; count < 100; ++count) {
    let nextColor = null;
    const len = a.length;
    const currentColor = a[len - 1];
    components.forEach(component => {
        let cIndex = index[component];
        for (; cIndex < len; ++cIndex) {
            const newColor = a[cIndex].add(component);
            if (!newColor || !newColor.greater(currentColor)) continue;
            // find the minimum next color
            if (nextColor == null || nextColor.greater(newColor)) {
                nextColor = newColor;
            }
            break;
        }
        index[component] = cIndex;
    });
    if (!nextColor) break; // done. No more color
    a.push(nextColor);
    console.log('#' + count + ':', nextColor);
}
console.log(a.length);
This implementation lists all 2^24 = 16777216 colors (once you remove the count < 100 condition in the main loop, though you wouldn't want to print out so many lines). If some colors have the same luminance value, they are sorted by their R value, then G value, then B value. If you just need one color for each luminance value, uncomment the first line in the greater() function - then you get 1207615 colors with distinct luminance values.
One fact you can make use of is that each triplet in the sequence will have an R, G or B value only one greater than a triplet that has already been output.
So, you could maintain a BinaryHeap (sorted by luminance) containing all the triplets that are 1 greater in R, G or B than a triplet that has already been output, and do this in a loop:
Remove the smallest element (r, g, b) from the heap
Output it
Add (r+1, g, b), (r, g+1, b) and (r, g, b+1) to the heap, but only if they are valid triplets (all values less than or equal to 255), and only if they are not already in the heap. A triplet will not already be in the heap if the alternative triplets that it could have been generated from (1 less in either r, g, or b, within allowed bounds) have a higher luminance than (r, g, b).
For example only add (r+1, g, b) if (r+1, g-1, b) has a higher luminance than (r, g, b) or (r+1, g-1, b) is invalid. Since the factors for computing luminance based on r, g, b are fixed, (r+1, g-1, b) will always have a lower luminance, and you should only add (r+1, g, b) if (r+1, g-1, b) is invalid, which is when g is 0.
In pseudo-code the rules are like this:
function addTriplets(r, g, b)
{
    if (g < 255)
        pushTripletToHeap(r, g + 1, b);
    if ((g == 0) && (r < 255))
        pushTripletToHeap(r + 1, g, b);
    if ((g == 0) && (r == 0) && (b < 255))
        pushTripletToHeap(r, g, b + 1);
}
Push the (0, 0, 0) triplet onto the heap before starting the loop and stop the loop when the heap is empty.
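A rough JavaScript sketch of that loop, using a tiny binary min-heap keyed on luminance (the heap code here is illustrative; any priority queue ordered by luminance would do):
const lum = ([r, g, b]) => 0.2126 * r + 0.7152 * g + 0.0722 * b;

// minimal binary min-heap ordered by luminance
class MinHeap {
    constructor() { this.a = []; }
    get size() { return this.a.length; }
    push(t) {
        const a = this.a;
        a.push(t);
        for (let i = a.length - 1; i > 0;) {
            const p = (i - 1) >> 1;
            if (lum(a[p]) <= lum(a[i])) break;
            [a[p], a[i]] = [a[i], a[p]];
            i = p;
        }
    }
    pop() {
        const a = this.a, top = a[0], last = a.pop();
        if (a.length) {
            a[0] = last;
            for (let i = 0;;) {
                let s = i, l = 2 * i + 1, r = l + 1;
                if (l < a.length && lum(a[l]) < lum(a[s])) s = l;
                if (r < a.length && lum(a[r]) < lum(a[s])) s = r;
                if (s === i) break;
                [a[s], a[i]] = [a[i], a[s]];
                i = s;
            }
        }
        return top;
    }
}

// the duplicate-avoidance rules from the pseudocode above
function addTriplets(heap, [r, g, b]) {
    if (g < 255) heap.push([r, g + 1, b]);
    if (g === 0 && r < 255) heap.push([r + 1, g, b]);
    if (g === 0 && r === 0 && b < 255) heap.push([r, g, b + 1]);
}

const heap = new MinHeap();
heap.push([0, 0, 0]);
let emitted = 0;
while (heap.size && emitted < 20) { // cap at 20 for the demo; drop the cap to enumerate all colors
    const t = heap.pop();
    console.log(t, lum(t).toFixed(4));
    addTriplets(heap, t);
    emitted++;
}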
Sorry, but I have to say that you are doing some wasteful work.
8 bit quantization of RGB would yield 16.7M colors at 256 luminance levels (black and white included). However, you haven't got enough pixels to display them all on a 4K monitor, which has 3840 x 2160 = 8294400 pixels on the 4K TV standard or 4096 x 2160 = 8847360 on the 4K movie standard. Besides, what's the meaning of a 1 pixel color sample to the eye, especially on a 4K display?
I would recommend you to use 7 bit quantization instead of 8 bits. This will give you 2^21 => 2097152 color samples, and they will map to a single pixel on an HD monitor/TV and 2x2 pixels on a 4K monitor/TV. Beautiful.
The code would be as follows:
"use strict";
var allColors = Array(Math.pow(2,21)),      // All 2^21 colors
    cgbl = Array(128).fill().map(e => []);  // Colors grouped by luminance

for (var i = 0, len = allColors.length; i < len; i++) allColors[i] = [i>>14, (i&16256)>>7, i&127];
allColors.reduce((g,c) => (g[Math.round(c[0]*0.2126 + c[1]*0.7152 + c[2]*0.0722)].push(c),g), cgbl);
cgbl.forEach((y,i) => console.log(y.length,"Colors at luminance level:",i));
However, remember that your RGB values are now in 7 bit quantization. Since we have already grouped them into 128 luminance levels, I would also advise you to map each RGB value in the luminance groups (sub arrays) back into 8 bits by shifting the values left by 1 bit (r << 1; g << 1; b << 1;) before displaying them. With the .map() functor it's a trivial job.
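That remapping back to 8 bits could be as simple as (sketch):
// widen each 7-bit channel back to 8 bits before display
var cgbl8 = cgbl.map(level => level.map(([r, g, b]) => [r << 1, g << 1, b << 1]));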
Because my original answer is already long, I'm making this answer to clarify the algorithm, as OP has requested
Let's consider a similar problem (but easier to reason about):
Let A the set of numbers, in ascending order, which have no other prime factor than 2, 3 and 5
(Ai = 2^x * 3^y * 5^z)
Find the n-th number of A
Sure A1 = 1 = 2^0 * 3^0 * 5^0
Let's assume at some step, we have calculated A1..An and we need to find A[n+1]
If A[n+1] is divisible by 2, then A[n+1] = A[i2]*2 with 1 <= i2 <= n
If A[n+1] is divisible by 3, then A[n+1] = A[i3]*3 with 1 <= i3 <= n
If A[n+1] is divisible by 5, then A[n+1] = A[i5]*5 with 1 <= i5 <= n
(and obviously A[n+1] is divisible by at least one of those)
Proof: A[n+1] = 2^x * 3^y * 5^z. If A[n+1] is divisible by 2 then x > 0, so B = A[n+1]/2 = 2^(x-1) * 3^y * 5^z must be in A. And because B < A[n+1], it must come before A[n+1] in A, so B = A[i2], with 1 <= i2 <= n.
So to find A[n+1], we can:
Find the minimum i2 that A[i2]*2 > A[n]
Similar for i3 and i5
==> A[n+1] = min(A[i2]*2, A[i3]*3, A[i5]*5)
Having A1 = 1, run these steps (n - 1) times and we find the n-th number of A
Now, if at every iteration finding A[n+1] we used 3 for loops from 1 to n to calculate i2, i3 and i5, the time complexity would be O(N^2). But you can see that i2, i3 and i5 never decrease from one iteration to the next. So we can save those i2, i3 and i5 values, and at each iteration we just need to:
while (A[i2]*2 <= A[n]) ++i2;
while (A[i3]*3 <= A[n]) ++i3;
while (A[i5]*5 <= A[n]) ++i5;
Now the time complexity becomes O(N): while the while loops are still nested in the main 1 -> n loop, there are only 4 variables increasing from 1 -> n, so they can be considered 4 independent loops. You can verify the O(N) property with a clock, by measuring the run time for different values of N.
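To make the pointer-advancing idea concrete, here is a small JavaScript sketch of this 2-3-5 version (illustrative only; the color version below replaces the three multipliers with the R, G and B increments and the integer comparison with a color comparison):
// n-th smallest number of the form 2^x * 3^y * 5^z,
// using three indices into the already-computed sequence that only ever increase
function nthHamming(n) {
    const A = [1];
    let i2 = 0, i3 = 0, i5 = 0;
    while (A.length < n) {
        const last = A[A.length - 1];
        while (A[i2] * 2 <= last) ++i2;
        while (A[i3] * 3 <= last) ++i3;
        while (A[i5] * 5 <= last) ++i5;
        A.push(Math.min(A[i2] * 2, A[i3] * 3, A[i5] * 5));
    }
    return A[n - 1];
}

console.log(nthHamming(10)); // 12 (the sequence starts 1, 2, 3, 4, 5, 6, 8, 9, 10, 12)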
Applying this algorithm to your problem:
2, 3 and 5 becomes R, G and B
Integer comparison becomes a color comparison function you define
If you need to list all 2^24 colors, define the comparison function so that no 2 different colors are "equal" (if C1 and C2 are 2 different colors then either C1 < C2 or C2 < C1) - like I did in the original answer

How to implement carousel with an array

I have an array of items representing a virtual carousel.
const carousel = ['a','b','c','d','e'];
let currentIndex = 0;
function move (amount) {
    const l = carousel.length; // in case carousel size changes
    // need to update currentIndex
    return carousel[currentIndex];
}
What is a clean or clever way to handle moving left when currentIndex == 0 and moving right when currentIndex == length-1?
I have thought about this before and have never come with anything very clever or concise.
short answer
Implement a circular array via modular arithmetic. Given a distance to move, to calculate the appropriate index:
// put distance in the range {-len+1, -len+2, ..., -1, 0, 1, ..., len-2, len-1}
distance = distance % len
// add an extra len to ensure `distance+len` is non-negative
new_index = (index + distance + len) % len
long answer
Use modular arithmetic much like how you'd read a typical analog clock. The premise is to add two integers, divide by an integer, and keep the remainder. For example, 13 = 3 (mod 10) because 13 is 1*10 + 3 and 3 is 0*10 + 3.
But why did we choose to arrange 3 and 13 as we did? To answer that, we consider the Euclidean division algorithm (EDA). It says for two integers a and b there exists unique integers q and r such that
a = b*q + r
with 0 ≤ r < b. This is more powerful than you'd think: it allows us to "work modulo n."
That is, we can say a = b (mod n) iff there are unique integers q1, r1, q2, and r2 such that
a = n * q1 + r1, 0 ≤ r1 < n
b = n * q2 + r2, 0 ≤ r2 < n
and r1 equals r2. We call r1 and r2 the "remainders."
To go back to the previous example, we now know why 13 = 3 (mod 10). The EDA says 13 = 1*10 + 3 and that 1 and 3 are the only q and r satisfying the necessary constraints; by similar logic, 3 = 0*10 + 3. Since the remainders are equal, we say that 13 and 3 are equal when "working mod 10."
Fortunately, JavaScript implements a modulo operator natively. Unfortunately, we need to watch out for a quirk, i.e., the modulo operator keeps the sign of its operands. This gives you some results like -6 % 5 == -1 and -20 % 7 == -6. While perfectly valid mathematical statements (check why), this doesn't help us when it comes to array indices.
Lemma 1: a + n = a (mod n)
Lemma 2: -1 = n-1 (mod n)
Lemma 3: -a = n-a (mod n)
The way to overcome this is to "trick" JavaScript into using the correct sign. Suppose we have an array with length len and current index index; we want to move the index by a distance d:
// put `d` within the range {-len+1, ..., -1, 0, 1, ..., len-1}
d = d % len
// add an extra len to ensure `d+len` is non-negative
new_index = (index + d + len) % len
We accomplish this by first putting d within the range {-len+1, ..., -1, 0, 1, ..., len-1}. Next, we add an extra len to make sure the amount we're adding to the index is positive, thereby ensuring the left operand of the final % operation is non-negative and the result keeps a non-negative sign. We know this works because (-a+b) + a = b (mod a). Then we just set the new index to index + d + len (mod len).
More detailed implementation:
class Carousel {
    // assumes `arr` is non-empty
    constructor (arr, index = 0) {
        this.arr = arr
        this.index = index % arr.length
    }
    // `distance` is an integer (...-2, -1, 0, 1, 2, ...)
    move (distance) {
        let len = this.arr.length
        distance = distance % len
        let new_index = (this.index + distance + len) % len
        this.index = new_index
        return this.arr[this.index]
    }
}
// usage:
let c = new Carousel(['a','b','c','d','e'], 1) // position pointer set at 'b'
c.move(-1) // returns 'a' as (1 + -1 + 5) % 5 == 5 % 5 == 0
c.move(-1) // returns 'e' as (0 + -1 + 5) % 5 == 4 % 5 == 4
c.move(21) // returns 'a' as (4 + 21 + 5) % 5 == 30 % 5 == 0
I had implemented an Array.prototype.rotate() a while back. It might come very handy for this job. Here is the code;
Array.prototype.rotate = function(n) {
    var len = this.length;
    return !(n % len) ? this.slice()
                      : this.map((e,i,a) => a[(i + (len + n % len)) % len]);
};

var a = [1,2,3,4,5,6,7,8,9],
    b = a.rotate(10);
console.log(JSON.stringify(b));
b = a.rotate(-10);
console.log(JSON.stringify(b));
currentIndex = currentIndex + change;
if (currentIndex >= l) currentIndex = 0;
if (currentIndex < 0) currentIndex = l - 1;
This will modify the index, check whether it has moved past the possible values, and wrap it around to the other 'side' of the carousel.
