Find sequential smallest sums of the multiples of three different numbers, javascript

On this post I found an algorithm to determine the luminance of an RGB color:
Luminance (standard for certain colour spaces): (0.2126*R + 0.7152*G + 0.0722*B)
I want to use this equation, starting at rgb(0,0,0), to generate all RGB colors in order from lowest to highest luminance and then draw them to a 4096x4096 canvas.
My issue is that with 16.7 million different combinations I can't generate them all and then sort them without either crashing my browser or taking multiple days to complete the render. So I want a way to find the multiples of each number that will sum to the next lowest value.
For instance, starting at an rgb of 0,0,0, the luminance would be 0 (0.2126*0 + 0.7152*0 + 0.0722*0 = 0). The next least luminous rgb value would be 0,0,1 because 0.2126*0 + 0.7152*0 + 0.0722*1 = .0722, and there's no set of multiples that would sum to a smaller number.
The first 19 sequential luminance values would be as follows (I may have missed one or two, because I calculated them manually, but hopefully it helps to make the point):
RGB => Luminance
0,0,0 => 0
0,0,1 => .0722
0,0,2 => .1444
1,0,0 => .2126
0,0,3 => .2166
1,0,1 => .2848
0,0,4 => .2888
1,0,2 => .357
0,0,5 => .361
2,0,0 => .4252
1,0,3 => .4292
0,0,6 => .4332
2,0,1 => .4974
1,0,4 => .5014
0,0,7 => .5054
2,0,2 => .5696
1,0,5 => .5736
0,0,8 => .5776
3,0,0 => .6378
I can't seem to find any pattern so I was hoping that maybe there was an equation or a coding trick out there that would allow me to find the smallest sum, higher than the previous sum, of the multiples of three numbers, without brute forcing it and checking every possible value.
EDIT: I did some extra research and it looks like the solution may lie in using linear Diophantine equations. If I take each coefficient and multiply it by 10,000, I get 2126, 7152, & 722. Then, counting 1-by-1 up to 2,550,000 (2126*255 + 7152*255 + 722*255), I can check each number to see if it's a solution to the equation 2126r + 7152g + 722b = n, where n is the current number counted to, and r, g, & b are unknowns. If I could do this, I could figure out all possible rgb values at the next sequential luminance value, without even having to double over any values for duplicate luminance values, and I'd only have to do 2.55 million calculations instead of 16.77+ million (one for each color). If anyone has any idea how to code this equation, or if anyone has any better solution, I'd be extremely grateful. Thanks!

Here's an algorithm (have forgotten its name) for your problem:
The algorithm can list all color tuples {R,G,B} sorted in some order. In your case it's by luminance ascending: color1 < color2 <==> f(color1) < f(color2), where f(color) = 0.2126*R + 0.7152*G + 0.0722*B
Initialize: arr = [{r:0, g:0, b:0}] (the minimum color)
Repeat:
Select min(iR): arr[iR] = {rR < 255, gR, bR}, and cR = {rR + 1, gR, bR} > arr[i] for every i. (Select the first color in arr such that if we add 1 to its r component, we get a new color that is greater than every color currently in arr)
Similar for iG and iB => also get cG = {rG, gG + 1, bG} and cB = {rB, gB, bB + 1}
Among cR, cG and cB select the minimum color c
Append c to the array arr
The algorithm stops when no such iR, iG, or iB could be found.
Notes:
arr is always in sorted (ascending) order, because every time a new color is appended to arr, it is greater than every element currently in arr.
Because arr is in ascending order, we only have to compare cR/cG/cB with the last element of arr to check whether it's greater than every element of arr
iR, iG and iB only increase throughout the algorithm
The complexity is O(N) with N the number of colors (2^24) ~ 16M. With a heap-based algorithm the complexity is about O(N log N).
Here is my implementation (Tested in nodejs 6)
// use integer to avoid floating point inaccuracy
const lumixOf = {r: 2126, g: 7152, b: 722};
const maxValue = 256;
const components = ['r', 'g', 'b'];
class Color {
  constructor(r, g, b, lum) {
    this.r = r;
    this.g = g;
    this.b = b;
    this.lum = lum;
  }
  add(component) {
    const ans = new Color(this.r, this.g, this.b, this.lum);
    if (++ans[component] >= maxValue) return null; // exceed 255
    ans.lum += lumixOf[component];
    return ans;
  }
  greater(color2) {
    // return this.lum > color2.lum;
    if (this.lum !== color2.lum) return this.lum > color2.lum;
    if (this.r !== color2.r) return this.r > color2.r;
    if (this.g !== color2.g) return this.g > color2.g;
    return this.b > color2.b;
  }
}

let a = [new Color(0, 0, 0, 0)]; // R, G, B, lumix
let index = {r: 0, g: 0, b: 0};
console.log('#0:', a[0]);

// Test: print the first 100 colors
for (let count = 1; count < 100; ++count) {
  let nextColor = null;
  const len = a.length;
  const currentColor = a[len - 1];
  components.forEach(component => {
    let cIndex = index[component];
    for (; cIndex < len; ++cIndex) {
      const newColor = a[cIndex].add(component);
      if (!newColor || !newColor.greater(currentColor)) continue;
      // find the minimum next color
      if (nextColor == null || nextColor.greater(newColor)) {
        nextColor = newColor;
      }
      break;
    }
    index[component] = cIndex;
  });
  if (!nextColor) break; // done. No more colors
  a.push(nextColor);
  console.log('#' + count + ':', nextColor);
}
console.log(a.length);
This implementation lists all 2^24 = 16777216 colors (once you remove the count < 100 condition in the main loop, though you wouldn't want to print out that many lines). If some colors have the same luminance value, they are then sorted by their R value, then G value, then B value. If you just need one color per luminance value, uncomment the first line in the greater() function - then you get 1207615 colors with distinct luminance values.
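As a rough sketch of how the sorted array could then be drawn (this part is my own assumption, not from the answer above: it presumes a <canvas> element of exactly 4096x4096, which holds exactly 2^24 pixels, and that `a` has been filled with all colors):
// Hedged sketch: write the sorted colors straight into a 4096x4096 ImageData
// (4096 * 4096 = 16777216 = 2^24, one pixel per color)
const canvas = document.querySelector('canvas'); // assumes <canvas width="4096" height="4096">
const ctx = canvas.getContext('2d');
const img = ctx.createImageData(4096, 4096);
let offset = 0;
for (const {r, g, b} of a) {   // `a` is the sorted array built above
  img.data[offset++] = r;
  img.data[offset++] = g;
  img.data[offset++] = b;
  img.data[offset++] = 255;    // opaque alpha
}
ctx.putImageData(img, 0, 0);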

One fact you can make use of is that each triplet in the sequence will have an R, G or B value only one greater than a triplet that has already been output.
So, you could maintain a BinaryHeap (sorted by luminance) containing all the triplets that are 1 greater in R, G or B than a triplet that has already been output, and do this in a loop:
Remove the smallest element (r, g, b) from the heap
Output it
Add (r+1, g, b), (r, g+1, b) and (r, g, b+1) to the heap, but only if they are valid triplets (all values less than or equal to 255), and only if they are not already in the heap. A triplet will not already be in the heap if the alternative triplets that it could have been generated from (1 less in either r, g, or b, within allowed bounds) have a higher luminance than (r, g, b).
For example only add (r+1, g, b) if (r+1, g-1, b) has a higher luminance than (r, g, b) or (r+1, g-1, b) is invalid. Since the factors for computing luminance based on r, g, b are fixed, (r+1, g-1, b) will always have a lower luminance, and you should only add (r+1, g, b) if (r+1, g-1, b) is invalid, which is when g is 0.
In pseudo-code the rules are like this:
function addTriplets(r, g, b)
{
    if (g < 255)
        pushTripletToHeap(r, g + 1, b);
    if ((g == 0) && (r < 255))
        pushTripletToHeap(r + 1, g, b);
    if ((g == 0) && (r == 0) && (b < 255))
        pushTripletToHeap(r, g, b + 1);
}
Push the (0, 0, 0) triplet onto the heap before starting the loop and stop the loop when the heap is empty.
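For illustration, here is one possible way this could look in JavaScript. The MinHeap class below is a small hand-rolled helper (not a built-in), luminance is kept as an integer (coefficients scaled by 10000) to avoid floating-point comparisons, and the push rules are exactly the ones from the pseudo-code above; treat it as a sketch, not the answerer's reference implementation:
// Minimal binary min-heap ordered by the integer luminance field
class MinHeap {
  constructor() { this.items = []; }
  push(item) {
    const a = this.items;
    a.push(item);
    let i = a.length - 1;
    while (i > 0) {                       // bubble the new item up
      const p = (i - 1) >> 1;
      if (a[p].lum <= a[i].lum) break;
      [a[p], a[i]] = [a[i], a[p]];
      i = p;
    }
  }
  pop() {
    const a = this.items;
    const top = a[0];
    const last = a.pop();
    if (a.length > 0) {                   // sift the last item down from the root
      a[0] = last;
      let i = 0;
      while (true) {
        const l = 2 * i + 1, r = 2 * i + 2;
        let m = i;
        if (l < a.length && a[l].lum < a[m].lum) m = l;
        if (r < a.length && a[r].lum < a[m].lum) m = r;
        if (m === i) break;
        [a[m], a[i]] = [a[i], a[m]];
        i = m;
      }
    }
    return top;
  }
  get size() { return this.items.length; }
}

const lum = (r, g, b) => 2126 * r + 7152 * g + 722 * b; // integer luminance (x10000)

function* colorsByLuminance() {
  const heap = new MinHeap();
  heap.push({ r: 0, g: 0, b: 0, lum: 0 });
  while (heap.size) {
    const { r, g, b } = heap.pop();
    yield [r, g, b];
    // The de-duplication rules from the pseudo-code above:
    if (g < 255) heap.push({ r, g: g + 1, b, lum: lum(r, g + 1, b) });
    if (g === 0 && r < 255) heap.push({ r: r + 1, g, b, lum: lum(r + 1, g, b) });
    if (g === 0 && r === 0 && b < 255) heap.push({ r, g, b: b + 1, lum: lum(r, g, b + 1) });
  }
}

// Sanity check: the first few colors match the table in the question
const it = colorsByLuminance();
for (let i = 0; i < 10; i++) console.log(it.next().value);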

Sorry, but I have to say that you are doing some wasteful work.
8-bit quantization of RGB yields 16.7M colors at 256 luminance levels (black and white included). However, you haven't got enough pixels to display them all on a 4K monitor, which has 3840 x 2160 = 8294400 pixels on the 4K TV standard or 4096 x 2160 = 8847360 on the 4K movie standard. Besides, what's the meaning of a 1-pixel color sample to the eye, especially on a 4K display?
I would recommend using 7-bit quantization instead of 8 bits. This gives you 2^21 = 2097152 color samples, which map to a single pixel on an HD monitor/TV and 2x2 pixels on a 4K monitor/TV. Beautiful.
The code would be as follows;
"use strict";
var allColors = Array(Math.pow(2, 21)),      // all 2^21 colors
    cgbl = Array(128).fill().map(e => []);   // colors grouped by luminance

for (var i = 0, len = allColors.length; i < len; i++)
  allColors[i] = [i >> 14, (i & 16256) >> 7, i & 127];

allColors.reduce((g, c) => (g[Math.round(c[0]*0.2126 + c[1]*0.7152 + c[2]*0.0722)].push(c), g), cgbl);
cgbl.forEach((y, i) => console.log(y.length, "colors at luminance level:", i));
However, remember that your RGB values are now 7-bit. Since we have already grouped them into 128 luminance levels, I would also advise mapping each RGB value in the luminance groups (sub-arrays) back to 8 bits by shifting the values left by 1 bit (r << 1; g << 1; b << 1;) before displaying them. Using .map(), it's a trivial job.
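For example, that shift back to 8 bits could look something like this (a one-line sketch of my own, assuming the cgbl structure built above):
// expand each 7-bit channel back to 8 bits before drawing
const cgbl8bit = cgbl.map(group => group.map(([r, g, b]) => [r << 1, g << 1, b << 1]));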

Because my original answer is already long, I'm making this answer to clarify the algorithm, as the OP has requested.
Let's consider a similar problem (but easier to reason about):
Let A be the set of numbers, in ascending order, which have no prime factors other than 2, 3 and 5
(Ai = 2^x * 3^y * 5^z)
Find the n-th number of A
Sure A1 = 1 = 2^0 * 3^0 * 5^0
Let's assume at some step, we have calculated A1..An and we need to find A[n+1]
If A[n+1] is divisible by 2, then A[n+1] = A[i2]*2 with 1 <= i2 <= n
If A[n+1] is divisible by 3, then A[n+1] = A[i3]*3 with 1 <= i3 <= n
If A[n+1] is divisible by 5, then A[n+1] = A[i5]*5 with 1 <= i5 <= n
(and obviously A[n+1] is divisible by at least one of those)
Proof: A[n+1] = 2^x * 3^y * 5^z. If A[n+1] is divisible by 2 then x > 0, so B = A[n+1]/2 = 2^(x-1) * 3^y * 5^z must be in A. And because B < A[n+1], it must come before A[n+1] in A, so B = A[i2], with 1 <= i2 <= n.
So to find A[n+1], we can:
Find the minimum i2 that A[i2]*2 > A[n]
Similar for i3 and i5
==> A[n+1] = min(A[i2]*2, A[i3]*3, A[i5]*5)
Having A1 = 1, run these steps (n - 1) times and we find the n-th number of A
Now, if at every iteration for finding A[n+1] we used 3 for loops from 1 to n to calculate i2, i3 and i5, the time complexity would be O(N^2). But you can see that i2, i3 and i5 never decrease from one iteration to the next. So we can save those i2, i3 and i5 values, and at each iteration we just need to:
while (A[i2]*2 <= A[n]) ++i2;
while (A[i3]*3 <= A[n]) ++i3;
while (A[i5]*5 <= A[n]) ++i5;
Now the time complexity becomes O(N): while the while loops are still nested in the main 1->n loop, there are only 4 variables increasing from 1 to n, so they can be considered 4 independent loops. You can verify the O(N) property by timing the run for different values of N.
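As an illustration (my own sketch, not part of the original answer), the 2-3-5 version with the three saved indices looks like this in JavaScript:
// Returns the first n elements of A (numbers whose only prime factors are 2, 3, 5)
function uglyNumbers(n) {
  const A = [1];
  let i2 = 0, i3 = 0, i5 = 0;   // saved indices, only ever advanced
  while (A.length < n) {
    const last = A[A.length - 1];
    while (A[i2] * 2 <= last) ++i2;
    while (A[i3] * 3 <= last) ++i3;
    while (A[i5] * 5 <= last) ++i5;
    A.push(Math.min(A[i2] * 2, A[i3] * 3, A[i5] * 5));
  }
  return A;
}

console.log(uglyNumbers(10)); // [1, 2, 3, 4, 5, 6, 8, 9, 10, 12]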
Applying this algorithm to your problem:
2, 3 and 5 becomes R, G and B
Integer comparison becomes a color comparison function you define
If you need to list all 2^24 colors, define the comparison function so that no 2 different colors are "equal" (if C1 and C2 are 2 different colors then either C1 < C2 or C2 < C1) - like I did in the original answer


Binary Search in JS: trying to find a consistent mental model

I am grinding LeetCode these days and I encountered the challenge 162. Find Peak Element:
A peak element is an element that is strictly greater than its neighbors.
Given an integer array nums, find a peak element, and return its index. If the array contains multiple peaks, return the index to any of the peaks.
You may imagine that nums[-1] = nums[n] = -∞.
You must write an algorithm that runs in O(log n) time.
Constraints:
1 <= nums.length <= 1000
-2^31 <= nums[i] <= 2^31 - 1
nums[i] != nums[i + 1] for all valid i
This question is about using binary search to find a peak element in an array.
I know we can think of the array as alternating ascending and descending sequences. Here is my solution
var findPeakElement = function(nums) {
  if (nums.length <= 1) return 0
  let left = 0, right = nums.length - 1
  while (left <= right) {
    const mid = left + right >>> 1
    if (nums[mid] > nums[mid + 1]) {
      right = mid - 1
    } else {
      left = mid + 1
    }
  }
  return left === nums.length ? left - 1 : left
};
If nums[mid] is bigger than the next element in the array, then we know we are in a descending subarray and a peak element must be lying to the left, and vice versa if nums[mid] is smaller than the next element. So far so good. But what confuses me is which index I should return at the end - left or right? To figure this out I need to go through a bunch of trial and error.
And if I slightly tweak the question to find the valley element instead, e.g. [1, 3, 20, 4, 1, 0]'s valley element should be 0, I can reason about how we narrow the window, but I still cannot seem to figure out which index I should return at the end of the binary search.
Here is my attempt for returning the valley element in the array by mirroring what I did for findPeakElement
var findValleyElement = function (nums) {
  if (nums.length <= 1) return 0
  let left = 0,
      right = nums.length - 1
  while (left <= right) {
    const mid = (left + right) >>> 1
    if (nums[mid] > nums[mid + 1]) {
      left = mid + 1
    } else {
      right = mid - 1
    }
  }
  return right
}
But this time I cannot use right as the returned index. I need to use left instead. I cannot seem to think of a consistent way of thinking through this without going through a bunch of examples, which is really not ideal since you still might miss some edge cases.
So my question is, is there some consistent mental model we can adopt when thinking about these binary search problems, specifically which index we should return to satisfy the requirements.
When the following condition is true:
if(nums[mid] > nums[mid + 1]) {
...then it could be that mid is a solution, maybe even the only one. So that means you shouldn't exclude it from the range, yet with right = mid - 1 you do exclude it. You should set right = mid. To then avoid a potentially endless loop, the loop condition should be left < right. This will ensure the loop will always end: the range is guaranteed to become smaller in each iteration*
* Let's for instance assume right == left + 1 at a certain moment. Then mid will become equal to left (since the odd bit in the sum is dropped by >>>). Now either we do right = mid or we do left = mid + 1. In either case we get left == right. In all other cases where left < right, we get a mid that is strictly between those two limits, and then surely the range will become smaller.
Once the loop exits, left has become equal to right. The only possible index in that range (of 1) is that index.
There is now no more need to check whether left is nums.length, as this cannot happen: with our chosen while condition, left can never become greater than right, ... only equal to it. And since right is a valid index, no such out-of-range check is needed.
Also the case of array size 1 does not need special treatment now.
So:
var findPeakElement = function(nums) {
  let left = 0,
      right = nums.length - 1;
  while (left < right) {
    const mid = (left + right) >>> 1;
    if (nums[mid] > nums[mid + 1]) {
      right = mid;
    } else {
      left = mid + 1;
    }
  }
  return left;
};
Valleys instead of Peaks
Here is my attempt for returning the valley element
If you want to find the valley element, it will not always work unless the following assumption in the question is changed from this:
You may imagine that nums[-1] = nums[n] = -∞
...to this:
You may imagine that nums[-1] = nums[n] = ∞
Once you have that agreed upon, you only have to change the comparison in the above code block from nums[mid] > nums[mid + 1] to nums[mid] < nums[mid + 1].
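For completeness, the flipped version could look like this (a sketch mirroring the code above, under the nums[-1] = nums[n] = ∞ convention):
var findValleyElement = function(nums) {
  let left = 0,
      right = nums.length - 1;
  while (left < right) {
    const mid = (left + right) >>> 1;
    if (nums[mid] < nums[mid + 1]) {   // the only change: > becomes <
      right = mid;
    } else {
      left = mid + 1;
    }
  }
  return left;
};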
Since the array is capped at 1000 elements, a simple scan is constant time. If we're imagining that n (size of array) and k (values limited to range -k to +k) can grow, then use trincot's answer, modifying the initial selection of right to be capped at min(2k, n) since the max increasing streak is of size 2k+1.
def f(arr)
  0.upto(998) do |i|
    return arr[i] if arr[i+1] < arr[i]
  end
  return arr[999] # if we reach this, arr[998] < arr[999] and arr[1000] is -infinity
end
A peak is defined as any element whose neighbours are both less than the element. In the example below, there are two peak elements, 5 and 4 -
                5,
            4,     4,
         3,           3, 3,
      2,                    2, 2, 2 ]
[ 1, 1,
So we can take three elements off the input, a, b, and c and -
if any of a, b, or c is null, a valid comparison cannot be made and therefore there is no peak: stop the program
otherwise if a < b and b > c, a peak has been found. output the peak
finally drop a, and recur on the same input to search for additional peaks
That would look something like this -
function* peaks ([ a, b, c, ...more ]) {
  if (a == null || b == null || c == null) return // 1
  if (a < b && b > c) yield b                     // 2
  yield *peaks([ b, c, ...more ])                 // 3
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)
found peak 2
found peak 5
found peak 7
If you have a significantly large input, which I'm sure LeetCode will give you, handling arrays like this will create an enormous amount of wasteful intermediate values. A better approach would be to use an index, i -
function* peaks (t, i = 0) {
  let a = t[i], b = t[i + 1], c = t[i + 2]
  if (a == null || b == null || c == null) return // 1
  if (a < b && b > c) yield b                     // 2
  yield *peaks(t, i + 1)                          // 3
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)
found peak 2
found peak 5
found peak 7
And finally the use of recursion will limit the size of input that this program can handle. We can use a for loop to avoid any recursion limits -
function* peaks (t) {
  let a, b, c
  for (let i = 0; i < t.length; i++) {
    a = t[i], b = t[i + 1], c = t[i + 2]
    if (a == null || b == null || c == null) return // 1
    if (a < b && b > c) yield b                     // 2
  }
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)
found peak 2
found peak 5
found peak 7
In the last two examples we perform three array lookups per step, t[i], t[i + 1], and t[i + 2]. As an optimization we can reduce this to just a single lookup -
function* peaks (t) {
  let a, b, c
  for (let i = 0; i < t.length; i++) {
    a = b, b = c, c = t[i]
    if (a == null || b == null || c == null) continue
    if (a < b && b > c) yield b
  }
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)
found peak 2
found peak 5
found peak 7
This works because our program effectively shifts the elements through a, b, and c in a "leftward" direction. Note the peaks in the b column -
a      b          c      ...
null   null       1      2,1,3,4,5,4,2,1,5,6,7,4
null   1          2      1,3,4,5,4,2,1,5,6,7,4
1      2 (peak)   1      3,4,5,4,2,1,5,6,7,4
2      1          3      4,5,4,2,1,5,6,7,4
1      3          4      5,4,2,1,5,6,7,4
3      4          5      4,2,1,5,6,7,4
4      5 (peak)   4      2,1,5,6,7,4
5      4          2      1,5,6,7,4
4      2          1      5,6,7,4
2      1          5      6,7,4
1      5          6      7,4
5      6          7      4
6      7 (peak)   4
With our optimised program, we can drop several other unnecessary actions. Index i is no longer needed and we can skip having to worry about off-by-one errors caused by i++ and comparisons of i<t.length. Additionally, we can skip the c == null check as c will always represent an element of the input array -
function* peaks (t) {
  let a, b, c
  for (const v of t) {
    a = b, b = c, c = v
    if (a == null || b == null) continue
    if (a < b && b > c) yield b
  }
}

for (const p of peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
  console.log("found peak", p)
found peak 2
found peak 5
found peak 7
If you want to collect all peaks, you can use Array.from to convert any iterable into an array -
const allPeaks = Array.from(peaks([1,2,1,3,4,5,4,2,1,5,6,7,4]))
console.log(allPeaks)
[2, 5, 7]
Generators are a good fit for this kind of problem because they can be paused/canceled at any time, i.e. after the first peak is found -
function firstPeak (t) {
  for (const p of peaks(t))
    return p // <- immediately stops `peaks`
}

firstPeak([1,2,1,3,4,5,4,2,1,5,6,7,4])
2
If however you want to write firstPeak without using a generator, there's nothing stopping you from doing so. Instead of using yield you can simply return -
function firstPeak (t) {
  let a, b, c
  for (const v of t) {
    a = b, b = c, c = v
    if (a == null || b == null) continue
    if (a < b && b > c) return b // <-
  }
}
console.log("first peak", firstPeak([1,2,1,3,4,5,4,2,1,5,6,7,4]))
first peak 2

Hackerank "Find the Factor" (in javascript) is failing because of time limit?

I had to find all the factors of a positive number (the integers that evenly divide into it) and then return the p-th element of that list, sorted ascending.
If there is no p-th element, return 0.
I tested almost all the answers I could come up with or find online:
Example:
function pthFactor(n, k) {
  let arr = [];
  for (let i = 1; i <= n; i++) {
    if (n % i === 0) {
      arr.push(i);
    }
    if (arr.length === k) {
      return arr[arr.length - 1];
    }
  }
  if (arr.length !== k) {
    return 1;
  }
};
or
var kthFactor = function(n, k) {
  let factors = [1]
  for (let i = 2; i <= Math.floor(n / 2); i++) {
    if (n % i == 0) factors.push(i)
  }
  factors.push(n)
  return factors.length < k ? -1 : factors[k - 1]
};
But it's failing the 10 second time limit.
What am I doing wrong?
By the way, I also tried Math.sqrt etc. in order not to loop n times. That didn't work either.
Do I need to know more than for loops? Like dynamic programming etc. to solve this?
I couldn't find this challenge on HackerRank, but the 1492. The kth Factor of n challenge on LeetCode seems to be the same as you describe:
Given two positive integers n and k.
A factor of an integer n is defined as an integer i where n % i == 0.
Consider a list of all factors of n sorted in ascending order, return the kth factor in this list or return -1 if n has less than k factors.
It is strange that you used the name pthFactor in your first code block, while the name of the relevant argument is k and not p. That is quite confusing.
I also tried Math.sqrt etc in order not to loop n times
You cannot only consider factors up to the square root of 𝑛. For instance, 6 is a factor of 12, and even 12 is a factor of 12.
However, about half of all factors are in the range up to the square root. And those factors that are greater, are equal to the quotients found by performing the divisions with those smaller factors. So in the end, you can stop at the square root, provided that you collect all those quotients, and pick the right one from those if you didn't arrive at the 𝑘th factor yet.
So here is how that idea could be implemented:
var kthFactor = function(n, k) {
  let bigFactors = [];
  let quotient = n + 1;
  for (let factor = 1; factor < quotient; factor++) {
    if (n % factor === 0) {
      quotient = n / factor;
      k--;
      if (k <= 0) return factor;
      if (factor >= quotient) break;
      bigFactors.push(quotient);
    }
  }
  return bigFactors[bigFactors.length - k] ?? -1;
};

// A few test cases:
console.log(kthFactor(12, 3));  // 3
console.log(kthFactor(7, 2));   // 7
console.log(kthFactor(16, 5));  // 16
console.log(kthFactor(27, 5));  // -1
console.log(kthFactor(927, 1)); // 1
This code uses the same algorithm as in trincot's answer, but I think it's expressed more simply with recursion:
const kthFactor = (n, k, f = 1, r = []) =>
  f * f > n
    ? r [k - 1] ?? -1
    : n % f == 0
      ? k == 1 ? f : kthFactor (n, k - 1, f + 1, f * f == n ? r : [n / f, ...r])
      : kthFactor (n, k, f + 1, r)

console .log (kthFactor (12, 3));  //=> 3
console .log (kthFactor (7, 2));   //=> 7
console .log (kthFactor (16, 5));  //=> 16
console .log (kthFactor (27, 5));  //=> -1
console .log (kthFactor (927, 1)); //=> 1
Here we track the factor we're testing (f) and the remaining numbers (r) to be tested, decreasing k whenever we find a new factor and adding the quotient n / f to our remaining numbers, increasing f on every step, stopping when we've gotten too big (f * f > n) and then finding the index in the ordered list of remaining numbers, returning the factor when k is 1, and simply recurring with an incremented factor otherwise.
The only tricky bit is the handling of perfect squares, where the square root should only be counted once, so we don't prepend it to the remaining numbers if f * f == n.
Because of the recursion, this is only suitable up to a certain size of k. While we could make this tail recursive in a simple manner, it doesn't matter much as of this writing, since tail call optimization has still not happened in most engines.
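If recursion depth ever became a concern, a plain loop version of the same idea might look like this (my own translation, not the author's code):
const kthFactorLoop = (n, k) => {
  const r = [];                          // larger factors, kept in ascending order
  for (let f = 1; f * f <= n; f++) {
    if (n % f !== 0) continue;
    if (--k === 0) return f;             // the k-th factor is among the small ones
    if (f * f !== n) r.unshift(n / f);   // remember the paired large factor (once for squares)
  }
  return r[k - 1] ?? -1;                 // otherwise pick from the large factors
};

console.log(kthFactorLoop(12, 3)); //=> 3
console.log(kthFactorLoop(27, 5)); //=> -1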

Finding the best possible subset combinations of numbers to reach a given sum or closest to it

So, I have this problem I need to solve; apparently it's called the Subset Sum Problem, except I need to get the subset not only when it sums exactly to the given number, but also the closest one below it when no exact sum reaches the given number (it should never go over the reference number, only stay below it). Also, if there are two or more possible subsets with the same result, I'd like to get the subset with the better distribution, preferring the highest numbers in the array over the lower ones, and each subset is limited to using the same number at most 10 times (repeating is allowed). For example:
Here is the array with the predefined values:
let num = [64.20, 107, 535, 1070];
and a given number:
let investment = 806.45
One possible solution would be:
[0, 2, 1, 0] // this sums up to 749 (since there is no way to get to 806.45 with the given array)
Please notice this result is referring to how many times each value in nums is allowed to reach the sum:
But a better solution would be:
[4, 5, 0, 0] // this sums up to 791.80 (since there is no way to get to 806.45 with the given array)
And an even better solution (because it takes the higher values into consideration over the lower ones first):
[4, 0, 1, 0] // this sums up to 791.80 also but you can see it's taking a higher value when possible.
Another important restriction is that it should never give a negative result.
So far I have tried it like this (in VueJS):
getPackages(){
  let investment = 806.45;
  const num = [64.20, 107, 535, 1070]
  let a, b, c, d;
  let results = [];
  a = investment / num[0] >= 0 ? (investment / num[0]) : 0;
  b = investment / num[1] >= 0 ? (investment / num[1]) : 0;
  c = investment / num[2] >= 0 ? (investment / num[2]) : 0;
  d = investment / num[3] >= 0 ? (investment / num[3]) : 0;
  let dResult = [], cResult = [], bResult = [], aResult = [];
  for (let i = 0; i <= d; i++){
    if (i > 0){
      dResult.push((i * num[3]))
    }
  }
  for (let i = 0; i <= c; i++){
    if (i > 0){
      cResult.push((i * num[2]))
    }
  }
  for (let i = 0; i <= b; i++){
    if (i > 0){
      bResult.push((i * num[1]))
    }
  }
  for (let i = 0; i <= a; i++){
    if (i > 0){
      aResult.push((i * num[0]))
    }
  }
  let aResultCoincidences = [];
  let bResultCoincidences = [];
  let cResultCoincidences = [];
  let dResultCoincidences = [];
  bResult.forEach(value => {
    aResult.findIndex(item => item === value) > 0 ? bResultCoincidences.push(aResult.findIndex(item => item === value)) : null
  })
  aResult.splice(0, Math.max(...bResultCoincidences) + 1)
  cResult.forEach(value => {
    bResult.findIndex(item => item === value) > 0 ? cResultCoincidences.push(bResult.findIndex(item => item === value)) : null
  })
  bResult.splice(0, Math.max(...cResultCoincidences) + 1)
  dResult.forEach(value => {
    cResult.findIndex(item => item === value) > 0 ? dResultCoincidences.push(cResult.findIndex(item => item === value)) : null
  })
  cResult.splice(0, Math.max(...dResultCoincidences) + 1)
  this.package1 = aResult.length
  this.package2 = bResult.length
  this.package3 = cResult.length
  this.package4 = dResult.length
},
What happens in my approach is that I try to get all possible results from each multiplication, and then I remove the ones that match between the arrays I made, to finally get the result. But this is not well optimized, and I'm sure there is probably a better solution to this problem.
Anyway ignore the vuejs implementation, that's only to set the values in the DOM.
***ES6 solution would be awesome.
CodeSandbox to play around: CODESANDBOX LINK
Thanks in advance.
Here's an approach. I didn't look yours over carefully, and don't know if this offers any advantages over it.
It is a brute-force approach, simply taking the cross-product of the potential individual values for each category, summing the totals, and then reducing the list to find all the options closest to the target. I originally wrote this slightly differently, trying to capture the closest value whether over or under the target.
Here's an implementation with several helper functions:
const sum = (ns) =>
  ns .reduce ((a, b) => a + b, 0)

const crossproduct = (xss) =>
  xss.reduce((xs, ys) => xs.flatMap(x => ys.map(y => [...x, y])), [[]])

const range = (lo, hi) =>
  [...Array(hi - lo)] .map ((_, i) => lo + i)

const call = (fn, ...args) =>
  fn (...args)

const closestSums = (t, ns) =>
  call (
    (opts = crossproduct (ns .map (n => range (0, 1 + Math .ceil (t / n))))) =>
      opts .map (xs => [xs, sum (xs .map ((x, i) => x * ns [i]))])
        .reduce (
          ({best, opts}, [opt, tot]) =>
            call (
              (diff = t - tot) =>
                diff >= 0 && diff < best
                  ? {best: diff, opts: [opt]}
                  : diff >= 0 && diff == best
                    ? {best, opts: [...opts, opt]}
                    : {best, opts}
            ),
          {best: Infinity, opts: []}
        ) .opts
  )

const byHigher = (as, bs) =>
  as .reduceRight ((r, _, i) => r || (as[i] < bs[i] ? 1 : as[i] > bs[i] ? -1 : 0), 0)

const closestCounts = (t, ns) =>
  closestSums (t * 100, ns .map (n => 100 * n))
    .filter (cs => cs.every(c => c <= 10))
    .sort (byHigher)

console .log (
  closestCounts (806.45, [64.20, 107, 535, 1070]),
  closestCounts (791.8, [64.20, 107, 535, 1070]) // exact match
)
Note that the wrapper multiplies everything by 100 to rid us of decimals and the potential floating point rounding errors. If that won't do, it would be easy enough to add some tolerance for matching.
The final sorting is naïve, assuming that your values are in ascending numeric order. If not, that sorter would have to get more sophisticated.
As I said, this is not likely to be efficient, and it might not gain anything on your version, but it's at least a different approach.
Let's first consider the 2-variable case. We want to find two numbers x1, x2 such that
f(x1,x2) = 64.20 * x1 + 107 * x2 - 806.45
is as close to zero as possible. If we allow x1, x2 to be real numbers, then f(x1,x2) = 0 is just the equation of a line. In the problem we only want non-negative integer solutions, i.e. the grid points. I've badly drawn the grid points with those above the line in red and those below the line in blue. The problem is then finding the grid point closest to the line.
Note that there are a lot of points we never need to consider, only the red and blue grid point are possible candidates.
Generating the coloured grid points is quite simple. We could loop through the x1 values from 0 to 13, calculate the real x2 value by solving the equation
x2 = (806.45 - 64.20 * x1)/107
and take floor(x2) and ceil(x2). Slightly better is looping through the x2 values, which run from 0 to 8, solving for x1
x1 = (806.45 - 107 * x2)/64.20.
Another approach might be some kind of zero-following routine. If you are at a given grid point (x1,x2), calculate f(x1,x2); if it is less than 0 we need to consider (x1+1,x2) or (x1,x2+1), and if f(x1,x2) is greater than zero, consider (x1-1,x2) or (x1,x2-1). I don't think the complication of this approach really brings any benefit.
If we now move to 3D, we have an equation in three variables, x1, x2, x3
f(x1,x2,x3) = 64.20 * x1 + 107 * x2 + 535 * x3 - 806.45
This defines a plane in 3D, requiring all variables to be non-negative restricts it to a triangular part of the plane.
To find candidate points here we could loop through possible integer pairs (x1,x2), then find the exact x3 value
x3 = (806.45 - 64.20 * x1 - 107 * x2) / 535
and its floor and ceiling. The candidate (x1,x2) pairs only lie in a candidate triangle, so we can use the following procedure:
// First find max possible x1
x1_max = ceil(806.45 / 64.20)
for (x1 = 0; x1 < x1_max; ++x1) {
    // for a given x1 find the maximum x2
    x2_max = ceil((806.45 - 64.20*x1) / 107)
    for (x2 = 0; x2 < x2_max; ++x2) {
        // for a given (x1,x2) find the exact x3
        x3 = (806.45 - 64.20 * x1 - 107 * x2) / 535
        // and its floor and ceiling
        x3l = floor(x3); x3h = ceil(x3);
        add_to_candidates(x1, x2, x3l);
        add_to_candidates(x1, x2, x3h);
    }
}
Once you have the candidates, simply select the one with the smallest absolute value of f.
A similar idea would extend to more variables.
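A rough JavaScript sketch of that candidate enumeration for the three smaller numbers from the question might look like this (the names and structure are mine; treat it as an illustration of the idea, not a drop-in solution, and note it does not handle the 10-repetition cap or the tie-breaking preference):
function closestBelow(target, [n1, n2, n3]) {
  let best = null;
  const x1Max = Math.ceil(target / n1);
  for (let x1 = 0; x1 <= x1Max; ++x1) {
    const x2Max = Math.max(Math.ceil((target - n1 * x1) / n2), 0);
    for (let x2 = 0; x2 <= x2Max; ++x2) {
      // exact (real-valued) x3 on the plane, then its floor/ceil grid candidates
      const exact = (target - n1 * x1 - n2 * x2) / n3;
      for (const x3 of [Math.floor(exact), Math.ceil(exact)]) {
        if (x3 < 0) continue;
        const total = n1 * x1 + n2 * x2 + n3 * x3;
        if (total <= target && (best === null || total > best.total)) {
          best = { counts: [x1, x2, x3], total };
        }
      }
    }
  }
  return best;
}

console.log(closestBelow(806.45, [64.20, 107, 535]));
// => the combination whose sum is closest to (but not above) 806.45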

Randomly split up elements from a stream of data without knowing the total number of elements

Given a "split ratio", I am trying to randomly split up a dataset into two groups. The catch is, that I do not know beforehand how many items the dataset contains. My library receives the data one by one from an input stream and is expected to return the data to two output streams. The resulting two datasets should ideally be exactly split into the given split ratio.
Illustration:
┌─► stream A
input stream ──► LIBRARY ──┤
└─► stream B
For example, given a split ratio of 30/70, stream A would be expected to receive 30% of the elements from the input stream and stream B the remaining 70%. The order must remain.
My ideas so far:
Idea 1: "Roll the dice" for each element
The obvious approach: For each element the algorithm randomly decides if the element should go into stream A or B. The problem is, that the resulting data sets might be far off from the expected split ratio. Given a split ratio of 50/50, the resulting data split might be something far off (could even be 100/0 for very small data sets). The goal is to keep the resulting split ratio as close as possible to the desired split ratio.
Idea 2: Use a cache and randomize the cached data
Another idea is to cache a fixed number of elements before passing them on. This could mean caching, say, 1000 elements, shuffling the data (or their corresponding indices to keep the order stable), splitting them up and passing the resulting data sets on. This should work very well, but I'm unsure whether the randomization is really random for large data sets (I imagine there will be patterns when looking at the distribution).
Both algorithms are not optimal, so I hope you can help me.
Background
This is about a layer-based data science tool, where each layer receives data from the previous layer via a stream. This layer is expected to split the data (vectors) up into a training and test set before passing them on. The input data can range from just a few elements to a never ending stream of data (hence, the streams). The code is developed in JavaScript, but this question is more about the algorithm than the actual implementation.
You could adjust the probability as the actual ratio shifts away from the desired rate.
Here's an example along with tests for various levels of adjusting the probability. As we increase the adjustment, we see the stream splitter deviates less from the ideal ratio, but it also means it's less random (knowing the previous values, you can predict the next values).
// rateStrictness = 0 will lead to "rolling the dice" for each invocation
// higher values of rateStrictness will lead to stronger "correcting" forces
function* splitter(desiredARate, rateStrictness = .5) {
  let aCount = 0, bCount = 0;
  while (true) {
    let actualARate = aCount / (aCount + bCount);
    let aRate = desiredARate + (desiredARate - actualARate) * rateStrictness;
    if (Math.random() < aRate) {
      aCount++;
      yield 'a';
    } else {
      bCount++;
      yield 'b';
    }
  }
}
let test = (desiredARate, rateStrictness) => {
  let s = splitter(desiredARate, rateStrictness);
  let values = [...Array(1000)].map(() => s.next().value);
  let aCount = values.map((_, i) => values.reduce((count, v, j) => count + (v === 'a' && j <= i), 0));
  let aRate = aCount.map((c, i) => c / (i + 1));
  let deviation = aRate.map(a => a - desiredARate);
  let avgDeviation = deviation.reduce((sum, dev) => sum + dev, 0) / deviation.length;
  console.log(`inputs: desiredARate = ${desiredARate}; rateStrictness = ${rateStrictness}; average deviation = ${avgDeviation}`);
};
test(.5, 0);
test(.5, .25);
test(.5, .5);
test(.5, .75);
test(.5, 1);
test(.5, 10);
test(.5, 100);
How about rolling the dice twice: first of all, decide whether the stream should be chosen randomly or whether the ratio should be taken into account. Then, for the first case, roll the dice; for the second case, take the ratio. Some pseudocode:
const toA =
  Math.random() > 0.5 // 1 -> totally random, 0 -> totally equally distributed
    ? Math.random() > 0.7
    : (numberA / (numberA + numberB) > 0.7);
That's just an idea I came up with, I haven't tried that ...
Here is a way that combines both of your ideas: it uses a cache. As long as the cache could still compensate if the stream ended now (so that we can still approach the target distribution), we just roll a die; if not, we add the element to the cache. When the input stream ends, we shuffle the elements in the cache and send them out, trying to approach the target distribution. I am not sure if there is any gain in this, in terms of randomness, over just forcing an element to go to a particular stream when the distribution is straying off too much.
Beware that this approach does not preserve the order of the original input stream. A few other things could be added, such as a cache limit and a relaxed distribution error (0 is used here). If you need to preserve order, it can be done by sending a cached value and pushing the current one to the cache, instead of just sending the current one, whenever there are still elements in the cache.
let shuffle = (array) => array.sort(() => Math.random() - 0.5);

function* generator(numElements) {
  for (let i = 0; i < numElements; i++) yield i;
}

function* splitter(aGroupRate, generator) {
  let cache = [];
  let sentToA = 0;
  let sentToB = 0;
  let bGroupRate = 1 - aGroupRate;
  let maxCacheSize = 0;

  let sendValue = (value, group) => {
    sentToA += group == 0;
    sentToB += group == 1;
    return {value: value, group: group};
  }

  function* retRandomGroup(value, expected) {
    while (Math.random() > aGroupRate != expected) {
      if (cache.length) {
        yield sendValue(cache.pop(), !expected);
      } else {
        yield sendValue(value, !expected);
        return;
      }
    }
    yield sendValue(value, expected);
  }

  for (let value of generator) {
    if (sentToA + sentToB == 0) {
      yield sendValue(value, Math.random() > aGroupRate);
      continue;
    }
    let currentRateA = sentToA / (sentToA + sentToB);
    if (currentRateA <= aGroupRate) {
      // can we handle current value going to b group?
      if ((sentToA + cache.length) / (sentToB + sentToA + 1 + cache.length) >= aGroupRate) {
        for (val of retRandomGroup(value, 1)) yield val;
        continue;
      }
    }
    if (currentRateA > aGroupRate) {
      // can we handle current value going to a group?
      if (sentToA / (sentToB + sentToA + 1 + cache.length) <= aGroupRate) {
        for (val of retRandomGroup(value, 0)) yield val;
        continue;
      }
    }
    cache.push(value);
    maxCacheSize = Math.max(maxCacheSize, cache.length)
  }

  shuffle(cache);
  let totalElements = sentToA + sentToB + cache.length;
  while (sentToA < totalElements * aGroupRate) {
    yield {value: cache.pop(), group: 0}
    sentToA += 1;
  }
  while (cache.length) {
    yield {value: cache.pop(), group: 1}
  }
  yield {cache: maxCacheSize}
}
function test(numElements, aGroupRate) {
  let gen = generator(numElements);
  let sentToA = 0;
  let total = 0;
  let cacheSize = null;
  let split = splitter(aGroupRate, gen);
  for (let val of split) {
    if (val.cache != null) cacheSize = val.cache;
    else {
      sentToA += val.group == 0;
      total += 1
    }
  }
  console.log("required rate for A group", aGroupRate, "actual rate", sentToA / total, "cache size used", cacheSize);
}
test(3000, 0.3)
test(5000, 0.5)
test(7000, 0.7)
Let's say you have to maintain a given ratio R for data items going to stream A, e.g. R = 0.3 as per your example. Then, on receiving each data item, count the total number of items and the number of items passed on to stream A, and decide for each item whether it goes to A based on which choice keeps you closer to your target ratio R.
That should be about the best you can do for any size of the data set. As for randomness the resulting streams A and B should be about as random as your input stream.
Let's see how this plays out for the first couple of iterations:
Example: R = 0.3
N : total number of items processed so far (initially 0)
A : numbers passed on to stream A so far (initially 0)
First Iteration
N = 0 ; A = 0 ; R = 0.3

if next item goes to stream A then
    n = N + 1
    a = A + 1
    r = a / n = 1
else if next item goes to stream B
    n = N + 1
    a = A
    r = a / n = 0

So the first item goes to stream B since 0 is closer to 0.3

Second Iteration
N = 1 ; A = 0 ; R = 0.3

if next item goes to stream A then
    n = N + 1
    a = A + 1
    r = a / n = 0.5
else if next item goes to stream B
    n = N + 1
    a = A
    r = a / n = 0

So the second item goes to stream A since 0.5 is closer to 0.3

Third Iteration
N = 2 ; A = 1 ; R = 0.3

if next item goes to stream A then
    n = N + 1
    a = A + 1
    r = a / n = 0.66
else if next item goes to stream B
    n = N + 1
    a = A
    r = a / n = 0.33

So the third item goes to stream B since 0.33 is closer to 0.3

Fourth Iteration
N = 3 ; A = 1 ; R = 0.3

if next item goes to stream A then
    n = N + 1
    a = A + 1
    r = a / n = 0.5
else if next item goes to stream B
    n = N + 1
    a = A
    r = a / n = 0.25

So the fourth item goes to stream B since 0.25 is closer to 0.3
So this here would be the pseudo code for deciding each data item:
if abs(((A + 1) / (N + 1)) - R) < abs((A / (N + 1)) - R) then
    put the next data item on stream A
    A = A + 1
    N = N + 1
else
    put the next data item on B
    N = N + 1
As discussed in the comments below, that is not random in the sense intended by the OP. So once we know the correct target stream for the next item we flip a coin to decide if we actually put it there, or introduce an error.
if abs(((A + 1) / (N + 1)) - R) < abs((A / (N + 1)) - R) then
    target_stream = A
else
    target_stream = B

if random() < 0.5 then
    if target_stream == A then
        target_stream = B
    else
        target_stream = A

if target_stream == A then
    put the next data item on stream A
    A = A + 1
    N = N + 1
else
    put the next data item on B
    N = N + 1
Now that could lead to an arbitrarily large error overall. So we have to set an error limit L and check how far off the resulting ratio is from the target R when errors are about to be introduced:
if abs(((A + 1) / (N + 1)) - R) < abs((A / (N + 1)) - R) then
    target_stream = A
else
    target_stream = B

if random() < 0.5 then
    if target_stream == A then
        if abs((A / (N + 1)) - R) < L then
            target_stream = B
    else
        if abs(((A + 1) / (N + 1)) - R) < L then
            target_stream = A

if target_stream == A then
    put the next data item on stream A
    A = A + 1
    N = N + 1
else
    put the next data item on B
    N = N + 1
So here we have it: Processing data items one by one we know the correct stream to put the next item on, then we introduce random local errors and we are able to limit the overall error with L.
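In JavaScript, that last variant could be sketched roughly like this (R is the target ratio for A, L the allowed deviation; the function name and structure are mine, not the author's):
function makeSplitter(R, L) {
  let A = 0, N = 0;
  return function nextStream() {
    // the stream that keeps the running ratio closest to R
    let target = Math.abs((A + 1) / (N + 1) - R) < Math.abs(A / (N + 1) - R) ? 'A' : 'B';
    // flip a coin to introduce a local error, but only while staying within the limit L
    if (Math.random() < 0.5) {
      if (target === 'A' && Math.abs(A / (N + 1) - R) < L) target = 'B';
      else if (target === 'B' && Math.abs((A + 1) / (N + 1) - R) < L) target = 'A';
    }
    N++;
    if (target === 'A') A++;
    return target;
  };
}

// Usage: route each incoming element to the returned stream
const next = makeSplitter(0.3, 0.05);
const counts = { A: 0, B: 0 };
for (let i = 0; i < 10000; i++) counts[next()]++;
console.log(counts); // roughly 30% A / 70% B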
Looking at the two numbers you wrote (chunk size of 1000, probability split of 0.7) you might not have any problem with the simple approach of just rolling the dice for every element.
Talking about probability and high numbers, you have the law of large numbers.
This means that you do have a risk of splitting the streams very unevenly, e.g. into 0 and 1000 elements, but in practice this is veeery unlikely to happen. As you are talking about testing and training sets, I also do not expect your probability split to be far off 0.7. And in case you are allowed to cache, you can still do so for the first 100 elements, so that you are sure to have enough data for the law of large numbers to kick in.
This is the binomial distribution for n=1000, p=.7
In case you want to reproduce the image with other parameters
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 1000, 0.7
index = np.arange(binom.ppf(0.01, n, p), binom.ppf(0.99, n, p))
pd.Series(index=index, data=binom.pmf(index, n, p)).plot()
plt.show()

Codility Ladder javascript - not understanding a detail that jumps the answer from 37 to 100%

I'm trying to solve all the lessons on codility but I failed to do so on the following problem: Ladder by codility
I've searched all over the internet and I'm not finding an answer that satisfies me, because no one answers why the max variable impacts the result so much.
So, before posting the code, I'll explain the thinking.
By looking at it I didn't need much time to understand that the total number of combinations is a Fibonacci number, and by removing the 0 from the Fibonacci array, I'd find the answer really fast.
Now, afterwards, they say that we should return the number of combinations modulo 2^B[i].
So far so good, but when I submitted it without the max variable, I got a score of 37%. I searched all over the internet and the 100% result was similar to mine, except that it added max = Math.pow(2,30).
Can anyone explain to me how and why that max influences so much the score?
My Code:
// Raises 2 to the power num
function pow(num){
  return Math.pow(2, num);
}

// Returns an array with all fibonacci numbers except for 0
function fibArray(num){
  // const max = pow(30); -> Adding this max to the fibonacci array makes the answer be 100%
  const arr = [0, 1, 1];
  let current = 2;
  while (current <= num){
    current++;
    // next = arr[current-1] + arr[current-2] % max;
    next = arr[current-1] + arr[current-2]; // Without this max it's 30 %
    arr.push(next);
  }
  arr.shift(); // remove 0
  return arr;
}

function solution(A, B) {
  let f = fibArray(A.length + 1);
  let res = new Array(A.length);
  for (let i = 0; i < A.length; ++i) {
    res[i] = f[A[i]] % (pow(B[i]));
  }
  return res;
}

console.log(solution([4,4,5,5,1],[3,2,4,3,1])); // 5,1,8,0,1
// Note that the console.log won't differ in this solution whether max is set or not.
// Running the exercise on Codility shows the full log with all details
// of where it passed and where it failed.
The limits for input parameters are:
Assume that:
L is an integer within the range [1..50,000];
each element of array A is an integer within the range [1..L];
each element of array B is an integer within the range [1..30].
So the array f in fibArray can be 50,001 long.
Fibonacci numbers grow exponentially; according to this page, the 50,000th Fib number has over 10,000 digits.
Javascript does not have built-in support for arbitrary precision integers, and even doubles only offer ~14 s.f. of precision. So with your modified code, you will get "garbage" values for any significant value of L. This is why you only got 30%.
But why is max necessary? Modulo math tells us that:
(a + b) % c = ([a % c] + [b % c]) % c
So by applying % max to the iterative calculation step arr[current-1] + arr[current-2], every element in fibArray becomes its corresponding Fib number mod max, without any variable exceeding the value of max (or built-in integer types) at any time:
fibArray[2] = (fibArray[1] + fibArray[0]) % max = (F1 + F0) % max = F2 % max
fibArray[3] = (F2 % max + F1) % max = (F2 + F1) % max = F3 % max
fibArray[4] = (F3 % max + F2 % max) % max = (F3 + F2) % max = F4 % max
and so on ...
(Fn is the n-th Fib number)
Note that as B[i] will never exceed 30, pow(2, B[i]) <= max; therefore, since max is always divisible by pow(2, B[i]), applying % max does not affect the final result.
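For reference, here is the OP's fibArray with the modulo applied to the whole sum at each step, as described above (the 100% variant; max = 2^30 is large enough because B[i] never exceeds 30). Note the parentheses: the % must apply to the sum, not just the second term, since % binds tighter than +:
function fibArray(num) {
  const max = Math.pow(2, 30);
  const arr = [0, 1, 1];
  let current = 2;
  while (current <= num) {
    current++;
    arr.push((arr[current - 1] + arr[current - 2]) % max); // keep every entry below 2^30
  }
  arr.shift(); // remove 0
  return arr;
}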
Here is a python 100% answer that I hope offers an explanation :-)
In a nutshell; modulus % is similar to 'bitwise and' & for certain numbers.
eg any number % 10 is equivalent to the right most digit.
284%10 = 4
1994%10 = 4
FACTS OF LIFE:
when Y is a power of 2 -> X % Y is equivalent to X & ( Y - 1 )
precomputing (2**i)-1 for i in range(1, 31) is faster than computing everything in B when super large arrays are given as args for this particular lesson.
Thus fib(A[i]) & pb[B[i]] will be faster to compute than an X % Y style thingy.
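A quick sanity check of that identity for a power of two such as 8 (shown in JavaScript for consistency with the rest of this page):
console.log(1994 % 8, 1994 & 7); // both print 2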
https://app.codility.com/demo/results/trainingEXWWGY-UUR/
And for completeness the code is here.
https://github.com/niall-oc/things/blob/master/codility/ladder.py
Here is my explanation and solution in C++:
Compute the first L Fibonacci numbers. Each calculation needs a modulo 2^30, because the 50000th Fibonacci number is so big it cannot be stored exactly even in a long double. Since INT_MAX is 2^31 - 1, the sum of two numbers previously reduced modulo 2^30 cannot exceed it. Therefore, we do not need a bigger store and/or casting.
Go through the arrays executing the lookup and modulos. We can be sure this gives the correct result since modulo 2^30 does not take any information away. E.g. modulo 100 does not take away any information for subsequent modulo 10.
vector<int> solution(vector<int> &A, vector<int> &B)
{
    const int L = A.size();
    vector<int> fibonacci_numbers(L, 1);
    fibonacci_numbers[1] = 2;
    static const int pow_2_30 = pow(2, 30);
    for (int i = 2; i < L; ++i) {
        fibonacci_numbers[i] = (fibonacci_numbers[i - 1] + fibonacci_numbers[i - 2]) % pow_2_30;
    }
    vector<int> consecutive_answers(L, 0);
    for (int i = 0; i < L; ++i) {
        consecutive_answers[i] = fibonacci_numbers[A[i] - 1] % static_cast<int>(pow(2, B[i]));
    }
    return consecutive_answers;
}
