Picking best combination of items that have target sums - javascript

I am creating a Meal Plan Generator (NodeJS/JavaScript with MongoDB) that also has a randomized function in it. I have 400k Recipe objects stored in MongoDB.
Recipe object sample:
{"name": "Roasted Beef", "calories": 450, "carbs":10 , "fats": 20.2 , "proteins": 55.4}
(Here is a 50 records JSON sample of the Recipes: http://gofile.io/?c=h0xZ5C)
(And here is a real JSON dump for over 400k recipes: http://gofile.io/?c=0Utnej You can ignore the "_id" and "_v" fields)
Say a user needs to generate a Meal Plan (a combination of MINIMUM 3 and MAXIMUM 6 Recipe objects) whose totals (the summation of each value across all of its objects) each fall approximately within a target range. You can also change the serving_size of each Recipe, which is a multiplier for all of its values and can be any of: 1, 1.5, 2, 2.5, 3, 3.5 or 4. This multiplier is needed because the meals will not always match the targets when added, so we change the serving sizes to reach the targets.
Example: Generate a Meal Plan that has:
Total calories: 2,000
carbs: [20-50]
fats: [40-70]
proteins: [40-90]
So say we do 3 servings of a Recipe, and 2.5 of another, and 1 serving of a third, to reach the target.
I had a solution in mind to iterate over, say, a set of 1,000 randomly selected Recipes and try all combinations with various serving_size values for each until the totals match the targets. OR a recursive function that picks a random meal, then tries to find another meal, adjusts its serving_size, and so on until we reach the targets.
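Here is a rough, untested sketch of the recursive idea (field names from the Recipe sample above; turning the 2,000-calorie target into a range is my own simplification):
const SERVINGS = [1, 1.5, 2, 2.5, 3, 3.5, 4];
const NUTRIENTS = ['calories', 'carbs', 'fats', 'proteins'];

function totals(plan) {
  const sum = { calories: 0, carbs: 0, fats: 0, proteins: 0 };
  for (const { recipe, serving } of plan) {
    for (const n of NUTRIENTS) sum[n] += recipe[n] * serving;
  }
  return sum;
}

// targets, e.g. { calories: [1900, 2100], carbs: [20, 50], fats: [40, 70], proteins: [40, 90] }
function buildPlan(recipes, targets, plan = []) {
  const t = totals(plan);
  if (NUTRIENTS.some(n => t[n] > targets[n][1])) return null;   // overshot a maximum: backtrack
  if (plan.length >= 3 && NUTRIENTS.every(n => t[n] >= targets[n][0])) return plan; // all ranges hit
  if (plan.length === 6) return null;                            // dead end

  const recipe = recipes[Math.floor(Math.random() * recipes.length)];
  for (const serving of SERVINGS) {
    const result = buildPlan(recipes, targets, [...plan, { recipe, serving }]);
    if (result) return result;
  }
  return null; // caller can retry with a fresh random sample of recipes
}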
Meal Plans should be as random as possible, so the goal here is not just to optimize for the targets but also to keep things as random as possible.
Is there any known solution to such a problem? I just want to avoid re-inventing the wheel if there is a known algorithm that solves it.
Thanks for any help!

Hmm, I don't know a good algorithm for that, but here's a warning for the recursive approach: it should work reasonably well for easy constraints, but with extreme constraints (say we take a recipe in your list which is a Pareto minimum, multiply its values by 3 and set them as the maximum values; then 3 servings of that recipe might be the only solution), just trying random combinations can churn through an extreme number of combinations.
You should try to limit the options as much as possible without excluding any viable combinations (if you want to retain maximum randomness). If you can make it likely that there is a viable result, that cuts down on the recipes that have to be tested (and if the conditions are thorough enough to guarantee one, you can avoid backtracking entirely). For a single parameter it is easy: if you have a maximum k for parameter a and start with a recipe that has m in that parameter, then min(parameter a)*2 + m <= k has to hold, otherwise no combination including that recipe can honor the maximum bound (with at least 3 recipes, the other two each contribute at least the database minimum). And for a minimum of k, m + max(parameter a)*5*4 >= k would have to be true, but with the option of going up to 6 recipes and multiplying the serving size by up to 4, that should fail less often than the maxima get violated.
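A minimal sketch of that single-parameter pre-filter (untested; minValues would be precomputed once over the whole recipe set):
// Drop recipes that cannot appear in any valid plan: with at least 3 recipes,
// the two other recipes contribute at least the database minimum each, so
// recipe[p] + 2 * minValues[p] must not exceed the maximum target for any p.
function canRespectMaxima(recipe, minValues, maxTargets) {
  return Object.keys(maxTargets).every(
    p => recipe[p] + 2 * minValues[p] <= maxTargets[p]
  );
}

// e.g. candidates = recipes.filter(r => canRespectMaxima(r, minValues, { carbs: 50, fats: 70, proteins: 90 }));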
The tricky part is ensuring that the combination of restrictions can be fulfilled... (though the single-parameter check should serve as a viable first filter to lower the number of recipes to consider). I haven't entirely thought this through, but for the max part: for it to be fulfillable with the recipe you want to try, it has to be fulfillable by adding two more recipes, so one of the Pareto minima for two recipes has to be able to fulfill it. You should be able to get the Pareto minima for a combination of two recipes by going through all combinations of the Pareto minima for a single recipe. I am unsure how many Pareto minima there are in the database, but if it is a reasonable number you could pre-generate the ones for 2 recipes and then check whether there is a Pareto minimum that, when added to the recipe in question, would not violate the maximum constraints. (The other way around for the minima, but this approach doesn't guarantee that you can fulfill both boundaries at once.)
(Wanted to just write this as comment since it is not much of an answer but it got too long.)

Related

Shuffling cards in a repeater using images from a dataset in Wix

I am creating a page on Wix with a repeater that brings in only 3 items from my dataset at a time when a shuffle button is clicked (there are 22 cards in the dataset); it is supposed to shuffle and bring up different combinations.
What I expect:
Click on the button, and it brings up 3 random cards (images of cards in my dataset) from a deck of 22 cards.
What is happening:
It keeps bringing up the same few combinations of cards, it is not actually random, and some cards never show up.
Here is my code:
import wixData from 'wix-data';

export function button7_click(event) {
    // clear any filters in the dataset
    $w("#dynamicDataset").setFilter( wixData.filter() );
    // get size of collection that is connected to the dataset
    let count = $w("#dynamicDataset").getTotalCount();
    // get random number using the size as the maximum
    let idx = Math.floor(Math.random() * count-1);
    // set the current item index of the dataset
    $w("#dynamicDataset").setCurrentItemIndex(idx);
}
What can I do to get a really random spread of 3 cards?
JavaScript's Math.random() function is fast, but it has issues: it is not seedable, and its randomness leaves something to be desired. Keep in mind that, because of the Birthday Paradox, the existence of duplicate values does not by itself mean the values are not random. Although there are many ways to change the random number generation in JavaScript, here are three different approaches adopted by the community:
Seedable Random Number Generator: This approach lets you build a seedable random number generator in JavaScript that you can tweak to generate a deterministic pseudo-random sequence. Browsers do not provide a built-in way to seed Math.random(), so this is useful both when you need a completely predictable, repeatable pseudo-random sequence and when you need a seed that is much more unpredictable than what your browser provides.
Mersenne Twister: This algorithm compensates for Math.random() not allowing you to specify an initial seed value.
Alea, a PRNG algorithm: A Pseudo-Random Number Generator (PRNG) is an algorithm for generating a sequence of numbers whose properties approximate those of a sequence of truly random numbers. The Alea package implements this algorithm.
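For illustration, here is a minimal seedable PRNG sketch (mulberry32, a well-known community snippet; it is independent of the packages mentioned above):
// mulberry32: a tiny seedable PRNG. The returned function yields floats in
// [0, 1), fully determined by the numeric seed.
function mulberry32(seed) {
  return function() {
    var t = seed += 0x6D2B79F5;
    t = Math.imul(t ^ t >>> 15, t | 1);
    t ^= t + Math.imul(t ^ t >>> 7, t | 61);
    return ((t ^ t >>> 14) >>> 0) / 4294967296;
  };
}

var rand = mulberry32(42);         // same seed => same sequence
var idx = Math.floor(rand() * 22); // reproducible index into a 22-card deck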

Using K means to map people to a group based on their score of a leader

I am attempting to create an algorithm to match people to the leader of a group. I've discovered k-means clustering and think this is the way to go. The project is in JavaScript, so I've found a package on npm that implements k-means. Now I am confused, as I can't find any examples similar to this: if I have 20 people who each give scores to 4 people based on their ability to lead, how do I format my data so k-means can assign the 20 people to groups?
A Google Sheets screenshot of my data
To be precise: based on that screenshot, I am trying to map the followers 2-20 to leaders L1-L4, based on their scores of the leaders (0, 0.5, 1, 1.5, with 1.5 being the highest score, i.e. the shortest distance), ideally with similar-sized groups.
What I've tried:
var skmeans = require("skmeans");

var data = [[0.5,0.5,0,0],
            [1.5,0,0.5,0],
            [1.5,0,1.5,1],
            [1.5,0.5,0,0],
            [0.5,1.5,0,1],
            [0.5,1.5,0.5,1],
            [0.5,0.5,1,0],
            [1,0,1,1],
            [1.5,1.5,1,0.5],
            [0.5,1,0.5,1],
            [1,1,1,1],
            [1.5,1.5,0.5,1],
            [1,1.5,1,0.5],
            [0,1.5,0.5,1.5],
            [1.5,1,0.5,0],
            [0.5,0,0,1.5],
            [0.5,0,0,1.5],
            [1.5,0.5,1.5,1],
            [0.5,1.5,1,1]];
var res = skmeans(data,4);
But this just grouped the followers amongst themselves based on who scored the leaders similarly, instead of using the leaders as centroids. I'm open to other clustering approaches, or, if I'm completely off target, info on better algorithms to accomplish this task.
What k-means clustering does here is pick 4 arbitrary points and calculate the shortest distance from each data point to them, creating the 4 clusters you requested. It then takes the MEAN value of each cluster formed in the first iteration to define the centroids for the next iteration. Since the first iteration starts from arbitrary points, the result you got is to be expected.
Defining the expected leaders as centroids, instead of letting the algorithm pick arbitrary starting points, might help:
skmeans(data,k,[centroids],[iterations])
Reference: https://www.npmjs.com/package/skmeans#skmeansdatakcentroidsiterations
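A sketch of that idea, based on the skmeans docs linked above (the "ideal" centroid values below are an assumption, not something from the question):
var skmeans = require("skmeans");

// `data` is the array of follower score rows from the question, one column per leader L1-L4.

// Hypothetical ideal centroids: a follower perfectly matched to leader Ln gives
// that leader the top score (1.5) and the others 0.
var leaderCentroids = [
  [1.5, 0, 0, 0],   // L1
  [0, 1.5, 0, 0],   // L2
  [0, 0, 1.5, 0],   // L3
  [0, 0, 0, 1.5]    // L4
];

var res = skmeans(data, 4, leaderCentroids);
console.log(res.idxs); // cluster index per follower: 0-3 correspond to L1-L4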

How to generically solve the problem of generating incremental integer IDs in JavaScript

I have been thinking about this for a few days trying to see if there is a generic way to write this function so that you don't ever need to worry about it breaking again. That is, it is as robust as it can be, and it can support using up all of the memory efficiently and effectively (in JavaScript).
So the question is about a basic thing. Often times when you create objects in JavaScript of a certain type, you might give them an ID. In the browser, for example with virtual DOM elements, you might just give them a globally unique ID (GUID) and set it to an incrementing integer.
let GUID = 1

let a = createNode() // { id: 1 }
let b = createNode() // { id: 2 }
let c = createNode() // { id: 3 }

function createNode() {
  return { id: GUID++ }
}
But what happens when you run out of integers? Number.MAX_SAFE_INTEGER == 2⁵³ - 1. That is obviously a very large number: 9,007,199,254,740,991, roughly nine quadrillion. But if JS can reach, say, 10 million ops per second, then it would take about 900,719,925 seconds to count that high, which is roughly 10,425 days, or about 30 years. So if you left your computer running for 30 years, it would eventually run out of incrementing IDs. This would be a hard bug to find!!!
If you parallelized the generation of the IDs, then you could more realistically (more quickly) run out of the incremented integers. Assuming you don't want to use a GUID scheme.
Given the memory limits of computers, you can only create a certain number of objects. In JS you probably can't create more than a few billion.
But my question is, as a theoretical exercise: how can you solve the problem of generating these incrementing integers such that, if you got up to Number.MAX_SAFE_INTEGER, you would cycle back to the beginning without reusing the potentially billions (or just millions) of IDs that are still "live and bound"? What sort of scheme would you have to use so you could simply cycle through the integers and always know you have a free one available?
let i = 0

function getNextID() {
  if (i++ > Number.MAX_SAFE_INTEGER) {
    return i = 0
  } else {
    return i
  }
}
Random notes:
The fastest overall was Chrome 11 (under 2 sec per billion iterations, or at most 4 CPU cycles per iteration); the slowest was IE8 (about 55 sec per billion iterations, or over 100 CPU cycles per iteration).
Basically, this question stems from the fact that our typical "practical" solutions will break in the super-edge case of running into Number.MAX_SAFE_INTEGER, which is very hard to test. I would like to know some ways you could solve for that without just erroring out in some way.
But what happens when you run out of integers?
You won't. Ever.
But if JS can reach 10 million ops per second [it'll take] about 30 years.
Not much to add. No computer will run the same program for 30 years. Also, in this very contrived example you do nothing but generate ids; in a realistic computation you might spend 1/10000 of the time generating ids, so the 30 years turn into 300,000 years.
how can you solve this problem of generating the incremented integers such that if you got up to Number.MAX_SAFE_INTEGER, you would cycle back from the beginning,
If you "cycle back from the beginning", they won't be "incremental" anymore. One of your requirements cannot be fullfilled.
If you parallelized the generation of the IDs, then you could more realistically (more quickly) run out of the incremented integers.
No. For the ids to be strictly incremental, you have to share a counter between these parallelized agents, and access to shared memory is only possible through synchronization, so it won't be faster at all.
If you still really think that you'll run out of the 53-bit safe-integer range, use BigInts. Or Symbols, depending on your use case.
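A minimal sketch of the BigInt route (BigInts have arbitrary precision, so the counter itself never wraps; memory becomes the limit long before the number does):
let nextId = 0n;

function getNextID() {
  nextId += 1n;   // BigInt arithmetic: exact, no Number.MAX_SAFE_INTEGER cliff
  return nextId;
}

console.log(getNextID()); // 1n
console.log(getNextID()); // 2n

// If the ids only need to be unique rather than incremental, a Symbol per
// object works too:
const id = Symbol('node-id');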

Logic for grouping similar parameters

I am trying to figure out the best logic for grouping parameters within a certain tolerance. It's easier to explain with an example...
Task1: parameter1=140
Task2: parameter1=137
Task3: parameter1=142
Task4: parameter1=139
Task5: parameter1=143
If I want to group tasks if they are within 2 of each other, I think I need to do several passes. For the example, the desired result would be this:
Task4 covers Task1, Task2, and Task4
Task3 covers Task3 and Task5
There are multiple possibilities because Task1 could also cover 3 and 4 but then 2 and 5 would be two additional tasks that are by themselves. Basically, I would want the fewest number of tasks that are within 2 of each other.
I am currently trying to do this in Excel VBA, but I may port the code to PHP later. I really just don't know where to start because it seems pretty complex.
You'll need a clustering algorithm, I'd assume. Consider the following parameters:
Task1: parameter1=140
Task2: parameter1=142
Task3: parameter1=144
Task4: parameter1=146
Task5: parameter1=148
Depending on your logic, the clustering will get weird here. If you simply check each number for numbers near it, all of these will end up clustered together. But do 140 and 148 deserve to be in the same cluster? Try k-means clustering. There will be some gray area, but the result will be relatively accurate.
http://en.wikipedia.org/wiki/K-means_clustering
You can group tasks in a single pass if you decide the group boundaries before looking at the tasks. Here's a simple example using buckets of width 4, based on your goal of grouping tasks within +/-2 of each other:
Dim bucket As Integer
For Each parameter In parameters
    bucket = Round(parameter / 4, 0)
    ' ... do something now that you know what bucket the task is in
Next parameter
If the groups provided by fixed buckets don't fit the data closely enough for your needs, you will need to use an algorithm that makes multiple passes. Since the data in your example is one-dimensional, you can (and should!) use simpler techniques than k-means clustering.
A good next place to look might be Literate Jenks Natural Breaks and How The Idea Of Code is Lost, with a very well commented Jenks Natural Breaks Optimization in JavaScript.
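If multiple passes are acceptable, here is a rough sketch of one simple one-dimensional approach in JavaScript (like the Jenks link above, though this is not the Jenks algorithm): sort the values and start a new group whenever a value drifts more than 4 above the group's first value, so each group fits inside a +/-2 window. On the data from the question it reproduces the desired grouping.
function groupWithinTolerance(values, tolerance = 2) {
  const sorted = [...values].sort((a, b) => a - b);
  const groups = [];
  let current = [];
  for (const v of sorted) {
    if (current.length === 0 || v - current[0] <= 2 * tolerance) {
      current.push(v);
    } else {
      groups.push(current);
      current = [v];
    }
  }
  if (current.length) groups.push(current);
  // A "covering" member is any value within +/-tolerance of both ends of the
  // group; such a member may not exist for every dataset.
  return groups.map(g => ({
    members: g,
    cover: g.find(v => v >= g[g.length - 1] - tolerance && v <= g[0] + tolerance)
  }));
}

console.log(groupWithinTolerance([140, 137, 142, 139, 143]));
// -> [{ members: [137, 139, 140], cover: 139 }, { members: [142, 143], cover: 142 }]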

Getting the index of an object in an ordered list in Firebase

I'm building a leaderboard using Firebase. The player's position in the leaderboard is tracked using Firebase's priority system.
At some point in my program's execution, I need to know what position a given user is at in the leaderboard. I might have thousands of users, so iterating through all of them to find an object with the same ID (thus giving me the index) isn't really an option.
Is there a more performant way to determine the index of an object in an ordered list in Firebase?
edit: I'm trying to figure out the following:
/
---- leaderboard
--------user4 {...}
--------user1 {...}
--------user3 {...} <- what is the index of user3, given a snapshot of user3?
--------...
If you are processing tens or hundreds of elements and don't mind taking a bandwidth hit, see Kato's answer.
If you're processing thousands of records, you'll need to follow the approach outlined in principle in pperrin's answer. The following answer details that.
Step 1: setup Flashlight to index your leaderboard with ElasticSearch
Flashlight is a convenient node script that syncs elasticsearch with Firebase data.
Read about how to set it up here.
Step 2: modify Flashlight to allow you to pass query options to ElasticSearch
As of this writing, Flashlight gives you no way to tell ElasticSearch that you're only interested in the number of documents matched and not the documents themselves.
I've submitted this pull request which uses a simple one-line fix to add this functionality. If it isn't closed by the time you read this answer, simply make the change in your copy/fork of flashlight manually.
Step 3: Perform the query!
This is the query I sent via Firebase:
{
  index: 'firebase',
  type: 'allTime',
  query: {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "points": {
            "gte": minPoints
          }
        }
      }
    }
  },
  options: {
    "search_type": "count"
  }
};
Replace points with the name of the field tracking points for your users, and minPoints with the number of points of the user whose rank you are interested in.
The response will look something like:
{
  max_score: 0,
  total: 2
}
total is the number of users who have the same or greater number of points -- in other words, the user's rank!
Since Firebase stores objects, not arrays, the elements do not have an "index" in the list--JavaScript objects, and by extension JSON objects, are inherently unordered. As explained in Ordered Docs and demonstrated in the leaderboard example, you accomplish ordering by using priorities.
A set operation:
var ref = new Firebase('URL/leaderboard');
ref.child('user1').setPriority( newPosition /*score?*/ );
A read operation:
var ref = new Firebase('URL/leaderboard');
ref.child('user1').once('value', function(snap) {
  console.log('user1 is at position', snap.getPriority());
});
To get the info you want, at some point a process is going to have to enumerate the nodes to count them. So the question is where/when the counting takes place.
Using .count() in the client will mean it is done every time it is needed; it will be pretty accurate, but processing/traffic heavy.
If you keep a separate index of the count, it will need regular refreshing or constant updating (each insert causing a shuffling up of the remaining entries).
Depending on the distribution and volume of your data, I would be tempted to go with a background process that just updates (or rebuilds) the index every (say) ten or twenty additions, and indexes every (say) 10 positions.
"Leaderboard",$UserId = priority=$score
...
"Rank",'10' = $UserId,priority=$score
"Rank",'20' = $UserId,priority=$score
...
From a score you get the rank to within ten, and then using a startAt/endAt/count on your "Leaderboard" you get it down to the exact position.
If your background process is monitoring the updates to the leaderboard, it could be more intelligent about its updates to the index, updating only as required.
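A rough, untested sketch of that last step (legacy Firebase JS API, as used elsewhere on this page; indexedScore, indexedRank and userScore are hypothetical values read from the "Rank" index and the user's record):
var ref = new Firebase('URL/leaderboard');

// Count the leaderboard entries (ordered by priority = score) that sit between
// the nearest indexed position and the user's own score.
ref.startAt(Math.min(indexedScore, userScore))
   .endAt(Math.max(indexedScore, userScore))
   .once('value', function(snap) {
     var offset = snap.numChildren() - 1;
     // Whether to add or subtract the offset depends on which direction the
     // "Rank" index counts in.
     console.log('rank is about', indexedRank + offset);
   });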
I know this is an old question, but I just wanted to share my solutions for future reference. First of all, the Firebase ecosystem has changed quite a bit, and I'm assuming the current best practices (i.e. Firestore and serverless functions). I personally considered these solutions while building a real application, and ended up picking the scheduled approximated ranks.
Live ranks (most up-to-date, but expensive)
When preparing a user leaderboard I make a few assumptions:
The leaderboard ranks users based on a number which I'll call 'score' from now on
New users rank lowest on the leaderboard, so upon user creation, their rank is set to the total user count (with a Firebase function, which sets the rank, but also increases the 'total user' counter by 1).
Scores can only increase (with a few adaptations decreasing scores can also be supported).
Deleted users keep a 'ghost' spot on the leaderboard.
Whenever a user's score increases, a Firebase function responds to the change by querying all surpassed users (whose score is >= the user's old score but < the user's new score) and incrementing their rank by 1 (they each drop one position). The user's own rank is decreased by the size of that query (they climb past that many users).
The rank is now immediately available on client reads. However, the ranking updates inside the proposed functions are fairly read- and write-heavy. The exact number of operations depends greatly on your application, but for my personal application the high frequency of score changes and the relative closeness of scores rendered this approach too inefficient. I'm curious if anyone has found a more efficient (live) alternative.
Scheduled ranks (simplest, but expensive and periodic)
Schedule a Firebase function to simply sort the entire user collection by ascending score and write back the rank of each user (in a batch update). This process can be repeated daily, or more or less frequently depending on your application. For N users, the function always makes N reads and N writes.
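A rough sketch of such a scheduled function (Firebase Functions v1 plus the Admin SDK; the collection and field names 'users', 'score' and 'rank' are assumptions):
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.updateRanks = functions.pubsub.schedule('every 24 hours').onRun(async () => {
  const db = admin.firestore();
  const snapshot = await db.collection('users').orderBy('score', 'desc').get();

  // Write the ranks back in batches (Firestore batches max out at 500 writes).
  const commits = [];
  let batch = db.batch();
  snapshot.docs.forEach((doc, i) => {
    batch.update(doc.ref, { rank: i + 1 }); // rank 1 = highest score
    if ((i + 1) % 500 === 0) {
      commits.push(batch.commit());
      batch = db.batch();
    }
  });
  commits.push(batch.commit());
  await Promise.all(commits);
  return null;
});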
Scheduled approximated ranks (cheapest, but non-precise and periodic)
As an alternative to the 'Scheduled ranks' option, I would suggest an approximation technique: instead of writing each user's exact rank on every scheduled update, the collection of users (still sorted as before) is simply split into M chunks of equal size, and the scores that bound these chunks are written to a separate 'stats' collection.
So, for example: if we use M = 3 for simplicity and we read 60 users sorted by ascending score, we have three chunks of 20 users. For each of the (still sorted) chunks we take the lowest and the highest score, i.e. the range that contains all scores of that chunk. Let's say that the chunk with the lowest scores ranges from 20-120, the second chunk from 130-180 and the chunk with the highest scores from 200-350. We now simply write these ranges to a 'stats' collection (the write count is reduced to 1, no matter how many users!).
Upon rank retrieval, the user simply reads the most recent 'stats' document and approximates their percentile rank by comparing the ranges with their own score. Of course, it is possible that a user scores higher than the greatest score or lower than the lowest score from the previous 'stats' update, but I would just consider them as belonging to the highest and lowest scoring group respectively.
In my own application I used M = 20, so I could show the user percentile ranks with 5% accuracy, and estimate even within that range using linear interpolation (for example, if the user's score is 450 and falls into the 40%-45% chunk ranging from 439-474, we estimate the user's percentile rank to be 40 + (450 - 439) / (474 - 439) * 5 = 41.57...%).
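A small sketch of that client-side approximation (the field names and the shape of the 'stats' ranges are assumptions):
// ranges: the M chunk ranges from the 'stats' document, sorted by ascending
// score, e.g. [{ min: 20, max: 120 }, { min: 130, max: 180 }, ...].
function approximatePercentile(score, ranges) {
  const chunkWidth = 100 / ranges.length;            // e.g. 5% when M = 20
  for (let i = 0; i < ranges.length; i++) {
    const { min, max } = ranges[i];
    if (score <= max || i === ranges.length - 1) {
      // Linear interpolation inside the chunk, clamped to [0, 1] for scores
      // outside the recorded range.
      const frac = max > min ? Math.min(Math.max((score - min) / (max - min), 0), 1) : 0;
      return i * chunkWidth + frac * chunkWidth;
    }
  }
}

// Example from the text: a score of 450 in the 40%-45% chunk [439, 474]
// (chunk index 8 of 20) gives 40 + (450 - 439) / (474 - 439) * 5 = 41.57...%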
If you want to get real fancy you can also estimate exact percentile ranks by fitting your expected score distribution (e.g. normal distribution) to the measured ranges.
Note: all users DO need to read the 'stats' document to approximate their rank. However, in most applications not all users actually view the statistics (as they are either not active daily or just not interested in the stats). Personally, I also used the 'stats' document (named differently) for storing other DB values that are shared among users, so this document is retrieved anyway. Besides that, reads are 3x cheaper than writes. The worst-case scenario is 2N reads and 1 write.
