Logic for grouping similar parameters - javascript

I am trying to figure out the best logic for grouping parameters within a certain tolerance. It's easier to explain with an example...
Task1: parameter1=140
Task2: parameter1=137
Task3: parameter1=142
Task4: parameter1=139
Task5: parameter1=143
If I want to group tasks if they are within 2 of each other, I think I need to do several passes. For the example, the desired result would be this:
Task4 covers Task1, Task2, and Task4
Task3 covers Task3 and Task5
There are multiple possibilities because Task1 could also cover 3 and 4 but then 2 and 5 would be two additional tasks that are by themselves. Basically, I would want the fewest number of tasks that are within 2 of each other.
I am currently trying to do this in Excel VBA, but I may port the code to PHP later. I really just don't know where to start because it seems pretty complex.

You'll need a clustering algorithm, I'd assume. Consider the following parameters:
Task1: parameter1=140
Task2: parameter1=142
Task3: parameter1=144
Task4: parameter1=146
Task5: parameter1=148
Depending on your logic, the clustering will get weird here. If you simply check each number for numbers near it, all of these will be clustered. But do 140 and 148 deserve to be in the same cluster? Try k-means clustering. There will be some gray area, but the result will be relatively accurate.
http://en.wikipedia.org/wiki/K-means_clustering

You can group tasks in a single pass if you decide the group boundaries before looking at the tasks. Here's a simple example using buckets of width 4, based on your goal to group tasks within +/-2 of each other:
Dim bucket As Integer
For Each parameter In parameters
    bucket = Round(parameter / 4, 0)
    ' ... do something now that you know what bucket the task is in
Next parameter
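If you do end up porting this to JavaScript, the same fixed-bucket idea might look like the following sketch (the function name is illustrative):

// Sketch: group values into fixed-width buckets of the given width.
// Values in the same bucket are at most `width` apart, though two values
// within the tolerance can still land in adjacent buckets.
function bucketize(values, width) {
    var buckets = {};
    values.forEach(function (value) {
        var bucket = Math.round(value / width);
        (buckets[bucket] = buckets[bucket] || []).push(value);
    });
    return buckets;
}

// bucketize([140, 137, 142, 139, 143], 4)
// => { 34: [137], 35: [140, 139], 36: [142, 143] }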
If the groups provided by fixed buckets don't fit the data closely enough for your needs, you will need to use an algorithm that makes multiple passes. Since the data in your example is one-dimensional, you can (and should!) use simpler techniques than k-means clustering.
A good next place to look might be Literate Jenks Natural Breaks and How The Idea Of Code is Lost, with a very well commented Jenks Natural Breaks Optimization in JavaScript.
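Since the data here is one-dimensional, even a single greedy pass over the sorted values gives the fewest groups for a fixed tolerance: take the smallest uncovered value, put the group's center at that value plus the tolerance, and absorb everything within tolerance of the center. A minimal sketch, assuming plain numeric parameters (not from either post above):

// Greedy sketch: cover sorted values with the fewest centers such that
// every member is within `tol` of its group's center.
function groupWithinTolerance(values, tol) {
    var sorted = values.slice().sort(function (a, b) { return a - b; });
    var groups = [];
    var i = 0;
    while (i < sorted.length) {
        var center = sorted[i] + tol; // covers [sorted[i], sorted[i] + 2*tol]
        var group = { center: center, members: [] };
        while (i < sorted.length && sorted[i] <= center + tol) {
            group.members.push(sorted[i]);
            i++;
        }
        groups.push(group);
    }
    return groups;
}

// groupWithinTolerance([140, 137, 142, 139, 143], 2)
// => [ { center: 139, members: [137, 139, 140] },
//      { center: 144, members: [142, 143] } ]

If the covering value must itself be one of the tasks (as in the example, where Task4 covers the first group), you can snap each center to the nearest member afterwards.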

Related

I am unable to understand how to code the logic for money splitting among a group of people using MEAN

I am working on a project named "Splitter", where I need to create a group and split the money among N people, in short similar to the Splitwise app, but I am confused by the logic part, where every individual can update the amount and get the overall amount each individual gets or owes the others.
Please let me know if you need more clarification on my question. Any leads would be appreciated.
A pseudo-code would be:
amount_every_person_owes = total_spent / number_of_people
for person in group:
    amount_this_person_owes = amount_every_person_owes - amount_this_person_spent
Of course, after arriving at the amount each person owes, you'll have to create a mapping as to who owes how much to the other. I'll leave that part to you.
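For illustration, here is a minimal sketch of that pseudo-code in JavaScript (the names and the sample data are made up):

// Compute each person's balance: positive means they owe money,
// negative means the group owes them.
function computeBalances(expenses) {
    var people = Object.keys(expenses);
    var total = people.reduce(function (sum, p) { return sum + expenses[p]; }, 0);
    var share = total / people.length;
    var balances = {};
    people.forEach(function (p) {
        balances[p] = share - expenses[p];
    });
    return balances;
}

// computeBalances({ alice: 90, bob: 30, carol: 0 })
// => { alice: -50, bob: 10, carol: 40 }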

Picking best combination of items that have target sums

I am creating a Meal Plan Generator (NodeJS/Javascript with MongoDB) that also has a randomized function in it. I have in a MongoDB 400k Recipe objects.
Recipe object sample:
{"name": "Roasted Beef", "calories": 450, "carbs":10 , "fats": 20.2 , "proteins": 55.4}
(Here is a 50 records JSON sample of the Recipes: http://gofile.io/?c=h0xZ5C)
(And here is a real JSON dump for over 400k recipes: http://gofile.io/?c=0Utnej You can ignore the "_id" and "_v" fields)
Say a user needs to generate a Meal Plan (a combination of MINIMUM 3 and MAXIMUM 6 Recipe objects) whose totals (the summation of each value across all of its objects) each fall approximately within a target range. You can also change the serving_size of each Recipe, which is a multiplier for all of its values and can be any of: 1, 1.5, 2, 2.5, 3, 3.5 or 4. This multiplier is needed because we can't always find meals that match the targets when added, so we change the serving size to reach the targets.
Example: Generate a Meal Plan that has:
Total calories: 2,000
carbs: [20-50]
fats: [40-70]
proteins: [40-90]
So say we do 3 servings of a Recipe, and 2.5 of another, and 1 serving of a third, to reach the target.
I had a solution in mind: iterate over, say, a set of 1,000 randomly selected Recipes, then try all combinations with various serving_size values for each to match the targets; or a recursive function that picks a random meal, then tries to find another meal, adjusts its serving_size, and so on until we reach the targets.
Meal Plans should be as random as possible, so goal here is not just optimize for the targets but also keep things random as much as possible.
Any known solution to such problem? Just to avoid re-inventing the wheel if any algorithm is known to solve this problem.
Thanks for any help!
Hmm, I don't know a good algorithm for that. But just a warning about the recursive approach: it should work reasonably well for easy constraints. But with extreme constraints (say we take a recipe in your list which is a Pareto minimum, multiply its values by 3 and set them as the maximum values; then 3 servings of that recipe might be the only solution), just trying random combinations can go through an extreme number of combinations.
You should try to limit the options as much as possible without excluding any viable combinations (if you want to retain maximum randomness). If you can make it likely that there is a viable result, it can cut down on the recipes that have to be tested (or, if the conditions are thorough enough to guarantee it, you can avoid backtracking entirely). For a single parameter it is easy: if you have a maximum k for parameter a, and you start with a recipe that has m in that parameter, then min(parameter a)*2 + m <= k has to be fulfilled; otherwise there can't be a combination including that recipe that honors the maximum bound. (And for a minimum of k, I suppose m + max(parameter a)*5*4 >= k would have to be true, but with the option of going up to 6 recipes and multiplying serving size by up to 4, that should happen less often than violating maxima.)
The tricky part is ensuring that the combination of restrictions can be fulfilled... (Though the single-parameter check should serve as a viable first filter to lower the number of recipes to consider.) I haven't entirely thought this through, but for the max part: for it to be fulfillable with the recipe you want to try, it has to be fulfillable by adding two recipes, so one of the Pareto minima for two recipes has to be able to fulfill it. I think you could get the Pareto minima for a combo of two recipes by going through all combinations of Pareto minima for a single recipe. I am unsure how many Pareto minima will be in the database, but if it is a reasonable number you could pre-generate the ones for 2 recipes and then test whether there is a Pareto minimum that, when added to the recipe in question, would not violate the maximum constraints. (The other way around for the minima, but this approach doesn't guarantee that you can fulfill both boundaries at once.)
(Wanted to just write this as comment since it is not much of an answer but it got too long.)
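For what it's worth, the single-parameter pruning described above might look like this in JavaScript (a sketch; the recipe shape follows the sample object in the question):

// Sketch: drop recipes that can never appear in a valid plan for one
// parameter. A plan has at least 3 recipes at serving size >= 1, so a
// candidate recipe with value m must satisfy m + 2 * min(param) <= max.
function feasibleRecipes(recipes, param, maxTotal) {
    var minValue = recipes.reduce(function (min, r) {
        return Math.min(min, r[param]);
    }, Infinity);
    return recipes.filter(function (r) {
        return r[param] + 2 * minValue <= maxTotal;
    });
}

// Usage sketch: prune on carbs before trying random combinations.
// var candidates = feasibleRecipes(allRecipes, 'carbs', 50);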

How can I reliably subsort arrays using DOM methods?

UP-FRONT NOTE: I am not using jQuery or another library here because I want to understand what I’ve written and why it works (or doesn’t), so please don’t answer this with libraries or plugins for libraries. I have nothing against libraries, but for this project they’re inimical to my programming goals.
That said…
Over at http://meyerweb.com/eric/css/colors/ I added some column sorting using DOM functions I wrote myself. The problem is that while it works great for, say, the simple case of alphabetizing strings, the results are inconsistent across browsers when I try to sort on multiple numeric terms—in effect, when I try to do a sort with two subsorts.
For example, if you click “Decimal RGB” a few times in Safari or Firefox on OS X, you get the results I intended. Do the same in Chrome or Opera (again, OS X) and you get very different results. Yes, Safari and Chrome diverge here.
Here’s a snippet of the JS I’m using for the RGB sort:
sorter.sort(function(a,b){
    return a.blue - b.blue;
});
sorter.sort(function(a,b){
    return a.green - b.green;
});
sorter.sort(function(a,b){
    return a.red - b.red;
});
(sorter being the array I’m trying to sort.)
The sort is done in the tradition of another StackOverflow question “How does one sort a multi dimensional array by multiple columns in JavaScript?” and its top answer. Yet the results are not what I expected in two of the four browsers I initially tried out.
I sort (ha!) of get that this has to do with array sorts being “unstable”—no argument here!—but what I don’t know is how to overcome it in a consistent, reliable manner. I could really use some help both understanding the problem and seeing the solution, or at least a generic description of the solution.
I realize there are probably six million ways to optimize the rest of the JS (yes, I used a global). I’m still a JS novice and trying to correct that through practice. Right now, it’s array sorting that’s got me confused, and I could use some help with that piece of the script before moving on to cleaning up the code elsewhere. Thanks in advance!
UPDATE
In addition to the great explanations and suggestions below, I got a line on an even more compact solution:
function rgbSort(a,b) {
    return (a.red - b.red || a.green - b.green || a.blue - b.blue);
}
Even though I don’t quite understand it yet, I think I’m beginning to grasp its outlines and it’s what I’m using now. Thanks to everyone for your help!
OK. So, as you've discovered, your problem is that the default JavaScript sort is not guaranteed to be stable. Specifically, I think that in your mind it works like this: I'll sort by blueness, and then when I sort by greenness the sorter will just move entries in my array up and down but keep them ordered by blueness. Sadly, the universe is not so conveniently arranged; the built-in JS sort is allowed to do the sort how it likes. In particular, it's allowed to just throw the contents of the array into a big bucket and then pull them out sorted by what you asked for, completely ignoring how it was arranged before, and it looks like at least some browsers do precisely that.
There are a couple of ways around this, for your particular example. Firstly, you could still do the sort in three separate calls, but make sure those calls do the sort stably: this would mean that after sorting by blueness, you'd stably sort by greenness and that would give you an array sorted by greenness and in blueness order within that (i.e., precisely what you're looking for). My sorttable library does this by implementing a "shaker sort" or "cocktail sort" method (http://en.wikipedia.org/wiki/Cocktail_sort); essentially, this style of sorting walks through the list a lot and moves items up and down. (In particular, what it does not do is just throw all the list items into a bucket and pull them back out in order.) There's a nice little graphic on the Wikipedia article. This means that "subsorts" stay sorted -- i.e., that the sort is stable, and that will give you what you want.
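For illustration, one generic way to force stability on top of a possibly unstable built-in sort (this is not the sorttable implementation, just a common trick) is to remember each item's original index and use it as a tie-breaker:

// Sketch: a stable sort wrapper. Ties in `compare` are broken by the
// items' original positions, so the earlier ordering is preserved.
function stableSort(array, compare) {
    return array
        .map(function (item, index) { return { item: item, index: index }; })
        .sort(function (a, b) {
            return compare(a.item, b.item) || (a.index - b.index);
        })
        .map(function (wrapped) { return wrapped.item; });
}

// e.g. stableSort(sorter, function(a,b){ return a.green - b.green; });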
However, for this use case, I wouldn't worry about doing the sort in three different calls and ensuring that they're stable and all that; instead, I'd do all the sorting in one go. We can think of an rgb colour indicator (255, 192, 80) as actually being a big number in some strange base: to avoid too much math, imagine it's in base 1000 (if that phrase makes no sense, ignore it; just think of this as converting the whole rgb attribute into one number encompassing all of it, a bit like how CSS computes precedences in the cascade). So that number could be thought of as actually 255,192,080. If you compute this number for each of your rows and then sort by this number, it'll all work out, and you'll only have to do the sort once: so instead of doing three sorts, you could do one: sorter.sort(function(a,b) { return (a.red*1000000 + a.green*1000 + a.blue) - (b.red*1000000 + b.green*1000 + b.blue) });
Technically, this is slightly inefficient, because you have to compute that "base 1000 number" every time that your sort function is called, which may be (is very likely to be) more than once per row. If that's a big problem (which you can work out by benchmarking it), then you can use a Schwartzian transform (sorry for all the buzzwords here): basically, you work out the base-1000-number for each row once, put them all in a list, sort the list, and then go through the sorted list. So, create a list which looks like [ [255192080, <table row 1>], [255255255, <table row 2>], [192000000, <table row 3>] ], sort that list (with a function like mylist.sort(function(a,b) { return a[0]-b[0]; })), and then walk through that list and appendChild each of the <tr>s onto the table, which will sort the whole table in order. You probably don't need this last paragraph for the table you've got, but it may be useful and it certainly doesn't hurt to know about this trick, which sorttable.js also uses.
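A sketch of that decorate-sort-undecorate idea, using the same base-1000 key as above:

// Work out the combined key once per row, sort the pairs, then unwrap.
var decorated = sorter.map(function (row) {
    return [row.red * 1000000 + row.green * 1000 + row.blue, row];
});
decorated.sort(function (a, b) { return a[0] - b[0]; });
var sorted = decorated.map(function (pair) { return pair[1]; });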
I would approach this problem in a different manner. It appears you're trying to reconstruct all the data by extracting it from the markup, which can be a perilous task; a more straightforward approach would be to represent all the data you want to render out to the page in a format your programs can understand from the start, and then simply regenerate the markup first on page load and then on each subsequent sort.
For instance:
var colorsData = [
    {
        keyword: 'mediumspringgreen',
        decimalrgb: {
            r: 0,
            g: 250,
            b: 154
        },
        percentrgb: {
            r: 0,
            g: 98,
            b: 60.4
        },
        hsl: {
            h: 157,
            s: 100,
            l: 49
        },
        hex: '00FA9A',
        shorthex: undefined
    },
    {
        //next color...
    }
];
That way, you can run sorts on this array in whatever way you'd like, and you're not trying to rip data out from markup and split it and reassign it and all that.
But really, it seems you're maybe hung up on the sort functions. Running multiple sorts one after the other will get unintended results; you have to run a single sort function that compares the next 'column' in the case the previous one is found to be equal. An RGB sort could look like:
var decimalRgbForwards = function(a,b) {
    var a = a.decimalrgb,
        b = b.decimalrgb;
    if ( a.r === b.r ) {
        if ( a.g === b.g ) {
            return a.b - b.b;
        } else {
            return a.g - b.g;
        }
    } else {
        return a.r - b.r;
    }
};
So two colors with matching r and g values would fall through to comparing their b values, which is just what you're looking for.
Then, you can apply the sort:
colorsData.sort(decimalRgbForwards);
..and finally iterate through that array to rebuild the markup inside the table.
Hope it helps, sir-

Charting thousands of points with dojo

I need to plot thousands of points, perhaps close to 50,000 with the dojo charting library. It works, but it's definitely very slow and lags the browser. Is there any way I can get better performance?
EDIT:
I solved it by applying a render filter to the data. Essentially, I have a new item parameter called "render" which is set to false by my JSON source if the point is expected to overlap others. My DataSeries then queries for all points where render:true. This way all of the data is still there for non-visual sources that want all of the points, while my charts now run smoothly.
Pseudocode:
def is_overlapped(x, y, x_round, y_round)
    rounded_x = round(x, x_round)
    rounded_y = round(y, y_round)
    hash = hash_xy(rounded_x, rounded_y)
    if (@overlap_filter[hash].nil?)
        @overlap_filter[hash] = true
        return false
    end
    return true
end
x_round and y_round can be determined by the x and y ranges, say for example range / 100
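A rough JavaScript equivalent of that filter might look like this (names are illustrative; here the rounding snaps each point to a grid cell of the given size):

// Keep the first point that lands in each rounded (x, y) cell and mark
// later points in the same cell as overlapped.
var overlapFilter = {};

function isOverlapped(x, y, xCell, yCell) {
    var key = Math.round(x / xCell) + ':' + Math.round(y / yCell);
    if (!(key in overlapFilter)) {
        overlapFilter[key] = true;
        return false; // first point in this cell: render it
    }
    return true; // cell already occupied: skip rendering
}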
I know this probably isn't exactly the answer you're looking for, but have you considered simply reducing the number of points you are plotting? I don't know the specific function of the graph(s), but I'd imagine most graphs with that many points don't need them all; no observer is going to be able to take in that level of detail.
Your solution could lie with graphing techniques rather than JavaScript. E.g. you could most likely vastly reduce the number of points and use a line graph instead of a scatter plot while still communicating similar levels of information to your intended audience.

Solver for TSP-like Puzzle, perhaps in Javascript

I have created a puzzle which is a derivative of the travelling salesman problem, which I call Trace Perfect.
It is essentially an undirected graph with weighted edges. The goal is to traverse every edge at least once in any direction using minimal weight (unlike classical TSP where the goal is to visit every vertex using minimal weight).
As a final twist, an edge is assigned two weights, one for each direction of traversal.
I create a new puzzle instance everyday and publish it through a JSON interface.
Now I know TSP is NP-hard. But my puzzles typically have only a good handful of edges and vertices. After all they need to be humanly solvable. So a brute force with basic optimization might be good enough.
I would like to develop some (Javascript?) code that retrieves the puzzle from the server, and solves with an algorithm in a reasonable amount of time. Additionally, it may even post the solution to the server to be registered in the leader board.
I have written a basic brute force solver for it in Java using my back-end Java model on the server, but the code is too fat and runs out of heap-space quick, as expected.
Is a Javascript solver possible and feasible?
The JSON API is simple. You can find it at: http://service.traceperfect.com/api/stov?pdate=20110218 where pdate is the date for the puzzle in yyyyMMdd format.
Basically a puzzle has many lines. Each line has two vertices (A and B). Each line has two weights (timeA for when traversing A -> B, and timeB for when traversing B -> A). And this should be all you need to construct a graph data structure. All other properties in the JSON objects are for visual purposes.
If you want to become familiar with the puzzle, you can play it through a flash client at http://www.TracePerfect.com/
If anyone is interested in implementing a solver for themselves, then I will post detail about the API for submitting the solution to the server, which is also very simple.
Thank you for reading this longish post. I look forward to hearing your thoughts about this one.
If you are running out of heap space in Java, then you are solving it wrong.
The standard way to solve something like this is to do a breadth-first search, and filter out duplicates. For that you need three data structures. The first is your graph. The next is a queue named todo of "states" for work you have left to do. And the last is a hash that maps the possible "state" you are in to the pair (cost, last state).
In this case a "state" is the pair (current node, set of edges already traversed).
Assuming that you have those data structures, here is pseudocode for a full algorithm that should solve this problem fairly efficiently.
foreach possible starting_point:
    new_state = state(starting_point, {no edges visited})
    todo.add(new_state)
    seen[new_state] = (0, null)

while todo.workleft():
    this_state = todo.get()
    (cost, prev_state) = seen[this_state]
    edges = this_state.edges_traversed()
    foreach directed_edge in graph.directededges(this_state.current_node()):
        new_cost = cost + directed_edge.cost()
        new_node = directed_edge.to()
        new_edges = edges + directed_edge.edge()
        new_state = state(new_node, new_edges)
        if not exists seen[new_state] or new_cost < seen[new_state][0]:
            seen[new_state] = (new_cost, this_state)
            todo.add(new_state)

best_cost = infinity
full_edges = {all possible edges}
best_state = null
foreach possible location:
    end_state = state(location, full_edges)
    (cost, last_move) = seen[end_state]
    if cost < best_cost:
        best_state = end_state
        best_cost = cost

# Now trace back the final answer.
path_in_reverse = []
current_state = best_state
while current_state.edges_traversed() is not empty:
    previous_state = seen[current_state][1]
    path_in_reverse.push(edge from previous_state.current_node() to current_state.current_node())
    current_state = previous_state
And now reverse(path_in_reverse) gives you your optimal path.
Note that the hash seen is critical. It is what prevents you from getting into endless loops.
Looking at today's puzzle, this algorithm will have a maximum of a million or so states that you need to figure out. (There are 2**16 possible sets of edges, and 14 possible nodes you could be at.) That is likely to fit into RAM. But most of your nodes only have 2 edges connected. I would strongly advise collapsing those. This will reduce you to 4 nodes and 6 edges, for an upper limit of 256 states. (Not all are possible, and note that multiple edges now connect two nodes.) This should be able to run very quickly with little use of memory.
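To make that concrete, the state can be packed into a single integer key in JavaScript, since today's puzzle has at most 16 edges and 14 nodes (a sketch, with illustrative names):

// Sketch: encode (node, set of traversed edges) as one integer. Edge i
// is traversed if bit i of edgeMask is set.
function stateKey(node, edgeMask) {
    return node * (1 << 16) + edgeMask; // at most 14 * 2^16 distinct keys
}

// Traversing edge e and testing for completion:
// var nextMask = edgeMask | (1 << e);
// var done = nextMask === (1 << edgeCount) - 1;

// The `seen` hash from the pseudocode above can then be a plain object
// keyed by stateKey(node, edgeMask), storing [cost, previousKey].
var seen = {};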
For most parts of the graph you can apply http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg.
This way you can obtain the number of lines that you must repeat in order to solve the puzzle.
At the beginning, you should not start at nodes that have short edges you would have to travel over twice.
If I summarize:
start at a node with an odd number of edges.
do not travel more than once over lines that connect even-degree nodes.
use the shortest path to travel from one odd node to another.
A simple recursive brute-force solver with this heuristic might be a good way to start.
Or another way:
Try to find the shortest edges such that, if you remove them from the graph, the remaining graph has only two odd-degree nodes and is solvable like the Königsberg bridges (drawable without picking up the pencil). Solve this reduced graph, and whenever you hit a node with a "removed" edge, you just travel back and forth over it.
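A quick sketch of the odd-degree check this heuristic relies on, assuming each line from the JSON API exposes its two vertices as A and B (the property names are assumptions):

// Count each node's degree and return the odd-degree nodes. An Euler
// path that uses every edge exactly once exists only if there are 0 or 2
// of them; otherwise some edges must be repeated.
function oddDegreeNodes(lines) {
    var degree = {};
    lines.forEach(function (line) {
        degree[line.A] = (degree[line.A] || 0) + 1;
        degree[line.B] = (degree[line.B] || 0) + 1;
    });
    return Object.keys(degree).filter(function (node) {
        return degree[node] % 2 === 1;
    });
}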
On your Java back-end you might be able to use this TSP code (work in progress), which uses Drools Planner (open source, Java).
