Closest Pair Algorithm in JavaScript

Closest Pair Algorithm in JavaScript - javascript

I'm trying to implement a divide and conquer algorithm to find the closest pair of points in a randomly-generated set of points using JavaScript. This algorithm should be running in O(n log n) time, but it is taking considerably longer to run than a simple brute force algorithm, which should be O(n^2).
I've created two jsfiddles that time the algorithms for an array of 16000 points:
Divide and Conquer
Brute Force
My hypothesis is that the divide and conquer is so slow because JavaScript arrays are actually hash tables. Is it possible to significantly speed up the algorithm in JavaScript? If so, what would be the best way to go about doing this?

At a glance, your merge function is allocating way too much memory (roughly order O(n^2)). I made a hacky way to measure this here. The basic idea is I just have a counter in the global scope, and add the size of the array generated by merge each time it is called. Hopefully this is enough info for you to fix it, if you encounter any more problems I'd be happy to provide a few additional pointers.
Also, just by playing with the numbers it's pretty easy to rule out it being a hash table issue* - an algorithm slowed by hash table would not exhibit a faster growth rate than O(n log n), it would just start slow and grow along the same lines. If you try a range of values out though, it should become apparent that it's growing faster than that, suggesting a different issue.
*the internal implementation of javascript arrays is a bit more complicated than them just being objects, but for the point I'm trying to make it doesn't really matter

Related

After shuffling the positions of an array traversing it becomes a lot slower, why is that?

I have a big array with teams (1M). Each of those teams has 10 elements.
The pool of elements is stored in a map (id -> { id, points }).
If I calculate the points for each team it's pretty fast (as expected).
But if then I alter the array the calculation of the points performance decreases for around 5x.
Does someone has an idea what could be the issue? Here's the code if you want to have a look: https://playcode.io/656462/

When you create an array for the first time, most of the memory is alloted consecutively.
Each element points towards the immediate next which is probably consecutively stored in memory. Once you shuffle the array, the values stay at the same addresses. Only the corresponding pointers change. So now your memory is not sequential. It's randomly spread.
Now when you try to traverse it, computer hardware takes more time to find those randomly spread memory locations than it would in a sequential memory. You wouldn't notice this with small arrays but this will be a factor when you're working on huge data like in your case.
I didn't really find a source on the internet to back my answer, I would have commented this rather than answering but I don't have enough reputation. This answer is based on my understanding of CS

In Javascript, what is the most compact, elegant and efficent way to bring the last element of an array to the beginning?

I'm programming a Snake game
and the logic of the snake's movement dictates that if I have a Javascript array
var links = [elem_0, elem_1, ..., elem_n];
of elements representing links of the snake, then the way for the snake to move is to pop out elem_n, change its position to be that of the elem_0 plus the translation units dx and dy, and then put it at the beginning of the array:
[elem_0, elem_1, ..., elem_n] ---> [elem_n, elem_0, ..., elem_(n-1)]
(with some internal properties of elem_n changed in the process)
What is the way to do this that makes no compromise between
optimally efficient in number of operations and memory usage
readable
maintainable
clever (optional)
elegant
compact
????

optimally efficient in number of operations and memory usage
You're asking for two optimisations that usually counter one another. e.g. more speed == more memory.
That said, I'd probably choose a (doubly) linked list to store my snake, because removal or addition at the front or tail are very cheap, and with games, faster is way preferable to less memory (within reason, but I wonder how long your snake would have to be before you run into memory issues... well beyond what's playable... and some).
Of course, I assume you've measured and found the standard array based methods to be too slow (seems unlikely).

You can rotate an array in two ways :
links.unshift(links.pop()); or
links.push(links.shift());
first method solves your issue.

For any version of Javascript from ES3 forward:
links.unshift(links.pop());

Finding a polygonal approximation of a Closed Path

I'd like to be able to find the best fitting polygonal approximation of a closed path (could be any path as they're being pulled out of images) but am having issues with how to approach coding an algorithm to find it.
I can think of a naive approach: every x amount of pixels along the path, choose the best fit line for those pixels, then brute force for different starting offsets and lengths and find the one that minimizes the least-square error with the minimum amount of lines.
There's got to be something more elegant. Anyone know of anything? Also, (cringe) but this is going to be implemented in javascript unless I get really desperate, so nice libraries that do things for you are pretty much ruled out, (opencv has a polygonal fitter for instance).

D3.js1 has some adaptive resampling code that you might be able to use. There's also an illustrated description of the algorithm used (Visvalingam’s algorithm).

The Ramer–Douglas–Peucker algorithm seems appropriate here, and is simple to implement.
Note that the acceptable error is an input to this algorithm, so if you have a target number of lines you can binary-search using the error parameter to hit the target.

Grade Sudoku difficulty level

I am building a Sudoku game for fun, written in Javascript.
Everything works fine, board is generated completely with a single solution each time.
My only problem is, and this is what's keeping me from having my project released to public
is that I don't know how to grade my boards for difficulty levels. I've looked EVERYWHERE,
posted on forums, etc. I don't want to write the algorithms myself, thats not the point of this
project,and beside, they are too complex for me, as i am no mathematician.
The only thing i came close to was is this website that does grading via JSbut the problem is, the code is written in such a lousy undocumented, very ad-hoc manner,therefor cannot be borrowed...
I'll come to the point -Can anyone please point me to a place which offers a source code for Sudoku grading/rating?
Thanks
Update 22.6.11:
This is my Sudoku game, and I've implemented my own grading system which relies
on basic human logic solving techniques, so check it out.

I have considered this problem myself and the best I can do is to decide how difficult the puzzle is to solve by actually solving it and analyzing the game tree.
Initially:
Implement your solver using "human rules", not with algorithms unlikely to be used by human players. (An interesting problem in its own right.) Score each logical rule in your solver according to its difficulty for humans to use. Use values in the hundreds or larger so you have freedom to adjust the scores relative to each other.
Solve the puzzle. At each position:
Enumerate all new cells which can be logically deduced at the current game position.
The score of each deduction (completely solving one cell) is the score of the easiest rule that suffices to make that deduction.
EDIT: If more than one rule must be applied together, or one rule multiple times, to make a single deduction, track it as a single "compound" rule application. To score a compound, maybe use the minimum number of individual rule applications to solve a cell times the sum of the scores of each. (Considerably more mental effort is required for such deductions.) Calculating that minimum number of applications could be a CPU-intensive effort depending on your rules set. Any rule application that completely solves one or more cells should be rolled back before continuing to explore the position.
Exclude all deductions with a score higher than the minimum among all deductions. (The logic here is that the player will not perceive the harder ones, having perceived an easier one and taken it; and also, this promises to prune a lot of computation out of the decision process.)
The minimum score at the current position, divided by the number of "easiest" deductions (if many exist, finding one is easier) is the difficulty of that position. So if rule A is the easiest applicable rule with score 20 and can be applied in 4 cells, the position has score 5.
Choose one of the "easiest" deductions at random as your play and advance to the next game position. I suggest retaining only completely solved cells for the next position, passing no other state. This is wasteful of CPU of course, repeating computations already done, but the goal is to simulate human play.
The puzzle's overall difficulty is the sum of the scores of the positions in your path through the game tree.
EDIT: Alternative position score: Instead of completely excluding deductions using harder rules, calculate overall difficulty of each rule (or compound application) and choose the minimum. (The logic here is that if rule A has score 50 and rule B has score 400, and rule A can be applied in one cell but rule B can be applied in ten, then the position score is 40 because the player is more likely to spot one of the ten harder plays than the single easier one. But this would require you to compute all possibilities.)
EDIT: Alternative suggested by Briguy37: Include all deductions in the position score. Score each position as 1 / (1/d1 + 1/d2 + ...) where d1, d2, etc. are the individual deductions. (This basically computes "resistance to making any deduction" at a position given individual "deduction resistances" d1, d2, etc. But this would require you to compute all possibilities.)
Hopefully this scoring strategy will produce a metric for puzzles that increases as your subjective appraisal of difficulty increases. If it does not, then adjusting the scores of your rules (or your choice of heuristic from the above options) may achieve the desired correlation. Once you have achieved a consistent correlation between score and subjective experience, you should be able to judge what the numeric thresholds of "easy", "hard", etc. should be. And then you're done!

Donald Knuth studied the problem and came up with the Dancing Links algorithm for solving sudoku, and then rating the difficulty of them.
Google around, there are several implementations of the Dancing Links engine.

Perhaps you could grade the general "constrainedness" of a puzzle? Consider that a new puzzle (with only hints) might have a certain number of cells which can be determined simply by eliminating the values which it cannot contain. We could say these cells are "constrained" to a smaller number of possible values than the typical cell and the more highly constrained cells that exist the more progress one can make on the puzzle without guessing. (Here we consider the requirement for "guessing" to be what makes a puzzle hard.)
At some point, however, the player must start guessing and, again, the constrainedness of a cell is important because with fewer values to choose between for a given cell the easier it is to find the correct value (and increase the constrainedness of other cells).
Of course, I don't actually play Sudoku (I just enjoy writing games and solvers for it), so I have no idea if this is a valid metric, just thinking out loud =)

I have a simple solver that looks for only unique possibilities in rows, columns and squares. When it has solved the few cells solvable by this method, it then picks a remaining candidate tries it and sees if the simple solver then leads to either a solution or a cell empty of possibilities. In the first case the puzzle is solved, in the second, one possibility has shown to be infeasible and thus eliminated. In the third case, which leads to neither a final solution nor an infeasibility, no
deduction can be reached.
The primary result of cycling through this procedure is to eliminate possiblities until picking
a correct cell entry leads to a solution. So far this procedure has solved even the hardest
puzzles without fail. It solves without difficulty puzzles with multiple solutions. If the
trial candidates are picked a random, it will generate all possilbe solutions.
I then generate a difficulty for the puzzle based on the number of illegal candidates that must
be eliminated before the simple solver can find a solution.
I know that this is like guessing, but if simple logic can eliminated a possible candidate, then one
is closer to the final solution.
Mike

I've done this in the past.
The key is that you have to figure out which rules to use from a human logic perspective. The example you provide details a number of different human logic patterns as a list on the right-risde.
You actually need to solve the puzzle using these rules instead of computer rules (which can solve it in milliseconds using simple pattern replacement). Every time you change the board, you can start over from the 'easiest' pattern (say, single open boxes in a cell or row), and move down the chain until you find one the next logical 'rule' to use.
When scoring the sodoku, each methodology is assigned some point value, which you would add up for every field you needed to fill out. While 'single empty cell' might get a 0, 'XY Chain' might get 100. You tabulate all of the methods needed (and frequency) and you wind up with a final weighting. There are plenty of places that list expected values for those weightings, but they are all fairly empirical. You're trying to model human logic, so feel free to come up with your own weightings or enhance the system (if you really only use XY chains, the puzzle is probably easier than if it requires more advanced mechanisms).
You may also find that even though you have a unique sodoku, that it is unsolvable through human logic.
And also note that this is all far more CPU intensive than solving it in a standard, patterned way. Some years ago when I wrote my code, it was taking multiple (I forget exactly, but maybe even up to 15) seconds to solve some of the generated puzzles I'd created.

Assuming difficulty is directly proportional to the time it takes a user to solve the puzzle, here is an Artificially Intelligent solution that approaches the results of the ideal algorithm over time.
Randomly generate a fixed number of starting puzzle layouts, say 100.
Initially, offer a random difficulty section that let's a user play random puzzles from the available layouts.
Keep an average random solution time for each user. I would probably make a top 10/top X leaderboard for this to generate interest in playing random puzzles.
Keep an average solution time multiplier for each puzzle solution (if the user normally solves the puzzle in 5 minutes and solves it in 20 minutes, 4 should be figured in to the puzzles average solution time multiplier)
Once a puzzle has been played enough times to get a base difficulty for the puzzle, say 5 times, add that puzzle to your list of rated puzzles and add another randomly generated puzzle to your available puzzle layouts.
Note: You should keep the first puzzle in your random puzzles list so that you can get better and better statistics on it.
Once you have enough base-rated puzzles, say 50, allow users to access the "Rated Difficulty" portion of your application. The difficulty for each puzzle will be the average time multiplier for that puzzle.
Note: When users choose to play puzzles with rated difficulty, this should NOT affect the average random solution time or average solution time multiplier, unless you want to get into calculating weighted averages (otherwise if a user plays a lot of harder puzzles, their average time and time multipliers will be skewed).
Using the method above, a solution would be rated from 0 (already solved/no time to solve) to 1 (users will probably solve this puzzle in their average time) to 2 (users will probably take twice as long to solve this puzzle than their average time) to infinity (users will take forever to find a solution to this puzzle).

How to prevent a javascript stack overflow?

I've been building Conway's Life with javascript / jquery in order to run it in a browser Here. Chrome, Firefox and Opera or Safari do this pretty fast so preferably don't use IE for this. IE9 is ok though.
While generating the new generations of Life I am storing the previous generations in order to be able to walk back through the history. This works fine until a certain point when memory fills up, which makes the browser(tab) crash.
So my question is: how can I detect when memory is filling up? I am storing an array for each generation in an array which forms the history of generations. This takes massive amounts of memory which crashes the browser after a few thousands of generations, depending on available memory.
I am aware of the fact that javascript can't check the amount of available memory but there must be a way...

I doubt that there is a way to do it. Even if there is, it would probably be browser-specific. I can suggest a different way, though.
Instead of storing all the data for each generation, store snapshots taken every once in a while. Since the Conway's Game of Life is deterministic, you can easily re-generate future frames from a given snapshot. You'll probably want to keep a buffer of a few frames so that you can make rewinding nice and smooth.
In reality, this doesn't actually solve the problem, since you'll run out of space eventually. However, if you store every n frames, your application will last n times longer, which might just be long enough. I would recommend that you impose some hard limits on how far into the past you can rewind so that you have a cap on how much you have to store. Determine that how many frames that would be (10 minutes at 30 FPS = 18000 frames). Then, divide frames by how many frames you can store (profile various web browsers to figure this out) and that is the interval between snapshots you should use.

Dogbert pretty much nailed it. You can't know exactly how much available memory there is but you can know how potentially large your dataset will be.
So, take the size of each object stored in the array, multiply by array dimensions and that's the size of one iteration. Multiply that by the desired number of iterations to see how much space total it will take, and adjust accordingly.
Or, inspired by Travis, simply run the pattern in reverse from the last known array. It is deterministic after all.

We Keep Coding

JavaScript is the programming language of the Web.