How to identify breakpoints (trend line edges) in a dataset? - javascript

I've been doing some research to find the best approach for identifying break points (trend direction changes) in a dataset (pairs of x/y coordinates), which would allow me to identify the trend lines behind my data collections.
However, I've had no luck finding anything that sheds some light on it.
The yellow dots in the following image, represent the breakpoints I need to detect.
Any suggestion of an article, algorithm, or implementation example (TypeScript preferred) would be very helpful and appreciated.

Usually, people tend to filter the data by looking only at maximums (resistance) or only at minimums (support). A trend line could be the average of those. The breakpoints are where the data crosses the trend, but this gives a lot of false breakpoints. Because images are better than words, have a look at page 2 of http://www.meacse.org/ijcar/archives/128.pdf.
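To make that concrete, here is a rough TypeScript sketch of the idea (the swing-detection window, the Point shape and the single flat trend level are my own simplifications, not something from the paper):

```typescript
interface Point { x: number; y: number; }

// Local maxima (resistance candidates) and minima (support candidates),
// detected with a simple +/- `win` neighbourhood test.
function swingPoints(data: Point[], win = 3): { highs: Point[]; lows: Point[] } {
  const highs: Point[] = [];
  const lows: Point[] = [];
  for (let i = win; i < data.length - win; i++) {
    const vals = data.slice(i - win, i + win + 1).map(p => p.y);
    if (data[i].y === Math.max(...vals)) highs.push(data[i]);
    if (data[i].y === Math.min(...vals)) lows.push(data[i]);
  }
  return { highs, lows };
}

// Use the average of the support and resistance levels as a crude trend level,
// then report the indices where the series crosses it. Expect false positives,
// exactly as described above.
function crossings(data: Point[]): number[] {
  const { highs, lows } = swingPoints(data);
  const avg = (pts: Point[]) => pts.reduce((s, p) => s + p.y, 0) / pts.length;
  const trend = (avg(highs) + avg(lows)) / 2;
  const out: number[] = [];
  for (let i = 1; i < data.length; i++) {
    if ((data[i - 1].y - trend) * (data[i].y - trend) < 0) out.push(i);
  }
  return out;
}
```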
There are a lot of scripts available; look for "ZigZag" on
https://www.tradingview.com/
e.g. https://www.tradingview.com/script/lj8djt1n-ZigZag/ and https://www.tradingview.com/script/prH14cfo-Trend-Direction-Helper-ZigZag-and-S-R-and-HH-LL-labels/
You can also find an interesting blog post here (though the code is in Python):
https://towardsdatascience.com/programmatic-identification-of-support-resistance-trend-lines-with-python-d797a4a90530
with code available: https://pypi.org/project/trendln/

If you can identify trend lines, then can't you just identify a breakpoint as the point where the slope changes? If you can't identify trend lines, then can you, for example, take a 5-day moving average and see when that changes slope?

This might sound strange, or even controversial, but -- there are no "breakpoints". Even looking at your image, the fourth breakpoint might as well be on the local maximum immediately before its current position. So, different people might call "breakpoints" different points on the same graph (and, indeed, they do).
What you have in the numbers are several possible moving averages (calculated over varying intervals, so you might consider MA5 for a five-day average, or MA7 for a weekly average) and their first and maybe second derivatives (if you feel fancy you can experiment with third derivatives). If you plot all these lines, suitably smoothed, over your data, you will notice that the salient points of some of them roughly intersect near the "breakpoints". Those are the parameters that your brain considers when you "see" the breakpoints; it is why you see the breakpoints there, and not somewhere else.
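As a rough TypeScript sketch of the moving-average-plus-derivative idea (the window sizes and the sign-change test are arbitrary choices on my part):

```typescript
// Simple moving average over `period` samples (MA5, MA7, ...).
function sma(values: number[], period: number): number[] {
  const out: number[] = [];
  for (let i = period - 1; i < values.length; i++) {
    let sum = 0;
    for (let j = i - period + 1; j <= i; j++) sum += values[j];
    out.push(sum / period);
  }
  return out;
}

// First derivative = difference of consecutive MA values; indices where its
// sign flips are candidate "breakpoints" to compare against MA intersections.
function slopeSignChanges(ma: number[]): number[] {
  const candidates: number[] = [];
  for (let i = 2; i < ma.length; i++) {
    const prev = ma[i - 1] - ma[i - 2];
    const curr = ma[i] - ma[i - 1];
    if (prev !== 0 && curr !== 0 && Math.sign(prev) !== Math.sign(curr)) candidates.push(i);
  }
  return candidates;
}

// e.g. compare slopeSignChanges(sma(ys, 5)) against slopeSignChanges(sma(ys, 7))
```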
Another method that human vision employs to recognize features is trimming outliers: in the above calculations you discard either any value outside a given tolerance, or a fixed percentage of all values, starting from those farthest from the average. You can also experiment with not trimming values that are outliers for longer-period moving averages but not for shorter periods (this gives better responsiveness but will find more "breakpoints"). Then you run the same calculations on the remaining data.
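A small sketch of both trimming variants (the tolerance and the percentage are placeholders):

```typescript
// Drop values farther than `tolerance` from the mean.
function trimByTolerance(values: number[], tolerance: number): number[] {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  return values.filter(v => Math.abs(v - mean) <= tolerance);
}

// Drop the `fraction` of values farthest from the mean (e.g. 0.05 for 5%).
function trimByFraction(values: number[], fraction: number): number[] {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const keep = Math.max(0, Math.round(values.length * (1 - fraction)));
  return values
    .map(v => ({ v, d: Math.abs(v - mean) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, keep)
    .map(e => e.v);
}
```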
Finally you can attribute a "breakpoint score" based on weighing the distance from nearby salient points. Then, choose a desired breakpoint distance and call "breakpoint" the highest scoring point in that interval; repeat for all subsequent breakpoints. Again, you may want to experiment with different intervals. This allows for a conveniently paced breakpoint set.
And, finally, you will probably notice that different kinds of signal sources have different "best" breakpoint parameters, so there is no one "One-Size-Fits-All" parameter set.
If you're building an interface to display data, leaving the control of these parameters to the user might be a good idea.

Related

Smooth all data-points in a chart in nodejs/javascript

I have a chart whose line I wish to be able to make as smooth as possible. The line should still keep the overall pattern and stay as close to the original line as possible - but I need to be able to smooth all "bumps" 100% away / to the degree I wish.
When I say "100% smooth" - I mean something like this (try to draw a curved line in the square): http://soswow.github.io/fit-curve/demo/
The line must only go up or down (while the main trend is up/down-wards) - e.g. like a sine curve. Now imagine you added a lot of noise/bumps of different sizes/frequencies to the sine curve - but you would like to "restore" the curve without changing its overall pattern. That is exactly my need. The ideal: being able to filter away exactly the selected level of noise/frequency I wish to remove from the main trend.
SMA is lagging in nature and I need something which is a lot closer to the actual data-points in time.
I know the lagging nature of SMA is normally accepted - but I don't accept it ;) I strongly believe it is possible to do better than that :) DMA can shift the data points themselves - but it has no effect on the data-point info in real time, which is what I'm looking for as well... I know I have to hack/compensate - and I can also come up with 100s of ways myself (mixing all the algorithms I know, running them multiple times, etc.). But I guess someone out there is way smarter than me and has already solved this - and I would definitely wonder if a standard algorithm for exactly this issue doesn't already exist.
I have looked into many different algorithms - but none of them worked satisfyingly (Moving Averages, Median, polynomial regression, Savitzky-Golay etc.). Either the result is still way too "bumpy" and "pixelated", or it becomes too laggy again.
Lastly I have found cubic and quadratic Bezier curves, which seem pretty interesting, but I don't know how to apply them to all my data points, and I can't find a suitable NPM package (I can only find libraries like this: https://www.npmjs.com/package/bezier-easing, which only takes one data point, which is not what I'm looking for).
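For what it's worth, the demo linked above appears to be built on the fit-curve package, which (if I read its docs correctly) takes a whole array of [x, y] points rather than a single one; a sketch under that assumption:

```typescript
// npm install fit-curve
// Assumption: fitCurve(points, maxError) takes ALL the points at once (an array
// of [x, y] pairs) and returns cubic Bezier segments [start, ctrl1, ctrl2, end].
import fitCurve from "fit-curve";

const ys = [3, 4, 6, 5, 7, 9, 8, 11, 10, 13];                // noisy series
const points: [number, number][] = ys.map((y, i) => [i, y]);
const maxError = 10; // bigger error tolerance = smoother curve, fewer segments

const segments = fitCurve(points, maxError);
console.log(segments.length, "Bezier segments");
// Each segment maps directly onto a canvas bezierCurveTo(c1x, c1y, c2x, c2y, ex, ey) call.
```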
Savitzky-Golay is better than a regular MA - but I still believe it lags too much once it is as smooth as I consider acceptable.
The task is pre-processing and noise reduction of temperature, price and similar charts in real time before the data is handed over to an AI which looks for abnormalities (too much noise seems to confuse the AI and is also unnecessary for the most part). The example with the drawing was only an example - just like my mentioning a "sine curve" (to illustrate my point). The chart is in general very arbitrary and doesn't follow any pre-defined patterns.
I would like to emphasize again that the primary requirement for the selected algorithm/procedure is that it generates a chart line which keeps the lag behind the main chart's overall trend to an absolute minimum, and at the same time makes it possible to adjust the level at which the noise reduction should take place :-)
I have also made this small drawing in Paint - just so you can easily understand my point :-) screencast.com/t/jFq2sCAOu The algorithm should remove and replace all instances/areas of a given chart which match the selected frequency - the drawing only shows one of each, but normally there would be many different areas of the chart with the same level of noise.
Please let me know if all this makes sense to you guys - otherwise please pinpoint what I need to elaborate on.
All help, ideas and suggestions are highly appreciated.

Find the closest coordinate from a set of coordinates

I have a set of about 1000 geographical coordinates (lat, long).
Given one coordinate, I want to find the closest one from that set. My approach was to measure the distance to each, but at hundreds of requests per second that can be a little rough on the server doing all that math.
What is the best optimized solution for this?
Thanks
You will want to use the 'Nearest Neighbor Algorithm'.
You can use the sphere-knn library, or look at something like PostGIS.
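A minimal sketch of the sphere-knn route, assuming the sphereKnn(points) / lookup(lat, lon, maxResults, maxDistanceMeters) usage from its README; the place list is made up:

```typescript
// npm install sphere-knn   (no bundled typings, hence the plain require)
const sphereKnn = require("sphere-knn");

const places = [
  { lat: 52.5200, lon: 13.4050, name: "Berlin" },
  { lat: 48.8566, lon: 2.3522,  name: "Paris"  },
  { lat: 51.5074, lon: -0.1278, name: "London" },
];

// Build the index once at startup, then reuse the lookup for every request.
const lookup = sphereKnn(places);

// Closest single place to a query point (near Frankfurt here).
const nearest = lookup(50.11, 8.68, 1);
console.log(nearest[0]?.name);
```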
Why not select the potentially closest points from the set first (e.g. set a threshold, say 0.1, and filter the set so that you keep only points within +-0.1 of your target point on both axes)? Then do the actual calculations on this subset.
If none are within the first range, just enlarge it (0.2) and repeat (0.3, 0.4...) until you've got a match. Obviously you would tune the threshold so it best matches your likely results.
(I'm assuming the time-consuming bit is the actual distance calculation, so the idea is to limit the number of calculations.)
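A sketch of that two-step approach in TypeScript (the widening schedule and the haversine distance are my choices; note the box filter is the heuristic described above, not an exact nearest-neighbour guarantee):

```typescript
interface Coord { lat: number; lon: number; }

// Haversine distance in kilometres - the "expensive" part we want to run on as
// few candidates as possible.
function haversineKm(a: Coord, b: Coord): number {
  const R = 6371;
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Cheap box filter first, widening the box (0.1, 0.2, 0.3, ...) until it
// catches at least one candidate, then do the real distance maths on those.
function closest(target: Coord, set: Coord[]): Coord | undefined {
  for (let box = 0.1; box <= 180; box += 0.1) {
    const candidates = set.filter(
      c => Math.abs(c.lat - target.lat) <= box && Math.abs(c.lon - target.lon) <= box
    );
    if (candidates.length > 0) {
      return candidates.reduce((best, c) =>
        haversineKm(target, c) < haversineKm(target, best) ? c : best
      );
    }
  }
  return undefined;
}
```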
An Algorithmic Response
Your approach is already O(n) in time. It's algorithmically very fast, and fairly simple to implement.
If that is not enough, you should consider taking a look at R-trees. The idea behind R-trees is roughly paraphrased as follows:
You already have a set of n elements. You can preprocess this data to form rough 'squares' of regions each containing a set of points, with an established boundary.
Now say a new element comes in. Instead of comparing across every coordinate, you identify which 'square' it belongs in by just comparing whether the point is smaller than the boundaries, and then measure the distance with only the points inside that square.
You can see at once the benefits:
You are no longer comparing against all coordinates, but instead only the boundaries (strictly less than the number of all elements) and then against the number of coordinates within your chosen boundary (also less than the number of all elements).
The worst case of such an algorithm is still O(n) time, but the average case can be closer to O(log n).
The main improvement is mostly in the pre-processing step (which is 'free' in that it's a one-time cost) and in the reduced number of comparisons needed.
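The 'squares of regions' idea can be sketched with a plain fixed-size grid index (a real R-tree, e.g. the rbush library, adapts the regions to the data instead; the cell size here is arbitrary):

```typescript
interface Pt { lat: number; lon: number; }

// Pre-processing: bucket every point into a fixed-size cell once.
class GridIndex {
  private cells = new Map<string, Pt[]>();

  constructor(points: Pt[], private cellSize = 1) {
    for (const p of points) {
      const key = this.key(p.lat, p.lon);
      const bucket = this.cells.get(key);
      if (bucket) bucket.push(p);
      else this.cells.set(key, [p]);
    }
  }

  private key(lat: number, lon: number): string {
    return `${Math.floor(lat / this.cellSize)}:${Math.floor(lon / this.cellSize)}`;
  }

  // Query: compare only against the target's cell and its 8 neighbours,
  // instead of against every point in the data set.
  candidates(lat: number, lon: number): Pt[] {
    const ci = Math.floor(lat / this.cellSize);
    const cj = Math.floor(lon / this.cellSize);
    const out: Pt[] = [];
    for (let di = -1; di <= 1; di++) {
      for (let dj = -1; dj <= 1; dj++) {
        out.push(...(this.cells.get(`${ci + di}:${cj + dj}`) ?? []));
      }
    }
    return out;
  }
}
```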
A Systemic Response
Just buy another server, and distribute the requests and the elements using a load balancer such as HAProxy.
Servers are fairly cheap, especially if they are critical to your business, and if you want to be fast, it's an easy way to scale.

Best approach for collision detection with HTML5 and JavaScript?

I'm trying to make a little platform game with pure HTML5 and JavaScript. No frameworks.
So in order to make my character jump on top of enemies and floors/walls etc., it needs some proper collision detection algorithms.
Since I'm not usually into doing this, I really have no clue how to approach the problem.
Should I re-check in every frame (it runs at 30 FPS) for all obstacles in the canvas and see whether they collide with my player, or is there a better and faster way to do so?
I even thought of making dynamic maps, so the width, height, and x and y coordinates of each obstacle are stored in an object. Would that make it faster to check whether it's colliding with the player?
1. Should I re-check in every frame (it runs at 30 FPS)?
Who says it runs at 30 FPS? There is no such thing in the HTML5 specification. The closest you'll get to having any say over the framerate at all is to programmatically call setInterval or the newer, preferred requestAnimationFrame function.
However, back to the story. You should always check for collisions as often as you can. When writing games on other platforms, where one has a greater ability to measure CPU load, collision frequency is one of those things you might scale back if the CPU has a hard time keeping up. In JavaScript, though, you're out of luck trying to implement advanced solutions like that one.
I don't think there's a shortcut here. The computer has no way of knowing what collided, how, when and where, if you don't make that computation yourself. And yes, this is usually, if not always, done just before each new frame is painted.
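A minimal requestAnimationFrame loop sketch along those lines (the update/check/draw steps are placeholders):

```typescript
// Collision checks run once per frame, right before the paint, instead of on a
// fixed 30 FPS timer.
let last = performance.now();

function frame(now: number): void {
  const dt = (now - last) / 1000; // seconds since the previous frame
  last = now;

  // 1. advance all object positions by dt
  // 2. check the relevant object pairs for collisions (see the tips below)
  // 3. draw the scene to the canvas

  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```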
2. A dynamic map?
If by "map" you mean an array-like object or multidimensional array that maps coordinates to objects, then the short answer has to be no. But please do have an array of all objects on the scene. The width, height and coordinates of the object should be stored in variables in the object. Leaking these things would quickly become a burden; rendering the code complex and introduce bugs (please see separation of concerns and cohesion).
Do note that I just said "array of all objects on the scene" =) There is a subtle but most important point in this quote:
Whenever you walk through the objects to determine their positions and whether they have collided with something, also have a look at your viewport boundaries and determine whether each object is still "on the scene" or not. For instance, if you have a spacecraft simulator of some kind and a star has just passed the player's viewport from one side to the other and then off the screen, and there is no way for the star to return and become visible again, then there is no reason for the star to be left in the system any more. It should be deleted and removed. It should definitely not stay in the array and become part of a future collision check with the player's avatar! Such things can dramatically slow down your game.
Bonus: Collision quick tips
Divide the screen into parts. There is no reason for you to look for a collision between two objects if one of them is on the left side of the screen and the other one is on the right side. You could split up the screen into more logical units than just left and right, too.
Always strive to have a cheap computation made first. We kind of already did that in the last tip. But even if you now know that two objects just might be in collision with each other, draw two logical squares around your objects. For instance, say you have two 2D airplanes: there is no reason for you to first check whether some part of their wings collides. Draw a square around each airplane, effectively capturing its largest width and largest height. If these two squares do not overlap, then just like in the last tip, you know they cannot be in collision with each other. But if your first-phase cheap computation hinted that they might be in collision, pass those two airplanes to another, more expensive computation to really look into the matter a bit more.
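A sketch of those two cheap phases in TypeScript (the left/right split and the bounding boxes; the Box shape is assumed):

```typescript
interface Box { x: number; y: number; w: number; h: number; }

// Phase 1 (cheapest): objects living entirely in different halves of the
// screen cannot possibly collide, so skip the pair.
function sameHalf(a: Box, b: Box, screenWidth: number): boolean {
  const half = screenWidth / 2;
  const aLeft = a.x + a.w <= half, bLeft = b.x + b.w <= half;
  const aRight = a.x >= half,      bRight = b.x >= half;
  return !((aLeft && bRight) || (aRight && bLeft));
}

// Phase 2 (still cheap): the "square around each airplane" test - axis-aligned
// bounding boxes.
function boxesOverlap(a: Box, b: Box): boolean {
  return a.x < b.x + b.w && a.x + a.w > b.x && a.y < b.y + b.h && a.y + a.h > b.y;
}

// Only if both cheap tests pass would you hand the pair to the expensive,
// wing-level (or pixel-level) check.
function mightCollide(a: Box, b: Box, screenWidth: number): boolean {
  return sameHalf(a, b, screenWidth) && boxesOverlap(a, b);
}
```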
I am still working on something where I wanted to make lots of divs and have them act under physics. I will share some things that weren't obvious to me at first.
Detect collisions in the data first. I was reading the x and y of boxes on screen and then checking them against other divs. After a week it occurred to me how wasteful this was: first I would assign a new value to a div, then read it back from the div. Accessing divs is expensive. Think of the DOM as a rendering stage (see the sketch after these tips).
Use web workers if it's reasonably easy to do so.
Use canvas if possible.
And if possible, make elements carry a list of the elements they should be checked against for collision (this is only helpful in certain cases).
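The sketch mentioned in the first tip - positions and sizes live in plain objects, and the DOM is only written to once per frame (the .box selector and transform-based rendering are assumptions):

```typescript
// Positions and sizes live in plain objects; the DOM is never read inside the loop.
interface BoxState { el: HTMLElement; x: number; y: number; size: number; }

const boxes: BoxState[] = Array.from(document.querySelectorAll<HTMLElement>(".box"))
  .map(el => ({ el, x: 0, y: 0, size: el.offsetWidth })); // one-time read, before the loop

function step(): void {
  // 1. update x/y in the data objects (your physics)
  // 2. run the collision checks against the data only - optionally just against
  //    each box's own candidate list, as in the last tip
  // 3. write the results back to the DOM once per box
  for (const b of boxes) {
    b.el.style.transform = `translate(${b.x}px, ${b.y}px)`;
  }
  requestAnimationFrame(step);
}
requestAnimationFrame(step);
```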
I learned that interactive collisions are way more expensive, because you have to check for changes in the environment; in a non-interactive animation you can simulate what is going to happen in the future, so the animation is more fluid and more CPU is available.
I made something at a very, very early stage just for fun: http://www.lastnoob.com/

Grade Sudoku difficulty level

I am building a Sudoku game for fun, written in Javascript.
Everything works fine: the board is generated completely, with a single solution each time.
My only problem - and this is what's keeping me from releasing my project to the public - is that I don't know how to grade my boards by difficulty level. I've looked EVERYWHERE, posted on forums, etc. I don't want to write the algorithms myself; that's not the point of this project, and besides, they are too complex for me, as I am no mathematician.
The only thing I came close to was this website that does grading via JS, but the problem is that the code is written in such a lousy, undocumented, ad-hoc manner that it cannot be borrowed...
I'll come to the point - can anyone please point me to a place which offers source code for Sudoku grading/rating?
Thanks
Update 22.6.11:
This is my Sudoku game, and I've implemented my own grading system which relies
on basic human logic solving techniques, so check it out.
I have considered this problem myself and the best I can do is to decide how difficult the puzzle is to solve by actually solving it and analyzing the game tree.
Initially:
Implement your solver using "human rules", not with algorithms unlikely to be used by human players. (An interesting problem in its own right.) Score each logical rule in your solver according to its difficulty for humans to use. Use values in the hundreds or larger so you have freedom to adjust the scores relative to each other.
Solve the puzzle. At each position:
Enumerate all new cells which can be logically deduced at the current game position.
The score of each deduction (completely solving one cell) is the score of the easiest rule that suffices to make that deduction.
EDIT: If more than one rule must be applied together, or one rule multiple times, to make a single deduction, track it as a single "compound" rule application. To score a compound, maybe use the minimum number of individual rule applications to solve a cell times the sum of the scores of each. (Considerably more mental effort is required for such deductions.) Calculating that minimum number of applications could be a CPU-intensive effort depending on your rules set. Any rule application that completely solves one or more cells should be rolled back before continuing to explore the position.
Exclude all deductions with a score higher than the minimum among all deductions. (The logic here is that the player will not perceive the harder ones, having perceived an easier one and taken it; and also, this promises to prune a lot of computation out of the decision process.)
The minimum score at the current position, divided by the number of "easiest" deductions (if many exist, finding one is easier) is the difficulty of that position. So if rule A is the easiest applicable rule with score 20 and can be applied in 4 cells, the position has score 5.
Choose one of the "easiest" deductions at random as your play and advance to the next game position. I suggest retaining only completely solved cells for the next position, passing no other state. This is wasteful of CPU of course, repeating computations already done, but the goal is to simulate human play.
The puzzle's overall difficulty is the sum of the scores of the positions in your path through the game tree.
EDIT: Alternative position score: Instead of completely excluding deductions using harder rules, calculate overall difficulty of each rule (or compound application) and choose the minimum. (The logic here is that if rule A has score 50 and rule B has score 400, and rule A can be applied in one cell but rule B can be applied in ten, then the position score is 40 because the player is more likely to spot one of the ten harder plays than the single easier one. But this would require you to compute all possibilities.)
EDIT: Alternative suggested by Briguy37: Include all deductions in the position score. Score each position as 1 / (1/d1 + 1/d2 + ...) where d1, d2, etc. are the individual deductions. (This basically computes "resistance to making any deduction" at a position given individual "deduction resistances" d1, d2, etc. But this would require you to compute all possibilities.)
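For illustration, a small TypeScript sketch of the two position-score heuristics above (the deduction scores d[i] are whatever your rule scoring produces):

```typescript
// d = difficulty scores of the deductions available at the current position.

// Heuristic from the steps above: the easiest applicable rule, divided by how
// many cells it applies to (rule A, score 20, applicable in 4 cells -> 5).
function positionScoreEasiest(d: number[]): number {
  const min = Math.min(...d);
  const count = d.filter(x => x === min).length;
  return min / count;
}

// Briguy37's alternative: the position's "resistance to making any deduction",
// 1 / (1/d1 + 1/d2 + ...).
function positionScoreResistance(d: number[]): number {
  return 1 / d.reduce((sum, di) => sum + 1 / di, 0);
}

// The puzzle's overall difficulty is then the sum of the chosen position score
// along the path through the game tree.
```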
Hopefully this scoring strategy will produce a metric for puzzles that increases as your subjective appraisal of difficulty increases. If it does not, then adjusting the scores of your rules (or your choice of heuristic from the above options) may achieve the desired correlation. Once you have achieved a consistent correlation between score and subjective experience, you should be able to judge what the numeric thresholds of "easy", "hard", etc. should be. And then you're done!
Donald Knuth studied the problem and came up with the Dancing Links algorithm for solving Sudoku, which can then be used for rating its difficulty.
Google around; there are several implementations of the Dancing Links engine.
Perhaps you could grade the general "constrainedness" of a puzzle? Consider that a new puzzle (with only hints) might have a certain number of cells which can be determined simply by eliminating the values which it cannot contain. We could say these cells are "constrained" to a smaller number of possible values than the typical cell and the more highly constrained cells that exist the more progress one can make on the puzzle without guessing. (Here we consider the requirement for "guessing" to be what makes a puzzle hard.)
At some point, however, the player must start guessing and, again, the constrainedness of a cell is important because with fewer values to choose between for a given cell the easier it is to find the correct value (and increase the constrainedness of other cells).
Of course, I don't actually play Sudoku (I just enjoy writing games and solvers for it), so I have no idea if this is a valid metric, just thinking out loud =)
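A small sketch of that "constrainedness" idea - counting candidates per empty cell and averaging (the averaging is just one possible way to aggregate it):

```typescript
// board[r][c] = 1..9 for a filled cell, 0 for an empty one.
type Board = number[][];

// Values still possible for an empty cell after eliminating its row, column
// and 3x3 box - the fewer candidates, the more "constrained" the cell.
function candidates(board: Board, r: number, c: number): number[] {
  const used = new Set<number>();
  for (let i = 0; i < 9; i++) {
    used.add(board[r][i]);
    used.add(board[i][c]);
  }
  const br = 3 * Math.floor(r / 3), bc = 3 * Math.floor(c / 3);
  for (let i = br; i < br + 3; i++)
    for (let j = bc; j < bc + 3; j++) used.add(board[i][j]);
  return [1, 2, 3, 4, 5, 6, 7, 8, 9].filter(v => !used.has(v));
}

// One possible constrainedness measure: the average candidate count over all
// empty cells (lower = more constrained = more progress without guessing).
function constrainedness(board: Board): number {
  let total = 0, empty = 0;
  for (let r = 0; r < 9; r++)
    for (let c = 0; c < 9; c++)
      if (board[r][c] === 0) { total += candidates(board, r, c).length; empty++; }
  return empty ? total / empty : 0;
}
```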
I have a simple solver that looks only for unique possibilities in rows, columns and squares. When it has solved the few cells solvable by this method, it picks a remaining candidate, tries it, and sees whether the simple solver then leads to either a solution or a cell with no remaining possibilities. In the first case the puzzle is solved; in the second, one possibility has been shown to be infeasible and is thus eliminated. In the third case, which leads to neither a final solution nor an infeasibility, no deduction can be reached.
The primary result of cycling through this procedure is to eliminate possibilities until picking a correct cell entry leads to a solution. So far this procedure has solved even the hardest puzzles without fail. It solves puzzles with multiple solutions without difficulty; if the trial candidates are picked at random, it will generate all possible solutions.
I then generate a difficulty for the puzzle based on the number of illegal candidates that must be eliminated before the simple solver can find a solution.
I know that this is like guessing, but if simple logic can eliminate a possible candidate, then one is closer to the final solution.
Mike
I've done this in the past.
The key is that you have to figure out which rules to use from a human-logic perspective. The example you provide details a number of different human logic patterns as a list on the right-hand side.
You actually need to solve the puzzle using these rules instead of computer rules (which can solve it in milliseconds using simple pattern replacement). Every time you change the board, you can start over from the 'easiest' pattern (say, single open boxes in a cell or row), and move down the chain until you find the next logical 'rule' to use.
When scoring the Sudoku, each methodology is assigned some point value, which you would add up for every field you needed to fill out. While 'single empty cell' might get a 0, 'XY Chain' might get 100. You tabulate all of the methods needed (and their frequency) and you wind up with a final weighting. There are plenty of places that list expected values for those weightings, but they are all fairly empirical. You're trying to model human logic, so feel free to come up with your own weightings or enhance the system (if a puzzle really only uses XY chains, it is probably easier than one that requires more advanced mechanisms).
You may also find that even though you have a unique Sudoku, it is unsolvable through human logic.
And also note that this is all far more CPU intensive than solving it in a standard, patterned way. Some years ago when I wrote my code, it was taking multiple (I forget exactly, but maybe even up to 15) seconds to solve some of the generated puzzles I'd created.
Assuming difficulty is directly proportional to the time it takes a user to solve the puzzle, here is an Artificially Intelligent solution that approaches the results of the ideal algorithm over time.
Randomly generate a fixed number of starting puzzle layouts, say 100.
Initially, offer a random difficulty section that lets a user play random puzzles from the available layouts.
Keep an average random solution time for each user. I would probably make a top 10/top X leaderboard for this to generate interest in playing random puzzles.
Keep an average solution time multiplier for each puzzle solution (if the user normally solves a puzzle in 5 minutes and solves this one in 20 minutes, a 4 should be figured into the puzzle's average solution time multiplier).
Once a puzzle has been played enough times to get a base difficulty for the puzzle, say 5 times, add that puzzle to your list of rated puzzles and add another randomly generated puzzle to your available puzzle layouts.
Note: You should keep the first puzzle in your random puzzles list so that you can get better and better statistics on it.
Once you have enough base-rated puzzles, say 50, allow users to access the "Rated Difficulty" portion of your application. The difficulty for each puzzle will be the average time multiplier for that puzzle.
Note: When users choose to play puzzles with rated difficulty, this should NOT affect the average random solution time or average solution time multiplier, unless you want to get into calculating weighted averages (otherwise if a user plays a lot of harder puzzles, their average time and time multipliers will be skewed).
Using the method above, a solution would be rated from 0 (already solved/no time to solve) to 1 (users will probably solve this puzzle in their average time) to 2 (users will probably take twice as long to solve this puzzle than their average time) to infinity (users will take forever to find a solution to this puzzle).
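A rough sketch of the bookkeeping (ignoring the weighted-average caveat from the note above; Solve, userId, etc. are made-up names):

```typescript
interface Solve { userId: string; puzzleId: string; seconds: number; }

// Average random-puzzle solution time per user (step 3 above).
function userAverages(randomSolves: Solve[]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const s of randomSolves) {
    const e = sums.get(s.userId) ?? { total: 0, n: 0 };
    e.total += s.seconds;
    e.n++;
    sums.set(s.userId, e);
  }
  return new Map([...sums].map(([user, e]) => [user, e.total / e.n]));
}

// Rated difficulty of one puzzle = average of (time taken / that user's average
// time): 20 minutes against a 5-minute average contributes a multiplier of 4.
function puzzleDifficulty(puzzleId: string, randomSolves: Solve[]): number {
  const avg = userAverages(randomSolves);
  const multipliers = randomSolves
    .filter(s => s.puzzleId === puzzleId)
    .map(s => s.seconds / (avg.get(s.userId) ?? s.seconds));
  return multipliers.reduce((a, b) => a + b, 0) / multipliers.length;
}
```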

Algorithm problem: Packing rods into a row

Alright, this might be a tricky problem. It is actually an analogy for another similar problem relating to my actual application, but I've simplified it into this hypothetical problem for clarity. Here goes:
I have a line of rods that I need to sort. Because it is a line, only one dimension needs to be of concern.
Rods are different lengths and different weights. There is no correlation between weight and length. A small rod can be extremely heavy, while a large rod can be very light.
The rods need to be sorted by weight.
The real catch, however, is that some rods can be placed no more than a certain distance from the start of the line, regardless of their weight. Anywhere before that is fine, though.
No guarantee is given that constraints will be spaced enough away from each other to prevent the possibility of constrained rods being squeezed into overlapping. In this (hopefully rare) case, either the rods need to be re-arranged somehow within their constraints to create the needed space, or an ideal compromise solution may need to be found (such as violating a constraint of the least light rod, for example).
It is possible that at a future date additional constraints may be added, in addition to the distance constraint, to indicate specific (and even non-compromising) boundaries within the line that rods cannot overlap into.
My current solution does not account for the latter situations, and they sound like they'll involve some complex work to resolve them.
Note that this is for a client-side web application, so making the solution apply to Javascript would be helpful!
If it is possible, I'd suggest formulating this as a mixed integer program. If you can encode the constraints in this way, you can use a solver to satisfy them.
See this page for some more info on this type of approach:
http://en.wikipedia.org/wiki/Linear_programming
If you can interface this to Javascript somehow then it might prove to be an elegant solution.
At first, I tried to approach this as a sorting problem. But I think it is better to think of it as an optimization problem. Let me try to formalize the problem. Given:
w_i: weight of rod i
l_i: length of rod i
m_i: maximum distance of rod i from the origin. If there is no constraint, you can set this value to sum(i=1..n) l_i
The problem is to find a permutation a_i such that the cost function:
J = sum(i=1..n) w_{a_i} * sum(j=1..i-1) l_{a_j}
is minimized and the constraints:
sum(j=1..i-1) l_{a_j} <= m_{a_i}, 1 <= i <= n
are satisfied.
I am not sure this is a correct formulation, though. Without any constraints, the optimal solution is not always the rods sorted by weight. For example, let l={1,4}, and w={1,3}. If a={1,2}, then J is 1*0+3*1=3, and if a={2,1} (sorted by weight), J is 3*0+1*4=4. Clearly, the unsorted solution minimizes the cost function, but I am not sure this is what you want.
Also, I don't know how to solve the problem yet. You could try a heuristic search of some kind in the short term. I am writing this reformulation so that someone else can provide a solution while I think more about the solution. If it is correct, of course.
Another thing to note is that you don't have to find the complete solution to see if there is a solution. You can ignore the rods without position constraints, and try to solve the problem with only the constrained rods. If there is a solution to this, then the problem does have a solution (an obvious suboptimal solution is to sort the unconstrained rods, and append them to the solution of the reduced problem).
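A sketch of that reduced feasibility check, assuming each constraint bounds a rod's start position; packing the constrained rods in order of earliest "deadline" (maxStart + length) is the standard scheduling argument for why this test is sufficient:

```typescript
interface Rod { length: number; maxStart: number; } // maxStart = m_i

// Place only the constrained rods, earliest deadline (maxStart + length) first,
// packed back-to-back from the origin. If even this packing violates some
// constraint, no ordering can satisfy them all, so the full problem has no
// exact solution either.
function constrainedRodsFeasible(rods: Rod[]): boolean {
  const byDeadline = [...rods].sort(
    (a, b) => (a.maxStart + a.length) - (b.maxStart + b.length)
  );
  let position = 0; // start position of the next rod on the line
  for (const rod of byDeadline) {
    if (position > rod.maxStart) return false; // rod would start too far out
    position += rod.length;
  }
  return true;
}
```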
After saying all this, I think the algorithm below would do the trick. I will describe it a bit visually to make it easier to understand. The idea is to place rods on a line segment from left to right (origin is the leftmost point of the line segment) as per your problem description.
Separate out the rods with position constraints on them. Then, place them such that they are at the limit of their constrained positions.
If there are no overlapping rods, goto step 4
For each overlapping pair of rods, move the one closer to origin towards the origin so that they are no longer overlapping. This step may require other rods on the line to be shifted towards the origin to open up some space. You detect this by checking if the moved rod now overlaps with the one just to the left of it. If you cannot create enough space (moving the rod closest to origin to 0 still doesn't free up enough space), then there is no solution to the problem. Here, you have the opportunity to find a solution by relaxing the constraint on the rightmost rod of the original overlapping pair: just move it away from origin until there is no overlap (you may need to push preceding rods right until all overlaps are fixed before you do this).
Now, we have some rods placed, and some free spaces around them. Start filling up the free space with the heaviest rods (including the ones with constraints which are to the right of the free space) that would fit in it. If you cannot find any rods that would fit, simply shift the next rod on the right of the free space to close the gap.
Repeat step 4 until you reach the rightmost constrained rod. The remaining line segment is all free space.
Sort all left over rods by weight, and place them in the remaining free space.
A few notes about the algorithm:
It doesn't solve the problem I stated earlier. It tries to sort the rods according to their weights only.
I think there are some lost opportunities to do better, because we slide some rods towards the origin to make them all fit (in step 3), and sometimes pick the heavy rods from these "squeezed in" places, and put them closer to origin (in step 4). This frees up some room, but we don't slide the pushed away rods back to the limits of their constrained positions. It may be possible to do this, but I will have to revise the algorithm when my brain is working better.
It is not a terribly efficient algorithm. I have a feeling that it can be done in O(n^2), but anything better would require creative data structures. You need to be able to find the heaviest rod with length less than a given L faster than O(n) to do better.
I am not very good at solving algorithm problems, but here goes my attempt:
Relate this to a knapsack problem. Instead of the return cost or value of a box, assign a higher value to the boxes that have a tighter limit on how far from the start they may go - something like trying to pack everything closer to the starting point, rather than into a knapsack as in the knapsack problem.
As far as the future modifications are concerned, I believe similar constraints would only require a change to the return value or cost of the box.
I'm 99% certain this can be cast as an integer knapsack problem with an extra constraint which, I think, can be accommodated by first considering the rods with the distance-from-start condition.
Here's a link to an explanation of the knapsack problem: http://www.g12.cs.mu.oz.au/wiki/doku.php?id=simple_knapsack
