Suppose you are given a bunch of points as (x,y) values and you need to generate new points by linearly interpolating between the 2 nearest values on the x axis. What is the fastest implementation to do so?
I searched around but was unable to find a satisfactory answer; I feel it's because I wasn't searching for the right words.
For example, if I was given (0, 0), (0.5, 1), (1, 0.5) and I want the value at 0.7, it would be (0.7-0.5)/(1-0.5) * (0.5-1) + 1. But what data structure would allow me to find the 2 nearest key values to interpolate between? Is a simple linear search (or binary search if I have many key values) the best I can do?
The way I usually implement O(1) interpolation is by means of an additional data structure, which I call an IntervalSelector, that in time O(1) gives the two surrounding values of the sequence that have to be interpolated between.
An IntervalSelector is a class that, when given a sequence of n abscissas, builds and remembers a table that maps any given value x to the index i such that sequence[i] <= x < sequence[i+1] in time O(1).
Note: In what follows arrays are 1 based.
The algorithm that builds the table proceeds as follows:
Find delta to be the minimum distance between two consecutive elements in the input sequence of abscissas.
Set count := (b-a)/delta + 1, where a and b are respectively the first and last of the (ascending) sequence and / stands for the integer quotient of the division.
Define table to be an Array of count elements.
For j between 1 and n, set table[(sequence[j]-a)/delta + 1] := j.
Copy every entry of table that was filled in the previous step to the unvisited positions that come right after it.
On output, table maps j to i if (j-1)*delta <= sequence[i] - a < j*delta.
Here is an example: suppose the 3rd and 4th elements are the closest ones, so we divide the interval into subintervals of this smallest length. Now, we remember in the table the sequence position associated with the left end of each of these delta-intervals. Later on, when an input x is given, we compute the delta-interval of that x as (x-a)/delta + 1 and use the table to deduce the corresponding interval in the sequence. If x falls to the left of the i-th sequence element, we choose the (i-1)-th.
More precisely:
Given any input x between a and b, calculate j := (x-a)/delta + 1 and i := table[j]. If x < sequence[i], put i := i - 1. Then the index i satisfies sequence[i] <= x < sequence[i+1]; otherwise the distance between these two consecutive elements would be smaller than delta, which it is not.
Remark: Be aware that if the minimum distance delta between consecutive elements in sequence is too small the table will have too many entries. The simple description I've presented here ignores these pathological cases, which require additional work.
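For concreteness, here is a rough JavaScript sketch of the scheme described above (using 0-based arrays and ignoring the pathological cases from the remark); buildSelector and selectInterval are just illustrative names, not part of any library:

// Preprocess the sorted abscissas xs once; afterwards selectInterval
// finds i with xs[i] <= x < xs[i+1] in O(1).
function buildSelector(xs) {
    let delta = Infinity;
    for (let i = 1; i < xs.length; i++) {
        delta = Math.min(delta, xs[i] - xs[i - 1]);
    }
    const a = xs[0];
    const count = Math.floor((xs[xs.length - 1] - a) / delta) + 1;
    const table = new Array(count);
    for (let j = 0; j < xs.length; j++) {
        table[Math.floor((xs[j] - a) / delta)] = j;
    }
    // Propagate each filled entry forward into the unfilled slots after it.
    for (let k = 1; k < count; k++) {
        if (table[k] === undefined) table[k] = table[k - 1];
    }
    return { xs, a, delta, table };
}

function selectInterval(sel, x) {
    let i = sel.table[Math.floor((x - sel.a) / sel.delta)];
    if (x < sel.xs[i]) i--;   // x lies before the element stored for this bucket
    return i;                 // xs[i] <= x < xs[i+1]
}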
Yes, a simple binary search should do well and will typically suffice.
If you need to do better, you might try interpolation search (which has nothing to do with your value interpolation).
If your points are distributed at fixed intervals (like in your example, 0 0.5 1), you can also simply store the values in an array and access them in constant time via their index.
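For reference, a minimal sketch of the binary-search route, assuming points is an array of [x, y] pairs sorted by x and the query lies inside the x range:

function interpolate(points, x) {
    let lo = 0, hi = points.length - 1;
    // Find the last point whose x is <= the query.
    while (lo < hi) {
        const mid = (lo + hi + 1) >> 1;
        if (points[mid][0] <= x) lo = mid;
        else hi = mid - 1;
    }
    if (lo === points.length - 1) return points[lo][1]; // query at the last point
    const [x0, y0] = points[lo];
    const [x1, y1] = points[lo + 1];
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0);
}

console.log(interpolate([[0, 0], [0.5, 1], [1, 0.5]], 0.7)); // 0.8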
I was trying to solve a 2D matrix problem, and stumbled upon the matrix-array conversion formula:
r * c matrix convert to an array => matrix[x][y] => arr[x * c + y]
an array convert to r * c matrix => arr[x] =>matrix[x / c][x % c]
But I can't seem to understand why it works like this. Can someone explain how the conversion works? I mean, why do the conversions manipulate the columns (c)? And why use division and remainder ([x / c][x % c])? And why x * c + y?
I was solving the problem in JavaScript.
First have a look at "Row Major Vs Column Major Order: 2D arrays access in programming languages" on Computing Science Stack Overflow as an insight into the main ways an array can be serialized:
Row order stores all the values for a row contiguously in serialized output, starting with first row, before proceeding to the next row.
Column order stores all the values for a column in serialized output, starting with the first column before proceeding to the next.
Formula Confusion
The quoted formula creates output in row order because it multiplies one of the indexes by the number of columns, not the number of rows, which means index x is being used as the row index. This is nonsensical in a spreadsheet context where "x" means column and "y" means row.
JavaScript Formula for row order storage of a row/column matrix
matrix[row][column] is serialized to arr[row * c + col]
arr[index] is restored to matrix[Math.floor(index / c)][index % c]
where calculating the row in de-serialization requires truncation of the division result, since there is no integer division operator in JavaScript to match the one used in the quoted formula.
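A quick illustration of both directions (c is the number of columns; the matrix here is just example data):

const c = 3;
const matrix = [[1, 2, 3], [4, 5, 6]];        // 2 rows x 3 columns

// Serialize (row order): matrix[row][col] -> arr[row * c + col]
const arr = [];
for (let row = 0; row < matrix.length; row++) {
    for (let col = 0; col < c; col++) {
        arr[row * c + col] = matrix[row][col];
    }
}
console.log(arr); // [1, 2, 3, 4, 5, 6]

// Restore: arr[index] -> matrix[Math.floor(index / c)][index % c]
const restored = [];
for (let index = 0; index < arr.length; index++) {
    const row = Math.floor(index / c);
    if (!restored[row]) restored[row] = [];
    restored[row][index % c] = arr[index];
}
console.log(restored); // [[1, 2, 3], [4, 5, 6]]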
Typical JavaScript Application
imageData stores canvas pixel data in row order:
Pixels then proceed from left to right, then downward, throughout the [serialized] array.
It's not that you can't use x and y variable names to access image pixel data, but bear in mind that x is a column index and y is the row number from the top.
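As a small sketch, with a standard ImageData object (4 bytes per pixel, RGBA) the pixel at column x, row y starts at index (y * width + x) * 4:

function getPixel(imageData, x, y) {
    const i = (y * imageData.width + x) * 4; // row-order index, 4 bytes per pixel
    const d = imageData.data;
    return { r: d[i], g: d[i + 1], b: d[i + 2], a: d[i + 3] };
}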
I have an ordered data set of decimal numbers. This data is always similar, but not always the same. The expected data is a few (0 - 5) large numbers, followed by several (10 - 90) average numbers, then followed by smaller numbers. There are cases where a large number may be mixed into the average numbers. See the following arrays.
let expectedData = [35.267,9.267,9.332,9.186,9.220,9.141,9.107,9.114,9.098,9.181,9.220,4.012,0.132];
let expectedData = [35.267,32.267,9.267,9.332,9.186,9.220,9.141,9.107,30.267,9.114,9.098,9.181,9.220,4.012,0.132];
I am trying to analyze the data by getting the average without the high numbers at the front and the low numbers at the back. High or low values in the middle are fine to keep in the average. I have a partial solution below. Right now I am sort of brute-forcing it, but the solution isn't perfect: on smaller datasets the first average calculation is influenced by the large number.
My question is: Is there a way to handle this type of problem, which is identifying patterns in an array of numbers?
My algorithm is:
Get an average of the array
Calculate an above/below average value
Remove front (n) elements that are above average
Remove end elements that are below average
Recalculate average
In JavaScript I have (this is partial, leaving out the below-average part):
let total = expectedData.reduce((rt, cur) => { return rt + cur; }, 0);
let avg = total / expectedData.length;
let aboveAvg = avg * 0.1 + avg;
let remove = -1;
for (let k = 0; k < expectedData.length; k++) {
    if (expectedData[k] > aboveAvg) {
        remove = k;
    } else {
        if (k == 0) {
            remove = -1; // no need to remove
        }
        // break because we don't want large values from middle removed.
        break;
    }
}
if (remove >= 0) {
    // remove front above average
    expectedData.splice(0, remove + 1);
}
// remove belows
// recalculate average
I believe you are looking for an outlier detection algorithm. There are already a bunch of questions related to this on Stack Overflow.
However, each outlier detection algorithm has its own merits.
Here are a few of them
https://mathworld.wolfram.com/Outlier.html
High outliers are anything beyond the 3rd quartile + 1.5 * the inter-quartile range (IQR)
Low outliers are anything beneath the 1st quartile - 1.5 * IQR
Grubbs's test
You can check how it works for your expectations here
Apart from these 2, there is a comparison calculator here. You can visit it to use other algorithms per your need.
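As a rough sketch of the IQR rule described above, applied to your example data (the quartile function here uses simple linear interpolation between ranks; real implementations vary in the exact method):

let expectedData = [35.267,9.267,9.332,9.186,9.220,9.141,9.107,9.114,9.098,9.181,9.220,4.012,0.132];

function quartile(sorted, q) {
    const pos = (sorted.length - 1) * q;
    const lo = Math.floor(pos), hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

const sorted = [...expectedData].sort((a, b) => a - b);
const q1 = quartile(sorted, 0.25);
const q3 = quartile(sorted, 0.75);
const iqr = q3 - q1;
const kept = expectedData.filter(v => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr);
const avg = kept.reduce((s, v) => s + v, 0) / kept.length;
console.log(kept, avg); // drops 35.267, 4.012 and 0.132 for this data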
I would first have tried a sliding window coupled with a hysteresis / band filter in order to detect the high value peaks.
Then, as your sliding window advances, you can add the previous first value (which is now the last of the analyzed values) to the global sum, and add 1 to the total number of values.
When you encounter a peak (= something that causes the hysteresis to move or overflow the band filter), you either remove the values (which may be costly), or better, set the value to NaN so you can safely ignore it.
You should keep computing a sliding average within your sliding window in order to auto-correct the hysteresis/band filter, so that it rejects only the start values of a peak (the end values are the start values of the next one); once values stabilize at a new level, they are kept again.
The size of the sliding window sets how many consecutive "stable" values are needed to be kept, or in other words how many unstable values are rejected when you reach a new level.
For that, you can check the mode of the (rounded) values and then take all the numbers in a certain range around the mode. That range can be taken from the data itself, for example as 10% of the max - min range. That helps you filter your data; you can select the percentage that fits your needs. Something like this:
let expectedData = [35.267,9.267,9.332,9.186,9.220,9.141,9.107,9.114,9.098,9.181,9.220,4.012,0.132];
expectedData.sort((a, b) => a - b);
/// Get the range of the data
const RANGE = expectedData[ expectedData.length - 1 ] - expectedData[0];
const WINDOW = 0.1; /// Window of selection 10% from left and right
/// Frequency of each number
let dist = expectedData.reduce((acc, e) => (acc[ Math.floor(e) ] = (acc[ Math.floor(e) ] || 0) + 1, acc), {});
let mode = +Object.entries(dist).sort((a, b) => b[1] - a[1])[0][0];
let newData = expectedData.filter(e => mode - RANGE * WINDOW <= e && e <= mode + RANGE * WINDOW);
console.log(newData);
I'm writing a map editor that converts a 3D space into JavaScript arrays so that it can be exported to a JSON file.
Each map will have a 2D plane acting as a ground layer (the user will have to specify X and Y size), then to add height, the user can place blocks on top of this 2D plane, following a X & Y grid (similar to Minecraft).
My idea was to have an array for each Z layer and fill it with the information about which blocks are placed there. Because the X and Y sizes of the map must be specified, a simple array should do the trick: to read the map you would simply loop over each Z layer array and fill the map with its contents (another array), creating rows defined by the X and Y size of the ground layer.
I know you can fill arrays like layer[165] = grassBlock after you declare them, which would make everything before index 165 empty and thus save space. But in a JSON format, wouldn't that array have 164 zeroes or nulls before it reaches this index?
Is this even the most efficient way to store a 3D space? I'm trying to minimize map size and speed up load time as much as possible.
If you only have block/empty then a single bit is sufficient, and you can use a single JavaScript typed array for the matrix.
Assuming size of the matrix is X, Y and Z then the conversion from coordinates (x, y, z) to array index could be:
index = (x*Y + y)*Z + z;
then the map could be stored as a single Uint8Array object initialized with length (X*Y*Z + 7) >> 3 (each of the bytes will give you 8 bits but you need to round up).
To read/write a single bit you can finally use
bit = (matrix[index >> 3] >> (index & 7)) & 1; // Read element
matrix[index >> 3] |= 1 << (index & 7); // Set element to 1
matrix[index >> 3] &= ~(1 << (index & 7)); // Clear element to 0
If instead you need to store a logical ID and there are no more than 256 distinct values (including "empty"), then a single byte per element is enough. The index computation is as above, but you can use X*Y*Z as the size and then simply read/write each element with matrix[index].
If more than 256 but at most 65536 distinct values are needed, then a Uint16Array can be used.
If most of the elements do not carry specific data beyond their class (e.g. they're just "air", "land", "water") and only a small percentage require much more, then a byte map with a value for "other", plus a dictionary mapping (x,y,z) to data only for the "other" blocks, may be a reasonable approach (very simple code, still fast access and update).
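A rough sketch of that hybrid idea; the names, the OTHER sentinel value and the example dimensions are all made up for illustration:

const X = 16, Y = 16, Z = 16;              // example map dimensions
const OTHER = 255;                          // hypothetical class ID meaning "look in the dictionary"
const classes = new Uint8Array(X * Y * Z);  // one class ID per cell
const extra = new Map();                    // index -> detailed data, only for "other" cells

function setBlock(x, y, z, classId, data) {
    const index = (x * Y + y) * Z + z;
    classes[index] = classId;
    if (classId === OTHER) extra.set(index, data);
    else extra.delete(index);
}

function getBlock(x, y, z) {
    const index = (x * Y + y) * Z + z;
    const classId = classes[index];
    return classId === OTHER ? extra.get(index) : classId;
}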
Note that while JavaScript has data types to store binary data efficiently, JSON unfortunately doesn't provide a type for sending/receiving arbitrary bytes (as opposed to text) over the network, so you'll need to convert to and load from base64 encoding or something similar (if you want to use JSON).
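For instance, in a browser (or modern Node) you could round-trip the packed bytes through base64 along these lines; bytesToBase64/base64ToBytes are just illustrative helper names:

function bytesToBase64(bytes) {
    let binary = "";
    for (let i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
}

function base64ToBytes(text) {
    const binary = atob(text);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
        bytes[i] = binary.charCodeAt(i);
    }
    return bytes;
}

// e.g. JSON.stringify({ size: [X, Y, Z], blocks: bytesToBase64(matrix) })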
I'm implementing a purpose-built regular expression engine using finite automata. I will have to store thousands of states, each state with its own transition table from unicode code points (or UTF-16 code units; I haven't decided) to state IDs.
In many cases, the table will be extremely sparse, but in other cases it will be nearly full. In most cases, most of the entries will fall into several contiguous ranges with the same value.
The simplest implementation would be a lookup table, but each such table would take up a great deal of space. A list of (range, value) pairs would be much smaller, but slower. A binary search tree would be faster than a list.
Is there a better approach, perhaps leveraging built-in functionality?
Unfortunately, JavaScript's built-in data types - especially Map - are not of great help in accomplishing this task, as they lack the relevant methods.
In most cases, most of the entries will fall into several contiguous ranges with the same value.
We can however exploit this and use a binary search strategy on sorted arrays, assuming the transition tables won't be modified often.
Encode contiguous input ranges leading to the same state by storing each input range's lowest value in a sorted array. Keep the states at corresponding indices in a separate array:
let inputs = [0, 5, 10]; // Input ranges [0,4], [5,9], [10,∞)
let states = [0, 1, 0 ]; // Inputs [0,4] lead to state 0, [5,9] to 1, [10,∞) to 0
Now, given an input, you need to perform a binary search on the inputs array similar to Java's floorEntry(k):
// Returns the index of the greatest element less than or equal to
// the given element, or -1 if there is no such element:
function floorIndex(sorted, element) {
    let low = 0;
    let high = sorted.length - 1;
    while (low <= high) {
        let mid = (low + high) >> 1;
        if (sorted[mid] > element) {
            high = mid - 1;
        } else if (sorted[mid] < element) {
            low = mid + 1;
        } else {
            return mid;
        }
    }
    return low - 1;
}
// Example: Transition to 1 for emoticons in range 1F600 - 1F64F:
let transitions = {
    inputs: [0x00000, 0x1F600, 0x1F650],
    states: [0, 1, 0]
};
let input = 0x1F60B; // 😋
let next = transitions.states[floorIndex(transitions.inputs, input)];
console.log(`transition to ${next}`);
This search completes in O(log n) steps where n is the number of contiguous input ranges. The transition table for a single state then has a space requirement of O(n). This approach works equally well for sparse and dense transition tables as long as our initial assumption - the number of contiguous input ranges leading to the same state is small - holds.
Sounds like you have two very different cases ("in many cases, the table will be extremely sparse, but in other cases it will be nearly full").
For the sparse case, you could possibly have a separate sparse index (or several layers of indexes), then your actual data could be stored in a typed array. Because the index(es) would be mapping from integers to integers, they could be represented as typed arrays as well.
Looking up a value would look like this:
1. Binary search the index. The index stores pairs as consecutive entries in the typed array – the first element is the search value, the second is the position in the data set (or the next index).
2. If you have multiple indexes, repeat 1. as necessary.
3. Start iterating your dataset at the position given by the last index. Because the index is sparse, this position might not be the one where the value is stored, but it is a good starting point as the correct value is guaranteed to be nearby.
4. The dataset itself is represented as a typed array where consecutive pairs hold the key and the value.
I cannot think of anything better to use in JavaScript. Typed arrays are pretty fast and having indexes should increase the speed drastically. That being said, if you only have a couple thousand entries, don't bother with indexes, do a binary search directly on the typed array (described in 4. above).
For the dense case, I am not sure. If the dense case happens to be one where repeated values across ranges of keys are likely, consider using something like run-length encoding: identical consecutive values are represented simply as their number of occurrences and then the actual value. Once again, use typed arrays and binary search, possibly even indexes, to make this faster.
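A small sketch of that run-length idea: encode a dense table (one target state per code point) into run starts plus values, then binary search the starts. The names are illustrative only:

function encodeRuns(dense) {
    const starts = [], values = [];
    for (let i = 0; i < dense.length; i++) {
        if (i === 0 || dense[i] !== dense[i - 1]) {
            starts.push(i);
            values.push(dense[i]);
        }
    }
    return { starts: Uint32Array.from(starts), values: Uint32Array.from(values) };
}

function lookup(runs, input) {
    let lo = 0, hi = runs.starts.length - 1;
    while (lo < hi) {                       // find the last run starting at or before input
        const mid = (lo + hi + 1) >> 1;
        if (runs.starts[mid] <= input) lo = mid;
        else hi = mid - 1;
    }
    return runs.values[lo];
}

const runs = encodeRuns([0, 0, 0, 1, 1, 1, 1, 0, 0]); // two runs of 0 around a run of 1
console.log(lookup(runs, 4)); // 1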
Basically I can only plot 1000 values on a chart but my dataset frequently has more than 1000 values.
So... let's say I have 3000 values - that's easy, every 3rd point is plotted (i.e. when i % 3 == 0). What about when it's a number like 2106? I'm trying to plot evenly.
for(var i = 0; i < chartdata.node.length; i++){
//something in here
}
Since you may have more or less than 1000, I would go with something like this:
var inc = Math.floor(chartdata.node.length / 1000);
if (inc == 0)
    inc = 1;
for (var i = 0; i < chartdata.node.length; i += inc)
{
    // plot chartdata.node[i] here
}
Exactly 1000 points, slightly irregular spacing
Let A be the number of data points you have (i.e. 2106) and B be the number of data points you want to use (i.e. 1000). In a continuous setting, you'd place a plot point every A/B data points. With discrete data points, you can do the following: maintain a counter C, initialized to zero. For every one of the A input data points, add B to that counter. Whenever the counter reaches at least A, plot the data point and subtract A from the counter. On the whole, you'll have added B a total of A times and subtracted A a total of B times, so you end up with a zero counter again, having plotted exactly B data items.
You can tweak this to obtain different behaviour at the end points, e.g. to always include the first and last data point. Simply plot the first point unconditionally, then do the above scheme for the remaining points, i.e. with A=2105 and B=999. One benefit of this whole approach is that all of this works in integer arithmetic, so rounding errors will be of no concern to you.
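A short sketch of that counter scheme (integer arithmetic only, assuming wanted <= data.length):

function downsample(data, wanted) {
    const A = data.length, B = wanted;
    const result = [];
    let counter = 0;
    for (let i = 0; i < A; i++) {
        counter += B;
        if (counter >= A) {     // plot this point and carry the remainder
            result.push(data[i]);
            counter -= A;
        }
    }
    return result;              // exactly B points
}

console.log(downsample([...Array(2106).keys()], 1000).length); // 1000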
Perfectly regular spacing, but fewer data points
If even spacing is more important, then you can simply compute the amount by which to increment your index for every plot as ceil(A/B), rounding up so that you never plot more than B points. Due to the rounding, this will be a larger number than the exact fractional result. In the worst case, a ratio just above one gets rounded up to two, resulting in only slightly more than 500 data points actually being plotted (with A = 2106 and B = 1000, for instance, the increment is ceil(2.106) = 3, so only 702 points are plotted). These will be evenly spaced, though.
You could try something like this:
var desired_data_length = 1000;
for (var i = 0; i < desired_data_length; i++)
{
    var actual_i = Math.floor(i / desired_data_length * chartdata.length);
    // do something with actual_i as the index
}
This will use desired_data_length number of indices, and will linearly interpolate from [0,desired_data_length) to [0,chartdata.length), which is what you want.
If the data is purely numerical you may try Typed Arrays.