efficiently finding an object that falls within a certain number range - javascript

Here's my basic problem: I'm given a currentTime. For example, 750 seconds. I also have an array which contains 1000 to 2000 objects, each of which has a startTime, endTime, and an _id attribute. Given the currentTime, I need to find the object whose startTime and endTime contain it -- for example, startTime : 740, endTime : 755.
What is the most efficient way to do this in Javascript?
I've simply been doing something like this, for starters:
var arrayLength = array.length;
var x = 0;
while (x < arrayLength) {
    if (currentTime >= array[x].startTime && currentTime <= array[x].endTime) {
        // then I've found my object
    }
    x++;
}
But I suspect that looping isn't the best option here. Any suggestions?
EDIT: For clarity, the currentTime has to fall within the startTime and endTime
My solution: The structure of my data affords me certain benefits that allows me to simplify things a bit. I've done a basic binary search, as suggested, since the array is already sorted by startTime. I haven't fully tested the speed of this thing, but I suspect it's a fair bit faster, especially with larger arrays.
var binarySearch = function(array, currentTime) {
    var low = 0;
    var high = array.length - 1;
    var i;
    while (low <= high) {
        i = Math.floor((low + high) / 2);
        if (array[i].startTime <= currentTime) {
            if (array[i].endTime >= currentTime) {
                // this is the one
                return array[i]._id;
            } else {
                low = i + 1;
            }
        } else {
            high = i - 1;
        }
    }
    return null;
};

The best way to tackle this problem depends on the number of times you will have to call your search function.
If you call your function just a few times, let's say m times, go for linear search. The overall complexity for the calls of this function will be O(mn).
If you call your function many times, and by many I mean more than log(n) times, you should:
Sort your array in O(n log n) by startTime, then by endTime if you have several items with equal startTime values
Do a binary search to find the last element with startTime <= x; since the array is sorted by startTime, the qualifying elements form a prefix [0, end]. This is done in O(log n)
Do a linear search inside [0, end]. You have to do a linear search because the order of the startTimes tells you nothing about the endTimes. This can be anywhere between O(1) and O(n), depending on the distribution of your segments and the value of x.
Average case: O(n log n) for initialization and O(log n) for each search.
Worst case: an array containing many equal segments, or segments that share a common interval, with searches in that interval. In that case you will do O(n log n) for initialization and O(n + log n) = O(n) per search.
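A minimal sketch of the steps above, assuming the question's {startTime, endTime, _id} objects (the makeSearcher wrapper and its names are illustrative, not from the original answer):

```javascript
// Sort once, then for each query: binary-search for the last interval
// whose startTime <= t, and scan the qualifying prefix for a matching
// endTime. Any interval with startTime <= t and endTime >= t qualifies.
function makeSearcher(intervals) {
  // One-time O(n log n) sort by startTime (endTime as tie-breaker).
  const sorted = intervals.slice().sort(
    (a, b) => a.startTime - b.startTime || a.endTime - b.endTime
  );

  return function find(t) {
    // Binary search: index of the last element with startTime <= t.
    let lo = 0, hi = sorted.length - 1, last = -1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (sorted[mid].startTime <= t) { last = mid; lo = mid + 1; }
      else { hi = mid - 1; }
    }
    // Linear scan of the qualifying prefix, O(1) to O(n).
    for (let i = last; i >= 0; i--) {
      if (sorted[i].endTime >= t) return sorted[i]._id;
    }
    return null;
  };
}
```

Building the searcher once and reusing it across queries is what amortizes the sort cost.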

Sounds like a problem for binary search.

Assuming that your search array is long-lived and relatively constant, the first iteration would be to sort all the array elements by start time (or create an index of sorted start times pointing to the array elements if you don't want them sorted).
Then you can efficiently (with a binary chop) discount ones that start too late. A sequential search of the others would then be faster.
For even more speed, maintain separate sorted indexes for start and end times. Then do the same operation as mentioned previously to throw away those that start too late.
Then, for the remaining ones, use the end time index to throw away those that end too early, and what you have left is your candidate list.
But, make sure this is actually needed. Two thousand elements doesn't seem like a huge amount so you should time the current approach and only attempt optimisation if it is indeed a problem.

From the information given it is not possible to tell what would be the best solution. If the array is not sorted, looping is the best way for single queries. A single scan along the array only takes O(N) (where N is the length of the array), whereas sorting it and then doing a binary search would take O(N log(N) + log(N)), thus it would in this case take more time.
The analysis looks much different if you have a great number of different queries on the same large array. If you have about N queries on the same array, sorting might actually improve performance, as each query will take O(log(N)). Thus N queries plus the initial sort require O(N log(N)) in total, whereas the unsorted search will take O(N·N) = O(N^2), which is clearly larger. Exactly when sorting starts to pay off also depends on the size of the array.
The situation is also different again, when you update the array fairly often. Updating an unsorted array can be done in O(1) amortized, whereas updating a sorted array takes O(N). So if you have fairly frequent updates sorting might hurt.
There are also some very efficient data structures for range queries, however again it depends on the actual usage if they make sense or not.

If the array is not sorted, yours is the correct way.
Do not fall into the trap of thinking to sort the array first, and then apply your search.
With the code you tried, you have a complexity of O(n), where n is the number of elements.
If you sort the array first, you incur a complexity of O(n log(n)) in the average case (compare Sorting algorithm).
Then you have to apply the binary search, which executes at an average complexity of O(log2(n) - 1).
So, in the average case, you will end up spending:
O(n log(n) + log2(n) - 1)
instead of just O(n).

An interval tree is a data structure that allows answering such queries in O(lg n) time (both average and worst-case), if there are n intervals total. Preprocessing time to construct the data structure is O(n lg n); space is O(n). Insertion and deletion times are O(lg n) for augmented interval trees. Time to answer all-interval queries is O(m + lg n) if m intervals cover a point. Wikipedia describes several kinds of interval trees; for example, a centered interval tree is a ternary tree with each node storing:
• A center point
• A pointer to another node containing all intervals completely to the left of the center point
• A pointer to another node containing all intervals completely to the right of the center point
• All intervals overlapping the center point sorted by their beginning point
• All intervals overlapping the center point sorted by their ending point
Note, an interval tree has O(lg n) complexity for both average and worst-case queries that find one interval covering a point. The previous answers have O(n) worst-case query performance for the same. Several previous answers claimed that they have O(lg n) average time, but none of them offer evidence; they merely assert that average performance is O(lg n). The main feature of those previous answers is using a binary search for begin times. Then some say to use a linear search, and others say to use a binary search, for end times, but without making clear what set of intervals the latter search is over. They claim to have O(lg n) average performance, but that is merely wishful thinking. As pointed out in the Wikipedia article under the heading Naive Approach,
A naive approach might be to build two parallel trees, one ordered by the beginning point, and one ordered by the ending point of each interval. This allows discarding half of each tree in O(log n) time, but the results must be merged, requiring O(n) time. This gives us queries in O(n + log n) = O(n), which is no better than brute-force.
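For concreteness, here is a hedged sketch of a centered interval tree along the lines of the node layout listed above (buildTree and query are illustrative names; the {startTime, endTime, _id} shape matches the original question, and no production-readiness is claimed):

```javascript
// Each node stores a center point, subtrees of intervals entirely to
// its left/right, and the overlapping intervals sorted two ways.
function buildTree(intervals) {
  if (intervals.length === 0) return null;
  // Use the median of all endpoints as the center point.
  const points = intervals.flatMap(iv => [iv.startTime, iv.endTime])
                          .sort((a, b) => a - b);
  const center = points[points.length >> 1];

  const left = [], right = [], overlap = [];
  for (const iv of intervals) {
    if (iv.endTime < center) left.push(iv);
    else if (iv.startTime > center) right.push(iv);
    else overlap.push(iv); // interval contains the center point
  }
  return {
    center,
    left: buildTree(left),
    right: buildTree(right),
    // Overlap list sorted by beginning point and by ending point.
    byStart: overlap.slice().sort((a, b) => a.startTime - b.startTime),
    byEnd: overlap.slice().sort((a, b) => b.endTime - a.endTime),
  };
}

// Stabbing query: collect all intervals containing the point t.
function query(node, t, out = []) {
  if (!node) return out;
  if (t < node.center) {
    // Only overlap intervals whose startTime <= t can contain t.
    for (const iv of node.byStart) {
      if (iv.startTime > t) break;
      out.push(iv);
    }
    query(node.left, t, out);
  } else if (t > node.center) {
    for (const iv of node.byEnd) {
      if (iv.endTime < t) break;
      out.push(iv);
    }
    query(node.right, t, out);
  } else {
    out.push(...node.byStart); // every overlap interval contains the center
  }
  return out;
}
```

Because each interval lands in exactly one overlap list and each query descends one path of the tree, the query touches O(lg n) nodes plus the m reported intervals.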

Related

What is the Big o notation of the longest consecutive sequence algorithm below

I googled the solution for the longest consecutive sequence algorithm (here is the link to the question on LeetCode: leetcodeQuestion) and below is the solution. I was confused by some of the comments on the website where I got it: some were saying that the solution is not O(n) time complexity, since there is a loop nested inside another loop. That makes sense to me, but when I run the question on LeetCode it passes without any issues, even though one of the constraints is that the solution must be O(n) time complexity, which makes me assume it in fact is O(n). So is this algorithm O(n) or not? If my question is missing information, let me know and I will update it quickly.
var longestConsecutive = function(nums) {
    if (nums == null || nums.length === 0) return 0;
    const set = new Set(nums);
    let longest = 0;
    for (let num of nums) {
        if (!set.has(num - 1)) {
            let count = 0;
            while (set.has(count + num)) {
                count++;
            }
            longest = Math.max(longest, count);
        }
    }
    return longest;
};
The algorithm is O(n). Pasting the explanation from the same site below:
Complexity Analysis
Time complexity : O(n).
Although the time complexity appears to be quadratic due to the while loop nested within the for loop, closer inspection reveals it to be linear. Because the while loop is reached only when currentNum marks the beginning of a sequence (i.e. currentNum - 1 is not present in nums), the while loop can only run for n iterations throughout the entire runtime of the algorithm. This means that despite looking like O(n⋅n) complexity, the nested loops actually run in O(n + n) = O(n) time. All other computations occur in constant time, so the overall runtime is linear.
Space complexity : O(n).
In order to set up O(1) containment lookups, we allocate linear space for a hash table to store the O(n) numbers in nums. Other than that, the space complexity is identical to that of the brute force solution.
Also, sometimes on LeetCode you can submit a solution with higher complexity than the one asked for, but that does not mean the accepted solution is efficient. Generally, for an inefficient solution the site says 'Your solution beats x% of solutions', where x is a pretty low number.

How to find the max subset of an array faster than it takes to sort?

I currently have a very large array that I would like to have the top n items from, faster than it takes to sort the array. Conceptually I'm pretty sure it's possible to beat whatever sorting algorithm the JS interpreter is using.
Here is the code I'm currently using, which sorts an array of tuples by the second element, and then returns the top 100 tuples. The first item in the tuple is the label, so the desired output is the top 100 items with the highest value.
// Sort the array based on the second element
items.sort(function(first, second) {
    return second[1] - first[1];
});

// Create a new array with only the first 100 items
const topItems = items.slice(0, 100);
You could use the quickselect algorithm, which has average and best case complexity O(n) with worst case complexity of O(n²).
Getting the top N elements can be achieved by using quickselect to get the N-th element, and since the algorithm also leaves the array halfway sorted, you can just take all the elements over (to the right of) the selected element.
See: Quickselect algorithm
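A rough quickselect sketch along those lines (Lomuto partition with a random pivot; quickselectTop is an illustrative name, and the comparator assumes the question's tuple-by-second-element ordering):

```javascript
// Partially sorts `arr` in place so that the k tuples with the
// largest second elements end up in arr[0..k-1] (unordered among
// themselves). Average O(n), worst case O(n^2).
function quickselectTop(arr, k, lo = 0, hi = arr.length - 1) {
  while (lo < hi) {
    // Random pivot makes the O(n^2) worst case unlikely.
    const p = lo + Math.floor(Math.random() * (hi - lo + 1));
    [arr[p], arr[hi]] = [arr[hi], arr[p]];
    let store = lo;
    for (let i = lo; i < hi; i++) {
      if (arr[i][1] > arr[hi][1]) {  // descending: larger values first
        [arr[i], arr[store]] = [arr[store], arr[i]];
        store++;
      }
    }
    [arr[store], arr[hi]] = [arr[hi], arr[store]];
    if (store === k) return;      // first k slots hold the top k
    if (store < k) lo = store + 1;
    else hi = store - 1;
  }
}
```

After quickselectTop(items, 100), items.slice(0, 100) holds the top 100 by value; sort just those 100 if you also need them ordered.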
We can definitely beat O(n log n) average complexity. For consistent results, since you've declared that we need just the top 100, that is considered a constant (especially since it is small) and using a heap and traversing the array once, we can have O(n log 100) ≈ O(n * 6.6) = O(n). We can be more adventurous and risky by using the introselect algorithm to select the (N - 100)th element (where N is the array length) in average O(n) time but worst case O(n log n). Then traverse again to select the 100 elements greater than that one.
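A hedged sketch of the heap idea (JavaScript has no built-in priority queue, so a minimal binary min-heap is inlined; topK and the tuple layout are assumptions matching the question):

```javascript
// Keep a min-heap of at most k tuples keyed on tuple[1]; one pass
// over the array gives O(n log k). With k = 100 fixed, that's O(n).
function topK(items, k) {
  const heap = []; // min-heap on tuple[1]; heap[0] is the smallest kept
  const up = i => {
    while (i > 0) {
      const p = (i - 1) >> 1;
      if (heap[p][1] <= heap[i][1]) break;
      [heap[p], heap[i]] = [heap[i], heap[p]];
      i = p;
    }
  };
  const down = i => {
    for (;;) {
      let s = i;
      const l = 2 * i + 1, r = 2 * i + 2;
      if (l < heap.length && heap[l][1] < heap[s][1]) s = l;
      if (r < heap.length && heap[r][1] < heap[s][1]) s = r;
      if (s === i) break;
      [heap[s], heap[i]] = [heap[i], heap[s]];
      i = s;
    }
  };
  for (const it of items) {
    if (heap.length < k) {
      heap.push(it);
      up(heap.length - 1);
    } else if (it[1] > heap[0][1]) {
      heap[0] = it;   // evict the smallest of the kept k
      down(0);
    }
  }
  return heap.sort((a, b) => b[1] - a[1]); // optional: order the result
}
```

The final sort only touches k elements, so it doesn't change the O(n log k) bound.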

How to determine Big-o complexity if it only depends on values of input rather than input size?

I just saw javascript code about sorting which uses setTimeout as shown
var list = [2, 5, 10, 4, 8, 32];
var result = [];
list.forEach( n => setTimeout(() => result.push(n), n));
It is interesting because in JS setTimeout is asynchronous, so if you wait long enough, result will be a sorted array. It is deterministic and depends only on the values of the data, not on the size of the input, so I have no idea how to determine the Big-O (time complexity) of this approach.
TLDR; it depends on how you define the complexity of setTimeout()
When discussing algorithmic complexity, we have to answer the following questions:
What are my inputs?
What is a unit of work in the hypothetical machine that my algorithm runs in?
In some cases, how we define our inputs is dependent on what the algorithm is doing and how we defined our unit of work. The problem is complicated when using built-in functions as we have to define the complexity of those functions so we can take them into account and calculate the overall complexity of the algorithm.
What is the complexity of setTimeout()? That's up for interpretation. I find it helpful to give setTimeout() a complexity of O(n), where n is the number of milliseconds passed to the function. In this case I've decided that each millisecond that is counted internally by setTimeout() represents one unit of work.
Given that setTimeout() has complexity O(n), we must now determine how it fits into the rest of our algorithm. Because we are looping through list and calling setTimeout() for each member of the list, we multiply n with another variable, let's call it k to represent the size of the list.
Putting it all together, the algorithm has complexity O(k * n), where k is the length of the list and n is the maximum value in the list.
Does this complexity make sense? Let's do a sanity check by interpreting the results of our analysis:
Our algorithm takes longer as we give it more numbers ✓
Our algorithm takes longer as we give it larger numbers ✓
Notice that the key to this conclusion was determining the complexity of setTimeout(). Had we given it a constant O(1) complexity, our end result would have been O(k), which IMO is misleading.
Edit:
Perhaps a more correct interpretation of setTimeout()'s contribution to our complexity is O(n) for all inputs, where n is the maximum value of a given list, regardless of how many times it is called.
In the original post, I made the assumption that setTimeout() would run n times for each item in the list, but this logic is slightly flawed as setTimeout() conceptually "caches" previous values, so if it is called with setTimeout(30), setTimeout(50), and setTimeout(100), it will run 100 units of work (as opposed to 180 units of work, which was the case in the original post).
Given this new "cached" interpretation of setTimeout(), the complexity is O(k + n), where k is the length of the list, and n is the maximum value in the list.
Fun fact:
This happens to have the same complexity as Counting Sort, whose complexity is also a function of list size and max list value
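For comparison, a minimal counting-sort sketch with the same O(k + n) shape, where k is the list length and n the maximum value (countingSort is an illustrative name; it assumes small non-negative integers, like the setTimeout example):

```javascript
// Counting sort: tally each value into a bucket array of size max+1,
// then read the buckets back out in order.
function countingSort(list) {
  const max = Math.max(...list);       // assumes a non-empty list
  const counts = new Array(max + 1).fill(0);
  for (const v of list) counts[v]++;   // O(k): one pass over the list
  const result = [];
  for (let v = 0; v <= max; v++) {     // O(n): one pass over the buckets
    for (let c = 0; c < counts[v]; c++) result.push(v);
  }
  return result;
}
```

Just as with the setTimeout trick, the cost grows with the maximum value, not only with the number of items.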

Why is using a loop to iterate from start of array to end faster than iterating both start to end and end to start?

Given an array with .length 100, containing elements with values 0 to 99 at the respective indexes, the requirement is to find the element of the array equal to n = 51.
Why is using a loop to iterate from start of array to end faster than iterating both start to end and end to start?
const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;

console.time("iterate from start");
for (let i = 0; i < len; i++) {
    if (arr[i] === n) break;
}
console.timeEnd("iterate from start");

const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;

console.time("iterate from start and end");
for (let i = 0, k = len - 1; i < len && k >= 0; i++, k--) {
    if (arr[i] === n || arr[k] === n) break;
}
console.timeEnd("iterate from start and end");
jsperf https://jsperf.com/iterate-from-start-iterate-from-start-and-end/1
The answer is pretty obvious:
More operations take more time.
When judging the speed of code, you look at how many operations it will perform. Just step through and count them. Every instruction will take one or more CPU cycles, and the more there are the longer it will take to run. That different instructions take a different amount of cycles mostly does not matter - while an array lookup might be more costly than integer arithmetic, both of them basically take constant time and if there are too many, it dominates the cost of our algorithm.
In your example, there are few different types of operations that you might want to count individually:
comparisons
increments/decrements
array lookup
conditional jumps
(we could be more granular, such as counting variable fetch and store operations, but those hardly matter - everything is in registers anyway - and their number basically is linear to the others).
Now both of your snippets iterate about 50 times - the element on which they break the loop is in the middle of the array. Ignoring off-by-a-few errors, these are the counts:
               | forwards | forwards and backwards
---------------+----------+------------------------
 >=/===/<      |      100 |                    200
 ++/--         |       50 |                    100
 a[b]          |       50 |                    100
 &&/||/if/for  |      100 |                    200
Given that, it's not unexpected that doing twice the work takes considerably longer.
I'll also answer a few questions from your comments:
Is additional time needed for the second object lookup?
Yes, every individual lookup counts. It's not like they could be performed at once, or optimised into a single lookup (imaginable if they had looked up the same index).
Should there be two separate loops for each start to end and end to start?
Doesn't matter for the number of operations, just for their order.
Or, put differently still, what is the fastest approach to find an element in an array?
There is no "fastest" regarding the order, if you don't know where the element is (and they are evenly distributed) you have to try every index. Any order - even random ones - would work the same. Notice however that your code is strictly worse, as it looks at each index twice when the element is not found - it does not stop in the middle.
But still, there are a few different approaches at micro-optimising such a loop - check these benchmarks.
let is (still?) slower than var, see Why is using `let` inside a `for` loop so slow on Chrome? and Why is let slower than var in a for loop in nodejs?. This tear-up and tear-down (about 50 times) of the loop body scope in fact does dominate your runtime - that's why your inefficient code isn't completely twice as slow.
comparing against 0 is marginally faster than comparing against the length, which puts looping backwards at an advantage. See Why is iterating through an array backwards faster than forwards, JavaScript loop performance - Why is to decrement the iterator toward 0 faster than incrementing and Are loops really faster in reverse?
in general, see What's the fastest way to loop through an array in JavaScript?: it changes from engine update to engine update. Don't do anything weird, write idiomatic code, that's what will get optimised better.
#Bergi is correct: more operations means more time. Why? More CPU clock cycles.
Time is really a reference to how many clock cycles it takes to execute the code.
In order to get to the nitty-gritty of that you need to look at the machine level code (like assembly level code) to find the true evidence. Each CPU (core?) clock cycle can execute one instruction, so how many instructions are you executing?
I haven't counted clock cycles in a long time, since programming Motorola CPUs for embedded applications. If your code is taking longer, then it is in fact generating a larger set of machine-code instructions, even if the loop is shorter or runs an equal number of times.
Never forget that your code is actually getting compiled into a set of commands that the CPU is going to execute (memory pointers, instruction-code level pointers, interrupts, etc.). That is how computers work, and it's easier to understand at the microcontroller level, like an ARM or Motorola processor, but the same is true for the sophisticated machines we are running on today.
Your code simply does not run the way you write it (sounds crazy, right?). It is run as it is compiled into machine-level instructions (writing a compiler is no fun). Mathematical expressions and logic can be compiled into quite a heap of assembly and machine-level code, and that is up to how the compiler chooses to interpret it (bit shifting, etc. - remember binary mathematics, anyone?)
Reference:
https://software.intel.com/en-us/articles/introduction-to-x64-assembly
Your question is hard to answer, but as #Bergi stated, the more operations, the longer it takes. But why? The more clock cycles it takes to execute your code. Dual core, quad core, threading, assembly (machine language) - it is complex. No code gets executed as you have written it - C++, C, Pascal, JavaScript, Java - unless you are writing in assembly, and even that compiles down to machine code, though it is closer to the actual execution code.
A master's in CS and you will get to counting clock cycles and sort times. You will likely make your own language framed on machine instruction sets.
Most people say who cares? Memory is cheap today and CPUs are screaming fast and getting faster.
But there are some critical applications where 10 ms matters, where an immediate interrupt is needed, etc.
Commerce, NASA, a Nuclear power plant, Defense Contractors, some robotics, you get the idea . . .
I vote let it ride and keep moving.
Cheers,
Wookie
Since the element you're looking for is always roughly in the middle of the array, you should expect the version that walks inward from both the start and end of the array to take about twice as long as one that just starts from the beginning.
Each variable update takes time, each comparison takes time, and you're doing twice as many of them. Since you know it will take one or two less iterations of the loop to terminate in this version, you should reason it will cost about twice as much CPU time.
This strategy is still O(n) time complexity since it only looks at each item once, it's just specifically worse when the item is near the center of the list. If it's near the end, this approach will have a better expected runtime. Try looking for item 90 in both, for example.
The selected answer is excellent. I'd like to add another aspect: try findIndex(), it's 2-3 times faster than using loops:
const arr = Array.from({length: 900}, (_, i) => i);
const n = 51;
const len = arr.length;

console.time("iterate from start");
for (let i = 0; i < len; i++) {
    if (arr[i] === n) break;
}
console.timeEnd("iterate from start");

console.time("iterate using findIndex");
var i = arr.findIndex(function(v) {
    return v === n;
});
console.timeEnd("iterate using findIndex");
The other answers here cover the main reasons, but I think an interesting addition could be mentioning cache.
In general, sequentially accessing an array will be more efficient, particularly with large arrays. When your CPU reads an array from memory, it also fetches nearby memory locations into cache. This means that when you fetch element n, element n+1 is also probably loaded into cache. Now, cache is relatively big these days, so your 100 int array can probably fit comfortably in cache. However, on an array of much larger size, reading sequentially will be faster than switching between the beginning and the end of the array.

Maximum size of an Array in Javascript

Context: I'm building a little site that reads an rss feed, and updates/checks the feed in the background. I have one array to store data to display, and another which stores ID's of records that have been shown.
Question: How many items can an array hold in Javascript before things start getting slow, or sluggish. I'm not sorting the array, but am using jQuery's inArray function to do a comparison.
The website will be left running, and updating and its unlikely that the browser will be restarted / refreshed that often.
If I should think about clearing some records from the array, what is the best way to remove some records after a limit, like 100 items.
The maximum length until "it gets sluggish" is totally dependent on your target machine and your actual code, so you'll need to test on that (those) platform(s) to see what is acceptable.
However, the maximum length of an array according to the ECMA-262 5th Edition specification is bound by an unsigned 32-bit integer due to the ToUint32 abstract operation, so the longest possible array could have 2^32 - 1 = 4,294,967,295 ≈ 4.29 billion elements.
No need to trim the array, simply address it as a circular buffer (index % maxlen). This will ensure it never goes over the limit (implementing a circular buffer means that once you get to the end you wrap around to the beginning again - not possible to overrun the end of the array).
For example:
var container = new Array();
var maxlen = 100;
var index = 0;

// 'store' 1538 items (only the last 'maxlen' items are kept)
for (var i = 0; i < 1538; i++) {
    container[index++ % maxlen] = "storing" + i;
}

// get the 11th remaining item in the array
eleventh = container[(index + 11) % maxlen];

// get the 35th remaining item in the array
thirtyfifth = container[(index + 35) % maxlen];

// print out all 100 elements that we have left in the array; note
// that it doesn't matter if we address past 100 - circular buffer -
// so we'll simply get back to the beginning if we do that.
for (i = 0; i < 200; i++) {
    document.write(container[(index + i) % maxlen] + "<br>\n");
}
Like #maerics said, your target machine and browser will determine performance.
But for some real world numbers, on my 2017 enterprise Chromebook, running the operation:
console.time();
Array(x).fill(0).filter(x => x < 6).length
console.timeEnd();
x=5e4 takes 16ms, good enough for 60fps
x=4e6 takes 250ms, which is noticeable but not a big deal
x=3e7 takes 1300ms, which is pretty bad
x=4e7 takes 11000ms and allocates an extra 2.5GB of memory
So around 30 million elements is a hard upper limit, because the javascript VM falls off a cliff at 40 million elements and will probably crash the process.
EDIT: In the code above, I'm actually filling the array with elements and looping over them, simulating the minimum of what an app might want to do with an array. If you just run Array(2**32-1) you're creating a sparse array that's closer to an empty JavaScript object with a length, like {length: 4294967295}. If you actually tried to use all those 4 billion elements, you'll definitely crash the javascript process.
You could try something like this to test and trim the length:
http://jsfiddle.net/orolo/wJDXL/
var longArray = [1, 2, 3, 4, 5, 6, 7, 8];
if (longArray.length >= 6) {
    longArray.length = 3;
}
alert(longArray); // 1, 2, 3
I have built a performance framework that manipulates and graphs millions of datasets, and even then, the javascript calculation latency was on order of tens of milliseconds. Unless you're worried about going over the array size limit, I don't think you have much to worry about.
It will be very browser dependent. 100 items doesn't sound like a large number - I expect you could go a lot higher than that. Thousands shouldn't be a problem. What may be a problem is the total memory consumption.
I have shamelessly pulled some pretty big datasets into memory, and although it did get sluggish, it took maybe 15 MB of data upwards with pretty intense calculations on the dataset. I doubt you will run into problems with memory unless you have intense calculations on the data and many, many rows. Profiling and benchmarking with different mock result sets will be your best bet to evaluate performance.
