Search list for objects valid in a time range - javascript

I have the following data structure, which describes an object and the time period during which it is valid. Pretend the numbers below are unix timestamps.
{
  "id": 1234,
  "valid_from": 2000,
  "valid_to": 4000
},
{
  "id": 1235,
  "valid_from": 1000,
  "valid_to": 2200
}
...
I want to quickly be able to store these items in JavaScript and then query for items which are valid at a certain time.
For example if I were to query for objects valid at 2100 I would get [1234, 1235]. If I were to query for objects valid at 3999 I would get [1234], and at 4999 nothing.
The structure will hold about 50-100k items. I'd like fast lookups; inserts and deletes can be slower.
Items will have duplicate valid_from and valid_to values so it needs to support duplicates. Items will overlap.
I will need to be continually inserting data into the structure (probably by bulk for initial load, and then one off updates as data changes). I will also be periodically modifying records so likely a remove and insert.
I am not sure what the most efficient approach to this is.
Algorithms are not my strong suit but if I just know the correct approach I can research the algorithms themselves.
My Idea:
I was originally thinking a modified binary search tree to support duplicate keys and closest lookup, but this would only allow me to query objects that are > valid_from or < valid_to.
This would involve me bisecting the array or tree to find all items > valid_from and then manually checking each one for valid_to.
I suppose I could have two search trees, one for valid_to and one for valid_from; then I could check which IDs appear in both results and return those IDs?
This still seems kind of hacky to me. Is there a better approach someone can recommend, or is this how it's done?

Imagine you have two lists: initiation/begin and expiration/end. Both are sorted by TIME.
Given a particular time, you can find where in each list the first item is that meets the criteria by binary search. You can also do inserts by binary search into each list.
For example, if there are 1000 items and the begin location is 342, then items 1-342 in the begin list are possible (they have already started), and if the end location is 901, then items 901-1000 in the end list are possible (they have not yet expired). You now need to intersect both sublists.
Take item IDs from 1-342 in begin and 901-1000 in end, and put them in a separate array (allocated ahead of time). Sort the array. Traverse the array. Whenever the same ID appears twice in a row, it is a hit, a valid match.
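The two-list approach above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the names buildIndex, queryValidAt, upperBound and lowerBound are made up for this example, both endpoints are assumed to be inclusive, and every id is assumed to be unique (otherwise the "same ID twice in a row" test gives false hits).

```javascript
// Number of elements in the sorted array whose `key` is <= t.
function upperBound(arr, key, t) {
  let lo = 0, hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (arr[mid][key] <= t) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Index of the first element in the sorted array whose `key` is >= t.
function lowerBound(arr, key, t) {
  let lo = 0, hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (arr[mid][key] < t) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Keep two copies of the items: one sorted by begin time, one by end time.
function buildIndex(items) {
  return {
    byFrom: [...items].sort((a, b) => a.valid_from - b.valid_from),
    byTo: [...items].sort((a, b) => a.valid_to - b.valid_to),
  };
}

// Return the ids valid at time t (valid_from <= t <= valid_to, an assumption).
function queryValidAt(index, t) {
  const started = upperBound(index.byFrom, 'valid_from', t);
  const firstLive = lowerBound(index.byTo, 'valid_to', t);
  const ids = [];
  for (let i = 0; i < started; i++) ids.push(index.byFrom[i].id);
  for (let i = firstLive; i < index.byTo.length; i++) ids.push(index.byTo[i].id);
  ids.sort((a, b) => a - b);
  // An id present in both sublists shows up twice in a row after sorting.
  const hits = [];
  for (let i = 1; i < ids.length; i++) {
    if (ids[i] === ids[i - 1]) hits.push(ids[i]);
  }
  return hits;
}
```

The binary searches are logarithmic, but the intersection step still touches every candidate in both sublists, so a query is only as cheap as those sublists are short.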

Related

Firebase get a range data

I have a collection that contains several documents.
Every document contains a field age.
The id of the collection is data.
There are 20 documents.
db.collection('data').orderBy('age')
My question is:
How do I get certain ranges of those documents once they are ordered by age?
Example:
After ordering by age:
[doc5, doc12, doc9, doc4, doc1, doc15, doc7, doc14, doc11, doc17,
doc3, doc2, doc13, doc8, doc6, doc18, doc20, doc16, doc19, doc10]
Fourth to Sixth (doc4, doc1, doc15)
Eleventh to Twelfth (doc3, doc2)
Fourteenth (doc8)
Firestore does not offer (actually, no longer offers) client-side queries that let you jump to an offset within the query results. You have to start from the beginning, and manually skip each result that you're not interested in, keeping track of the current index as you go. Yes, this will cost you excess document reads, but you don't have an alternative if you're not able to assign index values of your own for each document.
If you want to perform the query on a backend, you have offset() available, but the skipped documents are still counted as reads, so it is neither an efficient nor a cheap way of skipping unwanted results.
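The "start from the beginning and skip manually" idea can be sketched like this. The helper names pickRanges and requiredLimit are hypothetical; positions are assumed to be 1-based and inclusive, as in the question. The Firestore calls appear only in the commented usage, which assumes an initialized instance named db.

```javascript
// Pick 1-based, inclusive [from, to] ranges out of an already ordered array.
function pickRanges(orderedDocs, ranges) {
  return ranges.map(([from, to]) => orderedDocs.slice(from - 1, to));
}

// Size of the smallest prefix of the ordered results that has to be read.
function requiredLimit(ranges) {
  return Math.max(...ranges.map(([, to]) => to));
}

// Hypothetical usage, assuming `db` is an initialized Firestore instance:
// const ranges = [[4, 6], [11, 12], [14, 14]];
// const snapshot = await db.collection('data').orderBy('age')
//   .limit(requiredLimit(ranges)).get();
// const picked = pickRanges(snapshot.docs, ranges);
```

Every document up to the highest requested position is still read and billed; the limit only avoids reading past it.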

Is there mongodb query which will insert document if field is unique otherwise execute custom function

I am trying to create an activation code that must be unique but consists only of specific characters.
This is the solution I built:
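// Look up an existing activation by its code; resolves to null when none exists.
function findByActivationId(activationId) {
  return Activation
    .findOne({activationId})
    .lean()
    .exec();
}
// Keep generating codes until one is not already in the database.
let activationId = buildActivationId();
while (await findByActivationId(activationId)) {
  activationId = buildActivationId();
}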
This makes too many db calls. Is there a better way to query MongoDB?
The main cost of checking whether a key is unique depends on how you generate the keys.
Choose the approach that saves you the most trouble later.
Your own generated string as a key
You can do this, but there are a few caveats to understand.
If you want to generate the key in code and then check that it is unique among all keys already in the database, that can be done: create the key with your algorithm, select all keys from the database, and check whether the selected rows contain the freshly created string.
Problems with this solution
As we can see, we have to select all keys from the database and compare each one to the freshly created one. This becomes a problem when the database stores a large amount of data: every time, the application has to download all of it and compare it to the new key, which can cause freezes.
But if you are sure the database will never hold many unique rows, this works fine.
It is then important to create the keys properly. This is a matter of key-space size: the more symbols a key is built from, the less likely it is that two keys collide.
Let's look at an example.
If you create keys from the letters a-z and the digits 1-9 and the key length is 5, the key space is 35^5, which is more than 52 million possibilities.
The same key can be generated twice, but while few keys are stored it is like winning the lottery; the chance grows as more keys are stored.
Then just check whether the generated key is really unique and, if it is not, repeat.
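The arithmetic and the generate-then-check loop can be sketched like this. All names here (ALPHABET, keySpace, buildKey, buildUniqueKey, isTaken) are made up for the illustration, and Math.random is used only for brevity; it is not cryptographically secure, so a real activation code would want a stronger source of randomness.

```javascript
// Alphabet from the example: letters a-z plus digits 1-9 (35 symbols).
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz123456789';
const KEY_LENGTH = 5;

// Number of distinct keys of the given length over the given alphabet size.
function keySpace(alphabetSize, length) {
  return alphabetSize ** length;
}

// Build one random key (Math.random is an assumption made for brevity).
function buildKey(length = KEY_LENGTH) {
  let key = '';
  for (let i = 0; i < length; i++) {
    key += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  }
  return key;
}

// Generate keys until `isTaken(key)` reports a free one.
// `isTaken` is a hypothetical async callback, e.g. a database lookup.
async function buildUniqueKey(isTaken) {
  let key = buildKey();
  while (await isTaken(key)) key = buildKey();
  return key;
}

// keySpace(ALPHABET.length, KEY_LENGTH) is 35^5 = 52521875.
```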
Other ways
Use the MongoDB _id, which is unique within a collection
Use a UNIX timestamp to create a unique key

Firebase Firestore - Filter data with multiple 'array-contains'

I am struggling to find good material on best practices for filtering data using Firebase Firestore. I want to filter my data based on the categories selected by the user. I have a collection of documents stored in my Firestore database, and each document has an array that holds all the appropriate categories for that document. For the sake of filtering, I'm keeping a local array with the user's preferred categories as well. All I want to do is filter the data based on the user's preferred categories.
firestore categories field
Consider that I have the user's preferred categories stored as an array of strings (["Film", "Music"]). I was planning on using Firestore's 'array-contains' method like
db.collection(collectionName)
.where('categoriesArray', 'array-contains', ["Film", "Music"])
Later I found out that I can't use 'array-contains' against an array itself and after investigating on this issue, I decided to change my data structure as mentioned here.
categories changed to Map
Once I changed the categories from an array to map, I thought I could use multiple where conditions to filter the documents
let query = db.collection(collectionName)
.where(somefield, '==', true)
this.props.data.filterCategories.forEach((val) => {
query = query.where(`categories.${val}`, '==', true);
});
query = query
.orderBy(someOtherField, "desc")
.limit(itemsPerPage)
const snapshot = await query.get()
Now problem number 2: Firebase requires indexes for compound queries. The categories saved within each document are dynamic, so there's no way I can add these indexes in advance. What would be the ideal solution in such cases? Any help would be deeply appreciated.
This is a new feature of the Firebase JavaScript SDK, released on November 7, 2019:
Version 7.3.0 - November 7, 2019
array-contains-any
"array-contains-any operator to combine up to 10 array-contains clauses on the same field with a logical OR. An array-contains-any query returns documents where the given field is an array that contains one or more of the comparison values"
citiesRef.where('regions', 'array-contains-any',
['west_coast', 'east_coast']);
Instead of iterating through each category that you wish to query and appending clauses to a single query object, each iteration should be its own independent query. And you can keep the categories in an array.
<document>
- itemId: abc123
- categories: [film, music, television]
If you wish to perform an OR query, you would make n-loops where each loop would query for documents where array-contains that category. Then on your end, you would dedup (remove duplicates) from the results based on the item's identifier. So if you wanted to query film or music, you would make 2 loops where the first iteration queried documents where array-contains film and the second loop queried documents where array-contains music. The results would be placed into the same collection and then you would simply remove all duplicates with the same itemId.
This also does not pose a problem with the composite-index limit because categories is a static field. The real problem comes with pagination because you would need to keep a record of all fetched itemId in case a future page of results returns an item that was already fetched and this would create an O(N^2) scenario (more on big-o notation: https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/). And because you're deduping locally, pagination blocks as the user sees them are not guaranteed to be even. If each pagination block is set to 25 documents, for example, some pages may end up displaying 24, some 21, others 14, depending on how many duplicates were removed from each block.
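The merge-and-dedupe step described above could look roughly like this. mergeUnique is a hypothetical helper name, and the Firestore calls appear only in the commented usage, which assumes an initialized instance db plus a collectionName and a categories array of your own. Tracking seen ids in a Set keeps each duplicate check cheap, which also helps when ids have to be remembered across pages.

```javascript
// Merge several result arrays into one, keeping the first occurrence of each itemId.
function mergeUnique(resultSets) {
  const seen = new Set();
  const merged = [];
  for (const results of resultSets) {
    for (const item of results) {
      if (!seen.has(item.itemId)) {
        seen.add(item.itemId);
        merged.push(item);
      }
    }
  }
  return merged;
}

// Hypothetical usage: one independent query per category, then merge.
// const snapshots = await Promise.all(categories.map((category) =>
//   db.collection(collectionName)
//     .where('categories', 'array-contains', category)
//     .get()));
// const items = mergeUnique(snapshots.map((s) => s.docs.map((d) => d.data())));
```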
Are you planning on retrieving documents with the exact category array? Say, your user preference is listed as ["Film", "Music"]. Do you wish to retrieve only those documents with Film AND Music, or do you wish to retrieve documents having Film OR music?
If it's the latter, then maybe you can query for all documents with "Film" and then query for all documents with "Music", then merge it. However, the drawback here is some redundant document reads, when such document has both "Film" and "Music" in the categoryArray field.
You can also explore using Algolia to enable full-text search. In this case, you'd probably store the category list as a string maybe separated by commas, then update the whole string when the user changes their preferences.
For the former case, I have not come across a workable solution other than maybe storing it as a concatenated string in alphabetical order. Others might have a more solid solution than mine.
Hope this helps!
Your query includes an orderBy clause. This, in combination with any equality filter, requires that you create an index to support that query. There is no way to avoid this.
If you remove the orderBy, you will be able to have flexible, dynamic filters for equality using the map properties in the document. This is the only way you will be able to have a dynamic filter without creating an index. This of course means that you will have to order and page the query results on the client.
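Ordering and paging on the client, as suggested above, might be sketched like this. sortAndPage is a hypothetical helper; it assumes the documents have already been fetched and converted to plain objects, and it sorts in descending order to mirror the "desc" in the question's query.

```javascript
// Sort plain objects by `field` in descending order and return one page.
// `pageIndex` is zero-based.
function sortAndPage(docs, field, pageIndex, itemsPerPage) {
  const sorted = [...docs].sort((a, b) => {
    if (a[field] < b[field]) return 1;
    if (a[field] > b[field]) return -1;
    return 0;
  });
  const start = pageIndex * itemsPerPage;
  return sorted.slice(start, start + itemsPerPage);
}
```

This only works when the full filtered result set fits comfortably in memory, since every matching document has to be read before the first page can be shown.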

Use _.filter on object created by _.indexBy

I started off with an array of objects and used _.filter to filter down on some search criteria and _.findWhere to select out a single object based on ID.
Unfortunately the amount of data has increased so much that it's much more efficient to use _.indexBy to index by ID, so I can just do data[ID] in place of the _.findWhere calls.
However I am stumped on how to replace the _.filter method without looping through all the keys in data.
Is there a better way?!
Edit
The IDs are always unique.
I can't show any real data as it is sensitive but the structure is
data = {
1: {id: 1, name: 'data1', date: 20/1/2016},
2: {id: 2, name: 'data2', date: 21/1/2016},
3: {....
}
and I need to do something like:
var recentData = _.filter(data, function(d){ return d.date > 1/1/2016; });
To get an array of data or ids.
(n.b. the dates are all in epoch times)
This is really an optimization question, rather than simply which function to use.
One way to go about this is to rely on the sort order of the whole collection. If it's already sorted, you can use something like binary search to find the boundary elements of your date range and then slice everything from that point. (Side note: an array would probably work better for this.)
If the array is not sorted you could also consider sorting it first on your end - but that makes sense only if you need to retrieve such information several times from the same dataset.
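A rough sketch of the sort-once, binary-search-afterwards idea follows. The function names are made up for the example, date is assumed to be a numeric epoch time as the question states, and Object.values is used in place of an Underscore call.

```javascript
// Sort once by date (ascending); reuse the sorted array for every query.
function sortByDate(data) {
  return Object.values(data).sort((a, b) => a.date - b.date);
}

// Index of the first element whose date is strictly greater than `cutoff`.
function firstAfter(sorted, cutoff) {
  let lo = 0, hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid].date > cutoff) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// All elements dated after `cutoff`, found without scanning the whole array.
function recentSince(sorted, cutoff) {
  return sorted.slice(firstAfter(sorted, cutoff));
}
```

The object indexed by ID can be kept alongside the sorted array, so lookups by ID and range queries by date each use the structure that suits them.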
But if all you have is the data, unsorted, and you need to pick all elements starting from a certain date, there is no other way than to iterate through it all with something like _.filter().
Alternatively you could revert back to the source of your data and check whether you can improve the results that way - if you're using some kind of API, maybe there are extra params for sorting or narrowing down the date selection (generally speaking database engines are really efficient at what you're trying to do)? Or if you're using a huge static JSON as the source - maybe consider improving that source object with sort order?
Just some ideas. Hard to give you the best resolution without knowing all the story behind what you're trying to do.

Performance of an ordered list in Firebase

If I want to maintain an ordered list in Firebase, it seems like the best way to do it is to manually assign a priority to each item in my list. That means if I insert or remove an item from the list, I have to update the priorities of all the items following it. For an item at the beginning of the list, this means updating every item in the list. Is there a better performing data structure or algorithm to use in this case?
You can create an ordered list by setting the priority of elements appropriately. Items in a list are ordered lexicographically by priority, or if the priority can be parsed to a number, by numeric value.
If you want to insert items into the middle of an existing list, modifying the priorities of the existing items would work, but would be horribly inefficient. A better approach is just to pick a priority between the two items where you want to insert the value and set that priority for the new item.
For example, if you had element 1 with priority "a", and element 2 with priority "b", you could insert element 3 between the two with priority "aa" (or "aq", "az", etc).
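The same "pick a priority between the two neighbours" idea can be sketched with numeric priorities instead of the string priorities used in the example above; this is a swapped-in variant that relies on numeric priorities being ordered by value. priorityBetween and GAP are hypothetical names, and the spacing of 1000 is an arbitrary assumption.

```javascript
// Hypothetical spacing used when inserting at either end of the list.
const GAP = 1000;

// Return a numeric priority that sorts between `prev` and `next`.
// Pass null for `prev` or `next` when inserting at the start or end.
function priorityBetween(prev, next) {
  if (prev == null && next == null) return 0;
  if (prev == null) return next - GAP;
  if (next == null) return prev + GAP;
  return (prev + next) / 2;
}
```

Only the new item is written. Repeatedly inserting between the same two neighbours halves the gap each time, so floating-point precision eventually runs out and the priorities in that region have to be re-spaced.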
In our experience, most times when you create an ordered list, you don't necessarily know the position in the list you want to insert the item beforehand. For example, if you're creating a Leader Board for a game, you don't know in advance that you want to place a new score 3rd in the list, rather you know you want to insert it at whatever position score 10000 gets you (which might happen to be third). In this case, simply setting the priority to the score will accomplish this. See our Leader Board example here:
https://www.firebase.com/tutorial/#example-leaderboard
The Ruby gem ranked_model has an interesting approach to this problem. It uses a position integer like many other "acts as list" implementations, but it doesn't rely on re-writing all the integers on each position move. Instead, it spaces the integers widely apart, and so each update may only affect one or two rows. Might be worth looking through the readme and code to see if this approach could fit here.
