I want to retrieve the last 20 documents in my large collection in an efficient manner.
This SO post offered the performant solution below, but it does not answer my question, which is specifically about the _id index:
db.collectionName.find().min(minCriteria).hint(yourIndex).limit(N)
However, my collection contains only the default index (_id). I'm not sure what the min criteria would be. I obviously don't want to hardcode an _id value, as the collection is periodically emptied.
itemsCollection.find().min(<minCriteria>).hint({_id:1}).limit(20)
Is there any way to use min with the _id index? Or is my only option creating a new index?
Yes, you can use min with the _id index, as long as your <minCriteria> references only the _id field.
If your min criteria are on something other than _id, you will need to create an index on those fields to keep the query from becoming a full collection scan.
The min() cursor method is for establishing a lower bound for the index scan that will service the query. This is probably not what you are looking for to retrieve the most recently added documents.
Assuming each document's _id field contains an ObjectId or some other value that sorts in the order they were inserted, then you can, as noted in the comments, do a reverse sort on _id and limit to the number of documents desired, which can be very efficient.
This query should automatically use the _id index:
db.itemsCollection.find().sort({_id:-1}).limit(20)
The date part of the ObjectId is determined by the system creating the value, which in some cases is a client/application server. This means that clock drift may affect the ordering.
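To make the clock-drift point concrete: the first 4 bytes of an ObjectId are a seconds-since-epoch timestamp taken from the clock of whichever machine generated the value. The helper below is a minimal sketch that decodes that timestamp from the hex string form; the name objectIdToDate is hypothetical, and the shell's built-in ObjectId getTimestamp() method gives the same information.

```javascript
// Hypothetical helper: decode the creation time stored in the first
// 4 bytes (8 hex characters) of an ObjectId given as a hex string.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new TypeError("expected a 24-character hex ObjectId string");
  }
  // Seconds since the Unix epoch, as recorded by the generating machine.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}
```

Two documents inserted moments apart by application servers with skewed clocks can therefore sort on _id in a different order than they were actually written.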
If you want to get the documents that were most recently inserted into the collection, you can use natural order:
db.itemsCollection.find().sort({$natural:-1}).limit(20)
This doesn't use an index, but it should still be fairly performant because it will only scan the number of documents you want to return.
Related
I have a set of fields with variable match and sort criteria, i.e. a Smart Query. I run an aggregation that matches and sorts, but I'd like to implement cursor-style pagination. For my normal lists I can use a greater-than comparison on the order field or on an ObjectID passed into my API, but Smart Queries won't have a hard-coded order field.
How can I use an ObjectID to get the next (n) items in a list that I've already matched and sorted? I'm using JavaScript and Mongoose; here's some general code for my current query:
const plist = await Media.aggregate([
...smartMatch,
...smartSort,
//somehow match and limit next (n) items after ObjectID,
]).allowDiskUse(true);
Smart Match and Sort being an array of operations defined by the user.
I have a collection that contains several documents.
Every document contains a field named age.
The ID of the collection is data.
There are 20 documents.
db.collection('data').orderBy('age')
My question is:
How can I get a certain range of those documents once they are ordered by age?
Example:
After ordering by age:
[doc5, doc12, doc9, doc4, doc1, doc15, doc7, doc14, doc11, doc17,
doc3, doc2, doc13, doc8, doc6, doc18, doc20, doc16, doc19, doc10]
Fourth to Sixth (doc4, doc1, doc15)
Eleventh to Twelfth (doc3, doc2)
Fourteenth (doc8)
Firestore does not offer (actually, no longer offers) client-side queries that let you jump to an offset within the query results. You have to start from the beginning, and manually skip each result that you're not interested in, keeping track of the current index as you go. Yes, this will cost you excess document reads, but you don't have an alternative if you're not able to assign index values of your own for each document.
If you want to perform the query on a backend, you have offset() available, but the skipped documents are still counted as reads, so it is neither an efficient nor a cheap way of skipping unwanted results.
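As a rough illustration of the manual skipping described above, here is a minimal sketch that picks a 1-based, inclusive range out of results that have already been fetched in age order. The helper name sliceRange is hypothetical, and the array of labels stands in for the ordered documents from the question; every document up to the end of the range still has to be read first.

```javascript
// Hypothetical helper: return positions first..last (1-based, inclusive)
// from an array of documents that is already in query order.
function sliceRange(orderedDocs, first, last = first) {
  if (!Number.isInteger(first) || !Number.isInteger(last) || first < 1 || last < first) {
    throw new RangeError("expected integers with 1 <= first <= last");
  }
  return orderedDocs.slice(first - 1, last);
}

// The ordering from the question, used here as stand-in data.
const ordered = [
  "doc5", "doc12", "doc9", "doc4", "doc1", "doc15", "doc7", "doc14", "doc11", "doc17",
  "doc3", "doc2", "doc13", "doc8", "doc6", "doc18", "doc20", "doc16", "doc19", "doc10",
];

const fourthToSixth = sliceRange(ordered, 4, 6);
const eleventhToTwelfth = sliceRange(ordered, 11, 12);
const fourteenth = sliceRange(ordered, 14);
```

Limiting the query to the last position you need keeps the number of wasted reads down to the documents before the start of the range.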
I am struggling to find good material on best practices for filtering data using Firebase Firestore. I want to filter my data based on the categories selected by the user. I have a collection of documents stored in my Firestore database, and each document has an array that holds all the appropriate categories for that document. For the sake of filtering, I'm also keeping a local array with the user's preferred categories. All I want to do is filter the data based on the user's preferred categories.
firestore categories field
Consider that I have the user's preferred categories stored as an array of strings (["Film", "Music"]). I was planning on using Firestore's 'array-contains' operator like this:
db.collection(collectioname)
.where('categoriesArray', 'array-contains', ["Film", "Music"])
Later I found out that I can't use 'array-contains' against an array itself, and after investigating this issue I decided to change my data structure as mentioned here.
categories changed to Map
Once I changed the categories from an array to map, I thought I could use multiple where conditions to filter the documents
let query = db.collection(collectionName)
.where(somefield, '==', true)
this.props.data.filterCategories.forEach((val) => {
query = query.where(`categories.${val}`, '==', true);
});
query = query
.orderBy(someOtherField, "desc")
.limit(itemsPerPage)
const snapshot = await query.get()
Now for problem number 2: Firebase requires indexes to be added for compound queries. The categories saved within each document are dynamic, and there's no way I can add these indexes in advance. What would be the ideal solution in such cases? Any help would be deeply appreciated.
This is a new feature of the Firebase JavaScript SDK, launched on November 7, 2019:
Version 7.3.0 - November 7, 2019
array-contains-any
"array-contains-any operator to combine up to 10 array-contains clauses on the same field with a logical OR. An array-contains-any query returns documents where the given field is an array that contains one or more of the comparison values"
citiesRef.where('regions', 'array-contains-any',
['west_coast', 'east_coast']);
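If the user has selected more categories than a single array-contains-any clause accepts, one option is to split the list into groups and run one query per group. The sketch below only does the splitting; the helper name chunkValues is hypothetical, and the default group size of 10 is taken from the release note quoted above, so check the current limit for your SDK version.

```javascript
// Hypothetical helper: split a list of values into groups of at most
// `size` items, so each group fits in one array-contains-any clause.
function chunkValues(values, size = 10) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < values.length; i += size) {
    chunks.push(values.slice(i, i + size));
  }
  return chunks;
}
```

Each group would then be passed as the third argument of a where(..., 'array-contains-any', group) call, and the per-group results merged on the client.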
Instead of iterating through each category that you wish to query and appending clauses to a single query object, each iteration should be its own independent query. And you can keep the categories in an array.
<document>
- itemId: abc123
- categories: [film, music, television]
If you wish to perform an OR query, you would make n-loops where each loop would query for documents where array-contains that category. Then on your end, you would dedup (remove duplicates) from the results based on the item's identifier. So if you wanted to query film or music, you would make 2 loops where the first iteration queried documents where array-contains film and the second loop queried documents where array-contains music. The results would be placed into the same collection and then you would simply remove all duplicates with the same itemId.
This also does not pose a problem with the composite-index limit because categories is a static field. The real problem comes with pagination because you would need to keep a record of all fetched itemId in case a future page of results returns an item that was already fetched and this would create an O(N^2) scenario (more on big-o notation: https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/). And because you're deduping locally, pagination blocks as the user sees them are not guaranteed to be even. If each pagination block is set to 25 documents, for example, some pages may end up displaying 24, some 21, others 14, depending on how many duplicates were removed from each block.
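The dedup step can be sketched as follows, assuming each result has already been converted to a plain object carrying the itemId field from the example document. The helper name mergeUnique is hypothetical. Keeping the seen identifiers in a Map makes each membership check constant time on average, so tracking them across pages does not have to be quadratic.

```javascript
// Hypothetical helper: merge several result arrays into one, keeping only
// the first document seen for each identifier.
function mergeUnique(resultSets, key = "itemId") {
  const seen = new Map(); // identifier -> first document with that identifier
  for (const docs of resultSets) {
    for (const doc of docs) {
      if (!seen.has(doc[key])) {
        seen.set(doc[key], doc);
      }
    }
  }
  // A Map iterates in insertion order, so first-seen order is preserved.
  return [...seen.values()];
}
```

The uneven page sizes described above still apply, since duplicates are only discovered after the reads have happened.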
Are you planning on retrieving documents with the exact category array? Say, your user preference is listed as ["Film", "Music"]. Do you wish to retrieve only those documents with Film AND Music, or do you wish to retrieve documents having Film OR music?
If it's the latter, then maybe you can query for all documents with "Film", then query for all documents with "Music", and merge the results. The drawback is some redundant document reads whenever a document has both "Film" and "Music" in its categoryArray field.
You can also explore using Algolia to enable full-text search. In this case, you'd probably store the category list as a string maybe separated by commas, then update the whole string when the user changes their preferences.
For the former case, I have not come across a workable solution other than maybe storing the categories as a concatenated string in alphabetical order. Others might have a more solid solution than mine.
Hope this helps!
Your query includes an orderBy clause. This, in combination with any equality filter, requires that you create an index to support that query. There is no way to avoid this.
If you remove the orderBy, you will be able to have flexible, dynamic filters for equality using the map properties in the document. This is the only way you will be able to have a dynamic filter without creating an index. This of course means that you will have to order and page the query results on the client.
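Ordering and paging on the client could look roughly like the sketch below. It assumes the filtered documents have already been fetched into an array of plain objects and that the ordering field is numeric; the helper name sortAndPage is hypothetical.

```javascript
// Hypothetical helper: sort already-fetched documents by a numeric field
// in descending order and return one page of them (pages start at 1).
function sortAndPage(docs, field, page, itemsPerPage) {
  // Copy first so the caller's array is not reordered.
  const sorted = [...docs].sort((a, b) => b[field] - a[field]);
  const start = (page - 1) * itemsPerPage;
  return sorted.slice(start, start + itemsPerPage);
}
```

The trade-off is that every matching document is read before the first page can be shown, so this suits result sets that stay reasonably small.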
I have a query with a where() method with an equality operator and then an orderBy() method and I can't figure out why it requires an index.
The where method checks for a value in an object (a map) and the order by is with a number.
The documentation says
If you have a filter with a range comparison (<, <=, >, >=), your first ordering must be on the same field
So I would have thought that an equality filter would be fine.
Here is my query code:
this.afs.collection('posts').ref
.where('tags.' + this.courseID,'==',true)
.orderBy("votes")
.limit(5)
.get().then(snap => {
snap.forEach(doc => {
console.log(doc.data());
});
});
Here is an example of the database structure
Why does this Firestore query require an index?
As you have probably noticed, queries in Cloud Firestore are very fast. This is because Firestore automatically creates a single-field index for every field in your documents, so a query that filters on one field alone is served by an index that already exists. When you also order the results by a different field, as your query does with an equality filter on the tag and an orderBy on votes, a composite index is required. This kind of index is not created automatically; you have to create it yourself. You can do so manually in the Firebase Console, or you can run the query and look in your logs for a message like this:
FAILED_PRECONDITION: The query requires an index. You can create it here: ...
You can simply click on that link or copy and paste the URL into a web browser and your index will be created automatically.
So Firestore requires an index so you can have very fast queries.
An index is simply a database inventory or a record of what is where. And each index is a specific inventory of a specific thing—for example, how many propertyX fields exist in a collection and what their values are, sorted (the fact that they are sorted is critical).
If this inventory didn't exist, to query for documents where propertyX is someValue, the machine would have to iterate over the entire collection to determine (1) which documents contain propertyX and (2) which documents contain propertyX equal to someValue. By keeping an inventory (or index) of queried properties, when a query is performed on propertyX, the machine can go straight to the propertyX index and gather the locations of all the documents that equal someValue and then fetch those documents from the collection and return them. Not only does the machine not need to touch the collection to know where the documents are but it doesn't even need to iterate over the entire index because it's always in order.
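The idea can be sketched with a toy in-memory index. This is only an illustration of why a sorted inventory avoids a full scan, not a description of how Firestore implements its indexes; the function names and the id field are assumptions made for the example.

```javascript
// Toy index: one sorted entry per document that actually has the field.
function buildIndex(docs, field) {
  return docs
    .filter((doc) => field in doc)
    .map((doc) => ({ value: doc[field], id: doc.id }))
    .sort((a, b) => (a.value < b.value ? -1 : a.value > b.value ? 1 : 0));
}

// Binary search for the first entry whose value is not less than target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].value < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Collect the ids of all documents whose indexed value equals target.
// Because the index is sorted, matches sit next to each other.
function lookup(index, target) {
  const ids = [];
  for (let i = lowerBound(index, target); i < index.length && index[i].value === target; i++) {
    ids.push(index[i].id);
  }
  return ids;
}
```

The lookup touches a logarithmic number of entries to find the first match and then only the matching entries, which is why the cost follows the size of the result rather than the size of the collection.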
Indexes are why collection sizes have no impact on the performance of Firestore queries and why we only need to index properties that are ever queried.
Let's say we have a large collection of students, and each student's height is stored.
Is there a way to obtain the 50th (or N-th) tallest student, without fetching all the students and locally sorting them by height, using mongoose?
I would post my attempts, but this is such a simple query to explain that I feel it would just make the question unnecessarily complicated.
You can use skip(n) to skip over the first n docs and use limit(m) to limit the results from there to m number of docs. Adding a sort determines the order of the docs.
So in this case it would be:
Students.find().sort({height: -1}).skip(49).limit(1).exec(function(err, students) {
...
});
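The same chain can be wrapped in a small helper that returns the query instead of executing it with a callback. This is a sketch only: the helper name nthTallestQuery is hypothetical, and it assumes Students is a Mongoose model with a numeric height field. Returning the query lets the caller await it, which matters because recent Mongoose major versions (7 and later, as far as I know) no longer accept callbacks.

```javascript
// Hypothetical helper: build a query for the n-th tallest student.
// Assumes `Students` is a Mongoose model and `height` is a numeric field.
function nthTallestQuery(Students, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  // Tallest first, skip the n - 1 taller students, keep exactly one.
  return Students.find().sort({ height: -1 }).skip(n - 1).limit(1);
}

// Usage inside an async function (not executed here):
//   const [student] = await nthTallestQuery(Students, 50);
```

An index on height keeps the sort cheap, though skip still has to walk past the skipped entries, so very large values of n get slower.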