Why does this firestore query require an index? - javascript

I have a query with a where() method with an equality operator and then an orderBy() method and I can't figure out why it requires an index.
The where method checks for a value in an object (a map) and the order by is with a number.
The documentation says
If you have a filter with a range comparison (<, <=, >, >=), your first ordering must be on the same field
So I would have thought that an equality filter would be fine.
Here is my query code:
this.afs.collection('posts').ref
  .where('tags.' + this.courseID, '==', true)
  .orderBy("votes")
  .limit(5)
  .get().then(snap => {
    snap.forEach(doc => {
      console.log(doc.data());
    });
  });
Here is an example of the database structure

Why does this Firestore query require an index?
As you probably noticed, queries in Cloud Firestore are very fast, and this is because Firestore automatically creates an index for every field in your documents. So when you filter on a single field, as with your equality filter, the automatically created single-field index is enough. If you also order your results by a different field, a composite index covering both fields is required. This kind of index is not created automatically; you have to create it yourself. You can do this manually in your Firebase Console, or you can use the link in the error message that appears in your logs, which looks like this:
FAILED_PRECONDITION: The query requires an index. You can create it here: ...
You can simply click on that link, or copy and paste the URL into a web browser, and the console opens with the required index already filled in, ready to be created.
So Firestore requires an index so you can have very fast queries.

An index is simply a database inventory or a record of what is where. And each index is a specific inventory of a specific thing—for example, how many propertyX fields exist in a collection and what their values are, sorted (the fact that they are sorted is critical).
If this inventory didn't exist, to query for documents where propertyX is someValue, the machine would have to iterate over the entire collection to determine (1) which documents contain propertyX and (2) which documents contain propertyX equal to someValue. By keeping an inventory (or index) of queried properties, when a query is performed on propertyX, the machine can go straight to the propertyX index and gather the locations of all the documents that equal someValue and then fetch those documents from the collection and return them. Not only does the machine not need to touch the collection to know where the documents are but it doesn't even need to iterate over the entire index because it's always in order.
Indexes are why the performance of Firestore queries depends on the size of the result set rather than the size of the collection, and why we only need to index properties that are actually queried.
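To make the "sorted inventory" idea concrete, here is a minimal in-memory sketch. It is not how Firestore is implemented; the collection, buildIndex, lowerBound and findEqual are hypothetical names made up for illustration. It only shows why a sorted index lets an equality lookup skip both the collection scan and most of the index.

```javascript
// Hypothetical in-memory collection; the shape is an assumption for illustration.
const collection = [
  { id: 'a', propertyX: 7 },
  { id: 'b', propertyX: 3 },
  { id: 'c' }, // no propertyX, so it never enters the index
  { id: 'd', propertyX: 7 },
  { id: 'e', propertyX: 5 },
];

// Build the "inventory": one entry per document that has the field, sorted by value.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ value: doc[field], position }))
    .filter((entry) => entry.value !== undefined)
    .sort((a, b) => a.value - b.value);
}

// Binary search for the first index entry whose value is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].value < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality lookup: jump to the first match, then read entries until the value changes.
function findEqual(docs, index, target) {
  const results = [];
  for (let i = lowerBound(index, target); i < index.length && index[i].value === target; i++) {
    results.push(docs[index[i].position]);
  }
  return results;
}

const propertyXIndex = buildIndex(collection, 'propertyX');
const matches = findEqual(collection, propertyXIndex, 7);
console.log(matches.map((doc) => doc.id)); // logs the ids 'a' and 'd'
```

Because the entries are sorted, the lookup touches only the matching run of the index, which is why the critical property mentioned above is the ordering.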

Related

react-native-firebase/firestore: query limit from result x to y [duplicate]

I would like to create two queries with a pagination option. With the first one I would like to get the first ten records, and with the second one I would like to get all the remaining records:
.startAt(0)
.limit(10)
.startAt(9)
.limit(null)
Can anyone confirm that the above code is correct for both conditions?
Firestore does not support index or offset based pagination. Your query will not work with these values.
Please read the documentation on pagination carefully. Pagination requires that you provide a document reference (or field values in that document) that defines the next page to query. This means that your pagination will typically start at the beginning of the query results, then progress through them using the last document you see in the prior page.
From CollectionReference:
offset(offset) → {Query}
Specifies the offset of the returned results.
As Doug mentioned, Firestore does not support Index/offset - BUT you can get similar effects using combinations of what it does support.
Firestore has its own internal sort order (usually the document.id), but any query can be sorted with .orderBy(), and the first document will be relative to that sorting - only an orderBy() query has a real concept of a "0" position.
Firestore also allows you to limit the number of documents returned with .limit(n)
.startAt(), .startAfter(), .endAt(), .endBefore() all need either the field values matching the orderBy fields, or a DocumentSnapshot - NOT an index
what I would do is create a Query:
const MyOrderedQuery = FirebaseInstance.collection().orderBy()
Then first execute
MyOrderedQuery.limit(n).get()
or
MyOrderedQuery.limit(n).onSnapshot()
which will return one way or the other a QuerySnapshot, which will contain an array of the DocumentSnapshots. Let's save that array
let ArrayOfDocumentSnapshots = QuerySnapshot.docs;
Warning, Will Robinson! JavaScript assignment is usually by reference,
and even the spread operator only copies shallowly - make sure your code actually
copies the full deep structure or that the reference is kept around!
Then to get the "rest" of the documents as you ask above, I would do:
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).get()
or
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).onSnapshot()
which will start AFTER the last returned document snapshot of the FIRST query. Note the re-use of the MyOrderedQuery
You can get something like a "pagination" by saving the ordered Query as above, then repeatedly use the returned Snapshot and the original query
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).limit(n).get() // page forward
MyOrderedQuery.endBefore(ArrayOfDocumentSnapshots[0]).limitToLast(n).get() // page back
This does make your state management more complex - you have to hold onto the ordered Query, and the last returned QuerySnapshot - but hey, now you're paginating.
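The cursor logic can be modelled without Firestore at all. The sketch below is only an in-memory model under stated assumptions: ordered, pageForward and pageBack are hypothetical names, and a plain sorted array stands in for the ordered query. Paging forward takes the n documents right after the cursor; paging back takes the last n documents right before it, which is the behaviour limitToLast provides in the Firebase SDK.

```javascript
// Hypothetical documents, already sorted by the orderBy field ("votes" here).
const ordered = [
  { id: 'p1', votes: 1 },
  { id: 'p2', votes: 2 },
  { id: 'p3', votes: 3 },
  { id: 'p4', votes: 4 },
  { id: 'p5', votes: 5 },
  { id: 'p6', votes: 6 },
  { id: 'p7', votes: 7 },
];

// Models startAfter(cursor) followed by limit(n): the n documents right after the cursor.
// A null cursor means "start from the beginning".
function pageForward(docs, cursor, n) {
  const start = cursor ? docs.findIndex((d) => d.id === cursor.id) + 1 : 0;
  return docs.slice(start, start + n);
}

// Models endBefore(cursor) followed by limitToLast(n): the n documents right before the cursor.
function pageBack(docs, cursor, n) {
  const end = docs.findIndex((d) => d.id === cursor.id);
  if (end === -1) return []; // unknown cursor: nothing to return
  return docs.slice(Math.max(0, end - n), end);
}

const n = 3;
const page1 = pageForward(ordered, null, n);         // first page
const page2 = pageForward(ordered, page1[n - 1], n); // cursor = last doc of page 1
const page3 = pageForward(ordered, page2[n - 1], n); // short final page
const back = pageBack(ordered, page2[0], n);         // cursor = first doc of page 2

console.log(page2.map((d) => d.id), back.map((d) => d.id));
```

Note that the only state carried between pages is the first and last document of the current page, which matches the state-management cost described above.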
BIG NOTE
This is not terribly efficient - setting up a listener is fairly "expensive" for Firestore, so you don't want to do it often. Depending on your document size(s), you may want to "listen" to larger sections of your collections, and handle more of the paging locally (Redux or whatever) - Firestore Documentation indicates you want your listeners around at least 30 seconds for efficiency. For some applications, even pages of 10 can be efficient; for others you may need 500 or more stored locally and paged in smaller chunks.

Firestore query either by phrase/string in document ID or contents of array in document

Trying to find a way to find all documents in a collection, whose ID contains a certain string or by comparing their array with another array.
Basically I have a DB structure of products like:
computers_laptops_macbook-pro: { ...someData, categories: ["computers", "laptops"] }
And I am sending to my Cloud Function an array of categories: ["computers", "laptops"].
Is there a way I can either search for all documents whose IDs contain computers_laptops or whose categories contain both "computers" and "laptops"?
I tried await marketplace_products_ref.where("categories", "array-contains", categories_array).get() but it doesn't find anything, even though it should (if I am understanding its purpose correctly).
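I suggest reviewing the documentation for array membership queries. array-contains only matches a single value within an array, so passing an array as the value finds nothing. If you have multiple values to check for, use an array-contains-any query instead. Note that array-contains-any matches documents containing any one of the values (a logical OR), so requiring both "computers" and "laptops" still needs an extra filtering step on the results.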
const snapshot = await marketplace_products_ref
  .where("categories", "array-contains-any", categories_array)
  .get();
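If every requested category must be present, one possible approach is to filter the fetched documents in your own code. The following is a minimal sketch, assuming the documents have already been read into plain objects; fetched and containsAll are hypothetical names, not part of the Firestore API.

```javascript
// Hypothetical documents, as they might look after reading a query result into plain objects.
const fetched = [
  { id: 'computers_laptops_macbook-pro', categories: ['computers', 'laptops'] },
  { id: 'computers_desktops_imac', categories: ['computers', 'desktops'] },
  { id: 'bags_laptops_sleeve', categories: ['bags', 'laptops'] },
];

// Keep only the documents whose categories array holds every requested value.
function containsAll(docs, required) {
  return docs.filter((doc) => required.every((c) => doc.categories.includes(c)));
}

const both = containsAll(fetched, ['computers', 'laptops']);
console.log(both.map((doc) => doc.id)); // logs only the macbook-pro id
```

The trade-off is that documents matching only one category are still read and then discarded locally.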

Mongodb: using min function on collection with only default index (_id)

I want to retrieve the last 20 documents in my large collection in an efficient manner.
This SO post offered the performant solution below, but it does not answer my question, because my question deals specifically with the _id index:
db.collectionName.find().min(minCriteria).hint(yourIndex).limit(N)
However, my collection just contains the default index (_id). I'm just not sure what min criteria would be - I obviously don't want to hardcode an _id value, as the collection is periodically emptied.
itemsCollection.find().min(<minCriteria>).hint({_id:1}).limit(20)
Is there any way to use min with the _id index? Or is my only option creating a new index?
Yes, you can use min with the _id index, as long as your <minCriteria> only references the _id field.
If your min criteria is on something other than _id, you will need to create an index on that criteria in order to avoid this query being a full collection scan.
The min() cursor method is for establishing a lower bound for the index scan that will service the query. This is probably not what you are looking for to retrieve the most recently added documents.
Assuming each document's _id field contains an ObjectId or some other value that sorts in the order they were inserted, then you can, as noted in the comments, do a reverse sort on _id and limit to the number of documents desired, which can be very efficient.
This query should automatically use the _id index:
db.itemsCollection.find().sort({_id:-1}).limit(20)
The date part of the ObjectId is determined by the system creating the value, which in some cases is a client/application server. This means that clock drift may affect the ordering.
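To illustrate the date part: the first 4 bytes of an ObjectId (the first 8 hex characters) hold the creation time in seconds since the Unix epoch. The sketch below works on plain hex strings, without a MongoDB driver; objectIdTimestamp and the sample ids are hypothetical and only for illustration.

```javascript
// Read the embedded creation time (seconds since the Unix epoch) from an ObjectId hex string.
function objectIdTimestamp(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// Hypothetical ObjectId values, in arbitrary order.
const sampleIds = [
  '507f191e810c19729de860ea',
  '5f9d88b2a1b2c3d4e5f60718',
  '507f1f77bcf86cd799439011',
];

// Newest first: for equal-length lowercase hex strings, string order matches byte order,
// so this mirrors what a descending sort on _id does.
const newestFirst = [...sampleIds].sort((a, b) => (a < b ? 1 : a > b ? -1 : 0));

for (const id of newestFirst) {
  console.log(id, new Date(objectIdTimestamp(id) * 1000).toISOString());
}
```

Since the timestamp only has one-second resolution and comes from whichever machine generated the id, two ids created on different machines can sort in a different order than they were inserted.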
If you want to get the documents that were most recently inserted into the collection, you can use natural order:
db.itemsCollection.find().sort({$natural:-1}).limit(20)
This doesn't use an index, but it should still be fairly performant because it will only scan the number of documents you want to return.

Firebase Firestore - Filter data with multiple 'array-contains'

I am struggling to find good material on best practices for filtering data using Firebase Firestore. I want to filter my data based on the categories selected by the user. I have a collection of documents stored in my Firestore database, and each document has an array that lists all the appropriate categories for that document. For the sake of filtering, I'm keeping a local array with the user's preferred categories as well. All I want to do is filter the data based on the user's preferred categories.
firestore categories field
Consider that I have the user's preferred categories stored as an array of strings (["Film", "Music"]). I was planning on using Firestore's 'array-contains' operator like this:
db.collection(collectioname)
.where('categoriesArray', 'array-contains', ["Film", "Music"])
Later I found out that I can't use 'array-contains' against an array itself and after investigating on this issue, I decided to change my data structure as mentioned here.
categories changed to Map
Once I changed the categories from an array to map, I thought I could use multiple where conditions to filter the documents
let query = db.collection(collectionName)
  .where(somefield, '==', true)
this.props.data.filterCategories.forEach((val) => {
  query = query.where(`categories.${val}`, '==', true);
});
query = query
  .orderBy(someOtherField, "desc")
  .limit(itemsPerPage)
const snapshot = await query.get()
Now problem number 2: Firestore requires indexes for compound queries. The categories saved within each document are dynamic, and there's no way I can add these indexes in advance. What would be the ideal solution in such cases? Any help would be deeply appreciated.
This is a new feature of the Firebase JavaScript SDK, launched on November 7, 2019:
Version 7.3.0 - November 7, 2019
array-contains-any
"array-contains-any operator to combine up to 10 array-contains clauses on the same field with a logical OR. An array-contains-any query returns documents where the given field is an array that contains one or more of the comparison values"
citiesRef.where('regions', 'array-contains-any',
['west_coast', 'east_coast']);
Instead of iterating through each category that you wish to query and appending clauses to a single query object, each iteration should be its own independent query. And you can keep the categories in an array.
<document>
- itemId: abc123
- categories: [film, music, television]
If you wish to perform an OR query, you would make n-loops where each loop would query for documents where array-contains that category. Then on your end, you would dedup (remove duplicates) from the results based on the item's identifier. So if you wanted to query film or music, you would make 2 loops where the first iteration queried documents where array-contains film and the second loop queried documents where array-contains music. The results would be placed into the same collection and then you would simply remove all duplicates with the same itemId.
This also does not pose a problem with the composite-index limit because categories is a static field. The real problem comes with pagination because you would need to keep a record of all fetched itemId in case a future page of results returns an item that was already fetched and this would create an O(N^2) scenario (more on big-o notation: https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/). And because you're deduping locally, pagination blocks as the user sees them are not guaranteed to be even. If each pagination block is set to 25 documents, for example, some pages may end up displaying 24, some 21, others 14, depending on how many duplicates were removed from each block.
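The merge-and-dedup step described above could look roughly like this. It is a sketch under assumptions: the result sets are plain arrays standing in for the output of the separate array-contains queries, and mergeById is a hypothetical helper. Keeping seen items in a Map keyed by itemId keeps each duplicate check cheap.

```javascript
// Hypothetical result sets, one per array-contains query.
const filmResults = [
  { itemId: 'abc123', categories: ['film', 'music', 'television'] },
  { itemId: 'def456', categories: ['film'] },
];
const musicResults = [
  { itemId: 'abc123', categories: ['film', 'music', 'television'] },
  { itemId: 'ghi789', categories: ['music'] },
];

// Merge any number of result sets, keeping the first copy of each itemId.
function mergeById(...resultSets) {
  const seen = new Map();
  for (const results of resultSets) {
    for (const doc of results) {
      if (!seen.has(doc.itemId)) seen.set(doc.itemId, doc);
    }
  }
  return [...seen.values()];
}

const merged = mergeById(filmResults, musicResults);
console.log(merged.map((doc) => doc.itemId));
```

Here four fetched documents collapse into three unique items, which is the same effect that makes pagination blocks uneven.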
Are you planning on retrieving documents with the exact category array? Say, your user preference is listed as ["Film", "Music"]. Do you wish to retrieve only those documents with Film AND Music, or do you wish to retrieve documents having Film OR music?
If it's the latter, then maybe you can query for all documents with "Film" and then query for all documents with "Music", then merge it. However, the drawback here is some redundant document reads, when such document has both "Film" and "Music" in the categoryArray field.
You can also explore using Algolia to enable full-text search. In this case, you'd probably store the category list as a string maybe separated by commas, then update the whole string when the user changes their preferences.
For the former case, I have not come across a workable solution other than maybe storing it as a concatenated string in alphabetical order. Others might have a more solid solution than mine.
Hope this helps!
Your query includes an orderBy clause. This, in combination with any equality filter, requires that you create an index to support that query. There is no way to avoid this.
If you remove the orderBy, you will be able to have flexible, dynamic filters for equality using the map properties in the document. This is the only way you will be able to have a dynamic filter without creating an index. This of course means that you will have to order and page the query results on the client.
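Ordering and paging on the client can be sketched as follows, assuming the filtered documents have already been fetched into an array of plain objects with a numeric sort field. sortAndPage, the score field and the sample data are hypothetical names chosen for illustration.

```javascript
// Hypothetical documents returned by an equality-only query.
const results = [
  { id: 'a', score: 10 },
  { id: 'b', score: 40 },
  { id: 'c', score: 20 },
  { id: 'd', score: 30 },
  { id: 'e', score: 50 },
];

// Sort a copy by a numeric field, then cut out one zero-based page.
function sortAndPage(docs, field, direction, page, itemsPerPage) {
  const sign = direction === 'desc' ? -1 : 1;
  const sorted = [...docs].sort((x, y) => sign * (x[field] - y[field]));
  const start = page * itemsPerPage;
  return sorted.slice(start, start + itemsPerPage);
}

const firstPage = sortAndPage(results, 'score', 'desc', 0, 2);
const secondPage = sortAndPage(results, 'score', 'desc', 1, 2);
console.log(firstPage.map((d) => d.id), secondPage.map((d) => d.id));
```

This only works well while the filtered result set is small enough to hold in memory, since every matching document is read before the first page can be shown.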

What's the fastest way to check for the existence of a mongodb doc?

Should I just use find and check whether it returns nothing?
EDIT:
collection.findOne {#attribute}, (err, doc) ->
  if err then console.log err
  if doc
    # exists
  else
    # does not
If you are just testing for a single document, use findOne (or the equivalent in your driver); most drivers implement this in the most efficient possible way (by setting a negative limit of 1 on the request, which asks mongo to return immediately after finding one document, even if more might match, and not to create a cursor that won't ever be used by the client).
If you have an index that can serve your query, you can use field selection to select (a subset of) the fields in the index; this will make use of Mongo's "covered index" functionality to avoid a lookup to the underlying collection data. Be sure to set {_id: 0} in your field selector unless _id is in your index.
Covered Index is what you want. If the info you know about the document is indexed, then you can use this facility to query and retrieve info from the index only (in RAM) and will not go to disk to get the reference document. Explained in the mongo docs here.
http://www.mongodb.org/display/DOCS/mongo%20wire%20protocol#MongoWireProtocol-OPQUERY
numberToReturn: Limits the number of documents in the first OP_REPLY message to the query. However, the database will still establish a cursor and return the cursorID to the client if there are more results than numberToReturn. If the client driver offers 'limit' functionality (like the SQL LIMIT keyword), then it is up to the client driver to ensure that no more than the specified number of documents are returned to the calling application. If numberToReturn is 0, the db will use the default return size. If the number is negative, then the database will return that number and close the cursor. No further results for that query can be fetched. If numberToReturn is 1 the server will treat it as -1 (closing the cursor automatically).
