How to get the second 10 documents in a Firestore query? - javascript

More precisely, a slice of the ordered documents. My idea would be this, but it isn't good:
firestore().collection("queue").orderBy("order_id", "asc").limit(3,5)
I'd be grateful if anyone could answer it.

Best Practice
"Do not use offsets. Instead, use cursors. Using an offset only avoids returning the skipped documents to your application, but these documents are still retrieved internally. The skipped documents affect the latency of the query, and your application is billed for the read operations required to retrieve them."

Firestore does not offer offset-based query results for web and mobile clients, as they are inefficient and costly on your bill. If you want to implement pagination in your app, you should follow the linked documentation and design your app accordingly. This will get you the ability to jump forward and backward in query results, but not to a specific index or offset without first reading everything up to that offset (which is the expensive part that Firestore is suggesting you should not do).
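For example, to get the second page of 10 documents you read the first page, remember its last document, and pass that document to startAfter() as a cursor. A minimal sketch with the namespaced SDK from the question (the collection and field names are taken from the snippet above):
const first = await firestore()
  .collection("queue")
  .orderBy("order_id", "asc")
  .limit(10)
  .get();

// Remember the last document of the current page...
const last = first.docs[first.docs.length - 1];

// ...and start the next page right after it.
const second = await firestore()
  .collection("queue")
  .orderBy("order_id", "asc")
  .startAfter(last)
  .limit(10)
  .get();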

Related

What is the best way to reduce firestore document reads in chat app on page load?

I have a chat app that is charging me a large number of reads for each page load, 1 for each message to show. I'm trying to figure out a way to reduce that number and optimize for cost, as refreshing the page a few times costs hundreds of reads.
The Firestore pricing documentation says:
"For queries other than document reads, such as a request for a list of collection IDs, you are billed for one document read. If fetching the complete set of results requires more than one request (for example, if you are using pagination), you are billed once per request."
I considered that maybe if I fetch an entire collection without a query, like shown here in the docs, a cost difference might be remotely possible. I'm sure that's probably wrong, but I can't find anything specifying what the exceptions are that only cost 1 read. It also crossed my mind to create an array holding the most recent messages in the parent document of the collection, but the security rules for updating that array seem overly complex and impractical. I also read about using the Firebase cache, but that doesn't seem useful here.
Here is code to demonstrate how I'm currently loading messages. I'm using the react-firebase-hooks library to snapshot this data with useCollectionData:
import { query, orderBy, limit } from "firebase/firestore";
import { useCollectionData } from "react-firebase-hooks/firestore";
const q = query(messagesRef, orderBy("createdAt", "desc"), limit(100));
const [messages] = useCollectionData(q);
In researching, I found this question where I'm pretty sure the accepted answer is wrong. It did make me question the rules. Are there any strategies to reduce the number of reads for this common use case?
Pagination still incurs charges on a per-document read, right?
Yes, it does, but only when you load more pages.
I'm not trying to load the entire collection, but rather wondering if loading the collection without a query has a different cost than with.
Loading a collection without a query that limits the results means that you're reading the entire collection. And yes, the cost will be much higher if you're not using a query. Remember that the cost of reading a collection/query in Firestore equals the number of documents that are actually returned. For example, if you have a collection of 1 million documents and your query returns 100, you'll pay for only 100 document reads.
I'm overall trying to figure out if there's a strategy that can improve the read cost of the example query I gave.
No. If you need to get the newest 100 messages, that's the best query you can have. The only change you can make to decrease the number of reads is to lower the value you pass to the limit() function. And that may make sense, since a user is unlikely to read 100 messages at once. Always try to display just the data that fits on a screen, and load any further data progressively.
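A minimal sketch of that progressive approach with the modular SDK (messagesRef comes from the question; the page size of 25 is an arbitrary assumption):
import { query, orderBy, limit, startAfter, getDocs } from "firebase/firestore";

// First page: only the 25 newest messages instead of 100.
const firstPage = await getDocs(
  query(messagesRef, orderBy("createdAt", "desc"), limit(25))
);

// On "load more": continue after the oldest message already shown.
const oldest = firstPage.docs[firstPage.docs.length - 1];
const nextPage = await getDocs(
  query(messagesRef, orderBy("createdAt", "desc"), startAfter(oldest), limit(25))
);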

Get size of the query in Firestore

How would one efficiently get the whole query size from a Firestore collection which has thousands of documents?
In my case I query documents by a few different rules:
Start date
End date
Place id
Keywords
Then I limit the query to show only 50 records, but I would need to get the size of the query without these limitations, since that way pagination would display correctly in the front end.
I could use a Cloud Function which makes the same query as earlier but without the limit and then get its size, but is there a more efficient way of doing this? The query size could be thousands of documents, so are there any performance issues in doing it this way? And how does billing work in this kind of situation?
If, e.g., my query matches 1500 documents, are there going to be 1500 read operations to get the size of this query?
There have been other topics which recommend using counters to get the size of a collection, but this does not suit my approach, since the size depends on the user's search parameters stated above.
All recommendations for this problem are welcome!
If you have thousands of documents in one collection, you might need to update a counter very often. In Cloud Firestore, you can only update a single document about once per second, which might be too low for some high-traffic applications.
The query size could be thousands of documents, so are there any performance issues in doing it this way?
No, there won't be. According to the official documentation regarding Firestore counters, you can use distributed counters:
To support more frequent counter updates, create a distributed counter. Each counter is a document with a subcollection of "shards," and the value of the counter is the sum of the value of the shards.
This practice can help you achieve what you want.
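A minimal sketch of such a distributed counter with the modular SDK (the counter path and shard count are assumptions for illustration):
import { doc, setDoc, updateDoc, increment, getDocs, collection } from "firebase/firestore";

const NUM_SHARDS = 10; // assumed shard count; tune it to your write rate

// Create the shard documents once, each starting at zero.
async function createCounter(db) {
  for (let i = 0; i < NUM_SHARDS; i++) {
    await setDoc(doc(db, "counters", "results", "shards", String(i)), { count: 0 });
  }
}

// Increment a randomly chosen shard, spreading writes across shards.
async function incrementCounter(db) {
  const shardId = String(Math.floor(Math.random() * NUM_SHARDS));
  await updateDoc(doc(db, "counters", "results", "shards", shardId), { count: increment(1) });
}

// The counter's value is the sum of all shards (NUM_SHARDS reads, not one per counted document).
async function getCount(db) {
  const shards = await getDocs(collection(db, "counters", "results", "shards"));
  return shards.docs.reduce((sum, shard) => sum + shard.data().count, 0);
}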
And how does billing work in this kind of situation?
In case you want to read the entire collection at once, you'll be billed one read operation for each document read.
If my query matches 1500 documents, are there going to be 1500 read operations to get the size of this query?
If you are looping over the entire result set to get the number of documents, yes.
For more details about storing counters, please see the last part of my answer from this post:
As a personal hint, don't store this kind of counter in Cloud Firestore, because every increase or decrease of the counter will cost you a read or a write operation. Host this counter in the Firebase Realtime Database instead, at no cost.
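A minimal sketch of such a Realtime Database counter (the path is an assumption for illustration):
import { getDatabase, ref, runTransaction } from "firebase/database";

// Atomically increment a counter kept in the Realtime Database.
function incrementResultCounter() {
  const db = getDatabase();
  return runTransaction(ref(db, "counters/searchResults"), (current) => (current || 0) + 1);
}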

How to build 'real time visitors on site', Google API?

I'm building a custom admin dashboard for users on our site who create posts. I want to show them the active amount of visitors on their posts only (not on the entire site).
I want it to act exactly like GA does.
I was originally thinking of building this from scratch, but in retrospect it might be easier to use the GA API?
I've stared at the docs for forever and I'm just not groking it, so I'm coming here for help.
We have ~5,000 posts total, and some people on our site have authored over 1,000 posts, so the 'input' to GA will be anywhere from 1 to 1,000+ slugs (for only their posts).
I want a combined amount of on-site traffic for their posts only.
Optionally, maybe it would have to be reversed... I'm not sure if GA can show it, but even better would probably be to get a content breakdown of the realtime visitors from the API, with 5,000 max results. From there I can filter through the result-set slugs (along with the number of users on each), compare those results to each slug which belongs to that user, then just sum the totals on my end.
Is this something the Google API could help me with? Which API endpoint would I need to use? Is it possible to have 5,000+ max results for URLs with traffic on them from the API?
Thanks!
Yes, it is possible.
It seems that you should utilize the Real Time Data: get endpoint.
Additionally, to limit results to specific pages (posts) only, you should use dimension filters (filters which select only specific page views before calculating the aggregated result), and 'ga:pagePath' looks like the one you need:
ga:pagePath
UI Name: Page
A page on your website specified by path and/or query parameters. Use in conjunction with hostname to get the full URL of the page.
Source
You might prefer using ga:pageTitle instead, if posts of a single author share a similar title and there are no common path elements in posts of the same author.
So you do something like:
GET https://www.googleapis.com/analytics/v3/data/realtime
ids=ga:<your_analytics_id>
metrics=rt:activeUsers
dimensions=rt:pagePath
filters=rt:pagePath=~/authors/123/*
Please notice that there may be slight differences between the real-time and non-real-time APIs (e.g. the use of 'rt' instead of 'ga' above), and generally the realtime API is still in beta.
Generally speaking, you should go here: Real Time Reporting API - Developer Guide and look through the links in the table of contents (left part of the page).
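If you are calling the API from Node, a minimal sketch with the official googleapis client could look like this (the view ID placeholder and the /authors/123/ path pattern are assumptions for illustration):
const { google } = require("googleapis");

async function activeUsersForAuthor(auth) {
  const analytics = google.analytics({ version: "v3", auth });
  const res = await analytics.data.realtime.get({
    ids: "ga:<your_analytics_id>",
    metrics: "rt:activeUsers",
    dimensions: "rt:pagePath",
    filters: "rt:pagePath=~^/authors/123/", // assumed URL pattern
  });
  // Each row is [pagePath, activeUsers]; sum the active users across pages.
  return (res.data.rows || []).reduce((sum, row) => sum + Number(row[1]), 0);
}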
As for the 'building from scratch' idea: it's rather simple from the developer's perspective, but it could be complex from the dev-ops perspective. I.e., it's not a problem to write code which aggregates such metrics, but it could be a problem to build a system that can sustain the number of requests per second required for that task.
I think you will want to apply your second approach: pull down realtime visitors for all slugs and then aggregate by author on your own server.
There is a quota of 10,000 queries per profile per day. Using your first approach, it sounds like you would be performing a query for each author. Say you have 50 authors. This would leave you only 200 queries/day/author (10,000/50). Factoring in the time dimension, this would allow only 8.33 (200/24) queries per hour for each author. Not very "realtime like".
If you have problems getting it going, check out http://www.embeddedanalytics.com - we have done many implementations such as this. In fact, we even have that "Right Now" realtime widget.
Is there a way to determine the author based on the slug title?

Meteor.js - Should you denormalize data?

This question has been driving me crazy and I can't get my head around it. I come from a MySQL relational background and have been using Meteorjs and Mongo. For the purposes of this question take the example of posts and authors. One Author to Many Posts. I have come up with two ways in which to do this:
Have a single collection of posts - each post has the author information embedded into the document. This of course leads to denormalization and issues such as: if the author's name changes, how do you keep the data correct?
Have two collections: posts and authors - each post has an author ID which references the authors collection. I then attempt to do a "join" on a non-relational database while trying to maintain reactivity. (Both shapes are sketched below.)
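For concreteness, the two document shapes might look like this (field names are assumptions for illustration):
// Option 1: author embedded in each post (denormalized)
Posts.insert({
  title: "Hello",
  author: { _id: "abc123", name: "Jane Doe" }, // duplicated on every post
});

// Option 2: reference only (normalized), "joined" manually at read time
Posts.insert({ title: "Hello", authorId: "abc123" });
const post = Posts.findOne({ title: "Hello" });
const author = Authors.findOne(post.authorId); // the manual "join"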
It seems to me that with MongoDB some degree of denormalization is acceptable, and I am tempted to embed, as implementing joins really does feel like going against the ideals of Mongo.
Can anyone shed any light on what is the right approach especially in terms of wanting my app data to scale well and be manageable?
Thanks
Denormalisation is useful when you're scaling your application and you notice that some queries are taking too long to complete. I've also noticed that most MongoDB developers tend to forget about data normalisation, but that's another topic.
Some developers say things like: "Don't use observe and observeChanges because they're slow." We're building real-time applications, so that's a normal thing to happen; it's a CPU-intensive app design.
In my opinion, you should always aim for a normalised database design, and then decide, try, and test which fields, if duplicated/denormalised, could improve your app's performance. For example: you remove one query per user, or the UI needs an extra field and it's faster to duplicate it, etc.
With denormalisation there's an extra price to pay: you have to keep the denormalised fields in sync with the main collection.
Example:
Let's say that you have Authors and Articles collections. On each article you store the author's name. The author might change his name. In a normalised scenario this works fine. In a denormalised scenario you have to update the Author document with the new name AND every single article owned by this author.
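In Meteor, that fan-out update might look like this (a sketch; collection and field names are assumed):
// Update the canonical record...
Authors.update(authorId, { $set: { name: newName } });

// ...then fan the change out to every denormalised copy.
Articles.update(
  { authorId: authorId },
  { $set: { authorName: newName } },
  { multi: true } // without this, only a single document is updated
);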
Keeping a normalised design makes your life easier, but denormalisation eventually becomes necessary.
From a MeteorJS perspective: with the normalised scenario you're sending data from two collections to the client; with the denormalised scenario, you only send one collection. You can also reactively join on the server and send one collection to the client, although this increases RAM usage because of MergeBox on the server.
Denormalisation is something that is very specific to your application's needs. You can use Kadira to find ways of making your application faster. The database design is only one factor out of many that you play with when trying to improve performance.

Is it possible to control order of replication?

I have a huge master CouchDB database and a slave read-only CouchDB database that synchronizes with the master database.
Because the rate of changes is high, and the channel between the servers is slow and unstable, I want to set an order/priority defining which documents come first. I need to ensure that the documents with the highest priority are definitely of the latest version, and I can ignore the documents at the end of the list.
SORTING, not FILTERING
If it is not possible, what could a solution be?
Resources I have already looked at:
http://wiki.apache.org/couchdb/Replication
http://couchapp.org/page/index
UPDATE: the master database is actually the Node.js NPM registry, and the order is the list of Most Depended-upon Packages. I am trying to make a proxy, because cloning 50 GB always fails after a while. The point is: "we don't need 90% of those modules, but we do need quick & reliable access to those we depend on."
The short answer is no.
The long answer is that CouchDB provides ACID guarantees at the individual document level only, by design. The replicator will update each document atomically when it replicates (as can anyone; the replicator is just using the public API), but it does not guarantee ordering, mostly because it uses multiple HTTP connections to improve throughput. You can configure that down to 1 if you like, and you'll get better ordering, but it's not a panacea.
After the BigCouch merge, all bets are off: there will be multiple sources and multiple targets with no imposed total order.
CouchDB, out of the box, does not provide you with any options to control the order of replication. I'm guessing you could piece something together if you keep documents with different priorities in different databases on the master, though. Then, you could replicate the high-priority master database into the slave database first, replicate lower-priority databases after that, etc.
You could set up filtered replication or named document replication:
http://wiki.apache.org/couchdb/Replication#Filtered_Replication
http://wiki.apache.org/couchdb/Replication#Named_Document_Replication
Both of these are alternatives to replicating an entire database. You could do the replication in smaller batch sizes, and order the batches to match your priorities.
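For example, a priority-based filtered replication might look like this (a sketch; the doc.priority field, the design document, and the database names are assumptions):
// Filter function stored on the master in a design document,
// e.g. _design/repl under "filters": { "by_priority": ... }
function (doc, req) {
  // Replicate only documents at or above the requested priority.
  return doc.priority >= Number(req.query.min_priority);
}

// Then trigger one replication per priority band, highest first:
// POST /_replicate
// { "source": "master-db", "target": "slave-db",
//   "filter": "repl/by_priority",
//   "query_params": { "min_priority": "5" } }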
