What is the best way to reduce Firestore document reads in a chat app on page load?

I have a chat app that is charging me a large number of reads on each page load: one for each message shown. I'm trying to find a way to reduce that number and optimize for cost, as refreshing the page a few times costs hundreds of reads.
The Firestore pricing documentation says:
"For queries other than document reads, such as a request for a list of collection IDs, you are billed for one document read. If fetching the complete set of results requires more than one request (for example, if you are using pagination), you are billed once per request."
I considered that fetching an entire collection without a query, as shown in the docs, might make a cost difference. That's probably wrong, but I can't find anything specifying which operations are the exceptions that cost only one read. It also crossed my mind to keep an array of the most recent messages in the parent document of the collection, but the security rules for updating that array seem overly complex and impractical. I also read about using the Firebase cache, but that doesn't seem useful here.
Here is how I'm currently loading messages. I'm using the useCollectionData hook from the react-firebase-hooks library to listen to this data:
import { query, orderBy, limit } from "firebase/firestore"
import { useCollectionData } from "react-firebase-hooks/firestore"
const q = query(messagesRef, orderBy("createdAt", "desc"), limit(100))
const [messages] = useCollectionData(q)
While researching, I found a related question whose accepted answer I'm pretty sure is wrong, although it did make me question my understanding of the billing rules. Are there any strategies to reduce the number of reads for this common use case?

Pagination still incurs charges on a per-document read, right?
Yes, it does, but only when you load more pages.
I'm not trying to load the entire collection, but rather wondering if loading the collection without a query has a different cost than with.
Loading a collection without a query that limits the results means that you're reading the entire collection, so yes, the cost will be much higher if you're not using a query. Remember that the cost of reading a collection or query in Firestore equals the number of documents that are actually returned. For example, if you have a collection of 1 million documents and your query returns 100, you pay for only 100 document reads.
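The billing rule in this answer can be written as a small estimate. This is a sketch, not Firebase code: `queryReads` and `pageLoadReads` are hypothetical helpers that assume one read per returned document, plus the minimum charge of one read per query that the Firestore pricing page lists for queries returning no results. Cached results and listener updates are ignored.

```javascript
// Hypothetical read estimate under the billing model described above:
// one read per returned document, with a minimum of one read per query.
function queryReads(documentsReturned) {
  return Math.max(1, documentsReturned);
}

// Reads for `refreshes` page loads of a query with the given limit(),
// ignoring any results served from the local cache.
function pageLoadReads(limit, collectionSize, refreshes) {
  const returned = Math.min(limit, collectionSize);
  return refreshes * queryReads(returned);
}

console.log(queryReads(0)); // 1 (an empty result is still billed once)
console.log(pageLoadReads(100, 1000000, 1)); // 100, not 1000000
console.log(pageLoadReads(100, 1000000, 5)); // 500
```

The collection size only matters when it is smaller than the limit, which is why the value passed to limit() is the main lever for this query.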
I'm overall trying to figure out if there's a strategy that can improve the read cost of the example query I gave.
No. If you need the newest 100 messages, that's the best query you can have. The only change that would decrease the number of reads is the value you pass to the limit() function, and that may well make sense, since a user is unlikely to read 100 messages at once. Always try to display only the data that fits on one screen, and load any other data progressively.
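A minimal sketch of "show one screen, load the rest progressively", using an in-memory array as a stand-in for the messages collection. It assumes messages are sorted newest-first and that `createdAt` values are unique. `loadPage` is a hypothetical helper that mimics the idea of a query cursor (continue after the last document of the previous page); it is not the Firestore API.

```javascript
// Hypothetical stand-in for a paginated query. `messages` must be sorted
// newest-first by a unique `createdAt`; `cursor` is the createdAt of the
// last message on the previous page (null for the first page).
function loadPage(messages, pageSize, cursor = null) {
  const start =
    cursor === null ? 0 : messages.findIndex((m) => m.createdAt < cursor);
  const page = start === -1 ? [] : messages.slice(start, start + pageSize);
  const nextCursor = page.length > 0 ? page[page.length - 1].createdAt : null;
  // Billing model from the answer: one read per returned document (min. 1).
  return { page, nextCursor, reads: Math.max(1, page.length) };
}

// 250 sample messages, newest first (createdAt 250 down to 1).
const messages = Array.from({ length: 250 }, (_, i) => ({
  id: i,
  createdAt: 250 - i,
}));

// First screen: 25 reads instead of 100.
const first = loadPage(messages, 25);
// Only if the user scrolls back: the next 25 older messages.
const second = loadPage(messages, 25, first.nextCursor);

console.log(first.reads, second.page[0].id, first.reads + second.reads); // 25 25 50
```

A user who only glances at the latest messages costs 25 reads per load, and older pages are paid for only when they are actually requested.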

Related

How to get the second 10 documents in a Firestore query?

More precisely, a slice of the ordered documents. My idea would be this, but it isn't good:
firestore().collection("queue").orderBy("order_id", "asc").limit(3,5)
I'd be grateful if anyone could answer it.

Best Practice
"Do not use offsets. Instead, use cursors. Using an offset only avoids returning the skipped documents to your application, but these documents are still retrieved internally. The skipped documents affect the latency of the query, and your application is billed for the read operations required to retrieve them."

Firestore does not offer offset-based query results for web and mobile clients, as they are inefficient and costly on your bill. If you want to implement pagination in your app, you should follow the linked documentation and design your app accordingly. This will get you the ability to jump forward and backward in query results, but not to a specific index or offset without first reading everything up to that offset (which is the expensive part that Firestore is suggesting you should not do).
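To make the cost difference concrete, the sketch below applies the rule from the quoted best practice: an offset query is billed for the skipped documents as well as the returned ones, while a cursor query is billed only for what it returns. The helper names are hypothetical, and the model counts reads only, not latency.

```javascript
// Reads billed for one page under each strategy (per the quoted note).
const offsetReads = (offset, pageSize) => offset + pageSize;
const cursorReads = (_offset, pageSize) => pageSize;

// Total reads to walk through `pages` consecutive pages of `pageSize`.
function totalReads(pages, pageSize, readsForPage) {
  let total = 0;
  for (let p = 0; p < pages; p++) {
    total += readsForPage(p * pageSize, pageSize);
  }
  return total;
}

// "The second 10 documents": 20 reads with an offset, 10 with a cursor
// (the cursor comes from the last document of the first page).
console.log(offsetReads(10, 10), cursorReads(10, 10)); // 20 10
// Paging through the first 100 documents, 10 at a time:
console.log(totalReads(10, 10, offsetReads), totalReads(10, 10, cursorReads)); // 550 100
```

With offsets, each page re-reads everything before it, so the total grows quadratically with the number of pages; with cursors it grows linearly.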

Get size of the query in Firestore

How can I efficiently get the total size of a query on a Firestore collection that has thousands of documents?
In my case I query documents by several different criteria:
Start date
End date
Place id
Keywords
I then limit the query to show only 50 records, but I need the size of the query without this limit so that pagination displays correctly in the front end.
I could use a Cloud Function that runs the same query without the limit and then gets its size, but is there a more efficient way of doing this? The query could match thousands of documents, so are there any performance issues with this approach? And how does billing work in this situation?
For example, if my query matches 1500 documents, will it cost 1500 read operations to get the size of this query?
Other topics recommend using counters to get the size of a collection, but that does not suit my approach, since the size depends on the user's search parameters listed above.
All recommendations for this problem are welcome!

If a collection has thousands of documents, you may need to update a counter very often. In Cloud Firestore, you can only update a single document about once per second, which might be too low for some high-traffic applications.
Query size could be thousands of documents so is there any performance issues by doing it this way?
No, it won't. According to the official Firestore documentation on counters, you can use distributed counters:
To support more frequent counter updates, create a distributed counter. Each counter is a document with a subcollection of "shards," and the value of the counter is the sum of the value of the shards.
This practice can help you achieve what you want.
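The distributed-counter idea quoted above can be sketched in memory: each increment goes to one randomly chosen shard, and the counter's value is the sum of all shards. This is an illustration of the technique, not Firestore code; `ShardedCounter` and `shardsFor` are hypothetical, and the shard count assumes the roughly one-write-per-second-per-document limit mentioned in this answer.

```javascript
// In-memory sketch of a distributed counter: one "document" per shard.
class ShardedCounter {
  constructor(numShards) {
    this.shards = new Array(numShards).fill(0);
  }
  // Each increment touches a single, randomly chosen shard, so concurrent
  // writers are spread across documents instead of contending for one.
  increment(by = 1) {
    const i = Math.floor(Math.random() * this.shards.length);
    this.shards[i] += by;
  }
  // Reading the value means reading every shard (one read per shard).
  value() {
    return this.shards.reduce((sum, n) => sum + n, 0);
  }
}

// Shards needed for a target update rate, assuming ~1 write/sec per document.
const shardsFor = (updatesPerSecond) => Math.max(1, Math.ceil(updatesPerSecond));

const counter = new ShardedCounter(shardsFor(10));
for (let k = 0; k < 1000; k++) counter.increment();
console.log(counter.shards.length, counter.value()); // 10 1000
```

The trade-off is visible in `value()`: more shards allow more writes per second, but reading the total costs one read per shard.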
And how does the billing work on this kind of situation?
In case you want to read the entire collection at once, you'll be billed with a read operation for each document read.
If my query matches 1500 documents, will there be 1500 read operations to get the size of this query?
If you are looping the entire collection to get the number of documents, yes.
For more details about storing counters, please see the last part of my answer to a related post.
As a personal hint, don't store this kind of counter in Cloud Firestore, because every increase or decrease of the counter costs you a read or a write operation. Host the counter in the Firebase Realtime Database instead, which bills for storage and bandwidth rather than per operation.

Meteor, MongoDB - canteen optimization

TL;DR:
I'm making an app for a canteen. I have a collection with the persons and a collection where I "log" every meal taken. I need to know who DIDN'T take the meal.
Long version:
I'm making an application for my local Red Cross.
I'm trying to optimize this situation:
there is a canteen at which the people we help can take food at breakfast, lunch and supper. We need to know how many took the meal (and this is easy).
if they are present they HAVE TO take the meal and eat, so we need to know how many (and who) HAVEN'T eaten (this is the part that I need to optimize).
When they take the meal, the "cashier" enters their barcode and the program logs the "transaction" in the log collection.
Currently, on creation of the "canteen" template I create a local collection "meals" and populate it with the data of all the people in the DB (ID, name, fasting/satiated), then I use this collection for my counters and to display who took the meal and who didn't.
(The variable "mealKind" is "breakfast", "lunch" or "dinner", depending on the current serving.)
Template.canteen.created = function(){
  // Client-only local collection, rebuilt every time the template is created
  Meals = new Mongo.Collection(null);
  var today = new Date(); today.setHours(0,0,1);
  var pers = Persons.find({"status":"present"},{fields:{"Name":1,"Surname":1,"barcode":1}}).fetch();
  pers.forEach(function(uno){
    // One Log query per person: did this barcode already take today's mealKind?
    var vediamo = Log.findOne({"dest":uno.barcode,"what":mealKind,"when":{"$gte": today}});
    if(typeof vediamo == "object"){
      uno['eat'] = "satiated";
    }else{
      uno['eat'] = "fasting";
    }
    Meals.insert(uno);
  });
};
Template.canteen.destroyed = function(){
  Meals.remove({});
};
From the Meals collection I extract the two columns of people satiated (with name, surname and barcode) and fasting, and I also use two helpers:
fasting: function(){
  return Meals.find({"eat":"fasting"});
},
"countFasting": function(){
  return Meals.find({"eat":"fasting"}).count();
}
// same for satiated
This was fine, but now the number of people is growing quickly (we are around 1000 and counting), and creating the page is very, very slow. It usually stops with errors, so I read something like "100 fasting, 400 satiated" while there are around 1000 people in the DB.
I can't figure out how to optimize the workflow: every other method I tried involved, one way or another, more queries to the DB. I think I missed the point and now I cannot see it.
I'm not sure about aggregation at this level and inside Meteor, because of minimongo.
Although moving this to the server instead of the client is a good idea, the problem here is HOW to distinguish "fasting" from "satiated" without cycling through the whole Persons collection.
+1 if the solution is compatible with aldeed:tabular.
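The core step of the question, deciding who is "fasting" and who is "satiated", can be done with one Log query instead of one per person. A plain-JavaScript sketch: `persons` and `todaysLog` are hypothetical stand-ins for already-fetched results (present Persons, and today's Log entries for the current mealKind), and it assumes each log entry's `dest` holds the person's barcode. It still visits each person once, but each check becomes a constant-time Set lookup instead of a query.

```javascript
// Split people into satiated/fasting using a single pass over today's log.
function splitByMeal(persons, todaysLog) {
  // Barcodes that were served this meal (assumes `dest` is the barcode).
  const served = new Set(todaysLog.map((entry) => entry.dest));
  const satiated = [];
  const fasting = [];
  // One pass over the people: an O(1) Set lookup replaces a query per person.
  for (const person of persons) {
    (served.has(person.barcode) ? satiated : fasting).push(person);
  }
  return { satiated, fasting };
}

const persons = [
  { Name: "Anna", Surname: "Rossi", barcode: "A1" },
  { Name: "Luca", Surname: "Bianchi", barcode: "B2" },
  { Name: "Sara", Surname: "Verdi", barcode: "C3" },
];
const todaysLog = [
  { dest: "A1", what: "lunch" },
  { dest: "C3", what: "lunch" },
];
const result = splitByMeal(persons, todaysLog);
console.log(result.fasting.length, result.satiated.length); // 1 2
```

The counters for the helpers then become `result.fasting.length` and `result.satiated.length`, with no second pass.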

EDIT
I am still not sure about what is causing your performance issue (too many things in client memory / minimongo, too many calls to it?), but you could at least try different approaches, more traditionally based on your server.
By the way, you did not mention how you display your data, or how you get the incorrect count of already served / missing Persons.
If you are building a classic HTML table, note that browsers struggle to render more than a few hundred rows. In that case, you could implement client-side table pagination or infinite scrolling; see for example the jQuery DataTables plugin (on which aldeed:tabular is based). Skip building an actual HTML table, and fill it directly using $table.rows.add(myArrayOfData).draw() to avoid the browser limitation.
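A minimal sketch of client-side pagination for a large table: keep all rows as data and only build markup for the current page. `pageOf` is a hypothetical helper, independent of DataTables or aldeed:tabular, which handle this paging for you.

```javascript
// Return only the rows for one page (0-based), plus the page count,
// so the browser never has to render all rows at once.
function pageOf(rows, pageIndex, pageSize) {
  const pageCount = Math.max(1, Math.ceil(rows.length / pageSize));
  // Clamp out-of-range requests to the first/last page.
  const clamped = Math.min(Math.max(0, pageIndex), pageCount - 1);
  const start = clamped * pageSize;
  return { rows: rows.slice(start, start + pageSize), page: clamped, pageCount };
}

const people = Array.from({ length: 1000 }, (_, i) => ({ barcode: `P${i}` }));
const view = pageOf(people, 3, 50);
console.log(view.pageCount, view.rows.length, view.rows[0].barcode); // 20 50 P150
```

With 1000 people and 50 rows per page, the DOM holds 50 rows at a time instead of 1000.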
Original answer
I do not quite understand why you need to duplicate your Persons collection into a client-side Meals local collection.
This requires first sending all Persons documents from the server to the client (which may not be a problem if your server is well connected or local; you may also still have the autopublish package on, in which case you would already have paid that penalty), and then cloning all of those documents (checking your Log collection for any previous passages), effectively doubling your memory needs.
Is your server and/or remote DB that slow to justify your need to do everything locally (client side)?
It could be even more problematic: if more than one "cashier" / client browser is open, their local Meals collections will not be synchronized.
If your server-client connection is good, there is no reason to do everything client side. Meteor will automatically cache just what is needed, and provide optimistic DB modification to keep your user experience fast (should you structure your code correctly).
With aldeed:tabular package, you can easily display your Persons big table by "pages".
You can also link it with your Log collection using the dburles:collection-helpers package (IIRC there is an example on the aldeed:tabular home page).

How does meteor perform with many subscriptions over larger documents?

I'm currently designing a database for a fairly large meteor app, and we're debating whether Meteor will perform better using more subscriptions, to collections of tiny documents, or fewer subscriptions to collections of larger documents.
Some of these documents could end up quite large, such as lists of user favourites or preferences, which would only be viewable by the individual user in a particular view.
In terms of numbers, we're talking about 10 subscriptions, at least four of which would not be consistently subscribed to, each returning only a single larger document in particular views.
Versus 4 subscriptions to collections of possibly quite large documents (I realize that those individual views would probably render faster with the data already on the client).
Any insights or empirical data would be incredibly helpful.
Thanks.

This is probably not a complete answer. I'm currently in the process of solving this problem myself.
I have some experience with larger data sets. I subscribe to a single collection without any restrictions:
Meteor.publish('collectionName', function () {
return collectionName.find();
});
My collection contains 400 documents with a total size of approximately 600KB (after dumping the collection using mongodump). With about 100 users in the system (using it daily from different continents) we do have a few performance issues:
When I load the page displaying the 400 items, you can watch the list being populated.
It is also important what you do with the 400 documents. Each of my entries is a fairly complex DOM node with multiple helpers executed for each item.
Generating the DOM takes some time and causes a spike in the client's CPU usage. Also, a lot of data is transferred to the client.
A few ideas to solve the problems:
Using pagination (e.g. package alethes:pages) to prevent intense client computation.
Use a separate publication/subscription for a list view only including the fields necessary to display the list items to reduce the size of each document
Only publish/subscribe to documents that are viewable to the current user (e.g. checking for login status, access permissions, etc) to reduce the number of documents.
Cache the subscription: https://github.com/meteorhacks/subs-manager (haven't tried that yet)
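The second idea above (a list-view publication with only the fields the list needs) comes down to projecting documents before sending them. A plain-JavaScript sketch with a hypothetical `project` helper and made-up sample fields shows how much smaller the payload gets; in Meteor, the same effect comes from a `fields` option on the publication's `find()`.

```javascript
// Keep only the listed top-level fields of each document.
function project(docs, fieldNames) {
  return docs.map((doc) => {
    const out = {};
    for (const name of fieldNames) {
      if (name in doc) out[name] = doc[name];
    }
    return out;
  });
}

// Made-up sample: a short title for the list, a large body for the detail view.
const docs = Array.from({ length: 400 }, (_, i) => ({
  _id: String(i),
  title: `Item ${i}`,
  body: "x".repeat(1500),
}));

const full = JSON.stringify(docs).length;
const listView = JSON.stringify(project(docs, ["_id", "title"])).length;
console.log(listView < full / 10); // true
```

The detail view can then subscribe to the full document only when a single item is opened.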

Meteor.js - Should you denormalize data?

This question has been driving me crazy and I can't get my head around it. I come from a MySQL relational background and have been using Meteorjs and Mongo. For the purposes of this question take the example of posts and authors. One Author to Many Posts. I have come up with two ways in which to do this:
Have a single collection of posts - Each post has the author information embedded into the document. This of course leads to denormalization and issues such as if the author name changes how do you keep the data correct.
Have two collections: posts and authors - Each post has an author ID which references the authors collection. I then attempt to do a "join" on a non relational database while trying to maintain reactivity.
It seems to me that with MongoDB some degree of denormalization is acceptable, and I am tempted to embed, as implementing joins really does feel like going against the ideals of Mongo.
Can anyone shed any light on what is the right approach especially in terms of wanting my app data to scale well and be manageable?
Thanks

Denormalisation is useful when you're scaling your application and notice that some queries take too long to complete. I've also noticed that most MongoDB developers tend to forget about data normalisation, but that's another topic.
Some developers say things like "Don't use observe and observeChanges because they're slow." We're building real-time applications, so that is to be expected: it's a CPU-intensive app design.
In my opinion, you should always aim for a normalised database design, and then decide, try and test which fields, if duplicated/denormalised, could improve your app's performance. For example: duplicating a field removes one query per user, or the UI needs an extra field and it's cheap to duplicate it.
With denormalisation there's an extra price to pay: you have to keep the denormalised fields in sync with the main collection.
Example:
Let's say you have Authors and Articles collections, and each article stores the author's name. The author might change his name. With a normalised design, that works fine. With a denormalised design, you have to update the author document's name AND every single article owned by this author with the new name.
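The extra cost of the denormalised design in this example is easy to count. The sketch below uses in-memory arrays and a hypothetical `renameAuthor` helper that mirrors the rename into every article embedding the name, and returns how many documents it had to write. In the normalised design, the same rename is a single write.

```javascript
// Denormalised rename: update the author AND every article that embeds the name.
function renameAuthor(authors, articles, authorId, newName) {
  let writes = 0;
  for (const author of authors) {
    if (author._id === authorId) {
      author.name = newName;
      writes++;
    }
  }
  for (const article of articles) {
    if (article.authorId === authorId) {
      article.authorName = newName;
      writes++;
    }
  }
  return writes;
}

const authors = [{ _id: "a1", name: "Ann" }, { _id: "a2", name: "Bob" }];
const articles = [
  { _id: "p1", authorId: "a1", authorName: "Ann" },
  { _id: "p2", authorId: "a1", authorName: "Ann" },
  { _id: "p3", authorId: "a2", authorName: "Bob" },
];
console.log(renameAuthor(authors, articles, "a1", "Anna")); // 3 (1 author + 2 articles)
```

The write count grows with the number of articles per author, which is the maintenance price the answer describes.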
Keeping a normalised design makes your life easier, but denormalisation eventually becomes necessary.
From a Meteor.js perspective: with the normalised design you're sending data from 2 collections to the client. With the denormalised design, you only send 1 collection. You can also do a reactive join on the server and send 1 collection to the client, although that increases RAM usage on the server because of the MergeBox.
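In the normalised design, the client receives both collections and joins them itself. A minimal sketch of that join, using a Map keyed by author ID so each article lookup is constant-time; `joinArticles` and the sample data are hypothetical.

```javascript
// Attach each article's author name by looking it up in an ID -> author Map.
function joinArticles(authors, articles) {
  const byId = new Map(authors.map((a) => [a._id, a]));
  return articles.map((article) => ({
    ...article,
    authorName: byId.get(article.authorId)?.name ?? "(unknown)",
  }));
}

const authors = [{ _id: "a1", name: "Anna" }];
const articles = [
  { _id: "p1", authorId: "a1", title: "Hello" },
  { _id: "p2", authorId: "zz", title: "Orphan" },
];
console.log(joinArticles(authors, articles).map((a) => a.authorName)); // [ 'Anna', '(unknown)' ]
```

Because the join is computed from the current Authors data, a rename is picked up on the next join without touching any article.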
Denormalisation is very specific to your application's needs. You can use Kadira to find ways of making your application faster. The database design is only one of many factors you can play with when trying to improve performance.
