firestore: arrays vs sub collection of documents performance

I would like to ask whether there is a best practice in Firestore for storing chat-room messages when developing a chat app.
The assumption here is that every chatroom has its own document.
I started using an array to store the messages from the users. The problem with that approach is that there is no way to append a new entry every time a new message is submitted to the chat room. One has to save a new copy of the array with the new message appended. This seems like something that would scale really badly, unless the chat history is split into sub-arrays, etc.
The official documentation suggests a structure in which the messages of a specific chatroom are stored as separate documents in a subcollection of that chatroom. I wonder if this approach is the best, what some of its drawbacks would be, and whether there is another preferred way to do this.

I would generally go with the approach of "Every chat room has a subcollection of messages. And every new message is a separate document in this subcollection." This has several advantages: it's easy to add or edit individual messages, and you can perform a number of different queries (like "Grab the 20 most recent messages").
The biggest drawback, I suppose, is that if you find that new users are frequently going to be entering your chat and will want to see the entire chat history of the room up until they joined, that would result in a large number of database reads. Realistically, though, I don't know how often that would happen in real life, and you could mitigate this by using pagination to grab your historical chat in batches.
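The batching idea above can be sketched in plain JavaScript. This is only an in-memory model, not the Firestore SDK: the messages array, the fetchPage helper and the numeric createdAt values are all hypothetical. It shows the cursor pattern (ask for the next N messages older than the last one you already have) and that each page costs only as many reads as the messages it returns. With the Firestore modular SDK, the same shape would typically be built from orderBy, limit and startAfter.

```javascript
// In-memory stand-in for one room's "messages" subcollection (hypothetical data).
const messages = Array.from({ length: 45 }, (_, i) => ({
  id: `m${i}`,
  text: `message ${i}`,
  createdAt: 1000 + i, // larger value = newer message
}));

// Return up to pageSize messages, newest first, that are strictly older than
// the cursor. Passing no cursor starts from the most recent message.
function fetchPage(allMessages, pageSize, cursor = Infinity) {
  const page = allMessages
    .filter((m) => m.createdAt < cursor) // filter() copies, so sort() below is safe
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, pageSize);
  // The cursor for the next call is the createdAt of the oldest message returned.
  const nextCursor = page.length > 0 ? page[page.length - 1].createdAt : null;
  return { page, nextCursor, reads: page.length };
}

const first = fetchPage(messages, 20);
const second = fetchPage(messages, 20, first.nextCursor);
console.log(first.page[0].id, first.reads, second.page[0].id);
```

A new user who never scrolls back pays for the first page only; older pages are fetched (and billed) when they are actually requested.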

To add to what Todd said:
Server-generated timestamps (FieldValue.serverTimestamp()) cannot be used inside arrays, which is a big downside for your case, as you'll want the time the message was sent.

Related

What is the best way to reduce firestore document reads in chat app on page load?

I have a chat app that is charging me a large number of reads for each page load, 1 for each message to show. I'm trying to figure out a way to reduce that number and optimize for cost, as refreshing the page a few times costs hundreds of reads.
The Firestore pricing documentation says:
"For queries other than document reads, such as a request for a list of collection IDs, you are billed for one document read. If fetching the complete set of results requires more than one request (for example, if you are using pagination), you are billed once per request."
I considered that maybe if I fetch an entire collection without a query like shown here in the docs, a cost difference might be remotely possible. I'm sure that's probably wrong, but I can't find anything specifying what the exceptions are that only cost 1 read. It also crossed my mind to create an array to hold the most recent messages in the parent document of the collection, but the security rules for updating that array seem overly complex and not practical. I also read about using the firebase cache, but that doesn't seem useful here.
Here is code to demonstrate how I'm currently loading messages. I'm using the react-firebase-hooks library to snapshot this data with useCollectionData:
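import { query, orderBy, limit } from "firebase/firestore";
import { useCollectionData } from "react-firebase-hooks/firestore";
const q = query(messagesRef, orderBy("createdAt", "desc"), limit(100));
const [messages] = useCollectionData(q);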
In researching, I found this question where I'm pretty sure the accepted answer is wrong. It did make me question the rules. Are there any strategies to reduce the number of reads for this common use case?

Pagination still incurs charges on a per-document read, right?
Yes, it does, but only when you load more pages.
I'm not trying to load the entire collection, but rather wondering if loading the collection without a query has a different cost than with.
Loading a collection without a query that limits the results means that you're reading the entire collection. And, yes, the cost will be much higher if you're not using a query. Remember that the cost of reading a collection/query in Firestore is equal to the number of documents that are actually returned. For example, if you have a collection of 1 million documents, and your query returns 100, you'll have to pay for only 100 document reads.
I'm overall trying to figure out if there's a strategy that can improve the read cost of the example query I gave.
No. If you need to get the newest 100 messages, that's the best query you can have. The only change you can make to decrease the number of reads would be to change the value that you pass to the limit() function. And maybe it makes sense since a user might not be interested in reading 100 messages at once. Always try to display data that fits into a screen, and load any other data progressively.
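The arithmetic behind this answer can be written down as a small sketch. The function names are hypothetical, and the model assumes every page load starts with an empty cache and that a query returning nothing is still billed as one read.

```javascript
// Reads billed for one execution of a query: one per document returned,
// assuming a minimum of one read even when nothing matches.
function billedReads(matchingDocs, limitValue) {
  return Math.max(1, Math.min(matchingDocs, limitValue));
}

// Reads billed for a number of fresh page loads (no cached results assumed).
function readsForPageLoads(matchingDocs, limitValue, pageLoads) {
  return billedReads(matchingDocs, limitValue) * pageLoads;
}

console.log(billedReads(1000000, 100));          // collection size does not matter, only the limit
console.log(readsForPageLoads(1000000, 100, 5)); // five refreshes with limit(100)
console.log(readsForPageLoads(1000000, 25, 5));  // five refreshes with limit(25)
```

Under these assumptions, lowering the limit from 100 to 25 cuts the cost of five refreshes from 500 reads to 125, which is why the limit value is the main lever here.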

Meteor, mongodb - canteen optimization

TL;DR:
I'm making an app for a canteen. I have a collection with the persons and a collection where I "log" every meal taken. I need to know those who DIDN'T take the meal.
Long version:
I'm making an application for my local Red Cross.
I'm trying to optimize this situation:
there is a canteen at which the people being helped can take food at breakfast, lunch and supper. We need to know how many took the meal (and this is easy).
if they are present they HAVE TO take the meal and eat, so we need to know how many (and who) HAVEN'T eaten (this is the part that I need to optimize).
When they take the meal, the "cashier" enters their barcode and the program logs the "transaction" in the log collection.
Currently, on creation of the template "canteen", I create a local collection "Meals" and populate it with the data of all the people in the DB (so ID, name, fasting/satiated), then I use this collection for my counters and to display who took the meal and who didn't.
(the variable "mealKind" is = "breakfast" OR "lunch" OR "dinner" depending on the actual serving.)
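Template.canteen.created = function () {
  // Local (client-only) collection holding one document per present person.
  Meals = new Mongo.Collection(null);
  // Start of today (00:00:01), used to select only today's log entries.
  var today = new Date();
  today.setHours(0, 0, 1);
  var pers = Persons.find(
    { "status": "present" },
    { fields: { "Name": 1, "Surname": 1, "barcode": 1 } }
  ).fetch();
  pers.forEach(function (uno) {
    // One Log query per person. The original read uno.codice, which the
    // projection above never fetches; uno.barcode assumes Log.dest stores the barcode.
    var vediamo = Log.findOne({ "dest": uno.barcode, "what": mealKind, "when": { "$gte": today } });
    uno["eat"] = vediamo ? "satiated" : "fasting";
    Meals.insert(uno);
  });
};
Template.canteen.destroyed = function () {
  // The original called meals.remove({}), but the collection is named Meals.
  Meals.remove({});
};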
From the Meals collection I extract the two columns of people, satiated (with name, surname and barcode) and fasting, and I also use two helpers:
fasting: function () {
  return Meals.find({ "eat": "fasting" });
},
countFasting: function () {
  return Meals.find({ "eat": "fasting" }).count();
}
// same for satiated
This was OK, but now the number of people is really increasing (we are around 1000 and counting) and the creation of the page is very slow. It usually stops with errors, so I can read "100 fasting, 400 satiated" even though I have around 1000 persons in the DB.
I can't figure out how to optimize the workflow. Every other method that I tried involved (in one way or another) more queries to the DB; I think that I missed the point and now I cannot see it.
I'm not sure about aggregation at this level and inside Meteor, because of minimongo.
Although doing this server-side rather than client-side is clever, the problem here is HOW to discriminate "fasting" vs "satiated" without cycling through the whole person collection.
+1 if the solution is compatible with aldeed:tabular.

EDIT
I am still not sure about what is causing your performance issue (too many things in client memory / minimongo, too many calls to it?), but you could at least try different approaches, more traditionally based on your server.
By the way, you did not mention either how you display your data or how you get the incorrect reading for your number of already served / missing Persons?
If you are building a classic HTML table, please note that browsers struggle to render more than a few hundred rows. In that case, you could implement client-side table pagination / infinite scrolling. Look for example at the jQuery DataTables plugin (on which aldeed:tabular is based). Skip the step of building an actual HTML table, and fill it directly using $table.rows.add(myArrayOfData).draw() to avoid the browser limitation.
Original answer
I do not exactly understand why you need to duplicate your Persons collection into a client-side Meals local collection?
This requires first sending all documents of Persons from server to client (this may not be a problem if your server is well connected / local; you may also still have the autopublish package on, in which case you have already paid that penalty), and then cloning all documents (checking your Logs collection to retrieve any previous passages), effectively doubling your memory needs.
Is your server and/or remote DB slow enough to justify doing everything locally (client side)?
A potentially much bigger problem: should you have more than one "cashier" / client browser open, their Meals local collections will not be synchronized.
If your server-client connection is good, there is no reason to do everything client side. Meteor will automatically cache just what is needed, and provide optimistic DB modification to keep your user experience fast (provided you structure your code correctly).
With the aldeed:tabular package, you can easily display your big Persons table by "pages".
You can also link it with your Logs collection using dburles:collection-helpers (IIRC there is an example on the aldeed:tabular home page).
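One way to avoid the per-person Log.findOne call in the question's loop is to fetch the relevant log entries once and test membership in a Set. The sketch below uses plain arrays so it runs on its own. The sample data and the splitByMeal helper are hypothetical, and it assumes Log.dest stores the person's barcode. In Meteor the todaysLogs array would come from a single Log.find({ what: mealKind, when: { $gte: today } }).fetch().

```javascript
// Hypothetical sample data shaped like the question's collections.
const persons = [
  { Name: "Ada", Surname: "Rossi", barcode: "A1" },
  { Name: "Bruno", Surname: "Bianchi", barcode: "B2" },
  { Name: "Carla", Surname: "Verdi", barcode: "C3" },
];
const todaysLogs = [
  { dest: "A1", what: "lunch" },
  { dest: "C3", what: "lunch" },
  { dest: "B2", what: "breakfast" },
];

// Split persons into satiated / fasting for one meal with a single pass
// over the logs and a single pass over the persons.
function splitByMeal(people, logs, mealKind) {
  const served = new Set(
    logs.filter((entry) => entry.what === mealKind).map((entry) => entry.dest)
  );
  const satiated = [];
  const fasting = [];
  for (const person of people) {
    (served.has(person.barcode) ? satiated : fasting).push(person);
  }
  return { satiated, fasting };
}

const lunch = splitByMeal(persons, todaysLogs, "lunch");
console.log(lunch.satiated.length, lunch.fasting.length);
```

This replaces roughly one query per person with two queries in total (persons and today's logs), which is the part of the original loop most likely to grow with the number of people.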

How to store user actions in MeteorJs using MongoDB?

I'm using Meteor JS for a project, so inherently I'm using MongoDB. I'm storing a user's check-in and check-out actions. I'm currently storing them as individual docs in the collection. Each action contains 3 fields: in or out, time of action, and userid. Is this the best way to go, though? Should I just have one doc per member and then store each action in an array? Is there another way? I anticipate several hundred members, but hopefully several thousand members in the future. Thanks.

From experience, I can say that storing records instead of arrays is a better choice in the long run.
As far as Meteor is concerned, its reactivity handles collection records, but not individual fields in arrays. In other words, if one element gets added to the checkins array of a user object, the entire user object needs to be synchronized with the clients. If you store records instead, only the newly added record will be sent by the publication.
As far as MongoDB is concerned, there is a document size limit of 16MB. Not sure how frequent your checkins and checkouts are, but if you store them in an array, you might run into that limitation at some point.
Records are also easier to access than arrays.
For more details, see MongoDB data modeling and Database modeling in Bulletproof Meteor.
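As a rough illustration of the one-document-per-action shape, here is a plain JavaScript sketch. The field names (userId, type, at) and the currentlyCheckedIn helper are hypothetical stand-ins for the three fields described in the question. Adding an action is a single new record, and a question such as "who is checked in right now?" is a pass over the records.

```javascript
// One record per action, as in the question: in/out, time of action, user id.
const actions = [
  { userId: "u1", type: "in", at: 1 },
  { userId: "u2", type: "in", at: 2 },
  { userId: "u1", type: "out", at: 3 },
  { userId: "u3", type: "in", at: 4 },
];

// Return the sorted ids of users whose most recent action is a check-in.
function currentlyCheckedIn(records) {
  const latest = new Map();
  for (const action of records) {
    const seen = latest.get(action.userId);
    if (!seen || action.at > seen.at) latest.set(action.userId, action);
  }
  return [...latest.values()]
    .filter((action) => action.type === "in")
    .map((action) => action.userId)
    .sort();
}

console.log(currentlyCheckedIn(actions));
```

Because each action is its own record, no existing document grows over time, so the 16MB document limit mentioned above does not come into play.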

Meteor.js - Should you denormalize data?

This question has been driving me crazy and I can't get my head around it. I come from a MySQL relational background and have been using Meteorjs and Mongo. For the purposes of this question take the example of posts and authors. One Author to Many Posts. I have come up with two ways in which to do this:
Have a single collection of posts - Each post has the author information embedded into the document. This of course leads to denormalization and issues such as if the author name changes how do you keep the data correct.
Have two collections: posts and authors - Each post has an author ID which references the authors collection. I then attempt to do a "join" on a non relational database while trying to maintain reactivity.
It seems to me that with MongoDB some degree of denormalization is acceptable, and I am tempted to embed, as implementing joins really does feel like going against the ideals of Mongo.
Can anyone shed any light on what is the right approach especially in terms of wanting my app data to scale well and be manageable?
Thanks

Denormalisation is useful when you're scaling your application and you notice that some queries are taking too much time to complete. I have also noticed that most MongoDB developers tend to forget about data normalisation, but that's another topic.
Some developers say things like: "Don't use observe and observeChanges because it's slow". We're building real-time applications, so that is a normal thing to happen; it's a CPU-intensive app design.
In my opinion, you should always aim for a normalised database design, and then decide, try and test which fields, if duplicated/denormalised, could improve your app's performance. Examples: you remove 1 query per user; the UI needs an extra field and it's fast to duplicate it; etc.
With denormalisation you have an extra price to pay: you have to keep the denormalised fields in sync with the main collection.
Example:
Let's say that you have Authors and Articles collections. On each article you have the author name. The author might change their name. With a normalised scenario, it works fine. With a denormalised scenario you have to update the Author document name AND every single article owned by this author with the new name.
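The extra write cost in this example can be sketched with plain arrays standing in for the two collections. The documents, the authorName field and the renameAuthor helper are hypothetical; against MongoDB the second loop would typically be a single multi-document update.

```javascript
// Plain arrays standing in for the Authors and Articles collections.
const authors = [
  { _id: "a1", name: "Ann Lee" },
  { _id: "a2", name: "Bo Chen" },
];
const articles = [
  { _id: "p1", authorId: "a1", authorName: "Ann Lee", title: "First" },
  { _id: "p2", authorId: "a1", authorName: "Ann Lee", title: "Second" },
  { _id: "p3", authorId: "a2", authorName: "Bo Chen", title: "Third" },
];

// Rename an author and propagate the new name to every denormalised copy.
// Returns the number of documents that had to be written.
function renameAuthor(authorDocs, articleDocs, authorId, newName) {
  let writes = 0;
  for (const author of authorDocs) {
    if (author._id === authorId) {
      author.name = newName;
      writes += 1;
    }
  }
  for (const article of articleDocs) {
    if (article.authorId === authorId) {
      article.authorName = newName;
      writes += 1;
    }
  }
  return writes;
}

const writes = renameAuthor(authors, articles, "a1", "Ann Smith");
console.log(writes);
```

In the normalised design the same rename touches one document; here it touches one plus the number of articles by that author, which is the price referred to above.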
Keeping a normalised design makes your life easier, but denormalisation eventually becomes necessary.
From a MeteorJs perspective: with the normalised scenario you're sending data from 2 collections to the client. With the denormalised scenario, you only send 1 collection. You can also reactively join on the server and send 1 collection to the client, although it increases RAM usage because of MergeBox on the server.
Denormalisation is very specific to your application's needs. You can use Kadira to find ways of making your application faster. The database design is only 1 factor out of many that you can play with when trying to improve performance.

Manage relations among users in db

I am creating a mock app with user creation/auth/friend in a node js learning exercise. Having spent my time mostly at the front end of things, I am a n00b as far as DBs are concerned. I want to create a user database where I want to keep track of user profiles and their connections/friends.
Primary objective is to load/store users connections in the database.
Fetch this information and give it to the user most efficiently in least number of queries.
I'd really appreciate some help with a DB structure I should be using that can accomplish this. I am using mongodb and node.
Off the top of my head: I can store the user's connections in an object in the "connections" field. But this will involve making a lot of queries to fetch connections' details like their "about me" information - which I can also store in the same object as well.
Confused. Would really appreciate some pointers.

Take a look at Mongoose, an ODM for MongoDB. It has a populate method that grabs foreign documents. Lots of other great stuff too.
You could say
Users.find({}).populate('connections').exec(function(err,users) { ... });
Before populate, each user's connections array was an array of IDs; after, it's an array of user documents.
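Conceptually, populating amounts to swapping each stored ID for the document it points to. The sketch below illustrates that idea with plain arrays. It is not Mongoose's implementation, and the sample users and the populateConnections helper are hypothetical. In real Mongoose the connections field would also need to be declared in the schema with a ref to the user model.

```javascript
// Users whose "connections" field stores the _id values of other users.
const users = [
  { _id: "u1", name: "Ana", connections: ["u2", "u3"] },
  { _id: "u2", name: "Ben", connections: ["u1"] },
  { _id: "u3", name: "Cy", connections: [] },
];

// Replace each connection id on the selected users with the matching user
// document, collecting all needed ids first so one lookup pass is enough.
function populateConnections(selected, allUsers) {
  const wanted = new Set(selected.flatMap((user) => user.connections));
  const byId = new Map(
    allUsers.filter((user) => wanted.has(user._id)).map((user) => [user._id, user])
  );
  return selected.map((user) => ({
    ...user,
    connections: user.connections.map((id) => byId.get(id)).filter(Boolean),
  }));
}

const populated = populateConnections([users[0]], users);
console.log(populated[0].connections.map((connection) => connection.name));
```

The stored documents keep their ID arrays; only the returned copies carry full connection documents, so details such as an "about me" field do not have to be duplicated inside every user's connections.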
