Meteor, mongodb - canteen optimization - javascript

TL;DR:
I'm making an app for a canteen. I have a collection with the persons and a collection where I "log" every meal taken. I need to know who DIDN'T take the meal.
Long version:
I'm making an application for my local Red Cross.
I'm trying to optimize this situation:
there is a canteen at which the people we help can take food at breakfast, lunch and supper. We need to know how many took the meal (and this is easy).
if they are present they HAVE TO take the meal and eat, so we need to know how many (and who) HAVEN'T eaten (this is the part that I need to optimize).
When they take the meal, the "cashier" scans their barcode and the program logs the "transaction" in the Log collection.
Currently, on creation of the "canteen" template I create a local collection "Meals" and populate it with the data of all the people in the DB (so ID, name, fasting/satiated); then I use this collection for my counters and to display who took the meal and who didn't.
(The variable "mealKind" is "breakfast", "lunch" or "dinner" depending on the current serving.)
Template.canteen.created = function(){
  // local, client-only collection (not persisted to the server)
  Meals = new Mongo.Collection(null);
  var today = new Date();
  today.setHours(0, 0, 1);
  var pers = Persons.find({"status": "present"}, {fields: {"Name": 1, "Surname": 1, "barcode": 1}}).fetch();
  pers.forEach(function(uno){
    // is there already a log entry for this person, this serving, today?
    var vediamo = Log.findOne({"dest": uno.barcode, "what": mealKind, "when": {"$gte": today}});
    if(vediamo){
      uno.eat = "satiated";
    }else{
      uno.eat = "fasting";
    }
    Meals.insert(uno);
  });
};
Template.canteen.destroyed = function(){
  Meals.remove({});
};
From the Meals collection I extract the two columns of people, satiated (with name, surname and barcode) and fasting, and I also use two helpers:
fasting: function(){
  return Meals.find({"eat": "fasting"});
},
countFasting: function(){
  return Meals.find({"eat": "fasting"}).count();
}
//same for satiated
This was OK, but now the number of people is really increasing (we are around 1000 and counting) and the creation of the page is very, very slow; it usually stops with errors, so I can read "100 fasting, 400 satiated" when I actually have around 1000 persons in the DB.
I can't figure out how to optimize the workflow; every other method I tried involved (in one way or another) more queries to the DB. I think I missed the point and now I cannot see it.
I'm not sure about using aggregation at this level inside Meteor, because of Minimongo.
Although moving this server side rather than client side would be smart, the problem here is HOW to discriminate "fasting" vs "satiated" without cycling through the whole Persons collection.
+1 if the solution is compatible with aldeed:tabular

EDIT
I am still not sure about what is causing your performance issue (too many things in client memory / minimongo, too many calls to it?), but you could at least try different approaches, more traditionally based on your server.
By the way, you did not mention how you display your data, nor how you end up with an incorrect reading for your number of already served / missing Persons.
If you are building a classic HTML table, please note that browsers struggle to render more than a few hundred rows. If you are in that case, you could implement client-side table pagination / infinite scrolling. Look for example at the jQuery DataTables plugin (on which aldeed:tabular is based). Skip the step of building an actual HTML table and fill it directly using $table.rows.add(myArrayOfData).draw() to avoid the browser limitation.
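A minimal sketch of that suggestion (the '#meals' selector and the column definitions are assumptions; myArrayOfData is an array of [name, surname, barcode, eat] rows):
var table = $('#meals').DataTable({
  columns: [{title: 'Name'}, {title: 'Surname'}, {title: 'Barcode'}, {title: 'Eat'}],
  deferRender: true  // render only the rows of the currently visible page
});
table.rows.add(myArrayOfData).draw();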
Original answer
I do not exactly understand why you need to duplicate your Persons collection into a client-side Meals local collection.
This requires that all Persons documents be sent from server to client first (which may not be a problem if your server is well connected / local; you may also still have the autopublish package on, in which case you are already paying that penalty), and then that all documents be cloned (checking your Log collection to retrieve any previous passages), effectively doubling your memory need.
Is your server and/or remote DB that slow to justify your need to do everything locally (client side)?
Potentially much more problematic: should you have more than one "cashier" / client browser open, their Meals local collections will not be synchronized.
If your server-client connection is good, there is no reason to do everything client side. Meteor will automatically cache just what is needed, and provide optimistic DB modification to keep your user experience fast (should you structure your code correctly).
With aldeed:tabular package, you can easily display your Persons big table by "pages".
You can also link it with your Log collection using dburles:collection-helpers (IIRC there is an example on the aldeed:tabular home page).
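A hedged sketch of that collection-helpers approach (the helper name hasEaten is made up; the barcode/dest/what/when field names come from the question):
Persons.helpers({
  hasEaten: function(mealKind){
    var today = new Date();
    today.setHours(0, 0, 1);
    // requires the relevant Log documents to be available where this runs
    // (server side, or a client that has today's Log entries published)
    return !!Log.findOne({dest: this.barcode, what: mealKind, when: {"$gte": today}});
  }
});
A tabular column (or a plain helper) can then render person.hasEaten(mealKind) ? "satiated" : "fasting" without cloning Persons into a second collection.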

Related

Loading small pieces of data with many subscriptions in Meteor

I have a question about Collections - specifically, I want to have a large collection on a server, and load only small bits of it a piece at a time, in an unpredictable order, where I might stop wanting to have a local copy of any given piece at any time. Should I make a new subscription for each piece of data, and then stop it when I no longer want that piece of data? Or should I use some other method? Or should I just load large chunks of it that I won't use and leave them sitting around in my local copy of the collection?
Edit: Or should I have one subscription with a list of the ID's for each piece of data I want, and have the publication function specifically find each of those? Seems complicated, but it does keep me with only having to deal with one subscription.
Edit: Or maybe I should just skip using publications and subscriptions, and just use Methods to pass my data to the client? Loses a lot of functionality, and requires some extra work, but it does dodge most of the problems and should work just fine for my purposes.
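For what it's worth, a minimal sketch of the "one subscription with a list of IDs" idea from the first edit above (the Pieces collection and the wantedPieceIds session key are made-up names):
Meteor.publish('piecesByIds', function (ids) {
  check(ids, [String]);
  return Pieces.find({_id: {$in: ids}});
});
// Client: re-subscribe whenever the list of wanted IDs changes.
Tracker.autorun(function () {
  Meteor.subscribe('piecesByIds', Session.get('wantedPieceIds') || []);
});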
Suppose a Mongo collection "items" contains documents like:
{
  name: 'item1',
  type: 'basic',
  qty: 40
}
You define collections on the Meteor server with
Items = new Mongo.Collection('items');
1. These collections contain all the data from the MongoDB collection, and you can run Items.find({...}) on them, which returns a cursor (a set of records, with methods to iterate through them and return them).
Meteor.publish('itemOver30', function itemPublication() {
  return Items.find({qty: {$gte: 30}}, {fields: {name: 1, qty: 1}});
});
This returns a cursor to all the records in the items collection with qty over 30 (a subset of the total records, not the whole collection).
2. The cursor is used to publish (send) a set of records (called a "record set"). You can optionally publish only some fields from those records. It is record sets (not collections) that clients subscribe to.
Meteor.subscribe('itemOver30');
On the client, you have Minimongo collections that partially mirror some of the records from the server. "Partially" because they may contain only some of the fields, and "some of the records" because you usually want to send to the client only the records it needs, to speed up page load, and only those it needs and has permission to access.
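For illustration, a minimal client-side sketch that uses the publication above (the template name itemList is an assumption; Items must also be declared on the client with new Mongo.Collection('items')):
Template.itemList.onCreated(function () {
  this.subscribe('itemOver30');
});
Template.itemList.helpers({
  items: function () {
    // runs against Minimongo, which only holds the published subset
    return Items.find({}, {sort: {qty: -1}});
  }
});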

Should I use a cache for this?

I wrote some code over the summer holidays, and today I looked at it again for the first time; I am struggling with one thing I did.
My system has multiple types (pages, newsletters etc.) and multiple subtypes (items, archive, concepts etc.). The idea is that I have an object like this:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } } } }
Another example:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } }, archive: {} }, 2: { normal: {} } }
The data originally comes from the database. I'm making a system to edit pages on the website and other things like newsletters, which is why I have multiple types and subtypes.
I made a cache because I don't want to fetch all items from the database every time. But the problem now is that whenever I add, edit or remove an item, I also have to add it to / edit it in / remove it from the cache.
My question: is this a good way to do it? I think it is, because you don't have to make an AJAX call to get the data from the database.
I'm sorry if I'm not allowed to ask this here.
My question: is this a good way to do it? I think it is, because you don't
have to make an AJAX call to get the data from the database.
The answer is that "it depends". There is no always right and always wrong answer for caching because caching is a tradeoff between efficiency and timeliness of data.
If you want maximum efficiency, you cache like crazy, but your data may not be perfectly up to date because you're using old data from the cache.
If you want the most up-to-date data, you don't cache anything, so you always get the latest data, but obviously efficiency may suffer if you are regularly requesting the same data over and over.
So, it's a tradeoff and the tradeoff depends entirely upon the application, its needs, how often the data is modified and what the consequences are for having stale data or for not caching. There is no single right or wrong answer for that tradeoff. It depends entirely upon the particular situation for your application and the tradeoff may even be different for some types of data vs. others within the same application.
For example, let's suppose you were writing an online bidding site that offered functionality like eBay. You would probably be fine caching the item description for at least several hours, because that almost never changes and even if it does, the consequences of being a bit tardy in seeing a new item description are fairly low. But you could never cache the data on the current bid, because the timeliness of that information is critical. The user needs to always see the latest info on the current bid, even if you have to make some sacrifices in efficiency.
Also, remember that caching isn't completely all or none. You can set a lifetime for a cached value such that it can only be used for a certain period of time that is appropriate for the type of data. For example, you might cache an item description in the above auction for up to 2 hours. This allows you to achieve some efficiency gains, but also to eventually see the new data if it happens to change.
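As a rough illustration of such a lifetime, here is a minimal sketch of a client-side cache with a per-entry TTL (the names getCached and fetchItemDescription are made up for the example):
var cache = {};
function getCached(key, ttlMs, fetchFn) {
  var entry = cache[key];
  if (entry && (Date.now() - entry.storedAt) < ttlMs) {
    return Promise.resolve(entry.value);       // still fresh: reuse the cached copy
  }
  return fetchFn().then(function (value) {     // stale or missing: fetch and store again
    cache[key] = {value: value, storedAt: Date.now()};
    return value;
  });
}
// e.g. cache an item description for up to 2 hours, as in the auction example:
// getCached('item-123-description', 2 * 60 * 60 * 1000, fetchItemDescription);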
In general, you have to review the consequences of showing stale data. If the consequences for having data that is even minutes out of date are high (like the latest price in a live auction), then you can't cache that data at all.
If the consequences of having data that is even hours out of date are low, then you can likely cache that value for at least several hours - maybe even longer.
And, when considering what to cache, you obviously want to first look at the items that are most requested and are the most expensive on your server to retrieve. Some analysis of the usage pattern on your server would give you a prioritized list of candidates to consider for caching.
My question: is this a good way to do it? I think it is, because you don't
have to make an AJAX call to get the data from the database.
This is fine if
1) You want to provide offline reading continuity to the user. The user doesn't have to wait for an internet connection to be available; they can read at any time.
2) Your data-service is quite heavy and you want to avoid multiple/frequent visits to the server to get the same data over and over again.
3) You want your app to be bundled with a native package (like phonegap) to become a hybrid app and give a complete offline experience to the user.
This is not a comprehensive list, but just something to get you started in terms of when to go offline and when to stay fully online.
So, on the other hand, this is a bad idea if
1) Your local storage structure is going to change frequently, requiring the user to re-install (unless you can figure out auto-upgrade of the local storage).
2) All your features are transactional and require sync with other users as well.
Nothing wrong with your approach; just make sure you have kept these points in mind while managing the client-side cache:
Maintain one 'version' variable; this version is increased whenever there is a change in structure. The version is sent to the client with every response, and the client is responsible for comparing versions and emptying the client cache if the server version is greater than the client version (see the sketch after this list).
You can implement or find any open-sources to handle your ajax responses, this one might be useful - https://github.com/SaneMethod/jquery-ajax-localstorage-cache.
You can set proper expiry headers from the server, which also helps the browser cache the response for you if it is a GET request.
You can also implement a server-side cache, which avoids calls to the database by caching the response against the request URL. Note: if different users are supposed to receive different responses then this approach won't work. You can delete the cache entry whenever a change happens to that particular data set (delete/update).
In your case you can also maintain flags on the server which simply tell whether the data has been updated since the time of the cached copy; if the stored version is older you make a server request, otherwise you just use the local version.
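A hedged sketch of the version check from the first point above (key names are made up):
function syncCacheVersion(serverVersion) {
  var clientVersion = parseInt(localStorage.getItem('cacheVersion') || '0', 10);
  if (serverVersion > clientVersion) {
    // structure changed on the server: drop the stale client cache
    localStorage.clear();
    localStorage.setItem('cacheVersion', String(serverVersion));
  }
}
// call this with the 'version' value included in every server response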
I hope it helps.

Meteor.js - Should you denormalize data?

This question has been driving me crazy and I can't get my head around it. I come from a MySQL relational background and have been using Meteorjs and Mongo. For the purposes of this question take the example of posts and authors. One Author to Many Posts. I have come up with two ways in which to do this:
Have a single collection of posts - each post has the author information embedded in the document. This of course leads to denormalization and issues such as: if the author name changes, how do you keep the data correct?
Have two collections: posts and authors - Each post has an author ID which references the authors collection. I then attempt to do a "join" on a non relational database while trying to maintain reactivity.
It seems to me that with MongoDB a degree of denormalization is acceptable, and I am tempted to embed, as implementing joins really does feel like going against the ideals of Mongo.
Can anyone shed any light on what is the right approach especially in terms of wanting my app data to scale well and be manageable?
Thanks
Denormalisation is useful when you're scaling your application and you notice that some queries are taking too much time to complete. I also noticed that most MongoDB developers tend to forget about data normalisation, but that's another topic.
Some developers say things like: "Don't use observe and observeChanges because it's slow." We're building real-time applications, so that's a normal thing to happen; it's a CPU-intensive app design.
In my opinion, you should always aim for a normalised database design and then decide, try and test which fields, if duplicated/denormalised, could improve your app's performance. Example: you remove one query per user; the UI needs an extra field and it's cheap to duplicate it; etc.
With denormalisation you have an extra price to pay: you have to keep the denormalised fields in sync with the main collection.
Example:
Let's say you have Authors and Articles collections. On each article you store the author's name. The author might change his name. In a normalised scenario, this works fine. In a denormalised scenario you have to update the Author document's name AND every single article owned by this author with the new name.
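A rough sketch of that extra write (the authorId and authorName field names are assumptions):
Authors.update(authorId, {$set: {name: newName}});
// every article that embeds the author's name must be touched as well
Articles.update({authorId: authorId},
                {$set: {authorName: newName}},
                {multi: true});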
Keeping a normalised design makes your life easier, but denormalisation eventually becomes necessary.
From a MeteorJS perspective: with the normalised scenario you're sending data from 2 collections to the client; with the denormalised scenario you only send 1 collection. You can also reactively join on the server and send 1 collection to the client, although this increases RAM usage because of MergeBox on the server.
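One way to do that server-side reactive join (my suggestion, not named in the answer) is the reywood:publish-composite package; a minimal sketch:
Meteor.publishComposite('postsWithAuthors', {
  find: function () {
    return Posts.find({}, {limit: 50});
  },
  children: [{
    find: function (post) {
      // publish the author document for each published post
      return Authors.find({_id: post.authorId});
    }
  }]
});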
Denormalisation is something that is very specific to your application's needs. You can use Kadira to find ways of making your application faster. The database design is only one factor out of many that you play with when trying to improve performance.

Breeze.js cache limitations? Or Browser?

We are investigating using Breeze for field deployment of some tools. The scenario is this -- an auditor will visit sites in the field, where most of the time there will be no -- or very degraded -- internet access. Rather than replicate our SQL database on all the laptops and tablets (if that's even possible), we are hoping to use Breeze to cache the data and then store it locally so it is accessible when there is not a usable connection.
Unfortunately, Breeze seems to choke when caching any significant amount of data. Generally on Chrome it's somewhere between 8 and 13MB worth of entities (as measured by the HTTP response headers). This can change a bit depending on how many tabs I have open and such, but I have not been able to move that by more than 10%. The error I get is that the Chrome tab crashes and tells me to reload. The error is reproducible (I download the data in 100K chunks and it fails on the same read every time, and works fine if I stop it after the previous read). When I change the page size, it always fails within the same range.
Is this a limitation of Breeze, or Chrome? Or Windows? I tried it on Firefox, and it handles even less data before the whole browser crashes. IE fares a little better, but none of them do great.
Looking at performance in task manager, I get the following:
IE goes from 250M memory usage to 1.7G of memory usage during the caching process and caches a total of about 14MB before throwing an out-of-memory error.
Chrome goes from 206M memory usage to about 850M while caching a total of around 9MB
Firefox goes from around 400M to about 750M and manages to cache about 5MB before the whole program crashes.
I can calculate how much will be downloaded with any selection criteria, but I cannot find a way to calculate how much data can be handled by any specific browser instance. This makes using Breeze for offline auditing close to useless.
Has anyone else tackled this problem yet? What are the best approaches to handling something like this. I've thought of several things, but none of them are ideal. Any ideas would be appreciated.
ADDED At Steve Schmitt's request:
Here are some helpful links:
Metadata
Entity Diagram (pdf) (and html and edmx)
The first query, just to populate the tags on the page runs quickly and downloads minimal data:
var query = breeze.EntityQuery
.from("Countries")
.orderBy("Name")
.expand("Regions.Districts.Seasons, Regions.Districts.Sites");
Once the user has selected the Sites s/he wishes to cache, the following two queries are kicked off (this used to be one query, but I broke it into two hoping it would be less of a burden on resources; it didn't help). The first query (usually 2-3K entities and about 2MB) runs as expected. Some combination of the predicates listed is used to filter the data.
var qry = breeze.EntityQuery
.from("SeasonClients")
.expand("Client,Group.Site,Season,VSeasonClientCredit")
.orderBy("DistrictId,SeasonId,GroupId,ClientId")
var p = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p1 = breeze.Predicate("SeasonId", "==", SeasonId);
var p2 = breeze.Predicate("DistrictId", "==", DistrictId);
var p3 = breeze.Predicate("Group.Site.SiteId", "in", SiteIds);
After the first query runs, the second query (below) runs, also using some combination of the listed predicates to filter the data. At about 9MB, it will have about 50K rows to download. When the total download burden between the two queries is between 10MB and 13MB, browsers will crash.
var qry = breeze.EntityQuery
.from("Repayments")
.orderBy('SeasonId,ClientId,RepaymentDate');
var p1 = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p2 = breeze.Predicate("SeasonId", "==", SeasonId);
var p3 = breeze.Predicate("DistrictId", "==", DistrictId);
var p4 = breeze.Predicate("SiteId", "in", SiteIds);
Thanks for the interest, Steve. You should know that the Entity Relationships are inherited and currently in production supporting the majority of the organization's operations, so as few changes as possible to that would be best. Also, the hope is to grow this from a reporting application to one with which data entry can be done in the field (so, as I understand it, using projections to limit the data wouldn't work).
Thanks for the interest, and let me know if there is anything else you need.
Here are some suggestions based on my experience building on an offline capable web application using breeze. Some or all of these might not make sense for your use cases...
Identify which entity types need to be editable vs which are used to fill drop-downs etc. Load non-editable data using the noTracking query option and cache them in localStorage yourself using JSON.stringify. This avoids the overhead of coercing the data into entities, change tracking, etc. Good candidates for this approach in your model might be entity types like Country, Region, District, Site, etc.
If possible, provide a facility in your application for users to identify which records they want to "take offline". This way you don't need to load and cache everything, which can get quite expensive depending on the number of relationships, entities, properties, etc.
In conjunction with suggestion #2, avoid loading all the editable data at once and avoid using the same EntityManager instance to load each set of data. For example, if the Client entity is something that needs to be editable out in the field without a connection, create a new EntityManager, load a single client (expanding any children that also need to be editable) and cache this data separately from other clients.
Cache the breeze metadata once. When calling exportEntities the includeMetadata argument should be false. More info on this here.
To create new EntityManager instances make use of the createEmptyCopy method.
EDIT:
I want to respond to this comment:
Say I have a client who has bills and payments. That client is in a
group, in a site, in a region, in a country. Are you saying that the
client, payment, and bill information might each have their own EM,
while the location hierarchy might be in a 4th EM with no-tracking?
Then when I refer to them, I wire up the relationships as needed using
LINQs on the different EMs (give me all the bills for customer A, give
me all the payments for customer A)?
It's a bit of a judgement call in terms of deciding how to separate things out. Some of what I'm suggesting might be overkill, it really depends on the amount of data and the way your application is used.
Assuming you don't need to edit groups, sites, regions and countries while offline, the first thing I'd do would be to load the list of groups using the noTracking option and cache them in localStorage for offline use. Then do the same for sites, regions and countries. Keep in mind, entities loaded with the noTracking option aren't cached in the entity manager so you'll need to grab the query result, JSON.stringify it and then call localStorage.setItem. The intent here is to make sure your application always has access to the list of groups, sites, regions, etc so that when you display a form to edit a client entity you'll have the data you need to populate the group, site, region and country select/combobox/dropdown.
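A hedged sketch of that noTracking + localStorage step (the serviceName variable and the "Groups" resource name are assumptions):
var lookupManager = new breeze.EntityManager(serviceName);
var groupQuery = breeze.EntityQuery.from("Groups").noTracking(true);
lookupManager.executeQuery(groupQuery).then(function (data) {
  // plain JS objects; never attached to the EntityManager cache
  localStorage.setItem("offline.groups", JSON.stringify(data.results));
});
// later, offline:
var groups = JSON.parse(localStorage.getItem("offline.groups") || "[]");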
Assuming the user has identified the subset of clients they want to work with while offline, I'd then load each of these clients one at a time (including their payment and bill information but not expanding their group, site, region, country) and cache each client+payments+bills set using entityManager.exportEntities. Reasoning here is it doesn't make sense to load several clients plus their payments and bills into the same EntityManager each time you want to edit a particular client. That could be a lot of unnecessary overhead, but again, this is a bit of a judgement call.
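And a hedged sketch of caching one client (plus children) per EntityManager with exportEntities, as suggested above (the query shape and storage key are assumptions):
var clientManager = masterManager.createEmptyCopy();
var clientQuery = breeze.EntityQuery.from("Clients")
    .where("ClientId", "==", clientId)
    .expand("Payments,Bills");
clientManager.executeQuery(clientQuery).then(function () {
  // metadata is cached once elsewhere, so skip it here (includeMetadata = false)
  var bundle = clientManager.exportEntities(null, false);
  localStorage.setItem("offline.client." + clientId, bundle);
});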
#Jeremy's answer was excellent and very helpful, but didn't actually answer the question, which I was starting to think was unanswerable, or at least the wrong question. However, #Steve in the comments gave me the most appropriate information for this question.
It is neither Breeze nor the browser, but rather Knockout. Apparently the Knockout wrapper around the Breeze entities uses all that memory (at least while loading the entities, and in my environment). As described above, Knockout/Breeze would crap out after reading around 5MB of data, causing Chrome to crash with over 1.7GB of memory usage (from a pre-download memory usage of around 300MB). Rewriting the app in AngularJS eliminated the problem. So far I have been able to download over 50MB from the exact same EF6 model into Breeze/Angular, and total Chrome memory usage never went above 625MB.
I will be testing larger payloads, but 50 MB more than satisfies my needs for the moment. Thanks everyone for your help.

CouchDB: How to change view function via javascript?

I am playing around with CouchDB to test if it is "possible" [1] to store scientific data (simulated and experimental raw data + metadata). A big pro is the schema-less approach of CouchDB: we have to be very flexible with the metadata, as the set of parameters changes very often.
Up to now I have some code to feed raw data and plots (both as attachments) plus hierarchical metadata (as JSON) into CouchDB documents, and I have written some prototype JavaScript for filtering and display. But the filtering is done on the client side (i.e. in the browser): the map function simply returns everything.
How can I change the map function (or push a second one) of a specific _design document with simple browser-side JS?
I do not think that a temporary view would yield any performance gain...
Thanks for your time and answers.
[1]: of course it is possible, but is it also useful? feasible? reasonable?
[added]
Ah, jquery.couch.js (version 0.9.0) provides a saveDoc() function, which could update the _design document with the new map function.
But I also tried out the query function, which uses a temporary view. Okay, "do not use this in the real product, only during development"... But scientific research is steady development, right?
Temporary views get cached, as I noticed, and this works well for ~1000 documents per DB. A second plus: all users (think of 1 to 3, so a big user management setup is quite an overkill) can work with their own temporary view.
Never ever use temporary views. They are really only there for dev and debugging purposes. For more information, see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views (specifically the bold "NOTE").
And yes, because design documents are really just documents with special powers, you can run your GET/POST/PUT/DELETE methods on them. However, you will usually need admin privileges to do this. So, if you are allowing a piece of client-side software to do that, you are making your entire database publicly readable/writable - this may be fine for your application, but it is important to remember.
For example, if you restrict access to your database but put the username and password in client-side JavaScript, then anyone can see that username and password.
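For illustration, a hedged sketch of updating a design document's map function from the browser with jquery.couch.js, as the question's edit suggests (the database name "experiments", the design doc "_design/filters" and the view name are assumptions; this normally requires admin rights):
var db = $.couch.db("experiments");
db.openDoc("_design/filters", {
  success: function (ddoc) {
    ddoc.views = ddoc.views || {};
    ddoc.views.by_parameter = {
      map: "function (doc) { if (doc.metadata && doc.metadata.parameter) { emit(doc.metadata.parameter, doc._id); } }"
    };
    db.saveDoc(ddoc, {
      success: function () { console.log("design doc updated"); }
    });
  }
});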
Cheers.
I've written some helper functions for jquery.couch and design docs; take a look at:
https://github.com/grischaandreew/jquery.couch.js
