Best practices for refetching part of a GraphQL query with Apollo?

Best practices for refetching part of a GraphQL query with Apollo? - javascript

I have the following react-apollo-wrapped GraphQL query:
user(id: 1) {
name
friends {
id
name
}
}
As semantically represented, it fetches the user with ID 1, returns its name, and returns the id and name of all of its friends.
I then render this in a component structure like the following:
graphql(ParentComponent)
-> UserInfo
-> ListOfFriends (with the list of friends passed in)
This is all working for me. However, I wish to be able to refetch the list of friends for the current user.
I can do this.props.data.refetch() on the parent component and updates will be propagated; however, I'm not sure this is the best practice, given that my GraphQL query looks something more like this,
user(id: 1) {
name
foo1
foo2
foo3
foo4
foo5
...
friends {
id
name
}
}
Whilst the only thing I wish to refetch is the list of friends.
What is the best way to cleanly architect this? I'm thinking along the lines of binding an initially skipped GraphQL fetcher to the ListOfFriends component, which can be triggered as necessary, but would like some guidance on how this should be best done.
Thanks in advance.

I don't know why you question is downvoted because I think it is a very valid question to ask. One of GraphQL's selling points is "fetch less and more at once". A client can decide very granually what it needs from the backend. Using deeply nested graphlike queries that previously required multiple endpoints can now be expressed in a single query. At the same time over-fetching can be avoided. Now you find yourself with a big query, everything loads at once and there are no n+1 query waterfalls. But now you know that a few fields in your big query are subject to change from now and then and you want to actively update the cache with new data from the server. Apollo offers the refetch field but it loads the whole query which clearly is overfetching that was sold to me as not being a concern anymore in GraphQL. Let me offer some solutions:
Premature Optimisation?
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming. - Donald Knuth
Sometimes we try to optimise too much without measuring first. Write it the easy way first and then see if it is really an issue. What exactly is slow? The network? A particular field in the query? The sheer size of the query?
After you analized what exactly is slow we can start looking into improving:
Refetch and include/skip directives
Using directives you can exclude fields from a query depending on variables. The refetch function can specify different variables than the initial query. This way you can exclude fields when you refetch the query.
Splitting up Queries
Single page apps are a great idea. The HTML is generated client side and the page does not have to make expensive trips to the server to render a new page. But soon SPAs got to big and code splitting became an issue. And now we are basically back to server side rendering and splitting the app into pages. The same might apply to GraphQL. Sometimes queries are too big and should be split. You could split up the queries for UserInfo and ListOfFriends. Inside of the cache the fields will be merged. With query batching both queries will be send in the same request and a GraphQL server that implements per request resource caching correctly (e.g. with Dataloader) will barely notice a difference.
Subscriptions
Maybe you are ready to use subscriptions already. Subscriptions send updates from the server for fields that have changed. This way you could subscribe to a user's friends and get updates in real time. The good news is that Apollo Client, Relay and many server implementations offer support for subscriptions already. The bad news is that it needs websockets that usually put different requirements on your technology stack than pure HTTP.
withApollo() -> this.client.query
This should only be your last resort! Using react-apollo's withApollo higher order component you can directly inject the ApolloClient instance. You can now execute queries using this.client.query(). { user(id: 1) { friendlist { ... } } } can be used to just fetch the friend list and update the cache which will lead to an update of your component. This might look like what you want but can haunt you in later stages of the app.

Related

RESTful API with related resources

I'm in the process of designing my first serious RESTful API, which will sit above a WCF service.
There are resources like; outlet, schedule and job. A schedule is always owned by an outlet, and a schedule will contain 0 or more jobs. A job does not have to be on a schedule.
I keep coming back to thinking that resources should be addressable in the same type of way resources are addressed on a file system. This would mean I'd have URI's like:
/outlets
/outlets/4/schedules
/outlets/4/schedules/1000/jobs
/outlets/4/schedules/1000/jobs/5123
Things start to get messy though when considering how to pull resources back under different situations though.
e.g. I want a job not on a schedule:
/outlets/4/jobs/85 (this now means we've got 2 ways to pull a job back that's on a schedule)
e.g. I want all schedules regardless of outlet:
/schedules or /outlets/ALL/schedules
There are also lots of other more complex requirements but I'm sure you get the gist.
File system's have a good, logical way of addressing resources. You can obviously create symbolic links and achieve something approximating what I describe but it'd be messy. It'll be even messier once things get even slightly more complex, such as adding the ability to get schedules by date:
/outlets/4/2016-08-29/schedules
And without using query string parameters I'm not even sure how I'd request back all jobs that are NOT on a schedule. The following feels wrong because unscheduled is not a resource:
/outlets/4/unscheduled/jobs
So, I'm coming to think that file system type addressing is only going to work for the simplest of services (our underlying system has hundreds of entity types, with some very complex relationships and a huge number of operations).
Having multiple ways of doing the same thing tends to lead to confusion and messy documentation and I want to avoid it. As a result I'm almost forced to opt for going with the lowest common denominator and choosing very simple address forms - like the 3rd one below:
/outlets/4/schedules/1000/jobs/5123
/outlets/4/jobs/5123
/jobs/5123
From these very simple address forms I would then need to expand using query string parameters to do anything more complex, e.g:
/jobs?scheduleId=1000
/jobs?outletId=4
/jobs?outletId=4&fromDate=2016-01-01&toDate=2016-01-31
This feels like it's going against the REST model though and query string parameters like this aren't predictable, so far from the "no docs needed" idea.
OK, so at the minute I'm almost on the side of the fence that is saying in order to get a clean, maintainable API I'm going to have to go with very simple resource addresses, use query string parameters extensively and have good documentation.
Anyway, this doesn't feel like the conclusion I should have arrived at. Where have I gone wrong?

Welcome to the world of REST API programming. These are the hard problems which we all face when trying to apply general principles to specific situations. There is no clear and easy answer to your questions, but here are a few additional tips that may be useful.
First, you're right that the file-system approach to addressing breaks down when you have complex relationships. You'll only want to establish that sort of addressing when there's a true hierarchy there.
For example, if all jobs were part of a single schedule, then it would make sense to look to schedules/{id}/jobs/{id} to get to a given job. If you think of it from a data-storage perspective, you could imagine there being an XML file for each schedule, and the jobs would just be elements within that file.
However, it sounds like in this particular case your data is more relational. From a data-storage perspective, you'd represent each job as a row in a database table, and establish some foreign key relationships to tie some jobs to some schedules. Your addressing scheme should reflect this by making /jobs a top-level endpoint, and using optional query string parameters to filter by schedule or outlet when it makes sense to do so.
So you're on the right track. One more thing you might want to consider is OData, which extends the basic REST principles with a standards-oriented way of representing things like filtering with query string parameters. The address syntax feels a little "out there", but it does a pretty good job of handling the situations where straight REST starts falling apart. And because it's more standardized, there are tools available to help with things like translating from a data layer into an OData endpoint, or generating client-side proxy helpers based on the metadata exposed by that endpoint.
This feels like it's going against the REST model though and query string parameters like this aren't predictable, so far from the "no docs needed" idea.
If you use OData, then its spec combines with the metadata produced by your tooling to become your documentation. For example, your metadata says that a job has a date property which represents a date. Then the OData spec provides a way to represent filter queries for a date value. From this information, consumers can reliably produce a filter query that will "just work" because you're using a framework server-side to do the hard parts. And if they don't feel like memorizing how OData URLs work, they can generate a client proxy in the language of their choice so they can generate the appropriate URL via their favorite syntax.

Meteor, mongodb - canteen optimization

TL;DR:
I'm making an app for a canteen. I have a collection with the persons and a collection where I "log" every meat took. I need to know those who DIDN'T take the meal.
Long version:
I'm making an application for my local Red Cross.
I'm trying to optimize this situation:
there is a canteen at wich the helped people can take food at breakfast, lunch and supper. We need to know how many took the meal (and this is easy).
if they are present they HAVE TO take the meal and eat, so we need to know how many (and who) HAVEN'T eat (this is the part that I need to optimize).
When they take the meal the "cashier" insert their barcode, the program log the "transaction" in the log collection.
Actually, on creation of the template "canteen" I create a local collection "meals" and populate it with the data of all the people in the DB, (so ID, name, fasting/satiated), then I use this collection for my counters and to display who took the meal and who didn't.
(the variable "mealKind" is = "breakfast" OR "lunch" OR "dinner" depending on the actual serving.)
Template.canteen.created = function(){
Meals=new Mongo.Collection(null);
var today= new Date();today.setHours(0,0,1);
var pers=Persons.find({"status":"present"},{fields:{"Name":1,"Surname":1,"barcode":1}}).fetch();
pers.forEach(function(uno){
var vediamo=Log.findOne({"dest":uno.codice,"what":mealKind, "when":{"$gte": today}});
if(typeof vediamo=="object"){
uno['eat']="satiated";
}else{
uno['eat']="fasting";
}
Meals.insert(uno);
});
};
Template.canteen.destroyed = function(){
meals.remove({});
};
From the meal collection I estrapolate the two colums of people satiated (with name, surname and barcode) and fasting, and I also use two helpers:
fasting:function(){
return Meals.find({"eat":"fasting"});
}
"countFasting":function(){
return Meals.find({"eat":"fasting"}).count();
}
//same for satiated
This was ok, but now the number of people is really increasing (we are arount 1000 and counting) and the creation of the page is very very slow, and usually it stops with errors so I can read that "100 fasting, 400 satiated" but I have around 1000 persons in the DB.
I can't figure out how to optimize the workflow, every other method that I tried involved (in a manner or another) more queries to the DB; I think that I missed the point and now I cannot see it.
I'm not sure about aggregation at this level and inside meteor, because of minimongo.
Although making this server side and not client side is clever, the problem here is HOW discriminate "fasting" vs "satiated" without cycling all the person collection.
+1 if the solution is compatibile with aleed:tabular

EDIT
I am still not sure about what is causing your performance issue (too many things in client memory / minimongo, too many calls to it?), but you could at least try different approaches, more traditionally based on your server.
By the way, you did not mention either how you display your data or how you get the incorrect reading for your number of already served / missing Persons?
If you are building a classic HTML table, please note that browsers struggle rendering more than a few hundred rows. If you are in that case, you could implement a client-side table pagination / infinite scrolling. Look for example at jQuery DataTables plugin (on which is based aldeed:tabular). Skip the step of building an actual HTML table, and fill it directly using $table.rows.add(myArrayOfData).draw() to avoid the browser limitation.
Original answer
I do not exactly understand why you need to duplicate your Persons collection into a client-side Meals local collection?
This requires that you have first all documents of Persons sent from server to client (this may not be problematic if your server is well connected / local. You may also still have autopublish package on, so you would have already seen that penalty), and then cloning all documents (checking for your Logs collection to retrieve any previous passages), effectively doubling your memory need.
Is your server and/or remote DB that slow to justify your need to do everything locally (client side)?
Could be much more problematic, should you have more than one "cashier" / client browser open, their Meals local collections will not be synchronized.
If your server-client connection is good, there is no reason to do everything client side. Meteor will automatically cache just what is needed, and provide optimistic DB modification to keep your user experience fast (should you structure your code correctly).
With aldeed:tabular package, you can easily display your Persons big table by "pages".
You can also link it with your Logs collection using the dburles:collection-helpers (IIRC there is an example en the aldeed:tabular home page).

Meteor.js - Should you denormalize data?

This question has been driving me crazy and I can't get my head around it. I come from a MySQL relational background and have been using Meteorjs and Mongo. For the purposes of this question take the example of posts and authors. One Author to Many Posts. I have come up with two ways in which to do this:
Have a single collection of posts - Each post has the author information embedded into the document. This of course leads to denormalization and issues such as if the author name changes how do you keep the data correct.
Have two collections: posts and authors - Each post has an author ID which references the authors collection. I then attempt to do a "join" on a non relational database while trying to maintain reactivity.
It seems to me with MongoDB degrees of denormalization is acceptable and I am tempted to embed as implementing joins really does feel like going against the ideals of Mongo.
Can anyone shed any light on what is the right approach especially in terms of wanting my app data to scale well and be manageable?
Thanks

Denormalisation is useful when you're scaling your application and you notice that some queries are taking too much time to complete. I also noticed that most Mongodb developers tend to forget about data normalisation but that's another topic.
Some developers say things like: "Don't use observe and observeChanges because it's slow". We're building real-time applications so that a normal thing to happen, it's a CPU intensive app design.
In my opinion, you should always aim for a normalised database design and then you have to decide, try and test which fields, that duplicated/denormalised, could improve your app's performance. Example: You remove 1 query per user. The UI need an extra field and it's fast to duplicated it, etc.
With the denormalisation you've an extra price to pay. You've to update the denormalised fields according to the main collection.
Example:
Let's say that you Authors and Articles collections. On each article you have the author name. The author might change his name. With a normalised scenario, it works fine. With a denormalised scenario you have to update the Author document name AND every single article, owned by this author, with the new name.
Keeping a normalised design makes you life easier but denormalisation, eventually, becomes necessary.
From a MeteorJs perspective: With the normalised scenario you're sending data from 2 Collections to the client. With the denormalised scenario, you only send 1 collection. You can also reactively join on the server and send 1 collection to the client, although it increases the RAM usage because of MergeBox on the server.
Denormalisation is something that it's very specify for you application needs. You can use Kadira to find ways of making your application faster. The database design is only 1 factor out of many that you play with when trying to improve performance.

Breeze.js cache limitations? Or Browser?

We are investigating using Breeze for field deployment of some tools. The scenario is this -- an auditor will visit sites in the field, where most of the time there will be no -- or very degraded -- internet access. Rather than replicate our SQL database on all the laptops and tablets (if that's even possible), we are hoping to use Breeze to cache the data and then store it locally so it is accessible when there is not a usable connection.
Unfortunately, Breeze seems to choke when caching any significant amount of data. Generally on Chrome it's somewhere between 8 and 13MB worth of entities (as measured by the HTTPResponse headers). This can change a bit depending on how many tabs I have open and such, but I have not been able to move that more than 10%. the error I get is the Chrome tab crashes and tells me to reload. The error is replicable (I download the data in 100K chunks and it fails on the same read every time and works fine if I stop it after the previous read) When I change the page size, it always fails within the same range.
Is this a limitation of Breeze, or Chrome? Or windows? I tried it on Firefox, and it handles even less data before the whole browser crashes. IE fares a little better, but none of them do great.
Looking at performance in task manager, I get the following:
IE goes from 250M memory usage to 1.7G of memory usage during the caching process and caches a total of about 14MB before throwing an out-of-memory error.
Chrome goes from 206B memory usage to about 850M while caching a total of around 9MB
Firefox goes from around 400M to about 750M and manages to cache about 5MB before the whole program crashes.
I can calculate how much will be downloaded with any selection criteria, but I cannot find a way to calculate how much data can be handled by any specific browser instance. This makes using Breeze for offline auditing close to useless.
Has anyone else tackled this problem yet? What are the best approaches to handling something like this. I've thought of several things, but none of them are ideal. Any ideas would be appreciated.
ADDED At Steve Schmitt's request:
Here are some helpful links:
Metadata
Entity Diagram (pdf) (and html and edmx)
The first query, just to populate the tags on the page runs quickly and downloads minimal data:
var query = breeze.EntityQuery
.from("Countries")
.orderBy("Name")
.expand("Regions.Districts.Seasons, Regions.Districts.Sites");
Once the user has select the Sites s/he wishes to cache, the following two queries are kicked off (used to be one query, but I broke it into two hoping it would be less of a burden on resources -- it didn't help). The first query (usually 2-3K entities and about 2MB) runs as expected. Some combination of the predicates listed are used to filter the data.
var qry = breeze.EntityQuery
.from("SeasonClients")
.expand("Client,Group.Site,Season,VSeasonClientCredit")
.orderBy("DistrictId,SeasonId,GroupId,ClientId")
var p = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p1 = breeze.Predicate("SeasonId", "==", SeasonId);
var p2 = breeze.Predicate("DistrictId", "==", DistrictId);
var p3 = breeze.Predicate("Group.Site.SiteId", "in", SiteIds);
After the first query runs, the second query (below) runs (also using some combination of the predicates listed to filter the data. At about 9MB, it will have about 50K rows to download). When the total download burden between the two queries is between 10MB and 13MB, browsers will crash.
var qry = breeze.EntityQuery
.from("Repayments")
.orderBy('SeasonId,ClientId,RepaymentDate');
var p1 = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p2 = breeze.Predicate("SeasonId", "==", SeasonId);
var p3 = breeze.Predicate("DistrictId", "==", DistrictId);
var p4 = breeze.Predicate("SiteId", "in", SiteIds);
Thanks for the interest, Steve. You should know that the Entity Relationships are inherited and currently in production supporting the majority of the organization's operations, so as few changes as possible to that would be best. Also, the hope is to grow this from a reporting application to one with which data entry can be done in the field (so, as I understand it, using projections to limit the data wouldn't work).
Thanks for the interest, and let me know if there is anything else you need.

Here are some suggestions based on my experience building on an offline capable web application using breeze. Some or all of these might not make sense for your use cases...
Identify which entity types need to be editable vs which are used to fill drop-downs etc. Load non-editable data using the noTracking query option and cache them in localStorage yourself using JSON.stringify. This avoids the overhead of coercing the data into entities, change tracking, etc. Good candidates for this approach in your model might be entity types like Country, Region, District, Site, etc.
If possible, provide a facility in your application for users to identify which records they want to "take offline". This way you don't need to load and cache everything, which can get quite expensive depending on the number of relationships, entities, properties, etc.
In conjunction with suggestion #2, avoid loading all the editable data at once and avoid using the same EntityManager instance to load each set of data. For example, if the Client entity is something that needs to be editable out in the field without a connection, create a new EntityManager, load a single client (expanding any children that also need to be editable) and cache this data separately from other clients.
Cache the breeze metadata once. When calling exportEntities the includeMetadata argument should be false. More info on this here.
To create new EntityManager instances make use of the createEmptyCopy method.
EDIT:
I want to respond to this comment:
Say I have a client who has bills and payments. That client is in a
group, in a site, in a region, in a country. Are you saying that the
client, payment, and bill information might each have their own EM,
while the location hierarchy might be in a 4th EM with no-tracking?
Then when I refer to them, I wire up the relationships as needed using
LINQs on the different EMs (give me all the bills for customer A, give
me all the payments for customer A)?
It's a bit of a judgement call in terms of deciding how to separate things out. Some of what I'm suggesting might be overkill, it really depends on the amount of data and the way your application is used.
Assuming you don't need to edit groups, sites, regions and countries while offline, the first thing I'd do would be to load the list of groups using the noTracking option and cache them in localStorage for offline use. Then do the same for sites, regions and countries. Keep in mind, entities loaded with the noTracking option aren't cached in the entity manager so you'll need to grab the query result, JSON.stringify it and then call localStorage.setItem. The intent here is to make sure your application always has access to the list of groups, sites, regions, etc so that when you display a form to edit a client entity you'll have the data you need to populate the group, site, region and country select/combobox/dropdown.
Assuming the user has identified the subset of clients they want to work with while offline, I'd then load each of these clients one at a time (including their payment and bill information but not expanding their group, site, region, country) and cache each client+payments+bills set using entityManager.exportEntities. Reasoning here is it doesn't make sense to load several clients plus their payments and bills into the same EntityManager each time you want to edit a particular client. That could be a lot of unnecessary overhead, but again, this is a bit of a judgement call.

#Jeremy's answer was excellent and very helpful, but didn't actually answer the question, which I was starting to think was unanswerable, or at least the wrong question. However #Steve in the comments gave me the most appropriate information for this question.
It is neither Breeze nor the Browser, but rather Knockout. Apparently the knockout wrapper around the breeze entities uses all that memory (at least while loading the entities and in my environment). As described above, Knockout/Breeze would crap out after reading around 5MB of data, causing Chrome to crash with over 1.7GB of memory usage (from a pre-download memory usage around 300MB). Rewriting the app in ANgularJS eliminated the problem. So far I have been able to download over 50MB from the exact same EF6 model into Breeze/Angular, total Chrome memory usage never went above 625MB.
I will be testing larger payloads, but 50 MB more than satisfies my needs for the moment. Thanks everyone for your help.

which is better, searching in javascript or database?

I have a grid(employee grid) which has say 1000-2000 rows.
I display employee name and department in the grid.
When I get data for the grid, I get other detail for the employee too(Date of Birth, location,role,etc)
So the user has option to edit the employee details. when he clicks edit, I need to display other employee details in the pop up. since I have stored all the data in JavaScript, I search for the particular id and display all the details. so the code will be like
function getUserDetails(employeeId){
//i store all the employeedetails in a variable employeeInformation while getting //data for the grid.
for(var i=0;i<employeeInformation.length;i++){
if(employeeInformation[i].employeeID==employeeId){
//display employee details.
}
}
}
the second solution will be like pass employeeid to the database and get all the information for the employee. The code will be like
function getUserDetails(employeeId){
//make an ajax call to the controller which will call a procedure in the database
// to get the employee details
//then display employee details
}
So, which solution do you think will be optimal when I am handling 1000-2000 records.
I don't want to make the JavaScript heavy by storing a lot of data in the page.
UPDATED:
so one of my friend came up with a simple solution.
I am storing 4 columns for 500 rows(average). So I don't think there should not be rapid slowness in the webpage.
while loading the rows to the grid, under edit link, I give the data-rowId as an attribute so that it will be easy to retrieve the data.
say I store all the employee information in a variable called employeeInfo.
when someone clicks the edit link.. $(this).attr('data-rowId') will give the rowId and employeeInfo[$(this).attr('data-rowId')] should give all the information about the employee.
instead of storing the employeeid and looping over the employee table to find the matching employeeid, the rowid should do the trick. this is very simple. but did not strike me.

I would suggest you make an AJAX call to the controller. Because of two main reasons
It is not advisable to handle Database actiity in javascript due to security issues.
Javascript runs on client side machine it should have the least load and computation.
Javascript should be as light as possible. So i suggest you do it in the database itself.

Don't count on JavaScript performance, because it is heavily depend on computer that is running on. I suggest you to store and search on server-side rather than loading heavy payload of data in Browser which is quite restricted to resources of end-user.
Running long loops in JavaScript can lead to an unresponsive and irritating UI. Use Ajax calls to get needed data as a good practice.

Are you using HTML5? Will your users typically have relatively fast multicore computers? If so, a web-worker (http://www.w3schools.com/html/html5_webworkers.asp) might be a way to offload the search to the client while maintaining UI responsiveness.
Note, I've never used a Worker, so this advice may be way off base, but they certainly look interesting for something like this.

In terms of separation of concerns, and recommended best approach, you should be handling that domain-level data retrieval on your server, and relying on the client-side for processing and displaying only the records with which it is concerned.
By populating your client with several thousand records for it to then parse, sort, search, etc., you not only take a huge performance hit and diminish user experience, but you also create many potential security risks. Obviously this also depends on the nature of the data in the application, but for something such as employee records, you probably don't want to be storing that on the client-side. Anyone using the application will then have access to all of that.
The more pragmatic approach to this problem is to have your controller populate the client with only the specific data which pertains to it, eliminating the need for searching through many records. You can also retrieve a single object by making an ajax query to your server to retrieve the data. This has the dual benefit of guaranteeing that you're displaying the current state of the DB, as well as being far more optimized than anything you could ever hope to write in JS.

We Keep Coding

JavaScript is the programming language of the Web.