I have a dilemma about how to avoid possibly redundant data querying.
I am using MongoDB with Apollo server and client. My MongoDB has several collections of data. The main collection consists of IDs pointing to supporting collections.
I am not sure how to map the IDs in my main collection to the documents in the supporting collections in order to retrieve the actual values. The thing is that I mostly already have the supporting-collection data cached in the Apollo Client cache.
Do you think I should query only the IDs in my main collection and map the IDs to values on the frontend using the cached data? Or should I have a resolver that takes the IDs in the main collection, queries the supporting collections to get the value for each ID, and then sends the prepared data to the frontend?
I appreciate any insight! Thank you.
As always, it depends. I assume this is roughly your setup, with a main collection referencing a supporting one:
type OtherDoc {
  id: String
  field: String
}

type MainDoc {
  id: String
  otherDocs(param: String): [OtherDoc]
}

type Query {
  mainDocs: [MainDoc]
}
In such a case, querying for mainDocs { id otherDocs(param: "...") { id field } } is definitely a natural way to get this data. It might be redundant in the sense that different param values can resolve to the same OtherDocs. If so, you may think about querying only their IDs and then querying separately for the docs the client doesn't already have.
I'd say that's a valid solution, but definitely not something you should reach for from the beginning. This optimization will reduce bandwidth, but it increases the number of requests. What's more, you don't know when to actually refetch an OtherDoc. Well, maybe you do, but then you have to think it through and build it yourself, whereas without it you get that behavior out of the box.
A different, more cache-friendly approach may be to change the schema so that situations where your data overlap are limited. This is not always possible due to the business logic, but it is worth considering when it is.
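If you do go the "query IDs, resolve against the cache" route, the frontend logic is essentially a cache lookup plus a list of leftovers to fetch. A minimal sketch, assuming a plain object lookup stands in for whatever index you build from the Apollo cache (the helper name is made up for illustration):

```javascript
// Hypothetical sketch: resolve supporting-collection IDs against data the
// client already has, and collect the IDs that still need a network fetch.
function resolveIds(ids, cache) {
  const resolved = [];
  const missing = [];
  for (const id of ids) {
    if (cache[id] !== undefined) {
      resolved.push(cache[id]); // already cached, no round trip needed
    } else {
      missing.push(id); // needs a follow-up query to the server
    }
  }
  return { resolved, missing };
}

// Example: two of the three referenced docs are already cached.
const cache = {
  a: { id: "a", field: "x" },
  b: { id: "b", field: "y" },
};
const { resolved, missing } = resolveIds(["a", "b", "c"], cache);
// resolved holds the two cached docs; missing is ["c"]
```

The awkward part, as noted above, is deciding when entries in that cache are stale — that is the logic you get for free when the server resolves the IDs for you.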
Issue
I am trying to filter and order on two different document properties using Firestore. I am aware that the documentation says this is not possible. But I am confident there must be a solution, perhaps on the client side.
Understanding
Imagine you have a bunch of articles. These articles have properties such as views and time. Say I wanted to sort so that I get the most viewed article within the last week. How would I do this, since Firestore does not allow such a query? Is there any way to handle it on the client side, or any other solution?
I know this is invalid but perhaps there would be a way to make this work:
admin.firestore().collection("articles").where("time", "<=", mintime).orderBy("views", "desc")
The client-side solution is to query for all possible documents, then sort them in the app instead of depending on Firestore to do that sorting for you.
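Concretely, that means filtering on one property in the Firestore query and sorting on the other in code. A minimal sketch of the in-app step, using the field names from the question (the data and helper name are made up for illustration):

```javascript
// Client-side sketch: the server query filters on `time` only; the app
// then orders the returned articles by `views`, most viewed first.
function mostViewedWithin(articles, minTime) {
  return articles
    .filter((a) => a.time >= minTime)   // keep only recent articles
    .sort((a, b) => b.views - a.views); // sort descending by views
}

const articles = [
  { title: "A", views: 10, time: 100 },
  { title: "B", views: 50, time: 90 }, // too old, filtered out
  { title: "C", views: 30, time: 120 },
];
const sorted = mostViewedWithin(articles, 100);
// sorted[0].title → "C" (most viewed among the recent articles)
```

The trade-off is that you download every document matching the time filter, which only stays cheap while that result set is small.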
I want to store comma-separated IDs in a child node. How can I filter data on them? In SQL we can use an IN clause to fetch data; is there any way to perform this kind of operation in the Firebase database?
Please suggest any possible solution for this.
Firebase Realtime Database doesn't have an equivalent of SQL's IN clause. It also doesn't have a way to find a substring in a value. So the data model you are looking to use doesn't allow the use case you want. As usual with NoSQL databases, the solution is to pick a data model that does allow your use case.
The most likely reason I know for the structure you describe is to associate the child node with a set of categories. If that is your case, read my answer here for a proper data structure: Firebase query if child of child contains a value
This is one of the cases where the new Cloud Firestore database offers better querying support, since it recently added a feature to efficiently test if an array contains a certain value (video). If you're only just getting started with your project, you might want to check if Firestore is a better fit for your use-cases.
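The usual Realtime Database remodeling is to fan the comma-separated string out into an index node whose keys are the IDs, since Firebase can test for the presence of a key but not for a substring. A sketch of that transformation (the node shape and query path are illustrative assumptions, not the only possible model):

```javascript
// Hypothetical sketch: turn "cat1,cat2,cat3" into an index node
// { cat1: true, cat2: true, cat3: true }, which is queryable.
function toIndexNode(commaSeparatedIds) {
  const node = {};
  for (const id of commaSeparatedIds.split(",")) {
    node[id.trim()] = true; // one key per ID instead of one string
  }
  return node;
}

const node = toIndexNode("cat1, cat2, cat3");
// Stored under e.g. /articles/$articleId/categories, membership can then
// be tested per key, e.g. ref.orderByChild("categories/cat2").equalTo(true)
```

This is the structure the linked answer describes: membership becomes a key lookup rather than a string search.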
I have the following react-apollo-wrapped GraphQL query:
user(id: 1) {
  name
  friends {
    id
    name
  }
}
As semantically represented, it fetches the user with ID 1, returns its name, and returns the id and name of all of its friends.
I then render this in a component structure like the following:
graphql(ParentComponent)
-> UserInfo
-> ListOfFriends (with the list of friends passed in)
This is all working for me. However, I wish to be able to refetch the list of friends for the current user.
I can call this.props.data.refetch() on the parent component and updates will be propagated; however, I'm not sure this is best practice, given that my GraphQL query actually looks more like this:
user(id: 1) {
  name
  foo1
  foo2
  foo3
  foo4
  foo5
  ...
  friends {
    id
    name
  }
}
The only thing I actually wish to refetch is the list of friends.
What is the best way to cleanly architect this? I'm thinking along the lines of binding an initially skipped GraphQL fetcher to the ListOfFriends component, which can be triggered as necessary, but would like some guidance on how this should be best done.
Thanks in advance.
I don't know why your question was downvoted, because I think it is a very valid one to ask. One of GraphQL's selling points is "fetch less and more at once": a client can decide very granularly what it needs from the backend. Deeply nested, graph-like queries that previously required multiple endpoints can now be expressed in a single query, and at the same time over-fetching can be avoided. Everything loads at once and there are no n+1 query waterfalls.

But now you find yourself with a big query, and you know that a few of its fields change from time to time, so you want to actively update the cache with fresh data from the server. Apollo offers the refetch function, but it reloads the whole query, which is exactly the over-fetching GraphQL was sold as eliminating. Let me offer some solutions:
Premature Optimisation?
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming. - Donald Knuth
Sometimes we try to optimise too much without measuring first. Write it the easy way first and then see if it is really an issue. What exactly is slow? The network? A particular field in the query? The sheer size of the query?
After you have analyzed what exactly is slow, we can start looking into improving it:
Refetch and include/skip directives
Using directives you can exclude fields from a query depending on variables. The refetch function can specify different variables than the initial query. This way you can exclude fields when you refetch the query.
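A sketch of that idea, using the field names from the question (the variable name and query shape are made up for illustration):

```javascript
// Hypothetical sketch: guard the heavy fields with @include, then flip the
// variable on refetch so only the friends list travels over the wire.
const USER_QUERY = `
  query User($id: ID!, $withDetails: Boolean!) {
    user(id: $id) {
      name
      foo1 @include(if: $withDetails)
      foo2 @include(if: $withDetails)
      friends {
        id
        name
      }
    }
  }
`;
// Initial load:        variables { id: 1, withDetails: true }
// Friends-only update: refetch({ withDetails: false })
```

Since the cache is normalized, the excluded fields keep their previously fetched values while the friends list is refreshed.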
Splitting up Queries
Single-page apps are a great idea: the HTML is generated client side and the page does not have to make expensive trips to the server to render a new page. But soon SPAs got too big, and code splitting became an issue; now we are basically back to server-side rendering and splitting the app into pages. The same might apply to GraphQL: sometimes queries are too big and should be split. You could split up the queries for UserInfo and ListOfFriends. Inside the cache the fields will be merged. With query batching both queries will be sent in the same request, and a GraphQL server that implements per-request resource caching correctly (e.g. with Dataloader) will barely notice a difference.
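A sketch of what that split could look like for the component structure above (query names are made up for illustration):

```javascript
// Hypothetical sketch: two small queries instead of one big one. The
// normalized cache merges both results under the same user entry, so
// refetching FRIENDS_QUERY touches only the friends list.
const USER_INFO_QUERY = `
  query UserInfo($id: ID!) {
    user(id: $id) {
      id
      name
    }
  }
`;

const FRIENDS_QUERY = `
  query Friends($id: ID!) {
    user(id: $id) {
      id
      friends {
        id
        name
      }
    }
  }
`;
```

Each query would back its own component (UserInfo and ListOfFriends), and only the friends query needs to be refetched.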
Subscriptions
Maybe you are ready to use subscriptions already. Subscriptions send updates from the server for fields that have changed. This way you could subscribe to a user's friends and get updates in real time. The good news is that Apollo Client, Relay and many server implementations already support subscriptions. The bad news is that they need WebSockets, which usually put different requirements on your technology stack than pure HTTP.
withApollo() -> this.client.query
This should only be your last resort! Using react-apollo's withApollo higher-order component, you can directly inject the ApolloClient instance. You can then execute queries using this.client.query(). A query like { user(id: 1) { friends { ... } } } can be used to fetch just the friend list and update the cache, which will trigger an update of your component. This might look like what you want, but it can haunt you in later stages of the app.
I have been using C# to run operations on a DocumentDB instance and really like it so far. I have a lot of C# code that queries from multiple collections to create new collections from the relationships between the first two collections.
Can I essentially move my logic up to the server in stored procedures? I tried answering this question for myself, but all I could find was documentation on how to acquire the collection associated with the stored procedure. So then I thought, could I call a stored procedure that called another stored procedure, passing in the first collection?
Is there any way I can refer to multiple collections in a stored procedure somehow?
Would it be easier to know what belongs to each "collection" if I just stored everything in one collection?
Stored procedures run inside of a single collection (or a single partition in a partitioned collection). A call to a stored procedure can only operate on the data in that collection/partition.
When I see this question asked, I usually wonder if you are thinking of collections as a direct analog to tables from the SQL world, or even to the word "collection" as used in the MongoDB world. In DocumentDB it's best not to separate your data by type, but rather to mix data of different types in the same collection and partition along some other scale-out boundary like tenant, user, geography, etc. If you do that, then as long as your stored procedure doesn't need to cross that tenant, user, or geography boundary, it will be able to provide you with fully ACID cross-document transactions.
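To make the mixed-collection idea concrete, here is a sketch of documents that carry a type discriminator plus a partition key (field names are illustrative assumptions, not a DocumentDB requirement):

```javascript
// Hypothetical sketch: different document types share one collection,
// distinguished by a `type` field and partitioned by `tenant`. A stored
// procedure scoped to one tenant's partition sees all of that tenant's
// types and can therefore work across them transactionally.
const docs = [
  { id: "1", type: "order",    tenant: "acme",  total: 42 },
  { id: "2", type: "customer", tenant: "acme",  name: "Ada" },
  { id: "3", type: "order",    tenant: "other", total: 7 },
];

// Within a single tenant (one partition), cross-"collection" logic is
// just filtering by type:
const acme = docs.filter((d) => d.tenant === "acme");
const acmeOrders = acme.filter((d) => d.type === "order");
// acme has 2 docs of different types; acmeOrders has 1
```

The stored procedure never needs to reach outside its partition, so the single-collection restriction stops being a limitation.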
In 10 Common Misconceptions about CouchDB, Joan Touzet is asked (30:16) if CouchDB will have a way to secure/validate reads on specific documents and/or specific fields of a document.
Joan says that if someone has access to the database, he/she can access all documents in that database.
So she says that there are a few ways to accomplish that:
(30:55) Cloudant was working on field level security access. Have they implemented it yet? Is it open-sourced?
(32:10) You should create separate document in a separate database.
(32:20) Filtered replications. She mentions that it slows 'things' down. She means that the filter slows the replication, correct?
Also, according to the rcouch wiki (https://github.com/rcouch/rcouch/wiki/Validate-documents-on-read), it implements a validate_doc_read function (I haven't tested it, though). Does CouchDB have anything like it?
As far as I can see, the best approach is to model the databases according to my problem (one database for this, another for that; one for this person, another for that person) and do filtered replications when necessary. Any suggestions?