I am currently working on a project where we use a microservice architecture. I am somewhat new to this architecture and have a few concerns. I understand the concept of microservices in general, and also how we can have one database per service. This brings me to a point where I get confused about how to pull data from different databases for a particular user.
Scenario
Assuming I have a Users and a Posts service with their schema like this
User
const schema = {
  name: String,
  id: String,
  ...
}
Post
const schema = {
  text: String,
  user: Id // reference to the user who made this post
}
Now, on the UI, I want to load a set of posts along with the users who made them. How do I get a Post alongside the User who made it? I am using MongoDB; how do I populate data that is stored in other databases? I am also using Kafka to handle async operations; how do I leverage Kafka for this use case? Or is there a much better way of doing this? The final response for a Post could look something like this:
{
  text: 'Some random message',
  user: {
    name: 'John Doe',
    id: 1234
  }
}
Also, I know I could make a call to the User service to get the User, then make a call to the Post service to get the Post, and merge both objects together, but is there a better option than this? I am thinking of cases where I want to do multiple lookups for a user, e.g. to get a User and their associated Posts, Messages, etc. How can I handle scenarios like this? Are there any techniques I could leverage for situations like this?
Thank you in advance!
I think your issue is that your service boundaries are too granular. I would recommend aligning your services to bounded contexts (https://martinfowler.com/bliki/BoundedContext.html). For example, if you have a "blog" service with posts and users, it's quite alright for the blog service to contain both a Mongo and a relational database for the different models.
Then you ask the service "give me posts for a user" and it is responsible for combining that data as part of its logic.
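For illustration, a minimal sketch of that combining logic, assuming both models live in a single Mongoose-backed blog service (the model and field names follow the question):
const mongoose = require('mongoose');

// Both models belong to the one "blog" service, so a plain populate works
const User = mongoose.model('User', new mongoose.Schema({ name: String }));
const Post = mongoose.model('Post', new mongoose.Schema({
  text: String,
  user: { type: mongoose.Schema.Types.ObjectId, ref: 'User' }
}));

// "Give me posts for a user": the service joins the data itself
async function getPostsWithAuthors() {
  return Post.find().populate('user', 'name').exec();
}
Each returned post then carries an embedded user object, matching the response shape from the question.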
If you MUST keep them separate (which I would not recommend, for the exact problem you are having), then I would keep a lightweight cache of usernames inside the posts service.
Use that cache to populate the usernames into a post when you return one. You can update the cache on a regular basis using events, polling, or batches, or just query the user service on a cache miss.
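Since you already have Kafka, the event route could look roughly like this sketch using kafkajs (the topic name and event shape are my assumptions): the posts service keeps its own small copy of user data and updates it whenever the user service publishes a change.
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'posts-service', brokers: ['kafka:9092'] });
const consumer = kafka.consumer({ groupId: 'posts-user-cache' });

// In-memory cache for brevity; a real service would persist this locally
const userCache = new Map();

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'user-events', fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      // Assumed event shape: { id: '1234', name: 'John Doe' }
      const event = JSON.parse(message.value.toString());
      userCache.set(event.id, { name: event.name });
    },
  });
}

run().catch(console.error);
When returning a post, look the author up in userCache and fall back to a synchronous call to the user service on a cache miss.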
When dealing with distributed systems you cannot rely on consistency and synchronous, stable communication like you can in a monolith.
Related
I have a dilemma about how to avoid potentially redundant data querying.
I am using MongoDB with Apollo server and client. My MongoDB has several collections of data. The main collection consists of IDs pointing to supporting collections.
I am not sure how to map the IDs in my main collection to the IDs in the supporting collections in order to retrieve the actual values. The thing is that I mostly already have the supporting collections' data cached in the Apollo client cache.
Do you think I should query only the IDs in my main collection and map them to values on the frontend using the cached data? Or should I have a resolver that takes the IDs in the main collection, queries the supporting collections to get the value for each ID, and then sends the prepared data to the frontend?
I appreciate any insight! Thank you.
As always, it depends. I assume that this is your setup, with a main collection.
type OtherDoc {
  id: String
  field: String
}

type MainDoc {
  id: String
  otherDocs(param: String): [OtherDoc]
}

type Query {
  mainDocs: [MainDoc]
}
In such a case, querying for mainDocs { id otherDocs(param: "...") { id field } } is definitely a natural way to get this data. It might be redundant in the sense that you fetch the same OtherDoc several times when different param values resolve to the same docs. If so, you may think about querying only their IDs and then querying separately for the docs the client doesn't have yet.
I'd say that's a valid solution, but definitely not something you should start with. This optimization will reduce bandwidth, but it will increase the number of requests. What's more, you don't know when to actually refetch an OtherDoc. Well, maybe you do, but you have to think it through and build it, whereas without it you get that behavior out of the box.
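If you do go the ID-only route, here is a minimal sketch of the client-side mapping, assuming Apollo's InMemoryCache with its default cache keys (__typename plus id):
import gql from 'graphql-tag';

// Look up a supporting doc in the normalized cache; returns null on a cache miss
function readOtherDoc(client, id) {
  return client.readFragment({
    id: `OtherDoc:${id}`, // default cache key format: __typename:id
    fragment: gql`
      fragment OtherDocFields on OtherDoc {
        id
        field
      }
    `,
  });
}
Any IDs that come back null are the ones you still have to fetch from the server.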
A different, more cache-friendly approach may be to change the schema to limit the situations where your data overlap. This is not always possible due to the business logic, but it is worth considering where it is.
Context
Hi! I made something like GraphQL, but with just Sequelize. I mean, Sequelize query options are JSON objects, so the client could send the options directly (with correct sanitization).
What I have done
Just out of curiosity, I built it, and it works just fine. Now my doubt is: how bad is this?
This is an example of the client using this API:
const res = await http.post(APIs.FINDER, {
  model: 'User',
  options: {
    where: {
      id: someId
    },
    attributes: ['name', 'active']
  },
  include: [
    {
      as: 'zone',
      attributes: ['name']
    }
  ],
  order: [['createdAt', 'DESC']]
});
Nice, right?
Sanitization/Constraints
About sanitization, I have to do the following (a rough sketch follows this list):
check that the includes have a known limit, e.g. no more than 10 nested includes
check that the params are not SQL strings or other injection attempts (Sequelize takes care of that)
disallow Sequelize functions and allow only simple queries
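Here is a rough sketch of those checks (the names and limits are illustrative, not taken from my actual code):
const MAX_INCLUDE_DEPTH = 10;

// Measure how deeply the client nested its include clauses
function includeDepth(includes) {
  if (!Array.isArray(includes) || includes.length === 0) return 0;
  return 1 + Math.max(...includes.map((i) => includeDepth(i.include)));
}

// Reject values that smuggle in Sequelize helpers (fn/col/literal instances)
function isPlainValue(value) {
  if (value === null || typeof value !== 'object') return true;
  if (Array.isArray(value)) return value.every(isPlainValue);
  return Object.getPrototypeOf(value) === Object.prototype
    && Object.values(value).every(isPlainValue);
}

function assertSafeOptions(options) {
  if (includeDepth(options.include) > MAX_INCLUDE_DEPTH) {
    throw new Error('too many nested includes');
  }
  if (options.where !== undefined && !isPlainValue(options.where)) {
    throw new Error('only plain values are allowed in where clauses');
  }
}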
Questions
With that in mind, I think this could be used in production.
Have I missed something that would rule this idea out for production use? (security/usage/etc.)
Does GraphQL have some specific feature that would make me prefer it over this custom solution?
Would you use it in a production environment? I can't imagine why not.
My thoughts on the questions:
I don't recommend this style of API. It exposes your backend implementation to the public, which makes it difficult to handle every security condition, not to mention the business logic and authorization. Also, it would be hard to improve your performance, because the behavior is tightly coupled to the sequelize package.
You can consider this post: GraphQL Mutation Design: Anemic Mutations. A good GraphQL API should be driven by domain and requirements instead of by data.
NO! I've had a hard time dealing with this API style.
Actually, this is not a good idea. If you are running a one-man full-stack project, it may seem fast at first, but the cost of development will skyrocket until you cannot move on. If you are working in a group, you will notice that the client side is tightly coupled to the server side, which is very bad for development.
On the client side, you only need a finite set of APIs, not APIs with infinite possibilities.
On the server side, you can do nothing but hand the data over to Sequelize, which makes it hard to improve your performance, to add a logic layer, or to introduce another database system such as Elasticsearch into your codebase.
When it comes to designing an API, you can consider Domain-Driven Design, also known as DDD. It's preferable to use a GET Friends?limit=10 API rather than GET { type: 'User', where: ..., include: ..., order: ..., limit: 10 }.
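To make the contrast concrete, here is a hypothetical domain-driven endpoint (the route, model, and field names are mine, purely for illustration):
const express = require('express');
const app = express();
// Friend is assumed to be a Sequelize model defined elsewhere

// A narrow, requirement-driven endpoint: the client can only ask for friends,
// and the server decides which attributes and limits are allowed
app.get('/users/:id/friends', async (req, res) => {
  const limit = Math.min(Number(req.query.limit) || 10, 50); // hard cap
  const friends = await Friend.findAll({
    where: { userId: req.params.id },
    attributes: ['id', 'name'],
    limit,
  });
  res.json(friends);
});
Because the endpoint shape follows the business need, authorization, caching, and performance work can all happen behind it without ever touching the client.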
By the way, GraphQL is not just a query language; in essence, it is a thin API layer (ref). So don't use it as a JSON database; treat it as an API that focuses on the business need.
For example, here is a User model:
{
  "User": {
    "id": 1,
    "name": "Mary",
    "age": 20,
    "friendIds": [2, 3, 4]
  }
}
But in GraphQL, based on what you need, it may become:
type User {
  id: ID
  name: String
  friends: [User]
  friendCount: Int
  friendsOverAge18: [User]
}
You can read this great article about how to design a GraphQL API: Shopify/graphql-design-tutorial
I have the following react-apollo-wrapped GraphQL query:
user(id: 1) {
  name
  friends {
    id
    name
  }
}
As semantically represented, it fetches the user with ID 1, returns its name, and returns the id and name of all of its friends.
I then render this in a component structure like the following:
graphql(ParentComponent)
-> UserInfo
-> ListOfFriends (with the list of friends passed in)
This is all working for me. However, I wish to be able to refetch the list of friends for the current user.
I can call this.props.data.refetch() on the parent component and the updates will be propagated; however, I'm not sure this is best practice, given that my GraphQL query actually looks something more like this:
user(id: 1) {
  name
  foo1
  foo2
  foo3
  foo4
  foo5
  ...
  friends {
    id
    name
  }
}
Whilst the only thing I wish to refetch is the list of friends.
What is the best way to cleanly architect this? I'm thinking along the lines of binding an initially skipped GraphQL fetcher to the ListOfFriends component, which can be triggered as necessary, but would like some guidance on how this should be best done.
Thanks in advance.
I don't know why your question was downvoted, because I think it is a very valid one to ask. One of GraphQL's selling points is "fetch less and more at once". A client can decide very granularly what it needs from the backend. Deeply nested, graph-like queries that previously required multiple endpoints can now be expressed in a single query, and at the same time over-fetching can be avoided. So now you find yourself with a big query, everything loads at once, and there are no n+1 query waterfalls. But you also know that a few fields in your big query change every now and then, and you want to actively update the cache with fresh data from the server. Apollo offers the refetch function, but it reloads the whole query, which is exactly the over-fetching that GraphQL supposedly freed us from. Let me offer some solutions:
Premature Optimisation?
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming. - Donald Knuth
Sometimes we try to optimise too much without measuring first. Write it the easy way first and then see if it is really an issue. What exactly is slow? The network? A particular field in the query? The sheer size of the query?
After you have analyzed what exactly is slow, we can start looking into improving it:
Refetch and include/skip directives
Using directives, you can exclude fields from a query depending on variables. The refetch function can specify different variables than the initial query, so you can exclude fields when you refetch.
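For example, a sketch that reuses the fooN placeholder fields from the question (assuming react-apollo, where refetch merges new variables into the original ones):
import gql from 'graphql-tag';

const USER_QUERY = gql`
  query User($id: ID!, $withDetails: Boolean!) {
    user(id: $id) {
      name
      foo1 @include(if: $withDetails)
      foo2 @include(if: $withDetails)
      friends {
        id
        name
      }
    }
  }
`;

// Initial load: variables { id: 1, withDetails: true } fetch everything.
// Later, this.props.data.refetch({ withDetails: false }) reloads only
// name and friends, skipping the heavy fields.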
Splitting up Queries
Single-page apps are a great idea: the HTML is generated client side, and the page does not have to make expensive trips to the server to render a new page. But soon SPAs got too big and code splitting became an issue, and now we are basically back to server-side rendering and splitting the app into pages. The same might apply to GraphQL: sometimes queries are too big and should be split. You could split up the queries for UserInfo and ListOfFriends; inside the cache, the fields will be merged. With query batching, both queries will be sent in the same request, and a GraphQL server that implements per-request resource caching correctly (e.g. with DataLoader) will barely notice a difference.
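A sketch of such a split (both queries select the user's id so that Apollo can merge them onto the same normalized object in the cache):
import gql from 'graphql-tag';

// Light query for the mostly static part of the page
const USER_INFO_QUERY = gql`
  query UserInfo($id: ID!) {
    user(id: $id) {
      id
      name
      foo1
    }
  }
`;

// Separate query bound to ListOfFriends; refetching it leaves the rest alone
const FRIENDS_QUERY = gql`
  query Friends($id: ID!) {
    user(id: $id) {
      id
      friends {
        id
        name
      }
    }
  }
`;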
Subscriptions
Maybe you are ready to use subscriptions already. Subscriptions send updates from the server for fields that have changed. This way, you could subscribe to a user's friends and get updates in real time. The good news is that Apollo Client, Relay, and many server implementations already offer support for subscriptions. The bad news is that they require WebSockets, which usually put different requirements on your technology stack than pure HTTP.
withApollo() -> this.client.query
This should only be your last resort! Using react-apollo's withApollo higher-order component, you can directly inject the ApolloClient instance. You can then execute queries using this.client.query(). A query such as { user(id: 1) { friends { ... } } } can be used to fetch just the friend list and update the cache, which will in turn update your component. This might look like what you want, but it can haunt you in later stages of the app.
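A sketch of that last resort (the query reuses the friends field from the question):
import gql from 'graphql-tag';

// client is the ApolloClient instance injected by withApollo (this.props.client)
function refetchFriends(client, userId) {
  return client.query({
    query: gql`
      query Friends($id: ID!) {
        user(id: $id) {
          id
          friends {
            id
            name
          }
        }
      }
    `,
    variables: { id: userId },
    fetchPolicy: 'network-only', // hit the server, then write the result into the cache
  });
}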
I contacted Recurly and they don't want to offer any support on this.
I need to know the objects' specific format; otherwise Recurly will bounce the request if any extra data is sent, or if you don't have the exact arguments for the query (which are different for each function).
Ruby:
subscription = Recurly::Subscription.find('uuid')
subscription.update_attributes(
  :plan_code => 'silver',...
PHP, Python, XML = (also in the docs)
NodeJS = ?????????? (we have no clue)
Much thanks!!!
PS. This is the response I got from Recurly,
Hi Turk, Thank you for your note. Our understanding of the question is whether there is a JavaScript equivalent for updating billing information. The only supported options we can offer are as follows:
- using the Recurly.js V3 form (https://docs.recurly.com/js/), to create new accounts, add billing info, and create a subscription
- update billing info through the API's. (https://dev.recurly.com/docs/lookup-an-accounts-billing-info)
I hope this is helpful. Thank you again. Regards,
Ian Recurly Support
PPS. I also tried to contact their technicians via IRC - irc://chat.freenode.net:+6697/recurly - but no luck there either.
Recurly.js is Recurly's name for a front-end web technology that enables the merchant to easily build their subscription checkout and accept payments. Using this library, merchants can tokenize payment information, display price/tax previews, and validate customer inputs. Because it is client-side, it can NOT be used to create or modify accounts, subscriptions, transactions, etc.
In order to perform CRUD operations in Recurly, you must use its RESTful Web Services API, either directly or through one of the provided wrappers (in PHP, Ruby, Python, and C#/.NET). Recurly itself does not provide a supported API wrapper for NodeJS, although a third-party one exists and has been used successfully. This library contains functions to update an existing account and update an existing subscription.
To update account '1234', you would use:
recurly.accounts.update('1234', {email: 'fake@test.com'}, function(res){
  console.log(res['data']);
})
The second argument to this function is a JSON representation of the values that are changing. Available parameters are listed here.
To update subscription '32d7e59b0def37ddfabbd54d1296145b', you would use:
recurly.subscriptions.update('32d7e59b0def37ddfabbd54d1296145b', { quantity: 5 }, function(res){
  console.log(res['data']);
})
The second argument to this function is a JSON representation of the values that are changing. Available parameters are listed here.
-Charlie (Sales Engineer @ Recurly)
ExtJS 4.1.0
Update 6/6/13:
I have posted this same question on the Sencha forums, where there hasn't been much action. The post is more or less the same, but I figured I would add it here just for reference. I am still eager to hear other community members' input on what must be a very common scenario in an ExtJS application!
http://www.sencha.com/forum/showthread.php?265358-Complex-Model-Save-Decoupling-Data-and-Updating-Related-Stores
Update 7/16/13 (Conclusion?)
The Sencha post garnered very little discussion. I have decided to put the majority of the load of complex save operations on my application server and lazily refresh client stores where need be. This way I can use my own database wrapper to encompass all of the transactions associated with one complex domain object save, to guarantee atomicity. If saving a new Order consists of saving the order metadata, ten new instances of OrderContents, and potentially other information (addresses residing in other tables, a new customer defined at the time of order creation, etc.), I would much rather send the payload to the application server than establish a vulgar web of callbacks in client-side application code. Data which is associated on a one-to-one basis (such as an Order hasOne Address) is updated in the success callback of the Order.save() operation. More complex data, such as the Order's contents, is lazily handled by simply calling contentStore.sync(). I feel that this is the way to guarantee atomicity without an overwhelming number of client callbacks.
Original Post Content
Given the overall disappointing functionality of saving association-heavy models, I have all but ditched model associations in my application and rely on retrieving associated data myself. This is all well and good, but it unfortunately does not resolve the issue of actually saving the data and updating the ExtJS stores to reflect the changes on the server.
Take for example saving an Order object, which is composed of metadata as well as OrderContents i.e., the parts on the order. The metadata ends up in an Order_Data table in the database, whereas the contents all end up in an Order_Contents table where each row is linked to the parent order via an order_id column.
On the client, retrieving the contents for an order is quite easy to do without any need for associations: var contents = this.getContentsStore().query('order_id', 10).getRange(). However, a major flaw is that this hinges on the content records already being available in the OrderContents ExtJS store, which would apply just as much if I were using associations that are NOT returned by the data server with the "main" object.
When saving an order, I send a single request which holds the order's metadata (e.g., date, order number, supplier information, etc.) as well as an array of contents. These pieces of data are picked apart and saved to their appropriate tables. This makes enough sense to me and works well.
All is well until it comes to returning saved/updated records from the application server. Since the request is fired off by calling OrderObject.save(), there is nothing telling the OrderContents store that new records are available. This would be handled automatically if I instead added the records to the store and called .sync(), but I feel that complicates the saving process, and I would much rather handle this decoupling on the application server; not to mention, saving an entire request is quite nice as well.
Is there a better way to solve this? My current solution is as follows...
var orderContentsStore = this.getOrderContentsStore();

MyOrderObject.save({
    success: function(rec, op){
        // New Content Records need to be added to the contents store!
        orderContentsStore.add(rec.get('contents')); // Array of OrderContent Records
        orderContentsStore.commitChanges(); // This is very important
    }
});
By calling commitChanges(), the records added to the store are considered clean (non-phantom, non-dirty) and thus are no longer returned by the store's getModifiedRecords() method; rightly so, as the records should not be passed to the application server in the event of a store.sync().
This approach just seems kinda sloppy/hacky to me but I haven't figured out a better solution...
Any input / thoughts are greatly appreciated!
Update 8/26/13: I found that associated data is indeed handled by Ext in the create/update callback on the model's proxy, but finding that data wasn't easy... See my post here: ExtJS 4.1 - Returning Associated Data in Model.Save() Response
Well, it's been a couple months of having this question open and I feel like there is no magically awesome solution to this problem.
My solution is as follows...
When saving a complex model (e.g. one that has, or would have, a few hasMany associations), I save the 'parent' model, which includes all associated data (as a property/field on the model!), and then add the (saved) associated data to the stores in the afterSave/afterUpdate callback.
Take for example my PurchaseOrder model which hasMany Items and hasOne Address. Take note that the associated data is included in the model's properties, as it will not be passed to the server if it solely exists in the model's association store.
console.log(PurchaseOrder.getData());
---
id: 0
order_num: "PO12345"
order_total: 100.95
customer_id: 1
order_address: Object
    id: 0
    ship_address_1: "123 Awesome Street"
    ship_address_2: "Suite B"
    ship_city: "Gnarlyville"
    ship_state: "Vermont"
    ship_zip: "05401"
    ...etc...
contents: Array[2]
    0: Object
        id: 0
        sku: "BR10831"
        name: "Super Cool Shiny Thing"
        quantity: 5
        sold_price: 84.23
    1: Object
        id: 0
        sku: "BR10311"
        name: "Moderately Fun Paddle Ball"
        quantity: 1
        sold_price: 1.39
I have Models established for PurchaseOrder.Content and PurchaseOrder.Address, yet the data in the PurchaseOrder is not an instance of these models, but rather just the plain data. Again, this is to ensure that it is passed correctly to the application server.
Once I have an object as described above, I send it off to my application server via .save() as follows:
PurchaseOrder.save({
    scope: me,
    success: me.afterOrderSave,
    failure: function(rec, op){
        console.error('Error saving Purchase Order', op);
    }
});
afterOrderSave: function(record, operation){
    var me = this;
    switch(operation.action){
        case 'create':
            /**
             * Add the records to the appropriate stores.
             * Since these records (from the server) have an id,
             * they will not be marked as dirty nor as phantoms
             */
            var savedRecord = operation.getResultSet().records[0]; // has associated!
            me.getOrderStore().add(savedRecord);
            me.getOrderContentStore().add(savedRecord.getContents()); // association!
            me.getOrderAddressStore().add(savedRecord.getAddress()); // association!
            break;
        case 'update':
            // Locate and update records with response from server
            break;
    }
}
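For the 'update' case left as a stub above, here is a rough sketch of what locating and updating records could look like using Ext 4.1 store/record methods (getById, set, commit); this branch is a sketch, not code from my app:
// Sketch for the 'update' case of afterOrderSave
case 'update':
    var savedRecord = operation.getResultSet().records[0];
    var existing = me.getOrderStore().getById(savedRecord.getId());
    if (existing) {
        existing.set(savedRecord.getData()); // overwrite fields with server values
        existing.commit();                   // mark clean so sync() won't re-send it
    }
    break;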
My application server receives the PurchaseOrder and handles saving the data accordingly. I will not go into the gory details, as this process is largely dependent on your own implementation. My application framework is loosely based on Zend 1.11 (primarily leveraging Zend_Db).
I feel this is the best approach, for the following reasons:
No messy string of various model.save() callbacks on the client
Only one request, which is very easy to manage
Atomicity is easily handled on the application server
Less round trips = less potential points of failure to worry about
If you're really feeling lazy, the success method of the callback can simply reload stores.
I will let this answer sit for a bit to encourage discussion.
Thanks for reading!