We want to build a Node.js API that stores schema-less documents in a MongoDB collection. Every document should have a key "no" which orders them in a sequence:
[
{ "no": 1, ... },
{ "no": 2, ... },
{ "no": 3, ... },
{ "no": 4, ... },
...
]
We have the following constraints:
The sequence number, along with other parameters, needs to be cryptographically signed. Therefore, the server cannot assign a sequence number that the client does not know before it signs and sends the data.
no must be unique (Not allowed: 1 -> 1 -> 2 -> 3)
There must not be any gaps in the sequencing (Not allowed: 1 -> 2 -> 4 -> 5)
The API is replicated, so there will be a lot of concurrent requests against MongoDB.
The API client is not a browser application; it is a Node.js application as well. There will be only one API client.
Our starting point is to have an API that on every storage request returns the next sequence number.
POST /collection { "no": 1, ...}
returns {"next": 2}
Will this work?
On the client side, it could be something like this pseudo code:
let next
module.exports.create = (document, cb) => {
if (!next) next = 1 // here it is probably better to sync the initial no with the db instead always starting with 1
document.no = next
return post('/collection', document, (err, res) => {
if (err) ...
next = res.next
return cb(...)
})
}
If create on the client side is called by many concurrent callers, can there be a case where two or more create requests have duplicate no's?
When API calls are concurrent and the client alone determines the sequence, it is pretty much impossible to achieve what you are trying to do.
However, if the sequence number must be gapless and sequential, why must the client be the one to provide it? You could agree on a sequencing pattern (e.g. 1, 2, 3, 4 or AB1, AB2, AB3, etc.) and let the server assign the sequence number in the order in which requests arrive. Create a collection that generates a running number using findAndModify, and let the server write the sequence number to the database instead.
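To make the risk in the question's client concrete, here is a minimal sketch of the race. The `post` function below is a hypothetical stand-in for the real HTTP call (it just queues the response so it can be delivered later); everything else mirrors the pseudo code from the question.

```javascript
// Hypothetical stand-in for the HTTP call: it queues the response so that it
// arrives later, the way a real network round trip would.
const pending = [];
function post(path, document, cb) {
  pending.push(() => cb(null, { next: document.no + 1 }));
}

let next;
function create(document, cb) {
  if (!next) next = 1;
  document.no = next; // "next" is read immediately...
  post('/collection', document, (err, res) => {
    if (err) return cb(err);
    next = res.next; // ...but only updated once the response comes back
    cb(null, document);
  });
}

// Two callers invoke create() before either response has arrived.
const sent = [];
create({ payload: 'a' }, (err, doc) => sent.push(doc.no));
create({ payload: 'b' }, (err, doc) => sent.push(doc.no));

// Now the responses arrive.
pending.forEach((deliver) => deliver());

console.log(sent); // both documents carried the same "no"
```

Because `next` is only advanced in the callback, every call made while a request is still in flight reuses the old value. Queueing the calls on the client, or letting the server assign the number as suggested above, avoids this.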
Context:
My Express.js web server is currently serving an API which wraps a SOAP service (some legacy service which I can't change). The SOAP service takes a dynamic number of items to process and takes about 1.5 seconds to process each request. The Nginx server has a timeout of 60 seconds.
Problem:
For a request to this API that takes, say, more than 60 seconds to complete, I am observing that the service gets re-triggered automatically (by Express.js, I assume). So if the original request was expected to insert, say, 50 records into a table, the re-triggering leaves me with 100 records inserted (duplication).
Here is a skeleton/sample of log that kind of shows the issue: (sensitive info stripped)
January 10, 2022 15:35:44 [... ee905] - Starting myAwesomeAPI() <-- Original API trigger
January 10, 2022 15:36:44 [... ff870] - Starting myAwesomeAPI() <-- Re-trigger happens
January 10, 2022 15:36:54 [... ee905] - Completed myAwesomeAPI() <-- Original API ends (inserts 50 records in the table)
January 10, 2022 15:37:54 [... ff870] - Completed myAwesomeAPI() <-- Re-triggered API ends (inserting another 50 records in the table resulting in duplication)
What I have tried:
To reproduce the issue and check whether the re-triggering is independent of Nginx: with the Nginx timeout set to 60 seconds, I changed my Express server's timeout to 10 seconds and sent 15 items to process (to force a timeout before processing could complete) using this:
const express = require("express")
const server = express()
server.setTimeout(10000) // sets all requests to have a 10 seconds timeout
// myAwesomeAPI code
Testing showed that after 10 seconds, the timeout "did" re-trigger the API and the 15 items were duplicated (I saw 30 records inserted). So this tells me that the API is getting re-triggered by Express.js.
Question(s):
How do I stop the re-trigger from happening? Is there an Express server configuration to enable/disable automatic re-triggering on timeout?
Solutions & Ideas:
Since the max items = 100 (set by team), increasing the Nginx and Express.js timeout to 300 seconds should be a quick but dirty fix. I understand that tying async API calls to some approximation of time is pure foolishness (tell me about trying to explain this to other engineers in my team ;-p), so I would like to avoid this approach.
Create a composite key from some combination of columns and enforce it as a constraint on the table. Combine this with a check for whether the composite key is already present in the table to decide whether to skip or insert. This approach seems a bit better.
Another approach can be to respond back to the API call immediately on receipt (which will close the request) and then continue with the request processing. Something like this (inspiration): https://www.bennadel.com/blog/3275-you-can-continue-to-process-an-express-js-request-after-the-client-response-has-been-sent.htm.
This will make me independent of platform's timeout settings but will take away the real-time nature of the response being delivered with statuses for different items and add a bit more complexity of tracking the request statuses via other lookups etc.
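A rough sketch of the composite-key idea (the second option above). The column names `batchId` and `itemId` are hypothetical, and the in-memory `Map` is only a stand-in for a table with a unique constraint on those columns:

```javascript
// In-memory stand-in for a table with a unique constraint on (batchId, itemId).
const table = new Map();

// Build the composite key from the columns that identify a record.
function compositeKey(record) {
  return `${record.batchId}:${record.itemId}`;
}

// Insert the record only if its composite key is not present yet.
// Returns true when inserted, false when skipped as a duplicate.
function insertOnce(record) {
  const key = compositeKey(record);
  if (table.has(key)) return false;
  table.set(key, record);
  return true;
}

const batch = [
  { batchId: 'b1', itemId: 1 },
  { batchId: 'b1', itemId: 2 },
];

const firstRun = batch.map((record) => insertOnce(record));   // original request
const retriedRun = batch.map((record) => insertOnce(record)); // re-triggered request

console.log(firstRun, retriedRun, table.size);
```

With this in place a re-triggered request becomes harmless: the second pass finds every key already present and inserts nothing.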
If you have the ability to alter the front end, you can add a transaction ID to each request. Store the in-flight transaction in an object keyed by the transaction ID; then, if an API request arrives for an ongoing transaction, you can hand back the ongoing transaction instead of starting a new one.
Something like this:
const transactions = {};
router.get('/myapi', async (req, res, next) => {
  try {
    // Query-string values live on req.query; req.params only holds route parameters.
    const { transactionID, ...soapArgs } = req.query;
    let transaction = transactions[transactionID];
    if (!transaction) {
      transaction = (async () => {
        const ret = await SOAPCall(soapArgs);
        // hold onto the transaction for some period of time
        const to = setTimeout(() => {
          delete transactions[transactionID];
        }, 5000);
        to.unref(); // don't hold up process exit
        return ret;
      })();
      transactions[transactionID] = transaction;
    }
    // A retried request awaits the same promise instead of calling SOAP again.
    const ret = await transaction;
    res.json(ret);
  }
  catch (err) { next(err); }
});
I'm developing a web app using a Node.js/express backend and MongoDB as a database.
The below example is for an admin dashboard page where I will display cards with different information relating to the users on the site. I might want to show - on the sample page - for example:
The number of each type of user
The most common location for each user type
How many signups there are by month
Most popular job titles
I could do this all in one route, where I have a controller that performs all of these tasks, and bundles them as an object to a url that I can then pull data from using ajax. Or, I could split each task into its own route/controller, with a separate ajax call to each. What I'm trying to decide is what are the best practices around making multiple ajax calls on a single page.
Example:
I am building up a page where I will make an interactive table using DataTables for different types of user (I currently have two: mentors and mentees). This example requires just two data requests (one for each user type), but my final page will be more like 10.
For each user type, I make an ajax GET call and build the table from the returned data:
User type 1 - Mentees
$.get('/admin/' + id + '/mentees')
.done(data => {
$('#menteeTable').DataTable( {
data: data,
"columns": [
{ "data": "username"},
{ "data": "status"}
]
});
})
User type 2 - Mentors
$.get('/admin/' + id + '/mentors')
.done(data => {
$('#mentorTable').DataTable( {
data: data,
"columns": [
{ "data": "username"},
{ "data": "position"}
]
});
})
This then requires two routes in my Node.js backend:
router.get("/admin/:id/mentors", getMentors);
router.get("/admin/:id/mentees", getMentees);
And two controllers that are structured identically (but filter for different user types):
getMentees(req, res, next){
console.log("Controller: getMentees");
let query = { accountType: 'mentee', isAdmin: false };
Profile.find(query)
.lean()
.then(users => {
return res.json(users);
})
.catch(err => {
console.log(err)
})
}
This works great. However, as I need to make multiple data requests I want to make sure that I'm building this the right way. I can see several options:
Make individual ajax calls for each data type, and do any heavy lifting on the backend (e.g. tally user types and return) - as above
Make individual ajax calls for each data type, but do the heavy lifting on the frontend. In the above example I could have just as easily filtered out isAdmin users on the data returned from my ajax call
Make fewer ajax calls that request less refined data. In the above example I could have made one call (requiring only one route/controller) for all users, and then filtered data on the frontend to build two tables
I would love some advice on which strategy is most efficient in terms of time spent sourcing data
UPDATE
To clarify the question, I could have achieved the same result as above using a controller setup something like this:
Profile.find(query)
.lean()
.then(users => {
let mentors = [],
mentees = []
users.forEach(user => {
if(user.accountType === 'mentee') {
mentees.push(user);
} else if (user.accountType === 'mentor') {
mentors.push(user);
}
});
return res.json({mentees, mentors});
})
And then make one ajax call, and split the data accordingly. My question is: which is the preferred option?
TL;DR: Option 1
IMO I wouldn't serve unprocessed data to the front end. Things can go wrong, you can reveal too much, and it could take a lot for the unspecified client machine to process (it could be a low-power device with limited bandwidth and battery, for example). You want a smooth user experience, and JavaScript on the client churning information out of a mass of data would detract from that. I use the back end for the processing (preparing the information how you need it); JS for retrieving the information (AJAX), placing it on the page, and things like switching element states; and CSS, as much as possible before resorting to JS, for anything moving around (animations, transitions, etc.).
Also for the routes, my approach would be each distinct package of information (dataTable) has a route, so you're not overloading a method with too many purposes, keep it simple and maintainable. You can always abstract away anything that's identical and repeated often.
So to answer your question, I'd go with Option 1.
You could also offer a single 'page-load' endpoint, then if anything changes update the individual tables later using their distinct endpoints. This initial 'page-load' call could collate the information from the endpoints on the backend and serve as one package of data to populate all tables initially. One initial request with one lot of well-defined data, then the ability to update an individual table if the user requests it (or there is a push if you get into that).
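As an illustration of preparing the information on the back end before serving it, two of the dashboard tallies from the question could be computed along these lines. This is only a sketch: the `createdAt` field (an ISO date string) is an assumption, and in a real app the counting would more likely be pushed down into the database query.

```javascript
// Tally users per account type and signups per month ("YYYY-MM").
// Assumes each user has an accountType and a hypothetical ISO createdAt string.
function summarizeUsers(users) {
  const byType = {};
  const signupsByMonth = {};
  for (const user of users) {
    byType[user.accountType] = (byType[user.accountType] || 0) + 1;
    const month = user.createdAt.slice(0, 7);
    signupsByMonth[month] = (signupsByMonth[month] || 0) + 1;
  }
  return { byType, signupsByMonth };
}

const summary = summarizeUsers([
  { accountType: 'mentee', createdAt: '2022-01-10' },
  { accountType: 'mentee', createdAt: '2022-02-03' },
  { accountType: 'mentor', createdAt: '2022-02-14' },
]);

console.log(summary);
```

The client then receives a few small numbers per card rather than the full user list.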
It is a really good question. First of all, you should work out how your application will handle the data it receives. If it is a large amount of data that is not changed on the frontend, but several views each need the whole data set, it can be cached on the frontend (like user settings data: the application always reads it but rarely changes it), and then you could go with your second option. In the other case, where the frontend works with only a small part of a huge amount of database data (like log data for a specific user), it is preferable to preprocess (filter) on the server side, i.e. your first and third options. For me, the second option is really only preferable for caching unchanging data on the frontend.
After the clarification in the question: you could use grouping with the lodash library for your request:
Profile.find(query)
  .lean()
  .then(users => {
    // group the users by account type, then turn each group into an object
    const result = _(users)
      .groupBy((elem) => elem.accountType)
      .map((vals, key) => ({accountType: key, users: vals}))
      .value();
    return res.json(result);
  });
Of course, you can map the data however you are comfortable. This approach returns all account types (not only 'mentee' and 'mentor').
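If pulling in lodash is not desirable, the same grouping can be sketched in plain JavaScript. The function name `groupByAccountType` is made up for this example; the output shape matches the lodash version above.

```javascript
// Group users by accountType and return [{ accountType, users }, ...],
// keeping the order in which each account type was first seen.
function groupByAccountType(users) {
  const groups = new Map();
  for (const user of users) {
    if (!groups.has(user.accountType)) groups.set(user.accountType, []);
    groups.get(user.accountType).push(user);
  }
  return [...groups].map(([accountType, members]) => ({ accountType, users: members }));
}

const grouped = groupByAccountType([
  { username: 'ann', accountType: 'mentee' },
  { username: 'ben', accountType: 'mentor' },
  { username: 'cat', accountType: 'mentee' },
]);

console.log(JSON.stringify(grouped));
```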
Usually there are 3 things in such architectures:
1. Client
2. API Gateway
3. Micro services (Servers)
In your case :
1. Client is JS application code
2. API Gateway + Server is Nodejs/express (Dual responsibility)
Point 1 to be noted
Servers only provide core APIs. So the API for this server should only be a user API like:
/users?type={mentor/mentee/*}&limit=10&pageNo=8
i.e. anyone can ask for all data or filtered data using the type query string.
Point 2 to be noted
Since web pages are composed of multiple data points, and making a call to the same server for every data point increases round trips and makes the UX worse, API gateways exist. In this case the JS would not communicate with the core server directly; it communicates with the API gateway through APIs like:
/home
The above API internally calls below APIs and aggregates the data in a single json with mentor and mentee list
/users?type={mentor/mentee/*}&limit=10&pageNo=8
This API simply passes the call to core server with query attributes
Now, since in your code the API gateway and core server are merged into a single layer, this is how you should set up your code:
async getHome(req, res, next){
  console.log("Controller: /home");
  let queryMentees = { accountType: 'mentee', isAdmin: false };
  let queryMentors = { accountType: 'mentor', isAdmin: true };
  try {
    // getProfileData returns a promise, so wait for both queries before responding
    const [mentees, mentors] = await Promise.all([
      getProfileData(queryMentees),
      getProfileData(queryMentors)
    ]);
    return res.json({mentees, mentors});
  } catch (err) { next(err); }
}
async getUsers(req, res, next){
  console.log("Controller: /users");
  let query = {accountType: req.query.type, isAdmin: req.query.isAdmin};
  try {
    return res.json(await getProfileData(query));
  } catch (err) { next(err); }
}
And a common ProfileService.js class with a function like:
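getProfileData(query){
  // return the promise so that callers can wait for the result
  return Profile.find(query)
    .lean()
    .then(users => {
      return users;
    })
    .catch(err => {
      console.log(err);
      throw err; // let the controller decide how to respond
    })
}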
More info about API Gateway Pattern here
If you can't estimate how many user types your app will need, use parameters.
If I were writing this application, I wouldn't write multiple functions for the ajax calls, nor multiple routes and controllers.
Client side, like this:
let getQuery = (id,userType)=>{
$.get('/admin/' + id + '/userType/'+userType)
.done(data => {
let dataTable = null;
switch(userType){
case "mentee":
dataTable = $('#menteeTable');
break;
case "mentor":
dataTable = $('#mentorTable');
break;
// ...you can add more selectors for other DataTables, but I wouldn't prefer this approach: you can generate the "columns" property on the server just like "data", which means you can use a single DataTable object on the client side
}
dataTable.DataTable( {
data: data,
"columns": [
{ "data": "username"},
{ "data": "status"}
]
});
})
}
My preference for the client side:
let getQuery = (id, userType) => {
  $.get('/admin/' + id + '/userType/' + userType)
    .done(data => {
      $('#dataTable').DataTable({
        data: data.rows,
        columns: data.columns
      });
    })
}
In this scenario the server response should look like {rows: [{}...], columns: [{}...]} (see the DataTables examples).
Server side, like this.
Just one route:
router.get("/admin/:id/userType/:userType", getQueryFromDB);
Controller
getQueryFromDB(req, res, next){
let query = { accountType: req.params.userType, isAdmin: false };
Profile.find(query)
.lean()
.then(users => {
return res.json(users);
})
.catch(err => {
console.log(err)
})
}
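For the single-DataTable variant, the controller would have to return the rows together with the column definitions. A sketch of how that response could be built; `COLUMNS_BY_TYPE` and `buildTableResponse` are hypothetical names, and the column fields are the ones used in the question's two tables:

```javascript
// Hypothetical per-type column lists, using the fields from the question's tables.
const COLUMNS_BY_TYPE = {
  mentee: [{ data: 'username' }, { data: 'status' }],
  mentor: [{ data: 'username' }, { data: 'position' }],
};

// Build the { rows, columns } payload that the client code reads as
// data.rows and data.columns. Unknown user types are rejected.
function buildTableResponse(userType, users) {
  const columns = COLUMNS_BY_TYPE[userType];
  if (!columns) throw new Error(`Unknown user type: ${userType}`);
  return { rows: users, columns };
}

const response = buildTableResponse('mentor', [
  { username: 'dana', position: 'CTO' },
]);

console.log(JSON.stringify(response));
```

Inside the controller this would replace `res.json(users)` with `res.json(buildTableResponse(req.params.userType, users))`. The lookup also acts as a whitelist, so an arbitrary `userType` in the URL never reaches the query unchecked.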
So the main point, for me, is that mentees, mentors, etc. are parameters, just like "id".
Make sure that your authentication checks which users have access to each userType's data, in both my code samples and yours; otherwise someone can reach your data just by changing the route.
Have a nice weekend
From the perspective of performance and UI smoothness on the user's device:
Sure, it would be better to do 1 ajax request for all core data (which is important to show as soon as possible), and possibly perform more requests for lower-priority data with some tiny delay. Or do 2 requests: one for 'fast' data and another for 'slow' (if this is applicable), because:
On one hand, many ajax requests can slow down the UI. Browsers limit how many ajax requests run at the same time (the limit is browser dependent and could be from 2 to 10), so if, for example, IE has a limit of 2, then 10 ajax requests will form a queue of waiting requests.
On the other hand, if there is a lot of data to show, or some data takes longer to prepare, a single request could mean a long wait for the backend response before anything is shown.
Talking of heavy lifting: it is not good to do such things on the UI side anyway, because:
The user's device may be short on resources and slow.
JavaScript runs on a single thread in the page, so any long loop freezes the UI for as long as the loop takes to run.
Talking of filtering users:
Profile.find(query)
.lean()
.then(users => {
let mentors = [],
mentees = []
users.forEach(user => {
if(user.accountType === 'mentee') {
mentees.push(user);
} else if (user.accountType === 'mentor') {
mentors.push(user);
}
});
return res.json({mentees, mentors});
})
This seems to have one problem: the query will possibly have sorting and limits, and if so the final result will be inconsistent; it could end up with only mentees or only mentors. I think you should do two separate queries to the data storage anyway.
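A small simulation of that inconsistency, using a plain array in place of the database and `slice` in place of a query limit (the user names and the limit of 3 are made up for the example):

```javascript
// Plain array standing in for the sorted query result in the database.
const users = [
  { username: 'alice', accountType: 'mentee' },
  { username: 'bob', accountType: 'mentee' },
  { username: 'carol', accountType: 'mentee' },
  { username: 'dave', accountType: 'mentor' },
  { username: 'erin', accountType: 'mentor' },
];
const LIMIT = 3;

const ofType = (type) => (user) => user.accountType === type;

// One limited query, split afterwards: the limit is used up by the mentees.
const limited = users.slice(0, LIMIT);
const splitAfter = {
  mentees: limited.filter(ofType('mentee')),
  mentors: limited.filter(ofType('mentor')),
};

// Two queries, each with its own limit.
const separate = {
  mentees: users.filter(ofType('mentee')).slice(0, LIMIT),
  mentors: users.filter(ofType('mentor')).slice(0, LIMIT),
};

console.log(splitAfter.mentors.length, separate.mentors.length); // 0 2
```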
From the perspective of project structure, maintainability, flexibility, reusability, and so on, it is of course good to decouple things as much as possible.
So, finally, imagine you made:
1. Many microservices, e.g. one backend microservice per widget, with a layer that aggregates results so that traffic from the UI is optimized into 1-2 ajax queries.
2. Many UI modules, each working with its own data received from a service that makes 1-2 calls to the aggregating backend and distributes the different datasets it received to the frontend modules.
At the back end, just make one dynamic, parameterized API method. You can pass mentor, mentee, admin, etc. as the role. You should have some type of user authentication and authorization to check whether user A can see users in role B or not.
Regarding the UI, it is up to the users whether they want one page with a drop-down filter or URLs they can bookmark,
like multiple URLs (/admin, /mentor, etc.)
or one URL with a query string and a drop-down (/user?role=mentor, /user?role=admin).
You have to build your controllers based on the URLs. I generally prefer a drop-down that fetches the data (by default, all mentors might be the selection).
Using the Random package or any other techniques, is there a way to pick the same item from an array on the Client and the Server (for optimistic UI purposes)?
Example:
Meteor.methods({
myMethod() {
var item = Random.choice([1, 2, 3, 4]); // Should return 2 on both client and server
},
});
I know that Meteor uses a specific seed for every method invocation to generate the same _id for a document from the Client and the Server. So there must be a way to accomplish this!
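To illustrate what "the same seed on both sides" would mean, here is a generic sketch that does not use Meteor's Random package at all. It uses mulberry32, a small, widely used seeded PRNG, and assumes the client and the server can agree on a seed somehow (for instance by passing it as a method argument, which is an assumption of this example).

```javascript
// mulberry32: a small seeded pseudo-random number generator.
// The same seed always produces the same sequence of numbers in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick an item using a generator seeded with the shared seed.
function seededChoice(items, seed) {
  const random = mulberry32(seed);
  return items[Math.floor(random() * items.length)];
}

const seed = 12345; // hypothetical seed shared by client and server
const clientPick = seededChoice([1, 2, 3, 4], seed);
const serverPick = seededChoice([1, 2, 3, 4], seed);

console.log(clientPick === serverPick); // true
```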
I'm facing a problem implementing data-synchronization between a server and multiple clients.
I read about Event Sourcing and I would like to use it to accomplish the syncing-part.
I know that this is less a technical question and more a conceptual one.
I would just send all events live to the server, but the clients are designed to be used offline from time to time.
This is the basic concept:
The server stores all events that every client should know about. It does not replay those events to serve the data, because its main purpose is to sync the events between the clients, enabling them to replay all events locally.
Each client has its own JSON store, which also keeps all events and rebuilds the different collections from the stored/synced events.
As clients can modify data offline, it is not that important to have consistent syncing cycles. With this in mind, the server should handle conflicts when merging the different events and ask the specific user in the case of a conflict.
So, the main problem for me is to determine the diffs between the client and the server, to avoid sending all events to the server. I'm also having trouble with the order of the synchronization process: push changes first, or pull changes first?
What I've currently built is a default MongoDB implementation on the serverside, which is isolating all documents of a specific user group in all my queries (Currently only handling authentication and server-side database work).
On the client, I've built a wrapper around a NeDB store, enabling me to intercept all query operations to create and manage events per-query, while keeping the default query behaviour intact. I've also compensated for the different ID systems of neDB and MongoDB by implementing custom ids that are generated by the clients and are part of the document data, so that recreating a database won't mess up the IDs (When syncing, these IDs should be consistent across all clients).
The event format will look something like this:
{
type: 'create/update/remove',
collection: 'CollectionIdentifier',
target: ?ID, //The global custom ID of the document updated
data: {}, //The inserted/updated data
timestamp: '',
creator: //Some way to identify the author of the change
}
To save some memory on the clients, I will create snapshots at certain amounts of events, so that fully replaying all events will be more efficient.
So, to narrow down the problem: I'm able to replay events on the client side, and I'm able to create and maintain the events on both the client and the server side. Merging the events on the server side should also not be a problem. Replicating a whole database with existing tools is not an option, as I'm only syncing certain parts of the database (not even entire collections, as the documents are assigned to different groups in which they should sync).
But what I am having trouble with is:
The process of determining what events to send from the client when syncing (Avoid sending duplicate events, or even all events)
Determining what events to send back to the client (Avoid sending duplicate events, or even all events)
The right order of syncing the events (Push/Pull changes)
Another question I would like to ask is whether storing the updates directly on the documents, in a revision-like style, is more efficient.
If my question is unclear, a duplicate (I found some questions, but they didn't help me in my scenario), or missing something, please leave a comment. I will maintain it as best I can to keep it simple, as I've just written down everything that could help you understand the concept.
Thanks in advance!
This is a very complex subject, but I'll attempt some form of answer.
My first reflex upon seeing your diagram is to think of how distributed databases replicate data between themselves and recover in the event that one node goes down. This is most often accomplished via gossiping.
Gossip rounds make sure that data stays in sync. Time-stamped revisions are kept on both ends and merged on demand, say when a node reconnects, or simply at a given interval (publishing bulk updates via socket or the like).
Database engines like Cassandra or Scylla use 3 messages per merge round.
Demonstration:
Data in Node A
{ id: 1, timestamp: 10, data: { foo: '84' } }
{ id: 2, timestamp: 12, data: { foo: '23' } }
{ id: 3, timestamp: 12, data: { foo: '22' } }
Data in Node B
{ id: 1, timestamp: 11, data: { foo: '50' } }
{ id: 2, timestamp: 11, data: { foo: '31' } }
{ id: 3, timestamp: 8, data: { foo: '32' } }
Step 1: SYN
Node A lists the ids and last-upsert timestamps of all its documents (feel free to change the structure of these data packets; here I'm using verbose JSON to better illustrate the process).
Node A -> Node B
[ { id: 1, timestamp: 10 }, { id: 2, timestamp: 12 }, { id: 3, timestamp: 12 } ]
Step 2: ACK
Upon receiving this packet, Node B compares the received timestamps with its own. For each document, if Node B's timestamp is older, it places just the id and timestamp in the ACK payload; if Node B's is newer, it places the entry along with its data. If the timestamps are the same, it does nothing, obviously.
Node B -> Node A
[ { id: 1, timestamp: 11, data: { foo: '50' } }, { id: 2, timestamp: 11 }, { id: 3, timestamp: 8 } ]
Step 3: ACK2
Node A updates its document wherever ACK data is provided, then sends its latest data back to Node B for the entries where no ACK data was provided.
Node A -> Node B
[ { id: 2, timestamp: 12, data: { foo: '23' } }, { id: 3, timestamp: 12, data: { foo: '22' } } ]
That way, both nodes now have the latest data, merged both ways (in case the client did offline work), without having to send all your documents.
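The three steps can be sketched in a few small functions, run here against the demonstration data above. The function names are made up for this example, and the sketch assumes both nodes know the same set of ids; documents that exist on only one side would need extra handling.

```javascript
// Each node keeps its documents in a Map keyed by id.
const nodeA = new Map([
  [1, { timestamp: 10, data: { foo: '84' } }],
  [2, { timestamp: 12, data: { foo: '23' } }],
  [3, { timestamp: 12, data: { foo: '22' } }],
]);
const nodeB = new Map([
  [1, { timestamp: 11, data: { foo: '50' } }],
  [2, { timestamp: 11, data: { foo: '31' } }],
  [3, { timestamp: 8, data: { foo: '32' } }],
]);

// Step 1 (SYN): list ids and timestamps only.
function buildSyn(store) {
  return [...store].map(([id, doc]) => ({ id, timestamp: doc.timestamp }));
}

// Step 2 (ACK): include data where we are newer, a bare entry where we are older.
function buildAck(store, syn) {
  const ack = [];
  for (const { id, timestamp } of syn) {
    const own = store.get(id);
    if (own.timestamp > timestamp) ack.push({ id, timestamp: own.timestamp, data: own.data });
    else if (own.timestamp < timestamp) ack.push({ id, timestamp: own.timestamp });
  }
  return ack;
}

// Step 3 (ACK2): apply received data, answer bare entries with our own data.
function buildAck2(store, ack) {
  const ack2 = [];
  for (const entry of ack) {
    if (entry.data !== undefined) {
      store.set(entry.id, { timestamp: entry.timestamp, data: entry.data });
    } else {
      const own = store.get(entry.id);
      ack2.push({ id: entry.id, timestamp: own.timestamp, data: own.data });
    }
  }
  return ack2;
}

// The receiver of ACK2 simply stores what it was sent.
function applyAck2(store, ack2) {
  for (const { id, timestamp, data } of ack2) store.set(id, { timestamp, data });
}

const syn = buildSyn(nodeA);        // Node A -> Node B
const ack = buildAck(nodeB, syn);   // Node B -> Node A
const ack2 = buildAck2(nodeA, ack); // Node A -> Node B
applyAck2(nodeB, ack2);

console.log(JSON.stringify(ack2));
```

After the round, both maps hold document 1 from Node B and documents 2 and 3 from Node A, and only the entries that actually differed crossed the wire with their data.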
In your case, your source of truth is your server, but you could easily implement peer-to-peer gossiping between your clients with WebRTC, for example.
Hope this helps in some way.
Cassandra training video
Scylla explanation
I think that the best solution to avoid all the event order and duplication issues is to use the pull method. This way, every client maintains its last imported event state (with a tracker, for example) and asks the server for the events generated after that last one.
An interesting problem will be to detect the breaking of business invariants. For that you could also store the log of applied commands on the client, and in case of a conflict (events were generated by other clients) you could retry the execution of commands from the command log. You need to do that because some commands will not succeed after re-execution; for example, a client saves a document after another user deleted that document at the same time.
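A minimal sketch of the pull method with a tracker. It assumes the server stamps every stored event with an increasing sequence number (`seq`, a name invented for this example) and that the client remembers the last one it imported; the arrays stand in for the real server and client stores.

```javascript
// Server side: an append-only log; each stored event gets the next sequence number.
const serverLog = [];
function appendEvent(event) {
  const stored = { ...event, seq: serverLog.length + 1 };
  serverLog.push(stored);
  return stored;
}

// Server side: return only the events generated after the client's tracker.
function eventsAfter(lastSeq) {
  return serverLog.filter((event) => event.seq > lastSeq);
}

// Client side: the tracker (lastSeq) records the last imported event.
function createClient() {
  return { lastSeq: 0, applied: [] };
}

// Pull new events, apply them in order, and advance the tracker.
function pull(client) {
  const fresh = eventsAfter(client.lastSeq);
  for (const event of fresh) {
    client.applied.push(event);
    client.lastSeq = event.seq;
  }
  return fresh.length;
}

appendEvent({ type: 'create', target: 'doc-1' });
appendEvent({ type: 'update', target: 'doc-1' });

const client = createClient();
const firstPull = pull(client);  // imports both events
appendEvent({ type: 'remove', target: 'doc-1' });
const secondPull = pull(client); // imports only the new event
const thirdPull = pull(client);  // nothing new

console.log(firstPull, secondPull, thirdPull); // 2 1 0
```

Because the client only ever asks for events after its tracker, nothing is sent twice and the order is the server's order.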
I'm creating a mock application with JSON server as the backend and I'm wondering if it is possible to get the total number of records contained at an end point without loading all the records themselves? Assuming the db.json file looks like the JSON snippet below, how would I find out that the end point only has one record without fetching the record itself, provided it's possible?
{
"books": [{
"title": "The Da Vinci Code",
"rating": "0"}]
}
You can simply retrieve the X-Total-Count header.
JSON Server returns this header in its response when pagination is enabled, i.e. when using the _page parameter (e.g. localhost:3000/contacts?_page=1).
Whenever you fetch the data, json-server actually returns the total count by default (the response headers include an x-total-count property):
Example:
axios
.get("http://localhost:3001/users", {
params: {
_page: 1,
_limit: 10
}
})
.then(res => {
console.log(res.data); // access your data which is limited to "10" per page
console.log(res.headers["x-total-count"]); // length of your data without page limit
});
You have three options. I'd recommend the third one:
Return all the records and count them. This could be slow and send a lot of data over the wire but probably is the smallest code change for you. It also opens you up to attacks where people can hammer your server by requesting many records repeatedly.
Add a new endpoint. You could add a new endpoint that simply returns the count. It's simple, but slightly annoying to have a second endpoint to document and maintain.
Modify the existing endpoint. Return something like
{
count: 157,
rows: [...data]
}
The benefit of option 3 is that it's all in one endpoint. It also moves you toward a point where you can add skip and take parameters in the future to allow pagination of the resulting data.
You will have to write another endpoint that returns the number of records. Usually you will also want the endpoint to accept limit and offset parameters for pagination.
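A sketch of what the third option's response could look like once skip and take are added. The `paginate` helper and the sample data are made up for this example; in a real endpoint `allRows` would come from the data store.

```javascript
// Return the total count together with one page of rows.
function paginate(allRows, skip = 0, take = 10) {
  return {
    count: allRows.length,
    rows: allRows.slice(skip, skip + take),
  };
}

// 25 sample books standing in for the real data.
const books = Array.from({ length: 25 }, (_, i) => ({ title: `Book ${i + 1}` }));

const page = paginate(books, 20, 10);
console.log(page.count, page.rows.length); // 25 5
```

The client always learns the total from `count`, even when it asks for a tiny page (or for `take = 0` if it wants the count alone).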
let response = await fetch("http://localhost:3001/books?_page=1");
let total = response.headers.get('X-Total-Count');