I'm writing a Node.js server script that shares a text list among multiple clients asynchronously.
The clients can read, add, or update items of this shared list.
const fs = require("fs");

class TempMan {
    static list; // in-memory cache, loaded lazily from disk

    static getitems() {
        if (TempMan.list === undefined) TempMan.list = JSON.parse(fs.readFileSync("./list.json"));
        return TempMan.list;
    }

    static additem(newitem) {
        const list = TempMan.getitems(); // make sure the cache is loaded first
        const key = Object.keys(newitem)[0];
        list[key] = newitem[key];
        fs.writeFileSync("./list.json", JSON.stringify(list));
    }
}
Clients can modify and get the list data using the following Express APIs:
app.get("/getlist"), (req, res)=>{
res.send(TempMan.getTemplates());
});
app.post("/addlist"), (req, res)=>{
additem(req.body.newitem)
res.status(204).end()
});
With a long background in C#, C++, and other desktop languages, I am worried that sharing this resource is going to be a problem, even though I have read that JavaScript doesn't run into race conditions. I first thought of semaphores, shared locks, or the other multithread-management solutions from those languages, but then read that JavaScript doesn't need such methods.
Does such a Node.js implementation run into resource sharing problems, such as simultaneous attempts to read/write the file? How can I solve this? Do I need some kind of transaction function in JavaScript?
Generally speaking, a Node.js program can encounter the resource sharing problem you describe; it is usually called a "race condition". It is not caused by two threads or processes but by an intrinsic property of the language: async execution. Suppose one async function has started but is not finished because it has an await inside; at that moment a second async function can start. If both access the same resource in their code blocks, a race condition can occur.
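For example, here is a minimal sketch (the file name and the lost-update scenario are invented for illustration) of two async calls interleaving around their await points:

const fs = require("fs").promises;

// Both calls read the counter, yield at the first await, then write.
// If they interleave, one increment is lost.
async function increment() {
  const raw = await fs.readFile("./counter.json", "utf8"); // yields here
  const data = JSON.parse(raw);
  data.count += 1;
  await fs.writeFile("./counter.json", JSON.stringify(data)); // yields again
}

increment(); // reads count = 0
increment(); // may also read count = 0 before the first write lands
// final count can be 1 instead of 2: a lost update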
I have made a slide to introduce this issue: https://slides.com/grimmer/intro_js_ts_task_queuelib_d4c/fullscreen#/0/12.
Going back to your example code: your code WILL NOT have any race conditions, and that holds even if you put async functions inside the Express routing callbacks instead of fs.writeFileSync. The reason is that the Express implementation awaits the first async routing callback handler and only starts to execute the second async routing callback handler after the first one has finished.
For example:
app.post('/testing1', async (req, res) => {
// Do something here
});
app.post('/testing2', async (req, res) => {
// Do something here
});
is like the code below inside the implementation of Express:
async function expressCore() {
    await firstRoutingCallback();
    await secondRoutingCallback();
}
But please keep in mind that other server frameworks may not have the same behavior. https://www.apollographql.com/ and https://nestjs.com/ both allow two async routing methods to be executed concurrently, like below:
async function otherServerFrameworkCore() {
    firstRoutingCallback();  // not awaited, so both can be in flight at once
    secondRoutingCallback();
}
and you need to find a way to avoid race conditions if this is your concern: either use transactions for DB usage, or use one of the npm synchronization libraries, which are lightweight and suitable for a single-instance Node.js program, e.g. https://www.npmjs.com/package/d4c-queue, which I wrote. Multiple Node.js instances are multiple processes, so they can still have race condition issues, and a DB transaction is the more suitable solution there.
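To illustrate the single-instance approach, here is a minimal hand-rolled sketch (not d4c-queue's actual API; the serialize helper is invented) that serializes critical sections by chaining them onto one promise:

// A tiny in-process mutex: each task starts only after the previous finishes.
let chain = Promise.resolve();

function serialize(task) {
  const run = chain.then(task, task); // run whether the predecessor succeeded or failed
  chain = run.catch(() => {});        // keep the chain alive after errors
  return run;
}

// Usage in a route handler: one read-modify-write at a time.
app.post("/addlist", async (req, res) => {
  await serialize(async () => {
    // read the file, update the list, write it back here
  });
  res.status(204).end();
});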
Related
I have a Node.js application that I'm switching from a single-tenant database to a multi-tenant database. The application code is called from an Express API, but there are also services that run through different entry points, so req.session is not always available.
Currently I have database function calls all throughout the app like:
database.select.users.findByUserId(123, callback)
Since the app is changing to a multi-tenant database, I need to be able to send the PostgreSQL schema name to the database functions. I know I can edit the signature of every database call to this:
database.select.users.findByUserId(schemaName, 123, callback)
But that's very labor-intensive, broad-sweeping, and is going to create a lot of bugs. I'm hoping to find a safe way to pass the Postgres schema name to the database wrapper without a race condition of some kind where this "global" schemaName variable is overwritten by another caller, thus returning the wrong data.
Here's some pseudo-code of what I'm considering writing, but I'm worried it won't be "thread-safe" once we deploy.
// as early as possible in the app call:
database.session.initSchema('schema123');
//session.js
let schema = null;
module.exports.initSchema = function (s) {
schema = s;
};
module.exports.getSchema = function () {
return schema;
};
// before I hit the database, i would call getSchema() and pass it to postgreSQL
This approach works, but what if Caller2 calls initSchema() with a different value while Caller1 hasn't finished executing? How can I distinguish which caller is asking for the data when using one variable like this? Is there any way for me to solve this problem safely without editing the signature of every database function call? Thanks for the advice.
edit
I'm leaning towards this solution:
database.session.initSchema('schema123');
//then immediately call
database.select.users.findByUserId(123, callback);
The advantage here is that nothing asynchronous happens between the two calls, which should rule out the race condition while keeping the original findByUserId signature.
I don't think what you're considering will work, because I don't see a way you're going to get around those race conditions. If you do:
app.use((request, response, next) => {
// initialize database schema
next()
})
It would be ideal because then you do it only once across all routes, but another request might hit the server a millisecond later and change the schema again.
Alternatively, you can do it in each separate route, which would work, but then you're doing just as much work as passing it into the database call in the first place. If you have to reinitialize the schema in each route, it's the same as doing it in the call itself.
I thought about it for a while, and the best I can come up with is doing it in the connection pool itself. I have no idea which package you're using or how it creates DB connections, but something like this:
const database = connection.getInstance('schemaExample')
// continue to do database things here
Just to show an example of what I'm thinking: that way you can create multiple connection pools for the different schemas on startup, and you just query on the one with the correct schema, avoiding all the race conditions.
The idea being that even if another request comes in now and uses a different schema, it will be executing on a different database connection.
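As a hedged sketch of that idea with node-postgres (assuming the pg package; the schema names, env var, and users table are invented), one pool per schema with search_path pinned per pool:

const { Pool } = require("pg");

// One pool per tenant schema, created once at startup.
const pools = new Map();
for (const schema of ["schema123", "schema456"]) {
  pools.set(schema, new Pool({
    connectionString: process.env.DATABASE_URL,
    // every connection in this pool resolves unqualified names in this schema
    options: `-c search_path=${schema}`,
  }));
}

// No shared mutable "current schema" variable, so callers can't clobber each other.
async function findByUserId(schema, id) {
  const pool = pools.get(schema);
  if (!pool) throw new Error(`unknown schema: ${schema}`);
  const { rows } = await pool.query("SELECT * FROM users WHERE id = $1", [id]);
  return rows[0];
}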
I have a small development web server that I use to write missing translations into files.
// I'm using fs.promises
const fs = require('fs').promises

app.post('/locales/add/:language/:namespace', async (req, res) => {
  const { language, namespace } = req.params
  let current = await fs.readFile(`./locales/${language}/${namespace}.json`, 'utf8')
  current = JSON.parse(current)
  // existing keys win over incoming ones, so translations already present are kept
  const newData = JSON.stringify({ ...req.body, ...current }, null, 2)
  await fs.writeFile(`./locales/${language}/${namespace}.json`, newData)
  res.end() // the original snippet never responded, which would hang the request
})
Obviously, when my i18n library does multiple writes into one file like this:
fetch('/locales/add/en/index', { method: 'POST', body: `{"hello":"hello"}` })
fetch('/locales/add/en/index', { method: 'POST', body: `{"bye":"bye"}` })
it seems like the file is being overwritten and only the result of the last request is saved. I cannot just append to the file, because it's JSON. How can I fix this?
You will have to use some sort of concurrency control to keep two concurrent requests that are both trying to write to the same resource from interfering with each other.
If you have lots of different files that you may be writing to, and perhaps multiple servers writing to them, then you pretty much have to use some sort of file locking, either OS-supplied or done manually with lock files, and have subsequent requests wait for the file lock to be cleared. If you have only one server writing to the file and a manageable number of files, then you can create a file queue that keeps track of the order of requests and whether the file is busy, and hands back a promise when it's a particular request's turn to do its writing.
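A minimal sketch of such a per-file queue (the helper name is invented) that chains each write onto the previous one for the same path:

const fs = require('fs').promises

const queues = new Map() // file path -> tail of that file's promise chain

// Run `task` only after every earlier task for the same file has finished.
function withFileQueue(path, task) {
  const tail = queues.get(path) || Promise.resolve()
  const run = tail.then(task, task)
  queues.set(path, run.catch(() => {})) // an error must not poison the queue
  return run
}

// Usage in the translation route: serialize the read-merge-write per file.
app.post('/locales/add/:language/:namespace', async (req, res) => {
  const file = `./locales/${req.params.language}/${req.params.namespace}.json`
  await withFileQueue(file, async () => {
    const current = JSON.parse(await fs.readFile(file, 'utf8'))
    await fs.writeFile(file, JSON.stringify({ ...req.body, ...current }, null, 2))
  })
  res.end()
})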
Concurrency control is always what databases are particularly good at.
I have no experience with either of these packages, but these are the general idea:
https://www.npmjs.com/package/lockfile
https://www.npmjs.com/package/proper-lockfile
These will guarantee one at a time access. I don't know if they will guarantee that multiple requests are granted access in the precise order they attempted to acquire the lock. If you need that, you might have to add that on top with some sort of queue.
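For example, a sketch of how proper-lockfile's promise API could be wrapped (based on its documented lock/release functions; not tested here):

const lockfile = require('proper-lockfile');

async function writeWithLock(path, mutate) {
  const release = await lockfile.lock(path, { retries: 5 }); // wait for the lock
  try {
    await mutate(); // perform the read-modify-write while holding the lock
  } finally {
    await release(); // always release, even if mutate() throws
  }
}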
Some discussion of this topic here: How can I lock a file while writing to it asynchronously
I've got a Node.js / Express application where sometimes I need to perform a non-critical async task that doesn't require waiting for the result (for example, a call to save some data in an analytics platform):
router.post("/", function (req, res, next) {
    criticalTask()
        .then(result => {
            res.json({ success: true });
            nonCriticalTask(); // fire and forget
        })
        .catch(next);
});
Is there a guarantee that nonCriticalTask() gets executed completely, without being terminated in the middle? Are there any restrictions on this?
In the end I couldn't find any documentation on this. After lots of experiments and logging, it seems that nonCriticalTask() is not terminated mid-flight: Node executes it to completion, and Node doesn't exit while tasks are still executing or handles are in use, e.g. an open DB connection.
So it seems to work for my nonCriticalTask(), which does analytics. That being said, it's probably bad design practice to rely on the Node engine for anything critical running in the background like this, and other approaches should be considered, e.g. persistent queues.
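One small hardening step worth sketching (my addition, not part of the original answer): give the fire-and-forget call a .catch, since an unhandled rejection can crash a modern Node process:

res.json({ success: true });
// fire and forget, but never let a rejection go unhandled
nonCriticalTask().catch(err => {
  console.error("nonCriticalTask failed:", err);
});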
Recently I have been developing a web application, and I realize that I am not making use of the asynchronous property at all, so I am ending up with a lot of nested callbacks.
For example, if the user wants to get a file from the server through a particular API, I have code similar to this:
db.query(<select list of permitted file_names>, function (err, file_names) {
    async.each(file_names, function (name, next) {
        // open each file to put into array
    });
});
This code queries the database for a list of file names, then loops asynchronously, putting each file's content into an array. Finally it returns the finished array to the client.
With the nested callbacks and the async library, this code behaves like the synchronous code below:
names = db.querySync(//select list of permitted files_names);
for(name in names) {
//open each file to put into array
}
I am better off writing synchronous code like this, since it is much neater. My use case might be a little strange, but most of my APIs behave in a similar manner, and that makes me wonder why I even need asynchronous functions.
Can someone please enlighten me on the differences between these two versions in terms of performance? How do I make use of the non-blocking property to enhance performance in this use case?
If you're writing callback functions, you are by definition using async calls. The callback fires only when the operation has completed or errored out. You don't need a fancy library to use these; this is the backbone of how Node's event-loop-driven subsystem operates.
Node strongly advises against using "Sync" calls. The Node core only includes a handful as a convenience; they're there as last-resort tools. Many libraries don't even support them, so you absolutely must get used to writing async code. In the browser environment, for example, you simply cannot use blocking calls without jamming up the JavaScript runtime and stalling the page.
I prefer using Promises, like Bluebird implements, to keep code orderly. There are other ways, like the async library, which can help manage otherwise complicated nesting patterns.
Some of the perks include things like Promise.all, which runs a series of promises to completion and then triggers a next step, and Promise.map, which iterates over a list, running async code for each element, then advancing when the list is complete.
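A quick sketch of that pattern with plain promises (the file names are invented), reading all permitted files in parallel and advancing once every read has finished:

const fs = require("fs").promises;

// Read every permitted file concurrently; resolves when all are done.
function readAll(fileNames) {
  return Promise.all(fileNames.map(name => fs.readFile(name, "utf8")));
}

readAll(["a.txt", "b.txt"])
  .then(contents => console.log("got", contents.length, "files"))
  .catch(err => console.error(err));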
If you're disciplined about organizing your code, it's not too bad. Node does require you to pay a lot more attention to the order of operations than a traditional sync-by-default language like Ruby, Python, or Java, but you can get used to it. Once you start working with async code rather than fighting it, you can often do a ton of work quickly, efficiently, and with a minimum of fuss, in many cases more effectively than in other languages where you must juggle threads plus locking and/or deal with IPC.
Yes, there is a difference between the two versions in terms of performance.
In synchronous code:
names = db.querySync(//select list of permitted files_names);
you are calling the database here to get the list of names. Assume this takes 10 seconds. For that time, Node.js, being single-threaded, goes into a blocking state. After 10 seconds it executes the rest of the code. Assume the for loop takes 5 seconds and some other code takes 5 seconds.
for(name in names) {
//open each file to put into array
}
//some code
Therefore it takes a total of 20 seconds.
Whereas in the asynchronous code:
db.query(<select list of permitted file_names>, function(err, file_names) {
Node.js asks the database for the list of names and registers a callback. Assume the query takes 10 seconds. Node immediately moves on to the next step (some code) instead of going into a blocking state. Assume that code takes 5 seconds.
async.each(file_names, function(name, next) {
//open each file to put into array
});
});
//some code.
After those 5 seconds, the event loop checks whether any I/O operations have completed. Once the callback returns, it executes function(name, next) {..} for its 5 seconds.
So the total time here is 15 seconds.
In this manner the performance is improved.
If you want the asynchronous code to be clear and neat, make use of closures and promises.
For example, the asynchronous code above can be written as:
const fun = function (err, file_names) {
    async.each(file_names, function (name, next) {
        // open each file to put into array
    });
};
db.query(<select list of permitted file_names>, fun);
The benefit is simple: By using asynchronous code, the current thread (remember, Node.js is single-threaded) is able to handle other requests while the current request is waiting on something (like a database query) to return.
If you use synchronous code instead, the current thread will block while it waits, and it won't be able to handle other requests in the meantime. In other words, you lose concurrency.
To keep your asynchronous code clean, look into promises (to avoid deeply nested callbacks) and ES2017 async/await (to avoid callbacks altogether and write asynchronous code that looks just like synchronous code).
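For instance, a sketch of the same file-listing flow with async/await (assuming a promise-based db.query wrapper, which is an invention here):

const fs = require("fs").promises;

// Reads sequentially in appearance, but never blocks the event loop.
async function getPermittedFileContents() {
  // assume db.query returns a promise of the permitted file names
  const fileNames = await db.query(/* select list of permitted file_names */);
  // open the files in parallel and collect their contents into an array
  return Promise.all(fileNames.map(name => fs.readFile(name, "utf8")));
}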
This question extends that of What is Node.js' Connect, Express and "middleware"?
I'm going the route of learning JavaScript -> Node.js -> Connect -> Express -> ... in order to learn about using a modern web development stack. I have a background in low-layer networking, so getting up and going with Node.js' net and http modules was easy. The general pattern of using a server to route requests to different handlers seemed natural and intuitive.
Moving to Connect, I'm afraid I don't understand the paradigm and the general flow of data in this "middleware". For example, if I create some middleware for use with Connect, à la:
// example.js
module.exports = function (opts) {
// ...
return function(req, res, next) {
// ...
next();
};
};
and "use" it in Connect via
var example = require('./example');
// ...
var server = connect.createServer();
// ...
server.use(example(some_parameter));
I don't know when my middleware gets called. Additionally, if I'm use()'ing other middleware, can I be guaranteed of the order in which the middleware is called? Furthermore, I'm under the assumption that next() is used to call the next middleware (again, how do I establish an ordering?); however, no parameters (req, res, next) are passed. Are these parameters passed implicitly somehow?
I'm guessing that the collection of middleware modules used is strung together, starting with the http callback, hence a bunch of functionality added between the initial request callback and the server ending a response.
I'm trying to understand the middleware paradigm, and the flow of information/execution.
Any help is greatly appreciated. Thank you for reading.
The middleware is called as a chain of functions, with the order based on middleware definition order (time), filtered by matching routes (where applicable).
Bear in mind that the req and res objects travel through the chain, so you can reuse, improve, or modify the data on them along the way.
There are two general use cases for middleware: generic and specific.
Generic is what you have defined in the example above: app.use applies to every single request. Each middleware has to call next() inside if it wants to proceed to the next middleware.
When you use app.get('/path', function(..., that actual function is middleware as well, just defined inline. So Express is, in a sense, fully based on middleware, and there is no endware :D
The chain order is based on definition order. So it is important to define middleware synchronously, or in an order-reliable async manner; otherwise a different middleware order can break the logic when middleware in the chain depend on each other.
Some middleware can be used to break the chain: return next(new Error());. This is useful, for example, for validation or authentication middleware.
Another useful pattern is middleware that processes and parses request data, like cookies; a good example of this is app.use(express.bodyParser());.
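To make the ordering and the implicit parameters concrete, here is a minimal runnable sketch (the route, header name, and messages are invented) showing definition order, next(), and an error short-circuiting the chain:

const express = require("express");
const app = express();

// Runs first, because it is defined first.
app.use((req, res, next) => {
  console.log("1: logger");
  next(); // hands off to the next middleware; Express supplies req/res/next to it
});

// Runs second; can break the chain by passing an error to next().
app.use((req, res, next) => {
  if (!req.headers["x-token"]) return next(new Error("unauthorized"));
  next();
});

// Runs third: the route handler is just the last middleware for this path.
app.get("/", (req, res) => res.send("hello"));

// Error-handling middleware (four arguments) catches next(err) from anywhere above.
app.use((err, req, res, next) => {
  res.status(401).send(err.message);
});

app.listen(3000);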