I've got a Node.js / Express application where sometimes I need to perform a non-critical async task that doesn't require waiting for the result (for example, a call that saves some data to an analytics platform):
router.post("/", function (req, res, next) {
  criticalTask()
    .then(result => {
      res.json({ success: true });
      nonCriticalTask(); // fire and forget
    })
    .catch(next);
});
Is there a guarantee that nonCriticalTask() runs to completion rather than being terminated partway through? Are there any restrictions on this?
In the end I couldn't find any documentation on this. After a lot of experimenting and logging, it seems that nonCriticalTask() is not terminated: Node executes it to completion, and Node doesn't exit while tasks are still executing or handles are in use, e.g. while a DB connection is open.
So it seems to work for my nonCriticalTask() that does analytics. That said, it's probably bad design practice to rely on the Node engine to keep anything critical running in the background like this, and other approaches should be considered, e.g. persistent queues.
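If you do rely on this pattern, one detail worth adding is an error handler on the fire-and-forget promise, because on current Node versions an unhandled rejection terminates the process by default. A minimal sketch of the route above with that guard:

router.post("/", function (req, res, next) {
  criticalTask()
    .then(result => {
      res.json({ success: true });
      // Fire and forget, but log failures so an unhandled rejection
      // can't take the whole process down.
      nonCriticalTask().catch(err => console.error("analytics failed:", err));
    })
    .catch(next);
});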
I'm writing a node.js server script that shares a text list among multiple clients asynchronously.
Clients can read, add, or update items of this shared list.
static getitems() {
  // Lazily load the list from disk on first access.
  if (list === undefined) list = JSON.parse(fs.readFileSync("./list.json"));
  return list;
}

static additem(newitem) {
  var key = Object.keys(newitem)[0];
  list[key] = newitem[key];
  fs.writeFileSync("./list.json", JSON.stringify(list));
}
Clients can modify and read the list data using the following Express APIs:
app.get("/getlist"), (req, res)=>{
res.send(TempMan.getTemplates());
});
app.post("/addlist"), (req, res)=>{
additem(req.body.newitem)
res.status(204).end()
});
Coming from a long background in C#, C++, and other desktop languages, I'm worried that resource sharing is going to be a problem, even though I've read that JavaScript doesn't run into race conditions. I first thought of semaphores, shared locks, or the other thread-management solutions those languages use, but then read that JavaScript doesn't need such methods.
Does a Node.js implementation like this run into resource sharing problems, such as simultaneous attempts to read/write the file? How can I solve this? Do I need some kind of transaction mechanism in JavaScript?
Generally speaking, a Node.js program can encounter the resource sharing problem you describe; it's usually called a "race condition". It is not due to two threads or processes but to an intrinsic property of the language: async. Assume there are two async functions, and the first has started but is not finished because it awaits something inside; in this situation, the second async function can start. If both access the same resource in their code blocks, a race condition can occur.
I have made a slide to introduce this issue: https://slides.com/grimmer/intro_js_ts_task_queuelib_d4c/fullscreen#/0/12.
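To make that interleaving concrete, here is a hypothetical example (the balance variable and the delay are invented for illustration):

let balance = 100;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withdraw(amount) {
  const current = balance;    // read
  await sleep(10);            // yield to the event loop (e.g. a DB call)
  balance = current - amount; // write back a value based on a stale read
}

withdraw(10);
withdraw(10);

setTimeout(() => console.log(balance), 50); // 90, not 80

Both calls read balance = 100 before either writes it back, so one withdrawal is lost.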
Going back to your example code: your code WILL NOT have any race conditions. Even if you put an async function inside the Express routing callback instead of fs.writeFileSync, there would be none, because the implementation of Express awaits the first async routing callback handler and only starts executing the second routing callback handler after the first one has finished.
For example:
app.post('/testing1', async (req, res) => {
  // Do something here
});

app.post('/testing2', async (req, res) => {
  // Do something here
});
behaves like the pseudocode below inside the implementation of Express:
async function expressCore() {
  await firstRoutingCallback()
  await secondRoutingCallback()
}
But please keep in mind that other server frameworks may not have the same behavior. https://www.apollographql.com/ and https://nestjs.com/ both allow two async routing methods to execute concurrently, like below:
async function otherServerFrameworkCore() {
  firstRoutingCallback()
  secondRoutingCallback()
}
and you need to find a way to avoid race conditions if this is a concern. Either use transactions for DB usage or an npm synchronization library that is lightweight and suitable for a single-instance Node.js program, e.g. https://www.npmjs.com/package/d4c-queue, which is made by me. Multiple Node.js instances are multiple processes, can also hit race conditions, and there a DB transaction is the more suitable solution.
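I can't vouch for d4c-queue's exact API here, but the core mechanism such a single-instance library relies on can be sketched as a promise-chain mutex (all names below are illustrative, not the library's API):

class Mutex {
  constructor() {
    this._last = Promise.resolve();
  }
  // Run fn only after every previously queued fn has settled.
  run(fn) {
    const result = this._last.then(fn, fn);
    this._last = result.catch(() => {}); // keep the chain alive on errors
    return result;
  }
}

const listLock = new Mutex();

app.post("/addlist", (req, res) => {
  listLock
    .run(async () => TempMan.additem(req.body.newitem)) // the critical section
    .then(() => res.status(204).end(), () => res.status(500).end());
});

Every call to run() is appended to a single chain, so the critical sections execute strictly one at a time even though each may be async.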
I have a small development web server that I use to write missing translations into files.
app.post('/locales/add/:language/:namespace', async (req, res) => {
  const { language, namespace } = req.params
  // I'm using fs.promises
  let current = await fs.readFile(`./locales/${language}/${namespace}.json`, 'utf8')
  current = JSON.parse(current)
  const newData = JSON.stringify({ ...req.body, ...current }, null, 2)
  await fs.writeFile(`./locales/${language}/${namespace}.json`, newData)
  res.end() // respond so the client doesn't hang
})
Obviously, when my i18n library does multiple writes into one file like this:
fetch('/locales/add/en/index', { method: 'POST', body: `{"hello":"hello"}` })
fetch('/locales/add/en/index', { method: 'POST', body: `{"bye":"bye"}` })
it seems like the file is being overwritten and only the result of the last request is saved. I cannot just append to the file, because it's JSON. How to fix this?
You will have to use some sort of concurrency control to keep two concurrent requests that are both trying to write to the same resource from interfering with each other.
If you have lots of different files that you may be writing to, and perhaps multiple servers writing to them, then you pretty much have to use some sort of file locking, either OS-supplied or done manually with lock files, and have subsequent requests wait for the file lock to be cleared. If you have only one server writing to the file and a manageable number of files, then you can create a file queue that keeps track of the order of requests and knows when the file is busy, and it can return a promise when it's time for a particular request to do its writing; see the sketch below.
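Here is a minimal sketch of that single-server, per-file queue (names invented; fs is fs.promises, as in the question):

const queues = new Map(); // file path -> tail of that file's promise chain

// Serialize work per file: each task starts only after the previous
// task queued for the same path has settled.
function withFileLock(path, task) {
  const tail = queues.get(path) || Promise.resolve();
  const next = tail.then(task, task);
  queues.set(path, next.catch(() => {})); // keep the chain alive on errors
  return next;
}

app.post('/locales/add/:language/:namespace', (req, res) => {
  const { language, namespace } = req.params;
  const file = `./locales/${language}/${namespace}.json`;
  withFileLock(file, async () => {
    const current = JSON.parse(await fs.readFile(file, 'utf8'));
    await fs.writeFile(file, JSON.stringify({ ...req.body, ...current }, null, 2));
  }).then(() => res.end(), () => res.status(500).end());
});

Because the whole read-modify-write cycle sits inside one queued task, the second request sees the first request's write instead of a stale read.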
Concurrency control is always what databases are particularly good at.
I have no experience with either of these packages, but these are the general idea:
https://www.npmjs.com/package/lockfile
https://www.npmjs.com/package/proper-lockfile
These will guarantee one at a time access. I don't know if they will guarantee that multiple requests are granted access in the precise order they attempted to acquire the lock. If you need that, you might have to add that on top with some sort of queue.
Some discussion of this topic here: How can I lock a file while writing to it asynchronously
I have a node js process that creates a web3 websocket connection, like so:
web3 = new Web3('ws://localhost:7545')
When the process completes (I send it a SIGTERM), it does not exit, but rather hangs forever with no console output.
I registered a listener on SIGINT and SIGTERM to observe what handles the process has outstanding with process._getActiveRequests() and process._getActiveHandles(), and I see this:
Socket {
connecting: false,
_hadError: false,
_handle:
TCP {
reading: true,
owner: [Circular],
onread: [Function: onread],
onconnection: null,
writeQueueSize: 0 },
<snip>
_peername: { address: '127.0.0.1', family: 'IPv4', port: 7545 },
<snip>
}
For completeness, here is the code that's listening for the signals:
async function stop() {
  console.log('Shutting down...')
  if (process.env.DEBUG) console.log(process._getActiveHandles())
  process.exit(0)
}

process.on('SIGTERM', async () => {
  console.log('Received SIGTERM')
  await stop()
})

process.on('SIGINT', async () => {
  console.log('Received SIGINT')
  await stop()
})
Looks like web3 is holding a socket open, which makes sense since I never told it to close the connection. Looking through the documentation and googling, it doesn't look like there's a close or end method for the web3 object.
Manually closing the socket in stop above allows the process to successfully exit:
web3.currentProvider.connection.close()
Anyone have a more elegant or officially sanctioned solution? It feels funny to me that you have to manually do this rather than have the object destroy itself on process end. Other clients seem to do this automatically without explicitly telling them to close their connections. Perhaps it is cleaner to tell all the clients created by your node process to close their handles/connections on shutdown anyway, but to me, this was unexpected.
It feels funny to me that you have to manually do this rather than have the object destroy itself on process end
It feels funny because you have probably been exposed to more synchronous programming than asynchronous. Consider the code below:
const fs = require('fs');
const data = fs.readFileSync('file.txt', 'utf-8');
console.log("Read data", data);
When you run the above, you get this output:
$ node sync.js
Read data Hello World
This is synchronous code. Now consider the asynchronous version of the same:
const fs = require('fs');
// fs.readFile returns immediately (its return value is undefined);
// the file contents arrive later via the callback.
const data = fs.readFile('file.txt', 'utf-8', function (err, data) {
  console.log("Got data back from file", data);
});
console.log("Read data", data);
When you run it, you get the output below:
$ node async.js
Read data undefined
Got data back from file Hello World
Now if you think like a synchronous programmer, the program should have ended at the last console.log("Read data", data), but another statement is printed afterwards. Does this feel funny? Let's add an exit statement to the process:
const fs = require('fs');
const data = fs.readFile('file.txt', 'utf-8', function (err, data) {
  console.log("Got data back from file", data);
});
console.log("Read data", data);
process.exit(0);
Now when you run the program, it ends at the last statement.
$ node async.js
Read data undefined
But the file is never actually read. Why? Because you never gave the JavaScript engine time to execute the pending callbacks. Ideally a process finishes automatically when there is no work left for it to do (no pending callbacks, function calls, etc.). This is the way the asynchronous world works. There are some good SO threads and articles you should look into:
https://medium.freecodecamp.org/walking-inside-nodejs-event-loop-85caeca391a9
https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/
How to exit in Node.js
Why doesn't my Node.js process terminate once all listeners have been removed?
How does a node.js process know when to stop?
So in the async world you need to either tell the process to exit, or it will exit automatically when there are no pending tasks (which you now know how to check: process._getActiveRequests() and process._getActiveHandles()).
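To illustrate "pending tasks" concretely (a toy example, not from the question): an active timer is a handle that keeps the process alive, unless you unref() it.

// This interval is an active handle, so the process never exits:
const timer = setInterval(() => console.log('tick'), 1000);

// unref() tells Node not to count this handle as pending work;
// with nothing else to do, the process now exits immediately.
timer.unref();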
The provider API for the JavaScript web3 module has gone through some substantial change recently due to the implementation of EIP-1193 and the impending release of Web3 1.0.0.
Per the code, it looks like web3.currentProvider.disconnect() should work. This method also accepts optional code and reason arguments, as described in the MDN reference docs for WebSocket.close(...).
Important: you'll notice that I referenced the source code above and not the documentation. That's because at present the disconnect method is not considered part of the public API. If you use it in your code, you should be sure to add a test case for it, as it could break at any time! From what I can see, WebSocketProvider.disconnect was introduced in web3#1.0.0-beta.38 and is still present in the latest release as of today, which is web3#1.0.0-beta.55. Given that the stable 1.0.0 release is due to drop very soon, I don't think it's likely that this will change much between now and web3#1.0.0, but there's no holds barred when it comes to the structure of internal APIs.
I've discussed making the internal providers public at length with the current maintainer, Samuel Furter, aka nividia on GitHub. I don't fully agree with his decision to keep it internal here, but in his defense he's the only maintainer at present and he's had his hands very full with stabilizing the long-standing work in progress on the 1.0 branch.
As a result of these discussions, my opinion at the moment is that those who need a stable API for their WebSocket provider should write an EIP-1193 compatible provider of their own, and publish it on NPM for others to use. Please follow semver for this, and include a similar disconnect method in your own public API. Bonus points if you write it in TypeScript, as this gives you the ability to explicitly declare class members as public, protected, or private.
If you do this, be aware that EIP-1193 is still in draft status, so you'll need to keep an eye on the EIP-1193 discussions on EthereumMagicians and in the Provider Ring Discord to stay on top of any changes that might occur.
At the end of your Node.js process, simply call:
web3.currentProvider.connection.close()
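For completeness, this can be wired into the signal handlers from the question. Assuming no other handles stay open, closing the connection lets the event loop drain, so the process exits cleanly without process.exit():

process.on('SIGTERM', () => {
  console.log('Received SIGTERM, closing websocket...')
  // Releasing the last open handle lets the process exit on its own.
  web3.currentProvider.connection.close()
})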
I'm writing a node backend and I'm a little confused about how I should deal with async functions. I've read about process.nextTick(), but I'm not sure how often I should use it. Most of my code is based on callbacks, like database calls, which are asynchronous by themselves. But I also have a few functions of my own that should be async.
So which one is a good example of async function?
function validateUser1(user, callback) {
  process.nextTick(function() {
    //validate user, some regex and stuff
    callback(err, user);
  });
}

function validateUser2(user, callback) {
  //validate user, some regex and stuff
  process.nextTick(callback, err, user);
}

function validateUser3(user, callback) {
  process.nextTick(function() {
    //validate user, some regex and stuff
    process.nextTick(callback, err, user);
  });
}
I don't know whether I should wrap everything in process.nextTick(), wrap just the callback, or both.
And overall, the idea with node.js is to write lots of small functions rather than bigger ones, and call them asynchronously to not block other events, right?
If you have just CPU code (no I/O), run it straight through for as long as you can. Avoid async and tiny functions that fragment your code unnecessarily.
Take the opportunity and write clean, readable, linear code whenever possible. Only revert to async when absolutely necessary, such as stream I/O (file or network).
Consider this: even if you have 1000+ lines of JS code, it will still execute blazingly fast. You really do not need to fragment it (unless it proves too cumbersome, such as very deep loops, but you have to measure that first)!
If you don't test the linear code first and actually SEE that you need to fragment it, you'll end up with premature optimization, which is a bad thing for maintainability.
I'd really go straight away with this:
function validateUser1(user, callback) {
  //validate user, some regex and stuff
  callback(err, user);
}
And if possible, remove the function altogether (but this is a matter of how you write the rest of the code).
Also, don't use nextTick() if you don't really need it. I've implemented a cloud server with many TCP/IP sockets, database connections, logging, file reading and a lot of I/O, but NOT ONCE did I use nextTick(), and it runs really smoothly.
process.nextTick() will execute your callback before continuing with the event loop. This blocks your thread and can stop incoming connections from being handled if the callback you pass to process.nextTick() is CPU-expensive, like encrypting or calculating pi.
From what I understand, you are trying to make your functions asynchronous by passing them to process.nextTick(). That is not how it works.
When you pass something to process.nextTick(), it will execute before the event loop runs again. This does not make your function non-blocking, because the function you pass still runs on the main thread. Only I/O operations can be non-blocking.
Therefore it is irrelevant whether you wrap your CPU-intensive functions in process.nextTick() or just execute them right away.
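You can verify this with a toy CPU burner (the numbers are arbitrary):

function busyWork() {
  // simulate ~2 seconds of CPU-bound work
  const end = Date.now() + 2000;
  while (Date.now() < end) {}
}

setTimeout(() => console.log('timer fired'), 100);

// Wrapped or not, busyWork monopolizes the single thread:
// 'timer fired' appears only after ~2 seconds either way.
process.nextTick(busyWork);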
If you want to read more background information, here is the resource: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/#process-nexttick
I'm still confused by the answers provided.
I watched a short course on Lynda.com about NodeJS (Advanced NodeJS).
The instructor provides the following example of using process.nextTick():
function hideString(str, done) {
  process.nextTick(() => {
    done(str.replace(/[a-zA-Z]/g, 'X'))
  })
}

hideString("Hello World", (hidden) => {
  console.log(hidden);
});
console.log('end')
Without process.nextTick(), the callback runs synchronously and console.log('end') is printed last; nothing is async. With it, 'end' is printed first.
I understood this to mean that to write async code, you need process.nextTick().
Then it is not clear how async code is written in JS on the frontend without process.nextTick().
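For what it's worth, browsers expose the same kind of deferral under different names; process.nextTick() is just Node's flavor. A small illustration using standard APIs:

// Microtask: runs after the current script, before timers.
Promise.resolve().then(() => console.log('microtask'));

// Macrotask: runs on a later turn of the event loop.
setTimeout(() => console.log('macrotask'), 0);

console.log('end'); // still printed first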
I have a very simple utility script that I've written in JavaScript for node.js which reads a file, does a few calculations, then writes an output file. The source in its current form looks something like this:
fs.readFile(inputPath, function (err, data) {
  if (err) throw err;
  // do something with the data
  fs.writeFile(outputPath, output, function (err) {
    if (err) throw err;
    console.log("File successfully written.");
  });
});
This works fine, but I'm wondering if there is any disadvantage in this case to using the synchronous variety of these functions instead, like this:
var data = fs.readFileSync(inputPath);
// do something with the data
fs.writeFileSync(outputPath, output);
console.log("File successfully written.");
To me, this is much simpler to read and understand than the callback variety. Is there any reason to use the former method in this case?
I realize that speed isn't an issue at all with this simple script I'm running locally, but I'm interested in understanding the theory behind it. When does using the async methods help, and when does it not? Even in a production application, if I'm only reading a file, then waiting to perform the next task, is there any reason to use the asynchronous method?
What matters is what ELSE your node process needs to do while the synchronous IO happens. In the case of a simple shell script run at the command line by a single user, synchronous IO is totally fine, since if you were doing asynchronous IO all you'd be doing is waiting for the IO to come back anyway.
However, in a network service with multiple users you can NEVER use ANY synchronous IO calls (which is kind of the whole point of node, so believe me when I say this). Doing so causes ALL connected clients to halt processing, and it is complete doom.
Rule of thumb: shell script: OK, network service: verboten!
For further reading, I made several analogies in this answer.
Basically, when node does asynchronous IO in a network server, it can ask the OS to do many things: read a few files, make some DB queries, send out some network traffic, and while waiting for that async IO to be ready, it can do memory/CPU things in the main event thread. Using this architecture, node gets pretty good performance/concurrency. However, when a synchronous IO operation happens, the entire node process just blocks and does absolutely nothing. It just waits. No new connections can be received. No processing happens, no event loop ticks, no callbacks, nothing. Just 1 synchronous operation stalls the entire server for all clients. You must not do it at all. It doesn't matter how fast it is or anything like that. It doesn't matter local filesystem or network request. Even if you spend 10ms reading a tiny file from disk for each client, if you have 100 clients, client 100 will wait a full second while that file is read one at a time over and over for clients 1-99.
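To make the contrast concrete, here is a sketch of the failure mode (route names and file name are invented for the example):

const fs = require('fs');
const express = require('express');
const app = express();

// BAD in a server: while this read runs, the whole process is stalled
// and every other connected client waits.
app.get('/report-sync', (req, res) => {
  res.send(fs.readFileSync('./big-file.txt', 'utf8'));
});

// OK: the read is handed off to the OS; the event loop keeps serving
// other clients, and the callback runs when the data is ready.
app.get('/report-async', (req, res) => {
  fs.readFile('./big-file.txt', 'utf8', (err, data) => {
    if (err) return res.status(500).end();
    res.send(data);
  });
});

app.listen(3000);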
Asynchronous code does not block the flow of execution, allowing your program to perform other tasks while waiting for an operation to complete.
In the first example, your code can continue running without waiting for the file to be written. In your second example, the code execution is "blocked" until the file is written. This is why synchronous code is known as "blocking" while asynchronous is known as "non-blocking."