Node.js: synchronizing multiple simultaneous file changes

I have a small development web server that I use to write missing translations into files.
app.post('/locales/add/:language/:namespace', async (req, res) => {
  const { language, namespace } = req.params
  // I'm using fs.promises
  let current = await fs.readFile(`./locales/${language}/${namespace}.json`, 'utf8')
  current = JSON.parse(current)
  const newData = JSON.stringify({ ...req.body, ...current }, null, 2)
  await fs.writeFile(`./locales/${language}/${namespace}.json`, newData)
})
Obviously, when my i18n library does multiple writes into one file like this:
fetch('/locales/add/en/index', { body: `{"hello":"hello"}` })
fetch('/locales/add/en/index', { body: `{"bye":"bye"}` })
it seems that the file is being overwritten and only the result of the last request is saved. I cannot just append to the file, because it's JSON. How can I fix this?

You will have to use some sort of concurrency control to keep two concurrent requests that are both trying to write to the same resource from interfering with each other.
If you have lots of different files that you may be writing to, and perhaps multiple servers writing to them, then you pretty much have to use some sort of file locking, either OS-supplied or done manually with lock files, and have subsequent requests wait for the file lock to be cleared. If you have only one server writing to the file and a manageable number of files, then you can create a file queue that keeps track of the order of requests and of when the file is busy, and that returns a promise when it's time for a particular request to do its writing.
This kind of concurrency control is also exactly what databases are particularly good at.
I have no experience with either of these packages, but these are the general idea:
https://www.npmjs.com/package/lockfile
https://www.npmjs.com/package/proper-lockfile
These will guarantee one-at-a-time access. I don't know whether they guarantee that multiple requests are granted access in the precise order in which they attempted to acquire the lock; if you need that, you might have to add it on top with some sort of queue.
Some discussion of this topic here: How can I lock a file while writing to it asynchronously
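For the single-server case, here is a minimal sketch of the in-process queue idea described above: a Map of promise chains keyed by file path, so writes to the same file run one at a time, in the order they arrived. The enqueueWrite and updateLocale names are made up for illustration; they are not part of any library mentioned here.

const fs = require('fs').promises

// One promise chain per file path; each new job runs after the previous one settles.
const fileQueues = new Map()

function enqueueWrite(filePath, job) {
  const previous = fileQueues.get(filePath) || Promise.resolve()
  const next = previous.catch(() => {}).then(job) // keep the chain alive after errors
  fileQueues.set(filePath, next)
  return next
}

// Example: a read-modify-write that is serialized per file.
function updateLocale(filePath, additions) {
  return enqueueWrite(filePath, async () => {
    const current = JSON.parse(await fs.readFile(filePath, 'utf8'))
    const merged = { ...additions, ...current }
    await fs.writeFile(filePath, JSON.stringify(merged, null, 2))
  })
}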

Related

How to limit flow between streams in NodeJS

I have a readStream piped to a writeStream. The read stream reads from the internet and the write stream writes to my local database instance. I noticed that the read speed is much faster than the write speed, and my app's memory usage rises until it reaches
JavaScript heap out of memory
I suspect that the read data accumulates in the Node.js app's memory.
How can I limit the read stream so it reads only what the write stream is capable of writing at a given time?
OK, so long story short: the mechanism you need to be aware of to solve this kind of issue is backpressure. It is not a problem when you use Node's standard pipe(); it happened in my case because I am using a custom fan-out to multiple streams.
You can read about it here https://nodejs.org/en/docs/guides/backpressuring-in-streams/
This solution is not ideal, as it blocks the read stream whenever any of the fan-out write streams is blocked, but it gives a general idea of how to approach the problem:
combinedStream.pipe(transformer).on('data', async (data: DbObject) => {
  const writeStream = dbClient.getStreamForTable(data.table);
  if (!writeStream.write(data.csv)) {
    combinedStream.pause();
    await new Promise((resolve) => writeStream.once('drain', resolve));
    combinedStream.resume();
  }
});
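When there is only a single destination, the simpler route is to let Node manage backpressure for you with pipe() or stream.pipeline(), as noted above. A minimal sketch, assuming readable, transformer, and writable are your internet-facing read stream, transform stream, and database write stream:

const { pipeline } = require('stream');

// pipeline() propagates backpressure automatically and forwards errors,
// so the readable is paused whenever the writable cannot keep up.
pipeline(readable, transformer, writable, (err) => {
  if (err) {
    console.error('pipeline failed', err);
  } else {
    console.log('pipeline done');
  }
});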

How to process websocket messages sequentially

I am receiving dozens of messages over a WebSocket, and they can arrive a few milliseconds apart. I need to process this data with operations that can sometimes take a little time (insertions into a DB, for example).
In order to process a newly received message, it is imperative that the previous one has finished being processed.
My first idea was to set up a queue with Node.js Bull (with Redis), but I'm afraid it would take too long; the processing of these messages must remain fast.
I tried to use JS iterators/generators (something I had never used until now) and I tested something like this:
const ws = new WebSocket(`${this.baseUrl}${this.path}`)
const duplex = WebSocket.createWebSocketStream(ws, { encoding: 'utf8' })

const messageGenerator = async function* (duplex) {
  for await (const message of duplex) {
    yield message
  }
}

for await (let msg of messageGenerator(socketApi.duplex)) {
  console.log('start process')
  await this.messageHandler.handleMessage(msg, user)
  console.log('end process')
}
Log output:
start process
start process
end process
end process
Unfortunately, as you can see, messages continue to be processed without waiting for the previous one to finish. Do you have a solution to this problem?
Should I finally use a queue with Redis to process the messages?
Thanks
I am not a Node.js guy, but I have thought about the same issue multiple times in other languages. I have concluded that it really matters how slow the message-processing operations are, because if they are too slow (slower than a certain threshold that depends on the messages-per-second rate), they can cause a bottleneck on the WebSocket connection, and as that bottleneck builds up it can cause extreme delays for future messages.
If await and async behave the same as in Python, then any operation you process with them runs asynchronously, which means it will indeed not wait for the previous one to be processed.
So far I have thought of two options:
Keep processing the messages asynchronously, but write some additional logic in the code processing them that manages the ordering; for example, confirm that the previous message has already been processed before proceeding with the current one (see the sketch after this list). This logic can afford to be complex and slow, because it runs in a separate thread and doesn't block the WebSocket messages.
Process the messages synchronously, one by one, but extremely fast, by doing one single operation: storing them in Redis. This is much faster than storing them in a database and in most cases will be fast enough not to cause bottlenecks on the WS connection. Then use a separate process to get these messages from Redis and process them.
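If you stay inside a single Node.js process, one way to enforce strictly sequential handling is to buffer incoming messages and drain the buffer one at a time, so each handler finishes before the next starts. A minimal sketch, assuming ws is the WebSocket and handleMessage is your async processing function (both are placeholders, not from the question's code):

const queue = []
let draining = false

ws.on('message', (msg) => {
  queue.push(msg)
  drain() // fire and forget; at most one drain loop runs at a time
})

async function drain() {
  if (draining) return
  draining = true
  while (queue.length > 0) {
    const msg = queue.shift()
    await handleMessage(msg) // the next message waits until this one is done
  }
  draining = false
}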

javascript manage shared resource access for asynchronous functions

I'm writing a Node.js server script that uses a shared text list as data for multiple clients, asynchronously.
The clients can read, add, or update items of this shared list.
static getitems() {
  if (list == undefined) list = JSON.parse(fs.readFileSync("./list.json"));
  return list;
}

static additem(newitem) {
  var key = Object.keys(newitem)[0];
  list[key] = newitem[key];
  fs.writeFileSync("./list.json", JSON.stringify(list));
}
Clients can modify and read the list data using the following Express APIs:
app.get("/getlist"), (req, res)=>{
res.send(TempMan.getTemplates());
});
app.post("/addlist"), (req, res)=>{
additem(req.body.newitem)
res.status(204).end()
});
Coming from a long background in C#, C++, and other desktop programming languages, and although I have read that JavaScript doesn't run into race conditions, I am worried that resource sharing is going to be a problem. I was first thinking of semaphores, shared locks, or other multi-thread management solutions from other languages, but I have also read that JavaScript doesn't need such methods.
Does such a Node.js implementation run into resource-sharing problems, such as simultaneous attempts to read/write a file? How can I solve this? Do I need some kind of transaction functions I can use in JavaScript?
Generally speaking, a Node.js program may encounter the resource-sharing problem you describe; usually we call these "race condition" problems. They are not due to two threads/processes but to an intrinsic property of the language: async. Assume there are two async functions; the first one has started but is not finished and has some await inside. In this situation the second async function can start, and this may cause race conditions if the two access the same resource in their code blocks.
I have made a slide to introduce this issue: https://slides.com/grimmer/intro_js_ts_task_queuelib_d4c/fullscreen#/0/12.
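To make the interleaving concrete, here is a small illustrative example (not taken from the question's code): two async read-modify-write calls on the same JSON file can overlap at their await points, so the second read happens before the first write and one update is lost.

const fs = require('fs').promises

// Both calls read the old file contents before either has written,
// so whichever write finishes last silently discards the other update.
async function addItem(key, value) {
  const data = JSON.parse(await fs.readFile('./list.json', 'utf8')) // await #1: other calls can run here
  data[key] = value
  await fs.writeFile('./list.json', JSON.stringify(data))           // await #2
}

// Fired without awaiting in between, these two interleave:
addItem('a', 1)
addItem('b', 2) // likely ends up with only { b: 2 } (or only { a: 1 }) persisted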
Going back to your example code: your code WILL NOT have any race conditions, and that would remain true even if you put async function calls inside the Express routing callbacks instead of fs.writeFileSync. The reason is that the Express implementation awaits the first async routing callback handler function and only starts to execute the second async routing callback handler function after the first one has finished.
For example:
app.post('/testing1', async (req, res) => {
  // Do something here
});

app.post('/testing2', async (req, res) => {
  // Do something here
});
is like the code below inside the Express implementation:
async function expressCore() {
  await firstRoutingCallback()
  await secondRoutingCallback()
}
But please keep in mind that other server frameworks may not have the same behavior. https://www.apollographql.com/ and https://nestjs.com/ both allow two async routing methods to execute concurrently, like below:
async function otherServerFrameworkCore() {
  firstRoutingCallback()
  secondRoutingCallback()
}
In that case you need to find a way to avoid race conditions if this is a concern, either by using transactions for DB usage or by using an npm synchronization library that is lightweight and suitable for a single-instance Node.js program, e.g. https://www.npmjs.com/package/d4c-queue, which is made by me. Multiple Node.js instances are multiple processes and can have race-condition issues; there, a DB transaction is the more suitable solution.

Does long time IO operation in Electron main process block the UI

I know that CPU-intensive work in the main process will block the UI process. I have another question: does a long IO operation in the main process block the UI?
Recently, I have been using Electron to develop a desktop file-management application.
Step 1:
My UI process uses asynchronous IPC (provided by Electron) to tell the main process to fetch the file-list data from the network (only file metadata, not file contents).
Step 2:
The main process fetches the file-list data from the network, stores the file list in SQLite (I use TypeORM), then selects part of the file list from SQLite and sends it back to the UI process.
Sometimes step 2 takes tens of seconds (for example when I fetch 10,000 file items from the network), and my UI slows down.
So, I have two questions:
+ Does a long IO operation in the main process block the UI?
+ What's the best way to do IO operations (database or local files) in an Electron application?
Potentially, I/O can block your application. Node offers blocking and non-blocking I/O operations. You'll want to use the non-blocking variants.
The Node docs have a section on blocking vs non-blocking I/O. Two code samples from that page, one blocking, one non-blocking:
const fs = require('fs');
const data = fs.readFileSync('/file.md'); // blocks here until file is read

const fs = require('fs');
fs.readFile('/file.md', (err, data) => {
  if (err) throw err;
  // the callback runs later, once the file has been read
});
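For completeness, a promise-based sketch using fs.promises, which is also non-blocking and composes naturally with async/await (the file path is just the same example path as above):

const fs = require('fs').promises;

async function readDoc() {
  const data = await fs.readFile('/file.md'); // the event loop stays free while the read is in flight
  return data;
}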
The second question ("what's the best way?") is opinionated and off-topic so I'll focus on the first:
Does a long IO operation in the main process block the UI?
No, it does not. I/O in Electron happens either on the Chromium side or on the Node.js side; in both cases JavaScript's I/O execution model uses an event loop. The action is queued and then performed either on a thread pool in the background (DNS queries, for example) or using the operating system's native async non-blocking I/O facilities (socket writes, for example).
The one caveat is that browsers do offer some (older) APIs that are blocking (like a synchronous XMLHttpRequest), but you are likely not using those.
For more details see our event loop and timers tutorial.

Is moving database works to child processes a good idea in node.js?

I just started getting into child_process, and all I know is that it's good for delegating blocking functions (e.g. looping over a huge array) to child processes.
I use TypeORM to communicate with the MySQL database. I was wondering whether there's a benefit to moving some of the asynchronous database work to child processes. I read in another thread (unfortunately I couldn't find it in my browser history) that there's no good reason to delegate async functions to child processes. Is that true?
example code:
child.js
import {createConnection} from "./dbConnection";
import {SomeTable} from "./entity/SomeTable";

process.on('message', (m) => {
  createConnection().then(async connection => {
    let repository = connection.getRepository(SomeTable);
    let results = await repository
      .createQueryBuilder("t")
      .orderBy("t.postId", "DESC")
      .getMany();
    process.send(results);
  })
});
main.js
const cp = require('child_process');
const child = cp.fork('./child.js');

child.send('Please fetch some data');
child.on('message', (m) => {
  console.log(m);
});
The big gain with JavaScript is its asynchronous nature.
When you call an asynchronous function, the code continues to execute without waiting for the answer; only when the function is done and an answer is available does it continue with that part.
Your database call is already asynchronous, so you would be spawning another Node process for nothing. Since the database takes all the heat, having more Node.js processes wouldn't help on that front.
Take the same example but with a file write. What could make the write to disk faster? Nothing much, really. But do we care? No, because our Node.js process is not blocked and keeps answering requests and handling tasks. The only thing you might want to check is not to send a thousand file writes at the same time; if they are big there would be a negative impact on the file system, but since a write is not CPU-intensive, Node will run just fine.
Child processes really are a great tool, but it is rare to need them. I too wanted to use them when I first heard about them, but the thing is that you will most likely not need them at all. The only time I decided to use them was to create a CPU-intensive worker: it would spawn one child process per core (since Node is single-threaded) and respawn any faulty ones.
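As a rough illustration of that last setup (the worker script path and function names are made up for the example), one child process per core, restarted if it exits unexpectedly:

const cp = require('child_process');
const os = require('os');

// Spawn one CPU-bound worker per core and respawn it if it dies.
function spawnWorker(script) {
  const worker = cp.fork(script);
  worker.on('exit', (code) => {
    if (code !== 0) spawnWorker(script); // respawn a faulty worker
  });
  return worker;
}

for (let i = 0; i < os.cpus().length; i++) {
  spawnWorker('./cpu-heavy-worker.js'); // hypothetical worker script
}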
