Node.js - How can I prevent interrupted child processes from surviving? - javascript

I have found that some child processes are failing to terminate if the calling script is interrupted.
Specifically, I have a module that uses Ghostscript to perform various actions: extract page images, create a new pdf from a slice, etc. I use the following to execute the command and return a through stream of the child's stdout:
function spawnStream(command, args, storeStdout, cbSuccess) {
  storeStdout = storeStdout || false;
  const child = spawn(command, args);
  const stream = through(data => stream.emit('data', data));

  let stdout = '';
  child.stdout.on('data', data => {
    if (storeStdout === true) stdout += data;
    stream.write(data);
  });

  let stderr = '';
  child.stderr.on('data', data => stderr += data);

  child.on('close', code => {
    stream.emit('end');
    if (code > 0) return stream.emit('error', stderr);
    if (!!cbSuccess) cbSuccess(stdout);
  });

  return stream;
}
This is invoked by functions such as:
function extractPage(pathname, page) {
  const internalRes = 96;
  const downScaleFactor = 1;

  return spawnStream(PATH_TO_GS, [
    '-q',
    '-sstdout=%stderr',
    '-dBATCH',
    '-dNOPAUSE',
    '-sDEVICE=pngalpha',
    `-r${internalRes}`,
    `-dDownScaleFactor=${downScaleFactor}`,
    `-dFirstPage=${page}`,
    `-dLastPage=${page}`,
    '-sOutputFile=%stdout',
    pathname
  ]);
}
which is consumed, for example, like this:
it('given a pdf pathname and page number, returns the image as a stream', () => {
  const document = path.resolve(__dirname, 'samples', 'document.pdf');

  const test = new Promise((resolve, reject) => {
    const imageBlob = extract(document, 1);
    imageBlob.on('data', data => {
      // do nothing in this test
    });
    imageBlob.on('end', () => resolve(true));
    imageBlob.on('error', err => reject(err));
  });

  return Promise.all([expect(test).to.eventually.equal(true)]);
});
When this is interrupted, for example if the test times out or an unhandled error occurs, the child process doesn't seem to receive any signal and survives. It's a bit confusing, as no individual operation is particularly complex and yet the process appears to survive indefinitely, using 100% of CPU.
☁ ~ ps aux | grep gs | head -n 5
rwick 5735 100.0 4.2 3162908 699484 s000 R 12:54AM 6:28.13 gs -q -sstdout=%stderr -dBATCH -dNOPAUSE -sDEVICE=pngalpha -r96 -dDownScaleFactor=1 -dFirstPage=3 -dLastPage=3 -sOutputFile=%stdout /Users/rwick/projects/xan-desk/test/samples/document.pdf
rwick 5734 100.0 4.2 3171100 706260 s000 R 12:54AM 6:28.24 gs -q -sstdout=%stderr -dBATCH -dNOPAUSE -sDEVICE=pngalpha -r96 -dDownScaleFactor=1 -dFirstPage=2 -dLastPage=2 -sOutputFile=%stdout /Users/rwick/projects/xan-desk/test/samples/document.pdf
rwick 5733 100.0 4.1 3154808 689000 s000 R 12:54AM 6:28.36 gs -q -sstdout=%stderr -dBATCH -dNOPAUSE -sDEVICE=pngalpha -r96 -dDownScaleFactor=1 -dFirstPage=1 -dLastPage=1 -sOutputFile=%stdout /Users/rwick/projects/xan-desk/test/samples/document.pdf
rwick 5732 100.0 4.2 3157360 696556 s000 R 12:54AM 6:28.29 gs -q -sstdout=%stderr -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=%stdout /Users/rwick/projects/xan-desk/test/samples/document.pdf /Users/rwick/projects/xan-desk/test/samples/page.pdf
I thought about using a timer to send a kill signal to the child, but picking an arbitrary interval to kill a process seems like it would effectively be trading a known problem for an unknown one and kicking that can down the road.
I would really appreciate any insight into what I'm missing here. Is there a better option to encapsulate child processes so the termination of the parent is more likely to precipitate the child's interrupt?

Listen to the child's error event and clean up there:
child.on('error', function (err) {
  console.error(err);
  try {
    // child.kill() or child.disconnect()
  } catch (e) {
    console.error(e);
  }
});
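The error event alone won't fire when the parent itself is interrupted, though. One pattern is to keep a registry of spawned children and kill them when the parent goes down. A minimal sketch (the registry, signal list and register() helper are illustrative, not part of the original module):
const children = new Set();

function register (child) {
  children.add(child);
  child.on('close', () => children.delete(child));
}

function killAll () {
  for (const child of children) {
    try { child.kill('SIGTERM'); } catch (e) { /* already gone */ }
  }
}

// Kill any surviving children when the parent exits or is interrupted.
process.on('exit', killAll);
['SIGINT', 'SIGTERM', 'uncaughtException'].forEach(event => {
  process.on(event, () => {
    killAll();
    process.exit(1);
  });
});
Inside spawnStream you could call register(child) right after spawn(command, args), so each Ghostscript process gets a kill signal when the test runner dies.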

Related

Pipe spawned process stdout to function on flush

My goal is to spawn another binary file in a child process, then handle its stdout on a line-by-line basis (and do some processing against each line). For testing I'm using Node. To do this I tried a readable & writable stream, but a type error is thrown saying "The argument 'stdio' is invalid. Received Writable".
const rs = new stream.Readable();
const ws = new stream.Writable();

const child = cp.spawn("node", [], {
  stdio: [process.stdin, ws, process.stderr]
});

let count = 0;
ws.on("data", (data) => {
  console.log(data, count);
});
Anyone have any ideas?
One way is to use the stdio streams returned by spawn and manually pipe them:
const child = cp.spawn("node", []);

process.stdin.pipe(child.stdin);
child.stdout.pipe(ws); // note: ws must implement _write() to consume the data
child.stderr.pipe(process.stderr);

let count = 0;
child.stdout.on("data", (data) => {
  // 'data' is emitted by the Readable side (child.stdout), not by the Writable ws
  console.log(data.toString(), count++);
});
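Since the original goal was to handle the child's stdout line by line, here is a small sketch using Node's built-in readline module on child.stdout (the spawned script name is made up for illustration):
const cp = require("child_process");
const readline = require("readline");

const child = cp.spawn("node", ["some-script.js"]);

// readline splits the child's stdout into complete lines for us
const rl = readline.createInterface({ input: child.stdout });

let count = 0;
rl.on("line", (line) => {
  // process each complete line of the child's stdout here
  console.log(count++, line);
});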

ffmpeg running in cloudfunction silently fails/never finishes

I am trying to implement a Cloud Function which runs ffmpeg on a Google Cloud Storage bucket upload. I have been playing with a script based on https://kpetrovi.ch/2017/11/02/transcoding-videos-with-ffmpeg-in-google-cloud-functions.html
The original script needed a little tuning, as the library has evolved a bit. My current version is here:
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
const ffmpeg = require('fluent-ffmpeg');
const ffmpeg_static = require('ffmpeg-static');

console.log("Linking ffmpeg path to:", ffmpeg_static)
ffmpeg.setFfmpegPath(ffmpeg_static);

exports.transcodeVideo = (event, callback) => {
  const bucket = storage.bucket(event.bucket);
  console.log(event);

  if (event.name.indexOf('uploads/') === -1) {
    console.log("File " + event.name + " is not to be processed.")
    return;
  }

  // ensure that you only proceed if the file is newly created
  if (event.metageneration !== '1') {
    callback();
    return;
  }

  // Open write stream to new bucket, modify the filename as needed.
  const targetName = event.name.replace("uploads/", "").replace(/[.][a-z0-9]+$/, "");
  console.log("Target name will be: " + targetName);

  const remoteWriteStream = bucket.file("processed/" + targetName + ".mp4")
    .createWriteStream({
      metadata: {
        //metadata: event.metadata, // You may not need this, my uploads have associated metadata
        contentType: 'video/mp4', // This could be whatever else you are transcoding to
      },
    });

  // Open read stream to our uploaded file
  const remoteReadStream = bucket.file(event.name).createReadStream();

  // Transcode
  ffmpeg()
    .input(remoteReadStream)
    .outputOptions('-c:v copy') // Change these options to whatever suits your needs
    .outputOptions('-c:a aac')
    .outputOptions('-b:a 160k')
    .outputOptions('-f mp4')
    .outputOptions('-preset fast')
    .outputOptions('-movflags frag_keyframe+empty_moov')
    // https://github.com/fluent-ffmpeg/node-fluent-ffmpeg/issues/346#issuecomment-67299526
    .on('start', (cmdLine) => {
      console.log('Started ffmpeg with command:', cmdLine);
    })
    .on('end', () => {
      console.log('Successfully re-encoded video.');
      callback();
    })
    .on('error', (err, stdout, stderr) => {
      console.error('An error occurred during encoding', err.message);
      console.error('stdout:', stdout);
      console.error('stderr:', stderr);
      callback(err);
    })
    .pipe(remoteWriteStream, { end: true }); // end: true, emit end event when readable stream ends
};
This version correctly runs and I can see this in logs:
2020-06-16 21:24:22.606 Function execution took 912 ms, finished with status: 'ok'
2020-06-16 21:24:52.902 Started ffmpeg with command: ffmpeg -i pipe:0 -c:v copy -c:a aac -b:a 160k -f mp4 -preset fast -movflags frag_keyframe+empty_moov pipe:1
It seems the function execution ends before the actual ffmpeg command, which then never finishes.
Is there a way to make the ffmpeg "synchronous" or "blocking" so that it finishes before the function execution?
From the Google Cloud documentation it seems the function should accept three arguments: (data, context, callback). Have you tried this, or do you know that context is optional? From the docs, a function that accepts three arguments is treated as a background function; one that accepts only two arguments is treated as a background function only if it returns a Promise.
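For reference, a minimal sketch of the two shapes described above (doTheWork and doTheWorkReturningPromise are placeholders, not real APIs):
// Three arguments: callback style.
exports.transcodeVideo = (data, context, callback) => {
  doTheWork(() => callback()); // signal completion explicitly
};

// Two arguments: return a Promise instead.
exports.transcodeVideo = (data, context) => {
  return doTheWorkReturningPromise(); // resolves when the work is done
};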
Beyond that, a couple of other points:
1: in this branch no callback is called; if your function exited right after that log line in your tests, that's another hint that invoking the second argument as a callback is a required step to let the process finish:
if (event.name.indexOf('uploads/') === -1) {
  console.log("File " + event.name + " is not to be processed.")
  return;
}
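A minimal fix for that branch, if you keep the callback-style signature, would be to signal completion before returning:
if (event.name.indexOf('uploads/') === -1) {
  console.log("File " + event.name + " is not to be processed.")
  return callback(); // let the platform know the invocation is finished
}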
2: I would suggest adding a few more console.log calls (or many more, if you prefer) to clarify the flow: your question includes only one application log line, which doesn't tell us much beyond the fact that it was logged after the system log line.
3: the link you used as a tutorial is almost three years old; it could be that Google Cloud has changed its interface in the meantime.
That said, if accepting three arguments rather than only two doesn't solve your problem, you can try converting your function to return a Promise:
exports.transcodeVideo = (event, callback) => new Promise((resolve, reject) => {
  const bucket = storage.bucket(event.bucket);
  console.log(event);

  if (event.name.indexOf('uploads/') === -1) {
    console.log("File " + event.name + " is not to be processed.")
    return resolve(); // or reject if this is an error case
  }

  // ensure that you only proceed if the file is newly created
  if (event.metageneration !== '1') {
    return resolve(); // or reject if this is an error case
  }

  // Open write stream to new bucket, modify the filename as needed.
  const targetName = event.name.replace("uploads/", "").replace(/[.][a-z0-9]+$/, "");
  console.log("Target name will be: " + targetName);

  const remoteWriteStream = bucket.file("processed/" + targetName + ".mp4")
    .createWriteStream({
      metadata: {
        //metadata: event.metadata, // You may not need this, my uploads have associated metadata
        contentType: 'video/mp4', // This could be whatever else you are transcoding to
      },
    });

  // Open read stream to our uploaded file
  const remoteReadStream = bucket.file(event.name).createReadStream();

  // Transcode
  ffmpeg()
    .input(remoteReadStream)
    .outputOptions('-c:v copy') // Change these options to whatever suits your needs
    .outputOptions('-c:a aac')
    .outputOptions('-b:a 160k')
    .outputOptions('-f mp4')
    .outputOptions('-preset fast')
    .outputOptions('-movflags frag_keyframe+empty_moov')
    // https://github.com/fluent-ffmpeg/node-fluent-ffmpeg/issues/346#issuecomment-67299526
    .on('start', (cmdLine) => {
      console.log('Started ffmpeg with command:', cmdLine);
    })
    .on('end', () => {
      console.log('Successfully re-encoded video.');
      resolve();
    })
    .on('error', (err, stdout, stderr) => {
      console.error('An error occurred during encoding', err.message);
      console.error('stdout:', stdout);
      console.error('stderr:', stderr);
      reject(err);
    })
    .pipe(remoteWriteStream, { end: true }); // end: true, emit end event when readable stream ends
});
Hope this helps.

How to make a certain number of functions run parallel in loop in NodeJs?

I'm looking for a way to run the same function 3 times at once in a loop, wait until they finish, and then run the next 3. I think it involves a loop and the Promise API, but my solution fails. It would be great if you could tell me what I did wrong and how to fix it.
Here is what I have done so far:
I have a download function (called downloadFile), an on-hold function (called runAfter) and a multi-download function (called downloadList). They look like this:
const https = require('https')
const fs = require('fs')
const { join } = require('path')
const chalk = require('chalk') // NPM
const mime = require('./MIME') // A small module that reads JSON and turns it into an object. It returns a file extension string.

exports.downloadFile = url => new Promise((resolve, reject) => {
  const req = https.request(url, res => {
    console.log('Accessing:', chalk.greenBright(url))
    console.log(res.statusCode, res.statusMessage)
    // console.log(res.headers)

    const ext = mime(res)
    const name = url
      .replace(/\?.+/i, '')
      .match(/[\ \w\.-]+$/i)[0]
      .substring(0, 250)
      .replace(`.${ext}`, '')
    const file = `${name}.${ext}`
    const stream = fs.createWriteStream(join('_DLs', file))

    res.pipe(stream)
    res.on('error', reject)

    stream
      .on('open', () => console.log(
        chalk.bold.cyan('Download:'),
        file
      ))
      .on('error', reject)
      .on('close', () => {
        console.log(chalk.bold.cyan('Completed:'), file)
        resolve(true)
      })
  })
  req.on('error', reject)
  req.end()
})

exports.runAfter = (ms, url) => new Promise((resolve, reject) => {
  setTimeout(() => {
    this.downloadFile(url)
      .then(resolve)
      .catch(reject)
  }, ms);
})

/* The list param is Array<String> only */
exports.downloadList = async (list, options) => {
  const opt = Object.assign({
    thread: 3,
    delayRange: {
      min: 100,
      max: 1000
    }
  }, options)

  // PROBLEM
  const multiThread = async (pos, run) => {
    const threads = []
    for (let t = pos; t < opt.thread + t; t++) threads.push(run(t))
    return await Promise.all(threads)
  }

  const inQueue = async run => {
    for (let i = 0; i < list.length; i += opt.thread)
      if (opt.thread > 1) await multiThread(i, run)
      else await run(i)
  }

  const delay = range => Math.floor(
    Math.random() * (new Date()).getHours() *
    (range.max - range.min) + range.min
  )

  inQueue(i => this.runAfter(delay(opt.delayRange), list[i]))
}
downloadFile will download anything from the link given. runAfter delays a random number of milliseconds before executing downloadFile. downloadList receives a list of URLs and passes each of them to runAfter to download. And that (downloadList) is where the trouble begins.
If I just pass the whole list through a simple loop and download a single file at a time, it's easy. But with a large request, like a list of 50 URLs, it would take a long time. So I decided to run 3 - 5 downloadFile calls in parallel instead of one. I was thinking of using async/await and Promise.all to solve the problem. However, it crashes. Below is the Node.js report:
<--- Last few GCs --->
[4124:01EF5068] 75085 ms: Scavenge 491.0 (493.7) -> 490.9 (492.5) MB, 39.9 / 0.0 ms (average mu = 0.083, current mu = 0.028) allocation failure
[4124:01EF5068] 75183 ms: Scavenge 491.4 (492.5) -> 491.2 (493.2) MB, 29.8 / 0.0 ms (average mu = 0.083, current mu = 0.028) allocation failure
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 00B879E7]
Security context: 0x03b40451 <JSObject>
1: multiThread [04151355] [<project folder>\inc\Downloader.js:~62] [pc=03C87FBF](this=0x03cfffe1 <JSGlobal Object>,0,0x041512d9 <JSFunction (sfi = 03E2E865)>)
2: inQueue [041513AD] [<project folder>\inc\Downloader.js:70] [bytecode=03E2EA95 offset=62](this=0x03cfffe1 <JSGlobal Object>,0x041512d9 ...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
Writing Node.js report to file: report.20200428.000236.4124.0.001.json
Node.js report completed
Apparently a sub-function of downloadList (multiThread) is the cause, but I couldn't read those numbers (they look like memory addresses or something), so I have no idea how to fix it. I'm not a professional engineer, so I would appreciate a good explanation.
Additional information:
NodeJs version: 12.13.1
Localhost: Aspire SW3-013 > 1.9GB (2GB in spec) / Intel Atom CPU Z3735F
Connecting to the Internet via WiFi (Realtek driver)
OS: Windows 10 (no other choice)
In case you might ask:
Why wrap downloadFile in a Promise? For further applications, e.g. so I can drop it into another app that only requires one download at a time.
Is runAfter important? Maybe not, it's just a little challenge for myself. But it could be useful if a server requires delays between downloads.
Homework or business? Neither, this is just a hobby. I plan to build an app to fetch and download images from the Unsplash API. So I'd prefer a good explanation of what I did wrong and how to fix it rather than code that simply works.
Your for-loop in multiThread never ends because your continuation condition is t < opt.thread + t. This will always be true if opt.thread is not zero. You will have an infinite loop here, and that's the cause of your crash.
I suspect you wanted to do something like this:
const multiThread = async (pos, run) => {
  const threads = [];
  for (let t = 0; t < opt.thread && pos + t < list.length; t++) {
    threads.push(run(pos + t));
  }
  return await Promise.all(threads);
};
The difference here is that the continuation condition for the loop should be limiting itself to a maximum of opt.thread times, and also not going past the end of the number of entries in the list array.
If the list variable isn't global (ie, list.length is not available in the multiThread function), then you can leave out the second part of the condition and just handle it in the run function like this so that any values of i past the end of the list are ignored:
inQueue(i => {
  if (i < list.length) this.runAfter(delay(opt.delayRange), list[i])
})
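More generally, the "run at most N at a time" pattern the question asks about can be written as a small chunking helper. A minimal sketch, independent of the downloader itself (runInBatches and its parameters are illustrative names):
// Run `worker` over `items`, at most `batchSize` at a time.
async function runInBatches (items, worker, batchSize = 3) {
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize)            // next chunk of work
    await Promise.all(batch.map(item => worker(item)))     // wait for the whole chunk
  }
}

// e.g. await runInBatches(list, url => exports.downloadFile(url), opt.thread)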

Exit stream manually before `on('exit')` event is reached while using filewalker

I am using the filewalker npm package to walk through files.
I am trying to limit the number of files read from the stream by exiting the stream once a specific condition is met (e.g. 5 file paths have been streamed), rather than waiting for the exit event. (This is because of the huge number of files; I want to paginate the stream.)
getFilePath: (dirPath, fileMatchExpression) => {
  return new Promise((resolve, reject) => {
    filewalker(dirPath)
      .on('file', filePath => {
        if (filePath.match(fileMatchExpression)) {
          resolve(filePath)
          // how to force exit on this line?
        }
      })
      .on('error', err => reject(err))
      .on('done', _ => console.log('DONE!!'))
      .walk()
  })
}
Is there a way to cancel/exit stream by manually?
While this is not an answer to the question, I solved the issue by replacing the filewalker library with walk. It does basically the same thing, except that it exposes a next function, which lets me control whether the walk continues or not.
const walk = require('walk')

let walker = walk.walk(dataDirPath, { followingLinks: false })
let counter = 0

walker.on('file', (root, stat, next) => {
  console.log(root + '/' + stat.name);
  counter++
  // next() is only called while the condition holds, so the walk stops once it fails
  if (counter < 1) next()
})
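To tie this back to the original getFilePath shape, here is a sketch of a paginated version built on walk (assuming walk's file/errors/end events; the limit parameter and names are illustrative):
const walk = require('walk')

// Resolve with up to `limit` matching file paths, then stop walking.
function getFilePaths (dirPath, fileMatchExpression, limit = 5) {
  return new Promise((resolve, reject) => {
    const matches = []
    const walker = walk.walk(dirPath, { followLinks: false })

    walker.on('file', (root, stat, next) => {
      const filePath = root + '/' + stat.name
      if (filePath.match(fileMatchExpression)) matches.push(filePath)
      if (matches.length < limit) next()   // keep walking
      else resolve(matches)                // stop: never calling next() halts the walk
    })
    walker.on('errors', (root, stats, next) => next())
    walker.on('end', () => resolve(matches))
  })
}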

Node.js Serialport synchronous write-read

Does anyone have any example code to use the node.js serialport module in a blocking/synchronous way?
What I am trying to do is send a command to a micro-controller and wait for the response before sending the next command.
I have the sending/receiving working but the data just comes in with the listener
serial.on("data", function (data) {
  console.log(data);
});
Is there a way to wait for the returned data after doing a
serial.write("Send Command");
Should I be setting a global flag or something?
I am still new to the async programming style of node.js
Thanks
There is no such option and it's actually not necessary. One way of doing this is to maintain a queue of commands. Something like this:
function Device (serial) {
  this._serial = serial;
  this._queue = [];
  this._busy = false;
  this._current = null;

  var device = this;
  serial.on('data', function (data) {
    if (!device._current) return;
    device._current[1](null, data);
    device.processQueue();
  });
}

Device.prototype.send = function (data, callback) {
  this._queue.push([data, callback]);
  if (this._busy) return;
  this._busy = true;
  this.processQueue();
};

Device.prototype.processQueue = function () {
  var next = this._queue.shift();
  if (!next) {
    this._busy = false;
    return;
  }
  this._current = next;
  this._serial.write(next[0]);
};
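Usage would then look something like this (the command strings are made up for the example):
var device = new Device(serial);

// Each command waits for the previous response before being written.
device.send('getTemp\n', function (err, data) {
  console.log('temperature response:', data.toString());
});
device.send('getHumidity\n', function (err, data) {
  console.log('humidity response:', data.toString());
});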
This can now be done using the serialport-synchronous library in npm.
Consider the following serial port flow:
1. << READY
2. >> getTemp
3. << Received: getTemp
4. << Temp: 23.11
We can get the temperature value with the following code:
import { SerialPortController } from 'serialport-synchronous'

const TEMP_REGEX = /^Temp: (\d+\.\d+)$/
const ERROR_REGEX = /^ERROR$/
const READY_REGEX = /^READY$/

const controller = new SerialPortController({
  path: '/dev/ttyUSB0',
  baudRate: 19200,
  handlers: [{
    pattern: READY_REGEX,
    callback: main // call the main() function when READY_REGEX has matched.
  }]
})

// push the log events from the library to the console
controller.on('log', (log) => console[log.level.toLowerCase()](`${log.datetime.toISOString()} [${log.level.toUpperCase()}] ${log.message}`))

// open the serial port connection
controller.open()

async function main () {
  try {
    // send the getTemp text to the serialport
    const result = await controller.execute({
      description: 'Querying current temperature', // optional, used for logging purposes
      text: 'getTemp',          // mandatory, the text to send
      successRegex: TEMP_REGEX, // mandatory, the regex required to resolve the promise
      bufferRegex: TEMP_REGEX,  // optional, the regex match required to buffer the response
      errorRegex: ERROR_REGEX,  // optional, the regex match required to reject the promise
      timeoutMs: 1000           // mandatory, the maximum time to wait before rejecting the promise
    })

    // parse the response to extract the temp value
    const temp = result.match(TEMP_REGEX)[1]
    console.log(`\nThe temperature reading was ${temp}c`)
  } catch (error) {
    console.error('Error occurred querying temperature')
    console.error(error)
  }
}
Output looks something like this:
2022-07-20T01:33:56.855Z [INFO] Connection to serial port '/dev/ttyUSB0' has been opened
2022-07-20T01:33:58.391Z [INFO] << READY
2022-07-20T01:33:58.392Z [INFO] Inbound message matched unsolicited handler pattern: /^READY$/. Calling custom handler function
2022-07-20T01:33:58.396Z [INFO] Querying current temperature
2022-07-20T01:33:58.397Z [INFO] >> [TEXT] getTemp
2022-07-20T01:33:58.415Z [INFO] << Received: getTemp
2022-07-20T01:33:58.423Z [INFO] << Temp: 23.11
2022-07-20T01:33:58.423Z [DEBUG] Received expected response, calling resolve handler
The temperature reading was 23.11c
