node.js run function in child process? - javascript

I have a node.js application that receives a file via a web request and then applies a conversion process to it. Since the task is long-running, it needs to run separately from the main thread.
At the moment I just call the necessary code via a setTimeout() call. Because the conversion is long-running, I would like to move it out into a child process so that the main application is isolated from the work being done (am I worrying too much?). At the moment I am calling:
const execFile = require('child_process').execFile;
// execFile expects the arguments as an array, not as a single string
const child = execFile('node', ['./myModule.js'], (error, stdout, stderr) => {
  if (error) {
    throw error;
  }
  console.log(stdout);
});
Is this the right approach in node.js, or is there a way of simply starting a child process with the module and params specified, without having to specify 'node' as the executable?

I have just seen that node.js provides the fork function for executing modules, though the modules need to be written as if they were expecting command-line arguments, which they read from the process.argv array.
The call signature is:
child_process.fork(modulePath[, args][, options])
More details here.
In my specific case forking probably doesn't make sense, since there is already a fork being made by the node.js library I am using.
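As a rough sketch of the fork() call described above: fork() starts a new Node process for the given module, without naming 'node' as the executable, and opens a message channel between parent and child. The worker contents, the function name and the message shape are made up for illustration, and the worker is written to a temporary directory only to keep the sketch self-contained; in a real project it would be an ordinary module file.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker module. It reads its input from process.argv and
// reports back to the parent over the IPC channel that fork() sets up.
const workerSource = [
  'const inputName = process.argv[2];',
  '// ...the long-running conversion would happen here...',
  'process.send({ converted: inputName.toUpperCase() });',
].join('\n');

function convertInChild(inputName) {
  // Write the illustrative worker to a temporary file so it can be forked.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-demo-'));
  const workerPath = path.join(dir, 'worker.js');
  fs.writeFileSync(workerPath, workerSource);

  return new Promise((resolve, reject) => {
    // fork() runs the module with the same Node binary as the parent.
    const child = fork(workerPath, [inputName]);
    child.once('message', resolve);
    child.once('error', reject);
    child.once('exit', (code) => {
      fs.rmSync(dir, { recursive: true, force: true });
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

convertInChild('report.txt').then((result) => console.log(result));
```

The parent stays responsive while the worker runs, and a crash in the worker surfaces as a rejected promise rather than taking the main process down.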

Related

How to run child_process.exec correctly on an ajax request?

There is a server, that I have an access to, but do not have ownership on. It serves a node js / express application on a default port 3000. There are several scripts, that are usually run either manually from the terminal or by cron job. What I want to do is to have a button on the client-side and make an ajax request to a certain route and execute a node js command with inline arguments. For example:
node script.js 123
All routes are set and working. I have a CliController file that handles requests and has to run the command above. Currently I am using the following code:
cp.exec(`node script.js ${ip}`, function (err, stdout, stderr) {
  if (err) {
    console.log(err);
  }
  console.log(stdout);
  console.log(stderr);
});
The script.js file is in root folder of the project, but the project itself was built by using express-generator and is being served using node bin/www command. There is a service/process on the server that runs nodemon to restart this project if it fails as well. Therefore I do not have access to output of that particular process.
If I run the command above in the terminal (from the root folder of the project, to be precise), it works fine and I see the output of the script. But if I press the button on the webpage to make a request, I am pretty sure that the script does not execute, because it has to make an update to database and I do not see any changes. I also tried to use child_process.spawn and child_process.fork and failed to get it working.
I also tried to kill nodemon and quickly start the project again to see the console output. If I do this, everything works.
What am I doing wrong?
The invoked process may be blocking, in which case the parent script simply waits for the child process to terminate or to return something.
You can avoid this behaviour directly in the shell command by appending & (the ampersand control operator).
This runs the command in the background. (Note that you can still control the child processes using their PIDs and POSIX signals; that is another subject, but closely related, and you may find it handy soon.)
Also note that killing or stopping the parent script will kill the children as well. This can be avoided by using nohup.
This is a feature of bash rather than of JavaScript or node.js, so it works with anything run from the shell.
cp.exec(`node script.js ${ip} &`, function (err, stdout, stderr) {
  if (err) {
    console.log(err);
  }
  console.log(stdout);
  console.log(stderr);
});
Bash reference manual
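For comparison, the same "run it in the background" idea can be sketched with Node's own spawn() options instead of shell syntax. The detached and stdio options and child.unref() are part of the child_process API; the helper name and the placeholder command are made up here. Passing the arguments as an array also avoids interpolating the request value into a shell command string.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: start a command without keeping the parent waiting on it.
function startInBackground(command, args) {
  const child = spawn(command, args, {
    detached: true,  // let the child keep running independently of the parent
    stdio: 'ignore', // do not keep pipes open back to the parent
  });
  // Report spawn failures (for example, command not found) instead of
  // crashing on an unhandled 'error' event.
  child.on('error', (err) => console.error('failed to start:', err.message));
  child.unref(); // the parent's event loop no longer waits for this child
  return child.pid;
}

// Placeholder for `node script.js 123`: a short-lived Node process.
const pid = startInBackground(process.execPath, ['-e', 'setTimeout(() => {}, 200)']);
console.log('started background process', pid);
```

With stdio set to 'ignore' the script's output is discarded, so a script started this way would have to write its own log file if its output matters.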

Firebase Cloud Functions to run Python script

I have an Express application that spawns a Python process to execute a Python script. When I do a firebase serve, I can see that my endpoint is being hit, which then runs the Python process. However, the process doesn't seem to be executing.
const runPythonScript = () => {
  return new Promise((resolve, reject) => {
    let value;
    const spawn = require('child_process').spawn;
    const pythonProcess = spawn('python', ['./myScript.py']);
    pythonProcess.stdout.on('data', (data: string) => {
      console.log('Am I being hit?') // This line is not being hit
      value = JSON.parse(data);
    });
    pythonProcess.on('exit', (code: number) => {
      if (code === 0) {
        resolve(value);
      }
      else {
        reject(value);
      }
    });
  });
}
From the comment in the code above, the listener for stdout 'data' is not being hit. I'm not too familiar with Firebase, but my idea is to use Firebase Hosting for my frontend and then Firebase Cloud Functions to run my Express server. Is there anything that I need to do in order for my application to run the Python script?
From what I've gathered from other StackOverflow posts (here), I can't run a Python process, perhaps because Firebase Cloud Functions does not have Python installed. So instead, I need to package my Python script into an executable (as described here), so that Firebase Cloud Functions can just run the executable. Is this correct? If so, I would prefer not to have to package all of my Python scripts. Is there a better approach to handling this? Is it free?
"From what I've gathered from other StackOverflow posts, I can't run a Python process, perhaps because Firebase Cloud Functions does not have Python installed."
This is true.
"So instead, I need to package my Python script into an executable (as described here), so that Firebase Cloud Functions can just run the executable. Is this correct?"
You can certainly try this, but I don't recommend it. It sounds like a lot of work for little benefit, especially when you have other options.
"Is there a better approach to handling this?"
You can write Cloud Functions natively in Python. You just won't be able to use Firebase tools to test and deploy them. Google Cloud has everything you need to get started.
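One way to check from the Node side whether the interpreter can be started at all: when the executable is missing, spawn() produces no stdout data, and the child object emits an 'error' event instead. The sketch below listens for that event and also collects stdout chunks until the streams close before parsing them. The helper name is made up, and a Node one-liner stands in for the Python script so that the sketch runs without Python installed.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run a command and parse its stdout as JSON.
function runAndParseJson(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const chunks = [];
    child.stdout.on('data', (chunk) => chunks.push(chunk));
    // Emitted when the process could not be started (for example, ENOENT
    // when the executable is not installed).
    child.on('error', reject);
    child.on('close', (code) => {
      if (code !== 0) return reject(new Error(`exited with code ${code}`));
      try {
        resolve(JSON.parse(Buffer.concat(chunks).toString()));
      } catch (err) {
        reject(err);
      }
    });
  });
}

runAndParseJson(process.execPath, ['-e', 'console.log(JSON.stringify({ ok: true }))'])
  .then((value) => console.log('parsed:', value));

runAndParseJson('no-such-interpreter-for-demo', [])
  .catch((err) => console.log('could not start process:', err.code));
```

In the code from the question, a missing interpreter would show up only through this 'error' event, which has no listener there, so the 'data' handler never runs.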

Will spawning a python script twice from NodeJS mess with the stdout.flush returned values?

I have a route on my NodeJS server which will spawn a python script like so:
const pythonProcess = spawn("python3",["/var/www/ip/uploads/script.py", project_num])
I then listen for a stdout flush like so:
pythonProcess.stdout.on('data', (data) => {
})
If the route is called by 2 different clients at roughly the same time, will they interfere with each other? I have seen some cases of the 2 clients getting the same data returned by stdout, when it should have been different. This might be caused by my python script, but I also have a hunch that getting data from stdout can be a bit shoddy.
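Each spawn() call creates a separate process with its own stdout pipe, so two requests do not share a stream on the Node side. What can go wrong is the handling around it: a 'data' event carries an arbitrary chunk rather than a complete message, and any variable shared between requests can be overwritten. A minimal sketch that keeps all state local to one call, with a Node one-liner standing in for script.py (the helper name and the output text are illustrative):

```javascript
const { spawn } = require('child_process');

// Stand-in for script.py: prints a line that depends on its argument.
const childCode = 'console.log("result for " + process.argv[1])';

function runForProject(projectNum) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childCode, String(projectNum)]);
    let output = ''; // local to this call, never shared between requests
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { output += chunk; });
    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) resolve(output.trim());
      else reject(new Error(`exited with code ${code}`));
    });
  });
}

// Two overlapping runs, as if two clients hit the route at the same time.
const both = Promise.all([runForProject(1), runForProject(2)]);
both.then((results) => console.log(results));
```

If both clients still receive the same data with this structure, the cause is more likely in the Python script or in something it shares, such as a file it writes to.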

Node.js Spawn vs. Execute

In an online training video I am watching to learn Node, the narrator says that "spawn is better for longer processes involving large amounts of data, whereas execute is better for short bits of data."
Why is this? What is the difference between the child_process spawn and execute functions in Node.js, and when do I know which one to use?
The main difference is that spawn is more suitable for long-running processes with large output, because spawn streams input/output with the child process. exec, on the other hand, buffers output in a small buffer (200 KB by default when this was written; newer Node.js versions default to 1 MB). exec first spawns a subshell and then tries to execute your process in it. In short, use spawn when you need a lot of data streamed from a child process, and exec when you need shell features such as pipes, redirects, or running more than one program at a time.
Some useful links - DZone Hacksparrow
child process created by spawn():
- does not spawn a shell
- streams the data returned by the child process (data flow is constant)
- has no data transfer size limit
child process created by exec():
- does spawn a shell in which the passed command is executed
- buffers the data (waits until the process closes and transfers the data in one chunk)
- the default maximum data transfer was 200 KB before Node.js v12.x and has been 1 MB (by default) since Node.js v12.x
- main.js (file)
const { spawn, exec } = require('child_process');
// 'node' is an executable command (can be executed without a shell)
// uses streams to transfer data (spawnedChild.stdout)
const spawnedChild = spawn('node', ['module.js']);
spawnedChild.stdout.on('data', function (msg) {
  console.log(msg.toString());
});
// 'node module.js' runs in the spawned shell
// transferred data is handled in the callback function
const execChild = exec('node module.js', function (err, stdout, stderr) {
  console.log(stdout);
});
- module.js (basically prints a message every second for 5 seconds, then exits)
let count = 0;
const interval = setInterval(function () {
  console.log('module data');
  // stop after five messages instead of reading the timer's private _idleStart field
  count += 1;
  if (count >= 5) clearInterval(interval);
}, 1000);
The spawn() child process returns the message module data every second for 5 seconds, because the data is 'streamed'.
The exec() child process returns a single message, module data module data module data module data module data, after 5 seconds (when the process has closed), because the data is 'buffered'.
NOTE: neither spawn() nor exec() is designed specifically for running Node modules; this demo only shows the difference. If you want to run Node modules as child processes, use the fork() method instead.
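The buffer limit mentioned above can be observed directly. In this sketch the limit is lowered on purpose through the maxBuffer option so that the effect shows up with a tiny output. execFile() is used in place of exec() only to avoid shell quoting; it buffers output in the same way. The helper names and the 5000-byte payload are illustrative.

```javascript
const { execFile, spawn } = require('child_process');

// Child command: a Node one-liner that writes 5000 bytes to stdout.
const printer = ['-e', 'process.stdout.write("x".repeat(5000))'];

// Buffered: the callback gets everything at once, or an error if the
// output does not fit into maxBuffer bytes.
function buffered(maxBuffer) {
  return new Promise((resolve) => {
    execFile(process.execPath, printer, { maxBuffer }, (err) => {
      resolve({ failed: Boolean(err), message: err ? err.message : null });
    });
  });
}

// Streamed: chunks are handled as they arrive, so no size limit applies.
function streamed() {
  return new Promise((resolve, reject) => {
    let bytes = 0;
    const child = spawn(process.execPath, printer);
    child.stdout.on('data', (chunk) => { bytes += chunk.length; });
    child.on('error', reject);
    child.on('close', () => resolve(bytes));
  });
}

buffered(1024).then((r) => console.log('buffered with a 1 KB limit:', r));
streamed().then((n) => console.log('streamed bytes:', n));
```

With the default limit the buffered call succeeds for this payload; it is only outputs larger than the limit that make the callback receive an error.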
A good place to start is the NodeJS documentation.
For 'spawn' the documentation states:
The child_process.spawn() method spawns a new process using the given command, with command line arguments in args. If omitted, args defaults to an empty array.
While for 'exec':
Spawns a shell then executes the command within that shell, buffering any generated output. The command string passed to the exec function is processed directly by the shell and special characters (vary based on shell) need to be dealt with accordingly.
The main thing appears to be whether you need to handle the output of the command as it arrives, which I imagine could be the factor impacting performance (I haven't compared). If you care only about process completion, then exec would be your choice. spawn opens streams for stdout and stderr that emit 'data' events; exec just hands its callback the complete stdout and stderr as strings.
A quote from the official docs:
For convenience, the child_process module provides a handful of synchronous and asynchronous alternatives to child_process.spawn() and child_process.spawnSync(). Each of these alternatives are implemented on top of child_process.spawn() or child_process.spawnSync().

How Can I write an AWS Lambda Script that Runs a Protractor / Selenium Browser Automation Script?

I am very much enjoying AWS Lambda functions, and I'm wondering if what I want to do here is possible. On my local machine, I have a Protractor config file :
// conf.js
exports.config = {
  framework: 'jasmine',
  seleniumAddress: 'http://127.0.0.1:4444/wd/hub',
  specs: ['automation-script.js'],
  capabilities: {
    browserName: 'chrome'
  }
}
and a script that loads up a browser window with a certain url:
describe('Protractor Demo App', function() {
  it('should have a title', function() {
    browser.driver.get('https://github.com/');
    // Click around and do things here.
  });
});
The purpose of my scripts right now is not to black-box test an application that I'm developing, but to automate common browser tasks that I don't feel like doing.
Currently, I'm running the protractor script through my local command shell like this:
protractor protractor.conf.js
I'm wondering if it is possible to run protractor from within another node.js script. My thinking is that I could have the Lambda function kick off a protractor job, possibly by using the browsers available from Browserstack or Sauce Labs, but I can't figure out how to run protractor from a Node.js script.
This is a really interesting question. Our organization has been probing how much of our CI/CD pipeline can be done in a serverless fashion. This is right up that alley.
Unfortunately, I don't think there is an elegant way to run protractor from another Node script. That is, protractor doesn't seem to expose an API that makes it easy to consume in such a manner.
It's been asked for, but (as a relative newcomer to protractor) the comment right before the issue was closed doesn't contain enough detail for me to know how to take that approach. So, the not-so-elegant approach:
Child Process
Prior comments notwithstanding, you can indeed run protractor from within another Node script, including a Node script executing in AWS' Lambda environment. There may be prettier/better ways to do this, but I took this answer and based the following Lambda function on it:
'use strict';
module.exports.runtest = (event, context, callback) => {
  var npm = require('npm');
  var path = require('path');
  var childProcess = require('child_process');
  var args = ['conf.js'];
  npm.load({}, function() {
    var child = childProcess
      .fork(path.join(npm.root, 'protractor/bin/protractor'), args)
      .on('close', function(errorCode) {
        const response = {
          statusCode: 200,
          body: JSON.stringify({
            message: `Selenium Test executed on BrowserStack! Child process Error Code: ${errorCode}`,
          }),
        };
        callback(null, response);
      });
    // Call kill() on the child object; passing child.kill directly would lose its `this`
    process.on('SIGINT', function() { child.kill(); });
  });
};
var args = ['conf.js']; points to the protractor config file, which in turn points to the test (index.js in this case):
exports.config = {
  'specs': ['./index.js'],
  'seleniumAddress': 'http://hub-cloud.browserstack.com/wd/hub',
  'capabilities': {
    'browserstack.user': '<BROWSERSTACK_USER>',
    'browserstack.key': '<BROWSERSTACK_KEY>',
    'browserName': 'chrome'
  }
};
Repository here.
Notes
npm is a runtime dependency using this approach, meaning it has to be packaged into your deployable. This makes for a relatively large lambda function. At ~20mb, it's big enough that you don't get to edit code inline in the AWS console anymore. An approach that didn't package npm as a runtime dependency would be much nicer.
Don't forget Lambda has a hard time limit: 5 minutes when this was written, since raised by AWS to 15 minutes. Your tests will need to complete in less time than that.
Watch the clock. In many instances, my toy example only uses a browser for a couple of seconds, but the overhead (of connecting to BrowserStack, mostly, I presume) makes the Lambda take 12-30 seconds altogether. Paying for 30 seconds of compute to use a browser for 2.5 seconds doesn't sound like a win. Larger batches of tests might be less wasteful.
You do get CloudWatch logging of the child process without doing any extra plumbing yourself, which is nice.
Disclaimer: My example has only been happy-path tested, and I'm no expert on child processes in Node.
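Given the time limit noted above, one option is to stop the child process before the function itself times out. A minimal sketch, assuming a deadline in milliseconds is available (in Lambda, context.getRemainingTimeInMillis() could supply it); the helper name is made up, and a sleeping Node one-liner stands in for the protractor run:

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run a command, but kill it if it outlives the deadline.
function runWithDeadline(command, args, deadlineMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'inherit' });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill(); // sends SIGTERM by default
    }, deadlineMs);
    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });
}

// Placeholder for a slow test run: sleeps for 10 seconds unless killed first.
const slow = runWithDeadline(process.execPath, ['-e', 'setTimeout(() => {}, 10000)'], 500);
slow.then((outcome) => console.log(outcome));
```

The response sent back from the function can then say that the run was cut short instead of the whole invocation failing on the platform timeout.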
