Node.js Spawn vs. Execute - javascript

In an online training video I am watching to learn Node, the narrator says that "spawn is better for longer processes involving large amounts of data, whereas execute is better for short bits of data."
Why is this? What is the difference between the child_process spawn and execute functions in Node.js, and when do I know which one to use?

The main difference is that spawn is more suitable for long-running processes with huge output, because spawn streams input/output with the child process. exec, on the other hand, buffers the output in a small buffer (200 KB by default at the time of writing). exec also first spawns a subshell and then executes your command inside it. To cut a long story short, use spawn when you need a lot of data streamed from a child process, and exec when you need shell features such as pipes, redirects, or running more than one program at a time.
Some useful links - DZone Hacksparrow
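To make the "shell features" point concrete, here is a minimal sketch of exec running a pipeline. The helper name runInShell is made up for illustration, and the command line assumes a POSIX-style shell where echo and wc are available.

```javascript
const { exec } = require('child_process');

// Run a command line through the shell and resolve with its buffered stdout.
// runInShell is a hypothetical helper, not part of the child_process API.
function runInShell(commandLine) {
  return new Promise((resolve, reject) => {
    exec(commandLine, (error, stdout, stderr) => {
      if (error) {
        reject(error);
        return;
      }
      resolve(stdout);
    });
  });
}

// The pipe is interpreted by the shell that exec starts, not by Node.js.
runInShell('echo one two three | wc -w')
  .then((output) => console.log('word count:', output.trim()))
  .catch((error) => console.error('command failed:', error.message));
```

The whole result arrives in one piece in the callback, which is convenient for small outputs like this one.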

child process created by spawn()
does not spawn a shell
streams the data returned by the child process (data flow is constant)
has no data transfer size limit
child process created by exec()
does spawn a shell in which the passed command is executed
buffers the data (waits until the process closes and transfers the data in one chunk)
has a maximum buffer size: 200 KB by default before Node.js v12.x, 1 MB by default since v12.x (configurable with the maxBuffer option)
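The buffer limit of exec() can be observed with a small sketch like the one below. It assumes a shell that accepts double-quoted arguments; the helper runWithLimit and the 1000-byte test command are illustrative choices.

```javascript
const { exec } = require('child_process');

// Command that prints 1000 bytes; the quoting assumes a POSIX-style shell.
const command = `"${process.execPath}" -e "process.stdout.write('x'.repeat(1000))"`;

// Run the command with a given buffer cap and report what the callback receives.
// runWithLimit is a hypothetical helper for this demo.
function runWithLimit(maxBuffer) {
  return new Promise((resolve) => {
    exec(command, { maxBuffer }, (error, stdout) => {
      resolve({ errorCode: error ? error.code : null, received: stdout.length });
    });
  });
}

// A cap smaller than the output makes exec report an error.
runWithLimit(100).then((result) => console.log('cap 100 bytes:', result));
// A cap larger than the output lets the command complete normally.
runWithLimit(1024 * 1024).then((result) => console.log('cap 1 MiB:', result));
```

If the output can exceed the cap, either raise maxBuffer or switch to spawn(), which has no such limit.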
-main.js (file)
var { spawn, exec } = require('child_process');
// 'node' is an executable command (can be executed without a shell)
// uses streams to transfer data (spawnChild.stdout)
var spawnChild = spawn('node', ['module.js']);
spawnChild.stdout.on('data', function (msg) {
  console.log(msg.toString());
});
// 'node module.js' runs in the spawned shell
// the transferred data is handled in the callback function
var execChild = exec('node module.js', function (err, stdout, stderr) {
  console.log(stdout);
});
-module.js (basically prints a message every second for 5 seconds, then exits)
// count the messages instead of relying on the timer's internal _idleStart field
var count = 0;
var interval = setInterval(function () {
  console.log('module data');
  if (++count >= 5) clearInterval(interval);
}, 1000);
The spawn() child process prints the message module data once per second for 5 seconds, because the data is streamed.
The exec() child process prints all five messages at once (module data module data module data module data module data) after 5 seconds, when the process has closed, because the data is buffered.
NOTE: neither spawn() nor exec() is designed specifically for running Node modules; this demo only shows the difference. If you want to run Node modules as child processes, use the fork() method instead.
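For the fork() route mentioned in the note, a minimal sketch could look like this. The child module is hypothetical and is written to a temporary directory only so that the example is self-contained; askChild is likewise a made-up helper name.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical child module, written to a temp directory for this demo.
// It echoes the first message it receives and then closes the IPC channel.
const childDir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-demo-'));
const childPath = path.join(childDir, 'child.js');
fs.writeFileSync(
  childPath,
  "process.once('message', (msg) => { process.send({ echoed: msg }, () => process.disconnect()); });"
);

// Fork the module, send one message, and resolve with the child's reply.
function askChild(message) {
  return new Promise((resolve, reject) => {
    const child = fork(childPath);
    child.once('message', resolve);
    child.once('error', reject);
    child.send(message);
  });
}

askChild('module data').then((reply) => console.log('child replied:', reply));
```

Unlike spawn() and exec(), fork() sets up a message channel between parent and child, so the two sides can exchange objects instead of parsing stdout.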

A good place to start is the NodeJS documentation.
For 'spawn' the documentation states:
The child_process.spawn() method spawns a new process using the given command, with command line arguments in args. If omitted, args defaults to an empty array.
While for 'exec':
Spawns a shell then executes the command within that shell, buffering any generated output. The command string passed to the exec function is processed directly by the shell and special characters (vary based on shell) need to be dealt with accordingly.
The main thing appears to be whether you need to handle the output of the command as it is produced, which I imagine could be the factor impacting performance (I haven't compared). If you only care about process completion, then exec would be your choice. spawn opens streams for stdout and stderr that emit 'data' events, whereas exec buffers stdout and stderr and passes them to its callback as strings.
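The stream-based handling described above can be sketched as follows. The helper collectOutput is an illustrative name, and the child is simply another Node.js process started via process.execPath so the example does not depend on external programs.

```javascript
const { spawn } = require('child_process');

// Spawn a command, collect its stdout chunks as they arrive, and resolve on close.
// collectOutput is a hypothetical helper for this sketch.
function collectOutput(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const chunks = [];
    child.stdout.on('data', (chunk) => chunks.push(chunk)); // each chunk is a Buffer
    child.stderr.on('data', (chunk) => process.stderr.write(chunk));
    child.on('error', reject); // e.g. the command could not be started
    child.on('close', (code) => {
      resolve({ code, stdout: Buffer.concat(chunks).toString() });
    });
  });
}

collectOutput(process.execPath, ['-e', "console.log('first'); console.log('second');"])
  .then((result) => console.log(result));
```

In a real streaming scenario you would usually process each chunk inside the 'data' handler instead of collecting everything, which is what keeps memory use flat for large outputs.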

A quote from the official docs:
For convenience, the child_process module provides a handful of synchronous and asynchronous alternatives to child_process.spawn() and child_process.spawnSync(). Each of these alternatives are implemented on top of child_process.spawn() or child_process.spawnSync().
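Since the other methods build on spawn(), spawn() itself can also be asked to run a command line through a shell by passing the shell option, which combines shell syntax with streamed output. The sketch below assumes that option is available in your Node.js version; streamShellCommand is a made-up helper name.

```javascript
const { spawn } = require('child_process');

// Run a shell command line with spawn and pass each stdout chunk to a callback.
// streamShellCommand is a hypothetical helper for this sketch.
function streamShellCommand(commandLine, onChunk) {
  return new Promise((resolve, reject) => {
    const child = spawn(commandLine, { shell: true });
    child.stdout.on('data', (chunk) => onChunk(chunk.toString()));
    child.on('error', reject);
    child.on('close', (code) => resolve(code));
  });
}

// '&&' is shell syntax, so this only works because a shell is involved.
streamShellCommand('echo alpha && echo beta', (text) => process.stdout.write(text))
  .then((code) => console.log('exit code:', code));
```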

Related

Node spawn child process doesn't execute the command after exec child process in aws node 10 lambda

I am attempting to run 2 child processes, but one seems to be blocked and eventually times out the node lambda.
Environment:
AWS node 10 lambda running in a docker container.
Accesses ffmpeg and ffprobe via a lambda layer in the /opt/bin directory.
child_process.exec
I am running ffprobe in a child_process.exec to get the file format of an audio file. I am using exec because the output is a small json response (which shouldn't consume much memory).
child_process.spawn
Shortly after I run ffmpeg to convert the audio file to mp3 using child_process.spawn.
The problem is the FFMPEG child_process.spawn command doesn't run after ffprobe (even though ffprobe successfully completes). If I don't run the ffprobe command the FFMPEG command runs perfectly.
Which leads me to believe this is an issue with how I am dealing with child processes in node.
Is it possible the child_process.exec ffprobe command is somehow still running/ blocking the new ffmpeg (child_process.spawn) command from running - if so how do I check this?
When I access the running processes in the docker container only the new ffmpeg command seems to be running, although it consumes no memory and just hangs - seemingly doing nothing. I even tried launching the ffmpeg command from the docker cli (avoiding using the node env) and this works fine and runs as expected.
So it seems my issue wasn't really between exec and spawn. I am not 100% sure, but I think the child process was preserved in the container and resumed in the next invocation of the lambda.
Changing to child_process.spawnSync, which waits until the child process exits, keeps things cleaner, and I haven't encountered this problem since.
A more thorough explanation from someone else would be really appreciated.
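As a rough illustration of the spawnSync approach, the sketch below runs a child to completion before continuing. The child here is a stand-in Node.js one-liner that prints a small JSON document, not the asker's actual ffprobe command, and the format field is invented for the demo.

```javascript
const { spawnSync } = require('child_process');

// Stand-in for the ffprobe call: a child that prints a small JSON document.
// spawnSync blocks until the child has exited.
const probe = spawnSync(
  process.execPath,
  ['-e', "console.log(JSON.stringify({ format: 'wav' }))"],
  { encoding: 'utf8' }
);

if (probe.error) throw probe.error; // the process could not be started
if (probe.status !== 0) throw new Error('probe failed: ' + probe.stderr);

// By this point the child is guaranteed to be gone, so nothing lingers
// into a later invocation.
const info = JSON.parse(probe.stdout);
console.log('detected format:', info.format);
```

The trade-off is that spawnSync blocks the event loop while the child runs, which may be acceptable in a single-purpose function but not in a server handling other requests.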

How to fork a process of another module

TL;DR : How does one fork a process that is located outside of the current running process?
I'm trying to use child_process of Nodejs in order to start another nodejs process on the parent's process exit.
I successfully executed the process with the exec but I need the child process be independent of the parent, so the parent can exit without waiting for the child, hence I tried using spawn with the detached: true, stdio: 'ignore' option and unref()ing the process:
setting options.detached to true makes it possible for the child process to continue running after the parent exits.
spawn('node MY_PATH', [], {detached: true, stdio: 'ignore'}).unref();
This yields a node MY_PATH ENOENT error, which unfortunately I've failed to resolve.
After having trouble achieving this with spawn and reading the documentation again, I figured I should actually use fork:
The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes.
fork() doesn't take a command as its first argument but a modulePath, which I can't seem to supply, since the script I'm trying to run as a child process isn't in the directory of the currently running process but in one of its dependencies.
Back to the starting TL;DR - how does one fork a process that is located outside of the current running process?
Any help would be much appreciated!
EDIT:
Providing a solution to the spawn ENOENT error would be very helpful too!
The following code should allow you to do what you need.
var Path = require('path');
var Spawn = require('child_process').spawn;
var relative_filename = '../../node_modules/bla/bla/bla.js';
Spawn('node', [Path.resolve(__dirname, relative_filename)], {detached: true, stdio: 'ignore'}).unref();
process.exit(0);
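The ENOENT error in the question most likely comes from passing the command and its argument as a single string: without a shell, spawn() looks for an executable with that literal name. The sketch below shows the difference; trySpawn is a hypothetical helper and --version is just a harmless argument for the demo.

```javascript
const { spawn } = require('child_process');

// Resolve with the spawn error code, or null if the process starts and exits.
// trySpawn is a hypothetical helper for this sketch.
function trySpawn(command, args) {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: 'ignore' });
    child.on('error', (error) => resolve(error.code));
    child.on('close', () => resolve(null));
  });
}

// Command and argument fused into one string: no executable has this name.
trySpawn('node --version', []).then((code) => console.log('fused string:', code));

// Command and arguments passed separately, as in the answer above.
trySpawn(process.execPath, ['--version']).then((code) => console.log('separate args:', code));
```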

node.js run function in child process?

I have a node.js application that receives a file, via a web request and then will apply a conversion process to this file. Since the task is long running this needs to run separate to the main thread.
At the moment I have just called the necessary code via a setTimeout() call. To isolate the main application from the conversion process I would like to move it out into a child process, since it is long running and I would like to isolate the main code from the work being done (am I worrying too much?). At the moment I am calling:
const execFile = require('child_process').execFile;
// the arguments must be passed as an array
const child = execFile('node', ['./myModule.js'], (error, stdout, stderr) => {
  if (error) {
    throw error;
  }
  console.log(stdout);
});
Is this the right approach in node.js, or is there a way of simply starting a child process with the module and params specified, without having to specify 'node' as the executable?
Just seen that node.js provides the 'fork' function, for executing modules, though they will need to be written as if they were expecting command line arguments, processing the process.argv array.
The command call being:
child_process.fork(modulePath[, args][, options])
More details here.
In my specific case forking probably doesn't make sense, since there is already a fork being made by the node.js library I am using.
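A minimal sketch of the fork(modulePath, args, options) call with command-line style arguments might look like this. The worker module and its arguments are invented for illustration, and the file is written to a temporary directory only to keep the example self-contained.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker module that reads its parameters from process.argv.
const workerDir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-args-'));
const workerPath = path.join(workerDir, 'worker.js');
fs.writeFileSync(workerPath, "console.log('converting', process.argv.slice(2).join(' to '));");

// Fork the worker with arguments; silent: true pipes its stdout back to the parent.
function runWorker(args) {
  return new Promise((resolve, reject) => {
    const child = fork(workerPath, args, { silent: true });
    let output = '';
    child.stdout.on('data', (chunk) => { output += chunk; });
    child.on('error', reject);
    child.on('close', (code) => resolve({ code, output }));
  });
}

runWorker(['input.wav', 'mp3']).then((result) => console.log(result));
```

No 'node' executable has to be named here; fork() starts the module with the same Node.js binary as the parent by default.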

limit on the number of childProcess.exec() node.js

I'm using node.js to spawn multiple phantomjs workers through childProcess.exec and writing the buffered output to a DB. I wanted to know the maximum number of processes that can be spawned via node before node crashes. The phantomjs script is complex (it includes logging in and doing stuff) and takes close to 5 seconds to return an output.
code looks like this:
childProcess.exec('./phantomsjs testScript.js', function(err, stdout, stderr){});
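The page gives no figure for that limit, and it likely depends on the machine's memory and OS process limits. One common way to avoid finding out the hard way is to cap how many children run at once. The sketch below is one possible approach under that assumption; execAsync and runLimited are made-up helper names, and echo stands in for the phantomjs command.

```javascript
const { exec } = require('child_process');

// Promise wrapper around exec that resolves with the buffered stdout.
function execAsync(commandLine) {
  return new Promise((resolve, reject) => {
    exec(commandLine, (error, stdout) => (error ? reject(error) : resolve(stdout)));
  });
}

// Run all command lines, keeping at most `limit` child processes alive at once.
async function runLimited(commandLines, limit) {
  const results = new Array(commandLines.length);
  let next = 0;
  // Each worker repeatedly takes the next unclaimed job until none are left.
  async function worker() {
    while (next < commandLines.length) {
      const index = next++;
      results[index] = await execAsync(commandLines[index]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, commandLines.length) }, worker);
  await Promise.all(workers);
  return results;
}

const jobs = ['echo job-1', 'echo job-2', 'echo job-3', 'echo job-4', 'echo job-5'];
runLimited(jobs, 2).then((outputs) => console.log(outputs.map((text) => text.trim())));
```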

Node.js why does a child process not start right away?

I'm trying to write a global node command line program that will take any (windows or unix) console command I give to it and execute it in a new console window. I also want the program to exit after it has spawned its process so the console I'm using isn't blocked by a node script that has a child process running.
This is a simple version of what I have so far:
myScript.js:
var exec = require('child_process').exec;
exec("start startScript.cmd"); // windows start command opens a new cmd window
process.exit(0);
startScript.cmd:
mkdir test
I have also tried this (But this doesn't work even without the process.exit):
myScript.js:
var spawn = require('child_process').spawn;
var child = spawn('start', ['startScript.cmd'], { detached: true, stdio: ['ignore', 'ignore', 'ignore']});
child.unref();
process.exit(0);
The problem is, calling process.exit() seems to prevent the child process from fully starting and so nothing happens unless I do some setTimeout shenanigans. However the behavior seems random. On a different computer it behaves like I want it to. Both computers have the same version of node (v0.10.33).
The directory test is never made unless I remove the process.exit line or use a setTimeout on it.
Any idea why this happens or how to get around it? Keep in mind I don't want to wait until the child process is finished. I want to be able to return to my command line immediately.
Thanks!
I figured it out.
The answer is: don't use exec, and don't use start or call or any other weird Windows command in the spawn call. Set the CWD in the spawn options if need be, and either spawn what you want directly or, if that still isn't working, make an OS-specific script that calls your command. One example of a Windows script is literally just:
%*
That way, start or any other weird windows command will be executed correctly (but do you really need them?).
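A sketch of that advice (spawn the program directly, set cwd in the options, detach and unref) is shown below. The child is a Node.js one-liner that creates the test directory, standing in for whatever command you actually want to run; the temporary working directory is only there to keep the demo self-contained.

```javascript
const { spawn } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Working directory for the child; a temp directory keeps the sketch self-contained.
const workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'detached-demo-'));

// Spawn the program directly (no shell, no `start`), detached from this process.
const child = spawn(process.execPath, ['-e', "require('fs').mkdirSync('test')"], {
  cwd: workDir,
  detached: true,
  stdio: 'ignore',
});
child.unref(); // let the parent's event loop finish without waiting for the child

console.log('child started in', workDir);
```

With the child unreferenced and its stdio ignored, the parent script can simply run to its end and return control to the command line.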
