How to run simultaneous Node child processes

TL;DR: I have an endpoint on an Express server that runs some CPU-bound logic in a child_process. The problem is that if the server gets more than one request for that endpoint, it won't run them simultaneously; it queues them up and runs them one at a time. Is there a way to use Node child_process so that my server will perform multiple child processes simultaneously?
Long version: The major downside of Node is that it is single-threaded, so a logic-heavy (CPU-bound) request can stop the server dead in its tracks; it can't take any more requests until that logic has finished running. I thought I could work around this using child_process, which works great for freeing up my server to take other requests. BUT it will only execute child processes one at a time, creating a queue that can get pretty backed up. I also have a Node cluster set up so that my server is split into 8 separate "virtual servers" (8-core machine), so I guess I can technically run 8 of these child processes at once, but I want to be able to handle more traffic than that. I'm looking for a solution that still lets me use Node and Express; please only suggest different technologies if you are absolutely sure this can't be done efficiently in my current environment. Thanks in advance for the help!
Endpoint:
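const child_process = require('child_process');
app.get('/cpu-exec-file', function(req, res) {
  child_process.execFile('node', ['./blocking_tasks/mathCruncher.js'], {timeout: 30000}, function(err, stdout, stderr) {
    if (err) {
      // The child failed, timed out, or could not be started.
      return res.status(500).send(stderr || err.message);
    }
    var data = JSON.parse(stdout);
    res.send(data);
  });
});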
mathCruncher.js:
var obj = {};
function myLoop(i) {
  setTimeout(function () {
    obj[i] = Math.random() * 100;
    if (--i) {
      myLoop(i);
    } else {
      var string = JSON.stringify(obj); // declared locally instead of leaking a global
      console.log(string); // goes to stdout.
    }
  }, 1000);
}
myLoop(10);

Is there a way to use Node child_process so that my server will perform multiple child processes simultaneously?
Use a message queue and a back-end process.
I do exactly what you're wanting, using RabbitMQ. There are several other great messaging systems out there, like ZeroMQ and even Redis with some pub-sub libraries on top of it.
The gist of it is to send a request to your queueing system and have another process pick up the message and then run the process that does the work.
If you need a response from the worker, you can use bi-directional messaging with either a Request/Reply setup, or status messages for really long-running things.
If you're interested in the RabbitMQ side of things, I have a free email course on various patterns with RabbitMQ, including Request/Reply and status emails: http://derickbailey.com/email-courses/rabbitmq-patterns-for-applications/
And if you're interested in ground-up training on RMQ with Node, check out my training course at http://rabbitmq4devs.com
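The broker-based setup described above needs RabbitMQ (or a similar system) running. As a rough, in-process illustration of the same idea (jobs go into a queue and a fixed number of workers pull from it), here is a minimal sketch. `createJobQueue`, `concurrency`, `pending`, and `active` are hypothetical names, not RabbitMQ or amqplib APIs.

```javascript
// Minimal in-process job queue with a concurrency limit (hypothetical helper,
// not a RabbitMQ/amqplib API). Jobs are functions that return promises.
function createJobQueue(concurrency) {
  const waiting = []; // jobs not yet started
  let running = 0;    // jobs currently in progress

  function next() {
    // Start queued jobs until the concurrency limit is reached.
    while (running < concurrency && waiting.length > 0) {
      const { job, resolve, reject } = waiting.shift();
      running++;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          running--;
          next();
        });
    }
  }

  return {
    // Enqueue a job; the returned promise settles with the job's result.
    push(job) {
      return new Promise((resolve, reject) => {
        waiting.push({ job, resolve, reject });
        next();
      });
    },
    // Number of jobs still waiting to start.
    get pending() { return waiting.length; },
    // Number of jobs currently running.
    get active() { return running; },
  };
}

// Usage: three jobs, at most two at a time.
const queue = createJobQueue(2);
const sleep = (ms, value) => new Promise((r) => setTimeout(() => r(value), ms));
const results = Promise.all([
  queue.push(() => sleep(50, 'a')),
  queue.push(() => sleep(50, 'b')),
  queue.push(() => sleep(50, 'c')),
]);
console.log(queue.active, queue.pending); // 2 1
results.then((r) => console.log(r)); // [ 'a', 'b', 'c' ]
```

With a real broker, the queue lives in RabbitMQ and the workers are separate processes (possibly on other machines), so the concurrency limit and the pending count come from the broker rather than from a variable in the web server.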

Related

How to run child process in Mean Stack

I have a MEAN application which uses Node.js, AngularJS and Express.js.
I call my server from the Angular controller as below:
Angular Controller.js
$http.post('/sample', $scope.sample).then(function (response) {
--
--
});
and in Server.js as below
app.post('/sample', userController.postsample);
In that postsample handler I perform my operations on MongoDB.
I'm stuck on how to do the calculation part: I have a big calculation that takes a long time (assume 1 hour) to complete, and I trigger it from my Angular controller on the client side.
My problem is that the calculation should run separately, so that other UIs and the operations of other pages are not interrupted.
I have seen child_process in Node.js, but I don't understand how to trigger or exec a child process from the controller, and whether other pages can still be accessed while that request is being handled in app.post.
EDIT:
I plan to spawn a child_process, but I have another problem related to the above.
Let's say the application has 3 users and 2 of them are using it at the same time.
Suppose the first user triggers the child_process (call it the first operation), and while it is still running, the second user triggers the process too, because they also need a calculation (call it the second operation).
My questions are:
What happens when another person starts the spawn command? Does it hang, get queued, or do both run in parallel?
If the second operation is queued, when will it start?
If operations are queued, how can I know how many are in the queue at any point in time?
Can anyone help me solve this?
Note: the question was edited - see updates below.
You have a few options.
The most straightforward way would be to spawn the child process from your Express controller and return the response to the client once the calculation is done, but if it takes that long you may run into socket timeouts etc. This will not block your server or the client (as long as you don't use "Sync" functions on the server or synchronous AJAX on the client), but you will have problems with the connection hanging open for so long.
Another option would be to use WebSocket or Socket.io for those requests. The client could post a message to the server that it wants some computation to get started and the server could spawn the child process, do other things and when the child returns just send the message to the client. The disadvantage of that is a new way of communication but at least there would be no problems with timeouts.
To see how to combine WebSocket or Socket.io with Express, see this answer that has examples for both WebSocket and Socket.io - it's very simple actually:
Differences between socket.io and websockets
Either way, to spawn a child process you can use:
spawn
exec
execFile
fork
from the core child_process module. Just make sure to never use any functions with "Sync" in their name for what you want to do because those would block your server from serving other requests for the entire time of waiting for the child to finish - which may be an hour in your case, but even if it would be a second it could still ruin the concurrency completely.
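A minimal sketch of the non-blocking approach with `spawn`: the child's output is collected as it streams in, and a promise settles when the child exits. `runChild` is a hypothetical helper, and running `process.execPath` with `-e` is only a stand-in for your real long-running calculation script.

```javascript
const { spawn } = require('child_process');

// Run a command without blocking the event loop; resolve with its stdout
// once the process exits with code 0, reject otherwise.
function runChild(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.on('error', reject); // e.g. the command could not be started
    child.on('close', (code) => {
      if (code === 0) resolve(stdout);
      else reject(new Error(`exit code ${code}: ${stderr}`));
    });
  });
}

// Stand-in for a heavy calculation: a separate Node process that prints a result.
const calc = runChild(process.execPath, ['-e', 'console.log(JSON.stringify({ sum: 2 + 3 }))']);
calc.then((out) => console.log(JSON.parse(out))); // { sum: 5 }

// This line runs immediately: the parent's event loop stays free while the child works.
console.log('still serving other requests');
```

Inside an Express handler you would call `runChild(...)` and send the parsed result (or a 500) when the promise settles; the timeout concerns above still apply if the child really runs for an hour.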
See the docs:
https://nodejs.org/api/child_process.html
Update
An update for the edited question. Consider this example shell script:
#!/bin/sh
sleep 5
date -Is
It waits for 5 seconds and prints the current time. Now consider this example Node app:
let child_process = require('child_process');
let app = require('express')();
app.get('/test', (req, res) => {
child_process.execFile('./script.sh', (err, data) => {
if (err) {
return res.status(500).send('Error');
}
res.send(data);
});
});
app.listen(3344, () => console.log('Listening on 3344'));
Or using ES2017 syntax:
let child_process = require('mz/child_process');
let app = require('express')();
app.get('/test', async (req, res) => {
try {
res.send((await child_process.execFile('./script.sh'))[0]);
} catch (err) {
res.status(500).send('Error');
}
});
app.listen(3344, () => console.log('Listening on 3344'));
It runs that shell script for requests on GET /test and returns the result.
Now start three requests at the same time:
curl localhost:3344/test & curl localhost:3344/test & curl localhost:3344/test &
and see what happens. If the returned times differ by 5 seconds and you get one response after another with 5 seconds intervals then the operations are queued. If you get all responses at the same time with more or less the same timestamp then those are all run in parallel.
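The same experiment can be run without Express or curl. This sketch, assuming Node 8 or later for `util.promisify`, starts three child processes at once, each waiting about one second; if they run in parallel, the total time is a little over one second rather than over three. The inline child script passed with `-e` is a stand-in for `script.sh`.

```javascript
const { execFile } = require('child_process');
const { promisify } = require('util');
const execFileAsync = promisify(execFile); // resolves to { stdout, stderr }

// Child that waits ~1 second and then prints its finish time (stand-in for script.sh).
const childScript = 'setTimeout(() => console.log(Date.now()), 1000)';

async function experiment() {
  const start = Date.now();
  // Start all three children before awaiting any of them.
  const runs = [1, 2, 3].map(() => execFileAsync(process.execPath, ['-e', childScript]));
  const outputs = await Promise.all(runs);
  const elapsed = Date.now() - start;
  const finishTimes = outputs.map(({ stdout }) => Number(stdout.trim()));
  return { elapsed, finishTimes };
}

const done = experiment().then((result) => {
  // Parallel: elapsed is a bit over 1000 ms and the finish times are close together.
  // Queued: elapsed would exceed 3000 ms, with finish times about 1000 ms apart.
  console.log(result);
  return result;
});
```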
Sometimes it's best to make an experiment like this to see what happens.

Node Cluster is not dispatching task to another worker available

This is my first question on Stack Overflow, so please pardon any mistakes or missing information.
So, I am trying to use the cluster module of Node.js for my server, and I run Node.js on my Windows machine. I know Node.js does not use round-robin scheduling for the cluster module on Windows by default, so I have explicitly set the scheduling policy to rr, as described in the Node.js docs.
But the problem is that when I keep one worker busy by putting it in an infinite loop, the server does not dispatch the next request for the '/' resource to another worker that is available and free.
Please help me understand why it is not dispatching the request to other workers.
var cluster = require('cluster');
if (cluster.isMaster) {
  var cores = require('os').cpus().length;
  console.log("Master Cluster setting up :-" + cores + " workers");
  for (var i = 0; i < cores; i++)
    cluster.fork();
  cluster.on('online', (worker) => {
    console.log("Worker with Process ID :- " + worker.process.pid + " online");
  });
  cluster.on('exit', (worker) => {
    console.log("worker " + worker.process.pid + " died...So setting up a new worker");
    cluster.fork();
  });
}
else {
  var app = require('express')();
  app.get('/', (req, res) => {
    console.log("Process with pid " + process.pid + " is handling this request");
    while (true);
    res.write("Yes!");
    res.end();
    //while(true);
  })
  app.listen('3000');
}
The code you wrote works fine. But you need to send concurrent requests to your Node server instead of sending them in a loop; then you will see the power of the cluster module.
The Node cluster module has two approaches for distributing incoming connections.
The first one (and the default on all platforms except Windows) is the round-robin approach.
The second approach is where the master process creates the listen socket and sends it to interested workers. The workers then accept incoming connections directly.
You can use siege (see http://blog.remarkablelabs.com/2012/11/benchmarking-and-load-testing-with-siege) to send concurrent requests. Then you will see the requests being switched between your worker processes.

Meteor - How to Run Multiple Server Processes Simultaneously?

My Meteor app needs to run 13 separate server processes, each on a setInterval. Essentially, I am pinging 13 different external APIs for new data, and performing calculations on the response and storing the results in Mongo. Each process looks something like this:
Meteor.setInterval(function () {
  var response;
  try {
    response = Meteor.http.call("GET", <url>, {
      params: <params>,
      timeout: <timeout>
    });
  }
  catch (err) {
    console.log(err);
    return; // nothing to handle if the call failed
  }
  if (response.statusCode === 200) {
    // handle the response
    ...
  }
}, 10000);
Unfortunately, Meteor chokes up after only three of these interval functions are turned on and running side by side. I start getting socket hangup errors and JS Allocation Failed errors thrown in console. I presume this has something to do with Node's single-threading. Does anybody know what the solution is for this? I've looked long and hard... I'm really wondering if I have to split out the back-end from 1 Meteor app with 13 processes (which doesn't seem to run) to 13 Meteors (or Node.js apps), each with 1 process. Thanks!
Try https://atmospherejs.com/vsivsi/job-collection
Benefits:
Jobs can be added to a queue, and you have granular control over when they succeed or fail... failed jobs can easily be re-queued.
It's automatically clustered against all of your Meteor processes that are tied to the same collection.
Status update: a large part of the problem has to do with Node being single-threaded. I solved the CPU limitation problem by splitting this monolithic Meteor app into 13 microservice Meteor apps, all connected to the same MongoDB replica set.
This way, all cores on the CPU are being utilized, rather than Meteor trying to handle all requests and processes on just one.

Node.js: Closing all Redis clients on shutdown

Today, I integrated Redis into my node.js application and am using it as a session store. Basically, upon successful authentication, I store the corresponding user object in Redis.
When I receive http requests after authentication, I attempt to retrieve the user object from Redis using a hash. If the retrieval was successful, that means the user is logged in and the request can be fulfilled.
The act of storing the user object in Redis and the retrieval happen in two different files, so I have one Redis client in each file.
Question 1:
Is it ok having two Redis clients, one in each file? Or should I instantiate only one client and use it across all areas of the application?
Question 2:
Does the node-redis library provide a method to show a list of connected clients? If it does, I will be able to iterate through the list, and call client.quit() for each of them when the server is shutting down.
By the way, this is how I'm implementing the "graceful shutdown" of the server:
//Gracefully shutdown and perform clean-up when kill signal is received
process.on('SIGINT', cleanup);
process.on('SIGTERM', cleanup);
function cleanup() {
server.stop(function() {
//todo: quit all connected redis clients
console.log('Server stopped.');
//exit the process
process.exit();
});
};
In terms of design and performance, it's best to create one client and use it across your application. This is pretty easy to do in node. I'm assuming you're using the redis npm package.
First, create a file named redis.js with the following contents:
const redis = require('redis');
// Create one shared client; Node's module cache means every file that
// requires this module gets the same instance.
module.exports = redis.createClient();
Then, say in a file set.js, you would use it like so:
const client = require('./redis');
client.set('key', 'value');
Then, in your index.js file, you can import it and close the connection on exit:
const client = require('./redis');
process.on('SIGINT', cleanup);
process.on('SIGTERM', cleanup);
function cleanup() {
client.quit(function() {
console.log('Redis client stopped.');
server.stop(function() {
console.log('Server stopped.');
process.exit();
});
});
};
Using multiple connections may be required by how the application uses Redis.
For instance, as soon as a connection is used for listening to a pub/sub channel, it can only be used for that and nothing else. Per the documentation on SUBSCRIBE:
Once the client enters the subscribed state it is not supposed to issue any other commands, except for additional SUBSCRIBE, PSUBSCRIBE, UNSUBSCRIBE and PUNSUBSCRIBE commands.
So if your application needs to subscribe to channels and use Redis as general value cache, then it needs two clients at a minimum: one for subscribing to channels and one for using Redis as a cache.
There are also Redis commands that block, like BLPOP. A busy web server normally replies to multiple requests at once. Suppose that to answer request A, the server uses its Redis client to issue a blocking command. Then request B comes in and the server needs to query Redis with a non-blocking command, but the client is still waiting for the blocking command issued for request A to finish. Now the response to request B is delayed by another request. This can be avoided by using a different client for the second request.
If you do not use any of the facilities that require more than one connection, then you can and should use just one connection.
If the way you use Redis is such that you need more than one connection, and you just need a list of connections but no sophisticated connection management, you could just create your own factory function: it would call redis.createClient() and save the client before returning it. Then at shutdown time, you could go over the list of saved clients and close them. Unfortunately, node-redis does not provide such functionality built-in.
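A minimal sketch of that factory, assuming node-redis v4, where `client.quit()` returns a promise (in v3 it takes a callback instead, so you would wrap it). The names `makeClientRegistry`, `createTrackedClient`, and `quitAll` are hypothetical. To keep the example runnable without a Redis server, the client constructor is passed in as a parameter, and the usage below passes a fake.

```javascript
// Hypothetical registry that remembers every client it creates so they can all
// be closed at shutdown. `createClient` would normally be redis.createClient.
function makeClientRegistry(createClient) {
  const clients = new Set();
  return {
    createTrackedClient(...args) {
      const client = createClient(...args);
      clients.add(client);
      return client;
    },
    // Close every tracked client and wait for all quit() promises to settle.
    async quitAll() {
      await Promise.all([...clients].map((client) => client.quit()));
      clients.clear();
    },
    get size() { return clients.size; },
  };
}

// Usage with a fake client standing in for a real Redis connection.
const closed = [];
const fakeCreateClient = (name) => ({ name, quit: async () => { closed.push(name); } });

const registry = makeClientRegistry(fakeCreateClient);
registry.createTrackedClient('cache');
registry.createTrackedClient('subscriber');

// In a real app this would run inside the SIGINT/SIGTERM cleanup handler.
const shutdown = registry.quitAll().then(() => console.log(closed)); // [ 'cache', 'subscriber' ]
```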
If you need more sophisticated client management than the factory function described above, then the typical way to manage the multiple connections created is to use a connection pool but node-redis does not provide one. I usually access Redis through Python code so I don't have a recommendation for Node.js libraries, but an npm search shows quite a few candidates.

What's the most efficient node.js inter-process communication library/method?

We have a few node.js processes that should be able to pass messages.
What's the most efficient way of doing that?
How about using node_redis pub/sub
EDIT: the processes might run on different machines
If you want to send messages from one machine to another and do not care about callbacks then Redis pub/sub is the best solution. It's really easy to implement and Redis is really fast.
First you have to install Redis on one of your machines.
It's really easy to connect to Redis:
var client = require('redis').createClient(redis_port, redis_host);
But do not forget about opening Redis port in your firewall!
Then you have to subscribe each machine to some channel:
client.on('ready', function() {
return client.subscribe('your_namespace:machine_name');
});
client.on('message', function(channel, json_message) {
var message;
message = JSON.parse(json_message);
// do whatever you want with the message
});
You may skip your_namespace and use global namespace, but you will regret it sooner or later.
It's really easy to send messages, too:
var send_message = function(machine_name, message) {
return client.publish("your_namespace:" + machine_name, JSON.stringify(message));
};
If you want to send different kinds of messages, you can subscribe with psubscribe and listen for 'pmessage' events instead of 'message' events:
client.on('ready', function() {
return client.psubscribe('your_namespace:machine_name:*');
});
client.on('pmessage', function(pattern, channel, json_message) {
// pattern === 'your_namespace:machine_name:*'
// channel === 'your_namespace:machine_name:'+message_type
var message = JSON.parse(json_message);
var message_type = channel.split(':')[2];
// do whatever you want with the message and message_type
});
send_message = function(machine_name, message_type, message) {
return client.publish([
'your_namespace',
machine_name,
message_type
].join(':'), JSON.stringify(message));
};
The best practice is to name your processes (or machines) by their functionality (e.g. 'send_email'). In that case process (or machine) may be subscribed to more than one channel if it implements more than one functionality.
Actually, it's possible to build bi-directional communication using Redis, but it's trickier, since it requires adding a unique callback channel name to each message in order to receive the reply without losing context.
So, my conclusion is this: use Redis if you need "send and forget" communication, and investigate other solutions if you need full-fledged bi-directional communication.
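The bi-directional pattern mentioned above, adding a unique reply channel to each message, can be sketched as follows. To keep it runnable without Redis, a Node `EventEmitter` stands in for the pub/sub connection (`emit` plays the role of PUBLISH, `on`/`once` of SUBSCRIBE); the channel names and the `request` helper are hypothetical, and `crypto.randomUUID` assumes Node 14.17 or later. With real Redis you would need a separate subscriber client, because a subscribed connection cannot issue other commands.

```javascript
const { EventEmitter } = require('events');
const { randomUUID } = require('crypto');

// In-memory stand-in for Redis pub/sub: emit() ~ PUBLISH, on()/once() ~ SUBSCRIBE.
const bus = new EventEmitter();

// "Worker" side: handle requests on its channel and reply on the channel
// named in the message.
bus.on('your_namespace:math_worker', (json) => {
  const { replyTo, payload } = JSON.parse(json);
  bus.emit(replyTo, JSON.stringify({ result: payload.a + payload.b }));
});

// "Caller" side: publish with a unique reply channel and wait for the answer.
function request(channel, payload, timeoutMs = 1000) {
  return new Promise((resolve, reject) => {
    const replyTo = `your_namespace:reply:${randomUUID()}`;
    const timer = setTimeout(() => {
      bus.removeAllListeners(replyTo);
      reject(new Error('request timed out'));
    }, timeoutMs);
    // Subscribe to the reply channel before publishing the request.
    bus.once(replyTo, (json) => {
      clearTimeout(timer);
      resolve(JSON.parse(json));
    });
    bus.emit(channel, JSON.stringify({ replyTo, payload }));
  });
}

const answer = request('your_namespace:math_worker', { a: 2, b: 3 });
answer.then((reply) => console.log(reply)); // { result: 5 }
```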
Why not use ZeroMQ/0mq for IPC? Redis (a database) is overkill for something as simple as IPC.
Quoting the guide:
ØMQ (ZeroMQ, 0MQ, zmq) looks like an embeddable networking library
but acts like a concurrency framework. It gives you sockets that carry
atomic messages across various transports like in-process,
inter-process, TCP, and multicast. You can connect sockets N-to-N with
patterns like fanout, pub-sub, task distribution, and request-reply.
It's fast enough to be the fabric for clustered products. Its
asynchronous I/O model gives you scalable multicore applications,
built as asynchronous message-processing tasks.
The advantage of using 0MQ (or even vanilla sockets via net library in Node core, minus all the features provided by a 0MQ socket) is that there is no master process. Its broker-less setup is best fit for the scenario you describe. If you are just pushing out messages to various nodes from one central process you can use PUB/SUB socket in 0mq (also supports IP multicast via PGM/EPGM). Apart from that, 0mq also provides for various different socket types (PUSH/PULL/XREP/XREQ/ROUTER/DEALER) with which you can create custom devices.
Start with this excellent guide:
http://zguide.zeromq.org/page:all
For 0MQ 2.x:
http://github.com/JustinTulloss/zeromq.node
For 0MQ 3.x (A fork of the above module. This supports PUBLISHER side filtering for PUBSUB):
http://github.com/shripadk/zeromq.node
More than 4 years after the question was asked, there is an inter-process communication module called node-ipc. It supports Unix/Windows sockets for communication on the same machine as well as TCP, TLS and UDP, and claims that at least sockets, TCP and UDP are stable.
Here is a small example taken from the documentation in the GitHub repository:
Server for Unix Sockets, Windows Sockets & TCP Sockets
var ipc=require('node-ipc');
ipc.config.id = 'world';
ipc.config.retry= 1500;
ipc.serve(
function(){
ipc.server.on(
'message',
function(data,socket){
ipc.log('got a message : '.debug, data);
ipc.server.emit(
socket,
'message',
data+' world!'
);
}
);
}
);
ipc.server.start();
Client for Unix Sockets & TCP Sockets
var ipc=require('node-ipc');
ipc.config.id = 'hello';
ipc.config.retry= 1500;
ipc.connectTo(
'world',
function(){
ipc.of.world.on(
'connect',
function(){
ipc.log('## connected to world ##'.rainbow, ipc.config.delay);
ipc.of.world.emit(
'message',
'hello'
)
}
);
ipc.of.world.on(
'disconnect',
function(){
ipc.log('disconnected from world'.notice);
}
);
ipc.of.world.on(
'message',
function(data){
ipc.log('got a message from world : '.debug, data);
}
);
}
);
I'm currently evaluating this module as a replacement for an old local IPC solution based on stdin/stdout (it could become remote IPC in the future). Maybe I will expand my answer when I'm done, with more information on how, and how well, this module works.
I would start with the built-in functionality that Node provides.
You can use process signals, like this:
process.on('SIGINT', function () {
console.log('Got SIGINT. Press Control-D to exit.');
});
From the docs on this signal event:
Emitted when the process receives a signal. See sigaction(2) for a
list of standard POSIX signal names such as SIGINT, SIGUSR1, etc.
Beyond signals, you can spawn a child process and hook it up to the 'message' event to receive and send messages. When using child_process.fork(), you can write to the child using child.send(message, [sendHandle]), and messages are received through a 'message' event on the child.
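A minimal sketch of `fork()` with message passing in both directions. To keep it self-contained, the child module is written to a temp file first; in a real project it would be its own file (the name `worker.js` mentioned in the comment and the `askChild` helper are hypothetical).

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write the child module to a temp file so this example runs on its own;
// normally this code would live in its own file (e.g. a hypothetical worker.js).
const childPath = path.join(os.tmpdir(), `fork-child-${process.pid}.js`);
fs.writeFileSync(childPath, `
  process.on('message', (msg) => {
    // Reply to the parent over the IPC channel that fork() set up.
    process.send({ reply: msg.text.toUpperCase() });
  });
`);

// Send one message to a forked child and resolve with its first reply.
function askChild(text) {
  return new Promise((resolve, reject) => {
    const child = fork(childPath);
    child.once('error', reject);
    child.once('message', (msg) => {
      child.disconnect(); // close the IPC channel so the child can exit
      resolve(msg);
    });
    child.send({ text }); // queued until the child is ready
  });
}

const reply = askChild('hello').finally(() => fs.unlinkSync(childPath));
reply.then((msg) => console.log(msg)); // { reply: 'HELLO' }
```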
Also - you can use cluster. The cluster module allows you to easily create a network of processes that all share server ports.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;
if (cluster.isMaster) {
// Fork workers.
for (var i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on('exit', function(worker, code, signal) {
console.log('worker ' + worker.process.pid + ' died');
});
} else {
// Workers can share any TCP connection
// In this case it's an HTTP server
http.createServer(function(req, res) {
res.writeHead(200);
res.end("hello world\n");
}).listen(8000);
}
For 3rd party services you can check:
hook.io, signals and bean.
Take a look at node-messenger:
https://github.com/weixiyen/messenger.js
It will fit most needs easily (pub/sub, fire-and-forget, send/request), with an automatically maintained connection pool.
We are working on a multi-process Node app that has to handle a large number of real-time cross-process messages.
We tried Redis pub/sub first, which failed to meet the requirements.
Then we tried TCP sockets, which were better, but still not the best.
So we switched to UDP datagrams, which are much faster.
Here is the code repo; it's just a few lines of code.
https://github.com/SGF-Games/node-udpcomm
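A minimal sketch of UDP messaging with Node's built-in `dgram` module (this is not the node-udpcomm code linked above). Both sockets live in one process so the example runs on its own; in practice the sender and receiver would be different processes. UDP gives no delivery or ordering guarantees and limits datagram size, so it suits small, loss-tolerant messages.

```javascript
const dgram = require('dgram');

// Receiver: bind to an ephemeral port on localhost and parse JSON datagrams.
const receiver = dgram.createSocket('udp4');
const received = new Promise((resolve) => {
  receiver.on('message', (buf) => {
    resolve(JSON.parse(buf.toString()));
    receiver.close();
  });
});

receiver.bind(0, '127.0.0.1', () => {
  const { port } = receiver.address();

  // Sender: one datagram per message; UDP does not guarantee delivery or order.
  const sender = dgram.createSocket('udp4');
  const msg = Buffer.from(JSON.stringify({ type: 'chat', text: 'hi' }));
  sender.send(msg, port, '127.0.0.1', (err) => {
    if (err) console.error(err);
    sender.close();
  });
});

received.then((message) => console.log(message)); // { type: 'chat', text: 'hi' }
```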
I needed IPC between web server processes in another language (Perl;) a couple of years ago. After investigating IPC via shared memory, via Unix signals (e.g. SIGINT and signal handlers), and other options, I finally settled on something quite simple which works quite well and is fast. It may not fit the bill, however, if your processes do not all have access to the same file system.
The concept is to use the file system as the communication channel. In my world, I have an EVENTS dir, and under it sub dirs to direct the message to the appropriate process: e.g. /EVENTS/1234/player1 and /EVENTS/1234/player2 where 1234 is a particular game with two different players. If a process wants to be aware of all events happening in the game for a particular player, it can listen to /EVENTS/1234/player1 using (in Node.js):
fs.watch
(or fsPromises.watch)
If a process wants to listen to all events for a particular game, it can simply watch /EVENTS/1234 with the 'recursive: true' option set for fs.watch. Or watch /EVENTS to see all messages; the event produced by fs.watch tells you which file path was modified.
For a more concrete example, in my world I have the web browser client of player1 listening for Server-Sent Events (SSE), and there is a loop running in one particular web server process to send those events. Now, a web server process servicing player2 wants to send a message (IPC) to the server process running the SSEs for player1, but doesn't know which process that might be; it simply writes (or modifies) a file in /EVENTS/1234/player1. That directory is being watched, via fs.watch, in the web server process handling SSEs for player1. I find this system very flexible and fast, and it can also be designed to leave a record of all messages sent. I use it so that one random web server process of many can communicate with one other particular web server process, but it could also be used in an N-to-1 or 1-to-N manner.
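A minimal sketch of the file-system-as-channel idea with `fs.watch`. A temp directory stands in for /EVENTS/1234/player1, and the message file name `msg-1.json` is hypothetical; `fs.rmSync` assumes Node 14.14 or later. Both sides run in one process here so the example is self-contained. Event details differ by platform (see the caveats quoted below), so the listener treats an event only as a hint and then reads the file.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Stand-in for /EVENTS/1234/player1 (hypothetical layout from the answer).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'events-'));

// "Receiving" process: watch the directory and read the file that changed.
const received = new Promise((resolve) => {
  const watcher = fs.watch(dir, (eventType, filename) => {
    // filename may be null on some platforms; this sketch assumes it is provided.
    if (!filename) return;
    const file = path.join(dir, filename);
    if (!fs.existsSync(file)) return;
    const content = fs.readFileSync(file, 'utf8');
    if (!content) return; // nothing written yet
    watcher.close();
    resolve({ filename, message: JSON.parse(content) });
  });
});

// "Sending" process: write a message file into the watched directory.
fs.writeFileSync(path.join(dir, 'msg-1.json'), JSON.stringify({ from: 'player2', text: 'your turn' }));

received.then((event) => {
  console.log(event);
  fs.rmSync(dir, { recursive: true, force: true }); // clean up the temp directory
});
```

Across real processes, a reader can see a half-written file; one common approach is to write under a temporary name outside the watched directory and then `fs.renameSync` it into place.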
Hope this helps someone. You're basically letting the OS and the file system do the work for you. Here are a couple links on how this works in MacOS and Linux:
https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40005289
https://man7.org/linux/man-pages/man7/inotify.7.html
Any module you're using in whatever language is hooking into an API like one of these. It's been 30+ years since I've fiddled much with Windows, so I don't know how file system events work there, but I bet there's an equivalent.
EDIT (more info on different platforms from https://nodejs.org/dist/latest-v19.x/docs/api/fs.html#fswatchfilename-options-listener):
Caveats
The fs.watch API is not 100% consistent across platforms, and is unavailable in some situations.
On Windows, no events will be emitted if the watched directory is moved or renamed. An EPERM error is reported when the watched directory is deleted.
Availability
This feature depends on the underlying operating system providing a way to be notified of file system changes.
On Linux systems, this uses inotify(7).
On BSD systems, this uses kqueue(2).
On macOS, this uses kqueue(2) for files and FSEvents for directories.
On SunOS systems (including Solaris and SmartOS), this uses event ports.
On Windows systems, this feature depends on ReadDirectoryChangesW.
On AIX systems, this feature depends on AHAFS, which must be enabled.
On IBM i systems, this feature is not supported.
If the underlying functionality is not available for some reason, then fs.watch() will not be able to function and may throw an exception. For example, watching files or directories can be unreliable, and in some cases impossible, on network file systems (NFS, SMB, etc) or host file systems when using virtualization software such as Vagrant or Docker.
It is still possible to use fs.watchFile(), which uses stat polling, but this method is slower and less reliable.
EDIT2: https://www.npmjs.com/package/node-watch is a wrapper that may help on some platforms
Not everybody knows that pm2 has an API you can use to communicate with its processes.
// pm2-call.js:
import pm2 from "pm2";
pm2.connect((err) => {
  if (err) throw err;
  // Send a message to the pm2-managed process with id 0.
  pm2.sendDataToProcessId(
    {
      type: "process:msg",
      data: {
        some: "data",
        hello: true,
      },
      id: 0,
      topic: "some topic",
    },
    (err, res) => {
      if (err) console.error(err);
    }
  );
});
// Listen on the pm2 bus for replies that managed processes send with process.send().
pm2.launchBus((err, bus) => {
  if (err) throw err;
  bus.on("process:msg", (packet) => {
    console.log(packet.process.pm_id, packet.data);
  });
});
// pm2-app.js:
process.on("message", (packet) => {
process.send({
type: "process:msg",
data: {
success: true,
},
});
});
