What I am trying to do:
I am writing a Node.js client/server application that uses persistent websocket connections (no long-polling). I want to stress test my server in order to tune its performance and learn its limits, so that I can reject new incoming connections when the server is overloaded. Unfortunately I get stuck at one specific number of socket connections every time, so I suspect it is related to OS or Node settings.
The problem:
I have one client which creates 10k socket connections, and the server handles this perfectly fine. When I start a second client with another 10k connections, I start getting the following errors in my client application once the server holds about 14k concurrent connections:
engine.io-client:socket socket close with reason: "transport error"
+0ms engine.io-client:socket socket error {"type":"TransportError","description":{"code":"ENOBUFS","errno":"ENOBUFS","syscall":"connect","address":"127.0.0.1","port":5433,"type":"error","target":{"domain":null,"_events":{},"_eventsCount":4,"_socket":null,"_ultron":null,"_closeReceived":false,"bytesReceived":0,"readyState":0,"supports":{"binary":true},"extensions":{},"_isServer":false,"url":"ws://127.0.0.1:5433/socket.io/?EIO=3&transport=websocket","protocolVersion":13,"binaryType":"buffer"}}}
+0ms
My client1.js and client2.js look like this:
function createClient() {
    var socket = require('socket.io-client')('http://127.0.0.1:5433', {transports: ['websocket']});
    socket.connect();
    // Send a bit of data every 2.5s to simulate some traffic
    setInterval(function () {
        socket.emit("sessionCheck", "ok");
    }, 2500);
}

var clients = 0;
var id = setInterval(function () {
    createClient();
    clients++;
    if (clients >= 10000)
        clearInterval(id);
}, 1);
My server.js looks like this:
var http = require('http');
var fs = require('fs'); // needed by the handler below
var httpServer = http.createServer(handler);
var io = require('socket.io')(httpServer, {pingTimeout: 60 * 1000, pingInterval: 5 * 1000});

io.on('connection', function (socket) {
    socket.on('sessionCheck', function (data) { onSessionCheck(socket, data); });
});

setInterval(function () {
    console.log(io.engine.clientsCount + " websockets are connected");
}, 5000);

httpServer.listen(5433, "0.0.0.0", 5000); // port, hostname, backlog
console.log("server running now...");

function onSessionCheck() {
    //console.log("Incoming check");
}

function handler(req, res) {
    fs.readFile(__dirname + '/index.html', function (err, data) {
        if (err) {
            res.writeHead(500);
            return res.end('Error loading index.html');
        }
        res.writeHead(200);
        res.end(data);
    });
}
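Since the stated goal is to reject new connections when the server is overloaded, here is a minimal sketch of how that could be bolted onto the server above, using a socket.io middleware and the io.engine.clientsCount value already logged above; MAX_CONNECTIONS is a made-up placeholder threshold, not something from the original setup:

// Hypothetical overload guard: refuse new sockets once a self-chosen limit is reached.
var MAX_CONNECTIONS = 20000; // placeholder; tune this from your own stress tests

io.use(function (socket, next) {
    if (io.engine.clientsCount >= MAX_CONNECTIONS) {
        return next(new Error("server overloaded")); // client gets an error event instead of connecting
    }
    next();
});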
You said in Slack that you already know about --max-old-space-size, tried it, and it did not help. I can now confirm that using this option (I tried values from 2048 to 65000) prevents me from getting more than ~18-20k concurrent connections from one server to one server. Removing the option gives me ~29500 concurrent connections. I use a $20 DigitalOcean VPS (2 GB, 2 cores), and 29500 connections consume all the free memory of my server, which is why I can't open more connections for now.
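For reference, the flag is passed on the node command line and sets the V8 old-space heap limit in megabytes; the value and the script name here are just examples:

node --max-old-space-size=4096 server.js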
UPDATE
Linux?
Try checking and tuning the following values on the server (a sample sysctl.conf snippet follows after this list):
net.nf_conntrack_max (i use 65536)
net.netfilter.nf_conntrack_max (i use 65536)
cat /proc/sys/fs/nr_open gives 1048576 for me
cat /proc/sys/fs/file-nr gives 5056 0 2097152 for me
cat /proc/sys/fs/file-max gives 2097152 for me
net.ipv4.ip_local_port_range (i use 2000 65535)
net.core.somaxconn (i use 65000)
ulimit -n gives me 1048576
This may give you an idea of how to set these parameters: https://easyengine.io/tutorials/linux/sysctl-conf/
Maybe you should check some of these values on your client servers as well (like ip_local_port_range, file-nr, ulimit -n).
You might want to check logs (/var/log/messages, /var/log/syslog, dmesg) as well.
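For reference, the sysctl values listed above can be made persistent in /etc/sysctl.conf and applied with sysctl -p; these are simply the numbers quoted above, not recommendations:

fs.file-max = 2097152
net.ipv4.ip_local_port_range = 2000 65535
net.core.somaxconn = 65000
net.netfilter.nf_conntrack_max = 65536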
I finally got 34469 connections, and then the OOM killer killed node on my client machine.
The solution is, as I already assumed, a Windows setting (MaxUserPort), which you can change in the registry. Check the MSDN description to learn more about it: https://support.microsoft.com/en-us/kb/196271 . You can set it to a maximum value of 65534.
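For illustration, the value could be set from an elevated command prompt roughly like this (this assumes the standard TCP/IP parameters registry key described in that KB article; a reboot is needed afterwards):

reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v MaxUserPort /t REG_DWORD /d 65534 /f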
Related
I decided to investigate the best possible way to handle a large amount of traffic with a Node.js server, so I ran a small test on 2 DigitalOcean servers with 1 GB RAM / 2 CPUs each.
No-Cluster server code:
// Include Express
var express = require('express');
// Create a new Express application
var app = express();
// Add a basic route – index page
app.get('/', function (req, res) {
    res.redirect('http://www.google.co.il');
});
// Bind to a port
app.listen(3000);
console.log('Application running');
Cluster server code:
// Include the cluster module
var cluster = require('cluster');

// Code to run if we're in the master process
if (cluster.isMaster) {
    // Count the machine's CPUs
    var cpuCount = require('os').cpus().length;

    // Create a worker for each CPU
    for (var i = 0; i < cpuCount; i += 1) {
        cluster.fork();
    }

// Code to run if we're in a worker process
} else {
    // Include Express
    var express = require('express');

    // Create a new Express application
    var app = express();

    // Add a basic route – index page
    app.get('/', function (req, res) {
        res.redirect('http://www.walla.co.il');
    });

    // Bind to a port
    app.listen(3001);
    console.log('Application running #' + cluster.worker.id);
}
And I sent stress-test requests to those servers. I expected that the cluster server would handle more requests, but it didn't happen; both servers crashed at the same load, even though 2 node processes were running on the cluster server and only 1 on the non-cluster server.
Now I wonder why? Did I do anything wrong?
Maybe something else is making the servers reach their breaking point? Both servers crashed at ~800 rps.
Now I wonder why? Did I do anything wrong?
Your test server doesn't do anything other than a res.redirect(). If your request handlers use essentially no CPU, then you aren't going to be CPU bound at all and you won't benefit from involving more CPUs. Your cluster will be bottlenecked at the handling of incoming connections which is going to be roughly the same with or without clustering.
Now, add some significant CPU usage to your request handler and you should get a different result.
For example, change to this:
// Add a basic route – index page
app.get('/', function (req, res) {
// spin CPU for 200ms to simulate using some CPU in the request handler
let start = Date.now();
while (Date.now() - start < 200) {}
res.redirect('http://www.walla.co.il');
});
Running tests is a great thing, but you have to be careful what exactly you're testing.
What #jfriend00 says is correct: you aren't doing enough heavy lifting to justify clustering. On top of that, you're not actually sharing the load at all. See here:
app.listen(3001);
You can't bind two services onto the same port and have the OS magically load-balance them[1]; try adding an error handler on app.listen() and see if you get an error, e.g.
app.listen(3001, (err) => { if (err) console.error(err); });
If you want to do this, you'll have to accept everything in your master, then instruct the workers to do the task, then pass the results back to the master again.
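A rough sketch of that hand-off, assuming the master does all the accepting and forwards units of work to workers over the cluster module's built-in IPC channel; dispatch() and doHeavyWork() are made-up names for illustration:

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
    var workers = [];
    for (var i = 0; i < os.cpus().length; i++) {
        var w = cluster.fork();
        w.on('message', function (msg) {
            // a worker's result arrives here; the master replies to the original client
        });
        workers.push(w);
    }
    var next = 0;
    function dispatch(job) {              // call this from the master's own request handling
        workers[next].send({ job: job }); // hand the work item to a worker over IPC
        next = (next + 1) % workers.length;
    }
} else {
    process.on('message', function (msg) {
        var result = doHeavyWork(msg.job); // doHeavyWork is a made-up placeholder
        process.send({ result: result });  // pass the result back to the master
    });
}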
It's generally easier not to do this in your Node program though; your frontend will still be the limiting factor. An easier (and faster) way may be to put a special-purpose load balancer in front of multiple running instances of your application (e.g. HAProxy or Nginx).
[1]: That's actually a lie; sorry. You can do this by specifying SO_REUSEPORT when doing the initial bind call, but you can't explicitly specify that in Node, and Node doesn't specify it for you...so you can't in Node.
I'm converting my application to use Node.js cluster, which I hope will boost its performance.
Currently, I'm deploying the application to 2 EC2 t2.medium instances. I have Nginx as a proxy and ELB.
This is my express cluster application which is pretty standard from the documentation.
var bodyParser = require('body-parser');
var cors = require('cors');
var cluster = require('cluster');
var debug = require('debug')('expressapp');

if (cluster.isMaster) {
    var numWorkers = require('os').cpus().length;
    debug('Master cluster setting up ' + numWorkers + ' workers');

    for (var i = 0; i < numWorkers; i++) {
        cluster.fork();
    }

    cluster.on('online', function(worker) {
        debug('Worker ' + worker.process.pid + ' is online');
    });

    cluster.on('exit', function(worker, code, signal) {
        debug('Worker ' + worker.process.pid + ' died with code: ' + code + ', and signal: ' + signal);
        debug('Starting a new worker');
        cluster.fork();
    });
} else {
    // Express stuff
}
This is my Nginx configuration.
nginx::worker_processes: "%{::processorcount}"
nginx::worker_connections: '1024'
nginx::keepalive_timeout: '65'
I have 2 CPUs on Nginx server.
This is my before performance.
I get 1,500 requests/s, which is pretty good. Then I thought I would increase the number of connections on Nginx so I could accept more requests, so I did this:
nginx::worker_processes: "%{::processorcount}"
nginx::worker_connections: '2048'
nginx::keepalive_timeout: '65'
And this is my performance after the change, which I think is worse than before.
I use gatling for performance testing and here's the code.
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class LoadTestSparrowCapture extends Simulation {
  val httpConf = http
    .baseURL("http://ELB")
    .acceptHeader("application/json")
    .doNotTrackHeader("1")
    .acceptLanguageHeader("en-US,en;q=0.5")
    .acceptEncodingHeader("gzip, deflate")
    .userAgentHeader("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0")

  val headers_10 = Map("Content-Type" -> "application/json")

  val scn = scenario("Load Test")
    .exec(http("request_1")
      .get("/track"))

  setUp(
    scn.inject(
      atOnceUsers(15000)
    ).protocols(httpConf))
}
I deployed this to my gatling cluster. So, I have 3 EC2 instances firing 15,000 requests in 30s to my application.
The question is: is there anything I can do to increase the performance of my application, or do I just need to add more machines?
The route I'm testing is pretty simple: I get the request and send it off to RabbitMQ so it can be processed further, so the response of that route is pretty fast.
You've mentioned that you are using AWS and that an ELB sits in front of your EC2 instances. As I can see, you are getting 502 and 503 status codes. These can be sent either by the ELB or by your EC2 instances. Make sure that when doing the load test you know where the errors are coming from. You can check this in the AWS console, in the ELB CloudWatch metrics.
Basically, HTTPCode_ELB_5XX means your ELB sent the 50x; HTTPCode_Backend_5XX, on the other hand, means your backend sent the 50x. You can also verify this in the ELB logs. A better explanation of ELB errors can be found here.
To load-test on AWS you should definitely read this. The point is that the ELB is just another set of machines, which needs to scale if your load increases. The default scaling strategy is (cited from the section "Ramping Up Testing"):
Once you have a testing tool in place, you will need to define the growth in the load. We recommend that you increase the load at a rate of no more than 50 percent every five minutes.
That means that if you start at some number of concurrent users, let's say 1000, by default you should increase only up to 1500 within 5 minutes. This guarantees that the ELB will scale with the load on your servers. Exact numbers may vary, and you have to test them on your own. Last time I tested, it sustained a load of 1200 req/s without an issue, and then I started to receive 50x responses. You can test this easily by running a ramp-up scenario from X to Y users from a single client and waiting for the 50x responses.
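For example, with the Gatling 2.x DSL used in the simulation above, such a ramp-up injection could look roughly like this (the user count and duration are placeholders):

setUp(
  scn.inject(
    rampUsers(1500) over (5 minutes)  // gradual ramp instead of atOnceUsers(15000)
  ).protocols(httpConf))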
The next very important thing (from the section "DNS Resolution") is:
If clients do not re-resolve the DNS at least once per minute, then the new resources Elastic Load Balancing adds to DNS will not be used by clients.
In short, it means that you have to guarantee that the TTL in DNS is respected, or that your clients re-resolve and rotate through the DNS IPs they receive from DNS lookups, so that load is distributed in a round-robin fashion. If not (e.g. when testing from only one client, which is not your case), you can skew the results by targeting all the traffic at one ELB instance and overloading it. That means the ELB will not scale at all.
Hope this helps.
I have done some small research on the patterns supported by ZeroMQ. I would like to describe a problem with the PUB/SUB pattern, though I probably hit the same problem in my recent project with the PUSH/PULL pattern as well. I use the Node.js ZeroMQ implementation.
I prepared two examples (server.js & client.js). I noticed that the first message from server.js is lost every time I restart the server (a message is sent every 1 second); client.js doesn't get the first message. It is probably caused by too short a delay before sending messages: when I start sending messages only after some delay (e.g. 1 second), everything works fine. I think that zmq needs some time to initialize the connection between publisher and subscriber.
I would like to know when the producer (server) is ready to send messages to subscribed clients. How do I get this information?
I don't understand why client.js, which is connected and subscribed for messages, doesn't get them just because the server is not yet ready to serve subscriptions after a restart.
Maybe it works like this by design.
server.js:
var zmq = require('zmq');
console.log('server zmq: ' + zmq.version);

var publisher = zmq.socket('pub');
publisher.bindSync("tcp://*:5555");

var i = 0;
var msg = "get_status OK ";

function sendMsg() {
    console.log(msg + i);
    publisher.send(msg + i);
    i++;
    setTimeout(sendMsg, 1000);
}
sendMsg();

process.on('SIGINT', function() {
    publisher.close();
    process.exit();
});
client.js:
var zmq = require('zmq');
console.log('client zmq: ' + zmq.version);

var subscriber = zmq.socket('sub');
subscriber.subscribe("get_status");
subscriber.on('message', function(data) {
    console.log(data.toString());
});
subscriber.connect("tcp://127.0.0.1:5555");

process.on('SIGINT', function() {
    subscriber.close();
    process.exit();
});
The node zmq lib repo lists the supported monitoring events. Subscribing to these allows you to monitor your connection, in this case via the 'accept' event. However, don't forget that you also have to call the monitor() function on the socket to activate monitoring.
You should end up with something like:
var publisher = zmq.socket('pub');
publisher.on('accept', function(fd, ep) {
    sendMsg();
});
publisher.monitor(100, 0);
publisher.bindSync("tcp://*:5555");
I have a problem coding a Node.js program that forwards traffic from one port to another. The scenario goes like this: I forward all traffic from port 55555 to an SSH tunnel that has a SOCKS5 proxy open on port 44444. Everything works smoothly until I run htop -d 1 and see high load when I am visiting 2-3 sites simultaneously. If I go through the SOCKS5 SSH tunnel directly I see load peaking at 1% of a core, but with node.js I see 22%, 26%, 60%, 70% and even 100% sometimes. What is happening, and why? Think about what would happen when I open 1000 of those!
Here is my first try (proxy1.js) :
var net = require('net');
require('longjohn');

var regex = /^[\x09\x0A\x0D\x20-\x7E]+$/;
var regexIP = /^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/;

// parse "80" and "localhost:80" or even "42mEANINg-life.com:80"
var addrRegex = /^(([a-zA-Z\-\.0-9]+):)?(\d+)$/;

var addr = {
    from: addrRegex.exec(process.argv[2]),
    to: addrRegex.exec(process.argv[3])
};

if (!addr.from || !addr.to) {
    console.log('Usage: <from> <to>');
    process.exit(1);
}

net.createServer(function(from) {
    var to = net.createConnection({
        host: addr.to[2],
        port: addr.to[3]
    });

    // REQUESTS BEGIN
    from.on('data', function(data) {
    });
    from.on('end', function(end) {
    });
    from.on('close', function(close) {
    });
    // error handling
    from.on('error', function(error) {
    });
    from.pipe(to);
    // REQUESTS END

    // RESPONSES BEGIN
    to.on('data', function(data) {
    });
    to.on('end', function(end) {
    });
    to.on('close', function(close) {
    });
    to.on('error', function(error) {
    });
    to.pipe(from);
    // RESPONSES END
}).listen(addr.from[3], addr.from[2]);
Here is my second try (proxy2.js) :
var net = require('net');

var sourceport = 55555;
var destport = 62240;

net.createServer(function(s) {
    var buff = "";
    var connected = false;
    var cli = net.createConnection(destport, "127.0.0.1");

    s.on('data', function(d) {
        if (connected) {
            cli.write(d);
        } else {
            buff += d.toString();
        }
    });
    s.on('error', function() {
    });

    cli.on('connect', function() {
        connected = true;
        cli.write(buff);
    });
    cli.on('error', function() {
    });

    cli.pipe(s);
}).listen(sourceport);
I also tried running cpulimit -l 10 nodejs proxy.js 55555 44444; it still generates load, and it seems to spawn new forks/processes.
Server config:
cat /etc/issue: Ubuntu 14.04.3 LTS
nodejs --version: v0.10.25
Processor: Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz with 8 cores
RAM: 32 GB (that stays free all the time)
Why is the load so big?
How can I write the code so that it does not generate that load?
Why doesn't 'cpulimit -l 10 nodejs proxy.js 55555 44444' work as expected?
Why does node.js use CPU and not RAM?
Thanks in advance.
A port is merely a segment of memory, and writing to it at a high rate can load the CPU because it may create too many async I/O requests; even though these requests are I/O bound, they end up being indirectly CPU bound.
To avoid this problem you may have to limit the number of connection requests by streaming data: rather than sending 1000 small requests, make 100 large ones.
I'm not sure how to solve this or what exactly is happening. Maybe socket.io with streaming can help.
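For comparison, here is a minimal sketch of the same forwarder that relies only on pipe() and error handlers, without the extra per-chunk 'data' listeners; the ports are the ones from the question, and this is only a sketch, not a measured fix for the CPU load:

// Minimal TCP forwarder sketch: pipe both directions, handle errors, nothing else.
var net = require('net');

net.createServer(function (from) {
    var to = net.createConnection(44444, '127.0.0.1'); // the SOCKS5 end from the question

    from.pipe(to);
    to.pipe(from);

    // destroy both sides on error so sockets don't leak
    from.on('error', function () { to.destroy(); });
    to.on('error', function () { from.destroy(); });
}).listen(55555);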
I've been experimenting with the node-serialport library to access devices connected to a USB hub and send/receive data to these devices. The code works fine on Linux, but on Windows (Windows 8.1 and Windows 7) I get some odd behaviour: it doesn't seem to work for more than 2 devices, and it just hangs when writing to the port. The callback for the write method never gets called. I'm not sure how to go about debugging this issue. I'm not a Windows person, so if someone could give me some direction it would be great.
Below is the code I'm currently using to test.
/*
Sample code to debug node-serialport library on windows
*/
//var SerialPort = require("./build/Debug/serialport");
var s = require("./serialport-logger");
var parsers = require('./parsers');
var ee = require('events');

s.list(function(err, ports) {
    console.log("Number of ports available: " + ports.length);
    ports.forEach(function(port) {
        var cName = port.comName,
            sp;
        //console.log(cName);
        sp = new s.SerialPort(cName, {
            parser: s.parsers.readline("\r\n")
        }, false);

        // sp.once('data', function(data) {
        //     if (data) {
        //         console.log("Retrieved data " + data);
        //         //console.log(data);
        //     }
        // });
        //console.log("Is port open " + sp.isOpen());

        if (!sp.isOpen()) {
            sp.open(function(err) {
                if (err) {
                    console.log("Port cannot be opened manually");
                } else {
                    console.log("Port is open " + cName);
                    sp.write("LED=2\r\n", function(err) {
                        if (err) {
                            console.log("Cannot write to port");
                            console.error(err);
                        } else {
                            console.log("Written to port " + cName);
                        }
                    });
                }
            });
        }
        //sp.close();
    });
});
You will have noticed that I'm not require'ing the serialport library directly; instead I'm using the serialport-logger library, which is just a way to use the serialport addons compiled with the debug switch on a Windows box.
TL;DR: For me it works by increasing the threadpool size for libuv.
$ UV_THREADPOOL_SIZE=20 node server.js
I was fine with opening/closing the port for each command for a while, but a feature request I'm working on now needs to keep the port open and reuse the connection to run the commands, so I had to find an answer for this issue.
The number of devices I could support by opening a connection and holding on to it was 3. The issue turns out to be the default threadpool size of 4: I already have another background worker occupying 1 thread, so I have only 3 threads left. The EIO_WatchPort function in node-serialport runs as a background worker, which blocks a thread. So when I use more than 3 devices, the "open" method call waits in the queue to be pushed to a background worker, but since they are all busy it blocks node, and any subsequent requests cannot be handled. Finally, increasing the thread pool size did the trick; it's working fine now. It might help someone. Also, this thread definitely helped me.
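As an alternative to setting the variable in the shell, it can usually also be set at the very top of the entry script, before anything submits work to the libuv threadpool; this only works if it runs before the pool is first used, so treat it as an assumption to verify rather than a guarantee:

// Must run before any module (fs, dns, serialport, ...) first uses the libuv threadpool.
process.env.UV_THREADPOOL_SIZE = '20';

var s = require("./serialport-logger"); // the same library used in the question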
As opensourcegeek pointed out, all you need to do is set the UV_THREADPOOL_SIZE variable above the default of 4 threads.
I had problems in my project with node.js and the modbus-rtu or modbus-serial library when I tried to query more than 3 RS-485 devices on USB ports. 3 devices: no problem. A 4th or more: permanent timeouts. Each device responded within about 600 ms, but when the pool was busy they never got a response back.
So on Windows, simply run this on the command line of your node.js environment:
set UV_THREADPOOL_SIZE=8
or whatever you like, up to 128. I had 6 USB ports to query, so I used 8.