Lots of parallel http requests in node.js - javascript

I've created a node.js script that scans the network for available HTTP pages, so there are a lot of connections I want to run in parallel, but it seems that some of the requests wait for previous ones to complete.
Following is the code fragment:
var reply = { };
reply.started = new Date().getTime();

var req = http.request(options, function(res) {
  reply.status = res.statusCode;
  reply.rawHeaders = res.headers;
  reply.headers = JSON.stringify(res.headers);
  reply.body = '';
  res.setEncoding('utf8');
  res.on('data', function (chunk) {
    reply.body += chunk;
  });
  res.on('end', function () {
    reply.finished = new Date().getTime();
    reply.time = reply.finished - reply.started;
    callback(reply);
  });
});

req.on('error', function(e) {
  if (e.message == 'socket hang up') {
    return;
  }
  errCallback(e.message);
});

req.end();
This code performs only 10-20 requests per second, but I need 500-1k requests per second. Every queued request is made to a different HTTP server.
I've tried something like this, but it didn't help:
http.globalAgent.maxSockets = 500;

Something else must be going on with your code. Node can comfortably handle 1k+ requests per second.
I tested with the following simple code:
var http = require('http');

var results = [];
var j = 0;

// Make 1000 parallel requests:
for (i = 0; i < 1000; i++) {
  http.request({
    host: '127.0.0.1',
    path: '/'
  }, function(res) {
    results.push(res.statusCode);
    j++;
    if (j == i) { // last request
      console.log(JSON.stringify(results));
    }
  }).end();
}
To test purely what node is capable of, and not my home broadband connection, the code requests from a local Nginx server. I also avoid calling console.log until all the requests have returned, because it is implemented as a synchronous function (to avoid losing debugging messages when a program crashes).
Running the code using time I get the following results:
real 0m1.093s
user 0m0.595s
sys 0m0.154s
That's 1.093 seconds for 1000 requests which makes it very close to 1k requests per second.
The simple code above will generate OS errors if you try to make a lot of requests (say 10000 or more), because node will happily try to open all those sockets in the for loop (remember: the requests don't start until the for loop ends; they are only created). You mentioned that your solution runs into the same errors. To avoid this, you should limit the number of parallel requests you make.
The simplest way of limiting the number of parallel requests is to use one of the Limit functions from the async.js library:
var http = require('http');
var async = require('async');

var requests = [];

// Build a large list of requests:
for (i = 0; i < 10000; i++) {
  requests.push(function(callback) {
    http.request({
      host: '127.0.0.1',
      path: '/'
    }, function(res) {
      callback(null, res.statusCode);
    }).end();
  });
}

// Make the requests, 100 at a time
async.parallelLimit(requests, 100, function(err, results) {
  console.log(JSON.stringify(results));
});
Running this with time on my machine I get:
real 0m8.882s
user 0m4.036s
sys 0m1.569s
So that's 10k requests in around 9 seconds, or roughly 1.1k/s.
Look at the functions available from async.js.
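For instance, if what you have is a list of hosts to scan rather than a prebuilt array of task functions, async.eachLimit fits naturally. A rough sketch (the hosts list here is made up for illustration):
var http = require('http');
var async = require('async');

// Hypothetical list of hosts to scan:
var hosts = ['10.0.0.1', '10.0.0.2', '10.0.0.3'];

// Scan at most 100 hosts at a time:
async.eachLimit(hosts, 100, function(host, done) {
  http.request({ host: host, path: '/' }, function(res) {
    console.log(host + ' -> ' + res.statusCode);
    done();
  }).on('error', function(e) {
    console.log(host + ' failed: ' + e.message);
    done(); // keep scanning even if one host fails
  }).end();
}, function() {
  console.log('scan finished');
});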

I've found a solution for myself; it is not very good, but it works:
childProcess = require('child_process')
I'm using curl:
childProcess.exec('curl --max-time 20 --connect-timeout 10 -iSs "' + options.url + '"', function (error, stdout, stderr) { });
This allows me to run 800-1000 curl processes simultaneously. Of course, this solution has its weaknesses, like the requirement for lots of open file descriptors, but it works.
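If the file descriptor limit becomes a problem, one option is to cap how many curl children run at once. A rough sketch of such a cap (the urls array, the fetchUrl helper, and the 500 limit are all made up for illustration):
var childProcess = require('child_process');

var urls = [/* ... list of URLs to scan ... */];
var MAX_CHILDREN = 500; // assumed cap; tune to your ulimit
var running = 0;
var next = 0;

function fetchUrl(url, done) {
  childProcess.exec(
    'curl --max-time 20 --connect-timeout 10 -iSs "' + url + '"',
    function (error, stdout, stderr) {
      done(error, stdout);
    }
  );
}

function pump() {
  // Start new children until we hit the cap or run out of URLs
  while (running < MAX_CHILDREN && next < urls.length) {
    running++;
    fetchUrl(urls[next++], function () {
      running--;
      pump(); // a slot freed up, start the next one
    });
  }
}

pump();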
I've tried node-curl bindings, but that was very slow too.

Related

Why is request.on data firing with a delay on NodeJS?

There is a simple web server that accepts data. Sample code below.
The idea is to track in real time how much data has entered the server and immediately inform the client about it. If you send a small amount of data, everything works well, but if you send more than X bytes, the on.data event on the server fires with a huge delay. I can see that data has been transferring for 5 seconds already, but the on.data event is not triggered.
The on.data event seems to be triggered only when the data has been uploaded completely to the server, which is why it works fine with small data (~2..20Mb) but doesn't work well with big data (50..200Mb).
Or maybe it is due to some kind of buffering..?
Do you have any suggestions as to why on.data is triggered with a delay, and how to fix it?
const express = require('express');

const app = express();
const port = 3000;

// PUBLIC API
// upload file
app.post('/upload', function (request, response) {
  request.on('data', chunk => {
    // message appears with delay
    console.log('upload on data', chunk.length);
    // send message to the client about chunk.length
  });
  response.send({
    message: `Got a POST request ${request.headers['content-length']}`
  });
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});
TLDR:
The delay you are experiencing is probably the Queueing phase of the browser's resource scheduling.
The Test
I did some tests with express and found that it uses http under the hood to handle requests/responses, so I used a raw http server listener to test this scenario; it shows the same behaviour.
Backend code
This code, based on the Node HTTP transaction anatomy sample, will create an http server and log the time in 3 situations:
When a request was received
When the first data event fires
When the end event fires
const http = require('http');

var firstByte = null;

var server = http.createServer((request, response) => {
  const { headers, method, url } = request;
  let body = [];
  request.on('error', (err) => {
  }).on('data', (chunk) => {
    if (!firstByte) {
      firstByte = Date.now();
      console.log('received first byte at: ' + Date.now());
    }
  }).on('end', () => {
    console.log('end receive data at: ' + Date.now());
    // body = Buffer.concat(body).toString();
    // At this point, we have the headers, method, url and body, and can now
    // do whatever we need to in order to respond to this request.
    if (url === '/') {
      response.statusCode = 200;
      response.setHeader('Content-Type', 'text/html');
      response.write('<h1>Hello World</h1>');
    }
    firstByte = null;
    response.end();
  });
  console.log('received a request at: ' + Date.now());
});

server.listen(8083);
Frontend code (snippet from devtools)
This code fires an upload to /upload with some array data. I initially filled the array with random bytes, but then removed that and saw it had no effect on my timing log, so the upload content for now is just an array of 0's.
console.log('building data');
var view = new Uint32Array(new Array(5 * 1024 * 1024));
console.log('start sending at: ' + Date.now());

fetch("/upload", {
  body: view,
  method: "post"
}).then(async response => {
  const text = await response.text();
  console.log('got response: ' + text);
});
Running the backend code and then the frontend code, I get the following logs.
Log capture: screenshots of the backend log, the frontend log, and the time differences between backend and frontend are omitted here.
Results
Looking at the screenshots, I get two differences between the logs:
The first, and most important, is the difference between the frontend fetch start and the backend request received. I got 1613ms, which is "close" to the Resource Scheduling time (1430ms) in the network timing tab. I think more things happen between the frontend fetch call and the node backend event, so I can't compare the times directly:
log.backendReceivedRequest - log.frontEndStart
1613
The second is the difference between receiving the first and the last data on the backend, which I got as 578ms, close to Request sent (585ms) in the network timing tab:
log.backendReceivedAllData - log.backendReceivedFirstData
578
I also changed the frontend code to send different sizes of data, and the network timing tab still matched the log.
The thing that remains unknown to me is: why is Google Chrome queueing my fetch, since I'm not running any other requests and not using the server/host's bandwidth? I read the conditions for Queueing but didn't find the reason; maybe it is allocating the resources on disk, but I'm not sure: https://developer.chrome.com/docs/devtools/network/reference/#timing-explanation
References:
https://nodejs.org/es/docs/guides/anatomy-of-an-http-transaction/
https://developer.chrome.com/docs/devtools/network/reference/#timing-explanation
I found the problem. It was in the nginx config. Nginx was set up as a reverse proxy, and by default proxy request buffering is enabled, so nginx first grabs the whole request body and only then forwards it to nodejs; that's why I saw the delay.
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
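For reference, turning the buffering off looks roughly like this in the nginx proxy configuration (a sketch only; the upstream address is assumed to be the Node app on 127.0.0.1:3000, check the linked docs for your setup):
location / {
    proxy_pass http://127.0.0.1:3000;  # assumed Node.js upstream
    proxy_request_buffering off;       # forward body chunks as they arrive
    proxy_http_version 1.1;            # needed for unbuffered (chunked) request bodies
}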

expressjs is not asynchronously processing my requests

var express = require('express');
var data = require('./blahblah/data');

var app = express();
app.get('/data', data.getData);
app.listen(3000);

// blahblah/data:
exports.getData = function(req, res) {
  setTimeout(function() {
    res.send('test');
  }, 10000);
};
// code I use to test
for (var i = 0; i < 10; i++) {
  $.ajax({
    url: 'http://127.0.0.1:3000/data',
    success: function(l) { console.log(l); }
  });
}
If I send 10 simultaneous requests to this endpoint, they only receive a reply one at a time, 10 seconds apart, with the last reply coming 110 seconds after the first request was sent. Aren't nodejs and express supposed to let other requests be processed while earlier requests are running asynchronous code?
Your code is fine, probably something is wrong with the way you're testing it.
Please make sure that you're sending your requests in parallel.
I tested your code with the loadtest utility:
loadtest -n 10 -c 10 http://localhost:3000/data
and got the expected results:
Target URL: http://localhost:3000/data
Max requests: 10
Concurrency level: 10
Agent: none
Completed requests: 10
Total errors: 0
Total time: 10.050681130000001 s
Here is the code snippet I tested:
var app = require('express')();

app.get('/data', function(req, res) {
  setTimeout(function() {
    res.send('test');
  }, 10000);
});

app.listen(3000);
Update:
Web browsers usually limit the number of parallel requests per host. I just tested it in my Google Chrome, and it limited the number of parallel requests to 6 (screenshot omitted).
Here is the detailed timing for one of the delayed requests (screenshot omitted):
It looks like your browser has even harsher limits.

Node.js - Why are some of my callbacks not executing asynchronously?

Noob question on using callbacks as a control-flow pattern with Node and the http class. Based on my understanding of the event loop, all code is blocking and i/o is non-blocking using callbacks. Here's a simple http server and a pseudo REST function:
// Require
var http = require("http");

// Class
function REST() {};

// Methods
REST.prototype.resolve = function(request, response, callback) {
  // Pseudo rest function
  function callREST(request, callback) {
    if (request.url == '/test/slow') {
      setTimeout(function() { callback('time is 30 seconds') }, 30000);
    } else if (request.url == '/test/foo') {
      callback('bar');
    }
  }
  // Call pseudo rest
  callREST(request, callback);
}

// Class
function HTTPServer() {};

// Methods
HTTPServer.prototype.start = function() {
  http.createServer(function (request, response) {
    // Listeners
    request.resume();
    request.on("end", function () {
      // Execute only if not a favicon request
      var faviconCheck = request.url.indexOf("favicon");
      if (faviconCheck < 0) {
        // Print
        console.log('incoming validated HTTP request: ' + request.url);
        // Instantiate and execute on new REST object
        var rest = new REST();
        rest.resolve(request, response, function(responseMsg) {
          var contentType = {'Content-Type': 'text/plain'};
          response.writeHead(200, contentType); // Write response header
          response.end(responseMsg); // Send response and end
          console.log(request.url + ' response sent and ended');
        });
      } else {
        response.end();
      }
    });
  }).listen(8080);
  // Print to console
  console.log('HTTPServer running on 8080. PID is ' + process.pid);
}

// Process
// Create http server instance
var httpServer = new HTTPServer();
// Start
httpServer.start();
If I open up a browser and hit the server with "/test/slow" in one tab and then "/test/foo" in another, I get the following behavior: "foo" responds with "bar" immediately, and then 30 seconds later "slow" responds with "time is 30 seconds". This is what I was expecting.
But if I open up 3 tabs in a browser and hit the server with "/test/slow" successively in each tab, "slow" is processed and responds serially/synchronously, so that the 3 responses appear at 30-second intervals. I was expecting the responses right after each other if they were being processed asynchronously.
What am I doing wrong?
Thank you for your thoughts.
This is actually not the server's fault. Your browser is opening a single connection and re-using it between the requests, but one request can't begin until the previous one finishes. You can see this in a couple of ways:
Look in the network tab of the Chrome dev tools - the entry for the longest one will show the request in the blocking state until the first two finish.
Try opening the slow page in different browsers (or one each in normal and incognito windows) - this prevents sharing connections.
Thus, this will only happen if the same browser window is making multiple requests to the same server. Also, note that XHR (AJAX) requests open separate connections, so they can be performed in parallel. In the real world, this won't be a problem.
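You can also confirm this from Node itself. A quick sketch (assuming the server above is running on port 8080) that fires two requests over separate sockets via agent:false; both should report roughly 30000ms, rather than 30s and 60s:
var http = require('http');

function timedGet(label) {
  var started = Date.now();
  // agent:false gives each request its own socket, like two separate browsers
  http.get({ host: '127.0.0.1', port: 8080, path: '/test/slow', agent: false }, function(res) {
    res.resume(); // drain the body
    res.on('end', function() {
      console.log(label + ' finished after ' + (Date.now() - started) + 'ms');
    });
  });
}

timedGet('first');
timedGet('second'); // both complete together if the server handles them in parallel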

Nodejs http.request strange delay

I have found a strange delay in the http.request function. Here is my code:
var express = require('express');
var http = require('http');

var app = express();
app.set('port', process.env.PORT || 3000);

app.get('/aaa', function(req, res) {
  setTimeout(function() {
    res.json({"a": 1});
  }, 500);
});

app.get('/bbb', function(req, res) {
  var options = {
    host: '127.0.0.1',
    port: 3000,
    path: '/aaa',
    method: 'GET'
  };
  var request = http.request(options, function(result) {
    result.on("data", function() {
    });
    res.json({"b": 2});
  });
  request.on('error', function() {
    res.json({"b": 2});
  });
  request.end();
});

http.createServer(app).listen(app.get('port'), function() {
});
The client calls /bbb, then its handler calls /aaa, and within 500ms the result returns to the client.
I tried to measure the response time in different situations using Apache Bench:
1) 1000 requests with 1 concurrent request.
Average response time: 500ms
2) 1000 requests with 50 concurrent requests.
Average response time: 5000ms
3) 1000 requests with 100 concurrent requests.
Average response time: 10000ms
Why is the response time growing?
It's fine when I call /aaa directly.
It's not unusual behaviour. The HTTP client used in the callback to /bbb (http.request) is limited to 5 concurrent sockets per host. In other words, it can only make 5 HTTP requests in parallel. You can find a reference to this in the documentation.
Just to confirm you're hitting the limit, run your tests using 5 and then 6 concurrent requests. You'll see (as I did) that average response time rises significantly at 6 concurrent requests. This is because the 6th concurrent request is queued until one of the 5 preceding requests to /aaa completes.
To answer your question about why the response time grows: the more concurrency you add in your benchmark, the higher the average response time goes, because each request has to wait for more requests in the queue to finish before it can get a socket. For example, with 50 concurrent requests and only 5 sockets, each request sits behind roughly 10 rounds of 500ms requests, which matches the ~5000ms you measured.
You can increase the number of concurrent sockets your HTTP client can handle by modifying the default agent like this:
var http = require("http");
http.globalAgent.maxSockets = 10;
You can also circumvent pooling altogether by passing agent:false to http.get like so:
http.get({hostname: 'localhost', port: 80, path: '/', agent: false}, function (res) {
  // Do stuff
})
Update (8th Feb 2015)
An important change regarding this answer has come up in Node v0.12.0:
maxSockets are no longer limited to 5. The default is now set to Infinity, with the developer and the operating system given control over how many simultaneous connections an application can keep open to a given host.
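If you want an explicit cap regardless of the Node version, you can also pass a dedicated agent per request instead of modifying the global one. A small sketch (the limit of 10 and the target URL are just examples):
var http = require('http');

// A dedicated agent with its own socket pool, independent of http.globalAgent
var agent = new http.Agent({ maxSockets: 10 });

http.get({ host: '127.0.0.1', port: 3000, path: '/aaa', agent: agent }, function (res) {
  res.resume(); // drain the response so the socket returns to the pool
});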
I had the same issue, and it was resolved by keeping it a very simple GET request, as below (note that http.get calls req.end() automatically, so the second line is redundant but harmless):
var req = http.get(requestUrl);
req.end();

Nodejs one request blocks other requests

If one request takes a long time to execute, other tabs wait until the first request completes. Why? Please explain, and how can I solve it? Here is my code:
var http = require("http");
var url = require("url");

http.createServer(function(request, response) {
  if (url.parse(request.url).pathname == '/long') {
    function longlong() {
      var counter = 0;
      var startTime = new Date().getTime();
      // Busy-wait for 10 seconds, blocking the event loop
      while (new Date().getTime() < startTime + 10000) {
        counter++;
      }
      return counter;
    }
    response.writeHead(200, {"Content-Type": "text/plain"});
    response.write(longlong() + '');
    response.end();
  } else {
    response.writeHead(200, {"Content-Type": "text/plain"});
    response.write(0 + '');
    response.end();
  }
}).listen(8888);
Good read:
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
A key line from that write up is:
…however, everything runs in parallel except your code
That means expensive I/O should be async, but your own code can block. Fortunately, it's typically the expensive I/O that we're worried about blocking on the server.
See this other question for more details with an example: Node.js blocking the event loop?
Because NodeJS is single threaded (just like browser JavaScript), it can only run one piece of code at a time. We get the illusion of concurrency because NodeJS has a very fancy queue of code blocks which are fired in an asynchronous way, but once a block starts processing, no other block can be processed at the same time.
With NodeJS you have to make sure that every request ends (either successfully or not), otherwise it may crash beyond any help (the entire server, not only the request).
Also, using process.nextTick instead of a classical loop may help your requests work faster (i.e. be more scalable).
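To illustrate, here is a sketch of the /long busy-wait reworked to run in small slices. It uses setImmediate rather than process.nextTick, because nextTick callbacks run before pending I/O, so a long nextTick chain would still starve other requests; the 50ms slice size is an arbitrary choice:
function longlongAsync(durationMs, done) {
  var counter = 0;
  var startTime = Date.now();

  function slice() {
    // Do a bounded chunk of work...
    var sliceEnd = Date.now() + 50;
    while (Date.now() < sliceEnd && Date.now() < startTime + durationMs) {
      counter++;
    }
    if (Date.now() < startTime + durationMs) {
      setImmediate(slice); // ...then yield so other requests can be served
    } else {
      done(counter);
    }
  }

  slice();
}

// Usage inside the request handler:
// longlongAsync(10000, function(counter) {
//   response.writeHead(200, {"Content-Type": "text/plain"});
//   response.end(counter + '');
// });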