I would like to set up an API using express that can either multithread or multiprocess requests.
For instance, below is an api that sleeps 5 seconds before sending a response. If I call it quickly 3 times, the first response will take 5 seconds, the second will take 10, and the third will take 15, indicating the requests were handled sequentially.
How do I architect an application that can handle these requests concurrently?
const express = require('express')
const app = express()
const port = 4000
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

app.get('/', (req, res) => {
  sleep(5000).then(() => {
    res.send('Hello World!')
  })
})
app.listen(port, () => console.log(`Example app listening on port ${port}!`))
Edit: request -> response
If I call it quickly 3 times, the first response will take 5 seconds, the second will take 10, and the third will take 15, indicating the requests were handled sequentially.
That's only because your browser is serializing the requests, because they're all requesting the same resource. On the Node.js/Express side, those requests are independent of one another. If they were sent from three separate clients one right after another, they'd each get a response roughly five seconds later (not after 5, 10, and 15 seconds).
For instance, I updated your code to output the date/time of the response:
res.send('Hello World! ' + new Date().toISOString())
...and then opened http://localhost:4000 in three separate browsers as quickly as I could (I don't appear to be all that quick :-)). The times on the responses were:
16:15:58.819Z
16:16:00.361Z
16:16:01.164Z
As you can see, they aren't five seconds apart.
But if I do that in three windows in the same browser, they get serialized:
16:17:13.933Z
16:17:18.938Z
16:17:23.942Z
If I further update your code so that it's handling three different endpoints:
function handler(req, res) {
  sleep(5000).then(() => {
    res.send('Hello World! ' + new Date().toISOString())
  })
}

app.get('/a', handler);
app.get('/b', handler);
app.get('/c', handler);
Then even on the same browser, requests for /a, /b, and /c are not serialized.
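If you want to check this outside the browser, here is a minimal sketch (my addition, assuming the server from the question is listening on port 4000) that fires three requests in parallel and logs when each finishes; all three should complete roughly five seconds after the script starts, not at 5, 10, and 15 seconds:

const http = require('http');

const start = Date.now();

for (let i = 1; i <= 3; i++) {
  http.get('http://localhost:4000/', (res) => {
    res.resume(); // discard the body so the response can finish
    res.on('end', () => {
      console.log(`request ${i} finished after ${Date.now() - start} ms`);
    });
  });
}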
Using the packages express and ftp, I am trying to simply get files from an FTP server and return them to the requesting client via HTTP GET.
The first request goes through fine, but when I call the endpoint again I run into this exception:
Error [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client
I've tried the solutions from Error: Can't set headers after they are sent to the client, like having a return when sending, but unfortunately none of them worked for me.
This is ALL the code:
const express = require('express');
const ftp = require('ftp');

const app = express();
const port = 3000;
const c = new ftp();

app.get('/files', (req, res) => {
  c.on('ready', () => {
    c.list((err, list) => {
      c.end();
      return res.setHeader("Content-Type", "application/json").status(200).send({data: list});
    });
  });

  c.connect({
    host: 'xxx',
    user: 'xxx',
    password: 'xxx',
  });
});

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});
I think it might be something with the c.list() callback; however, I cannot for the love of god find what is wrong with it, as res.send() is never actually called twice.
The problem is that you have just one ftp object, every request subscribes to (and never unsubscribes from) the ready event, and the ready event fires every time you call connect(), which you do for every request. So when the second request calls connect(), the event fires for both the first and the second request's handlers. This leads to setHeader() being called a second time for the first request, hence the error.
Using once() instead of on(), so that the event handler is only called once, should resolve the issue, though there are probably better ways to write this code (use a promise-based API or promisify this one, and only initialize a connection to the FTP server once instead of for every request).
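As a rough illustration, here is a minimal sketch of the once() fix, keeping the structure of the question's code (the error branch is my addition; the 'xxx' credentials stay as placeholders):

const express = require('express');
const ftp = require('ftp');

const app = express();
const port = 3000;
const c = new ftp();

app.get('/files', (req, res) => {
  // once() removes the listener after it fires, so the next request's
  // connect() can no longer re-trigger a handler for an already-sent response.
  c.once('ready', () => {
    c.list((err, list) => {
      c.end();
      if (err) return res.status(500).send({ error: err.message });
      res.setHeader("Content-Type", "application/json");
      res.status(200).send({ data: list });
    });
  });

  c.connect({
    host: 'xxx',
    user: 'xxx',
    password: 'xxx',
  });
});

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});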
There is a simple web server that accepts data. Sample code below.
The idea is to track in real time how much data has entered the server and immediately inform the client about it. If you send a small amount of data, everything works well, but if you send more than X data in size, the 'data' event on the server fires with a huge delay. I can see that data has been transferring for 5 seconds already, but the 'data' event is not triggered.
The 'data' event seems to fire only once the data has been completely uploaded to the server, which is why it works fine with small payloads (~2..20 MB) but not with big ones (50..200 MB).
Or maybe it is due to some kind of buffering?
Do you have any suggestions as to why the 'data' event is triggered with a delay, and how to fix it?
const express = require('express');

const app = express();
const port = 3000;

// PUBLIC API
// upload file
app.post('/upload', function (request, response) {
  request.on('data', chunk => {
    // message appears with delay
    console.log('upload on data', chunk.length);
    // send message to the client about chunk.length
  });

  response.send({
    message: `Got a POST request ${request.headers['content-length']}`
  });
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});
TLDR:
The delay you are experiencing is probably the Queueing phase from the browser's resource scheduling.
The Test
I did some tests with Express and found that it uses the built-in http module to handle requests and responses, so I used a raw http server listener to test this scenario, which shows the same behaviour.
Backend code
This code, based on the Node "Anatomy of an HTTP Transaction" sample, creates an http server and logs the time at 3 points:
When a request was received
When the first data event fires
When the end event fires
const http = require('http');

var firstByte = null;

var server = http.createServer((request, response) => {
  const { headers, method, url } = request;
  let body = [];

  request.on('error', (err) => {
  }).on('data', (chunk) => {
    if (!firstByte) {
      firstByte = Date.now();
      console.log('received first byte at: ' + Date.now());
    }
  }).on('end', () => {
    console.log('end receive data at: ' + Date.now());
    // body = Buffer.concat(body).toString();
    // At this point, we have the headers, method, url and body, and can now
    // do whatever we need to in order to respond to this request.
    if (url === '/') {
      response.statusCode = 200;
      response.setHeader('Content-Type', 'text/html');
      response.write('<h1>Hello World</h1>');
    }
    firstByte = null;
    response.end();
  });

  console.log('received a request at: ' + Date.now());
});

server.listen(8083);
Frontend code (snippet run from DevTools)
This code fires an upload to /upload with some array data. I initially filled the array with random bytes, but then removed that and saw it had no effect on my timing log, so the upload content for now is just an array of 0's.
console.log('building data');
var view = new Uint32Array(new Array(5 * 1024 * 1024));
console.log('start sending at: ' + Date.now());

fetch("/upload", {
  body: view,
  method: "post"
}).then(async response => {
  const text = await response.text();
  console.log('got response: ' + text);
});
Now, running the backend code and then the frontend code, I get the following logs.
Log capture (screenshots): the backend log, the frontend log, and the time differences between backend and frontend.
Results
Looking at the screenshots, I get two differences between the logs:
The first, and most important, is the difference between the frontend fetch start and the backend receiving the request. I got 1613 ms, which is "close" to the Resource Scheduling entry (1430 ms) in the network timing tab. I think more things happen between the frontend fetch call and the Node backend event, so I can't directly compare the times:
log.backendReceivedRequest - log.frontEndStart
1613
The second is the difference between the first and last data received on the backend, for which I got
578 ms, close to the Request sent entry (585 ms) in the network timing tab:
log.backendReceivedAllData - log.backendReceivedFirstData
578
I also changed the frontend code to send different sizes of data, and the network timing tab still matched the log.
The thing that remains unknown to me is: why is Google Chrome queueing my fetch when I'm not running any other requests and not using the bandwidth of the server/host? I read the conditions for Queueing but didn't find the reason; maybe it is allocating resources on disk, but I'm not sure: https://developer.chrome.com/docs/devtools/network/reference/#timing-explanation
References:
https://nodejs.org/es/docs/guides/anatomy-of-an-http-transaction/
https://developer.chrome.com/docs/devtools/network/reference/#timing-explanation
I found the problem. It was in the nginx config. Nginx was set up as a reverse proxy. By default, proxy request buffering is enabled, so nginx first grabs the whole request body and only then forwards it to Node.js; that's why I saw the delay.
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
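For reference, a minimal nginx sketch of that directive (the location path and upstream address are placeholders, not taken from the original setup):

location /upload {
    # stream the request body to Node.js as it arrives
    # instead of buffering the whole body first
    proxy_request_buffering off;
    proxy_pass http://localhost:3000;
}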
I have a data provider which gives me stock prices via TCP connection. The data provider only allows a static IP to connect to their service.
But since I need to format the data before sending it to my front-end I want to use my express back-end as a proxy.
What that means is:
I need to connect my back-end to my data provider via a websocket (socket.io) in order to get the data (the back-end acts as a client)
I need my back-end to broadcast this received data to my front-end (the back-end acts as a server)
My question is: is that possible at all? Is there an easier way to achieve this? Is there documentation on how to use an Express app as a websocket server and client at once?
EDIT:
I got this working now. But my current solution kills my AWS EC2 instance because of huge CPU usage. This is how I've implemented it:
const net = require('net');
const app = require('express')();
const httpServer = require('http').createServer(app);

const client = new net.Socket();
const options = {
  cors: {
    origin: 'http://someorigin.org',
  },
};
const io = require('socket.io')(httpServer, options);

client.connect(1337, 'some.ip', () => {
  console.info('Connected to some.ip');
});

client.on('data', async (data) => {
  // parse data
  const parsedData = {
    identifier: data.identifier,
    someData: data.someData,
  };

  // broadcast data
  io.emit('randomEmitString', parsedData);
});

client.on('close', () => {
  console.info('Connection closed');
});

httpServer.listen(8081);
Does anyone have an idea why this causes a huge CPU load? I've tried to profile my code with clinicjs, but I couldn't find an apparent problem.
EDIT2: To be more specific: my data provider provides me with stock quotes, so every time a quote changes, I get new data. I then parse this data and emit it via io.emit. Could this be some kind of bottleneck?
This is the profile I get after I run clinicjs:
I don't know how many resources you have on your AWS, but 1,000 clients shouldn't be a problem.
I have personally encountered 2 bottlenecks:
Clients connected with Ajax, not WS (this used to be a common problem with old socket.io).
The socket.io client libraries were served by Node, not Nginx/Apache. Node is poor at keep-alive management.
Check also:
How often do you get data from some.ip? It's a good idea to aggregate and filter it.
Do you need to notify all clients of everything? Is it enough to inform just the interested ones? (Live zone; see the rooms sketch after this list.)
Maybe it is worth moving the serving to serviceWorker.js or Push Events?
As part of debugging, log events yourself: receiving data, clients connecting and disconnecting. Observe the server logs.
Or maybe this code is not responsible for the problems, but rather the data download for the first view. Do you have the data in a buffer, or do you read it on GET index.html?
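For the "inform only interested clients" idea, here is a minimal socket.io rooms sketch; the 'subscribe'/'quote' event names and the quote.symbol field are made up for illustration, and io is the socket.io server from the question's code:

io.on('connection', (socket) => {
  // let each client subscribe only to the symbols it cares about
  socket.on('subscribe', (symbol) => socket.join(symbol));
  socket.on('unsubscribe', (symbol) => socket.leave(symbol));
});

function broadcastQuote(quote) {
  // emit the update only to the room for that symbol
  io.to(quote.symbol).emit('quote', quote);
}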
To understand what was going on with your situation, I created an elementary TCP server that published JSON messages every 1ms to each client that connects to it. Here is the code for the server:
var net = require('net');

var server = net.createServer(function(socket) {
  socket.pipe(socket);
});

server.maxConnections = 10

server.on('close', () => console.log('server closed'))
server.on('error', (err) => console.error(err))
server.on('listening', () => console.log('server is listening'))

server.on('connection', (socket) => {
  console.log('- client connected')
  socket.setEncoding('utf8')

  var intervalId = setInterval(() => socket.readyState === "open" &&
    socket.write(JSON.stringify({
      id: intervalId,
      timestamp: Date.now(),
    }) + '\n'), 1)

  socket.on('error', (err) => console.error(err))
  socket.on('close', () => {
    clearInterval(intervalId)
    console.log('- client closed the connection')
  })
})

server.listen(1337, '0.0.0.0');
As you see, we set up a setInterval function that will emit a simple JSON message to each connected client every 1 ms.
For the client, I used something very similar to what you have. At first, I tried pushing every message received from the TCP server straight to the browser over the WebSocket connection. In my case, it also pushed the CPU to 100%. I don't know exactly why.
Nonetheless, even though your data is being updated every 1 ms, it is doubtful that you need to refresh your webpage at that rate. Most websites work at 60 fps. That would mean updating the data every 16ms. So, a straightforward solution would be to batch the data and send it to the browser every 16 ms. Just this modification greatly increases performance. You can go even further by extending the batch time or filtering some of the sent data.
Here is the code for the client, taking advantage of batch messages. Bear in mind that this is a very naive implementation made to show the idea. A better adjustment would be to work the streams with libraries like RxJS.
// tcp-client.js
const express = require('express');
const http = require('http');
const { Server } = require("socket.io");
const net = require('net')

const app = express();
const server = http.createServer(app);
const io = new Server(server);
const client = new net.Socket()

app.get('/', (req, res) => {
  res.setHeader('content-type', 'text/html')
  res.send(`
    <!doctype html>
    <html lang="en">
      <head>
        <meta charset="utf-8">
        <title>TCP - Client</title>
      </head>
      <body>
        <script src="/socket.io/socket.io.js"></script>
        <script>
          var socket = io();
          socket.on('msg', (msg) => document.body.textContent = msg);
        </script>
      </body>
    </html>
  `);
});

io.on('connection', (socket) => {
  console.log('- user connected');
  socket.on('disconnect', () => {
    console.log('- user disconnected');
  });
});

var buffer = []
setInterval(() => {
  io.emit("msg", JSON.stringify(buffer))
  buffer = []
}, 16)

client.connect(1337, '127.0.0.1', function() {
  console.log('- connected to server');
});

client.on('data', function(data) {
  buffer.push(data.toString("utf8"))
});

client.on('close', function() {
  console.log('- connection to server closed');
});

server.listen(3000, () => {
  console.log('listening on 0.0.0.0:3000');
});
In my Express app, I have defined 2 endpoints: one for an is-server-up check and one for simulating a blocking operation.
app.use('/status', (req, res) => {
  res.sendStatus(200);
});

app.use('/p', (req, res) => {
  const { logger } = req;
  logger.info({ message: 'Start' });
  let i = 0;
  const max = 10 ** 10;
  while (i < max) {
    i += 1;
  }
  res.send(`${i}`);
  logger.info({ message: 'End' });
});
I am using winston for logging and PM2 for clustering, with the following command:
$ pm2 start bin/httpServer.js -i 0
It has launched 4 instances.
Now, when I visit the routes /p, /p, /status in that order in different tabs, with around a 1 second delay between (request 1 and request 2) and between (request 2 and request 3), I expected the responses for request 1 and request 2 to arrive after some time, offset by around 1 second, and the response for request 3 to come instantly.
Actual: the response for request 3 did come instantly, but something weird happened with requests 1 and 2. Request 2 didn't even start until request 1 was completed. Here are the logs I got; you can see the timestamps for the end of request 1 and the start of request 2.
{"message":"Start","requestId":"5c1f85bd-94d9-4333-8a87-30f3b3885d9c","level":"info","timestamp":"2020-12-28 07:34:48"}
{"message":"End","requestId":"5c1f85bd-94d9-4333-8a87-30f3b3885d9c","level":"info","timestamp":"2020-12-28 07:35:03"}
{"message":"Start","requestId":"f1f86f68-1ddf-47b1-ae62-f75c7aa7a58d","level":"info","timestamp":"2020-12-28 07:35:03"}
{"message":"End","requestId":"f1f86f68-1ddf-47b1-ae62-f75c7aa7a58d","level":"info","timestamp":"2020-12-28 07:35:17"}
Why did requests 1 and 2 not start at the same time (offset by 1 second, of course)? And if they are running sequentially, why did request 3 respond instantly instead of waiting for requests 1 and 2 to complete?
That's because the Connection header in the response is keep-alive, which your Node server sends by default. So the connection will be reused when you use a browser (curl can also simulate the reused-connection situation). That means multiple requests are served by the same instance within the keep-alive window, even though you have multiple Node instances.
Note: you can see the specified time in a response header like this: Keep-Alive: timeout=5
If you use a browser, open the network tab to see the response headers.
If you use curl, add the -v option to see the response headers.
You could try running multiple separate curl commands at the same time in a terminal. Separate curl commands mean the connection will not be reused, so you'll get your expected results. You could add a console.log("status test") in the /status route handler, then use pm2 logs to see which instance serves each request, in the following format (these logs were produced by accessing the endpoint with a browser).
0|server | status test
0|server | status test
0 means the first instance. You will see it's always the same instance serving the requests when you use a browser to access the endpoint. But if you use curl, you'll find the number keeps changing, which means every request is served by a different Node instance.
You can see I sent two requests at the same time with curl in the terminal; a different Node instance served each request, so the start and end times in the console.log output are the same. In this example, I have 8 event loops (instances), so I can handle 8 long-processing (synchronous code) requests at the same time.
And you can use curl to simulate the keep-alive situation; then you'll see the requests are served by the same Node instance.
curl http://localhost:8080/status http://localhost:8080/status -v -H "Connection: keep-alive"
You can also use Connection: close to see that the requests are served by different Node instances.
curl http://localhost:8080/status http://localhost:8080/status -v -H "Connection: close"
You can see the difference here.
If you want to close the connection on the server side, you can use the following code.
res.setHeader("Connection", "close")
This is my test code.
const express = require("express")
const app = express();
const port = 8080;
app.use('/status', (req, res) => {
console.log("status tests");
res.sendStatus(200);
});
app.use('/p', (req, res) => {
console.log(new Date() + " start");
let i = 0;
const max = 10 ** 10;
while (i < max) {
i += 1;
}
res.send(`${i}`);
console.log(new Date() + " end");
});
app.listen(port, () => {
return console.log(`server is listening on ${port}`);
});
I have found a strange delay in the http.request function. Here is my code:
var express = require('express');
var http = require('http');

var app = express();
app.set('port', process.env.PORT || 3000);

app.get('/aaa', function(req, res) {
  setTimeout(function(){
    res.json({"a": 1});
  }, 500);
});

app.get('/bbb', function(req, res){
  var options = {
    host: '127.0.0.1',
    port: 3000,
    path: '/aaa',
    method: 'GET'
  };

  var request = http.request(options, function(result) {
    result.on("data", function(){
    });
    res.json({"b": 2});
  });

  request.on('error', function() {
    res.json({"b": 2});
  });

  request.end();
});

http.createServer(app).listen(app.get('port'), function(){
});
The client calls /bbb, then its handler calls /aaa, and within 500 ms the result returns to the client.
I tried to measure response time in different situations using Apache Bench:
1) 1000 requests with 1 concurrent request.
Average response time: 500ms
2) 1000 requests with 50 concurrent requests.
Average response time: 5000ms
3) 1000 requests with 100 concurrent requests.
Average response time: 10000ms
Why is the response time growing?
It's fine when I call /aaa directly.
It's not unusual behaviour. The HTTP client used in the handler for /bbb (http.request) is limited to 5 concurrent sockets per host. In other words, it can only make 5 HTTP requests in parallel. You can find a reference to this in the documentation.
Just to confirm you're hitting the limit, you should run your tests using 5 and 6 concurrent requests. You'll see (as I did) that the average response time gets noticeably worse at 6 concurrent requests. This is because the 6th concurrent request is queued until one of the 5 preceding requests to /aaa has completed.
To answer your question about why the response time grows: the more concurrency you add in your benchmark, the higher the average response time goes, because each request has to wait for more requests ahead of it in the queue to finish before it can get a socket.
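As a rough sanity check (my numbers, assuming the old default of 5 sockets per host and ~500 ms per /aaa call), the worst-case response time grows with the queue depth, which lines up with the benchmark figures above:

// Back-of-the-envelope estimate, not a benchmark: with maxSockets = 5 and each
// /aaa call taking ~500 ms, the last of N concurrent requests waits for
// ceil(N / 5) sequential batches of requests.
const maxSockets = 5;
const perRequestMs = 500;

for (const n of [1, 50, 100]) {
  const worstCaseMs = Math.ceil(n / maxSockets) * perRequestMs;
  console.log(`${n} concurrent -> ~${worstCaseMs} ms for the last response`);
}
// 1 -> ~500 ms, 50 -> ~5000 ms, 100 -> ~10000 ms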
You can increase the number of concurrent sockets your HTTP client can handle by modifying the default agent like this:
var http = require("http");
http.globalAgent.maxSockets = 10;
You can also circumvent pooling altogether by passing agent:false to http.get like so:
http.get({hostname: 'localhost', port: 80, path: '/', agent: false}, function (res) {
  // Do stuff
})
Update (8th Feb 2015)
An important change regarding this answer has come up in Node v 0.12.0.
maxSockets are no longer limited to 5. The default is now set to
Infinity with the developer and the operating system given control
over how many simultaneous connections an application can keep open to
a given host.
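If you want to cap or tune the pool yourself on a newer Node version, here is a minimal sketch using a dedicated Agent (the host, port, and path just mirror the example above):

const http = require('http');

// A dedicated agent gives explicit control over pooling,
// independent of the global agent's defaults.
const agent = new http.Agent({ keepAlive: true, maxSockets: 10 });

http.get({ hostname: '127.0.0.1', port: 3000, path: '/aaa', agent: agent }, (res) => {
  res.resume(); // drain the response so the socket can be reused
});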
I had the same issue, and it was resolved by keeping it a very simple GET request, as below.
// http.get() sets the method to GET and calls req.end() automatically
var req = http.get(requestUrl)