I'm using Node.js to query some public APIs via HTTP requests, using the request module. I'm measuring the response time within my application and see that my application returns the results of API queries about 2-3 times slower than "direct" requests via curl or in the browser. I also noticed that connections to HTTPS-enabled services usually take longer than plain HTTP ones, but that may be a coincidence.
I tried to optimize my request options, but to no avail. For example, I query
https://www.linkedin.com/countserv/count/share?url=http%3A%2F%2Fwww.google.com%2F&lang=en_US
I'm using request.defaults to set the overall defaults for all requests:
var baseRequest = request.defaults({
    pool: {maxSockets: Infinity},
    jar: true,
    json: true,
    timeout: 5000,
    gzip: true,
    headers: {
        'Content-Type': 'application/json'
    }
});
The actual requests are done via
...
var start = new Date().getTime();
var options = {
    url: 'https://www.linkedin.com/countserv/count/share?url=http%3A%2F%2Fwww.google.com%2F&lang=en_US',
    method: 'GET'
};
baseRequest(options, function(error, response, body) {
    if (error) {
        console.log(error);
    } else {
        console.log((new Date().getTime() - start) + ": " + response.statusCode);
    }
});
Does anybody see optimization potential? Am I doing something completely wrong? Thanks in advance for any advice!
There are several potential issues you'll need to address given what I understand from your architecture. In no particular order they are:
Using request will always be slower than using http directly since, as the wise man once said, "abstraction costs". ;) In fact, to squeeze out every possible ounce of performance, I'd handle all HTTP requests using node's net module directly. For HTTPS, though, it's not worth rewriting the https module. And for the record, HTTPS will always be slower than HTTP by definition, due both to the need to handshake cryptographic keys and to the encrypt/decrypt work on the payload.
If your requirements include retrieving more than one resource from any single server, ensure that those requests are made in order with HTTP keep-alive enabled so you can benefit from the already open socket (see the sketch after this list). The time it takes to handshake a new TCP socket is huge compared to making a request on an already open socket.
Ensure that HTTP connection pooling is disabled (see Nodejs Max Socket Pooling Settings).
Ensure that your operating system and shell are not limiting the number of available sockets. See How many socket connections possible? for hints.
If you're using Linux, check Increasing the maximum number of tcp/ip connections in linux, and I'd also strongly recommend fine-tuning the kernel socket buffers.
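As a concrete illustration of the first two points, here is a minimal sketch that skips request entirely and uses Node's built-in https module with a keep-alive agent, so repeated calls to the same host reuse the open socket. The URL is the one from the question; the agent settings and helper name are just illustrative.

var https = require('https');
var urlParser = require('url');

// Keep-alive agent: sockets stay open between requests to the same host
var keepAliveAgent = new https.Agent({ keepAlive: true, maxSockets: 10 });

function timedGet(url, callback) {
    var start = Date.now();
    var options = urlParser.parse(url);
    options.agent = keepAliveAgent;
    https.get(options, function (res) {
        res.resume();   // drain the body; we only care about timing here
        res.on('end', function () {
            callback(null, Date.now() - start, res.statusCode);
        });
    }).on('error', callback);
}

timedGet('https://www.linkedin.com/countserv/count/share?url=http%3A%2F%2Fwww.google.com%2F&lang=en_US',
    function (err, ms, status) {
        if (err) return console.error(err);
        console.log(ms + 'ms, status ' + status);
    });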
I'll add more suggestions as they occur to me.
Update
More on the topic of multiple requests to the same endpoint:
If you need to retrieve a number of resources from the same endpoint, it would be useful to segment your requests to specific workers that maintain open connections to that endpoint. In that way, you can be assured that you can get the requested resource as quickly as possible without the overhead of the initial TCP handshake.
TCP handshake is a three-stage process.
Step one: client sends a SYN packet to the remote server.
Step two: the remote server replies to the client with a SYN+ACK.
Step three: the client replies to the remote server with an ACK.
Depending on the client's latency to the remote server, this can add up to (as William Proxmire once said) "real money", or in this case, delay.
From my desktop, the current latency (round-trip time measured by ping) for a 2K octet packet to www.google.com is anywhere between 37 and 227ms.
So, assuming we can rely on a round-trip mean of 90ms (over a good connection), each leg of the handshake takes roughly 45ms, and the initial TCP handshake would take around 135ms, i.e. SYN (45ms) + SYN+ACK (45ms) + ACK (45ms). That's over a tenth of a second just to establish the initial connection.
If the connection requires retransmission, it could take much longer.
And this is assuming you retrieve a single resource over a new TCP connection.
To ameliorate this, I'd have your workers keep a pool of open connections to "known" destinations, which they would then advertise back to the supervisor process so it could direct requests to the least loaded worker with a "live" connection to the target server.
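One way this could look in practice (a rough sketch, not the author's actual setup): keep one keep-alive agent per "known" host and hand it to every request that targets that host, so only the first request pays the TCP (and TLS) handshake.

var https = require('https');

var agents = {};   // hostname -> https.Agent with keep-alive enabled

function agentFor(hostname) {
    if (!agents[hostname]) {
        agents[hostname] = new https.Agent({ keepAlive: true, maxSockets: 4 });
    }
    return agents[hostname];
}

// request accepts a custom agent per call; subsequent calls to the same
// host reuse the already open socket instead of handshaking a new one
baseRequest({
    url: 'https://www.linkedin.com/countserv/count/share?url=http%3A%2F%2Fwww.google.com%2F&lang=en_US',
    agent: agentFor('www.linkedin.com')
}, function (error, response, body) {
    // ...
});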
Actually, I have some new elements, good enough to open a real answer. Having had a look at the way request uses the HTTP agent, can you please try the following:
var baseRequest = request.defaults({
    pool: false,
    agent: false,
    jar: true,
    json: true,
    timeout: 5000,
    gzip: true,
    headers: {
        'Content-Type': 'application/json'
    }
});
This will disable connection pooling and should make it a lot faster.
I am using the free TURN server provided by https://numb.viagenie.ca/. The STUN servers are also public.
I am using the following configuration:
const iceConfiguration = {
    iceServers: [
        {
            url: 'stun:stunserver.stunprotocol.org'
        },
        {
            url: 'stun:stun.sipgate.net:10000'
        },
        {
            url: 'turn:numb.viagenie.ca',
            credential: 'mypassword',
            username: 'myemail'
        }
    ]
}
I create an offer, send it to the other peer (behind a different NAT) and then attempt to set the remote description with the answer. Upon calling myConnection.setRemoteDescription(answer), the returned promise stays pending indefinitely and never resolves. The remote peer, however, can set its remote description without any issues. Everything works fine for devices on the same network, so I guess the problem lies with the relay server.
If so, should I ditch the public Numb server and opt for using Coturn on DigitalOcean hosting, or am I doing something totally wrong here?
Before setting up a brand new TURN server, you can try to understand what's actually happening: if you take a trace on the computer with an application like Wireshark and filter for STUN messages, you should be able to see the browser sending Binding Request and Allocate Request methods towards the TURN server.
A missing response from the server may mean that the server is not available, the port is wrong, or a firewall prevents the browser from reaching the TURN server.
If instead the credentials are wrong, the browser will receive a 401 error in response to the Allocate Request that carries the message-integrity attribute.
You can also verify the TURN URL and credentials by running the WebRTC sample application that deals with ICE candidate gathering at https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/ .
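If you'd rather check it from your own page, a minimal sketch of the same idea follows: gather ICE candidates with your iceConfiguration and look for candidates of type "relay", which only show up when the TURN server accepts your credentials.

const pc = new RTCPeerConnection(iceConfiguration);

pc.onicecandidate = (event) => {
    if (event.candidate) {
        // A line containing "typ relay" means the TURN allocation succeeded
        console.log(event.candidate.candidate);
    } else {
        console.log('ICE candidate gathering finished');
    }
};

// Creating a data channel and an offer is enough to start gathering
pc.createDataChannel('probe');
pc.createOffer().then((offer) => pc.setLocalDescription(offer));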
It seems as though the Numb TURN servers do not actually work. No idea why. But they do show up in the WebRTC trickle ICE sample application.
The npm-request library allows me to construct HTTP requests using a nice JSON-style syntax, like this:
request.post(
    {
        url: 'https://my.own.service/api/route',
        formData: {
            firstName: 'John',
            lastName: 'Smith'
        }
    },
    (err, response, body) => {
        console.log(body)
    }
);
But for troubleshooting, I really need to see the HTTP message body of the request as it would appear on the wire. Ideally I'm looking for a raw bytes representation with a Node.js Buffer object. It seems easy to get this for the response, but not the request. I'm particularly interested in multipart/form-data.
I've looked through the documentation and GitHub issues and can't figure it out.
The simplest way to do this is to start a netcat server on any port:
$ nc -l -p 8080
and change the URL in your code to point at localhost (note plain http, since netcat can't terminate TLS):
http://localhost:8080/v1beta1/text:synthesize?key=API_KEY
Now, any requests made will print the entire, raw HTTP message sent to the localhost server.
Obviously, you won't be able to see the response, but the entire raw request data will be available for you to inspect in the terminal where netcat is running.
I figured out how to dump the HTTP message body with Request. In both cases, I'm just copying the same approach that request uses internally.
Multipart Form Uploads
req._form.pipe(process.stdout);
URL-encoded Forms
console.log(req.body);
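For the multipart case, here is a short usage sketch tying it back to the request.post call from the question (note that _form is an internal, undocumented property of the request object, so this is strictly a troubleshooting aid):

const req = request.post(
    {
        url: 'https://my.own.service/api/route',
        formData: {
            firstName: 'John',
            lastName: 'Smith'
        }
    },
    (err, response, body) => {
        console.log(body);
    }
);

// Dumps the multipart/form-data body, boundaries included, as it is built
req._form.pipe(process.stdout);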
You could try @jfriend00's suggestion and use a network sniffer like Wireshark, but since you're fetching an https URL this might not be the easiest route, as it requires some setup to intercept TLS connections.
So maybe it would be enough to turn on debug mode for the request module itself; you can do that by simply setting require('request').debug = true. As a third option, you could go with the dedicated debug module for request here, which allows you to view request and response headers and bodies.
I can think of a number of ways to see the bytes of the request:
Turn on debugging in the request module. There are multiple ways to do that documented here including setting NODE_DEBUG=request or require('request').debug = true or using the request-debug module.
Use a network sniffer to see what's actually being sent over the socket, independent of your node.js code.
Create your own dummy http server that does nothing but log the exact incoming request, and send your same request to that dummy server so it can log it for you (see the sketch after this list).
Create or use a proxy (like nginx) that can dump the exact incoming request before forwarding it on to its final destination and send the request to the proxy.
Step through the sending of the request in the debugger to see exactly what it is writing to the socket (this may be time consuming, particularly with async callbacks, but will eventually work).
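For option 3, a minimal sketch of such a dummy server (host and port are arbitrary; it behaves like netcat, so the client will simply time out waiting for a response):

const net = require('net');

net.createServer((socket) => {
    // Log every raw byte of the incoming request, request line and headers included
    socket.on('data', (chunk) => process.stdout.write(chunk));
    socket.on('end', () => console.log('\n--- connection closed ---'));
}).listen(8080, () => console.log('dumping raw requests on port 8080'));

Point your code at http://localhost:8080 (plain HTTP, since this server does not speak TLS).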
You could use a Node.js server capable of logging the raw request/response string, then direct your request to that server.
I gave an example using both an HTTP and an HTTPS server, with no dependencies:
nodejs getting raw https request and response
I have a Node.js-based server which uses middleware that basically redirects the user to a CAS to manage authentication. The CAS server responds with a ticket, and finally my Node.js server trades the ticket for a user object with the CAS and stores it in a session.
This process works perfectly fine without cluster.
Today I wanted to clusterize my Node.js server using https://nodejs.org/api/cluster.html (something I've already done before without any problem).
So instead of having:
let server = http.createServer(app);
server.listen(PORT, HOST, () => {
    // Keep track that the server has been launched
    logger.log(`Server running at http://${HOST}:${PORT}/`);
});
Where everything was working fine, I now have:
if (cluster.isMaster) {
    // This is the mother process
    // Let's create the threads
    for (let i = 0; i < NB_WORKERS; i++) {
        cluster.fork();
    }

    // When a child crashes... restart it
    cluster.on('exit', (worker, code, signal) => {
        logger.log("info", "A child died (PID: %s). It was killed by code or signal %s. Restarting...", worker.process.pid, code || signal);
        cluster.fork();
    });
} else {
    // This is a child
    // Create the server, based on http
    let server = http.createServer(app);
    server.listen(PORT, HOST, () => {
        // Keep track that the server has been launched
        logger.log(`Server running at http://${HOST}:${PORT}/`);
    });
}
When I launch the server, it actually starts NB_WORKERS workers, as expected. But when I try to access the app delivered by my Node server with my browser, I get the following error:
which says, in case you cannot see the screenshot:
XMLHttpRequest cannot load https://localhost:8443/cas/login?
service=http://localhost:3000. No 'Access-Control-Allow-Origin' header is
present on the requested resource. Origin 'http://localhost:3000' is
therefore not allowed access
https://localhost:8443 is where my CAS server is running, and http://localhost:3000 is where my Node server is running.
Note that if I set NB_WORKERS to 1, everything works fine again.
I understand that setting the 'Access-Control-Allow-Origin' header in my CAS server config would probably make everything work fine, but I don't understand why it works with one worker and not with two or more.
What am I missing?
I finally managed to make it work, so I'm posting here in case someone comes across a similar issue.
About Node session
As I said, my Node.js server stores data in a session. In this case, it was a simple express-session with the default MemoryStore, since I'm still in development.
When clustering, express-session's default store is NOT shared between workers. This means that requests that should have been identified by the session sometimes were not, depending on which worker handled the request. This caused the authentication middleware to ask the CAS server again.
To make it work, I had to use a persistent store for my sessions, such as Redis (see the sketch below).
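A minimal sketch of what that change can look like, assuming express-session backed by connect-redis (module names and options are illustrative, not the original setup, and the exact connect-redis API varies between versions):

const session = require('express-session');
const RedisStore = require('connect-redis')(session);

app.use(session({
    // All workers now read and write sessions from the same Redis instance
    store: new RedisStore({ host: '127.0.0.1', port: 6379 }),
    secret: 'replace-me',
    resave: false,
    saveUninitialized: false
}));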
About the CORS issue
I'm not really sure what caused this strange issue, but here's what I thought about:
Since my CAS server uses the HTTPS protocol, some handshakes are made between both servers due to TLS (Hello, Certificate exchange, Key exchange). Maybe these were broken when more than one worker took part in them (one sends the Hello, then the response is handled by another worker => this can't work).
Still, it's very strange, because this is very low-level, and almost no clustered Node app would work if this were not managed internally.
In order to make such handshakes work, I guess each worker must be identified somehow. A likely way to do it is to assign a custom port to each worker: the process still listens on port 3000, but each worker uses a custom one to communicate.
Maybe this was causing my issue, but I can't really tell precisely.
So, I just had to manage my session store correctly to make it work. But please let me know if I was wrong somewhere about the CORS issue.
I'm working on a Google Chrome app that contains the following code:
var socket = chrome.sockets.udp;
var PORT = 5005;
var HOST = '127.0.0.1';
socket.create({}, function(sockInfo){
    socket.bind(sockInfo.socketId, HOST, PORT, function(result){
        socket.onReceive.addListener(function(info){
            // do stuff with the packet data
        });
    });
});
This mechanism worked perfectly when I was sending data on a loopback from localhost. However, when I try to send from a remote machine, the onReceive.addListener callback function never gets called. At first I thought this might be a local network issue, but when I run tcpdump -vv udp port 5005 it shows the data I'm sending, so I know it's reaching my machine. This leads me to believe it's a Chrome issue... In my manifest.json file I've set universal "bind" and "send" permissions for UDP, so I don't see why this isn't working. Any thoughts?
François Beaufort's now deleted answer provided a useful suggestion, sadly in a way that was more appropriate for a comment. I'm making this a community wiki so that he does not feel robbed of reputation.
The HOST part indicates which interface you're listening on for data. Setting it to 127.0.0.1 means that you're only listening on the loopback interface, which is not accessible from outside.
You could provide an explicit IP address of the network interface you have, but an easier solution is to say "I want to listen on them all".
Quote from the docs:
Use "0.0.0.0" to accept packets from all local available network interfaces.
That should solve it in your case; in general, one also needs to check that the firewall is letting the requests through, which you already did with tcpdump.
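Applied to the code from the question, the fix is just the HOST value (everything else stays as posted):

var socket = chrome.sockets.udp;
var PORT = 5005;
var HOST = '0.0.0.0';   // listen on all local network interfaces, not just loopback

socket.create({}, function(sockInfo){
    socket.bind(sockInfo.socketId, HOST, PORT, function(result){
        socket.onReceive.addListener(function(info){
            // do stuff with the packet data
        });
    });
});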
I'd like to measure the latency in opening (and closing) a tcp connection to a server. I don't want to send/receive any data since I want as little overhead as possible to eliminate anything on the server side increasing the request time.
Think of it like a TCP ping. Record current time, connect to host:port, disconnect, calculate delta time.
I believe JavaScript doesn't allow opening direct sockets, but I'm hoping that, given what it can do (e.g. AJAX requests), it can be used in some shape or form to fit my requirements.
Edit:
Some information about the server:
It's a remote server, so I'd need to be able to handle the error regarding Same Origin
It's not a standard webserver, it doesn't support HEAD (this is why I just wanted to open the connection and not send data)
If I try a GET request, it resets the connection
Even if I could just attempt a connection, and then get a refusal due to the above points, if I can catch the exception I could still use the fact it had connected to determine the latency.
Since we are running inside the application layer, we can only do a latency test over HTTP.
var xhr = new XMLHttpRequest()
xhr.open("HEAD", "/", false)
console.time("latency");
xhr.send();
console.timeEnd("latency");
Code description:
I create a synchronous AJAX request to the current host. I used "HEAD" as the method, which is very lightweight and does not return any content. So, we can assume that the round-trip for a "HEAD" request is reasonably close to an actual ping over ICMP or TCP.
For the URL, "/" (the current host) is used. Because of the Same-Origin Policy, you cannot just use any domain like http://google.com unless you are allowed to do so.
I used console.time() and console.timeEnd() to measure the duration, which is more accurate than using a regular Date. However, you can also use Date to measure the duration:
//...
var now = (new Date()).getTime()
xhr.send();
var duration = (new Date()).getTime() - now; //ms
UPDATE:
Try this code for measuring the latency even if an exception occurs:
try {
    var xhr = new XMLHttpRequest()
    xhr.open("HEAD", "http://google.com" + "/?" + Math.random(), false)
    console.time("latency");
    xhr.send();
}
catch (e) {
    //block exception
}
finally {
    console.timeEnd("latency");
}
Please note that I have also added a random number at the end of the URL to prevent browser caching.