I learned that under HTTP/1.1, the default maximum number of simultaneous persistent connections per host name (origin?) is 6, at least for Chrome. I am not asking about the exact limit, since I know it varies from browser to browser. I am more curious about when a new connection is opened for new requests: does the browser somehow reuse the same TCP connection, or does it always start a new TCP connection as long as it hasn't reached the limit of concurrent requests?
Let's say we are using HTTP1.1 and we have Connection: Keep-Alive
if in the html we have
<script src="https://foo/foo1.js"></script>
<script src="https://foo/foo2.js"></script>
<script src="https://foo/foo3.js"></script>
<script src="https://foo/foo4.js"></script>
<script src="https://foo/foo5.js"></script>
<script src="https://foo/foo6.js"></script>
<script src="https://foo/foo7.js"></script>
Will each of the scripts result in a new TCP connection being established, or will all the subsequent requests reuse the first TCP connection established by the first script tag? And if each of these scripts results in a new TCP connection, given the browser's limit of 6 concurrent requests, does the 7th request have to wait until the 6th request has finished in order to establish its connection?
The above example is about initiating requests from HTML tags. What about API calls made from JavaScript? Let's say in our JavaScript we have
const result1 = apiCall1()
const result2 = apiCall2()
const result3 = apiCall3()
const result4 = apiCall4()
const result5 = apiCall5()
const result6 = apiCall6()
const result7 = apiCall7()
And assume the endpoint that all of those API calls hit is api.foo.com/v1/tasks. My questions are, again: will each of the API calls result in a new TCP connection being established, or will all the subsequent requests reuse the first TCP connection established by the first API call? And if each of these API calls results in a new TCP connection, given the browser's limit of 6 concurrent requests, does the 7th request have to wait until the 6th request has finished in order to establish its connection?
My last question is, compared to http1.1, does http2 address this problem by allowing sending many requests at the same time over one single TCP connection?
Will each of the scripts result in a new TCP connection being established, or will all the subsequent requests reuse the first TCP connection established by the first script tag?
Each HTTP/1.1 connection handles one request at a time, so the browser will open more TCP connections to download the scripts in parallel, up to the maximum of 6. The 7th request has to wait for one of those connections to free up before it can be downloaded, and it then reuses that connection.
But the reality is, that the first request may have finished by the time later TCP connections are opened so it might not quite reach the 6 limit for only 6 or 7 requests.
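The queueing described above can be sketched with a small simulation. This is only an illustration under simplifying assumptions: all requests are issued at time 0, handshake time is ignored, and the function name `simulateConnectionPool` and the 100 ms durations are made up for the example. Real browsers schedule requests with more nuance.

```javascript
// Simulate a pool of at most `maxConnections` persistent connections
// serving a list of requests (durations in milliseconds).
// Assumption: every request is ready at time 0 and handshakes are free.
function simulateConnectionPool(durations, maxConnections = 6) {
  const freeAt = []; // freeAt[c] = time at which connection c becomes idle
  return durations.map((duration, index) => {
    let connection;
    if (freeAt.length < maxConnections) {
      // Below the limit: open a new connection.
      connection = freeAt.length;
      freeAt.push(0);
    } else {
      // At the limit: wait for the connection that frees up first, then reuse it.
      connection = freeAt.indexOf(Math.min(...freeAt));
    }
    const start = freeAt[connection];
    const end = start + duration;
    freeAt[connection] = end;
    return { request: index + 1, connection, start, end };
  });
}

// Seven scripts that each take 100 ms to download.
const timeline = simulateConnectionPool([100, 100, 100, 100, 100, 100, 100]);
console.log(timeline[6]); // { request: 7, connection: 0, start: 100, end: 200 }
```

In this simplified model the first six requests start immediately on six separate connections, and the seventh starts only when one of them becomes idle, reusing that connection instead of opening a seventh.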
What about API calls made from JavaScript?
Exact same thing: a limit of 6 per origin. One thing to note, though, is that certain CORS requests sent without credentials effectively count as another origin (even though it's the same actual origin) and so get another 6 connections.
My last question is, compared to http1.1, does http2 address this problem by allowing sending many requests at the same time over one single TCP connection?
Basically yes. Not quite at the same time due to the way TCP works, but as near as possible. See my answer here: What does multiplexing mean in HTTP/2
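As a rough illustration of many requests sharing one connection, here is a minimal sketch using Node's built-in `http2` module. It starts a local cleartext HTTP/2 server purely for demonstration (browsers only speak HTTP/2 over TLS), and the function name `demoMultiplexing` and the `/fooN.js` paths are assumptions made for the example.

```javascript
const http2 = require("node:http2");

// Start a throwaway local HTTP/2 server, open ONE client session (one TCP
// connection) and issue `requestCount` requests over it as separate streams.
function demoMultiplexing(requestCount) {
  return new Promise((resolve, reject) => {
    const server = http2.createServer();
    server.on("stream", (stream, headers) => {
      stream.respond({ ":status": 200, "content-type": "text/plain" });
      stream.end(`served ${headers[":path"]}`);
    });
    server.on("error", reject);

    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address();
      const session = http2.connect(`http://127.0.0.1:${port}`);
      session.on("error", reject);

      const bodies = new Array(requestCount);
      let remaining = requestCount;
      for (let i = 1; i <= requestCount; i++) {
        // Each request is a new stream on the same session; none of them
        // waits for an earlier one to finish before being sent.
        const req = session.request({ ":path": `/foo${i}.js` });
        req.setEncoding("utf8");
        let body = "";
        req.on("data", (chunk) => { body += chunk; });
        req.on("end", () => {
          bodies[i - 1] = body;
          if (--remaining === 0) {
            session.close();
            server.close();
            resolve(bodies);
          }
        });
        req.end();
      }
    });
  });
}

demoMultiplexing(7).then((bodies) => console.log(bodies));
```

All seven requests travel over the single session created by `http2.connect`, so there is no per-origin pool of six connections to queue behind.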
The process is simple: if you use keep-alive, the connection stays open after a response, so a user can make many requests without having to re-open a costly secure connection.
Without keep-alive, there is always the SYN/ACK handshake before a request can be made: for the server to respond to every item your user requested, a new connection is needed, and each connection is closed once its request has been served. You can work around this a little with caching, which helps your bandwidth and lessens the number of requests to the server.
So, in a scenario where 100 browsers want to hit your site, each browser's requests look like 1.js, 2.js, ... The responses should come back in order, but this can depend on a lot of things: the language your server-side code is written in, how requests are handled and served, and whether you manage any queues. If one request requires longer processing ("will get back to you in the future"), other requests can go ahead as long as you're not blocking the event loop (it comes down to your server).
Establishing a connection to the server means a TCP handshake followed by a TLS handshake, and without keep-alive this happens on each and every request. The cost of TLS can be reduced, but the initial request is expensive.
I am creating a question answering application using Node.js + Express for my back-end. Front-end sends the question data to the back-end, which in turn makes requests to multiple third-party APIs to get the answer data.
The problem is that some of those third-party APIs take too long to respond, since they have to do some intense processing and calculations. For that reason, I have already implemented a caching system that saves the answer data for each different question. Nevertheless, the first request for each question might take up to 5 minutes.
Since my back-end server waits and does not respond back to the front-end until data arrives (the connections are being kept open), it can only serve 6 requests concurrently (that's what I have found). This is unacceptable in terms of performance.
What would be a workaround to this problem? Is there a way to not "clog" the server, so it can serve more than 6 users?
Is there a design pattern in which the server gives an initial response and then serves the full data later?
Perhaps, something that sets the request to "sleep" and opens up space for new connections?
Your server can serve many thousands of simultaneous requests if things are coded properly and it's not CPU intensive, just waiting for network responses. This is something that node.js is particularly good at.
A single browser, however, will only send a few requests at a time (it varies by browser) to the same endpoint, queuing the others until the earlier ones finish. So, my guess is that you're trying to test this from a single browser. That's not going to test what you really want to test, because the browser itself is limiting the number of simultaneous requests. node.js can easily have thousands of requests in flight at the same time.
But, if you really have an operation that takes up to 5 minutes, that probably won't even work for an http request from a browser because the browser will probably time out an inactive connection still waiting for a result.
I can think of a couple possible solutions:
First, you could make the first http request just start the process and have it return immediately with an ID. Then, the client can check every 30 seconds or so after that, sending the ID in an http request, and your server can respond whether it has the result yet or not for that ID. This would be a client-polling solution.
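A minimal sketch of the server-side bookkeeping for this polling approach might look like the following. The class name `JobRegistry` and its methods are hypothetical; in an Express app, `start` would be called from the route that accepts the question and `poll` from the route the client checks with its ID.

```javascript
// In-memory registry of long-running jobs (hypothetical helper).
// Assumption: results fit in memory and do not need to survive a restart.
class JobRegistry {
  constructor() {
    this.jobs = new Map();
    this.nextId = 1;
  }

  // Kick off the slow work and return an ID immediately, without waiting.
  // `work` is a function returning a promise (e.g. the third-party API calls).
  start(work) {
    const id = String(this.nextId++);
    this.jobs.set(id, { status: "pending" });
    Promise.resolve()
      .then(work)
      .then((result) => this.jobs.set(id, { status: "done", result }))
      .catch((error) => this.jobs.set(id, { status: "failed", error: String(error) }));
    return id;
  }

  // Called on each client poll: report whether the result is ready yet.
  poll(id) {
    return this.jobs.get(id) ?? { status: "unknown" };
  }
}

const registry = new JobRegistry();
const jobId = registry.start(async () => "the answer");
console.log(registry.poll(jobId)); // { status: 'pending' }
setTimeout(() => console.log(registry.poll(jobId)), 10);
```

Because `start` returns right away, the HTTP request that triggers the work can respond immediately and free its connection; the client only holds a connection open for the brief status checks.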
Second, you could establish a webSocket or socket.io connection from client to server. Then, send a message over that socket to start the request. Then, whenever the server finishes its work, it can just send the result directly to the client over the webSocket or socket.io connection. After receiving the response, the client can either keep the webSocket/socket.io connection open for use again in the future or it can close it.
In the Chrome developer tools, I notice that I have 6 TCP connections to a particular origin. The first 5 connections are idle from what I can tell. For the last of those connections, Chrome is making a call to our amazon S3 to get some images per the application logic. What I notice is that all the requests for that connection are queued till a certain point of time (say T1) and then the images are downloaded. Of course, this scenario is hard to reproduce, so I am looking for some hints on what might be going on.
My questions:
The connection in question does not have the "initial connection" in the timing information, which means that the connection might have been established before in a different tab. Is that plausible?
The other 5 connections for the same origin are to different remote addresses. Is that the reason they cannot be used to retrieve images that the 6th connection is retrieving?
Is there a mechanism to avoid this queueing delay in this scenario on the front end?
From the docs (emphasis mine)
A request being queued indicates that:
The request was postponed by the rendering engine because it's considered lower priority than critical resources (such as scripts/styles). This often happens with images.
The request was put on hold to wait for an unavailable TCP socket that's about to free up.
The request was put on hold because the browser only allows six TCP connections per origin on HTTP 1.
Time spent making disk cache entries (typically very quick.)
This could be related to the amount of images you are requesting from your amazon service. According to this excerpt, requests on different origins should not impact each other.
If you are loading a lot of images, then considering sprite sheets or something may help you along - but that will depend on the nature of the images you are requesting.
Seems like you are making too many requests at once.
Since HTTP 1.1 connections are limited to 6 active requests per origin, all other requests get queued until the active ones complete.
As an alternative, you can use HTTP/2 (the successor to SPDY) on the server, which doesn't have this restriction and brings many other benefits for applications making a large number of parallel requests.
You can easily enable HTTP 2 on nginx / apache.
How Websockets are implemented?
What is the algorithm behind this new tech (in comparison to Long-Polling)?
How can they be better than Long-Polling in term of performance?
I am asking these questions because here we have a sample code of Jetty websocket implementation (server-side).
If we wait long enough, a timeout will occur, resulting in the
following message on the client.
And that is definitely the problem I'm facing when using long-polling. It stops the process to prevent server overload, doesn't it?
How Websockets are implemented?
webSockets are implemented as follows:
Client makes HTTP request to server with "upgrade" header on the request
If server agrees to the upgrade, then client and server exchange some security credentials and the protocol on the existing TCP socket is switched from HTTP to webSocket.
There is now a lasting open TCP socket connecting client and server.
Either side can send data on this open socket at any time.
All data must be sent in a very specific webSocket packet format.
Because the socket is kept open as long as both sides agree, this gives the server a channel to "push" information to the client whenever there is something new to send. This is generally much more efficient than using client-driven Ajax calls where the client has to regularly poll for new information. And, if the client needs to send lots of messages to the server (perhaps something like a multi-player game), then using an already open socket to send a quick message to the server is also more efficient than an Ajax call.
Because of the way webSockets are initiated (starting with an HTTP request and then repurposing that socket), they are 100% compatible with existing web infrastructure and can even run on the same port as your existing web requests (e.g. port 80 or 443). This makes cross-origin security simpler and keeps anyone on either client or server side infrastructure from having to modify any infrastructure to support webSocket connections.
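To make the upgrade step more concrete: in RFC 6455 the exchange during the handshake is a `Sec-WebSocket-Key` header sent by the client and a `Sec-WebSocket-Accept` header computed from it by the server. The sketch below shows that computation with Node's built-in `crypto` module; the function names are made up for the example, and a real server would also validate the other request headers.

```javascript
const crypto = require("node:crypto");

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// The response that switches the existing TCP socket from HTTP to webSocket.
function buildUpgradeResponse(secWebSocketKey) {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    "",
    "",
  ].join("\r\n");
}

// Sample key taken from the RFC 6455 handshake example.
console.log(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the server sends the 101 response, both sides stop speaking HTTP on that socket and exchange webSocket frames instead.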
What is the algorithm behind this new tech (in comparison to
Long-Polling)?
There's a very good summary of how the webSocket connection algorithm and webSocket data format works here in this article: Writing WebSocket Servers.
How can they be better than Long-Polling in term of performance?
By its very nature, long-polling is a bit of a hack. It was invented because there was no better alternative for server-initiated data sent to the client. Here are the steps:
The client makes an http request for new data from the server.
If the server has some new data, it returns that data immediately and then the client makes another http request asking for more data. If the server doesn't have new data, then it just hangs onto the connection for awhile without providing a response, leaving the request pending (the socket is open, the client is waiting for a response).
If, at any time while the request is still pending, the server gets some data, then it forms that data into a response and returns a response for the pending request.
If no data comes in for awhile, then eventually the request will timeout. At that point, the client will realize that no new data was returned and it will start a new request.
Rinse, lather, repeat. Each piece of data returned or each timeout of a pending request is then followed by another ajax request from the client.
So, while a webSocket uses one long-lived socket over which either client or server can send data to the other, the long-polling consists of the client asking the server "do you have any more data for me?" over and over and over, each with a new http request.
Long polling works when done right, it's just not as efficient on the server infrastructure, bandwidth usage, mobile battery life, etc...
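The long-polling steps above can be sketched as a client loop. To keep the example self-contained, the actual HTTP call is abstracted behind `requestUpdate`, a hypothetical function that performs one request and resolves with the new data, or with `null` if the request timed out; the name `longPoll` and the `maxRequests` cap are also just for the illustration.

```javascript
// One long-polling client loop.
// `requestUpdate` (hypothetical): performs a single HTTP request and resolves
// with the new data, or with null if the request timed out without data.
async function longPoll(requestUpdate, onData, maxRequests) {
  let requestsMade = 0;
  while (requestsMade < maxRequests) {
    requestsMade++;
    const data = await requestUpdate();
    if (data !== null) {
      onData(data);
    }
    // Whether data arrived or the request timed out, the client
    // immediately issues the next request.
  }
  return requestsMade;
}

// Fake server behaviour: timeout, data, timeout, data.
const scripted = [null, "update-1", null, "update-2"];
const received = [];
longPoll(async () => scripted.shift() ?? null, (d) => received.push(d), 4)
  .then((count) => console.log(count, received));
```

In this run two pieces of data cost four separate HTTP requests, which is the overhead a single long-lived webSocket avoids.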
What I want is explanation about this: the fact Websockets keep an
open connection between C/S isn't quite the same to Long Polling wait
process? In other words, why Websockets don't overload the server?
Maintaining an open webSocket connection between client and server is a very inexpensive thing for the server to do (it's just a TCP socket). An inactive, but open TCP socket takes no server CPU and only a very small amount of memory to keep track of the socket. Properly configured servers can hold hundreds of thousands of open sockets at a time.
On the other hand a client doing long-polling, even one for which there is no new information to be sent to it, will have to regularly re-establish its connection. Each time it re-establishes a new connection, there's a TCP socket teardown and new connection and then an incoming HTTP request to handle.
Here are some useful references on the topic of scaling:
600k concurrent websocket connections on AWS using Node.js
Node.js w/1M concurrent connections!
HTML5 WebSocket: A Quantum Leap in Scalability for the Web
Do HTML WebSockets maintain an open connection for each client? Does this scale?
Very good explanation about web sockets, long polling and other approaches:
In what situations would AJAX long/short polling be preferred over HTML5 WebSockets?
Long poll - request → wait → response. Creates a connection to the server like AJAX does, but keeps the keep-alive connection open for some time (not long, though); while the connection is open, the client can receive data from the server. The client has to reconnect periodically after the connection is closed due to timeouts or data EOF. On the server side it is still treated like an HTTP request, the same as AJAX, except that the answer to the request will come now or at some time in the future, as defined by the application logic. Supported in all major browsers.
WebSockets - client ↔ server. Creates a TCP connection to the server and keeps it open as long as needed. Either the server or the client can easily close it. The client goes through an HTTP-compatible handshake process; if it succeeds, the server and client can exchange data in both directions at any time. It is very efficient if the application requires frequent data exchange in both directions. WebSockets do have data framing, which includes masking for each message sent from client to server, so the data is lightly obfuscated (masking is not real encryption). support chart (very good)
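For reference, the masking mentioned above is a plain XOR with a 4-byte key that travels in the frame itself, so it does not provide confidentiality; actual encryption comes from running WebSockets over TLS (wss://). A minimal sketch, with `applyMask` as a made-up helper name and the key and payload taken from the masked "Hello" example in RFC 6455:

```javascript
// XOR each payload byte with the 4-byte masking key, cycling through the key.
// Applying the same function twice restores the original bytes.
function applyMask(payload, maskingKey) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskingKey[i % 4];
  }
  return out;
}

const key = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const masked = applyMask(Buffer.from("Hello", "utf8"), key);
console.log(masked.toString("hex"));                    // 7f9f4d5158
console.log(applyMask(masked, key).toString("utf8"));   // Hello
```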
Overall, sockets have much better performance than long polling and you should use them instead of long polling.
3 mini questions regarding websocket connections
When the client sends data to the server there is latency. When the server sends data to the client is there latency or it is instant?
If the client sends data to the server VERY FAST in a specific order - let's say [1, 2, 3] - is there any chance that, due to latency or other reasons, the data will be received by the server in a different order (like [2, 1, 3])?
(Same as question #2, but when the server sends the data)
Yes, there is latency. It's still a connection and there is still a chain to navigate. Latency only matters when things are changing: given that it takes X ms for the message to reach the client and another X ms for the client to do anything about it, it's quite possible the state will change during those ms. In the same way that HTTP requests (WebSockets are just about the same thing) become 'hot', I believe the latency will diminish (all other things being equal), but it will still exist.
No, WebSockets run over TCP, so messages will arrive in order. UDP transport is fire-and-forget: it doesn't send any notification of receipt and it doesn't regenerate the packets using timing information, so you can send messages faster but can make no assumptions regarding receipt or the order of events. Page impressions would be a great example of where you don't really care in what order, and probably don't care too much about when, the server receives such a message. WebRTC may bring UDP connections between JS and server, but the standard is still emerging. For now, WebSockets connect via an HTTP upgrade, meaning they are TCP, where order information and receipt confirmation are a thing (at the cost of larger and more numerous messages being sent to and fro).
Same answer! It all happens over TCP so the whole trip is a round-trip, but order is assured.
Let's say my browser posts an HTTP request to a domain and, before this request finishes, a different request (by Ajax) is sent to the same domain. Since the first request is still ongoing and not yet terminated, does that mean the second request has to wait for the first request to finish in order to use the persistent connection that is being used by the first request? If so, how can I prevent this? If I have a long streaming connection in the first request, does that mean the second request will need to hang around for a long time?
(Let's assume the maximum persistent connection for browser is one. Actually I don't really understand what this "max persistent connection" does. Does it mean when the persistent connection is over the maximum number, the rest of connection will become non-persistent ? Confusing...)
Can anyone explain this?
Since the first request is still ongoing and not yet terminated, does that mean the second request has to wait for the first request to finish in order to use the persistent connection that is being used by the first request?
No. The two requests are still asynchronous and in parallel (unless the server limits this).
HTTP Keep Alive only means that they are faster because both requests can use the same connection, especially when pipelining them.
However, if there is no pipelining, the browser could also decide to open a second connection for the second request, instead of waiting for the first request to finish and reusing its connection. See Under what circumstances will my browser attempt to re-use a TCP connection for multiple requests? for details.
I don't really understand what this "max persistent connection" does. Does it mean when the persistent connection is over the maximum number, the rest of connection will become non-persistent?
No. When the limit is reached, new requests will have to wait until a connection from the pool becomes usable again.