WebSockets vs XHR for large amounts of data - javascript

I am running SocketIO on NodeJS and I don't care much about wide browser support, as it's a pet project where I want to use all the power of new technologies to ease development. My concern is how I should send large amounts of JSON data from server to client and back. Well, these amounts are not as large as they could be for video or image binary data; I suppose not more than a few hundred kilobytes per request.
Two scenarios I see are:
Send a notification via WebSockets from server to client that some data should be fetched. Then client code runs a regular XHR request to server and gets some data via XHR.
Send the whole data set over WebSockets from server to client. In this case I don't need to run any additional requests - I just get all the data via WebSockets.
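For illustration, the two approaches might look roughly like this with socket.io; this is only a sketch, and the event names ('data-ready', 'data'), the '/api/data' endpoint and the handle() function are made up:
// Scenario 1: the server only notifies; the client then fetches via XHR
// server (assuming io = require('socket.io')(server))
io.on( 'connection', function( socket ) {
    socket.emit( 'data-ready', { url: '/api/data' } ); // hypothetical endpoint
});
// client (assuming socket = io())
socket.on( 'data-ready', function( msg ) {
    var xhr = new XMLHttpRequest();
    xhr.open( 'GET', msg.url );
    xhr.onload = function() { handle( JSON.parse( xhr.responseText ) ); };
    xhr.send();
});
// Scenario 2: the server sends the whole data set over the socket
// server
io.on( 'connection', function( socket ) {
    socket.emit( 'data', largeJsonObject ); // socket.io serializes the payload itself
});
// client
socket.on( 'data', function( data ) { handle( data ); } );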
I saw the first approach in Meteor.js, so I wondered about the reasons for it.
Please share your opinion.

WebSockets should support large data sets (in theory up to 2^63 bytes per frame), so from that point of view it should work fine. The advantage of XHR is that you can observe progress over time, and it is in general better tested for large data blocks. For example, I have seen WebSocket server implementations which (thinking retrospectively) wouldn't handle large data well, because they loaded the entire payload into memory rather than streaming it; that's not necessarily the case for socket.io, though (I don't know). Case in point: try it out with socket.io while observing memory usage and stability. If it works, definitely go with WebSockets, because long term the support for big data packages will only get better, not worse. If it turns out to be unstable, or if socket.io can't stream larger data files, then use the XHR construct.
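If progress reporting matters to you, plain XHR exposes it directly; a minimal sketch (the '/api/big-data' endpoint is made up):
var xhr = new XMLHttpRequest();
xhr.open( 'GET', '/api/big-data' ); // hypothetical endpoint
xhr.onprogress = function( e ) {
    if ( e.lengthComputable ) {
        console.log( 'received ' + e.loaded + ' of ' + e.total + ' bytes' );
    }
};
xhr.onload = function() {
    var data = JSON.parse( xhr.responseText );
    // ... use data
};
xhr.send();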
By the way, a quick Google search turned up siofile; I haven't looked into it that much, but it might be just the thing you need.

Related

Performance - How big is too big for an ajax request?

I have a web app that's constantly sending and requesting JSON objects to/from the server. These JSON objects can get as big as 20-40 KB, and these requests might happen once every 5 to 20 seconds, depending on the user interaction.
I decided to keep my processing on the client side, so the user can use my web app without having to keep an active internet connection, but I need to sync the data to the server every once in a while. I couldn't think of a better solution than storing/processing data in the client as JavaScript objects and eventually saving them as JSON on a server. (This would also enable me to serve these objects via an API to mobile applications in the future.)
I'd like to know how sending this relatively large JSON data back and forth could hurt my application's performance, compared to just sending simple Ajax requests of a few bytes and doing all the processing on the server, and how I could make this more optimized.
20-40 KB JSON objects per request are pretty small according to tests done by Josh Zeigler, where the DOM Ready event took less than 62 milliseconds (at worst, in IE) across four major browsers for a 40 KB JSON payload.
The tests were done on a 2011 2.2GHz i7 MacBook Pro with 8GB of RAM.
Here's the detailed test and test results: How Big is TOO BIG for JSON? Credit: Josh Zeigler

How does WebSockets scale compared to standard HTTP?

How does a site programmed using TCP (that is, visitors on the site are connected to the server and exchanging information via TCP) scale compared to just serving information via AJAX? Say the information exchanged is the same.
To clarify: I'm asking specifically about scale. I've read that keeping thousands of TCP connections open is resource-demanding (which resources?) compared to just serving information statically. I want to know whether this is correct.
WebSockets are a technology that allows the server to push notifications to the client. AJAX, on the other hand, is a pull technology, meaning the client sends requests to the server.
So, for example, if you had an application that needed to receive notifications from the server at regular intervals and update its UI, WebSockets are much better suited. With AJAX you would have to hammer your server with requests at regular intervals to see whether some state changed on the server. With WebSockets, it's the server that notifies the client of events happening on the server, and this happens over a single, long-lived connection.
So I guess it really depends on the type of application you are developing, but WebSockets and AJAX are two completely different technologies solving different kinds of problems. Which one to choose depends on your scenario.
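To make the contrast concrete, here is a rough sketch of both approaches (the URLs, the polling interval and updateUI() are made up):
// AJAX polling: ask the server at a fixed interval whether anything changed
setInterval( function() {
    fetch( '/api/state' )                       // hypothetical endpoint
        .then( function( res ) { return res.json(); } )
        .then( updateUI );
}, 5000 );
// WebSocket push: the server notifies the client only when something changes
var ws = new WebSocket( 'ws://example.com/updates' ); // hypothetical URL
ws.onmessage = function( event ) {
    updateUI( JSON.parse( event.data ) );
};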
WebSockets are not a one-for-one replacement for AJAX; they offer substantially different features. WebSockets offer the ability to push data to the client. AJAX works by the client pulling data from the server and receiving a response.
The purpose of WebSockets is to provide a low-latency, bi-directional, full-duplex and long-running connection between a browser and server. WebSockets opens up possibilities with browser applications that were previously unavailable using HTTP or AJAX.
However, there is certainly an overlap in purpose between WebSockets and AJAX. For example, when the browser wants to be notified of server events (i.e. push), both AJAX and WebSockets are viable options. If your application needs low-latency push events, then this would be a factor in favor of WebSockets, which would definitely scale better in this scenario. On the other hand, if you need to work with existing frameworks and deployed technologies (OAuth, RESTful APIs, proxies, etc.), then AJAX is preferable.
If you don't need the specific benefits that WebSockets provide, then it's probably a better idea to stick with existing techniques like AJAX, because this allows you to re-use and integrate with an existing ecosystem of tools, technologies, security mechanisms, and knowledge bases that have been developed over the last 7 years.
But overall, Websockets will outperform AJAX by a significant factor.
I don't think there's any difference in scalability between WebSockets and standard TCP connections. A WebSocket is an upgrade from a static one-way pipe into a duplex one. The physical resources are exactly the same.
The main advantage of WebSockets is that they run over port 80, so they avoid most firewall problems, but you have to connect over standard HTTP first.
Here's a good page that clearly shows the benefits of the WebSocket API compared to Ajax long polling (especially on a large scale): http://www.websocket.org/quantum.html
It basically comes down to the fact that once the initial HTTP handshake is established, data can go back and forth much more quickly because the header overhead is greatly reduced (this is what most people refer to as bidirectional communication).
As a side note, if you only need to push data from the server on a regular basis, and you don't need to make many client-initiated requests, then using HTML5 server-sent events with occasional Ajax requests from the client might be just what you need, and much easier to implement than the WebSocket API.
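For that push-only case, a server-sent events sketch really is short; the '/events' and '/api/save' endpoints here are made up:
// client: subscribe to a stream of server-sent events
var source = new EventSource( '/events' ); // hypothetical endpoint
source.onmessage = function( event ) {
    var update = JSON.parse( event.data );
    // ... apply the update to the UI
};
// occasional client-initiated work still goes over plain Ajax
function save( payload ) {
    return fetch( '/api/save', {               // hypothetical endpoint
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify( payload )
    });
}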

Transferring large amounts of json over http

I have large amounts (gigabytes) of JSON data that I would like to make available via a RESTful web service. The consumer of the data will be another service, and it will all be happening on a server (so the browser is not involved). Is there a practical limit on how much data can be transferred over HTTP? Will HTTP timeouts start to occur, or is that more a function of a browser?
There's no size limit for an HTTP body; it's just like downloading a huge file through a web browser. And a timeout is a setting of the socket connection over which HTTP is built, so it is not a browser-specific feature.
However, I've run into the same issue when transporting quite large JSON objects. What needs to be considered is network load, serialization/deserialization time, and memory cost. The whole process is slow (2 GB of data over an intranet, using JSON.NET plus some calculation, takes us 2-3 minutes) and it consumes quite a lot of memory. Fortunately, we only need to do this once a day and it is a back-end process, so we don't pay much attention to it. We just use synchronous mode for the HTTP connection and set a long timeout value to prevent timeout exceptions (maybe async would be a good choice).
So I think it depends on your hardware and infrastructure.
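The answer above is about .NET, but the memory concern is the same everywhere. If the JSON already exists on disk, a hedged Node.js sketch that streams it instead of buffering the whole body could look like this (the file path, port and timeout value are made up):
var http = require( 'http' );
var fs = require( 'fs' );
// server: pipe the file to the response so only small chunks live in memory
http.createServer( function( req, res ) {
    res.writeHead( 200, { 'Content-Type': 'application/json' } );
    fs.createReadStream( '/var/data/data.json' ).pipe( res ); // hypothetical path
}).listen( 8080 );
// consumer: raise the socket timeout instead of accepting the default
var clientReq = http.get( { host: 'localhost', port: 8080, path: '/' }, function( res ) {
    res.pipe( fs.createWriteStream( '/tmp/data.json' ) );     // or parse incrementally
});
clientReq.setTimeout( 10 * 60 * 1000, function() { clientReq.abort(); } ); // 10-minute timeout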

Chunking WebSocket Transmission

Since I'm using WebSocket connections on a more regular basis, I was interested in how things work under the hood. So I dug into the endless spec documents for a while, but so far I couldn't really find anything about chunking the transmission stream itself.
The WebSocket protocol calls them data frames (they describe the pure data stream, which is why they're also called non-control frames). As far as I understood the spec, there is no defined max length and no defined MTU (maximum transfer unit) value, which in turn means a single WebSocket data frame may contain, by spec(!), an unbounded amount of data (please correct me if I'm wrong here, I'm still a student on this).
After reading that, I instantly set up my little Node WebSocket server. Since I have a strong Ajax history (also on streaming and Comet), my expectations originally were, "there must be some kind of interactive mode for reading data while it is transferred". But I am wrong there, ain't I?
I started out small, with 4kb of data.
server
testSocket.emit( 'data', new Array( 4096 ).join( 'X' ) );
and, as expected, this arrives at the client as one data chunk
client
wsInstance.onmessage = function( data ) {
console.log( data.length ); // 4095
};
So I increased the payload, and I was actually expecting that, at some point, the client-side onmessage handler would fire repeatedly, effectively chunking the transmission. But to my shock, it never happened (Node server, tested with Firefox, Chrome and Safari on the client side). My biggest payload was 80 MB:
testSocket.emit( 'data', new Array( 1024*1024*80 ).join( 'X' ) );
and it still arrived as one big data chunk at the client. Of course, this takes a while even if you have a pretty good connection. My questions here are:
Is there any possibility to chunk those streams, similar to the XHR readyState 3 mode?
Is there any size limit for a single WS data frame?
Are WebSockets not supposed to transfer such large payloads? (Which would make me wonder again why there isn't a defined max size.)
I might still be looking at WebSockets from the wrong perspective; probably the need for sending large amounts of data just isn't there, and you should chunk/split any data logically yourself before sending?
First, you need to differentiate between the WebSocket protocol and the WebSocket API within browsers.
The WebSocket protocol has a frame-size limit of 2^63 octets, but a WebSocket message can be composed of an unlimited number of frames.
The WebSocket API within browsers does not expose a frame-based or streaming API, but only a message-based API. The payload of an incoming message is always completely buffered up (within the browser's WebSocket implementation) before providing it to JavaScript.
APIs of other WebSocket implementations may provide frame- or streaming-based access to payload transferred via the WebSocket protocol. For example, AutobahnPython does. You can read more in the examples here https://github.com/tavendo/AutobahnPython/tree/master/examples/twisted/websocket/streaming.
Disclosure: I am original author of Autobahn and work for Tavendo.
More considerations:
As long as there is no frame/streaming API in browser JS WebSocket API, you can only receive/send complete WS messages.
A single (plain) WebSocket connection cannot interleave the payloads of multiple messages. So, for example, if you use large messages, those are delivered in order, and you won't be able to send small messages in between while a big message is still in flight.
There is an upcoming WebSocket extension (extensions are a built-in mechanism for extending the protocol): WebSocket multiplexing. This allows you to have multiple (logical) WebSocket connections over a single underlying TCP connection, which has multiple advantages.
Note also: you can open multiple WS connections (over different underlying TCP connections) to a single target server from a single JS/HTML page today.
Note also: you can do "chunking" yourself at the application layer: send your stuff in smaller WS messages and reassemble it yourself, as sketched below.
I agree that in an ideal world you'd have a message/frame/streaming API in the browser plus WebSocket multiplexing. That would give all the power and convenience.
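A minimal sketch of that do-it-yourself chunking; the chunk size, the 'chunk'/'done' message types and handleComplete() are assumptions, not part of any WebSocket API:
// sender: split one large string into smaller WS messages
// (ws is an already-open WebSocket on either end)
function sendChunked( ws, payload, chunkSize ) {
    chunkSize = chunkSize || 64 * 1024;
    for ( var i = 0; i < payload.length; i += chunkSize ) {
        ws.send( JSON.stringify( { type: 'chunk', data: payload.slice( i, i + chunkSize ) } ) );
    }
    ws.send( JSON.stringify( { type: 'done' } ) );
}
// receiver: reassemble the chunks back into the full payload
var parts = [];
ws.onmessage = function( event ) {
    var msg = JSON.parse( event.data );
    if ( msg.type === 'chunk' ) {
        parts.push( msg.data );
    } else if ( msg.type === 'done' ) {
        handleComplete( parts.join( '' ) ); // handleComplete() is a placeholder
        parts = [];
    }
};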
RFC 6455 Section 1.1:
This is what the WebSocket Protocol provides: [...] an alternative to HTTP polling for two-way communication from a web page to a remote server.
As stated, WebSockets are for communication between a web page and a server. Please note the difference between a web page and a web browser. The typical examples are browser games and chat applications, which exchange many small messages.
If you want to send many megabytes in one message, I think you're not using WebSockets the way they were intended. If you want to transfer files, then do so using a plain old HTTP request, answered with a Content-Disposition header to let the browser download the file.
So if you explain why you want to send such large amounts of data, perhaps someone can help come up with a more elegant solution than using WebSockets.
Besides, a client or server may refuse messages that are too large (although it isn't explicitly stated how it will refuse them):
RFC 6455 Section 10.4:
Implementations that have implementation- and/or platform-specific limitations regarding the frame size or total message size after reassembly from multiple frames MUST protect themselves against exceeding those limits. (For example, a malicious endpoint can try to exhaust its peer's memory or mount a denial-of-service attack by sending either a single big frame (e.g., of size 2**60) or by sending a long stream of small frames that are a part of a fragmented message.) Such an implementation SHOULD impose a limit on frame sizes and the total message size after reassembly from multiple frames.

Multiple websocket connections

Are there any advantages to having two distinct WebSocket connections to the same server from the same client? To me this seems like a bad design choice, but is there any reason why/where it would work out better?
There are several reasons why you might want to do that but they probably aren't too common (at least not yet):
You have both encrypted and unencrypted data that you are sending/receiving (e.g. some of the data is bulky but not sensitive).
You have both streaming data and latency sensitive data: imagine an interactive game that occasionally has streamed video inside the game. You don't want large media streams to delay receipt of latency sensitive normal game messages.
You have both textual (e.g. JSON control messages) and binary data (typed arrays or blobs) and don't want to bother with adding your own protocol layer to distinguish since WebSockets already does this for you.
You have multiple WebSocket sub-protocols (the optional setting after the URI) that you support and the page wants to access more than one (each WebSocket connection is limited to a single sub-protocol).
You have several different WebSocket services sitting behind the same web server and port. The way the client chooses per connection might depend on URI path, URI scheme (ws or wss), sub-protocol, or perhaps even the first message from client to server.
I'm sure there are other reasons but that's all I can think of off the top of my head.
I found that it can make client logic much simpler when you are only subscribing to updates of certain objects being managed by the server. Instead of devising a custom subscription protocol for a single channel, you can just open a socket for each element.
Let's say you obtained a collection of elements via a REST API at
http://myserver/api/some-elements
You could subscribe to updates of a single element using a socket url like this:
ws://myserver/api/some-elements/42/updates
Of course, one can argue that this doesn't scale for complex pages. However, for small and simple applications it might make your life a lot easier.
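A hedged client-side sketch of that pattern, using the URLs above (the element id, the message format and render() are assumptions):
// fetch the collection once over REST, then open one update socket per element
fetch( 'http://myserver/api/some-elements' )
    .then( function( res ) { return res.json(); } )
    .then( function( elements ) {
        elements.forEach( function( element ) {
            var ws = new WebSocket( 'ws://myserver/api/some-elements/' + element.id + '/updates' );
            ws.onmessage = function( event ) {
                render( element.id, JSON.parse( event.data ) ); // render() is a placeholder
            };
        });
    });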
