Websocket frame size limitation - javascript

I'm sending huge chunks of JSON data through WebSockets. The JSON may have over 1000 entries. Due to the frame size limitation, the WebSocket protocol automatically splits the JSON into frames; that cannot be helped, since we cannot change the frame size of WebSockets.
The problem:
When I try to evaluate my JSON using JSON.parse it gives me a parsing error, which is obvious because the individual frames are not complete JSON objects. All of this happens in the WebSocket onmessage event callback. How can I receive the huge JSON across different frames and still be able to parse it?
I have tried concatenating the frames in onmessage, but the error persists.
Side question:
How do I properly concatenate a broken-up JSON string?

A single WebSocket frame, per the RFC 6455 base framing, has a maximum payload size of 2^63 - 1 bytes (9,223,372,036,854,775,807 bytes, roughly 9.22 exabytes) (correction by Sebastian).
However, a WebSocket message, made up of one or more frames, has no limit imposed on it at the protocol level.
Each WebSocket implementation will handle message and frame limits differently, such as setting a maximum size for the whole message (usually for memory-consumption reasons) or offering streaming options for large messages to make better use of memory.
But in your case, it is likely that your chosen WebSocket implementation has a bug and is improperly splitting up the JSON message into multiple messages, instead of multiple frames. You can use the network inspection tooling in Chrome or an external tool like Wireshark to confirm this behavior.

var wsServer = new websocket.server({
    httpServer: server,
    maxReceivedFrameSize: 131072,
    maxReceivedMessageSize: 10 * 1024 * 1024,
    autoAcceptConnections: false
});
Change the default maxReceivedFrameSize and maxReceivedMessageSize when constructing the server.

Since you are dealing with WS, which is low-level, you need to create an application protocol that deals with data sent over multiple WS frames. It is up to you to concatenate the data that arrives in each WS frame (by the way, don't concatenate the frames; concatenate the data in each frame).
Basically you are reinventing a file transfer protocol.
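As a rough illustration of such an application-level protocol (the names socket, handleJson and the END_MARKER delimiter are my own, not from the question): the sender terminates every complete JSON document with a sentinel character that cannot appear in the payload, and the receiver keeps appending the data from each onmessage event until it sees that sentinel, and only then calls JSON.parse.

// Illustrative sketch only; END_MARKER and handleJson are assumed names.
// The sender must append END_MARKER after each complete JSON document.
var END_MARKER = '\u0000';
var pending = '';

socket.onmessage = function (event) {
    pending += event.data; // concatenate the data, not the frames

    var end = pending.indexOf(END_MARKER);
    while (end !== -1) {
        var doc = pending.slice(0, end); // one complete JSON document
        pending = pending.slice(end + END_MARKER.length);
        try {
            handleJson(JSON.parse(doc));
        } catch (err) {
            console.error('Bad JSON document:', err);
        }
        end = pending.indexOf(END_MARKER);
    }
};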

Related

Messages being delayed when using websockets

I have a program that uses WebSocket over TCP: the client is an extension in Chrome and the server is an application written in C++.
When I send small data from the client to the server, it works fine. But when I send large amounts of data (e.g. a source html page), it will be slightly delayed.
For Example:
Client sends: 1,2,3
Server receives: 1,2
Client sends: 4
Server receives: 3
Client sends: 5
Server receives: 4
It seems like there is a delay.
This is my client code:
var m_cWebsocket = new WebSocket("Servername");
if (m_cWebsocket == null) { return false; }
m_cWebsocket.onopen = onWebsocketOpen(m_cWebsocket);
m_cWebsocket.onmessage = onWebsocketMessage;
m_cWebsocket.onerror = onWebsocketError;
m_cWebsocket.onclose = onWebsocketError;
I am using m_cWebsocket.send(strMsg) to send data.
Server code
while (true) {
    recv(sSocket, szBufferTmp, 99990, 0);
    //recv(sSocket, szBufferTmp, 99990, MSG_PEEK);
    //some process
}
Since you haven't posted any code to show your implementation of the TCP server or client I can only speculate and try to explain what might be going on here.
That means the potential problems and solutions I outline below may or may not apply to you, but regardless this information should still be helpful to others who might find this question in the future.
TL;DR: (most likely) Either the server is too slow, the server is not properly waiting for complete 'TCP packets' to be buffered, or the server doesn't know when to properly start and stop and is de-syncing while it waits for what it thinks is a 'full packet' as defined by something like a buffer size.
It sounds to me like you are either pushing data from the client faster than the server can read it, or, more likely, the server is buffering a set number of bytes from the current TCP stream and waiting for the buffer to fill before outputting additional data.
If you are sending this over localhost, it's unlikely you are anywhere near the limit of the stream, and I would expect a server written in C++ to be able to keep up with the JavaScript client.
So this leads me to believe that the issue is in fact the stream buffer on the C++ side.
Now, since the server has no way to know what data you are sending or how much of it you are sending, it is common for a TCP stream to use a stream buffer that continuously reads data from the socket until either the buffer has filled to a known size, or it sees a predefined 'stop character'. This is usually something like a line-end \n character, or sometimes \r\n (carriage return, line feed), depending on your operating system.
Since you haven't specified how you are receiving your data, I'm going to assume you created either a char or byte buffer of a certain size. I'm pretty rusty on my C++ socket knowledge so I might be wrong, but I do believe there is also a default read timeout on C++ TCP streams.
This means you are possibly running into 1 of 2 issues.
Situation 1) You are waiting until that byte/char buffer is filled before outputting its data. The issue is that this acts like a bus that only leaves the station when all seats are filled: if you don't fill all the seats, your server just sits and waits until it receives enough data to fill up completely, and only then outputs your data.
Situation 2) You are running up against the socket read timeout, so the function is not getting all the data before outputting it. This is like a bus that runs by the clock: every 10 minutes the bus leaves the station, full or empty, and the next bus picks up anyone who shows up late. In your case, the TCP stream isn't able to load 1, 2 and 3 onto a bus fast enough, so the bus leaves with just 1 and 2 on it because after, say, 20 ms of not receiving data the server exits the function and outputs what it has. On the next loop, however, 3 is waiting at the top of the stream buffer ready to get on the next bus out. The stream will load 3, wait until those 20 ms are up, and then exit, repeating the loop.
I think it's more likely the first situation is occurring, though, as I would expect the server to either start catching up or fall further and further behind as the two ends either begin to sync up or the internal TCP stream buffer fills while the server falls further behind.
The main point here is that you need some way to synchronize the client and the server. I would recommend sending a 'start byte' and an 'end byte' to signal when a message has begun and finished, so you don't exit the read function too early.
Or send a start byte, followed by the packet size in bytes, then fill the buffer until it contains that number of bytes. You could additionally include an end byte for some basic error checking.
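To make the length-prefix idea concrete, here is a rough sketch in JavaScript (the function names are mine, and this is only an illustration of the framing technique, not a drop-in for the poster's setup; the receiver logic would be ported to the C++ recv() loop). Each message is sent as a 4-byte big-endian length followed by the payload, and the reader only emits a message once it has buffered that many bytes.

// Sender side: prepend a 4-byte big-endian length to each message.
function frameMessage(str) {
    var payload = new TextEncoder().encode(str);
    var framed = new Uint8Array(4 + payload.length);
    new DataView(framed.buffer).setUint32(0, payload.length); // length prefix
    framed.set(payload, 4);
    return framed;
}

// Receiver side: accumulate bytes and emit only complete messages.
var buffered = new Uint8Array(0);
function onBytes(chunk) {
    var merged = new Uint8Array(buffered.length + chunk.length);
    merged.set(buffered);
    merged.set(chunk, buffered.length);
    buffered = merged;

    while (buffered.length >= 4) {
        var len = new DataView(buffered.buffer).getUint32(0);
        if (buffered.length < 4 + len) {
            break; // wait for the rest of this message
        }
        var msg = new TextDecoder().decode(buffered.subarray(4, 4 + len));
        console.log('complete message:', msg);
        buffered = buffered.slice(4 + len);
    }
}

This way a partial read can never be mistaken for a whole message, because the reader knows exactly how many bytes it is still waiting for.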
This is a pretty involved topic and hard to really give you a good answer without any code from you, but this should also help anyone in the future who might be having a similar issue.
EDIT: I went back and re-read your question and noticed you said it only happens with large amounts of data, so I think my original assumption was wrong. It's more likely situation 2: the client is sending data to your server faster than the server can read it, which might be bottlenecking the connection so that the client can only send additional data once the server has emptied part of its TCP stream buffer.
Think of it like a tube of water. The socket (tube) can only accept (fill up with) so much data (water) before it's full. Once you let some water out of the bottom, though, you can fill it up a little more. The only reason it works for small files is that the file is too small to fill the entire tube.
Additional thoughts: You can see how I was approaching this problem in C# in this question: Continuously reading from serial port asynchronously properly
And another similar question I had previously (again in C#): How to use Task.WhenAny with ReadLineAsync to get data from any TcpClient
It's been a while since I've played with TCP streams, though, so my apologies that I don't remember all the niche details and caveats of the protocol, but hopefully this information is enough to get you in the ballpark for solving your problem.
Full disclaimer, it's been over 2 years since I last touched C++ TCP sockets, and have since worked with sockets/websockets in other languages (such as C# and JavaScript), so I may have some facts wrong about the behavior of C++ TCP sockets specifically, but the core information should still apply. If I got anything wrong, someone in the comments will most likely have the correct information.
Lastly, welcome to Stack Overflow!

Result: v8 request size exceeds limits. Max allowed 1048576 current request is 1309246

Result: v8 request size exceeds limits. Max allowed 1048576 current request is 1309246.
This is the result I get when I try to send a 350 KB file three times, in Base64 format, in one request to a Parse Cloud Code function. What does it mean? Is it possible to send it somehow?
It's important for me, because I'm using Parse as my server and I'd like to keep my device independent from file storage because of Parse's limits (I want to use external storage like Amazon in the future).

Is it expensive/efficient to send data between processes in Node?

Node allows you to spawn child processes and send data between them. You could use this to execute some blocking code, for example.
Documentation says "These child Nodes are still whole new instances of V8. Assume at least 30ms startup and 10mb memory for each new Node. That is, you cannot create many thousands of them."
I was wondering whether it is efficient, and whether I should worry about any limitations. Here's some example code:
//index.js
var childProcess = require('child_process');

var childProcess1 = childProcess.fork('./child1.js');

childProcess1.send(largeArray);

childProcess1.once('message', function(formattedData) {
    console.log(formattedData);
    return false;
});

//child1.js
process.on('message', function(data) {
    data = format(data); //do smth with data, then send it back to index.js
    try {
        process.send(data);
        return false;
    }
    catch (err) {
        console.log(err);
        return false;
    }
});
The documentation is telling you that starting new node processes is (relatively) expensive. It is unwise to fork() every time you need to do work.
Instead, you should maintain a pool of long-running worker processes – much like a thread pool. Queue work requests in your main process and dispatch them to the next available worker when it goes idle.
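For illustration, a minimal pool might look like the sketch below (the pool size and helper names are assumptions, not an established API; './child1.js' is the worker from the question). The workers are forked once at startup, jobs are queued, and each job is handed to the next idle worker, so the ~30ms fork cost is paid only POOL_SIZE times.

// Minimal worker-pool sketch.
var childProcess = require('child_process');

var POOL_SIZE = 4; // assumption: roughly one worker per core
var idle = [];
var queue = [];

for (var i = 0; i < POOL_SIZE; i++) {
    idle.push(childProcess.fork('./child1.js'));
}

function runJob(data, callback) {
    queue.push({ data: data, callback: callback });
    dispatch();
}

function dispatch() {
    while (idle.length && queue.length) {
        assign(idle.pop(), queue.shift());
    }
}

function assign(worker, job) {
    worker.once('message', function (result) {
        job.callback(result);
        idle.push(worker); // worker is free again
        dispatch();
    });
    worker.send(job.data);
}

Calling runJob(largeArray, console.log) then reuses the same few long-lived processes instead of forking a new one per request.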
This leaves us with a question about the performance profile of node's IPC mechanism. When you fork(), node automatically sets up a special file descriptor on the child process. It uses this to communicate between processes by reading and writing line-delimited JSON. Basically, when you process.send({ ... }), node JSON.stringifys it and writes the serialized string to the fd. The receiving process reads this data until hitting a line break, then JSON.parses it.
This necessarily means that performance will be highly dependent on the size of the data you send between processes.
I've roughed out some tests to get a better idea of what this performance looks like.
First, I sent a message of N bytes to the worker, which immediately responded with a message of the same length. I tried this with 1 to 8 concurrent workers on my quad-core hyper-threaded i7.
We can see that having at least 2 workers is beneficial for raw throughput, but more than 2 essentially doesn't matter.
Next, I sent an empty message to the worker, which immediately responded with a message of N bytes.
Surprisingly, this made no difference.
Finally, I tried sending a message of N bytes to the worker, which immediately responded with an empty message.
Interesting — performance does not degrade as rapidly with larger messages.
Takeaways
Receiving large messages is slightly more expensive than sending them. For best throughput, your master process should not send messages larger than 1 kB and should not receive messages back larger than 128 bytes.
For small messages, the IPC overhead is about 0.02ms. This is small enough to be inconsequential in the real world.
It is important to realize that the serialization of the message is a synchronous, blocking call; if the overhead is too large, your entire node process will be frozen while the message is sent. This means I/O will be starved and you will be unable to process any other events (like incoming HTTP requests). So what is the maximum amount of data that can be sent over node IPC?
Things get really nasty over 32 kB. (These are per-message; double to get roundtrip overhead.)
The moral of the story is that you should:
If the input is larger than 32 kB, find a way to have your worker fetch the actual dataset. If you're pulling the data from a database or some other network location, do the request in the worker. Don't have the master fetch the data and then try to send it in a message. The message should contain only enough information for the worker to do its job. Think of messages like function parameters.
If the output is larger than 32 kB, find a way to have the worker deliver the result outside of a message. Write to disk or send the socket to the worker so that you can respond directly from the worker process.
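As a rough example of that second point (all names here are hypothetical, including fetchRowsFromDb and the output path), the master sends only a small job descriptor and the worker fetches the large dataset and writes the result itself, so nothing big ever crosses the IPC channel:

// master.js -- send a tiny descriptor, not the dataset
var worker = require('child_process').fork('./worker.js');
worker.send({ query: 'reports:2015-01', outFile: '/tmp/report.json' });
worker.once('message', function (msg) {
    console.log('result written to', msg.outFile);
});

// worker.js -- fetchRowsFromDb is a hypothetical data-access helper
var fs = require('fs');
process.on('message', function (job) {
    fetchRowsFromDb(job.query, function (err, rows) {
        if (err) throw err;
        fs.writeFile(job.outFile, JSON.stringify(rows), function () {
            process.send({ outFile: job.outFile }); // tiny confirmation only
        });
    });
});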
This really depends on your server resources and the number of nodes you need to spin up.
As a rule of thumb:
Try reusing running children as much as possible - this will save you the 30 ms startup time.
Do not start an unlimited number of children (one per request, for instance) - otherwise you will run out of RAM.
The messaging itself is relatively fast, I believe. It would be great to see some metrics, though.
Also, note that if you have a single CPU, or you are running a cluster (using all available cores), it doesn't make much sense: you still have limited CPU capacity, and switching context is more expensive than running a single process.

Rate-limiting the data sent down WebSockets

We're sending a lot of data down a websocket (from a Node.js app to the web browser).
The data is binary data in the form of blobs.
Occasionally, the end-user is on a poor connection - and in this case, we'd like to 'skip' messages (leave them out) and make sure we don't cram down more data than the user can receive.
On the server side, we have tried:
function sendBlob(blob, socket) {
    console.log('socket.bufferedAmount: ' + socket.bufferedAmount); // Always zero
    if (socket.bufferedAmount > 0) {
        return; // Never called
    }
    socket.send(blob);
}
Unfortunately bufferedAmount always returns zero.
Is this the right way to see how much data is being queued but not sent/received in websockets, or is there a better way to achieve this?
(Have also tried logging socket.bufferedAmount on the client-side, but it also always returns zero).
The socket.bufferedAmount property that exists on clients (as well as in the ws module for Node) is the number of bytes that the socket itself has buffered, not what the remote end has buffered. That means socket.bufferedAmount on the server is the number of bytes waiting to be sent to the client, and for the client it is the number of bytes waiting to be sent to the server.
The reason you aren't getting any change in the property is that your network is probably indeed sufficient to deliver the data. If you actually want to see a difference in socket.bufferedAmount, then try throttling your browser network access. This can be done with browser extensions or tools like NetLimiter.
If you want to throttle connections by skipping messages, you can think about creating some type of heartbeat system between the client and server. There are many ways you could do this, such as applying this function:
setInterval(function() {
    if (socket.bufferedAmount == 0) {
        socket.send('heartbeat');
    }
}, 1000);
You then detect missed heartbeats by timing the interval between them. This is rather inefficient, but there are other ways to do it, such as responding to data sent from the server (although consider that if you send a heartbeat upon receiving data, the heartbeat itself might get throttled, among other side effects).
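One rough way to turn that into actual skipping on the server side (a sketch only; the 'ack' message and the MAX_IN_FLIGHT threshold are my own invention, not part of the question's code) is to have the browser reply with a small acknowledgement for every blob it receives, and have the server drop new blobs while too many remain unacknowledged:

var MAX_IN_FLIGHT = 5; // assumed threshold
var inFlight = 0;

socket.on('message', function (msg) {
    if (msg.toString() === 'ack') {
        inFlight = Math.max(0, inFlight - 1); // client confirmed one blob
    }
});

function sendBlob(blob, socket) {
    if (inFlight >= MAX_IN_FLIGHT || socket.bufferedAmount > 0) {
        return; // client is behind: skip this blob
    }
    inFlight++;
    socket.send(blob);
}

The browser side would simply call socket.send('ack') from its onmessage handler for each blob it processes.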
An alternative solution would also be available if you were willing to switch to Socket.IO. It has a feature that allows you to send volatile messages, which are messages that are dropped if the client is busy or is not able to accept messages for any reason.
var io = require('socket.io').listen(80);

io.sockets.on('connection', function (socket) {
    var timer = setInterval(function () {
        socket.volatile.emit('data', 'payload');
    }, 100);

    socket.on('disconnect', function () {
        clearInterval(timer);
    });
});
Do note that Socket.IO will be heavier on your application, and doesn't use the native websocket protocol. It will utilize websockets when it is an option, but it is one of many transports. Socket.IO is built on Engine.IO, which uses a fork of the module you're currently using.
The read-only bufferedAmount attribute represents the number of bytes of UTF-8 text that have been queued using the send() method but not yet transmitted.
In your case you are reading it when a message is received from the server, before anything has been queued with send(), so bufferedAmount is not set.

Websocket: Browser doesn't seem to receive data from python server

I'm working on creating a websocket server via python (I'm kinda new to python) and I've made significant progress, but I am unable to send data to the web browser. I can establish a connection and receive data from the browser, but I cannot send data back. The browser just ignores it. I would assume that if the browser received a packet that didn't follow the specification, it would terminate the connection, but the connection stays active.
Here is the method I am using to encode the data into the frame:
def encode_message(data):
    frame = "\x81"
    size = len(data)
    if size * 8 <= 125:
        frame += chr(size)
    else:
        raise Exception("Uh, oh. Strings larger than 125 bits are not supported")
    return frame + data
I am sending the data using sock.sendall(framed_data). What could be the problem? The data for a message like "yo" ends up being 10000001 00000010 01111001 01101111 (spaces added for improved readability). Why doesn't the browser accept a message like this? Doesn't it follow the guidelines outlined in the specification? I am trying to support the most recent websocket version which I believe to be version 13. I am using python version 2.7.3.
I have tried to look at python websocket libraries' source code, but all of them seem to implement a deprecated version of the websocket protocol that has been shown to have vulnerabilities.
Here is the code that calls the function above:
def send(data):
    frame = encode_message(data)
    print "Sending all..."
    sock.sendall(frame)  #Socket that handles all communications with client
    print "Frame sent :)"
    return
I also downloaded Wireshark to sniff the packets sent between the server and the browser. The packets sent by my server are identical to those sent by a server that is accepted by the browser. I couldn't see any difference at all. (I looked directly at the hex source.)
The second byte of your transmitted message (and the length check in your code) looks wrong. The length of a message is in bytes, not bits.
From RFC6455 §5.2 (my emphasis)
Payload length: 7 bits, 7+16 bits, or 7+64 bits
The length of the "Payload data", in bytes: if 0-125, that is the
payload length.
The reason that nothing is received in the browser is that your message claims to have a 16 byte body. The browser will read the 2 additional bytes you send then block waiting for another 14 bytes that it expects but you don't send.
If you change the second byte to the number of bytes in the message - 0x2 or 00000010 binary - then things should work.
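For reference, here is the same length rule expressed in Node.js (the original server is Python; this sketch only illustrates the byte layout from §5.2 and is not the poster's code): the second byte carries the payload length in bytes, extended to 2 or 8 additional bytes for larger messages.

// Build an unmasked server-to-client text frame with a correct payload length.
function encodeTextFrame(str) {
    var payload = Buffer.from(str, 'utf8');
    var len = payload.length; // length in BYTES, not bits
    var header;

    if (len <= 125) {
        header = Buffer.from([0x81, len]); // FIN + text opcode, 7-bit length
    } else if (len <= 0xFFFF) {
        header = Buffer.alloc(4);
        header[0] = 0x81;
        header[1] = 126; // 126 means a 16-bit length follows
        header.writeUInt16BE(len, 2);
    } else {
        header = Buffer.alloc(10);
        header[0] = 0x81;
        header[1] = 127; // 127 means a 64-bit length follows
        header.writeBigUInt64BE(BigInt(len), 2);
    }
    return Buffer.concat([header, payload]);
}

// encodeTextFrame('yo') yields the bytes 81 02 79 6f, i.e. a two-byte payload.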
I finally figured out the problem! It took hours of unfun debugging and messing with my code. After closely examining the packages sent back and forth between the server and client I finally realized that there was a problem with my server's connection upgrade response. Whenever it computed a hash, it also added a \n to the end of it. That resulted in a \n\r\n at the end of one of the lines. The client interpreted that as the end of that transmission and everything that followed was parsed using WebSocket protocol. I had another line after that in the header, so it totally messed up my communications with the client. I could still read from the client, but if I tried to write to the client, the data would get messed up.
