Why does `WebSocket.binaryType` exist? - javascript

WebSocket has a .binaryType property that determines whether received data is exposed as a Blob or an ArrayBuffer. Why? ArrayBuffer allows you to process the data in a synchronous fashion, but what's the benefit of Blob, and why is it the default?

An ArrayBuffer would have to be fully read before it could be used, and all of the data would have to be in memory at the same time. In contrast, a Blob can be read as a stream, avoiding having to have all the data in memory before you can do anything with it (or at all).
For many purposes, incremental processing of the data from a stream is what you want (such as playing audio or video).
For some purposes, you really need all the data before you can do anything with it, in which case you might use an ArrayBuffer.
As for why Blob is the default, I'll speculate that it's because it's more flexible. When you need incremental processing, you can get it via the Blob (using a stream) without waiting for all the data, and if you need all the data before you start, you can always get an ArrayBuffer from the Blob via its arrayBuffer method.
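To make the two consumption styles concrete, here's a minimal sketch (assuming a runtime with the Blob and web-streams APIs, e.g. a modern browser or Node 18+; `for await` over a `ReadableStream` works in Node and recent browsers):

```javascript
// Incremental: pull chunks from the Blob's stream as they become available.
async function consumeAsStream(blob) {
  let total = 0;
  for await (const chunk of blob.stream()) { // each chunk is a Uint8Array
    total += chunk.byteLength;               // process the chunk here
  }
  return total;
}

// All at once: materialize the entire payload as a single ArrayBuffer.
async function consumeWhole(blob) {
  const buf = await blob.arrayBuffer();
  return buf.byteLength;
}
```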

Related

Read a file into a specific block of memory in Node.js

Can you have Node.js read a file as binary into a specific slot of memory somehow?
let buffer = new Uint8Array
let location = 0
let path = 'foo.binary'
fs.readIntoMemoryLocation(path, buffer, location)
Is something like that possible?
I am wondering because I have a bunch of stuff to process in one array buffer, and would like to add a file to it, rather than the file being its own buffer. Without copying the file from one place to another either. If not in Node.js, then the browser somehow?
Can you have Node.js read a file as binary into a specific slot of memory somehow?
Yes, both the fileHandle.read() and the fs.read() interfaces allow you to pass in a buffer and an offset into that buffer that you want the data read into.
The fs.readFile() interface does not provide that so you will have to open the file yourself and read the data yourself into the buffer and then close the file. If you stat the file to see how big it is, you can still read the whole file into the buffer with one read operation.

Is copying a large blob over to a worker expensive?

Using the Fetch API I'm able to make a network request for a large asset of binary data (say more than 500 MB) and then convert the Response to either a Blob or an ArrayBuffer.
Afterwards, I can either do worker.postMessage and let the standard structured clone algorithm copy the Blob over to a Web Worker, or transfer the ArrayBuffer over to the worker context (making it effectively no longer available from the main thread).
At first, it would seem much preferable to fetch the data as an ArrayBuffer, since a Blob is not transferable and will therefore need to be copied over. However, Blobs are immutable, so it seems the browser doesn't store them in the JS heap associated with the page but in a dedicated blob storage space, and what actually gets copied over to the worker context is just a reference.
I've prepared a demo to try out the difference between the two approaches: https://blobvsab.vercel.app/. I'm fetching 656 MB worth of binary data using both approaches.
Something interesting I've observed in my local tests, is that copying the Blob is even faster than transferring the ArrayBuffer:
Blob copy time from main thread to worker: 1.828125 ms
ArrayBuffer transfer time from main thread to worker: 3.393310546875 ms
This is a strong indicator that dealing with Blobs is actually pretty cheap. Since they're immutable, the browser seems to be smart enough to pass around a reference rather than copying the underlying binary data along with it.
Here are the heap memory snapshots I've taken when fetching as a Blob:
The first two snapshots were taken after the resulting Blob of the fetch was copied over to the worker context using postMessage. Notice that neither of those heaps includes the 656 MB.
The latter two snapshots were taken after I used a FileReader to actually access the underlying data, and as expected, the heap grew a lot.
Now, this is what happens with fetching directly as an ArrayBuffer:
Here, since the binary data was simply transferred over to the worker thread, the heap of the main thread is small but the worker heap contains the entirety of the 656 MB, even before reading this data.
Now, looking around on SO, I see that What is the difference between an ArrayBuffer and a Blob? mentions a lot of underlying differences between the two structures, but I haven't found a good reference on whether one should worry about copying a Blob between execution contexts, versus what would seem to be the inherent advantage of ArrayBuffers: that they're transferable. However, my experiments show that copying the Blob might actually be faster, and thus preferable.
It seems to be up to each browser vendor how they're storing and handling Blobs. I've found this Chromium documentation describing that all Blobs are transferred from each renderer process (i.e. a page on a tab) to the browser process and that way Chrome can even offload the Blob to the secondary memory if needed.
Does anyone have some more insights regarding all of this? If I can choose to fetch some large binary data over the network and move that to a Web Worker, should I prefer a Blob or an ArrayBuffer?
No, it's not expensive at all to postMessage a Blob.
The cloning steps of a Blob are
Their serialization steps, given value and serialized, are:
Set serialized.[[SnapshotState]] to value’s snapshot state.
Set serialized.[[ByteSequence]] to value’s underlying byte sequence.
Their deserialization steps, given serialized and value, are:
Set value’s snapshot state to serialized.[[SnapshotState]].
Set value’s underlying byte sequence to serialized.[[ByteSequence]].
In other words, nothing is copied: both the snapshot state and the byte sequence are passed by reference (even though the wrapping JS object is not).
However, regarding your full project, I wouldn't advise using Blobs here, for two reasons:
The fetch algorithm first fetches as an ArrayBuffer internally. Requesting a Blob adds an extra step there (which consumes memory).
You'll probably need to read that Blob from the Worker, adding yet another step (which will also consume memory, since here the data will actually get copied).
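The clone-vs-transfer difference is easy to observe without a Worker via structuredClone, which uses the same serialization machinery as postMessage (assuming a runtime that provides it, e.g. Node 17+ or any modern browser):

```javascript
// Clone vs. transfer, demonstrated with structuredClone.
const ab = new ArrayBuffer(8);

const cloned = structuredClone(ab);       // copy: `ab` stays usable
console.log(ab.byteLength, cloned.byteLength); // 8 8

const moved = structuredClone(ab, { transfer: [ab] }); // move: `ab` is detached
console.log(ab.byteLength, moved.byteLength);  // 0 8
```

worker.postMessage(blob) behaves like the clone case (the bytes travel by reference, per the spec steps quoted above), while worker.postMessage(ab, [ab]) behaves like the transfer case.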

Read contents of Blob in JavaScript WebSocket message

I'm writing WebSocket-based clients and servers.
I want to send raw binary data that is then decoded by the client.
My problem is two-fold:
while doing manipulations on binary data in C++ (the language I'm writing the servers in) is straightforward, it seems hard in JavaScript.
I have found that you can use ArrayBuffers and UInt8Arrays to do most of the manipulations, and that works fine for sending messages.
My problem is that when I try receiving the messages if I declare the message as binary on the server side, it shows up as a Blob on the client. I have tried converting the Blob to an ArrayBuffer like so:
ws.onmessage = function(evt) {
    var data = null;
    data = await new Response(evt.data).arrayBuffer();
}
But this gives me the error:
SyntaxError: await is only valid in async functions and async generators
It seems that this method is asynchronous, and while I'm sure I could do it this way, it doesn't really seem that good in the first place.
I have realized that sending the data as the text makes evt.data appear as a string, which makes me believe that I can use a JSON format for the packets instead of a binary one.
I really don't want to use JSON though, as some of the packets will be full of a lot of information and I'm scared it will add unnecessary bloat.
I think what I really want is just to be able to read the evt.data as an ArrayBuffer, because it seems like that would be the most performant.
Use the then method of the promise:
new Response(evt.data).arrayBuffer().then(buffer => {
    // here you have the buffer
})
Note: await can only be used inside functions declared with the async keyword.
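Two ways around the error, sketched below: set binaryType = 'arraybuffer' on the socket so no conversion is needed at all, or make the handler itself async and use blob.arrayBuffer(). The handleMessage helper here is hypothetical, not from the question:

```javascript
// Option 1 (needs a live socket, shown as comments):
//   ws.binaryType = 'arraybuffer';
//   ws.onmessage = (evt) => { const view = new Uint8Array(evt.data); /* ... */ };

// Option 2: an async handler that copes with either delivery type.
async function handleMessage(evt) {
  const data = evt.data instanceof Blob
    ? await evt.data.arrayBuffer() // default binaryType: Blob -> ArrayBuffer
    : evt.data;                    // binaryType 'arraybuffer': already done
  return new Uint8Array(data);
}
```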

What is best practice to pass binary data between WebWorker and main thread?

To my knowledge you can only pass a string or an object which can be serialized as JSON.
Then what is the best way to pass some image files between workers and main page if I am using WebWorker to download them in the background?
you can only pass a string or an object which can be serialized as JSON.
Your premise is wrong. You can pass any kind of object supported by the structured clone algorithm; this can, for example, be circularly-linked data, which cannot be represented as JSON. You can also pass ArrayBuffers, ArrayBufferViews, ImageData objects, Blobs, etc., all data types that are not known to JSON.
The second parameter of the Worker's postMessage method even allows you to transfer binary data (such as ArrayBuffers) directly to the worker; it won't copy anything, though it will detach (neuter) the reference that you have.

What is the difference between an ArrayBuffer and a Blob?

I'm reading http://www.html5rocks.com/en/tutorials/file/xhr2/ and trying to figure out the difference between an ArrayBuffer and a Blob.
Aren't both containers comprised of bits? Hence, couldn't both containers be viewed in many ways (as 32-bit chunks, 16-bit chunks, etc.)?
Summary
Unless you need the ability to write/edit the bytes (which requires an ArrayBuffer), a Blob is probably best.
Detail
I came to this question from a different html5rocks page, and I found @Bart van Heukelom's comments to be helpful, so I wanted to elevate them to an answer here.
I also found helpful resources specific to ArrayBuffer and Blob objects. In summary: despite the emphasis on a Blob being immutable "raw data", Blob objects are easy to work with.
Mutability
an ArrayBuffer can be changed (e.g. with a DataView)
a Blob is immutable
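A quick illustration of the mutability difference, using the DataView route mentioned above:

```javascript
// Mutability in practice: an ArrayBuffer's bytes can be rewritten through a
// view; a Blob's bytes cannot (you'd have to build a new Blob instead).
const buf = new ArrayBuffer(4);
const view = new DataView(buf);
view.setUint32(0, 0xdeadbeef);  // writes into the shared buffer (big-endian)
console.log(view.getUint8(0).toString(16)); // "de", the first byte written
```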
Source / Availability in Memory
Quoting Bart van Heukelom:
An ArrayBuffer is in the memory, available for manipulation.
A Blob can be on disk, in cache memory, and other places not readily available
Access Layer
ArrayBuffer will require some access layer like typed arrays
Blob can be passed directly into other functions like window.URL.createObjectURL, as seen in the example from OP's URL.
However, as Mörre points out, you may still need File-related interfaces and APIs like FileReader to work with a Blob.
Convert / Generate
You can generate a Blob from an ArrayBuffer and vice versa, which addresses the OP's "Aren't both containers comprised of bits?"
An ArrayBuffer can be generated from a Blob using the FileReader's readAsArrayBuffer method, or the async method const arrayBuffer = await blob.arrayBuffer() (thanks to @Darren G)
A Blob can be generated from an ArrayBuffer, as @user3405291 points out: new Blob([new Uint8Array(data)]), shown in this answer
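Both conversions together, as a sketch (blob.arrayBuffer() needs a reasonably modern runtime; FileReader.readAsArrayBuffer is the older route):

```javascript
// Round trip: typed array -> Blob -> ArrayBuffer -> typed array.
async function roundTrip(bytes) {
  const blob = new Blob([bytes]);       // wrap the typed array in a Blob
  const buf = await blob.arrayBuffer(); // read the bytes back out
  return new Uint8Array(buf);
}
```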
Use in Other Libraries
JSZip: (new JSZip()).loadAsync(...) accepts both ArrayBuffer and Blob: String/Array of bytes/ArrayBuffer/Uint8Array/Buffer/Blob/Promise
How protocols handle ArrayBuffer vs Blob
WebSocket (aka WS / WSS)
Use the WebSocket's binaryType property (valid values "arraybuffer" or "blob") to "control the type of binary data being received over the WebSocket connection."
XMLHttpRequest (aka XHR)
Use the xhr's responseType property to "change the expected response type from the server" (valid values include "arraybuffer", "blob", and others like "document", "json", and "text")
the response property will contain the entity body according to responseType, as an ArrayBuffer, Blob, Document, JSON, or string.
Other helpful documentation:
ArrayBuffer
The ArrayBuffer object is used to represent a generic, fixed-length
raw binary data buffer. You cannot directly manipulate the contents of
an ArrayBuffer; instead, you create one of the typed array objects or
a DataView object which represents the buffer in a specific format,
and use that to read and write the contents of the buffer.
Blob
A Blob object represents a file-like object of immutable, raw data.
Blob represent data that isn't necessarily in a JavaScript-native
format. The File interface is based on Blob, inheriting blob
functionality and expanding it to support files on the user's system.
It's explained on the page.
ArrayBuffer
An ArrayBuffer is a generic fixed-length container for binary data. They are super handy if you need a generalized buffer of raw data, but the real power behind these guys is that you can create "views" of the underlying data using JavaScript typed arrays. In fact, multiple views can be created from a single ArrayBuffer source. For example, you could create an 8-bit integer array that shares the same ArrayBuffer as an existing 32-bit integer array from the same data. The underlying data remains the same, we just create different representations of it.
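The multiple-views idea from the paragraph above, concretely:

```javascript
// Two views over one ArrayBuffer: writes through one are visible in the other.
const buf = new ArrayBuffer(8);
const u32 = new Uint32Array(buf); // two 32-bit slots
const u8  = new Uint8Array(buf);  // the same eight bytes, one at a time

u32[0] = 0x01020304;
// u8[0..3] now holds those four bytes; the byte order depends on the
// platform's endianness (little-endian on virtually all current hardware).
```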
Blob
If you want to work directly with a Blob and/or don't need to manipulate any of the file's bytes, use xhr.responseType='blob'.
If you are dealing with something that is more similar to an immutable file that may be retrieved, stored, or served as a file over HTTP, a Blob has a useful feature: blob.type (Web API docs, Node.js docs). This returns a MIME type (such as image/png) that you can use for your Content-Type HTTP header when serving the blob.
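For example (hypothetical content; Blob is available in modern browsers and in Node 18+):

```javascript
// A Blob carries its MIME type alongside the bytes; an ArrayBuffer does not.
const blob = new Blob(['<p>Hello</p>'], { type: 'text/html' });
console.log(blob.type); // "text/html", usable as a Content-Type header value
```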
