I'm reading http://www.html5rocks.com/en/tutorials/file/xhr2/ and trying to figure out the difference between an ArrayBuffer and a Blob.
Aren't both containers composed of bits? Hence, couldn't both containers be viewed in many ways (as 32-bit chunks, 16-bit chunks, etc.)?
Summary
Unless you need the ability to write/edit (using an ArrayBuffer), Blob format is probably best.
Detail
I came to this question from a different html5rocks page, and I found @Bart van Heukelom's comments helpful, so I wanted to elevate them to an answer here.
I also found helpful resources specific to ArrayBuffer and Blob objects. In summary: despite the emphasis on Blob being immutable "raw data", Blob objects are easy to work with.
Resources that compare / contrast ArrayBuffer vs Blob:
Mutability
an ArrayBuffer can be changed (e.g. with a DataView)
a Blob is immutable
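To make the mutability point concrete, here is a minimal sketch assuming a runtime with a global `Blob` (modern browsers, or Node.js 18+). A `DataView` writes into an `ArrayBuffer` in place, while the closest a `Blob` offers is `slice()`, which returns a new `Blob` and leaves the original untouched.

```javascript
// ArrayBuffer: fixed length, but its bytes can be rewritten through a view.
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);
view.setUint8(0, 0xff);          // writes directly into `buffer`
view.setUint16(2, 0x1234, true); // little-endian write at byte offset 2

const bytes = new Uint8Array(buffer);
console.log(Array.from(bytes)); // [255, 0, 52, 18]

// Blob: immutable. The constructor takes its own copy of the bytes, and
// slice() creates a new Blob over a byte range; the original is unchanged.
const blob = new Blob([bytes]);
const head = blob.slice(0, 2);
console.log(blob.size, head.size); // 4 2
```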
Source / Availability in Memory
Quoting Bart van Heukelom:
An ArrayBuffer is in the memory, available for manipulation.
A Blob can be on disk, in cache memory, and other places not readily available
Access Layer
ArrayBuffer will require some access layer like typed arrays
Blob can be passed directly into other functions like window.URL.createObjectURL, as seen in the example from OP's URL.
However, as Mörre points out, you may still need File-related interfaces and APIs like FileReader to work with a Blob.
Convert / Generate
You can generate Blob from ArrayBuffer and vice versa, which addresses the OP's "Aren't both containers comprised of bits?"
An ArrayBuffer can be generated from a Blob using FileReader's readAsArrayBuffer method, or with the promise-based method const arrayBuffer = await blob.arrayBuffer() (thanks to @Darren G).
A Blob can be generated from an ArrayBuffer, as @user3405291 points out: new Blob([new Uint8Array(data)]), shown in this answer.
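A short round-trip sketch of both conversions, assuming a global `Blob` with the promise-based `arrayBuffer()` and `text()` methods (modern browsers, Node.js 18+); the function name `roundTrip` is just for illustration. The `Blob` constructor also accepts an `ArrayBuffer` directly, so wrapping it in a `Uint8Array` is optional.

```javascript
// Blob -> ArrayBuffer -> Blob round trip.
async function roundTrip() {
  const original = new Blob(["hello"], { type: "text/plain" });

  // Blob -> ArrayBuffer: the promise-based alternative to FileReader.readAsArrayBuffer.
  const arrayBuffer = await original.arrayBuffer();

  // ArrayBuffer -> Blob: ArrayBuffers and typed arrays are both valid blob parts.
  const copy = new Blob([arrayBuffer], { type: original.type });

  return { arrayBuffer, copy, text: await copy.text() };
}

roundTrip().then(({ arrayBuffer, copy, text }) => {
  console.log(arrayBuffer.byteLength, copy.type, text); // 5 text/plain hello
});
```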
Use in Other Libraries
JSZip: (new JSZip()).loadAsync(...) accepts both ArrayBuffer and Blob; its accepted input types are String/Array of bytes/ArrayBuffer/Uint8Array/Buffer/Blob/Promise.
How protocols handle ArrayBuffer vs Blob
Websocket (aka WS / WSS)
Use the WebSocket's binaryType property (either "arraybuffer" or "blob") to "control the type of binary data being received over the WebSocket connection."
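Because the payload type depends on `binaryType`, message handlers sometimes normalize it. This is a hypothetical helper (`messageToBytes` is not a browser API), sketched under the assumption that the socket may be set to either value; the commented `WebSocket` lines show where it would plug in, and the URL there is a placeholder.

```javascript
// Hypothetical helper: normalize a WebSocket message payload to bytes,
// whichever binaryType the socket was configured with.
//   ws.binaryType = "blob";        // default: binary frames arrive as Blob
//   ws.binaryType = "arraybuffer"; // binary frames arrive as ArrayBuffer
// Text frames arrive as strings regardless of binaryType.
async function messageToBytes(data) {
  if (typeof data === "string") return new TextEncoder().encode(data);
  if (data instanceof ArrayBuffer) return new Uint8Array(data);
  if (data instanceof Blob) return new Uint8Array(await data.arrayBuffer());
  throw new TypeError("unexpected WebSocket payload");
}

// Usage in a browser (placeholder URL):
// const ws = new WebSocket("wss://example.com/socket");
// ws.binaryType = "arraybuffer";
// ws.onmessage = async (event) => console.log(await messageToBytes(event.data));
```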
XmlHttpRequest (aka XHR)
Use the XHR's responseType property "to change the expected response type from the server" (valid values include "arraybuffer", "blob", and others like "document", "json", and "text").
The response property will then contain the entity body according to responseType, as an ArrayBuffer, Blob, Document, JSON, or string.
Other helpful documentation:
ArrayBuffer
The ArrayBuffer object is used to represent a generic, fixed-length raw binary data buffer. You cannot directly manipulate the contents of an ArrayBuffer; instead, you create one of the typed array objects or a DataView object which represents the buffer in a specific format, and use that to read and write the contents of the buffer.
Blob
A Blob object represents a file-like object of immutable, raw data. Blobs can represent data that isn't necessarily in a JavaScript-native format. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system.
It's explained on the page.
ArrayBuffer
An ArrayBuffer is a generic fixed-length container for binary data. They are super handy if you need a generalized buffer of raw data, but the real power behind these guys is that you can create "views" of the underlying data using JavaScript typed arrays. In fact, multiple views can be created from a single ArrayBuffer source. For example, you could create an 8-bit integer array that shares the same ArrayBuffer as an existing 32-bit integer array from the same data. The underlying data remains the same, we just create different representations of it.
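A minimal sketch of the "multiple views" idea: two typed arrays over a single `ArrayBuffer` share the same bytes, so a write through one is visible through the other. Which byte comes first depends on the platform's endianness, so the sketch checks both possibilities rather than assuming one.

```javascript
// One ArrayBuffer, several views: no bytes are copied, the views share storage.
const buffer = new ArrayBuffer(8);
const asInt32 = new Int32Array(buffer); // 2 elements of 4 bytes each
const asInt8 = new Int8Array(buffer);   // 8 elements of 1 byte each

asInt32[0] = 0x01020304;

// Byte order follows the platform: little-endian machines store 0x04 first.
console.log(asInt32.length, asInt8.length); // 2 8
console.log(asInt8[0] === 0x04 || asInt8[0] === 0x01); // true

// A write through the 8-bit view changes what the 32-bit view reads.
asInt8[4] = 7;
console.log(asInt32[1] !== 0); // true
```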
BLOB
If you want to work directly with a Blob and/or don't need to manipulate any of the file's bytes, use xhr.responseType = 'blob'.
If you are dealing with something that is more like an immutable file that may be retrieved, stored, or served as a file over HTTP, a Blob has a useful feature: blob.type (Web API docs, Node.js docs). This returns a MIME type (such as image/png) that you can use for your Content-Type HTTP header when serving the blob.
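For example, a server could build its response headers from the blob's own metadata. `headersForBlob` is a hypothetical helper, and falling back to `application/octet-stream` when `blob.type` is empty is a design choice of this sketch, not something `Blob` does for you.

```javascript
// Hypothetical helper: derive HTTP response headers from a Blob before serving it.
// blob.type is "" when the Blob was created without a type, hence the fallback.
function headersForBlob(blob) {
  return {
    "Content-Type": blob.type || "application/octet-stream",
    "Content-Length": String(blob.size),
  };
}

// The first four bytes of the PNG signature, tagged with a MIME type.
const png = new Blob([new Uint8Array([0x89, 0x50, 0x4e, 0x47])], { type: "image/png" });
console.log(headersForBlob(png)); // Content-Type "image/png", Content-Length "4"
```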
Related
WebSocket has a .binaryType property that determines whether received data is exposed as a Blob or an ArrayBuffer. Why? ArrayBuffer allows you to process the data in a synchronous fashion, but what's the benefit of Blob, and why is it the default?
An ArrayBuffer would have to be fully read before it could be used, and all of the data would have to be in memory at the same time. In contrast, a Blob can be read as a stream, avoiding having to have all the data in memory before you can do anything with it (or at all).
For many purposes, incremental processing of the data from a stream is what you want (such as playing audio or video).
For some purposes, you really need all the data before you can do anything with it, in which case you might use an ArrayBuffer.
As for why Blob is the default, I'll speculate that it's because it's more flexible. When you need incremental processing, you can get it via the Blob (using a stream) without waiting for all the data, and if you need all the data before you start, you can always get an ArrayBuffer from the Blob via its arrayBuffer method.
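A sketch of the two reading styles, assuming `blob.stream()` returns a WHATWG `ReadableStream` (as in browsers and Node.js 18+). `countBytesIncrementally` and `readAll` are illustrative names; chunk sizes are up to the implementation, so only the byte total is predictable.

```javascript
// Incremental processing: read a Blob chunk by chunk via blob.stream()
// instead of materializing the whole payload at once.
async function countBytesIncrementally(blob) {
  const reader = blob.stream().getReader();
  let total = 0;
  let chunks = 0;
  for (;;) {
    const { done, value } = await reader.read(); // value is a Uint8Array chunk
    if (done) break;
    total += value.byteLength; // a real consumer would process the chunk here
    chunks += 1;
  }
  return { total, chunks };
}

// When all the data is needed before starting, the Blob can still provide it.
async function readAll(blob) {
  return new Uint8Array(await blob.arrayBuffer());
}
```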
Using the Fetch API I'm able to make a network request for a large asset of binary data (say more than 500 MB) and then convert the Response to either a Blob or an ArrayBuffer.
Afterwards, I can either call worker.postMessage and let the standard structured clone algorithm copy the Blob over to a Web Worker, or transfer the ArrayBuffer over to the worker context (making it effectively unavailable on the main thread).
At first, it would seem much preferable to fetch the data as an ArrayBuffer, since a Blob is not transferable and thus will need to be copied over. However, blobs are immutable, so it seems that the browser doesn't store them in the JS heap associated with the page but rather in a dedicated blob storage space, and what ends up being copied over to the worker context is just a reference.
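The copy-versus-transfer distinction for `ArrayBuffer` can be observed without a Worker via `structuredClone`, which runs the same structured clone algorithm as `postMessage` (available in browsers and Node.js 17+). This sketch only covers the `ArrayBuffer` side; it does not measure the Blob behaviour described here.

```javascript
const size = 1024;

// Copy: both sides end up with a full, independent buffer.
const original = new ArrayBuffer(size);
const copied = structuredClone(original);
console.log(original.byteLength, copied.byteLength); // 1024 1024

// Transfer: ownership moves and the source buffer is detached (byteLength 0).
const transferable = new ArrayBuffer(size);
const moved = structuredClone(transferable, { transfer: [transferable] });
console.log(transferable.byteLength, moved.byteLength); // 0 1024

// With a real Worker the equivalent calls would be:
//   worker.postMessage(buffer);           // copy
//   worker.postMessage(buffer, [buffer]); // transfer
//   worker.postMessage(blob);             // Blob: cloned, not transferable
```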
I've prepared a demo to try out the difference between the two approaches: https://blobvsab.vercel.app/. I'm fetching 656 MB worth of binary data using both approaches.
Something interesting I've observed in my local tests is that copying the Blob is even faster than transferring the ArrayBuffer:
Blob copy time from main thread to worker: 1.828125 ms
ArrayBuffer transfer time from main thread to worker: 3.393310546875 ms
This is a strong indicator that dealing with Blobs is actually pretty cheap. Since they're immutable, the browser seems smart enough to pass them around as references rather than copying the underlying binary data.
Here are the heap memory snapshots I've taken when fetching as a Blob:
The first two snapshots were taken after the Blob resulting from the fetch was copied over to the worker context using postMessage. Notice that neither of those heaps includes the 656 MB.
The latter two snapshots were taken after I've used a FileReader to actually access the underlying data, and as expected, the heap grew a lot.
Now, this is what happens with fetching directly as an ArrayBuffer:
Here, since the binary data was simply transferred over to the worker thread, the heap of the main thread is small, but the worker heap contains the entire 656 MB, even before this data is read.
Now, looking around on SO, I see that "What is the difference between an ArrayBuffer and a Blob?" mentions a lot of underlying differences between the two structures, but I haven't found a good reference on whether one should worry about copying a Blob between execution contexts versus the apparent inherent advantage of ArrayBuffers being transferable. However, my experiments show that copying the Blob might actually be faster, and thus, I think, preferable.
It seems to be up to each browser vendor how they store and handle Blobs. I've found this Chromium documentation describing how all Blobs are transferred from each renderer process (i.e. a page on a tab) to the browser process, which lets Chrome even offload the Blob to disk if needed.
Does anyone have more insights regarding all of this? If I can choose to fetch some large binary data over the network and move it to a Web Worker, should I prefer a Blob or an ArrayBuffer?
No, it's not expensive at all to postMessage a Blob.
The cloning steps of a Blob are:
Their serialization steps, given value and serialized, are:
Set serialized.[[SnapshotState]] to value’s snapshot state.
Set serialized.[[ByteSequence]] to value’s underlying byte sequence.
Their deserialization step, given serialized and value, are:
Set value’s snapshot state to serialized.[[SnapshotState]].
Set value’s underlying byte sequence to serialized.[[ByteSequence]].
In other words, nothing is copied: both the snapshot state and the byte sequence are passed by reference (even though the wrapping JS object is not).
However, regarding your full project, I wouldn't advise using Blobs here, for two reasons:
The fetch algorithm first fetches as an ArrayBuffer internally. Requesting a Blob adds an extra step there (which consumes memory).
You'll probably need to read that Blob from the Worker, adding yet another step (which will also consume memory, since here the data will actually get copied).
As stated in the MDN Web Docs:
The Blob object represents a blob, which is a file-like object of immutable, raw data; they can be read as text or binary data, or converted into a ReadableStream so its methods can be used for processing the data.
I also know that a File object inherits some of the properties of a Blob and it can be used pretty much everywhere that Blob can. But, if File can be used in the same context as Blob, how should I choose between them? Are there some cases that one is preferable from another?
As said on the very page you linked:
The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system.
The File docs show that in addition to the blob properties, files also have a lastModified date, a name, and possibly a path. They also state that
File objects are generally retrieved from a FileList object returned as a result of a user selecting files using the <input> element [or] from a drag and drop operation's DataTransfer object
Whenever you can actually choose which one to use, that means you are constructing them yourself. In contrast to the Blob constructor, the File constructor takes a non-optional name argument. So whenever you're constructing something that you'd give a file name to, use File.
I am streaming a file from a gRPC backend to my JavaScript client. The stream sends a series of Uint8Array objects. I want to combine this series of Uint8Arrays into one single ArrayBuffer (i.e. the original file's ArrayBuffer). How can I do this?
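One way to do this, sketched under the assumption that all chunks have already been collected into an array (however the gRPC client delivers them): allocate one `Uint8Array` of the total length and copy each chunk into place. `concatChunks` is a hypothetical helper name.

```javascript
// Hypothetical helper: concatenate Uint8Array chunks into one ArrayBuffer.
function concatChunks(chunks) {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const result = new Uint8Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset); // copy this chunk's bytes into place
    offset += chunk.byteLength;
  }
  return result.buffer; // a fresh ArrayBuffer of exactly totalLength bytes
}
```

If a promise is acceptable, `await new Blob(chunks).arrayBuffer()` produces the same bytes with less code, at the cost of an intermediate Blob.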
I'm storing data obtained via MediaRecorder from an audio stream in a Blob which, after being read by FileReader.readAsArrayBuffer(), is represented as an Int8Array. Is there any way to read it as an Int16Array instead?
The recording method has been extracted from here.
Thanks in advance.
After being read by FileReader.readAsArrayBuffer(), is represented as an Int8Array
No, it's a buffer.
Is there any way to read it as an Int16Array instead?
No, but you can trivially construct any typed array "view" that you need on that buffer:
new Int16Array(fileReader.result)
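Two caveats on that one-liner, as a sketch: `new Int16Array(buffer)` throws a `RangeError` when the byte length is odd, and it reads bytes in the platform's native order. `asInt16` and `readInt16LE` are hypothetical helpers. Also note that MediaRecorder typically produces an encoded container (for example WebM with Opus), so these 16-bit values are only meaningful as audio samples if the bytes really are raw 16-bit PCM; compressed audio has to be decoded first, for example with `AudioContext.decodeAudioData`.

```javascript
// Reinterpret an ArrayBuffer as 16-bit integers in the platform's byte order.
function asInt16(buffer) {
  // Int16Array needs a whole number of 2-byte elements; ignore a trailing odd byte.
  const evenLength = buffer.byteLength - (buffer.byteLength % 2);
  return new Int16Array(buffer, 0, evenLength / 2); // length is in elements, not bytes
}

// Read explicitly little-endian 16-bit values, independent of the platform.
function readInt16LE(buffer) {
  const view = new DataView(buffer);
  const samples = new Int16Array(Math.floor(buffer.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return samples;
}
```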