WebAssembly.instantiateStreaming is the fastest way to download and instantiate a .wasm module; however, for large .wasm files it can still take a long time. Simply displaying a spinner does not provide enough user feedback in this case.
Is there a way to use the WebAssembly.instantiateStreaming API and get some form of progress event, so that an ETA can be displayed to the user? Ideally I would like to display a percentage progress bar / estimated-time-left indicator so users know how long they will have to wait.
Building off the answer here.
To get the progress of WebAssembly.instantiateStreaming / WebAssembly.compileStreaming, create a new fetch Response with a custom ReadableStream which implements its own controller.
Example:
// Get your normal fetch response
var response = await fetch('https://www.example.com/example.wasm');

// Note: if you are compressing your .wasm file, the Content-Length will be incorrect.
// One workaround is to use a custom HTTP header to manually specify the uncompressed size.
var contentLength = response.headers.get('Content-Length');
var total = parseInt(contentLength, 10);
var loaded = 0;

function progressHandler(bytesLoaded, totalBytes) {
  // Do what you want with this info...
}

var res = new Response(new ReadableStream({
  async start(controller) {
    var reader = response.body.getReader();
    for (;;) {
      var {done, value} = await reader.read();
      if (done) {
        progressHandler(total, total);
        break;
      }
      loaded += value.byteLength;
      progressHandler(loaded, total);
      controller.enqueue(value);
    }
    controller.close();
  },
}, {
  "status": response.status,
  "statusText": response.statusText
}));

// Make sure to copy the headers!
// Wasm is very picky with its headers and it will fail to compile if they are not
// specified correctly.
for (var pair of response.headers.entries()) {
  res.headers.set(pair[0], pair[1]);
}

// The response (res) can now be passed to any of the streaming methods as normal
var promise = WebAssembly.instantiateStreaming(res);
Building off of various other SO answers, here is what I ended up with.
My solution also has a decent fallback for Firefox, which doesn't yet have proper stream support. I opted for falling back to a good old XHR and WebAssembly.instantiate there, as I really do want to show a loading bar, even if it means slightly slower startup just on FF.
async function fetchWithProgress(path, progress) {
  const response = await fetch(path);

  // May be incorrect if compressed
  const contentLength = response.headers.get("Content-Length");
  const total = parseInt(contentLength, 10);

  let bytesLoaded = 0;
  const ts = new TransformStream({
    transform(chunk, ctrl) {
      bytesLoaded += chunk.byteLength;
      progress(bytesLoaded / total);
      ctrl.enqueue(chunk);
    }
  });

  return new Response(response.body.pipeThrough(ts), response);
}
async function initWasmWithProgress(wasmFile, importObject, progress) {
  if (typeof TransformStream === "function" && ReadableStream.prototype.pipeThrough) {
    let done = false;
    const response = await fetchWithProgress(wasmFile, function() {
      if (!done) {
        progress.apply(null, arguments);
      }
    });
    await WebAssembly.instantiateStreaming(response, importObject);
    done = true;
    progress(1);
  } else {
    // XHR fallback: this is slower and doesn't use WebAssembly.instantiateStreaming,
    // but it's only happening on Firefox, and we can probably live with the game
    // starting slightly slower there...
    const xhr = new XMLHttpRequest();
    await new Promise(function(resolve, reject) {
      xhr.open("GET", wasmFile);
      xhr.responseType = "arraybuffer";
      xhr.onload = resolve;
      xhr.onerror = reject;
      xhr.onprogress = e => progress(e.loaded / e.total);
      xhr.send();
    });
    await WebAssembly.instantiate(xhr.response, importObject);
    progress(1);
  }
}
const wasmFile = "./wasm.wasm";
await initWasmWithProgress(wasmFile, importObject, p => console.log(`progress: ${p*100}%`));
console.log("Initialized wasm");
I'm struggling to find documentation or examples of implementing an upload progress indicator using fetch.
This is the only reference I've found so far, which states:
Progress events are a high level feature that won't arrive in fetch for now. You can create your own by looking at the Content-Length header and using a pass-through stream to monitor the bytes received.
This means you can explicitly handle responses without a Content-Length differently. And of course, even if Content-Length is there it can be a lie. With streams you can handle these lies however you want.
How would I write "a pass-through stream to monitor the bytes" sent? If it makes any sort of difference, I'm trying to do this to power image uploads from the browser to Cloudinary.
NOTE: I am not interested in the Cloudinary JS library, as it depends on jQuery and my app does not. I'm only interested in the stream processing necessary to do this with native JavaScript and GitHub's fetch polyfill.
https://fetch.spec.whatwg.org/#fetch-api
Streams are starting to land in the web platform (https://jakearchibald.com/2016/streams-ftw/) but it's still early days.
Soon you'll be able to provide a stream as the body of a request, but the open question is whether the consumption of that stream relates to bytes uploaded.
Particular redirects can result in data being retransmitted to the new location, but streams cannot "restart". We can fix this by turning the body into a callback which can be called multiple times, but we need to be sure that exposing the number of redirects isn't a security leak, since it'd be the first time on the platform JS could detect that.
Some are questioning whether it even makes sense to link stream consumption to bytes uploaded.
Long story short: this isn't possible yet, but in future this will be handled either by streams, or some kind of higher-level callback passed into fetch().
My solution is to use axios, which supports this pretty well:
axios.request({
  method: "post",
  url: "/aaa",
  data: myData,
  onUploadProgress: (p) => {
    console.log(p);
    // this.setState({
    //   fileprogress: p.loaded / p.total
    // })
  }
}).then(data => {
  // this.setState({
  //   fileprogress: 1.0,
  // })
})
I have an example of using this in React on GitHub.
fetch: not possible yet
It sounds like upload progress will eventually be possible with fetch once it supports a ReadableStream as the body. This is currently not implemented, but it's in progress. I think the code will look something like this:
warning: this code does not work yet, still waiting on browsers to support it
async function main() {
  const blob = new Blob([new Uint8Array(10 * 1024 * 1024)]); // any Blob, including a File
  const progressBar = document.getElementById("progress");
  const totalBytes = blob.size;
  let bytesUploaded = 0;

  const blobReader = blob.stream().getReader();
  const progressTrackingStream = new ReadableStream({
    async pull(controller) {
      const result = await blobReader.read();
      if (result.done) {
        console.log("completed stream");
        controller.close();
        return;
      }
      controller.enqueue(result.value);
      bytesUploaded += result.value.byteLength;
      console.log("upload progress:", bytesUploaded / totalBytes);
      progressBar.value = bytesUploaded / totalBytes;
    },
  });

  const response = await fetch("https://httpbin.org/put", {
    method: "PUT",
    headers: {
      "Content-Type": "application/octet-stream"
    },
    body: progressTrackingStream,
  });
  console.log("success:", response.ok);
}
main().catch(console.error);
upload: <progress id="progress" />
workaround: good ol' XMLHttpRequest
Instead of fetch(), it's possible to use XMLHttpRequest to track upload progress — the xhr.upload object emits a progress event.
async function main() {
  const blob = new Blob([new Uint8Array(10 * 1024 * 1024)]); // any Blob, including a File
  const uploadProgress = document.getElementById("upload-progress");
  const downloadProgress = document.getElementById("download-progress");

  const xhr = new XMLHttpRequest();
  const success = await new Promise((resolve) => {
    xhr.upload.addEventListener("progress", (event) => {
      if (event.lengthComputable) {
        console.log("upload progress:", event.loaded / event.total);
        uploadProgress.value = event.loaded / event.total;
      }
    });
    xhr.addEventListener("progress", (event) => {
      if (event.lengthComputable) {
        console.log("download progress:", event.loaded / event.total);
        downloadProgress.value = event.loaded / event.total;
      }
    });
    xhr.addEventListener("loadend", () => {
      resolve(xhr.readyState === 4 && xhr.status === 200);
    });
    xhr.open("PUT", "https://httpbin.org/put", true);
    xhr.setRequestHeader("Content-Type", "application/octet-stream");
    xhr.send(blob);
  });
  console.log("success:", success);
}
main().catch(console.error);
upload: <progress id="upload-progress"></progress><br/>
download: <progress id="download-progress"></progress>
Update: as the accepted answer says, it's not possible yet, but the code below handled our problem for some time. I should add that we eventually had to switch to a library based on XMLHttpRequest.
const response = await fetch(url);
const total = Number(response.headers.get('content-length'));

const reader = response.body.getReader();
let bytesReceived = 0;

while (true) {
  const result = await reader.read();
  if (result.done) {
    console.log('Fetch complete');
    break;
  }
  bytesReceived += result.value.length;
  console.log('Received', bytesReceived, 'bytes of data so far');
}
thanks to this link: https://jakearchibald.com/2016/streams-ftw/
As already explained in the other answers, it is not possible with fetch, but with XHR. Here is my a-little-more-compact XHR solution:
const uploadFiles = (url, files, onProgress) =>
  new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.upload.addEventListener('progress', e => onProgress(e.loaded / e.total));
    xhr.addEventListener('load', () => resolve({ status: xhr.status, body: xhr.responseText }));
    xhr.addEventListener('error', () => reject(new Error('File upload failed')));
    xhr.addEventListener('abort', () => reject(new Error('File upload aborted')));
    xhr.open('POST', url, true);
    const formData = new FormData();
    Array.from(files).forEach((file, index) => formData.append(index.toString(), file));
    xhr.send(formData);
  });
Works with one or multiple files.
If you have a file input element like this:
<input type="file" multiple id="fileUpload" />
Call the function like this:
document.getElementById('fileUpload').addEventListener('change', async e => {
  const onProgress = progress => console.log('Progress:', `${Math.round(progress * 100)}%`);
  const response = await uploadFiles('/api/upload', e.currentTarget.files, onProgress);
  if (response.status >= 400) {
    throw new Error(`File upload failed - Status code: ${response.status}`);
  }
  console.log('Response:', response.body);
});
Also works with the e.dataTransfer.files you get from a drop event when building a file drop zone.
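For completeness, a minimal drop-zone sketch reusing the uploadFiles() helper above (the #dropZone element and the /api/upload endpoint are placeholders, not part of the answer itself):

// Hypothetical drop zone element: <div id="dropZone">Drop files here</div>
const dropZone = document.getElementById('dropZone');

dropZone.addEventListener('dragover', e => e.preventDefault()); // allow dropping

dropZone.addEventListener('drop', async e => {
  e.preventDefault();
  const onProgress = progress => console.log('Progress:', `${Math.round(progress * 100)}%`);
  // Reuses the uploadFiles() helper defined above
  const response = await uploadFiles('/api/upload', e.dataTransfer.files, onProgress);
  console.log('Response:', response.body);
});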
I don't think it's possible. The draft states:
it is currently lacking [in comparison to XHR] when it comes to request progression
(old answer):
The first example in the Fetch API chapter gives some insight on how to do this:
If you want to receive the body data progressively:
function consume(reader) {
  var total = 0
  return new Promise((resolve, reject) => {
    function pump() {
      reader.read().then(({done, value}) => {
        if (done) {
          resolve()
          return
        }
        total += value.byteLength
        log(`received ${value.byteLength} bytes (${total} bytes in total)`)
        pump()
      }).catch(reject)
    }
    pump()
  })
}

fetch("/music/pk/altes-kamuffel.flac")
  .then(res => consume(res.body.getReader()))
  .then(() => log("consumed the entire body without keeping the whole thing in memory!"))
  .catch(e => log("something went wrong: " + e))
Apart from their use of the Promise constructor antipattern, you can see that response.body is a Stream from which you can read chunk by chunk using a Reader, and you can fire an event or do whatever you like (e.g. log the progress) for each of them.
However, the Streams spec doesn't appear to be quite finished, and I have no idea whether this already works in any fetch implementation.
with fetch: now possible with Chrome >= 105 🎉
How to:
https://developer.chrome.com/articles/fetch-streaming-requests/
Currently not supported by other browsers (maybe that will be the case when you read this, please edit my answer accordingly)
Feature detection (source)
const supportsRequestStreams = (() => {
  let duplexAccessed = false;

  const hasContentType = new Request('', {
    body: new ReadableStream(),
    method: 'POST',
    get duplex() {
      duplexAccessed = true;
      return 'half';
    },
  }).headers.has('Content-Type');

  return duplexAccessed && !hasContentType;
})();
HTTP >= 2 required
The fetch will be rejected if the connection is HTTP/1.x.
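A rough sketch of how upload progress could be tracked with a streaming request body (the /upload endpoint is a placeholder; this needs Chrome >= 105 and an HTTP/2+ connection as noted above, and it counts bytes as the browser pulls them from the stream, which only approximates bytes actually acknowledged by the server):

async function uploadWithProgress(file, onProgress) {
  let bytesSent = 0;

  // Wrap the file's stream so we can count bytes as the browser pulls them
  const progressStream = file.stream().pipeThrough(new TransformStream({
    transform(chunk, controller) {
      bytesSent += chunk.byteLength;
      onProgress(bytesSent / file.size);
      controller.enqueue(chunk);
    },
  }));

  return fetch('/upload', {            // placeholder endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/octet-stream' },
    body: progressStream,
    duplex: 'half',                    // required for streaming request bodies
  });
}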
Since none of the answers solve the problem, here is a workaround just for implementation's sake: you can measure the upload speed with a small initial chunk of known size, and then estimate the total upload time as content-length / upload-speed. You can use this estimate as an ETA.
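A rough sketch of that estimation idea (the probe endpoint and numbers are made up; it only gives a ballpark ETA, since upload speed can vary during the transfer):

// Hypothetical helper: upload a small probe chunk and time it to estimate speed
async function estimateUploadSeconds(totalBytes, probeUrl = '/upload-probe') {
  const probeSize = 64 * 1024;                       // 64 KB probe
  const probe = new Blob([new Uint8Array(probeSize)]);

  const start = performance.now();
  await fetch(probeUrl, { method: 'POST', body: probe });
  const seconds = (performance.now() - start) / 1000;

  const bytesPerSecond = probeSize / seconds;        // measured upload speed
  return totalBytes / bytesPerSecond;                // ETA = size / speed
}

// Usage: show the estimate before starting the real upload
// const eta = await estimateUploadSeconds(file.size);
// console.log(`Estimated upload time: ~${Math.round(eta)} s`);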
A possible workaround would be to utilize new Request() constructor then check Request.bodyUsed Boolean attribute
The bodyUsed attribute’s getter must return true if disturbed, and
false otherwise.
to determine if the stream is disturbed
An object implementing the Body mixin is said to be disturbed if
body is non-null and its stream is disturbed.
Return the fetch() Promise from within .then() chained to a recursive .read() call of a ReadableStream once Request.bodyUsed is equal to true.
Note that the approach does not read the bytes of the Request.body as the bytes are streamed to the endpoint. Also, the upload could complete well before any response is returned in full to the browser.
const [input, progress, label] = [
  document.querySelector("input"),
  document.querySelector("progress"),
  document.querySelector("label")
];

const url = "/path/to/server/";

input.onmousedown = () => {
  label.innerHTML = "";
  progress.value = "0";
};

input.onchange = (event) => {
  const file = event.target.files[0];
  const filename = file.name;
  progress.max = file.size;

  const request = new Request(url, {
    method: "POST",
    body: file,
    cache: "no-store"
  });

  const upload = settings => fetch(settings);

  const uploadProgress = new ReadableStream({
    start(controller) {
      console.log("starting upload, request.bodyUsed:", request.bodyUsed);
      controller.enqueue(request.bodyUsed);
    },
    pull(controller) {
      if (request.bodyUsed) {
        controller.close();
        return; // don't enqueue after closing
      }
      controller.enqueue(request.bodyUsed);
      console.log("pull, request.bodyUsed:", request.bodyUsed);
    },
    cancel(reason) {
      console.log(reason);
    }
  });

  const [fileUpload, reader] = [
    upload(request)
      .catch(e => {
        reader.cancel();
        throw e;
      }),
    uploadProgress.getReader()
  ];

  const processUploadRequest = ({value, done}) => {
    if (value || done) {
      console.log("upload complete, request.bodyUsed:", request.bodyUsed);
      // set `progress.value` to `progress.max` here
      // if not awaiting server response
      // progress.value = progress.max;
      return reader.closed.then(() => fileUpload);
    }
    console.log("upload progress:", value);
    progress.value = +progress.value + 1;
    return reader.read().then(result => processUploadRequest(result));
  };

  reader.read().then(({value, done}) => processUploadRequest({value, done}))
    .then(response => response.text())
    .then(text => {
      console.log("response:", text);
      progress.value = progress.max;
      input.value = "";
    })
    .catch(err => console.log("upload error:", err));
};
I fished around for some time on this, so for everyone who may come across this issue too, here is my solution:
const form = document.querySelector('form');
const status = document.querySelector('#status');

// When the form gets submitted
form.addEventListener('submit', async function (event) {
  // Cancel default behavior (form submit)
  event.preventDefault();

  // Inform the user that the upload has begun
  status.innerText = 'Uploading..';

  // Create FormData from the form
  const formData = new FormData(form);

  // Open request to origin
  const request = await fetch('https://httpbin.org/post', { method: 'POST', body: formData });

  // Get the amount of bytes we're about to transmit
  const bytesToUpload = request.headers.get('content-length');

  // Create a reader from the request body
  const reader = request.body.getReader();

  // Cache how much data we have already sent
  let bytesUploaded = 0;

  // Get the first chunk from the reader
  let chunk = await reader.read();

  // While we have more chunks to go
  while (!chunk.done) {
    // Increase the amount of bytes transmitted
    bytesUploaded += chunk.value.length;

    // Inform the user how far we are
    status.innerText = 'Uploading (' + (bytesUploaded / bytesToUpload * 100).toFixed(2) + ')...';

    // Read the next chunk
    chunk = await reader.read();
  }
});
const req = await fetch('./foo.json');
const total = Number(req.headers.get('content-length'));

let loaded = 0;
// Note: iterating req.body like this relies on ReadableStream async iteration,
// which not all browsers support yet.
for await (const { length } of req.body) {
  loaded += length;
  const progress = ((loaded / total) * 100).toFixed(2); // toFixed(2) means two digits after the decimal point
  console.log(`${progress}%`); // or yourDiv.textContent = `${progress}%`;
}
The key part is the ReadableStream at response.body.
Sample:
// parse() receives each read() result: log it and decide whether to keep reading
const parse = result => {
  console.log(result);
  // ... do something with result.value (a Uint8Array chunk) ...
  return result.value ? true : false; // continue while there is data
};

fetch('') // fetch the current page
  .then(response => {
    const reader = response.body.getReader();
    const pump = () => reader.read().then(parse).then(cont => (cont ? pump() : undefined));
    return pump();
  });
You can test it by running it on a huge page, e.g. https://html.spec.whatwg.org/ or https://html.spec.whatwg.org/print.pdf . Open the console with Ctrl+Shift+J and paste the code in.
(Tested on Chrome.)
I'm trying to grab a video file in JavaScript using fetch. I am able to get the file downloaded and get the blob URL, but I can't seem to get the progress while it's downloading.
I tried this:
let response = await fetch('test.mp4');
const reader = response.body.getReader();
const contentLength = response.headers.get('Content-Length');
let receivedLength = 0;
d = document.getElementById('progress_bar');

while (true) {
  const {done, value} = await reader.read();
  if (done) {
    break;
  }
  receivedLength += value.length;
  d.innerHTML = "Bytes loaded:" + receivedLength;
}

const blob = await response.blob();
var vid = URL.createObjectURL(blob);
The problem is that I get "Response.blob: Body has already been consumed". I see that the reader.read() is probably doing that. How do I just get the amount of data received and then get a blob URL at the end of it?
Thanks.
Update:
My first attempt collected the chunks as they downloaded and then put them back together, with a large (2-3x the size of the video) memory footprint. Using a ReadableStream has a much lower memory footprint (memory usage hovers around 150MB for a 1.1GB mkv). The code is largely adapted from the snippet here, with only minimal modifications from me:
https://github.com/AnthumChris/fetch-progress-indicators/blob/master/fetch-basic/supported-browser.js
<div id="progress_bar"></div>
<video id="video_player"></video>
const elProgress = document.getElementById('progress_bar'),
  player = document.getElementById('video_player');

function getVideo2() {
  let contentType = 'video/mp4';
  fetch('$pathToVideo.mp4')
    .then(response => {
      const contentEncoding = response.headers.get('content-encoding');
      const contentLength = response.headers.get(contentEncoding ? 'x-file-size' : 'content-length');
      contentType = response.headers.get('content-type') || contentType;
      if (contentLength === null) {
        throw Error('Response size header unavailable');
      }

      const total = parseInt(contentLength, 10);
      let loaded = 0;

      return new Response(
        new ReadableStream({
          start(controller) {
            const reader = response.body.getReader();
            read();

            function read() {
              reader.read().then(({done, value}) => {
                if (done) {
                  controller.close();
                  return;
                }
                loaded += value.byteLength;
                progress({loaded, total});
                controller.enqueue(value);
                read();
              }).catch(error => {
                console.error(error);
                controller.error(error);
              });
            }
          }
        })
      );
    })
    .then(response => response.blob())
    .then(blob => {
      let vid = URL.createObjectURL(blob);
      player.style.display = 'block';
      player.type = contentType;
      player.src = vid;
      elProgress.innerHTML += "<br /> Press play!";
    })
    .catch(error => {
      console.error(error);
    });
}

function progress({loaded, total}) {
  elProgress.innerHTML = Math.round(loaded / total * 100) + '%';
}
First Attempt (worse, suitable for smaller files)
My original approach. For a 1.1GB mkv, the memory usage creeps up to 1.3GB while the file is downloading, then spikes to about 3.5Gb when the chunks are being combined. Once the video starts playing, the tab's memory usage goes back down to ~200MB but Chrome's overall usage stays over 1GB.
Instead of calling response.blob() to get the blob, you can construct the blob yourself by accumulating each chunk of the video (value). Adapted from the example here: https://javascript.info/fetch-progress#0d0g7tutne
//...
receivedLength += value.length;
chunks.push(value);
//...

// ==> put the chunks into a Uint8Array that the Blob constructor can use
let Uint8Chunks = new Uint8Array(receivedLength), position = 0;
for (let chunk of chunks) {
  Uint8Chunks.set(chunk, position);
  position += chunk.length;
}

// ==> you may want to get the mimetype from the content-type header
const blob = new Blob([Uint8Chunks], {type: 'video/mp4'})
In my app, I have an hour-long audio file that's entirely sound effects. Unfortunately I do need them all - they're species-specific sounds, so I can't cut any of them out. They were separate before, but I audiosprite'd them all into one large file.
The export file is about 20MB compressed, but it's still a large download for users with a slow connection. I need this file to be in an AudioBuffer, since I'm seeking to sections of an audioSprite and using loopStart/loopEnd to only loop that section. I more or less need the whole thing downloaded before playback can start, because the requested species are randomly picked when the app starts. They could be looking for sounds at the start of the file, or at the very end.
What I'm wondering is, if I were to split this file in fourths, could I load them in in parallel, and stitch them into the full AudioBuffer once loading finishes? I'm guessing I'd be merging multiple arrays, but only performing decodeAudioData() once? Requesting ~100 separate files (too many) was what brought me to audiosprites in the first place, but I'm wondering if there's a way to leverage some amount of async loading to lower the time it takes. I thought about having four <audio> elements and using createMediaElementSource() to load them, but my understanding is that I can't (?) turn a MediaElementSource into an AudioBuffer.
Consider playing the files immediately in chunks instead of waiting for the entire file to download. You could do this with the Streams API and:
Queuing chunks with the MediaSource Extensions (MSE) API and switching between buffers.
Playing back decoded PCM audio with the Web Audio API and AudioBuffer.
See examples for low-latency audio playback of file chunks as they are received.
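A minimal sketch of the first option (MSE), just to illustrate the shape of it: the audio element, URL, and the 'audio/mpeg' MIME type are assumptions, and real code needs error handling plus a container/codec the browser's MSE implementation actually accepts.

// Sketch: append fetched chunks to a SourceBuffer so playback can start early
async function streamAudio(url) {
  const audio = document.querySelector('audio');        // assumed <audio> element
  const mediaSource = new MediaSource();
  audio.src = URL.createObjectURL(mediaSource);

  await new Promise(resolve =>
    mediaSource.addEventListener('sourceopen', resolve, { once: true }));

  const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg'); // codec is an assumption
  const reader = (await fetch(url)).body.getReader();

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    sourceBuffer.appendBuffer(value);                    // queue the chunk
    await new Promise(resolve =>
      sourceBuffer.addEventListener('updateend', resolve, { once: true }));
  }
  mediaSource.endOfStream();
}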
I think in principle you can. Just download each chunk as an ArrayBuffer, concatenate all of the chunks together and send that to decodeAudioData.
But if you're on a slow link, I'm not sure how downloading in parallel will help.
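A minimal sketch of that idea (the part URLs and the AudioContext wiring are placeholders; it assumes the parts are plain byte slices of the original file, so concatenating them reproduces the original encoded data):

async function loadSplitAudio(urls, audioContext) {
  // Download all parts in parallel
  const buffers = await Promise.all(
    urls.map(url => fetch(url).then(res => res.arrayBuffer()))
  );

  // Concatenate the parts into one ArrayBuffer
  const totalLength = buffers.reduce((sum, b) => sum + b.byteLength, 0);
  const joined = new Uint8Array(totalLength);
  let offset = 0;
  for (const buf of buffers) {
    joined.set(new Uint8Array(buf), offset);
    offset += buf.byteLength;
  }

  // Decode once, as described above
  return audioContext.decodeAudioData(joined.buffer);
}

// Usage (hypothetical part URLs):
// const ctx = new AudioContext();
// const audioBuffer = await loadSplitAudio(['/sfx-part1.mp3', '/sfx-part2.mp3'], ctx);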
Edit: this code is functional, but on occasion produces really nasty audio glitches, so I don't recommend using it without further testing. I'm leaving it here in case it helps someone else figure out working with Uint8Arrays.
So here's a basic version of it, basically what Raymond described. I haven't tested this with a split version of the large file yet, so I don't know if it improves the load speed at all, but it works. The JS is below, but if you want to test it yourself, here's the pen.
// mp3 link is from: https://codepen.io/SitePoint/pen/JRaLVR
(function () {
  'use strict';

  const context = new AudioContext();
  let bufferList = [];

  // change the urlList for your needs
  const URL = 'https://s3-us-west-2.amazonaws.com/s.cdpn.io/123941/Yodel_Sound_Effect.mp3';
  const urlList = [URL, URL, URL, URL, URL, URL];

  const loadButton = document.querySelector('.loadFile');
  const playButton = document.querySelector('.playFile');
  loadButton.onclick = () => loadAllFiles(urlList, loadProgress);

  function play(audioBuffer) {
    const source = context.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(context.destination);
    source.start();
  }

  // concatenates all the buffers into one collected ArrayBuffer
  function concatBufferList(buflist, len) {
    let tmp = new Uint8Array(len);
    let pos = 0;
    for (let i = 0; i < buflist.length; i++) {
      tmp.set(new Uint8Array(buflist[i]), pos);
      pos += buflist[i].byteLength;
    }
    return tmp.buffer;
  }

  function loadAllFiles(list, onProgress) {
    let fileCount = 0;
    let fileSize = 0;
    for (let i = 0; i < list.length; i++) {
      loadFileXML(list[i], loadProgress, i).then(e => {
        bufferList[i] = e.buf;
        fileSize += e.size;
        fileCount++;
        // compare against list.length so we only decode once every file has arrived
        if (fileCount == list.length) {
          let b = concatBufferList(bufferList, fileSize);
          context.decodeAudioData(b).then(audioBuffer => {
            playButton.disabled = false;
            playButton.onclick = () => play(audioBuffer);
          }).catch(error => console.log(error));
        }
      });
    }
  }

  // adapted from petervdn's audiobuffer-load on npm
  function loadFileXML(url, onProgress, index) {
    return new Promise((resolve, reject) => {
      const request = new XMLHttpRequest();
      request.open('GET', url, true);
      request.responseType = 'arraybuffer';
      if (onProgress) {
        request.onprogress = event => {
          onProgress(event.loaded / event.total);
        };
      }
      request.onload = () => {
        if (request.status === 200) {
          const fileSize = request.response.byteLength;
          resolve({
            buf: request.response,
            size: fileSize
          });
        } else {
          reject(`Error loading '${url}' (${request.status})`);
        }
      };
      request.onerror = error => {
        reject(error);
      };
      request.send();
    });
  }

  function loadProgress(e) {
    console.log("Progress: " + e);
  }
}());
I'm trying to create a resumable file downloader based only on the client side. The server is beyond my control, and on an AJAX request I get the file, which is a very big binary data file (~100 MB).
After long research I have understood that I cannot use XHR to stream the response, and I cannot read chunks of the file before it is completely cached... So I looked some more and found the Fetch API, which is quite new, but I cannot find any proper documentation or tutorials. I would very much appreciate it if someone could illustrate a simple example of fetching some URL and reading the stream chunk by chunk.
Here's an example from this blog post:
fetch('/big-data.csv').then(function(response) {
  var reader = response.body.getReader();
  var partialCell = '';
  var returnNextCell = false;
  var returnCellAfter = "Jake";
  var decoder = new TextDecoder();

  function search() {
    return reader.read().then(function(result) {
      partialCell += decoder.decode(result.value || new Uint8Array, {
        stream: !result.done
      });

      // Split what we have into CSV 'cells'
      var cellBoundry = /(?:,|\r\n)/;
      var completeCells = partialCell.split(cellBoundry);

      if (!result.done) {
        // Last cell is likely incomplete
        // Keep hold of it for next time
        partialCell = completeCells[completeCells.length - 1];
        // Remove it from our complete cells
        completeCells = completeCells.slice(0, -1);
      }

      for (var cell of completeCells) {
        cell = cell.trim();

        if (returnNextCell) {
          reader.cancel("No more reading needed.");
          return cell;
        }
        if (cell === returnCellAfter) {
          returnNextCell = true;
        }
      }

      if (result.done) {
        throw Error("Could not find value after " + returnCellAfter);
      }

      return search();
    })
  }

  return search();
}).then(function(result) {
  console.log("Got the result! It's '" + result + "'");
}).catch(function(err) {
  console.log(err.message);
});
Note that streaming responses aren't supported yet in all browsers, check the compatibility table on MDN.
Is there a way to calculate the MD5 hash of a file before the upload to the server using Javascript?
While there are JS implementations of the MD5 algorithm, older browsers are generally unable to read files from the local filesystem.
I wrote that in 2009. So what about new browsers?
With a browser that supports the FileAPI, you can read the contents of a file - the user has to have selected it, either with an <input> element or drag-and-drop. As of Jan 2013, here's how the major browsers stack up:
FF 3.6 supports FileReader, FF4 supports even more file based functionality
Chrome has supported the FileAPI since version 7.0.517.41
Internet Explorer 10 has partial FileAPI support
Opera 11.10 has partial support for FileAPI
Safari - I couldn't find a good official source for this, but this site suggests partial support from 5.1, full support for 6.0. Another article reports some inconsistencies with the older Safari versions
How?
See the answer below by Benny Neugebauer which uses the MD5 function of CryptoJS
I've made a library that implements incremental md5 in order to hash large files efficiently.
Basically you read a file in chunks (to keep memory low) and hash it incrementally.
There are basic usage instructions and examples in the readme.
Be aware that you need HTML5 FileAPI, so be sure to check for it.
There is a full example in the test folder.
https://github.com/satazor/SparkMD5
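A quick sketch of the feature check mentioned above, before attempting to read and hash files client-side:

if (window.File && window.FileReader && window.Blob) {
  // File API is available; safe to read and hash files client-side
} else {
  console.warn('The File APIs are not fully supported in this browser.');
}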
It is pretty easy to calculate the MD5 hash using the MD5 function of CryptoJS and the HTML5 FileReader API. The following code snippet shows how you can read the binary data and calculate the MD5 hash from an image that has been dragged into your browser:
var holder = document.getElementById('holder');

holder.ondragover = function() {
  return false;
};

holder.ondragend = function() {
  return false;
};

holder.ondrop = function(event) {
  event.preventDefault();
  var file = event.dataTransfer.files[0];
  var reader = new FileReader();
  reader.onload = function(event) {
    var binary = event.target.result;
    var md5 = CryptoJS.MD5(binary).toString();
    console.log(md5);
  };
  reader.readAsBinaryString(file);
};
I recommend to add some CSS to see the Drag & Drop area:
#holder {
  border: 10px dashed #ccc;
  width: 300px;
  height: 300px;
}
#holder.hover {
  border: 10px dashed #333;
}
More about the Drag & Drop functionality can be found here: File API & FileReader
I tested the sample in Google Chrome Version 32.
The following snippet shows an example which can achieve a throughput of 400 MB/s while reading and hashing the file.
It is using a library called hash-wasm, which is based on WebAssembly and calculates the hash faster than JS-only libraries. As of 2020, all modern browsers support WebAssembly.
const chunkSize = 64 * 1024 * 1024;
const fileReader = new FileReader();
let hasher = null;

function hashChunk(chunk) {
  return new Promise((resolve, reject) => {
    fileReader.onload = async (e) => {
      const view = new Uint8Array(e.target.result);
      hasher.update(view);
      resolve();
    };
    fileReader.readAsArrayBuffer(chunk);
  });
}

const readFile = async (file) => {
  if (hasher) {
    hasher.init();
  } else {
    hasher = await hashwasm.createMD5();
  }

  const chunkNumber = Math.floor(file.size / chunkSize);

  for (let i = 0; i <= chunkNumber; i++) {
    const chunk = file.slice(
      chunkSize * i,
      Math.min(chunkSize * (i + 1), file.size)
    );
    await hashChunk(chunk);
  }

  const hash = hasher.digest();
  return Promise.resolve(hash);
};

const fileSelector = document.getElementById("file-input");
const resultElement = document.getElementById("result");

fileSelector.addEventListener("change", async (event) => {
  const file = event.target.files[0];

  resultElement.innerHTML = "Loading...";
  const start = Date.now();
  const hash = await readFile(file);
  const end = Date.now();
  const duration = end - start;
  const fileSizeMB = file.size / 1024 / 1024;
  const throughput = fileSizeMB / (duration / 1000);

  resultElement.innerHTML = `
    Hash: ${hash}<br>
    Duration: ${duration} ms<br>
    Throughput: ${throughput.toFixed(2)} MB/s
  `;
});
<script src="https://cdn.jsdelivr.net/npm/hash-wasm"></script>
<!-- defines the global `hashwasm` variable -->
<input type="file" id="file-input">
<div id="result"></div>
HTML5 + spark-md5 and Q
Assuming you're using a modern browser (one that supports the HTML5 File API), here's how you calculate the MD5 hash of a large file (it will calculate the hash on variable-size chunks):
function calculateMD5Hash(file, bufferSize) {
  var def = Q.defer();
  var fileReader = new FileReader();
  var fileSlicer = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
  var hashAlgorithm = new SparkMD5();
  var totalParts = Math.ceil(file.size / bufferSize);
  var currentPart = 0;
  var startTime = new Date().getTime();

  fileReader.onload = function(e) {
    currentPart += 1;

    def.notify({
      currentPart: currentPart,
      totalParts: totalParts
    });

    var buffer = e.target.result;
    hashAlgorithm.appendBinary(buffer);

    if (currentPart < totalParts) {
      processNextPart();
      return;
    }

    def.resolve({
      hashResult: hashAlgorithm.end(),
      duration: new Date().getTime() - startTime
    });
  };

  fileReader.onerror = function(e) {
    def.reject(e);
  };

  function processNextPart() {
    var start = currentPart * bufferSize;
    var end = Math.min(start + bufferSize, file.size);
    fileReader.readAsBinaryString(fileSlicer.call(file, start, end));
  }

  processNextPart();
  return def.promise;
}

function calculate() {
  var input = document.getElementById('file');
  if (!input.files.length) {
    return;
  }

  var file = input.files[0];
  var bufferSize = Math.pow(1024, 2) * 10; // 10MB

  calculateMD5Hash(file, bufferSize).then(
    function(result) {
      // Success
      console.log(result);
    },
    function(err) {
      // There was an error
    },
    function(progress) {
      // We get notified of the progress as it is executed
      console.log(progress.currentPart, 'of', progress.totalParts, 'Total bytes:', progress.currentPart * bufferSize, 'of', progress.totalParts * bufferSize);
    });
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/q.js/1.4.1/q.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/spark-md5/2.0.2/spark-md5.min.js"></script>
<div>
  <input type="file" id="file"/>
  <input type="button" onclick="calculate();" value="Calculate" class="btn primary" />
</div>
You need to use the FileAPI. It is available in the latest FF & Chrome, but not IE9.
Grab any md5 JS implementation suggested above. I've tried this and abandoned it because JS was too slow (minutes on large image files). Might revisit it if someone rewrites MD5 using typed arrays.
Code would look something like this:
HTML:
<input type="file" id="file-dialog" multiple="true" accept="image/*">
JS (with jQuery):
$("#file-dialog").change(function() {
  handleFiles(this.files);
});

function handleFiles(files) {
  for (var i = 0; i < files.length; i++) {
    var reader = new FileReader();
    reader.onload = function() {
      // use `this` (the FileReader that fired the event) rather than the loop's
      // `reader` variable, which may already point to another file's reader
      var md5 = binl_md5(this.result, this.result.length);
      console.log("MD5 is " + md5);
    };
    reader.onerror = function() {
      console.error("Could not read the file");
    };
    reader.readAsBinaryString(files.item(i));
  }
}
Apart from the impossibility to get file system access in JS, I would not put any trust at all in a client-generated checksum. So generating the checksum on the server is mandatory in any case. – Tomalak Apr 20 '09 at 14:05
Which is useless in most cases. You want the MD5 computed client-side, so that you can compare it with the one recomputed server-side and conclude the upload went wrong if they differ. I have needed to do that in applications working with large files of scientific data, where receiving uncorrupted files was key. My case was simple, because users had the MD5 already computed by their data analysis tools, so I just needed to ask them for it with a text field.
If sha256 is also fine:
async sha256(file: File) {
  // get byte array of file
  let buffer = await file.arrayBuffer();

  // hash the message
  const hashBuffer = await crypto.subtle.digest('SHA-256', buffer);

  // convert ArrayBuffer to Array
  const hashArray = Array.from(new Uint8Array(hashBuffer));

  // convert bytes to hex string
  const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
  return hashHex;
}
To get the hash of files, there are a lot of options. Normally the problem is that it's really slow to get the hash of big files.
I created a little library that gets the hash of files, using only the 64 KB at the start of the file and the 64 KB at the end of it.
Live example: http://marcu87.github.com/hashme/ and library: https://github.com/marcu87/hashme
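Not the library's actual code, but a sketch of that idea using SparkMD5 (mentioned in other answers here): hash only the first and last 64 KB, and mix in the file size to tell apart files that happen to share the same head and tail.

// Sketch: hash only the first and last 64 KB of a file (plus its size)
function quickHash(file) {
  return new Promise((resolve, reject) => {
    const chunkSize = 64 * 1024;
    const head = file.slice(0, chunkSize);
    const tail = file.slice(Math.max(file.size - chunkSize, 0));

    const spark = new SparkMD5.ArrayBuffer();
    const reader = new FileReader();

    reader.onload = () => {
      spark.append(reader.result);
      resolve(spark.end());
    };
    reader.onerror = () => reject(reader.error);

    // Read both slices (and the size, as a string) as one blob
    reader.readAsArrayBuffer(new Blob([head, tail, String(file.size)]));
  });
}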
Hope you have found a good solution by now. If not, the solution below is an ES6 promise implementation based on js-spark-md5:
import SparkMD5 from 'spark-md5';

// Read in chunks of 2MB
const CHUNK_SIZE = 2097152;

/**
 * Incrementally calculate the checksum of a given file based on the MD5 algorithm
 */
export const checksum = (file) =>
  new Promise((resolve, reject) => {
    let currentChunk = 0;
    const chunks = Math.ceil(file.size / CHUNK_SIZE);
    const blobSlice =
      File.prototype.slice ||
      File.prototype.mozSlice ||
      File.prototype.webkitSlice;
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();

    const loadNext = () => {
      const start = currentChunk * CHUNK_SIZE;
      const end =
        start + CHUNK_SIZE >= file.size ? file.size : start + CHUNK_SIZE;

      // Selectively read the file and only store part of it in memory.
      // This allows client-side applications to process huge files
      // without needing huge amounts of memory.
      fileReader.readAsArrayBuffer(blobSlice.call(file, start, end));
    };

    fileReader.onload = e => {
      spark.append(e.target.result);
      currentChunk++;

      if (currentChunk < chunks) loadNext();
      else resolve(spark.end());
    };

    fileReader.onerror = () => {
      return reject('Calculating file checksum failed');
    };

    loadNext();
  });
There are a couple of scripts out there on the internet for creating an MD5 hash.
The one from webtoolkit is good: http://www.webtoolkit.info/javascript-md5.html
Although, I don't believe it will have access to the local filesystem, as that access is limited.
This is another hash-wasm example, but using the Streams API instead of having to set up a FileReader:
import { createSHA1 } from 'hash-wasm';

async function calculateSHA1(file: File) {
  const hasher = await createSHA1();

  const hasherStream = new WritableStream<Uint8Array>({
    start: () => {
      hasher.init();
      // you can set UI state here also
    },
    write: chunk => {
      hasher.update(chunk);
      // you can set UI state here also
    },
    close: () => {
      // you can set UI state here also
    },
  });

  await file.stream().pipeTo(hasherStream);
  return hasher.digest('hex');
}
I don't believe there is a way in JavaScript to access the contents of a file upload, so you cannot look at the file contents to generate an MD5 sum.
You can, however, send the file to the server, which can then send an MD5 sum back or send the file contents back... but that's a lot of work and probably not worthwhile for your purposes.