I'm working on a project that uses WebRTC for file transfers. Recently someone reported an issue saying that transfers end prematurely for bigger files. I've found the problem, and my solution was to rely on the bufferedamountlow event to coordinate the sending of chunks. I've also stopped closing the connection when the sender thinks it's complete.
For some reason, though, in Safari that event does not fire.
Here is the relevant code:
const connection = new RTCPeerConnection(rtcConfiguration);
const channel = connection.createDataChannel('sendDataChannel');
channel.binaryType = 'arraybuffer';
channel.addEventListener('open', () => {
  const fileReader = new FileReader();
  let offset = 0;
  const nextSlice = (currentOffset: number) => {
    // Do asynchronous thing with FileReader, that will result in
    // channel.send(buffer) getting called.
    // Also, offset gets increased by 16384 (the size of the buffer).
  };
  channel.bufferedAmountLowThreshold = 0;
  channel.addEventListener('bufferedamountlow', () => nextSlice(offset));
  nextSlice(0);
});
The longer version of my code is available here.
While researching the issue, I've realized that on Safari, my connection.sctp is undefined. (I noticed because I've switched to connection.sctp.maxMessageSize instead of a hardcoded 16384 for my buffer size.) I would assume the problem is related to that.
What could be the cause of this problem? Let me add that on Chrome and Firefox everything works just fine.
The bufferedamountlow event is not required for the proper functioning of my code; I would like it to work, though, to get more precise estimates of current progress and speed on the sending end of the file transfer.
After some investigation, it turns out that Safari has issues with 0 as a value for the bufferedAmountLowThreshold property.
When set to a non-zero value, the code functions properly.
Checking the bufferedAmount before requesting the next slice also increases the speed at which the chunks are sent:
const bufferSize = connection.sctp?.maxMessageSize || 65535;
channel.addEventListener('open', () => {
  const fileReader = new FileReader();
  let offset = 0;
  const nextSlice = (currentOffset: number) => {
    // file comes from the surrounding scope
    const slice = file.slice(currentOffset, currentOffset + bufferSize);
    fileReader.readAsArrayBuffer(slice);
  };
  fileReader.addEventListener('load', e => {
    const buffer = e.target.result as ArrayBuffer;
    try {
      channel.send(buffer);
    } catch {
      // Deal with failure...
    }
    offset += buffer.byteLength;
    if (channel.bufferedAmount < bufferSize / 2) {
      nextSlice(offset);
    }
  });
  channel.bufferedAmountLowThreshold = bufferSize / 2;
  channel.addEventListener('bufferedamountlow', () => nextSlice(offset));
  nextSlice(0);
});
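Incidentally, for the progress estimate mentioned earlier, a minimal sketch (assuming the same channel, offset, and file variables are in scope):
// Rough sending-side progress: bytes handed to the channel minus what is
// still sitting in its local buffer approximates bytes pushed to the network.
const progress = () => {
  const bytesSent = offset - channel.bufferedAmount;
  return Math.min(bytesSent / file.size, 1); // fraction in [0, 1]
};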
I have a client attempting to send images to a server over BLE.
Client Code
// Boilerplate to set up the connection and whatnot
sendFile.onclick = async () => {
  var fileList = document.getElementById("myFile").files;
  var fileReader = new FileReader();
  if (fileReader && fileList && fileList.length) {
    fileReader.readAsArrayBuffer(fileList[0]);
    fileReader.onload = function () {
      var imageData = fileReader.result;
      // Server doesn't get data if I don't do this chunking
      imageData = imageData.slice(0, 512);
      const base64String = _arrayBufferToBase64(imageData);
      document.getElementById("ItemPreview").src = "data:image/jpeg;base64," + base64String;
      sendCharacteristic.writeValue(imageData);
    };
  }
};
Server Code
MyCharacteristic.prototype.onWriteRequest = function(data, offset, withoutResponse, callback) {
  // It seems this will not print out if the client sends over 512 B.
  console.log(this._value);
};
My goal is to send small images (just ~6 KB)... These are still small enough that I'd prefer to use BLE over a BT serial connection. Is the only way to do this to perform some chunking and then stream the chunks over?
Current 'Chunking' Code
const MAX_LENGTH = 512;
for (let i = 0; i < bytes.byteLength; i += MAX_LENGTH) {
  const end = (i + MAX_LENGTH > bytes.byteLength) ? bytes.byteLength : i + MAX_LENGTH;
  const chunk = bytes.slice(i, end);
  sendCharacteristic.writeValue(chunk);
  await sleep(1000);
}
The above code works; however, it sleeps between sends. I'd rather not do this because there's no guarantee a previous packet will have finished sending, and I could sleep longer than needed.
I'm also perplexed as to how the server code would know the client has finished sending all bytes and can then assemble them. Is there some kind of pattern for achieving this?
BLE characteristic values can only be 512 bytes, so yes, the common way to send larger data is to split it into multiple chunks. Use "Write Without Response" for best performance (MTU-3 must be at least as big as your chunk).
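As a sketch of the sender side without sleeps, using Web Bluetooth's writeValueWithoutResponse (the zero-length end marker is my own convention, not part of any spec; a total-length prefix would work just as well):
const MAX_LENGTH = 512;
async function sendImage(characteristic, bytes) {
  for (let i = 0; i < bytes.byteLength; i += MAX_LENGTH) {
    const chunk = bytes.slice(i, Math.min(i + MAX_LENGTH, bytes.byteLength));
    // Resolves once the stack has queued the write, so awaiting it paces
    // the sender without arbitrary sleeps.
    await characteristic.writeValueWithoutResponse(chunk);
  }
  // Hypothetical end-of-transfer marker: an empty write tells the server
  // it can reassemble the chunks now.
  await characteristic.writeValueWithoutResponse(new Uint8Array(0));
}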
I am trying to develop a simple game using nw.js (node.js + chromium page).
<canvas width="1200" height="800" id="main"></canvas>
<script>
var Mouse = {x: 0, y: 0, fire: false};
(async function() {
  "use strict";
  const reload = 25;
  var ireload = 0;
  const audioCtx = new AudioContext();
  let fire = await fetch('shotgun.mp3');
  let bgMusic = await fetch('hard.mp3');
  fire = await fire.arrayBuffer();
  bgMusic = await bgMusic.arrayBuffer();
  const bgMdecoded = await audioCtx.decodeAudioData(bgMusic);
  const fireDecoded = await audioCtx.decodeAudioData(fire);
  const bgM = audioCtx.createBufferSource();
  bgM.buffer = bgMdecoded;
  bgM.loop = true;
  bgM.connect(audioCtx.destination);
  bgM.start(0);
  let shot = audioCtx.createBufferSource();
  shot.buffer = fireDecoded;
  shot.connect(audioCtx.destination);
  document.getElementById('main').onmousedown = function(e) {
    Mouse.x = e.layerX;
    Mouse.y = e.layerY;
    Mouse.fire = true;
  };
  function main(tick) {
    var dt = tick - lastTick;
    lastTick = tick;
    /// take fire
    if (--ireload < 0 && Mouse.fire) {
      ireload = reload;
      shot.start(0);
      shot = audioCtx.createBufferSource();
      shot.buffer = fireDecoded;
      shot.connect(audioCtx.destination);
      Mouse.fire = false;
    }
    /* moving objects, rendering on thread with offscreen canvas */
    requestAnimationFrame(main);
  }
  let lastTick = performance.now();
  main(lastTick);
})();
</script>
I have stripped code to minimal working example.
The problem is with shooting: every time I fire (///take fire), the game drops FPS. Exactly the same happens in Kaiido's example (https://jsfiddle.net/sLpx6b3v/). It works great over long periods, but playing multiple sounds (the game is a shooter) several times gives a framerate drop and, after some time, GC hiccups.
A less than one year old gaming laptop drops from 60 fps to about 40 fps, and to about 44 fps on Kaiido's example.
What could be fixed with the sound?
The desired behaviour is no lagging / no GC / no frame drops due to sound. The background music works well.
I will try AudioWorklet, but it is hard to create one and to process instantaneous sounds (probably another question).
It is possible to reuse the buffer source, in a slightly hackish way.
First create
const audioCtx = new AudioContext();
then fetch the resource as usual:
let fire = await fetch('shotgun.mp3');
fire = await fire.arrayBuffer();
fire = await audioCtx.decodeAudioData(fire);
const shot = audioCtx.createBufferSource();
shot.buffer = fire;
shot.loopEnd = 0.00001; //some small value to make it unplayable
shot.start(0);
Then, during the event (mouse down in my case):
shot.loopEnd = 1; // that restarts the sound and plays it in a loop
Next, after it has played, set again:
shot.loopEnd = 0.00001;
In my case, I stop it inside requestAnimationFrame:
<canvas width="1200" height="800" id="main"></canvas>
<script>
var Mouse = {x: 0, y: 0, fire: false};
(async function() {
  "use strict";
  const reload = 25;
  var ireload = 0;
  const audioCtx = new AudioContext();
  let fire = await fetch('shotgun.mp3');
  let bgMusic = await fetch('hard.mp3');
  fire = await fire.arrayBuffer();
  bgMusic = await bgMusic.arrayBuffer();
  const bgMdecoded = await audioCtx.decodeAudioData(bgMusic);
  const fireDecoded = await audioCtx.decodeAudioData(fire);
  const bgM = audioCtx.createBufferSource();
  bgM.buffer = bgMdecoded;
  bgM.loop = true;
  bgM.connect(audioCtx.destination);
  bgM.start(0);
  let shot = audioCtx.createBufferSource();
  shot.buffer = fireDecoded;
  shot.connect(audioCtx.destination);
  shot.loopEnd = 0.00001; // some small value to make it unplayable
  shot.start(0);
  document.getElementById('main').onmousedown = function(e) {
    Mouse.x = e.layerX;
    Mouse.y = e.layerY;
    Mouse.fire = true;
  };
  function main(tick) {
    var dt = tick - lastTick;
    lastTick = tick;
    /// take fire
    // assuming 60 fps, which is true in my case, I stop it after a second
    if (ireload < -35) {
      shot.loopEnd = 0.00001;
    }
    if (--ireload < 0 && Mouse.fire) {
      ireload = reload;
      shot.loopEnd = 1; // that restarts the sound and plays it in a loop
      Mouse.fire = false;
    }
    /* moving objects, rendering on thread with offscreen canvas */
    requestAnimationFrame(main);
  }
  let lastTick = performance.now();
  main(lastTick);
})();
</script>
A note about GC: it is true that it handles audio buffers quickly, but I have checked, and GC fires only when there are allocations and memory reallocations. The garbage collector interrupts all script execution, so there is jank and lag.
I use a memory pool in tandem with this trick, allocating the pool at initialisation and then only reusing objects, and I get literally no GC after the second sweep: it runs once after initialisation, and kicks in a second time after optimisation to reduce unused memory. After that, there is no GC at all. Using typed arrays and workers gives a really performant combo, with 60 fps, crisp sound and no lags at all.
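A minimal sketch of that pool idea (the names are mine, not from the game above):
// Allocate everything up front, then reuse: the steady state allocates
// nothing, so the GC has nothing new to trace or reclaim.
function createPool(size, factory) {
  const free = [];
  for (let i = 0; i < size; i++) free.push(factory());
  return {
    acquire: () => free.pop() || factory(), // fallback if the pool runs dry
    release: (obj) => { free.push(obj); }   // caller resets obj before reuse
  };
}
// Usage: const bullets = createPool(64, () => ({x: 0, y: 0, vx: 0, vy: 0, live: false}));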
You may think that locking GC is a bad idea. Maybe you are right, but after all, wasting resources only because there is GC doesn't seem like a good idea either.
After tests, AudioWorklets seem to work as intended, but they are heavy, hard to maintain, and consume a lot of resources, and writing a processor that simply copies inputs to outputs defies its purpose. The postMessage system is a really heavy process, and you have to either connect the standard way and recreate buffers, or copy data into Worklet space and manage it via shared arrays and atomic operations manually.
You may also be interested in this writeup about WebAudio design, where the author shares the same concerns and hits exactly the same problem. Quote:
I know I’m fighting an uphill battle here, but a GC is not what we need during realtime audio playback.
Keeping a pool of AudioBuffers seems to work, though in my own test app I still see slow growth to 12MB over time before a major GC wipes, according to the Chrome profiler.
And a writeup about GC, where memory leaks in JavaScript are described. A quote:
Consider the following scenario:
A sizable set of allocations is performed.
Most of these elements (or all of them) are marked as unreachable (suppose we null a reference pointing to a cache we no longer need).
No further allocations are performed.
In this scenario, most GCs will not run any further collection passes. In other words, even though there are unreachable references available for collection, these are not claimed by the collector. These are not strictly leaks but still, result in higher-than-usual memory usage.
I am modifying the background position of a background image that a user just uploaded (so it is a raw data URI) using CSS, but the re-rendering starts to lag if the image is >1 MB.
The issue is not present for smaller images.
Is there any way of dealing with this, short of trying to optimize renders? I already don't re-render unless the background position changes by at least 1% (i.e. the change was 1% or greater between re-renders).
The issue is not present with the same image when I host it and then load the image from a URL.
You can see this issue in a small CodeSandbox that I made. Using the image at the default URL works well, but taking the file and uploading it makes it very laggy.
Could I somehow cache the data? It would seem like it is lagging because the raw data is not cached whereas the image is being cached.
I fiddled with the logic a bit and changed styles.backgroundPositionY to direct style manipulation via a ref element. I don't see any big improvement. The uploaded image was indeed lagging while moving it, whereas the default one was nimble.
const imageDivRef = useRef(null);

const onUpload = e => {
  new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = onLoadEvent => {
      const dataUri = onLoadEvent.target.result;
      let image = new Image(); // try to cache that
      image.src = dataUri;
      image.onload = () => {
        resolve(dataUri);
      };
    };
    reader.readAsDataURL(e.target.files[0]);
  }).then(dataUri => {
    updateBackgroundImage(dataUri);
  });
};

const onPan = e => {
  const y = position.y;
  const delta = (e.deltaY / CONTAINER_HEIGHT) * -100;
  requestAnimationFrame(() => {
    imageDivRef.current.style.backgroundPositionY = `${Math.min(
      Math.max(y + delta, 0),
      100)}%`;
  });
};

const onPanEnd = e => {
  const y = position.y;
  const delta = (e.deltaY / CONTAINER_HEIGHT) * -100;
  requestAnimationFrame(() => {
    imageDivRef.current.style.backgroundPositionY = `${Math.min(
      Math.max(y + delta, 0),
      100)}%`;
  });
};
There is also one crazy idea: ditch backgroundPositionY, make a wrapper div with overflow hidden and a fixed size, put an img element inside it, and manipulate that with the transform: translateY() property. I wonder if that would be quicker...
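A rough sketch of that idea, assuming the same position, CONTAINER_HEIGHT, and dataUri as above (untested against the CodeSandbox):
// A fixed-size wrapper clips the image; moving the <img> with translateY()
// lets the browser composite a layer instead of repainting a background.
const imageRef = useRef(null);
const onPanTranslate = e => {
  const y = position.y;
  const delta = (e.deltaY / CONTAINER_HEIGHT) * -100;
  const offset = Math.min(Math.max(y + delta, 0), 100); // same clamping as above
  requestAnimationFrame(() => {
    imageRef.current.style.transform = `translateY(-${offset}%)`;
  });
};
// Markup, roughly:
// <div style={{ overflow: 'hidden', height: CONTAINER_HEIGHT }}>
//   <img ref={imageRef} src={dataUri} style={{ willChange: 'transform' }} />
// </div>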
Next.js has an image optimization feature; not a basic solution, but...
In my app, I have an hour-long audio file that's entirely sound effects. Unfortunately I do need them all - they're species-specific sounds, so I can't cut any of them out. They were separate before, but I audiosprite'd them all into one large file.
The export file is about 20MB compressed, but it's still a large download for users with a slow connection. I need this file to be in an AudioBuffer, since I'm seeking to sections of an audioSprite and using loopStart/loopEnd to only loop that section. I more or less need the whole thing downloaded before playback can start, because the requested species are randomly picked when the app starts. They could be looking for sounds at the start of the file, or at the very end.
What I'm wondering is, if I were to split this file in fourths, could I load them in in parallel, and stitch them into the full AudioBuffer once loading finishes? I'm guessing I'd be merging multiple arrays, but only performing decodeAudioData() once? Requesting ~100 separate files (too many) was what brought me to audiosprites in the first place, but I'm wondering if there's a way to leverage some amount of async loading to lower the time it takes. I thought about having four <audio> elements and using createMediaElementSource() to load them, but my understanding is that I can't (?) turn a MediaElementSource into an AudioBuffer.
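For reference, this is roughly how I play one section of the sprite once the AudioBuffer exists (the offsets here are made up; the real ones come from the audiosprite manifest):
// Loop a single clip inside the big sprite buffer.
function playClip(ctx, spriteBuffer, startSec, endSec) {
  const src = ctx.createBufferSource();
  src.buffer = spriteBuffer;
  src.loop = true;
  src.loopStart = startSec;
  src.loopEnd = endSec;
  src.connect(ctx.destination);
  src.start(0, startSec); // begin playback at the clip's offset
  return src;             // caller stops it with src.stop()
}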
Consider playing the files immediately in chunks instead of waiting for the entire file to download. You could do this with the Streams API and:
Queuing chunks with the MediaSource Extensions (MSE) API and switching between buffers.
Playing back decoded PCM audio with the Web Audio API and AudioBuffer.
See examples for low-latency audio playback of file chunks as they are received.
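A minimal sketch of the MSE route (the URL is hypothetical, and the browser must support 'audio/mpeg' in MediaSource):
// Stream the file into an <audio> element chunk by chunk as it downloads.
const audio = new Audio();
const mediaSource = new MediaSource();
audio.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener('sourceopen', async () => {
  const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
  const response = await fetch('/audio/sprite.mp3'); // hypothetical URL
  const reader = response.body.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    sourceBuffer.appendBuffer(value);
    // appendBuffer is asynchronous; wait for it before the next append.
    await new Promise(r => sourceBuffer.addEventListener('updateend', r, { once: true }));
  }
  mediaSource.endOfStream();
  audio.play(); // may require a prior user gesture
});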
I think in principle you can. Just download each chunk as an ArrayBuffer, concatenate all of the chunks together and send that to decodeAudioData.
But if you're on a slow link, I'm not sure how downloading in parallel will help.
Edit: this code is functional, but on occasion produces really nasty audio glitches, so I don't recommend using it without further testing. I'm leaving it here in case it helps someone else figure out working with Uint8Arrays.
So here's a basic version of it, basically what Raymond described. I haven't tested this with a split version of the large file yet, so I don't know if it improves the load speed at all, but it works. The JS is below, but if you want to test it yourself, here's the pen.
// mp3 link is from: https://codepen.io/SitePoint/pen/JRaLVR
(function () {
  'use strict';
  const context = new AudioContext();
  let bufferList = [];
  // change the urlList for your needs
  const URL = 'https://s3-us-west-2.amazonaws.com/s.cdpn.io/123941/Yodel_Sound_Effect.mp3';
  const urlList = [URL, URL, URL, URL, URL, URL];
  const loadButton = document.querySelector('.loadFile');
  const playButton = document.querySelector('.playFile');
  loadButton.onclick = () => loadAllFiles(urlList, loadProgress);

  function play(audioBuffer) {
    const source = context.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(context.destination);
    source.start();
  }

  // concatenates all the buffers into one collected ArrayBuffer
  function concatBufferList(buflist, len) {
    let tmp = new Uint8Array(len);
    let pos = 0;
    for (let i = 0; i < buflist.length; i++) {
      tmp.set(new Uint8Array(buflist[i]), pos);
      pos += buflist[i].byteLength;
    }
    return tmp.buffer;
  }

  function loadAllFiles(list, onProgress) {
    let fileCount = 0;
    let fileSize = 0;
    for (let i = 0; i < list.length; i++) {
      loadFileXML(list[i], onProgress, i).then(e => {
        bufferList[i] = e.buf;
        fileSize += e.size;
        fileCount++;
        // compare against list.length, not bufferList.length: out-of-order
        // loads could make bufferList.length match fileCount too early
        if (fileCount === list.length) {
          let b = concatBufferList(bufferList, fileSize);
          context.decodeAudioData(b).then(audioBuffer => {
            playButton.disabled = false;
            playButton.onclick = () => play(audioBuffer);
          }).catch(error => console.log(error));
        }
      });
    }
  }

  // adapted from petervdn's audiobuffer-load on npm
  function loadFileXML(url, onProgress, index) {
    return new Promise((resolve, reject) => {
      const request = new XMLHttpRequest();
      request.open('GET', url, true);
      request.responseType = 'arraybuffer';
      if (onProgress) {
        request.onprogress = event => {
          onProgress(event.loaded / event.total);
        };
      }
      request.onload = () => {
        if (request.status === 200) {
          const fileSize = request.response.byteLength;
          resolve({
            buf: request.response,
            size: fileSize
          });
        }
        else {
          reject(`Error loading '${url}' (${request.status})`);
        }
      };
      request.onerror = error => {
        reject(error);
      };
      request.send();
    });
  }

  function loadProgress(e) {
    console.log("Progress: " + e);
  }
}());
Is there a way to calculate the MD5 hash of a file before the upload to the server using Javascript?
While there are JS implementations of the MD5 algorithm, older browsers are generally unable to read files from the local filesystem.
I wrote that in 2009. So what about new browsers?
With a browser that supports the FileAPI, you can read the contents of a file - the user has to have selected it, either with an <input> element or drag-and-drop. As of Jan 2013, here's how the major browsers stack up:
FF 3.6 supports FileReader, FF4 supports even more file based functionality
Chrome has supported the FileAPI since version 7.0.517.41
Internet Explorer 10 has partial FileAPI support
Opera 11.10 has partial support for FileAPI
Safari - I couldn't find a good official source for this, but this site suggests partial support from 5.1, full support for 6.0. Another article reports some inconsistencies with the older Safari versions
How?
See the answer below by Benny Neugebauer, which uses the MD5 function of CryptoJS.
I've made a library that implements incremental md5 in order to hash large files efficiently.
Basically you read a file in chunks (to keep memory low) and hash it incrementally.
Basic usage and examples are in the readme.
Be aware that you need HTML5 FileAPI, so be sure to check for it.
There is a full example in the test folder.
https://github.com/satazor/SparkMD5
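A minimal usage sketch of the incremental approach (the chunk size is arbitrary, and Blob.arrayBuffer() needs a reasonably modern browser):
// Hash a file piece by piece so only one chunk is in memory at a time.
async function md5OfFile(file, chunkSize = 2 * 1024 * 1024) {
  const spark = new SparkMD5.ArrayBuffer();
  for (let start = 0; start < file.size; start += chunkSize) {
    const chunk = file.slice(start, Math.min(start + chunkSize, file.size));
    spark.append(await chunk.arrayBuffer());
  }
  return spark.end(); // hex digest
}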
It is pretty easy to calculate the MD5 hash using the MD5 function of CryptoJS and the HTML5 FileReader API. The following code snippet shows how you can read the binary data and calculate the MD5 hash from an image that has been dragged into your browser:
var holder = document.getElementById('holder');

holder.ondragover = function() {
  return false;
};
holder.ondragend = function() {
  return false;
};
holder.ondrop = function(event) {
  event.preventDefault();
  var file = event.dataTransfer.files[0];
  var reader = new FileReader();
  reader.onload = function(event) {
    var binary = event.target.result;
    var md5 = CryptoJS.MD5(binary).toString();
    console.log(md5);
  };
  reader.readAsBinaryString(file);
};
I recommend adding some CSS to see the Drag & Drop area:
#holder {
  border: 10px dashed #ccc;
  width: 300px;
  height: 300px;
}
#holder.hover {
  border: 10px dashed #333;
}
More about the Drag & Drop functionality can be found here: File API & FileReader
I tested the sample in Google Chrome Version 32.
The following snippet shows an example, which can achieve a throughput of 400 MB/s while reading and hashing the file.
It is using a library called hash-wasm, which is based on WebAssembly and calculates the hash faster than js-only libraries. As of 2020, all modern browsers support WebAssembly.
const chunkSize = 64 * 1024 * 1024;
const fileReader = new FileReader();
let hasher = null;

function hashChunk(chunk) {
  return new Promise((resolve, reject) => {
    fileReader.onload = async (e) => {
      const view = new Uint8Array(e.target.result);
      hasher.update(view);
      resolve();
    };
    fileReader.readAsArrayBuffer(chunk);
  });
}

const readFile = async (file) => {
  if (hasher) {
    hasher.init();
  } else {
    hasher = await hashwasm.createMD5();
  }
  const chunkNumber = Math.floor(file.size / chunkSize);
  for (let i = 0; i <= chunkNumber; i++) {
    const chunk = file.slice(
      chunkSize * i,
      Math.min(chunkSize * (i + 1), file.size)
    );
    await hashChunk(chunk);
  }
  const hash = hasher.digest();
  return Promise.resolve(hash);
};

const fileSelector = document.getElementById("file-input");
const resultElement = document.getElementById("result");

fileSelector.addEventListener("change", async (event) => {
  const file = event.target.files[0];
  resultElement.innerHTML = "Loading...";
  const start = Date.now();
  const hash = await readFile(file);
  const end = Date.now();
  const duration = end - start;
  const fileSizeMB = file.size / 1024 / 1024;
  const throughput = fileSizeMB / (duration / 1000);
  resultElement.innerHTML = `
    Hash: ${hash}<br>
    Duration: ${duration} ms<br>
    Throughput: ${throughput.toFixed(2)} MB/s
  `;
});
<script src="https://cdn.jsdelivr.net/npm/hash-wasm"></script>
<!-- defines the global `hashwasm` variable -->
<input type="file" id="file-input">
<div id="result"></div>
HTML5 + spark-md5 and Q
Assuming you're using a modern browser (one that supports the HTML5 File API), here's how you calculate the MD5 hash of a large file (it will calculate the hash over variable-size chunks):
function calculateMD5Hash(file, bufferSize) {
  var def = Q.defer();
  var fileReader = new FileReader();
  var fileSlicer = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
  var hashAlgorithm = new SparkMD5();
  var totalParts = Math.ceil(file.size / bufferSize);
  var currentPart = 0;
  var startTime = new Date().getTime();

  fileReader.onload = function(e) {
    currentPart += 1;
    def.notify({
      currentPart: currentPart,
      totalParts: totalParts
    });
    var buffer = e.target.result;
    hashAlgorithm.appendBinary(buffer);
    if (currentPart < totalParts) {
      processNextPart();
      return;
    }
    def.resolve({
      hashResult: hashAlgorithm.end(),
      duration: new Date().getTime() - startTime
    });
  };

  fileReader.onerror = function(e) {
    def.reject(e);
  };

  function processNextPart() {
    var start = currentPart * bufferSize;
    var end = Math.min(start + bufferSize, file.size);
    fileReader.readAsBinaryString(fileSlicer.call(file, start, end));
  }

  processNextPart();
  return def.promise;
}

function calculate() {
  var input = document.getElementById('file');
  if (!input.files.length) {
    return;
  }
  var file = input.files[0];
  var bufferSize = Math.pow(1024, 2) * 10; // 10MB
  calculateMD5Hash(file, bufferSize).then(
    function(result) {
      // Success
      console.log(result);
    },
    function(err) {
      // There was an error
    },
    function(progress) {
      // We get notified of the progress as it is executed
      console.log(progress.currentPart, 'of', progress.totalParts, 'Total bytes:', progress.currentPart * bufferSize, 'of', progress.totalParts * bufferSize);
    });
}

<script src="https://cdnjs.cloudflare.com/ajax/libs/q.js/1.4.1/q.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/spark-md5/2.0.2/spark-md5.min.js"></script>
<div>
  <input type="file" id="file"/>
  <input type="button" onclick="calculate();" value="Calculate" class="btn primary" />
</div>
You need to use the FileAPI. It is available in the latest FF & Chrome, but not IE9.
Grab any md5 JS implementation suggested above. I've tried this and abandoned it because JS was too slow (minutes on large image files). Might revisit it if someone rewrites MD5 using typed arrays.
Code would look something like this:
HTML:
<input type="file" id="file-dialog" multiple="true" accept="image/*">
JS (with jQuery)
$("#file-dialog").change(function() {
handleFiles(this.files);
});
function handleFiles(files) {
for (var i=0; i<files.length; i++) {
var reader = new FileReader();
reader.onload = function() {
var md5 = binl_md5(reader.result, reader.result.length);
console.log("MD5 is " + md5);
};
reader.onerror = function() {
console.error("Could not read the file");
};
reader.readAsBinaryString(files.item(i));
}
}
Apart from the impossibility to get file system access in JS, I would not put any trust at all in a client-generated checksum. So generating the checksum on the server is mandatory in any case. – Tomalak Apr 20 '09 at 14:05
Which is useless in most cases. You want the MD5 computed client-side, so that you can compare it with the one recomputed server-side and conclude the upload went wrong if they differ. I have needed to do that in applications working with large files of scientific data, where receiving uncorrupted files was key. My case was simple, because users had the MD5 already computed from their data analysis tools, so I just needed to ask them for it with a text field.
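A sketch of that workflow (the /upload endpoint and field names are hypothetical):
// Send the client-side hash along with the file; the server recomputes the
// MD5 over what it received and rejects the upload if the two differ.
async function uploadWithChecksum(file) {
  const md5 = await md5OfFile(file); // any of the hashing helpers shown here
  const form = new FormData();
  form.append('file', file);
  form.append('md5', md5);
  const res = await fetch('/upload', { method: 'POST', body: form });
  if (!res.ok) throw new Error('Upload failed or checksum mismatch');
}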
If sha256 is also fine:
async function sha256(file: File) {
  // get byte array of file
  let buffer = await file.arrayBuffer();
  // hash the message
  const hashBuffer = await crypto.subtle.digest('SHA-256', buffer);
  // convert ArrayBuffer to Array
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  // convert bytes to hex string
  const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
  return hashHex;
}
To get the hash of files, there are a lot of options. Normally the problem is that it's really slow to get the hash of big files.
I created a little library that gets the hash of files, using the 64 KB at the start of the file and the 64 KB at the end of it.
Live example: http://marcu87.github.com/hashme/ and library: https://github.com/marcu87/hashme
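The gist of it is something like this (a sketch of the idea, not the library's exact code; it reuses SparkMD5 from above):
// Fingerprint a file from its first and last 64 KB only. Much faster than a
// full hash, but note it is not a true checksum of the whole content.
async function quickFingerprint(file) {
  const PART = 64 * 1024;
  const spark = new SparkMD5.ArrayBuffer();
  spark.append(await file.slice(0, Math.min(PART, file.size)).arrayBuffer());
  if (file.size > PART) {
    // start no earlier than PART so the two slices never overlap
    spark.append(await file.slice(Math.max(file.size - PART, PART)).arrayBuffer());
  }
  return spark.end();
}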
Hope you have found a good solution by now. If not, the solution below is an ES6 promise implementation based on js-spark-md5:
import SparkMD5 from 'spark-md5';

// Read in chunks of 2MB
const CHUNK_SIZE = 2097152;

/**
 * Incrementally calculate checksum of a given file based on MD5 algorithm
 */
export const checksum = (file) =>
  new Promise((resolve, reject) => {
    let currentChunk = 0;
    const chunks = Math.ceil(file.size / CHUNK_SIZE);
    const blobSlice =
      File.prototype.slice ||
      File.prototype.mozSlice ||
      File.prototype.webkitSlice;
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();

    const loadNext = () => {
      const start = currentChunk * CHUNK_SIZE;
      const end =
        start + CHUNK_SIZE >= file.size ? file.size : start + CHUNK_SIZE;
      // Selectively read the file and only store part of it in memory.
      // This allows client-side applications to process huge files
      // without the need for huge memory.
      fileReader.readAsArrayBuffer(blobSlice.call(file, start, end));
    };

    fileReader.onload = e => {
      spark.append(e.target.result);
      currentChunk++;
      if (currentChunk < chunks) loadNext();
      else resolve(spark.end());
    };

    fileReader.onerror = () => {
      return reject('Calculating file checksum failed');
    };

    loadNext();
  });
There are a couple of scripts out there on the internet to create an MD5 hash.
The one from webtoolkit is good: http://www.webtoolkit.info/javascript-md5.html
Although I don't believe it will have access to the local filesystem, as that access is limited.
This is another hash-wasm example, but using the Streams API, instead of having to set up a FileReader:
import { createSHA1 } from 'hash-wasm';

async function calculateSHA1(file: File) {
  const hasher = await createSHA1();
  const hasherStream = new WritableStream<Uint8Array>({
    start: () => {
      hasher.init();
      // you can set UI state here also
    },
    write: chunk => {
      hasher.update(chunk);
      // you can set UI state here also
    },
    close: () => {
      // you can set UI state here also
    },
  });
  await file.stream().pipeTo(hasherStream);
  return hasher.digest('hex');
}
I don't believe there is a way in JavaScript to access the contents of a file upload, so you therefore cannot look at the file contents to generate an MD5 sum.
You can, however, send the file to the server, which can then send an MD5 sum back or send the file contents back... but that's a lot of work and probably not worthwhile for your purposes.