Splitting large file load into chunks, stitching to AudioBuffer?

Splitting large file load into chunks, stitching to AudioBuffer? - javascript

In my app, I have an hour-long audio file that's entirely sound effects. Unfortunately I do need them all - they're species-specific sounds, so I can't cut any of them out. They were separate before, but I audiosprite'd them all into one large file.
The export file is about 20MB compressed, but it's still a large download for users with a slow connection. I need this file to be in an AudioBuffer, since I'm seeking to sections of an audioSprite and using loopStart/loopEnd to only loop that section. I more or less need the whole thing downloaded before playback can start, because the requested species are randomly picked when the app starts. They could be looking for sounds at the start of the file, or at the very end.
What I'm wondering is, if I were to split this file in fourths, could I load them in in parallel, and stitch them into the full AudioBuffer once loading finishes? I'm guessing I'd be merging multiple arrays, but only performing decodeAudioData() once? Requesting ~100 separate files (too many) was what brought me to audiosprites in the first place, but I'm wondering if there's a way to leverage some amount of async loading to lower the time it takes. I thought about having four <audio> elements and using createMediaElementSource() to load them, but my understanding is that I can't (?) turn a MediaElementSource into an AudioBuffer.

Consider playing the files immediately in chucks instead of waiting for the entire file to download. You could do this with the Streams API and:
Queuing chunks with the MediaSource Extensions (MSE) API and switching between buffers.
Playing back decoded PCM audio with the Web Audio API and AudioBuffer.
See examples for low-latency audio playback of file chunks as they are received.

I think in principle you can. Just download each chunk as an ArrayBuffer, concatenate all of the chunks together and send that to decodeAudioData.
But if you're on a slow link, I'm not sure how downloading in parallel will help.

Edit: this code is functional, but on occasion produces really nasty audio glitches, so I don't recommend using it without further testing. I'm leaving it here in case it helps someone else figure out working with Uint8Arrays.
So here's a basic version of it, basically what Raymond described. I haven't tested this with a split version of the large file yet, so I don't know if it improves the load speed at all, but it works. The JS is below, but if you want to test it yourself, here's the pen.
// mp3 link is from: https://codepen.io/SitePoint/pen/JRaLVR
(function () {
'use strict';
const context = new AudioContext();
let bufferList = [];
// change the urlList for your needs
const URL = 'https://s3-us-west-2.amazonaws.com/s.cdpn.io/123941/Yodel_Sound_Effect.mp3';
const urlList = [URL, URL, URL, URL, URL, URL];
const loadButton = document.querySelector('.loadFile');
const playButton = document.querySelector('.playFile');
loadButton.onclick = () => loadAllFiles(urlList, loadProgress);
function play(audioBuffer) {
const source = context.createBufferSource();
source.buffer = audioBuffer;
source.connect(context.destination);
source.start();
}
// concatenates all the buffers into one collected ArrayBuffer
function concatBufferList(buflist, len) {
let tmp = new Uint8Array(len);
let pos = 0;
for (let i = 0; i < buflist.length; i++) {
tmp.set(new Uint8Array(buflist[i]), pos);
pos += buflist[i].byteLength;
}
return tmp.buffer;
}
function loadAllFiles(list, onProgress) {
let fileCount = 0;
let fileSize = 0;
for (let i = 0; i < list.length; i++) {
loadFileXML(list[i], loadProgress, i).then(e => {
bufferList[i] = e.buf;
fileSize += e.size;
fileCount++;
if (fileCount == bufferList.length) {
let b = concatBufferList(bufferList, fileSize);
context.decodeAudioData(b).then(audioBuffer => {
playButton.disabled = false;
playButton.onclick = () => play(audioBuffer);
}).catch(error => console.log(error));
}
});
}
}
// adapted from petervdn's audiobuffer-load on npm
function loadFileXML(url, onProgress, index) {
return new Promise((resolve, reject) => {
const request = new XMLHttpRequest();
request.open('GET', url, true);
request.responseType = 'arraybuffer';
if (onProgress) {
request.onprogress = event => {
onProgress(event.loaded / event.total);
};
}
request.onload = () => {
if (request.status === 200) {
const fileSize = request.response.byteLength;
resolve({
buf: request.response,
size: fileSize
});
}
else {
reject(`Error loading '${url}' (${request.status})`);
}
};
request.onerror = error => {
reject(error);
};
request.send();
});
}
function loadProgress(e) {
console.log("Progress: "+e);
}
}());

Related

How to write BLE write characteristic over 512B

I have a client attempting to send images to a server over BLE.
Client Code
//BoilerPlate to setup connection and whatnot
sendFile.onclick = async () => {
var fileList = document.getElementById("myFile").files;
var fileReader = new FileReader();
if (fileReader && fileList && fileList.length) {
fileReader.readAsArrayBuffer(fileList[0]);
fileReader.onload = function () {
var imageData = fileReader.result;
//Server doesn't get data if I don't do this chunking
imageData = imageData.slice(0,512);
const base64String = _arrayBufferToBase64(imageData);
document.getElementById("ItemPreview").src = "data:image/jpeg;base64," + base64String;
sendCharacteristic.writeValue(imageData);
};
}
};
Server Code
MyCharacteristic.prototype.onWriteRequest = function(data, offset, withoutResponse, callback) {
//It seems this will not print out if Server sends over 512B.
console.log(this._value);
};
My goal is to send small images (Just ~6kb)...These are still so small that'd I'd still prefer to use BLE over a BT Serial Connection. Is the only way this is possible is to perform some chunking and then streaming the chunks over?
Current 'Chunking' Code
const MAX_LENGTH = 512;
for (let i=0; i<bytes.byteLength; i+= MAX_LENGTH) {
const end = (i+MAX_LENGTH > bytes.byteLength) ? bytes.byteLength : i+MAX_LENGTH;
const chunk = bytes.slice(i, end);
sendCharacteristic.writeValue(chunk);
await sleep(1000);
}
The above code works, however it sleeps in between sends. I'd rather not do this because there's no guarantee a previous packet will be finished sending and I could sleep longer than needed.
I'm also perplexed on how the server code would then know the client has finished sending all bytes and can then assemble them. Is there some kind of pattern to achieving this?

BLE characteristic values can only be 512 bytes, so yes the common way to send larger data is to split it into multiple chunks. Use "Write Without Response" for best performance (MTU-3 must be at least as big as your chunk).

How to make upload faster in angular 7?

I am trying to speed up the upload. So I tried with different solution, with both BackEnd and Front-End. Those are,
1) I uploaded the tar file (already compressed one)
2) I tried chunk upload (sequentially), if the response is success next API will get triggered. In the back-end side, in the same file the content will get appended.
3) I tried chunk upload but in parallel, at a single time I make the 50 request to upload the chunk content (I know, at a time browser handle only 6 requests). From the backend side, we are storing all the chunk file separately, after receiving the final request, appending all those chunks in to the single file.
But observed is, I am not seeing the much difference with all these cases.
Following is my service file
export class largeGeneUpload {
chromosomeFile: any;
options: any;
chunkSize = 1200000;
activeConnections = 0;
threadsQuantity = 50;
totalChunkCount = 0;
chunksPosition = 0;
failedChunks = [];
sendNext() {
if (this.activeConnections >= this.threadsQuantity) {
return;
}
if (this.chunksPosition === this.totalChunkCount) {
console.log('all chunks are done');
return;
}
const i = this.chunksPosition;
const url = 'gene/human';
const chunkIndex = i;
const start = chunkIndex * this.chunkSize;
const end = Math.min(start + this.chunkSize, this.chromosomeFile.size);
const currentchunkSize = this.chunkSize * i;
const chunkData = this.chromosomeFile.webkitSlice ? this.chromosomeFile.webkitSlice(start, end) : this.chromosomeFile.slice(start, end);
const fd = new FormData();
const binar = new File([chunkData], this.chromosomeFile.upload.filename);
console.log(binar);
fd.append('file', binar);
fd.append('dzuuid', this.chromosomeFile.upload.uuid);
fd.append('dzchunkindex', chunkIndex.toString());
fd.append('dztotalfilesize', this.chromosomeFile.upload.total);
fd.append('dzchunksize', this.chunkSize.toString());
fd.append('dztotalchunkcount', this.chromosomeFile.upload.totalChunkCount);
fd.append('isCancel', 'false');
fd.append('dzchunkbyteoffset', currentchunkSize.toString());
this.chunksPosition += 1;
this.activeConnections += 1;
this.apiDataService.uploadChunk(url, fd)
.then(() => {
this.activeConnections -= 1;
this.sendNext();
})
.catch((error) => {
this.activeConnections -= 1;
console.log('error here');
// chunksQueue.push(chunkId);
});
this.sendNext();
}
uploadChunk(resrc: string, item) {
return new Promise((resolve, reject) => {
this._http.post(this.baseApiUrl + resrc, item, {
headers: this.headers,
withCredentials: true
}).subscribe(r => {
console.log(r);
resolve();
}, err => {
console.log('err', err);
reject();
});
});
}
But the thing is, If I upload the same file in google drive it is not taking much time.
Let's consider, I have 700 MB file, to upload it in google drive it took 3 mins. But the same 700 MB file to upload with my Angular code with our back-end server it took 7 mins to finish it.
How do I improve the performance of file upload.?

forgive me ,
it seems silly answer but this depend on your hosting infrastructure

A lot of variables can cause this, but by your story it has nothing to do with your front-end code. Making it into chunks is not going to help, because browsers have their own optimized algorithm to upload files. The most likely culprit is your backend server or the connection from your client to the server.
You say that google drive is fast, but you should also know that google has a very widespread global infrastructure with top of the line cloud servers. If you are using, for example, a 2 euro per month fixed place hosting provider, you cannot expect the same processing and network power as google.

Read blob contents into an existing SharedArrayBuffer

I'm trying to find the most efficient way to read the contents of a Blob into an existing SharedArrayBuffer originating is a worker waiting for the buffer to be poplated. In my case, I can guarantee that the SharedArrayBuffer is at least long enough to hold the entire contents of the Blob. The best approach I've come up with is:
// Assume 'blob' is the blob we are reading
// and 'buffer' is the SharedArrayBuffer.
const fr = new FileReader();
fr.addEventListener('load', e =>
new Uint8Array(buffer).set(new Uint8Array(e.target.result)));
fr.readAsArrayBuffer(blob);
This seems inefficient, especially if the blob being read is relatively large.

Blob is not a Transferable object. Also, there is no .readAsSharedArrayBuffer method available on FileReader.
However, if you only need to read a Blob from multiple workers simultaneously, I believe you can achieve this with URL.createObjectURL() and fetch, although I have not tested this with multiple workers:
// === main thread ===
let objectUrl = URL.createObjectURL(blob);
worker1.postMessage(objectUrl);
worker2.postMessage(objectUrl);
// === worker 1 & 2 ===
self.onmessage = msg => {
fetch(msg.data)
.then(res => res.blob())
.then(blob => {
doSomethingWithBlob(blob);
});
};
Otherwise, as far as I can tell, there really isn't an efficient way to load data from a file into a SharedArrayBuffer.
I also provide a method here for transferring chunks of a blob from main thread to a single worker. For my use case, the files are too big to read the entire contents into a single array buffer anyway (shared or not), so I use .slice to deal in chunks. Something like this will let you deliver tons of data to a single worker in a stream-like fashion via multiple .postMessage calls using the Transferable ArrayBuffer:
// === main thread ===
let eof = false;
let nextBuffer = null;
let workerReady = true;
let read = 0;
function nextChunk() {
let end = read + chunkSize;
if(end >= file.length) {
end = file.length;
eof = true;
}
let slice = file.slice(read, end);
read = end;
fr.readAsArrayBuffer(slice);
}
fr.onload = event => {
let ab = event.target.result;
if(workerReady) {
worker.postMessage(ab, [ab]);
workerReady = false;
if(!eof) nextChunk();
}
else {
nextBuffer = ab;
}
};
// wait until the worker finished the last chunk
// ... otherwise we'll flood main thread's heap
worker.onmessage = msg => {
if(nextBuffer) {
worker.postMessage(nextBuffer, [nextBuffer]);
nextBuffer = null;
}
else if(!eof && msg.ready) {
nextChunk();
}
};
nextChunk();
// === worker ===
self.onmessage = msg => {
let ab = msg.data;
// ... do stuff with data ...
self.postMessage({ready:true});
};
This will read a chunk of data into an ArrayBuffer in the main thread, transfer that to the worker, and then read the next chunk into memory while waiting for worker to process the previous chunk. This basically ensures that both threads stay busy the whole time.

How to load audio file into AudioContext like stream?

For example i want to load 100MB mp3 file into AudioContext, and i can do that with using XMLHttpRequest.
But with this solution i need to load all file and only then i can play it, because onprogress method don't return data.
xhr.onprogress = function(e) {
console.log(this.response); //return null
};
Also i tried to do that with fetch method, but this way have same problem.
fetch(url).then((data) => {
console.log(data); //return some ReadableStream in body,
//but i can't find way to use that
});
There is any way to load audio file like stream in client JavaScript?

You need to handle the ajax response in a streaming way.
there is no standard way to do this until fetch & ReadableStream have properly been implemented across all the browsers
I'll show you the most correct way according to the new standard how you should deal with streaming a ajax response
// only works in Blink right now
fetch(url).then(res => {
let reader = res.body.getReader()
let pump = () => {
reader.read().then(({value, done}) => {
value // chunk of data (push chunk to audio context)
if(!done) pump()
})
}
pump()
})
Firefox is working on implementing streams but until then you need to use xhr and moz-chunked-arraybuffer
IE/edge has ms-stream that you can use but it's more complicated

How can I send value.buffer to AudioContext?
This only plays the first chunk and it doesn't work correctly.
const context = new AudioContext()
const source = context.createBufferSource()
source.connect(context.destination)
const reader = response.body.getReader()
while (true) {
await reader.read()
const { done, value } = await reader.read()
if (done) {
break
}
const buffer = await context.decodeAudioData(value.buffer)
source.buffer = buffer
source.start(startTime)
}

How to calculate md5 hash of a file using javascript

Is there a way to calculate the MD5 hash of a file before the upload to the server using Javascript?

While there are JS implementations of the MD5 algorithm, older browsers are generally unable to read files from the local filesystem.
I wrote that in 2009. So what about new browsers?
With a browser that supports the FileAPI, you can read the contents of a file - the user has to have selected it, either with an <input> element or drag-and-drop. As of Jan 2013, here's how the major browsers stack up:
FF 3.6 supports FileReader, FF4 supports even more file based functionality
Chrome has supported the FileAPI since version 7.0.517.41
Internet Explorer 10 has partial FileAPI support
Opera 11.10 has partial support for FileAPI
Safari - I couldn't find a good official source for this, but this site suggests partial support from 5.1, full support for 6.0. Another article reports some inconsistencies with the older Safari versions
How?
See the answer below by Benny Neugebauer which uses the MD5 function of CryptoJS

I've made a library that implements incremental md5 in order to hash large files efficiently.
Basically you read a file in chunks (to keep memory low) and hash it incrementally.
You got basic usage and examples in the readme.
Be aware that you need HTML5 FileAPI, so be sure to check for it.
There is a full example in the test folder.
https://github.com/satazor/SparkMD5

it is pretty easy to calculate the MD5 hash using the MD5 function of CryptoJS and the HTML5 FileReader API. The following code snippet shows how you can read the binary data and calculate the MD5 hash from an image that has been dragged into your Browser:
var holder = document.getElementById('holder');
holder.ondragover = function() {
return false;
};
holder.ondragend = function() {
return false;
};
holder.ondrop = function(event) {
event.preventDefault();
var file = event.dataTransfer.files[0];
var reader = new FileReader();
reader.onload = function(event) {
var binary = event.target.result;
var md5 = CryptoJS.MD5(binary).toString();
console.log(md5);
};
reader.readAsBinaryString(file);
};
I recommend to add some CSS to see the Drag & Drop area:
#holder {
border: 10px dashed #ccc;
width: 300px;
height: 300px;
}
#holder.hover {
border: 10px dashed #333;
}
More about the Drag & Drop functionality can be found here: File API & FileReader
I tested the sample in Google Chrome Version 32.

The following snippet shows an example, which can archive a throughput of 400 MB/s while reading and hashing the file.
It is using a library called hash-wasm, which is based on WebAssembly and calculates the hash faster than js-only libraries. As of 2020, all modern browsers support WebAssembly.
const chunkSize = 64 * 1024 * 1024;
const fileReader = new FileReader();
let hasher = null;
function hashChunk(chunk) {
return new Promise((resolve, reject) => {
fileReader.onload = async(e) => {
const view = new Uint8Array(e.target.result);
hasher.update(view);
resolve();
};
fileReader.readAsArrayBuffer(chunk);
});
}
const readFile = async(file) => {
if (hasher) {
hasher.init();
} else {
hasher = await hashwasm.createMD5();
}
const chunkNumber = Math.floor(file.size / chunkSize);
for (let i = 0; i <= chunkNumber; i++) {
const chunk = file.slice(
chunkSize * i,
Math.min(chunkSize * (i + 1), file.size)
);
await hashChunk(chunk);
}
const hash = hasher.digest();
return Promise.resolve(hash);
};
const fileSelector = document.getElementById("file-input");
const resultElement = document.getElementById("result");
fileSelector.addEventListener("change", async(event) => {
const file = event.target.files[0];
resultElement.innerHTML = "Loading...";
const start = Date.now();
const hash = await readFile(file);
const end = Date.now();
const duration = end - start;
const fileSizeMB = file.size / 1024 / 1024;
const throughput = fileSizeMB / (duration / 1000);
resultElement.innerHTML = `
Hash: ${hash}<br>
Duration: ${duration} ms<br>
Throughput: ${throughput.toFixed(2)} MB/s
`;
});
<script src="https://cdn.jsdelivr.net/npm/hash-wasm"></script>
<!-- defines the global `hashwasm` variable -->
<input type="file" id="file-input">
<div id="result"></div>

HTML5 + spark-md5 and Q
Assuming your'e using a modern browser (that supports HTML5 File API), here's how you calculate the MD5 Hash of a large file (it will calculate the hash on variable chunks)
function calculateMD5Hash(file, bufferSize) {
var def = Q.defer();
var fileReader = new FileReader();
var fileSlicer = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
var hashAlgorithm = new SparkMD5();
var totalParts = Math.ceil(file.size / bufferSize);
var currentPart = 0;
var startTime = new Date().getTime();
fileReader.onload = function(e) {
currentPart += 1;
def.notify({
currentPart: currentPart,
totalParts: totalParts
});
var buffer = e.target.result;
hashAlgorithm.appendBinary(buffer);
if (currentPart < totalParts) {
processNextPart();
return;
}
def.resolve({
hashResult: hashAlgorithm.end(),
duration: new Date().getTime() - startTime
});
};
fileReader.onerror = function(e) {
def.reject(e);
};
function processNextPart() {
var start = currentPart * bufferSize;
var end = Math.min(start + bufferSize, file.size);
fileReader.readAsBinaryString(fileSlicer.call(file, start, end));
}
processNextPart();
return def.promise;
}
function calculate() {
var input = document.getElementById('file');
if (!input.files.length) {
return;
}
var file = input.files[0];
var bufferSize = Math.pow(1024, 2) * 10; // 10MB
calculateMD5Hash(file, bufferSize).then(
function(result) {
// Success
console.log(result);
},
function(err) {
// There was an error,
},
function(progress) {
// We get notified of the progress as it is executed
console.log(progress.currentPart, 'of', progress.totalParts, 'Total bytes:', progress.currentPart * bufferSize, 'of', progress.totalParts * bufferSize);
});
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/q.js/1.4.1/q.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/spark-md5/2.0.2/spark-md5.min.js"></script>
<div>
<input type="file" id="file"/>
<input type="button" onclick="calculate();" value="Calculate" class="btn primary" />
</div>

You need to to use FileAPI. It is available in the latest FF & Chrome, but not IE9.
Grab any md5 JS implementation suggested above. I've tried this and abandoned it because JS was too slow (minutes on large image files). Might revisit it if someone rewrites MD5 using typed arrays.
Code would look something like this:
HTML:
<input type="file" id="file-dialog" multiple="true" accept="image/*">
JS (w JQuery)
$("#file-dialog").change(function() {
handleFiles(this.files);
});
function handleFiles(files) {
for (var i=0; i<files.length; i++) {
var reader = new FileReader();
reader.onload = function() {
var md5 = binl_md5(reader.result, reader.result.length);
console.log("MD5 is " + md5);
};
reader.onerror = function() {
console.error("Could not read the file");
};
reader.readAsBinaryString(files.item(i));
}
}

Apart from the impossibility to get
file system access in JS, I would not
put any trust at all in a
client-generated checksum. So
generating the checksum on the server
is mandatory in any case. – Tomalak
Apr 20 '09 at 14:05
Which is useless in most cases. You want the MD5 computed at client side, so that you can compare it with the code recomputed at server side and conclude the upload went wrong if they differ. I have needed to do that in applications working with large files of scientific data, where receiving uncorrupted files were key. My cases was simple, cause users had the MD5 already computed from their data analysis tools, so I just needed to ask it to them with a text field.

If sha256 is also fine:
async sha256(file: File) {
// get byte array of file
let buffer = await file.arrayBuffer();
// hash the message
const hashBuffer = await crypto.subtle.digest('SHA-256', buffer);
// convert ArrayBuffer to Array
const hashArray = Array.from(new Uint8Array(hashBuffer));
// convert bytes to hex string
const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
return hashHex;
}

To get the hash of files, there are a lot of options. Normally the problem is that it's really slow to get the hash of big files.
I created a little library that get the hash of files, with the 64kb of the start of the file and the 64kb of the end of it.
Live example: http://marcu87.github.com/hashme/ and library: https://github.com/marcu87/hashme

hope you have found a good solution by now. If not, the solution below is an ES6 promise implementation based on js-spark-md5
import SparkMD5 from 'spark-md5';
// Read in chunks of 2MB
const CHUCK_SIZE = 2097152;
/**
* Incrementally calculate checksum of a given file based on MD5 algorithm
*/
export const checksum = (file) =>
new Promise((resolve, reject) => {
let currentChunk = 0;
const chunks = Math.ceil(file.size / CHUCK_SIZE);
const blobSlice =
File.prototype.slice ||
File.prototype.mozSlice ||
File.prototype.webkitSlice;
const spark = new SparkMD5.ArrayBuffer();
const fileReader = new FileReader();
const loadNext = () => {
const start = currentChunk * CHUCK_SIZE;
const end =
start + CHUCK_SIZE >= file.size ? file.size : start + CHUCK_SIZE;
// Selectively read the file and only store part of it in memory.
// This allows client-side applications to process huge files without the need for huge memory
fileReader.readAsArrayBuffer(blobSlice.call(file, start, end));
};
fileReader.onload = e => {
spark.append(e.target.result);
currentChunk++;
if (currentChunk < chunks) loadNext();
else resolve(spark.end());
};
fileReader.onerror = () => {
return reject('Calculating file checksum failed');
};
loadNext();
});

There is a couple scripts out there on the internet to create an MD5 Hash.
The one from webtoolkit is good, http://www.webtoolkit.info/javascript-md5.html
Although, I don't believe it will have access to the local filesystem as that access is limited.

This is another hash-wasm example, but using the streams API, instead of having to set FileReader:
async function calculateSHA1(file: File) {
const hasher = await createSHA1()
const hasherStream = new WritableStream<Uint8Array>({
start: () => {
hasher.init()
// you can set UI state here also
},
write: chunk => {
hasher.update(chunk)
// you can set UI state here also
},
close: () => {
// you can set UI state here also
},
})
await file.stream().pipeTo(hasherStream)
return hasher.digest('hex')
}

I don't believe there is a way in javascript to access the contents of a file upload. So you therefore cannot look at the file contents to generate an MD5 sum.
You can however send the file to the server, which can then send an MD5 sum back or send the file contents back .. but that's a lot of work and probably not worthwhile for your purposes.

We Keep Coding

JavaScript is the programming language of the Web.

Splitting large file load into chunks, stitching to AudioBuffer? - javascript

I think in principle you can. Just download each chunk as an ArrayBuffer, concatenate all of the chunks together and send that to decodeAudioData. But if you're on a slow link, I'm not sure how downloading in parallel will help.

Related

How to write BLE write characteristic over 512B

How to make upload faster in angular 7?

Read blob contents into an existing SharedArrayBuffer

How to load audio file into AudioContext like stream?

How to calculate md5 hash of a file using javascript

Categories

Resources