I'm trying to upload large files (at least 500MB, preferably up to a few GB) using the WebSocket API. The problem is that I can't figure out how to write "send this slice of the file, release the resources used then repeat". I was hoping I could avoid using something like Flash/Silverlight for this.
Currently, I'm working with something along the lines of:
function FileSlicer(file) {
    // randomly picked 1MB slices,
    // I don't think this size is important for this experiment
    this.sliceSize = 1024 * 1024;
    this.slices = Math.ceil(file.size / this.sliceSize);
    this.currentSlice = 0;
    this.getNextSlice = function() {
        var start = this.currentSlice * this.sliceSize;
        var end = Math.min((this.currentSlice + 1) * this.sliceSize, file.size);
        ++this.currentSlice;
        return file.slice(start, end);
    }
}
Then, I would upload using:
function Uploader(url, file) {
    var fs = new FileSlicer(file);
    var socket = new WebSocket(url);

    socket.onopen = function() {
        for(var i = 0; i < fs.slices; ++i) {
            socket.send(fs.getNextSlice()); // see below
        }
    }
}
Basically this returns immediately, bufferedAmount stays at 0, and the loop keeps iterating and adding all the slices to the queue before anything is actually sent; there's no socket.afterSend that would let me queue the slices one at a time, which is where I'm stuck.
Use web workers to process large files instead of doing it on the main thread, and upload chunks of file data using file.slice().
This article shows how to handle large files in workers; just change the XHR send in the main thread to a WebSocket send.
// Messages from worker
function onmessage(blobOrFile) {
    ws.send(blobOrFile);
}
// Construct the file on the server side based on the blob or chunk information.
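To make the worker idea concrete, here is a hedged sketch of what such a worker file could contain. The worker file name, the 1MB slice size, and the "done" sentinel are all assumptions for illustration; the slice-boundary math is pulled out into a pure helper so it is easy to reason about:

```javascript
// Pure helper: compute [start, end) boundaries for each slice.
function sliceBounds(fileSize, sliceSize) {
    var bounds = [];
    for (var start = 0; start < fileSize; start += sliceSize) {
        bounds.push([start, Math.min(start + sliceSize, fileSize)]);
    }
    return bounds;
}

// Worker entry point (browser only, hence the guard): receive a File,
// post each Blob chunk back to the main thread for the WebSocket to send.
if (typeof self !== "undefined" && typeof window === "undefined") {
    self.onmessage = function (e) {
        var file = e.data;
        sliceBounds(file.size, 1024 * 1024).forEach(function (b) {
            self.postMessage(file.slice(b[0], b[1]));
        });
        self.postMessage("done"); // assumed end-of-file sentinel
    };
}

// Main thread (sketch):
//   var worker = new Worker("fileworker.js");
//   worker.onmessage = function (e) { ws.send(e.data); };
//   worker.postMessage(file);
```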
I believe the send() method is asynchronous, which is why it returns immediately. To make it queue, you'd need the server to send a message back to the client after each slice is uploaded; the client can then decide whether it needs to send the next slice or an "upload complete" message back to the server.
This sort of thing would probably be easier using XMLHttpRequest (Level 2); it has callback support built in and is also more widely supported than the WebSocket API.
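For illustration, a hedged sketch of that XHR-driven approach. It assumes a FileSlicer-like object (anything with slices and getNextSlice()); the /upload endpoint and the X-Chunk-Index header are made-up names, and the constructor is injectable purely so the flow can be exercised without a browser:

```javascript
// Send slices one at a time, letting XHR's load callback drive the queue.
function uploadWithXHR(url, fs, XHRCtor) {
    XHRCtor = XHRCtor || (typeof XMLHttpRequest !== "undefined" ? XMLHttpRequest : null);
    var sent = 0;
    function sendNext() {
        if (sent >= fs.slices) return; // all slices uploaded
        var xhr = new XHRCtor();
        xhr.open("POST", url);
        xhr.setRequestHeader("X-Chunk-Index", String(sent)); // assumed header
        xhr.onload = function () { sent++; sendNext(); };     // queue the next slice
        xhr.send(fs.getNextSlice());
    }
    sendNext();
    return function () { return sent; }; // progress probe
}
```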
In order to serialize this operation you need the server to send you a signal every time a slice is received & written (or an error occurs), this way you could send the next slice in response to the onmessage event, pretty much like this:
function Uploader(url, file) {
    var fs = new FileSlicer(file);
    var socket = new WebSocket(url);

    socket.onopen = function() {
        socket.send(fs.getNextSlice());
    }
    socket.onmessage = function(ms) {
        if (ms.data === "ok") {
            fs.slices--;
            if (fs.slices > 0) socket.send(fs.getNextSlice());
        } else {
            // handle the error code here
        }
    }
}
You could use https://github.com/binaryjs/binaryjs or https://github.com/liamks/Delivery.js if you can run node.js on the server.
EDIT: The web world, browsers, firewalls, and proxies have changed a lot since this answer was written. Nowadays, sending files over websockets can be done efficiently, especially on local area networks.
Websockets are very efficient for bidirectional communication, especially when you're interested in pushing (preferably small) pieces of information from the server. They act as bidirectional sockets (hence the name).
Websockets don't look like the right technology for this situation, especially given that using them adds incompatibilities with some proxies, browsers (IE), and even firewalls.
On the other hand, uploading a file is simply a matter of sending a POST request to a server with the file in the body. Browsers are very good at that, and for a big file the overhead is close to nothing. Don't use websockets for that task.
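A minimal sketch of that plain-HTTP alternative: a multipart POST with fetch. The /upload endpoint name and the fallback file name are assumptions, and the fetch implementation is injectable only so the call shape can be checked without a network:

```javascript
// Upload a File/Blob as a multipart POST; the browser streams the body itself,
// so no manual slicing is needed.
function uploadFile(url, file, fetchImpl) {
    const doFetch = fetchImpl || fetch;
    const form = new FormData();
    // A raw Blob has no .name, hence the fallback.
    form.append("file", file, file.name || "upload.bin");
    return doFetch(url, { method: "POST", body: form });
}

// Usage (sketch): uploadFile("/upload", input.files[0]).then(r => console.log(r.status));
```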
I think this socket.io project has a lot of potential:
https://github.com/sffc/socketio-file-upload
It supports chunked upload, progress tracking and seems fairly easy to use.
Related
I'm analysing an audio file in order to use the channelData to drive another part of my webapp (basically draw graphics based on the audio file). The callback function for the playback looks something like this:
successCallback(mediaStream) {
    var audioContext = new (window.AudioContext ||
                            window.webkitAudioContext)();
    source = audioContext.createMediaStreamSource(mediaStream);
    node = audioContext.createScriptProcessor(256, 1, 1);
    node.onaudioprocess = function(data) {
        var monoChannel = data.inputBuffer.getChannelData(0);
        // ..
    };
}
Somehow I thought if I run the above code with the same file it would yield the same results all the time. But that's not the case. The same audio file would trigger the onaudioprocess function sometimes 70, sometimes 72 times for instance, yielding different data all the time.
Is there a way to get consistent data of that sort in the browser?
EDIT: I'm getting the audio from a recording function on the same page. When the recording is finished the resulting file gets set as the src of an <audio> element. recorder is my MediaRecorder.
recorder.addEventListener("dataavailable", function(e) {
    fileurl = URL.createObjectURL(e.data);
    document.querySelector("#localaudio").src = fileurl;
    // ..
});
To answer your original question: getChannelData is deterministic, i.e. it will yield the same Float32Array from the same AudioBuffer for the same channel (unless you happen to transfer the backing ArrayBuffer to another thread, in which case it will return an empty Float32Array with a detached backing buffer from then on).
I presume the problem you are encountering here is a threading issue (my guess is that the MediaStream is already playing before you start processing the audio stream from it), but it's hard to tell exactly without debugging your complete app (there are at least 3 threads at work here: an audio processing thread for the MediaStream, an audio processing thread for the AudioContext you are using, and the main thread that runs your code).
Is there a way to get consistent data of that sort in the browser?
Yes.
Instead of processing through a real-time audio stream for real-time analysis, you could just take the recording result (e.data), read it as an ArrayBuffer, and then decode it as an AudioBuffer, something like:
recorder.addEventListener("dataavailable", function (e) {
    let reader = new FileReader();
    reader.onload = function (e) {
        audioContext.decodeAudioData(e.target.result).then(function (audioBuffer) {
            var monoChannel = audioBuffer.getChannelData(0);
            // monoChannel contains the entire first channel of your recording as a Float32Array
            // ...
        });
    };
    reader.readAsArrayBuffer(e.data);
});
Note: this code would become a lot simpler with async functions and Promises, but it should give a general idea of how to read the entire completed recording.
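As a hedged sketch of that simpler promise-based version: the recordingToChannelData name is made up, and an existing AudioContext is assumed. Blob.arrayBuffer() and the promise form of decodeAudioData are standard APIs:

```javascript
// Read a completed recording Blob and return its first channel as a Float32Array.
async function recordingToChannelData(audioContext, blob) {
    const arrayBuffer = await blob.arrayBuffer();
    const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
    return audioBuffer.getChannelData(0); // entire first channel
}

// Usage (sketch):
//   recorder.addEventListener("dataavailable",
//       (e) => recordingToChannelData(audioContext, e.data).then(draw));
```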
Also note: the ScriptProcessorNode is deprecated due to performance issues inherent in cross-thread data copying, especially involving the main JS thread. The preferred alternative is the much more advanced AudioWorklet, but that is a fairly new way of doing things on the web and requires a solid understanding of worklets in general.
I've been building a music app and today I finally got around to the point where I started trying to work playing the music into it.
As an outline of how my environment is set up, I am storing the music files as MP3s which I have uploaded into a MongoDB database using GridFS. I then use a socket.io server to download the chunks from the MongoDB database and send them as individual emits to the front end, where they are processed by the Web Audio API and scheduled to play.
When they play, they are all in the correct order, but there is a very tiny glitch or skip at the same spots every time (presumably between chunks) that I can't seem to get rid of. As far as I can tell, they are all scheduled right next to each other, so I can't find a reason why there should be any sort of gap or overlap between them. Here's the code:
Socket Route
socket.on('stream-audio', () => {
    db.client.db("dev").collection('music.files').findOne({"metadata.songId": "3"}).then((result) => {
        const bucket = new GridFSBucket(db.client.db("dev"), {
            bucketName: "music"
        });
        bucket.openDownloadStream(result._id).on('data', (chunk) => {
            socket.emit('audio-chunk', chunk);
        });
    });
});
Front end
//These variables are declared as object variables, hence all of the "this" keywords
context: new (window.AudioContext || window.webkitAudioContext)(),
freeTime: null,
numChunks: 0,
chunkTracker: [],
...
this.socket.on('audio-chunk', (chunk) => {
    //Keeping track of chunk decoding status so that they don't get scheduled out of order
    const chunkId = this.numChunks;
    this.chunkTracker.push({
        id: chunkId,
        complete: false,
    });
    this.numChunks += 1;
    //Callback to the decodeAudioData function
    const decodeCallback = (buffer) => {
        var shouldExecute = false;
        const trackIndex = this.chunkTracker.map((e) => e.id).indexOf(chunkId);
        //Checking if either it's the first chunk or the previous chunk has completed
        if (trackIndex !== 0) {
            const prevChunk = this.chunkTracker.filter((e) => e.id === (chunkId - 1));
            if (prevChunk[0].complete) {
                shouldExecute = true;
            }
        } else {
            shouldExecute = true;
        }
        //THIS IS THE ACTUAL WEB AUDIO API STUFF
        if (shouldExecute) {
            if (this.freeTime === null) {
                this.freeTime = this.context.currentTime;
            }
            const source = this.context.createBufferSource();
            source.buffer = buffer;
            source.connect(this.context.destination);
            if (this.context.currentTime >= this.freeTime) {
                source.start();
                this.freeTime = this.context.currentTime + buffer.duration;
            } else {
                source.start(this.freeTime);
                this.freeTime += buffer.duration;
            }
            //Update the tracker of the chunks that this one is complete
            this.chunkTracker[trackIndex] = {id: chunkId, complete: true};
        } else {
            //If the previous chunk hasn't processed yet, check again in 50ms
            setTimeout((passBuffer) => {
                decodeCallback(passBuffer);
            }, 50, buffer);
        }
    };
    decodeCallback.bind(this);
    this.context.decodeAudioData(chunk, decodeCallback);
});
Any help would be appreciated, thanks!
As an outline of how my environment is set up, I am storing the music files as MP3s which I have uploaded into a MongoDB database using GridFS.
You can do this if you want, but these days we have tools like Minio, which can make this easier using more common APIs.
I then use a socket.io server to download the chunks from the MongoDB database and send them as individual emits to the front end
Don't go this route. There's no reason for the overhead of web sockets, or Socket.IO. A normal HTTP request would be fine.
where they are processed by the Web Audio API and scheduled to play.
You can't stream this way. The Web Audio API doesn't support useful streaming, unless you happened to have raw PCM chunks, which you don't.
As far as I can tell, they are all scheduled right up next to each other so I can't find a reason why there should be any sort of gap or overlap between them.
Lossy codecs aren't going to give you sample-accurate output. Especially with MP3, if you give it some arbitrary number of samples, you're going to end up with at least one full MP3 frame (~576 samples) output. The reality is that you need data ahead of the first audio frame for it to work properly. If you want to decode a stream, you need a stream to start with. You can't independently decode MP3 this way.
Fortunately, the solution also simplifies what you're doing. Simply return an HTTP stream from your server, and use an HTML audio element <audio> or new Audio(url). The browser will handle all the buffering. Just make sure your server handles range requests, and you're good to go.
I have implemented a web based client-server system. The goal is to request an image file from the server, through the socket.
Here is my code at the client end. [embedded JavaScript code]
<a id="downloadLnk" download="new.jpeg" style="color:red">Download as image</a>
var socket = io("ipaddress");
socket.on("image", function(info) {
    if (info.image) {
        var end1 = new Date().getTime();
        document.getElementById("demo1").innerHTML = end1;
        var img = new Image();
        img.src = 'data:image/jpeg;base64,' + info.buffer;
    }
    function download() {
        this.href = img.src;
    };
    downloadLnk.addEventListener('click', download, false);
});
And this is the code at server side: [node.js server, express module, fs module]
io.on('connection', function(socket) {
    var start1 = new Date().getTime();
    console.log(start1);
    fs.readFile(__dirname + '/aorta-high512.jpg', function(err, buf) {
        socket.emit('image', { image: true, buffer: buf.toString('base64') });
    });
});
I am transferring a 512x512 resolution image of size 88KB, and it takes approximately one second. Similarly, a 259KB file takes around 1.2s and a 2MB file takes 2.5s. I do not understand why it is taking so much time.
I checked the available bandwidth and internet speed of my network on speedtest.net. The download speed is 95.97Mbps and the upload speed is 23.30Mbps.
Could you please let me know why the data transfer is so slow? Is there any other method to transfer data in a faster way? I know 96Mbps is the available bandwidth, but as a test I downloaded a 100MB PDF file from the internet and it took approximately 12-14s. Given that, I would at least expect transfer rates of 2-3 Mbps.
Socket.IO supports sending/receiving binary data, so taking advantage of that will allow you to avoid expensive encoding of data.
Secondly, when generating/using a data URL in browsers you have to be careful about the URL length. Many browsers impose various limits on the maximum size of such data URLs. One possible workaround to this (not including serving the image directly via HTTP GET) could include having the server split the image into a smaller set of images, which you then use with stacked img tags to give the appearance of a single image.
I am uploading a file to FTP via chrome.sockets, but the socket buffer size is limited, so I need to loop through the blob and send out smaller chunks of data. I have tried several methods with closures and callbacks, but the only way that works for me is a do/while loop, which is of course blocking. Part of the problem is the multiple variables that need to be kept in the closure. Can you please suggest a better way of looping through the blob?
do {
    chunk = blob.slice(start, end);
    start = end;
    end = end + 8192;
    chrome.socket.write(this.info.socketId, Socket.string2buffer(chunk), function(writeInfo) {});
} while (chunk.length > 0);
Complete code of the extension (single purpose ftp manager) https://github.com/vanous/minime-content-manager/tree/master/chromium-ext-broadcast
I believe something along the lines of the following should work:
var self = this;
var writeChunk = function(start, end) {
    var chunk = blob.slice(start, end);
    chrome.socket.write(self.info.socketId, Socket.string2buffer(chunk), function(writeInfo) {
        if (chunk.length > 0) writeChunk(end, end + 8192);
    });
};
writeChunk(0, 8192);
The JavaScript process generates a lot of data (200-300MB). I would like to save this data for further analysis, but the best I have found so far is saving using this example http://jsfiddle.net/c2U2T/ which is not an option for me because it looks like it requires all the data to be available before the download starts. But what I need is something like:
var saver = new Saver();
saver.save(); // The "Save As..." dialog appears
saver.onaccepted = function () { // user accepted saving
    for (var i = 0; i < 1000000; i++) {
        saver.write(Math.random());
    }
};
Of course, instead of Math.random() there will be some meaningful construction.
#dader - I would build upon dader's example.
Use the HTML5 FileSystem API, but instead of writing each and every line to the file (more IO than it is worth), batch some of the lines in memory in a JavaScript object/array/string and only write them to the file when they reach a certain threshold. You are thus appending to a local file as the process chugs along (which makes it easy to pause/restart/stop, etc.).
Of note is the following example of how you can spawn the dialog to request the amount of storage you need (it sounds large). Tested in Chrome:
navigator.persistentStorage.queryUsageAndQuota(
    function (usage, quota) {
        var availableSpace = quota - usage;
        var requestingQuota = args.size + usage;
        if (availableSpace >= args.size) {
            window.requestFileSystem(PERSISTENT, availableSpace, persistentStorageGranted, persistentStorageDenied);
        } else {
            navigator.persistentStorage.requestQuota(
                requestingQuota, function (grantedQuota) {
                    window.requestFileSystem(PERSISTENT, grantedQuota - usage, persistentStorageGranted, persistentStorageDenied);
                }, errorCb
            );
        }
    }, errorCb);
When you are done, you can use JavaScript to open a new window with the URL of the blob object you saved, which you can retrieve via fileEntry.toURL().
OR, when it is done crunching, you can just display that URL in an HTML link so users can right-click it and do whatever Save Link As they want.
But this is something new and cool that you can do entirely in the browser, without needing to involve a server in any way at all. Side note: 200-300MB of data generated by a JavaScript process sounds absolutely huge; that would make me question whether you are storing the "right" data...
What you are actually trying to do is a kind of streaming, and the File API is not suited to the task. Instead, I can suggest two options:
The first uses the XHR facility, i.e. Ajax: split your data into several chunks which are sent to the server sequentially, each chunk in its own request along with an id (to identify the stream) and a position index (to identify the chunk's position). I wouldn't recommend that, since it adds work to break up and reassemble the data, and since there's a better solution.
The second way is to use the WebSocket API. It allows you to send data to the server sequentially, as it is generated, following a usual stream API. I think this is exactly what you need.
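To illustrate, a hedged sketch of that WebSocket route: the data is generated in batches and each batch is sent as soon as it is produced, so the full 200-300MB never sits in memory at once. The socket is assumed to be an open WebSocket; the batch size, the Math.random() stand-in, and the optional completion callback are illustrative choices:

```javascript
// Generate values in batches and stream each batch over the socket.
// setTimeout(..., 0) yields between batches so the page stays responsive.
function streamGenerated(socket, totalValues, batchSize, onDone) {
    let sent = 0;
    (function sendBatch() {
        const batch = [];
        for (let i = 0; i < batchSize && sent < totalValues; i++, sent++) {
            batch.push(Math.random()); // stand-in for the real generated values
        }
        if (batch.length) socket.send(JSON.stringify(batch));
        if (sent < totalValues) setTimeout(sendBatch, 0);
        else if (onDone) onDone(sent);
    })();
}

// Usage (sketch):
//   var ws = new WebSocket("ws://example.com/stream");
//   ws.onopen = () => streamGenerated(ws, 1000000, 10000);
```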
This page may be a good place to start at : http://binaryjs.com/
That's all folks !
EDIT, considering your comment:
I'm not sure I perfectly get your point, but what about HTML5's FileSystem API?
There are a couple of examples here: http://www.html5rocks.com/en/tutorials/file/filesystem/ among which is this sample that lets you append data to an existing file. You can also create a new file, etc.:
function onInitFs(fs) {
    fs.root.getFile('log.txt', {create: false}, function(fileEntry) {
        // Create a FileWriter object for our FileEntry (log.txt).
        fileEntry.createWriter(function(fileWriter) {
            fileWriter.seek(fileWriter.length); // Start write position at EOF.
            // Create a new Blob and write it to log.txt.
            var blob = new Blob(['Hello World'], {type: 'text/plain'});
            fileWriter.write(blob);
        }, errorHandler);
    }, errorHandler);
}
EDIT 2 :
What you're trying to do is not possible using JavaScript, as said on SO here. The author nonetheless suggests using a Java applet to achieve the needed behaviour.
To put it in a nutshell, the HTML5 FileSystem API only provides a sandboxed filesystem, i.e. one located in some hidden directory of the browser. So if you want to access the true filesystem, using Java would be just fine for your use case. I guess there is an interface between Java and JavaScript here.
But if you want your data to be available only from the browser (constrained by the same-origin policy), use the FileSystem API.