I am trying to save the output from the Web Audio API for future use. So far, I think getting the PCM data and saving it to a file will meet my needs. I am wondering whether the Web Audio API or mozAudio already supports saving the output stream, and if not, how I can get the PCM data from the output stream.
There isn't a good sense of the requirements here beyond wanting to capture web audio in some programmatic way. The presumption is that you want to do this from JavaScript executing in the page currently being browsed, but even that isn't entirely clear.
As Incognito points out, you can do this in Chrome by using a callback hanging off decodeAudioData(). But, this may be overly complicated for your uses if you're simply trying to capture, for example, the output of a single web stream and decode it into PCM for use in your sound tools of choice.
Another strategy you might consider, for cases when the media URL is obscured or otherwise difficult to decode using your current tools, is capture from your underlying sound card. This gives you the decoding for free, at the potential expense of a lower sampling rate if (and only if) your sound card isn't able to sample the stream effectively.
You're already planning to digitize the signal anyway, given your desire for PCM encoding. Obviously, only do this if you have the legal right to use the files being sampled.
Regardless of the route you choose, best of luck to you. Be it programmatic stream dissection or spot sampling, you should now have more than enough information to proceed.
Edit: Based on additional information from the OP, this seems like the needed solution (merged from here and here, using NodeJS' implementation of fs):
var fs = require('fs'); // Node's filesystem module

function saveAudio(data, saveLocation) {
    var context = new (window.AudioContext || window.webkitAudioContext)();

    if (context.decodeAudioData) {
        context.decodeAudioData(data, function (buffer) {
            // Note: `buffer` is an AudioBuffer; its channel data would still
            // need to be serialized (e.g. as WAV) before writing to disk.
            fs.writeFile(saveLocation, buffer, function (err) {
                if (err) throw err;
                console.log('It\'s saved!');
            });
        }, function (e) {
            console.log(e);
        });
    } else {
        // Fallback: the old two-argument form of createBuffer(data, mixToMono)
        var buffer = context.createBuffer(data, false /* mixToMono */);
        fs.writeFile(saveLocation, buffer, function (err) {
            if (err) throw err;
            console.log('It\'s saved!');
        });
    }
}
(Warning: untested code. If this doesn't work, edits are welcome.)
This effectively uses decodeAudioData from the Web Audio API to decode PCM from the supplied data, then attempts to save it to the target saveLocation. Simple enough, really.
The latest WebAudio API draft introduced the OfflineAudioContext exactly for this purpose.
You use it exactly the same way as a regular AudioContext, but with an additional startRendering() method to trigger offline rendering, as well as an oncomplete callback so that you can act upon finishing rendering.
Chrome should support it (or at the least, mostly support this new feature).
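A rough sketch of how those pieces fit together, using the startRendering()/oncomplete shape described above (renderOffline, encodedData, and the done callback are placeholder names, and this is untested against any particular draft):

```javascript
// Sketch: decode some encoded audio and render it offline, faster than
// real time. All argument names here are hypothetical placeholders.
function renderOffline(encodedData, channels, length, sampleRate, done) {
    var OfflineCtx = window.OfflineAudioContext || window.webkitOfflineAudioContext;
    var ctx = new OfflineCtx(channels, length, sampleRate);

    ctx.decodeAudioData(encodedData, function (decoded) {
        var source = ctx.createBufferSource();
        source.buffer = decoded;
        source.connect(ctx.destination);
        source.start(0);

        // oncomplete fires when offline rendering finishes, handing back
        // the fully rendered PCM as an AudioBuffer.
        ctx.oncomplete = function (e) {
            done(e.renderedBuffer);
        };
        ctx.startRendering();
    });
}
```

Since rendering is offline, a long graph renders in a fraction of its playback duration, which is exactly what you want for a save-to-file workflow.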
decodeAudioData()
When decodeAudioData() is finished, it calls a callback function which provides the decoded PCM audio data as an AudioBuffer.
It's nearly identical to the XHR2 way of doing things, so you'll likely want to make an abstraction layer for it.
Note: I haven't tested that it works, but there's only one bug filed in Chromium regarding this, indicating that it works but fails for some files.
I think that what you are looking for can be achieved with the startRendering function in Web Audio. I don't know whether the answers above did the trick, but if they didn't - here's a little something to get you going:
https://bugs.webkit.org/show_bug.cgi?id=57676 (scroll down to comment three)
This part is still undocumented, so it's nowhere to be seen in the spec, but you can console.log the audio context to confirm that it's actually there. I've only done some preliminary tests with it, but I think it should be the answer to your question.
I am currently working on an Electron application that needs to be able to send files over a WebRTC data channel. I am using PeerJS to abstract WebRTC away a bit and make development easier.
My current implementation uses a FileReader on the sender’s side to read a file in 32-kilobyte binary chunks. Each chunk is put into an ArrayBuffer, which is then sent along with some metadata telling the other side exactly what the sender is sending. The receiver writes the binary data to a file. While the receiver is writing that data, the sender waits for a “file-proceed” signal from the receiver. When the receiver is done, the sender is notified and sends the next chunk. This goes on until the entire file has been sent.
This approach works until the total of all files sent over the course of the application's runtime reaches about 500 megabytes. I believe this is due to a memory leak whose root cause I cannot find. As far as I know I don’t keep the objects in memory, and they should be cleared by GC. Another rather unusual thing is that only the recipient of the file suffers from this problem.
There is a lot going on in my application, but this is the part I think is causing the problem (feel free to ask for more code).
This is the part that is supposed to write the ArrayBuffer:
sm.writeChunk = function (arrayBuffer) {
    sm.receivedBytes += sm.receivedFileMeta.chunkSize;

    // Note: appendFileSync is synchronous and ignores a trailing callback;
    // it reports errors by throwing, so they are caught here instead.
    try {
        fs.appendFileSync(
            sm.downloadsFolder + path.sep + sm.receivedFileMeta.name + '.part',
            new Buffer(arrayBuffer, 'binary'),
            'binary'
        );
    } catch (err) {
        console.log(err);
    }

    sm.onAction({t: 'file-progress', percent: sm.receivedBytes / sm.receivedFileMeta.size * 100});
    sm.dataConnection.send({t: 'file-proceed'});
};
sm is an object that holds file-transfer-related functions and variables, hence the “sm.” everywhere.
I’ve tried setting the ArrayBuffer to undefined or null, but nothing seems to make the object disappear from memory, not even after the file transfer is completed. A heap snapshot seems to back this up. Removing the fs.appendFileSync call so that nothing is written to disk also makes no difference.
Is there anything I can do to fix this issue? Or is this a problem related to PeerJS? Any help or suggestions is much appreciated!
Well, it appears to be a PeerJS issue after all. If you send packets larger than 16K, PeerJS chunks them for you, and the memory problem lies with that chunking. PeerJS chunks at 16K, while Electron (really Chrome) can send 64K at a time. The 16K limit exists for cross-browser compatibility, but since I am strictly using Electron, I changed the PeerJS code so it doesn't chunk my 32K packets. This resolved the issue.
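For readers who hit the same wall, the chunking being described can be sketched as a plain helper: split an ArrayBuffer into fixed-size slices before handing them to the data channel (chunkBuffer is a hypothetical name; the 16K/64K figures are the ones from the answer above):

```javascript
// Split an ArrayBuffer into fixed-size chunks, the way PeerJS does
// internally at 16K. Electron/Chrome can handle 64K per message.
function chunkBuffer(arrayBuffer, chunkSize) {
    var chunks = [];
    for (var offset = 0; offset < arrayBuffer.byteLength; offset += chunkSize) {
        // slice() clamps to the buffer end, so the last chunk may be short.
        chunks.push(arrayBuffer.slice(offset, offset + chunkSize));
    }
    return chunks;
}

// A 100K payload at 64K per chunk yields one full chunk plus a remainder.
var chunks = chunkBuffer(new ArrayBuffer(100 * 1024), 64 * 1024);
```

Controlling the chunk size yourself, as the answer does by patching PeerJS, avoids the double-chunking that was leaking memory here.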
I was trying to download a file using the OneDrive JS SDK, so I've used the code from Microsoft:
// Download a file from OneDrive
let fs = require('fs'); // requires filesystem module
client
.api('/me/drive/root/children/Book.xlsx/content')
.getStream((err, downloadStream) => {
if (err) {
console.log(err);
return;
}
let writeStream = fs.createWriteStream('../Book1.xlsx');
downloadStream.pipe(writeStream).on('error', console.log);
});
As I want to get it working in a browser too (not just in Node), I first tried some stream libraries for browsers but couldn't get anything working. Eventually, I got it working with just the REST API and fetch() (the SDK is a wrapper over the REST API).
A simple fetch(url) did the job. So I'm wondering, why did MS go through the trouble of all the stream code above when a single line would do the job?
In particular, is the performance of streams somehow better than fetch()? For example, would fetch freeze the app when downloading large files while streams wouldn't? Are there any other differences?
Streams are more efficient, in more than one way.
You can perform processing as-you-go.
For example, if you have a series of data in a remote location that you want to process, using a stream allows you to process the data as it flows, so the processing and the download happen in parallel.
This is much more efficient than waiting for the data to download, then after it's downloaded you start processing it all in one-go.
Streams consume much less memory.
If you download a 1 GB file without using streams, you consume 1 GB of memory: the file is downloaded in one request, stored temporarily somewhere (e.g. in a variable), and only then do you start reading from that variable to save it to a file. In other words, you hold all your data in a buffer before you start processing it.
In contrast, a stream would be writing to the file as content comes. Imagine a stream of water flowing into a jug.
AFAIK this is the main reason that data downloads are usually handled with Streams.
That being said, in most cases - apart from file downloads and real-time stuff - it doesn't make any sense to use Streams over the usual request/response scheme.
Stream handling is generally more complex to implement and reason about.
The reason why fetch wasn't used is that it's a relatively new and experimental technology. It still needs to mature in order to gain widespread adoption.
That said, fetch DOES use streams.
You may want to profile your app using both approaches and see which is faster. I would suspect that fetch works best for smaller files, whereas streams are better for larger files.
Here is a much more detailed answer for you:
Fetch stream
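The "process as you go" advantage described above can be sketched with the WHATWG Streams API (available in modern browsers, and in Node 18+). In a real app the stream would come from `(await fetch(url)).body`; here a locally constructed ReadableStream stands in for the network response, and countBytes is a hypothetical name:

```javascript
// Consume a stream chunk by chunk instead of buffering the whole body.
async function countBytes(stream) {
    const reader = stream.getReader();
    let total = 0;
    for (;;) {
        const { done, value } = await reader.read();
        if (done) return total;
        total += value.byteLength; // each chunk is processed as it arrives
    }
}

// Stand-in for a network response body: two chunks, then end-of-stream.
const demo = new ReadableStream({
    start(controller) {
        controller.enqueue(new Uint8Array(1024));
        controller.enqueue(new Uint8Array(512));
        controller.close();
    }
});
```

Only one chunk is held in memory at a time, which is the point of the 1 GB example above.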
I am working with web workers and I am passing a large amount of data to a web worker, which takes a lot of time. I want to know the most efficient way to send the data.
I have tried the following code:
var worker = new Worker('js2.js');
worker.postMessage(buffer, [buffer]);
worker.postMessage(obj, [obj.mat2]);

if (buffer.byteLength) {
    alert('Transferables are not supported in your browser!');
}
UPDATE
Modern versions of Chrome, Edge, and Firefox now support SharedArrayBuffers (though not Safari at the time of this writing; see SharedArrayBuffers on MDN), so that is another possibility for fast data transfer, with a different set of trade-offs compared to a transferable (see MDN for the trade-offs and requirements of SharedArrayBuffers).
UPDATE:
According to Mozilla, SharedArrayBuffer has been disabled in all major browsers, so the option described in the following EDIT no longer applies.
Note that SharedArrayBuffer was disabled by default in all major
browsers on 5 January, 2018 in response to Spectre.
EDIT: There is now another option: sending a SharedArrayBuffer. This is part of ES2017, under shared memory and atomics, and is now supported in Firefox 54 Nightly. If you want to read about it you can look here. I will probably write something up some time and add it to my answer, and I will try to add it to the performance benchmark as well.
To answer the original question:
I am working on web workers and I am passing large amount of data to
web worker, which takes a lot of time. I want to know the efficient
way to send the data.
The alternative to #MichaelDibbets' answer, which sends a copy of the object to the web worker, is using a transferable object, which is zero-copy.
It shows that you were intending to make your data transferrable, but I'm guessing it didn't work out. So I will explain what it means for some data to be transferrable for you and future readers.
Transferring objects "by reference" (although that isn't the perfect term for it, as explained in the next quote) doesn't work on just any JavaScript object. It has to be a transferable data type.
[With Web Workers] Most browsers implement the structured cloning
algorithm, which allows you to pass more complex types in/out of
Workers such as File, Blob, ArrayBuffer, and JSON objects. However,
when passing these types of data using postMessage(), a copy is still
made. Therefore, if you're passing a large 50MB file (for example),
there's a noticeable overhead in getting that file between the worker
and the main thread.
Structured cloning is great, but a copy can take hundreds of
milliseconds. To combat the perf hit, you can use Transferable
Objects.
With Transferable Objects, data is transferred from one context to
another. It is zero-copy, which vastly improves the performance of
sending data to a Worker. Think of it as pass-by-reference if you're
from the C/C++ world. However, unlike pass-by-reference, the 'version'
from the calling context is no longer available once transferred to
the new context. For example, when transferring an ArrayBuffer from
your main app to Worker, the original ArrayBuffer is cleared and no
longer usable. Its contents are (quite literally) transferred to the
Worker context.
- Eric Bidelman Developer at Google, source: html5rocks
The only problem is that only two things are transferable as of now: ArrayBuffer and MessagePort (canvas proxies are hopefully coming later). ArrayBuffers cannot be manipulated directly through their API; they should be used to create a typed array object or a DataView, which gives a particular view into the buffer and can read from and write to it.
From the html5rocks link
To use transferrable objects, use a slightly different signature of
postMessage():
worker.postMessage(arrayBuffer, [arrayBuffer]);
window.postMessage(arrayBuffer, targetOrigin, [arrayBuffer]);
The worker case, the first argument is the data and the second is the
list of items that should be transferred. The first argument doesn't
have to be an ArrayBuffer by the way. For example, it can be a JSON
object:
worker.postMessage({data: int8View, moreData: anotherBuffer}, [int8View.buffer, anotherBuffer]);
So according to that your
var worker = new Worker('js2.js');
worker.postMessage(buffer, [ buffer]);
worker.postMessage(obj, [obj.mat2]);
should be performing at great speed, transferred zero-copy. The only problem would be if your buffer or obj.mat2 is not an ArrayBuffer or not transferable. You may be confusing an ArrayBuffer with a typed-array view; what you should be sending is the view's buffer.
So say you have this ArrayBuffer and an Int32 view of it. (Though the variable is titled view, it is not a DataView; DataViews do have a buffer property, just as typed arrays do. At the time this was written, MDN used the name 'view' for the result of calling a typed array constructor, so I assumed it was a good way to name it.)
var buffer = new ArrayBuffer(90000000);
var view = new Int32Array(buffer);
for (var c = 0; c < view.length; c++) {
    view[c] = 42;
}
This is what you should not do (send the view)
worker.postMessage(view);
This is what you should do (send the ArrayBuffer)
worker.postMessage(buffer, [buffer]);
These are the results after running this test on plnkr.
Average for sending views is 144.12690000608563
Average for sending ArrayBuffers is 0.3522000042721629
EDIT: As stated by #Bergi in the comments, you don't need the buffer variable at all if you have the view, because you can just send view.buffer like so:
worker.postMessage(view.buffer, [view.buffer]);
Just as a side note to future readers: if you send an ArrayBuffer without the last argument specifying which ArrayBuffers to transfer, it will not be sent transferably.
In other words when sending transferrables you want this:
worker.postMessage(buffer, [buffer]);
Not this:
worker.postMessage(buffer);
EDIT: One last note: since you are sending a buffer, don't forget to turn it back into a view once it's received by the web worker. Once it's a view you can manipulate it (read from and write to it) again.
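The detachment behaviour quoted above is easy to verify. The sketch below uses a MessageChannel (global in browsers and in Node 15+) as a stand-in for a worker, since both accept the same transfer-list argument; the same thing happens with worker.postMessage:

```javascript
// Demonstrate that transferring an ArrayBuffer detaches it on the sender.
const { port1, port2 } = new MessageChannel();
const buf = new ArrayBuffer(8);

port1.postMessage(buf, [buf]); // transfer, not copy

// The original buffer is now detached: its byteLength reads as 0 and
// its contents belong to the receiving context.
console.log(buf.byteLength); // 0

// Close the ports so the channel doesn't keep the process alive.
port1.close();
port2.close();
```

Checking `byteLength === 0` after posting is a quick way to confirm a transfer actually happened rather than a structured-clone copy.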
And for the bounty:
I am also interested in official size limits for firefox/chrome (not
only time limit). However answer the original question qualifies for
the bounty (;
As to a web browser's limit on the size of what you can send, I am not completely sure, but in that html5rocks entry Eric Bidelman mentions a 50 MB file being transferred without a transferable data type in hundreds of milliseconds, while my test shows it taking only around a millisecond with a transferable data type. And 50 MB is honestly pretty large.
Purely my own opinion, but I don't believe there is a limit on the size of what you send, transferable or not, other than the limits of the data type itself. Your biggest worry would probably be the browser stopping long-running scripts if it has to copy the whole thing rather than transfer it zero-copy.
Hope this post helps. Honestly I knew nothing about transferables before this, but it was fun figuring them out through some tests and through that blog post by Eric Bidelman.
I had issues with webworkers too, until I just passed a single argument to the webworker.
So instead of
worker.postMessage( buffer,[ buffer]);
worker.postMessage(obj,[obj.mat2]);
Try
var myobj = {buffer:buffer,obj:obj};
worker.postMessage(myobj);
This way I found it gets passed by reference and it's insanely fast. I post over 20,000 data elements back and forth in a single push every 5 seconds without noticing the data transfer.
I've been exclusively working with chrome though, so I don't know how it'll hold up in other browsers.
Update
I've done some testing for some stats.
var tmp = new ArrayBuffer(90000000);
var test = new Int32Array(tmp);
for (var c = 0; c < test.length; c++) {
    test[c] = 42;
}

for (var c = 0; c < 4; c++) {
    window.setTimeout(function () {
        // Clone the array: "we" will have lost it once it's sent to the
        // web worker, and this saves repopulating it.
        var testsend = new Int32Array(test);
        // Mark the time; the sister mark is in the web worker.
        console.log("sending at at " + window.performance.now());
        // Post the clone to the thread.
        FieldValueCommunicator.worker.postMessage(testsend);
    }, 1000 * c);
}
Results of the tests. I don't know whether this falls in your category of slow, since you did not define "slow":
sending at at 28837.418999988586
recieved at 28923.06199995801
86 ms
sending at at 212387.9840001464
recieved at 212504.72499988973
117 ms
sending at at 247635.6210000813
recieved at 247760.1259998046
125 ms
sending at at 288194.15999995545
recieved at 288304.4079998508
110 ms
It depends on how large the data is
I found this article, which says the better strategy is to pass large data to a web worker and back in small bits. It also discourages the use of ArrayBuffers.
Please have a look: https://developers.redhat.com/blog/2014/05/20/communicating-large-objects-with-web-workers-in-javascript
I am using the HTML5 Web Audio API in my application. The application is simple; I have
BufferSourceNode -> GainNode -> lowpass filter -> context.destination
Now I want to save the output after applying the filters, so I decided to add a recorder before context.destination. But this doesn't work: it produces some noise while playing the audio, though my recorder records the filter effects successfully.
Am I doing it in right way or is there any better way to do this?
Two things:
1) If you are going to use the buffer anyway - even if you're not (*) - you might want to consider using an OfflineAudioContext (https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#OfflineAudioContext-section). OACs can run faster than real time, so you don't need to "record" it in real time; you set up your nodes, call startRendering(), and the oncomplete event hands you an AudioBuffer. (*) If you still want a .WAV file, you can pull the WAV-encoding function out of Recorderjs and use it to encode an arbitrary buffer.
2) That sounds like an error in your code - it should work either way, without causing extra noise. Do you have a code sample you can send me?
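The WAV-encoding idea mentioned in (1) boils down to a small routine, roughly the kind you could lift from Recorderjs. This is a hedged, minimal mono version (encodeWav is a hypothetical name); `samples` is assumed to be a Float32Array in the -1..1 range, as produced by AudioBuffer.getChannelData():

```javascript
// Wrap 16-bit mono PCM samples in a RIFF/WAVE header.
function encodeWav(samples, sampleRate) {
    var buffer = new ArrayBuffer(44 + samples.length * 2);
    var view = new DataView(buffer);

    function writeString(offset, str) {
        for (var i = 0; i < str.length; i++) {
            view.setUint8(offset + i, str.charCodeAt(i));
        }
    }

    writeString(0, 'RIFF');
    view.setUint32(4, 36 + samples.length * 2, true); // RIFF chunk size
    writeString(8, 'WAVE');
    writeString(12, 'fmt ');
    view.setUint32(16, 16, true);             // fmt sub-chunk length
    view.setUint16(20, 1, true);              // format: PCM
    view.setUint16(22, 1, true);              // channels: mono
    view.setUint32(24, sampleRate, true);     // sample rate
    view.setUint32(28, sampleRate * 2, true); // byte rate
    view.setUint16(32, 2, true);              // block align
    view.setUint16(34, 16, true);             // bits per sample
    writeString(36, 'data');
    view.setUint32(40, samples.length * 2, true); // data length

    // Clamp each float sample and scale it to a signed 16-bit integer.
    for (var i = 0; i < samples.length; i++) {
        var s = Math.max(-1, Math.min(1, samples[i]));
        view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
    }
    return buffer;
}
```

The resulting ArrayBuffer can be written to disk, or wrapped in a Blob with type 'audio/wav' for download from a page.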
I'm trying to write a web application that takes information from the user, generates audio on the server from that information, and then plays it back in the user's browser. I've been googling a whole bunch and I'm unsure exactly what I need to do to make this happen. What do programs like Icecast do "behind the scenes" to create these streams? I feel like I don't even know how to ask the right question or what to search for, as almost all the information I'm finding is either about serving files or assumes I know more than I do about how the server side works.
This question may help with how to generate music programmatically; it suggests several tools that are designed for this purpose: What's a good API for creating music via programming?
Icecast is a bit of a red herring - that is the tool to broadcast an audio stream, so really you'd be looking at feeding the output of whatever tool you use to generate your music into Icecast or something similar in order to broadcast it to the world at large. However, this is more for situations where you want a single stream to be broadcast to multiple users (e.g. internet radio). If you simply want to generate audio from user input and serve it back to that user, then this isn't necessary.
I'm aware this isn't a full answer, as the question is not fully formed, but I couldn't fit it all into a comment. Hopefully it should help others who stumble upon this question... I suspect the original question writer has moved on by now.
Have a look at the Media Source API (still under implementation); this should be what you need.
window.MediaSource = window.MediaSource || window.WebKitMediaSource;

var ms = new MediaSource();
var audio = document.querySelector('audio');
audio.src = window.URL.createObjectURL(ms);

ms.addEventListener('webkitsourceopen', function (e) {
    // ...
    var sourceBuffer = ms.addSourceBuffer('type; codecs="codecs"');
    sourceBuffer.append(oneAudioChunk); // append chunks of data
    // ...
}, false);