Split stereo audio file into AudioNodes for each channel - javascript

How can I split a stereo audio file (I'm currently working with a WAV, but I'm interested in how to do it for MP3 as well, if that's different) into left and right channels to feed into two separate Fast Fourier Transforms (FFTs) from the p5.sound.js library?
I've written out what I think I need to be doing below in the code, but I haven't been able to find examples of anyone doing this through Google searches and all my layman's attempts are turning up nothing.
I'll share what I have below, but in all honesty, it's not much. Everything in question would go in the setup function where I've made a note:
// variables for the p5 sound object and the FFT
var sound = null;
var fft = null;
var playing = false;

function preload(){
  sound = loadSound('assets/leftRight.wav');
}

function setup(){
  createCanvas(windowWidth, windowHeight);
  background(0);
  // I need to do something here to split the audio and return an AudioNode for just
  // the left stereo channel. I have a feeling it's something like
  // feeding audio.getBlob() to a FileReader() and some manipulation and then converting
  // the result of FileReader() to a Web Audio API source node and feeding that into
  // fft.setInput() like justTheLeftChannel is below, but I'm not understanding how to work
  // with JavaScript audio methods and createChannelSplitter() and the attempts I've made
  // have just turned up nothing.
  fft = new p5.FFT();
  fft.setInput(justTheLeftChannel);
}
function draw(){
  sound.pan(-1);
  background(0);
  push();
  noFill();
  stroke(255, 0, 0);
  strokeWeight(2);
  beginShape();
  // calculate the waveform from the fft.
  var wave = fft.waveform();
  for (var i = 0; i < wave.length; i++){
    // for each element of the waveform, map it to screen
    // coordinates and make a new vertex at the point.
    var x = map(i, 0, wave.length, 0, width);
    var y = map(wave[i], -1, 1, 0, height);
    vertex(x, y);
  }
  endShape();
  pop();
}
function mouseClicked(){
  if (!playing){
    sound.loop();
    playing = true;
  } else {
    sound.stop();
    playing = false;
  }
}

Solution:
I'm not a p5.js expert, but I've worked with it enough that I figured there has to be a way to do this without the whole runaround of blobs / file reading. The docs aren't very helpful for complicated processing, so I dug around a little in the p5.Sound source code and this is what I came up with:
// left channel only
sound.setBuffer([sound.buffer.getChannelData(0)]);
// or: right channel only
sound.setBuffer([sound.buffer.getChannelData(1)]);
Here's a working example - clicking the canvas toggles between L/stereo/R audio playback and FFT visuals.
Explanation:
p5.SoundFile has a setBuffer method which can be used to modify the audio content of the sound file object in place. Its signature accepts an array of Float32Array channel buffers, and if that array only has one item, it produces a mono source - which is already in the correct format to feed to the FFT! So how do we produce a buffer containing only one channel's data?
Throughout the source code there are examples of individual channel manipulation via sound.buffer.getChannelData(). I was wary of accessing undocumented properties at first, but it turns out that since p5.sound uses the Web Audio API under the hood, this buffer is really just a plain old Web Audio AudioBuffer, and the getChannelData method is well documented.
The only downside of the approach above is that setBuffer acts directly on the SoundFile, so you end up loading the file again for each channel you want to separate, but I'm sure there's a workaround for that.
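For reference, here's a minimal sketch of that two-copy approach. It assumes the same 'assets/leftRight.wav' file as in the question and that fft.setInput() can take a SoundFile directly:
var leftSound, rightSound;
var leftFFT, rightFFT;

function preload(){
  // Load the file twice so each copy can be collapsed to one channel.
  leftSound = loadSound('assets/leftRight.wav');
  rightSound = loadSound('assets/leftRight.wav');
}

function setup(){
  createCanvas(windowWidth, windowHeight);
  // Reduce each copy to a mono buffer holding a single channel.
  leftSound.setBuffer([leftSound.buffer.getChannelData(0)]);   // left
  rightSound.setBuffer([rightSound.buffer.getChannelData(1)]); // right
  // Give each mono copy its own FFT.
  leftFFT = new p5.FFT();
  leftFFT.setInput(leftSound);
  rightFFT = new p5.FFT();
  rightFFT.setInput(rightSound);
}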
Happy splitting!

Related

How can I increase the performance of real-time video processing in the browser?

I use p5.js to capture video from a webcam. In the draw function, which is responsible for redrawing the canvas, I add each newly captured frame to a frame array and then calculate the resulting frame for that point in time using this pixel cube (the frame array) and its 3D slice function.
sketch.draw = () => {
  sketch.loadPixels();
  this.capture.loadPixels();
  this.stack.push(this.capture.pixels);
  const pixelsStack = this.stack.array;
  for (let x = 0; x < this.w; x++) {
    for (let y = 0; y < this.h; y++) {
      const frameN = this.getFrameN(x, y, pixelsStack.length);
      set(
        sketch.pixels,
        this.w,
        4, x, y,
        this.getPixel(pixelsStack, x, y, frameN)
      );
    }
  }
  sketch.updatePixels();
};
On my MacBook Pro it works well when the resolution of the frame cube is 240x180x240. When I increase the frame resolution it begins to freeze; on weaker computers the site crashes (even at a small resolution), and on phones it doesn't run for more than three seconds, if it starts at all.
I thought the bottleneck was p5.js, so I wrote my own update loop using requestAnimationFrame and output the resulting image to a canvas I created myself, but this only made my application more than five times slower.
I tried to use WebGL for this, but I'm very unfamiliar with the technology. I tried storing the frames in textures, but it turned out to be too much data for textures. I also tried p5's WebGL functions, but I only managed to change the way of rendering, not to speed up the frame calculation (and that seems to be where the problem lies).
How and with what technologies can I increase the frame calculation speed?
If you're trying to capture video via webcam, you can make use of the MediaRecorder API along with getUserMedia. No need for p5 or WebGL. Here is an article showing how to use these to set up a simple video recorder: https://dev.to/twilio/an-introduction-to-the-mediarecorder-api-2k8i
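As a rough sketch of what that looks like (the #playback video element and the five-second cutoff are just placeholders for this example):
navigator.mediaDevices.getUserMedia({ video: true, audio: false })
  .then(function (stream) {
    var recorder = new MediaRecorder(stream);
    var chunks = [];
    recorder.ondataavailable = function (e) { chunks.push(e.data); };
    recorder.onstop = function () {
      // Assemble the recorded chunks into a playable blob.
      var blob = new Blob(chunks, { type: 'video/webm' });
      document.querySelector('#playback').src = URL.createObjectURL(blob);
    };
    recorder.start();
    // Stop after five seconds, just for the demo.
    setTimeout(function () { recorder.stop(); }, 5000);
  });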

How do the getByteTimeDomainData() / getByteFrequencyData() methods work?

I'm trying to develop a pitch detector using the JavaScript Web Audio API. From googling, I've learned that we perceive pitch by frequency, so I found the getByteFrequencyData() method. But I don't know how to use it correctly.
example.js
// Assumes an existing canvas and its 2D context (ctx).
function draw() {
  var img = new Image();
  img.src = "foo.jpg";
  img.onload = function() {
    ctx.drawImage(img, 0, 0);
    var imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);
    var raster = imgData.data;
    // Invert every colour channel but leave alpha (every 4th byte) alone.
    for (var i = 0; i < raster.length; i++) {
      if (i % 4 != 3) {
        raster[i] = 255 - raster[i];
      }
    }
    ctx.putImageData(imgData, 0, 0);
  };
}
As we see above, getImageData() returns very obvious, easy-to-access data. In contrast, what does the array parameter of getByteFrequencyData() store/represent/mean? How does it represent audio frequency data? How can I manipulate it and develop my own program using these methods?
Thanks.
The spec entry for getByteFrequencyData tells you exactly what it is. The analyser node determines the frequency content in a set of bins, where the value of each bin is the magnitude of that frequency component. getByteFrequencyData just converts that to dB and then scales the values to the range 0 to 255.
I generally recommend that people use getFloatFrequencyData() first, because I think it's a bit easier to understand without having to deal with the scaling.
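To make that concrete, here is a minimal sketch of reading both forms of frequency data; it assumes you already have an AudioContext (audioCtx) and a connected source node (source):
var analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;
source.connect(analyser);

var byteData = new Uint8Array(analyser.frequencyBinCount);    // scaled to 0..255
var floatData = new Float32Array(analyser.frequencyBinCount); // magnitudes in dB

function poll() {
  analyser.getByteFrequencyData(byteData);
  analyser.getFloatFrequencyData(floatData);
  // Bin i covers frequencies around i * audioCtx.sampleRate / analyser.fftSize Hz.
  requestAnimationFrame(poll);
}
poll();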

WebAudio - Oscillator setPeriodicWave

I create three different linear chirps using the code found here on SO. With some other code snippets I save those three sounds as separate .wav files. This works so far.
Now I want to play those three sounds at exactly the same time. So I thought I could use the Web Audio API and feed three oscillator nodes with the float arrays I got from the code above.
But I can't get even one oscillator node to play its sound.
My code so far (reduced to one oscillator):
var osc = audioCtx.createOscillator();
var sineData = linearChirp(freq, (freq + signalLength), signalLength, audioCtx.sampleRate); // linearChirp from link above
// sine values; add 0 at the front because the docs state that the first value is ignored
sineData.unshift(0); // unshift() returns the new length, not the array
var imag = Float32Array.from(sineData);
var real = new Float32Array(imag.length); // cos values
var customWave = audioCtx.createPeriodicWave(real, imag);
osc.setPeriodicWave(customWave);
osc.connect(audioCtx.destination);
osc.start();
At the moment I suppose that I don't quite understand the math behind the periodic wave.
The code that plays the three sounds at the same time works (with simple sine values in the oscillator nodes), so I assume that the problem is my periodic wave.
Another question: is there a different way? Maybe something like three MediaElementAudioSourceNodes linked to my three .wav files? I don't see a way to play those at exactly the same time.
The PeriodicWave isn't a "stick a waveform in here and it will be used as a single oscillation" feature - it builds a waveform by specifying the relative strengths of its harmonics. Note that in the code you pointed to, they create a BufferSource node and point its .buffer at the result of linearChirp(). You can do that, too - just use BufferSource nodes to play back the linearChirp() outputs, which (I think?) are just sine waves anyway? (If so, you could just use an oscillator and skip that whole messy "create a buffer" bit.)
If you just want to play back the buffers you've created, use BufferSource. If you want to create complex harmonics, use PeriodicWave. If you've created a single-cycle waveform and you want to play it back as a source waveform, use BufferSource and loop it.
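For the "exact same time" part, you can schedule several BufferSource nodes against the same start time. A rough sketch, assuming audioCtx exists and chirp1/chirp2/chirp3 are the Float32Arrays from linearChirp():
function makeSource(samples) {
  var buffer = audioCtx.createBuffer(1, samples.length, audioCtx.sampleRate);
  buffer.getChannelData(0).set(samples);
  var src = audioCtx.createBufferSource();
  src.buffer = buffer;
  src.connect(audioCtx.destination);
  return src;
}

var startAt = audioCtx.currentTime + 0.1; // schedule slightly in the future
[chirp1, chirp2, chirp3].map(makeSource).forEach(function (src) {
  src.start(startAt); // all three share the same start time
});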

WebAudio sounds from wave point

Suppose that I make a simple canvas drawing app like this:
I now have a series of points. How can I feed them to some of the WebAudio objects (an oscillator, or a sound made from a byte array, or something) to actually generate and play a wave out of them (in this case a sine-like wave)? What is the theory behind it?
If you have the data from your graph in an array, y, you can do something like
var buffer = context.createBuffer(1, y.length, context.sampleRate);
buffer.copyToChannel(Float32Array.from(y), 0); // copyToChannel needs a Float32Array and a channel index
var src = context.createBufferSource();
src.buffer = buffer;
src.connect(context.destination);
src.start();
You may need to set the sample rate in context.createBuffer to something other than context.sampleRate, depending on the data from your graph.
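If the points come straight from a canvas drawing, you'll also want to map the pixel y-coordinates into the [-1, 1] sample range first. A small sketch, assuming points is an array of {x, y} objects and canvasHeight is the drawing height:
var y = new Float32Array(points.length);
for (var i = 0; i < points.length; i++) {
  // Top of the canvas becomes +1, bottom becomes -1.
  y[i] = 1 - 2 * (points[i].y / canvasHeight);
}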

Speex split audio data - WebAudio - VOIP

I'm running a little app that encodes and decodes an audio array with the Speex codec in JavaScript: https://github.com/dbieber/audiorecorder
With a small array filled with a sine waveform:
var data = [];
for (var i = 0; i < 16384; i++) {
  data.push(Math.sin(i / 10));
}
This works. But I want to build a VOIP application and will have more than one array. So if I split my array into two parts and encode > decode > merge them, it doesn't sound the same as before.
Take a look at this:
fiddle: http://jsfiddle.net/exh63zqL/
Both buttons should give the same audio output.
How can I get the same output in both ways? Is there a special mode in speex.js for split audio data?
Speex is a lossy codec, so the output is only an approximation of your initial sine wave.
Your sine frequency is about 7 kHz, which is near the codec's upper 8 kHz bandwidth and as such even more likely to be altered.
What the codec outputs looks like a comb of Dirac pulses that will sound like your initial sinusoid as heard through a phone, which is certainly different from the original.
See this fiddle where you can listen to what the codec makes of your original sine waves, whether they are split in half or not.
// Generate a continuous sine wave in 2 arrays
var len = 16384;
var buffer1 = [];
var buffer2 = [];
var buffer = [];
for (var i = 0; i < len; i++) {
  buffer.push(Math.sin(i / 10));
  if (i < len / 2)
    buffer1.push(Math.sin(i / 10));
  else
    buffer2.push(Math.sin(i / 10));
}

// Encode and decode both arrays separately
var en = Codec.encode(buffer1);
var dec1 = Codec.decode(en);
var en = Codec.encode(buffer2);
var dec2 = Codec.decode(en);

// Merge the arrays into 1 output array
var merge = [];
for (var i in dec1)
  merge.push(dec1[i]);
for (var i in dec2)
  merge.push(dec2[i]);

// Encode and decode the whole array
var en = Codec.encode(buffer);
var dec = Codec.decode(en);

// -----------------
// Below is only for playing the different arrays
// -----------------
var audioCtx = new (window.AudioContext || window.webkitAudioContext)();

function play(sound) {
  var audioBuffer = audioCtx.createBuffer(1, sound.length, 44100);
  var bufferData = audioBuffer.getChannelData(0);
  bufferData.set(sound);
  var source = audioCtx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioCtx.destination);
  source.start();
}

$("#o").click(function() { play(dec); });
$("#c1").click(function() { play(dec1); });
$("#c2").click(function() { play(dec2); });
$("#m").click(function() { play(merge); });
If you merge the two half-signal decoder outputs, you will hear an additional click due to the abrupt transition from one signal to the other, sounding basically like a relay switching.
To avoid that, you would have to smooth the values around the merging point of your two buffers.
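One simple way to smooth the seam is a short crossfade between the end of the first decoded chunk and the start of the second. A rough sketch (assuming dec1 and dec2 are the decoded sample arrays; the 128-sample fade length is arbitrary):
function mergeWithCrossfade(a, b, fadeLen) {
  // Use a plain array so push() works even if the decoder returns typed arrays.
  var out = Array.prototype.slice.call(a, 0, a.length - fadeLen);
  for (var i = 0; i < fadeLen; i++) {
    var t = i / fadeLen; // ramps 0 -> 1: fade a out, fade b in
    out.push(a[a.length - fadeLen + i] * (1 - t) + b[i] * t);
  }
  for (var j = fadeLen; j < b.length; j++) {
    out.push(b[j]);
  }
  return out;
}
var smoothMerge = mergeWithCrossfade(dec1, dec2, 128);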
Note that Speex is a lossy codec, so by definition it can't give the same result as the encoded buffer. Besides, it is designed to be a codec for voice, so the 1-2 kHz range will be handled most efficiently, as it expects a specific form of signal. In some ways it can be compared to JPEG technology for raster images.
I've slightly modified your jsfiddle example so you can play with different parameters and compare results. Just providing a simple sinusoid with an unknown frequency is not a proper way to check a codec. However, in the example you can see the different impact on the initial signal at different frequencies.
buffer1.push(Math.sin(2*Math.PI*i*frequency/sampleRate));
I think you should build an example with a recorded voice and compare the results in that case; it would be a more proper test.
In general, to understand this in detail you would have to study digital signal processing. I can't even provide a proper link, since it is a whole science and it is mathematically intensive (the only proper book I know of is in Russian). If anyone here with a strong mathematics background can share proper literature for this case, I would appreciate it.
EDIT: as mentioned by Kuroi Neko, there is trouble at the buffer boundaries. It also seems impossible to save the decoder state, as mentioned in this post, because the library in use doesn't support it. If you look at the source code, you'll see that it uses a third-party Speex codec and does not provide full access to its features. I think the best approach would be to find a decent Speex library that supports state recovery, similar to this.
