FFT analysis with JavaScript: How to find a specific frequency? - javascript

I have the following problem:
I am analysing audio data using JavaScript and an FFT. I can already write the FFT data into an array:
var audioCtx = new AudioContext();
var analyser = audioCtx.createAnalyser();
var source = audioCtx.createMediaElementSource(audio);
source.connect(analyser);
analyser.connect(audioCtx.destination);
analyser.fftSize = 64;
var frequencyData = new Uint8Array(analyser.frequencyBinCount);
Every time I want to have new data I call:
analyser.getByteFrequencyData(frequencyData);
The variable "audio" is a mp3 file defined in HTML:
<audio id="audio" src="test.mp3"></audio>
So far so good.
My problem now is that I want to check if the current array "frequencyData" includes a specific frequency. For example: I place a 1000 Hz signal somewhere in the mp3 file and want to get a notification if this part of the mp3 file is currently in the array "frequencyData".
As a first step, it would already help to solve the problem for the case where the relevant part of the mp3 file contains only a 1000 Hz signal. As a second step, I would also like to find that part when it is overlaid with music.

frequencyData is an array of amplitudes, and each element of the array basically represents a range of frequencies. The size of each range is the sample rate divided by the number of FFT points, 64 in your case. So if your sample rate is 48000 and your FFT size is 64, each element covers a range of 48000/64 = 750 Hz. That means frequencyData[0] covers the frequencies 0 Hz-750 Hz, frequencyData[1] covers 750 Hz-1500 Hz, and so on. In this example the presence of a 1 kHz tone would show up as a peak in the second bin, frequencyData[1]. Also, with such a small FFT you have probably noticed that the resolution is very coarse. If you want to increase the frequency resolution you'll need to do a larger FFT.
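As a rough sketch of how you might use that (assuming the audioCtx, analyser and frequencyData variables from the question, and an amplitude threshold that you would have to tune by experiment):
// Sketch: detect a 1 kHz tone by watching the FFT bin it falls into.
var binWidth = audioCtx.sampleRate / analyser.fftSize; // e.g. 48000 / 64 = 750 Hz
var targetBin = Math.floor(1000 / binWidth);           // index of the bin containing 1 kHz
var THRESHOLD = 200;                                   // byte values are 0..255; tune empirically

function checkForTone() {
  analyser.getByteFrequencyData(frequencyData);
  if (frequencyData[targetBin] > THRESHOLD) {
    console.log("1 kHz signal detected");
  }
  requestAnimationFrame(checkForTone);
}
checkForTone();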

Related

What time window does the data in AnalyserNode.getByteFrequencyData correspond to?

I would like to capture the Fourier transform of a small time window (i.e. ~1 second).
let audioCtx = new AudioContext();
let analyser = audioCtx.createAnalyser();
let buffer = new Uint8Array(analyser.frequencyBinCount);
// given an audio stream 'stream'
let source = audioCtx.createMediaStreamSource(stream);
source.connect(analyser); // the analyser only receives data once the source is connected
setInterval(() => {
  analyser.getByteFrequencyData(buffer);
  // Do some analysis with buffer...
}, 1000);
Using the Web Audio API's AnalyserNode I can get Fourier data in bins of width 44100/analyser.fftSize = 22050/analyser.frequencyBinCount, allowing me to specify the resolution by setting analyser.fftSize. What is unclear to me from the documentation is how the time window is set.
From the API:
Rendering an audio graph is done in blocks of 128 samples-frames. A block of 128 samples-frames is called a render quantum, and the render quantum size is 128.
Does this imply a time window of 128/44100Hz=2.9ms?
Not quite. The render quantum size is the number of sample frames that gets processed at a time by the render loop, but it does not prevent nodes from accumulating additional samples in buffers. In the particular case of the AnalyserNode, the last fftSize samples are kept for the FFT computation. So the time window is effectively analyser.fftSize/sampleRate, where sampleRate is your configured sample rate (which may be 44100, but could vary depending on the output device). To capture ~1 second of audio, assuming a sampling rate of 44100Hz you would thus need fftSize = 32768 (which would result in a time window of ~0.74 second).
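As a small illustration of that relationship (a sketch only; fftSize must be a power of two, so a 1 second window can only be approximated):
// Sketch: pick the power-of-two fftSize closest to a desired time window.
let audioCtx = new AudioContext();
let analyser = audioCtx.createAnalyser();

let desiredSeconds = 1;
let desiredSamples = desiredSeconds * audioCtx.sampleRate;
// fftSize must be a power of two between 32 and 32768
analyser.fftSize = Math.min(32768, Math.pow(2, Math.round(Math.log2(desiredSamples))));

console.log("time window ≈ " + (analyser.fftSize / audioCtx.sampleRate) + " s");
// with sampleRate = 44100 this sets fftSize = 32768 and prints ≈ 0.743 s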

Create custom oscillator with long periodic wave

I'm trying to create a complex periodic sound with a long period. I want to define frequencies as accurately as I can, so I'm using a frequency step of sampleRate*0.5/tableLen. But I have some issues with large wave tables: the sound becomes distorted and loses high frequencies.
Here is a minimal example with a ~440 Hz sine wave. When I use a table with length 8192, the resulting sine wave is quite recognizable:
https://jsfiddle.net/zxqzntf0/
var gAudioCtx = new AudioContext();
var osc = gAudioCtx.createOscillator();

var tableSize = 8192;
var real = new Float32Array(tableSize);
var imag = new Float32Array(tableSize);

var freq = 440;
var step = gAudioCtx.sampleRate*0.5/tableSize; // frequency step per table index
real[Math.floor(freq/step)] = 1;               // place the 440 Hz component in the table

var wave = gAudioCtx.createPeriodicWave(real, imag, {disableNormalization: true});
osc.frequency.value = step;                    // fundamental frequency = one step
osc.setPeriodicWave(wave);
osc.connect(gAudioCtx.destination);
osc.start();
But when I increase my table size, I'm getting something strange. Result is not a sine wave at all!
https://jsfiddle.net/0cc75nnm/
This problem reproduces in all browsers (Chrome, Firefox, Edge), so it doesn't seem to be a browser bug. But I've found nothing about this in the documentation.
Added
I found that if the oscillator frequency is a whole number >= 2 Hz, I get no artifacts in the resulting sound with a table size of 16384. I think that is quite acceptable for my needs for now, but someday I may want to create longer periods. If someone explains why I get sound artifacts when the step is less than 2 Hz, I will accept their answer.
Here is an example of a complex melody that I generate in JavaScript:
https://jsfiddle.net/h9rfzrnL/1/
You're creating your periodic waves incorrectly. When filling the arrays for the periodic wave, assume the sample rate is 1. Then, if you want an oscillator at a frequency of 440 Hz, set the oscillator frequency to 440 Hz.
Thus, for a sine wave, the real array should be all zeroes and the imaginary array should be [0, 1]. (You're actually creating a cosine wave, but that doesn't really matter too much.)
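A minimal sketch of that approach (only the harmonics you actually use need entries, so a two-element table is enough for a pure tone):
// Sketch: a 440 Hz sine built from a two-element PeriodicWave.
// Index k of real/imag is the k-th harmonic of osc.frequency.
var audioCtx = new AudioContext();
var osc = audioCtx.createOscillator();

var real = new Float32Array([0, 0]); // no cosine terms
var imag = new Float32Array([0, 1]); // fundamental only
var wave = audioCtx.createPeriodicWave(real, imag, {disableNormalization: true});

osc.setPeriodicWave(wave);
osc.frequency.value = 440;           // the fundamental is now 440 Hz
osc.connect(audioCtx.destination);
osc.start();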

WebAudio sounds from wave point

Suppose that I make a simple canvas drawing app where the user draws a curve on a canvas:
I now have a series of points. How can I feed them to some of the WebAudio objects (an oscillator, or a sound made from a byte array, or something) to actually generate and play a wave out of them (in this case a sine-like wave)? What is the theory behind it?
If you have the data from your graph in an array, y, you can do something like
var buffer = context.createBuffer(1, y.length, context.sampleRate);
buffer.copyToChannel(y, 0);   // y must be a Float32Array
var src = context.createBufferSource();
src.buffer = buffer;
src.connect(context.destination);
src.start();
You may need to set the sample rate in context.createBuffer to something other than context.sampleRate, depending on the data from your graph.
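For instance, if the drawn points are canvas pixel coordinates, you would first normalize them into a Float32Array in the range [-1, 1] before copying them into the buffer. A sketch, where points and canvasHeight are assumed to come from your drawing code:
// Sketch: turn canvas y-coordinates into sample values in [-1, 1].
var y = new Float32Array(points.length);
for (var i = 0; i < points.length; i++) {
  // map 0..canvasHeight to +1..-1 (top of the canvas becomes +1)
  y[i] = 1 - 2 * (points[i] / canvasHeight);
}
// y can now be passed to buffer.copyToChannel(y, 0) as shown above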

speex split audio data - WebAudio - VoIP

I'm running a little app that encodes and decodes an audio array with the speex codec in JavaScript: https://github.com/dbieber/audiorecorder
with a small array filled with a sine waveform:
var data = [];
for (var i = 0; i < 16384; i++)
  data.push(Math.sin(i/10));
This works. But I want to build a VoIP application and will have more than one array. So if I split my array into two parts and encode > decode > merge them, it doesn't sound the same as before.
Take a look at this:
fiddle: http://jsfiddle.net/exh63zqL/
Both buttons should give the same audio output.
How can I get the same output in both ways? Is there a special mode in speex.js for split audio data?
Speex is a lossy codec, so the output is only an approximation of your initial sine wave.
Your sine frequency is about 7 kHz, which is near the codec's 8 kHz upper bandwidth and as such even more likely to be altered.
What the codec outputs looks like a comb of Dirac pulses that will sound like your initial sinusoid heard through a phone, which is certainly different from the original.
See this fiddle where you can listen to what the codec makes of your original sine waves, whether they are split in half or not.
// Generate a continuous sine wave in two half arrays and one full array
var len = 16384;
var buffer1 = [];
var buffer2 = [];
var buffer = [];
for (var i = 0; i < len; i++) {
  buffer.push(Math.sin(i/10));
  if (i < len/2)
    buffer1.push(Math.sin(i/10));
  else
    buffer2.push(Math.sin(i/10));
}
// Encode and decode both half arrays separately
var en = Codec.encode(buffer1);
var dec1 = Codec.decode(en);
var en = Codec.encode(buffer2);
var dec2 = Codec.decode(en);
// Merge the two decoded halves into one output array
var merge = [];
for (var i in dec1)
  merge.push(dec1[i]);
for (var i in dec2)
  merge.push(dec2[i]);
// Encode and decode the whole array
var en = Codec.encode(buffer);
var dec = Codec.decode(en);
//-----------------
// Below is only for playing the different arrays
//-----------------
var audioCtx = new (window.AudioContext || window.webkitAudioContext)();
function play(sound) {
  var audioBuffer = audioCtx.createBuffer(1, sound.length, 44100);
  var bufferData = audioBuffer.getChannelData(0);
  bufferData.set(sound);
  var source = audioCtx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioCtx.destination);
  source.start();
}
$("#o").click(function() { play(dec); });
$("#c1").click(function() { play(dec1); });
$("#c2").click(function() { play(dec2); });
$("#m").click(function() { play(merge); });
If you merge the two half-signal decoder outputs, you will hear an additional click due to the abrupt transition from one signal to the other, basically sounding like a relay switching.
To avoid that you would have to smooth the values around the merging point of your two buffers.
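One way to do that smoothing is a short fade-out/fade-in around the junction, something like this sketch (mergeSmoothed is just an illustrative helper; the 128-sample fade length is an arbitrary choice):
// Sketch: soften the junction by fading out the end of dec1
// and fading in the start of dec2 before concatenating them.
function mergeSmoothed(dec1, dec2, fadeLen) {
  fadeLen = fadeLen || 128;             // fade length in samples, tune by ear
  var a = Array.prototype.slice.call(dec1);
  var b = Array.prototype.slice.call(dec2);
  for (var i = 0; i < fadeLen; i++) {
    var t = i / fadeLen;                // ramps from 0 to 1
    a[a.length - fadeLen + i] *= 1 - t; // fade out
    b[i] *= t;                          // fade in
  }
  return a.concat(b);
}

// usage: play(mergeSmoothed(dec1, dec2));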
Note that Speex is a lossy codec, so by definition it can't give the same result as the encoded buffer. Besides, it is designed to be a codec for voice, so the 1-2 kHz range will be handled most efficiently, since it expects a specific form of signal. In some way, it can be compared to JPEG technology for raster images.
I've slightly modified your jsfiddle example so you can play with different parameters and compare the results. Just providing a simple sinusoid of some arbitrary frequency is not a proper way to check a codec. However, in the example you can see the different impact on the initial signal at different frequencies:
buffer1.push(Math.sin(2*Math.PI*i*frequency/sampleRate));
I think you should build an example with a recorded voice and compare the results in that case. It would be a more appropriate test.
In general, to understand this in detail you would have to study digital signal processing. I can't even provide a proper link, since it is a whole science and it is mathematically intensive (the only proper book I know is in Russian). If anyone here with a strong mathematics background can share proper literature on this, I would appreciate it.
EDIT: as mentioned by Kuroi Neko, there is trouble at the boundaries of the buffers. It also seems impossible to save the decoder state, as mentioned in this post, because the library in use doesn't support it. If you look at the source code, you can see that they use a third-party speex codec and do not provide full access to its features. I think the best approach would be to find a decent Speex library that supports state recovery, similar to this one.

Why the size of the analysis array needs to be half of the fftSize aka frequencyBinCount

Trying to make sense of the Web Audio API's spec.
What is the reason that we use frequencyBinCount and not fftSize for the size of the analysis array when getting the frequency data?
And should we use frequencyBinCount or fftSize for the size of the array when getting the time-domain data?
And the last question: the spec mentions that if we pass an array larger than frequencyBinCount the excess elements will be ignored, but what happens if you pass a smaller array?
So:
var analyser = context.createAnalyser(); // createAnalyser() is a factory method, no 'new'
analyser.fftSize = 1024;

// should fftSize be used?
// or frequencyBinCount?
// what happens if the size is smaller than fftSize?
var timeArray = new Float32Array(analyser.fftSize);

// why are we using frequencyBinCount and not fftSize?
var freqArray = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(freqArray);
analyser.getFloatTimeDomainData(timeArray);
It's true that, generally, an FFT of size N gives you N frequency bins. When you're analyzing "real" signals, though, half of these bins are redundant. Specifically, the first half of the FFT mirrors the second half: bins [2..(N/2)+1] have the same magnitudes as bins [N..(N/2)+1] (they are complex conjugates of each other). Since all audio signals are "real", this symmetry property holds for any FFT you do in the Web Audio API. The result only contains N/2 unique values.
In other words, the analysis array has size N/2 because that's the size of the result. A larger array would be wasteful.
A more rigorous discussion of FFT symmetry is here: https://dsp.stackexchange.com/questions/4825/why-is-the-fft-mirrored
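A quick sketch of how the sizes relate in practice (binToHz is just illustrative; the bin spacing follows the same sampleRate/fftSize rule discussed in the first answer above):
// Sketch: the analysis array holds fftSize / 2 bins, spanning 0 Hz up to Nyquist.
var audioCtx = new AudioContext();
var analyser = audioCtx.createAnalyser();
analyser.fftSize = 1024;

var freqArray = new Uint8Array(analyser.frequencyBinCount); // 512 elements
analyser.getByteFrequencyData(freqArray);

// center frequency covered by bin k
function binToHz(k) {
  return k * audioCtx.sampleRate / analyser.fftSize;
}
console.log(binToHz(0));                              // 0 Hz (DC)
console.log(binToHz(analyser.frequencyBinCount - 1)); // just below Nyquist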
