I have been throwing an idea around that requires comparison of client microphone input and an existing wav file.
I would like to compare audio waves, client-side if possible, returning a percentage of accuracy, and was hoping this could be accomplished with the new HTML5 getUserMedia API. However, I have not been able to find a viable solution thus far.
A prime example would be a graphical representation of an analogue tuner: I would like to compare mic input to audio recordings of different notes and keys.
Can this be achieved client side via JavaScript? And if not, are there any APIs out there that are already doing this?
Related
I've seen things like waveform.js, which uses the Web Audio API to display waveform data, and there are many other tools out there that can analyze the exact sound points of an audio file in JavaScript.
If so, it should be possible to use that analysis for real-time lip syncing in JavaScript, i.e., to get an animated character to speak at the same time the user is speaking, by simply using an audio context and somehow reading the data points to find the right sounds.
So the question becomes, more specifically:
How exactly do I analyze audio data to extract what exact sounds are made at specific timestamps?
I want to get the end result of something like Rhubarb Lip Sync, except with JavaScript, and in real time. It doesn't have to be exact, but as close as possible.
There is no algorithm that allows you to detect phonemes correctly 100% of the time.
You didn't say whether this was for real-time use or for offline use, but that would strongly affect which algorithm you'd use.
An algorithm based on mel-frequency cepstral coefficients (MFCCs) would be expected to give you about 80% accuracy, which would be good enough for video games or the like.
Deep learning systems based on convolutional neural nets would give you excellent recognition, but they are not real-time systems (yet).
You could maybe start with Meyda, for example, and compare the audio features of the signal you're listening to against a human-cataloged library of audio features for each phoneme.
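As a rough sketch of that idea (not a tested recognizer): assuming Meyda is loaded on the page, and a hypothetical phonemeCatalog of reference MFCC vectors measured from your own recordings, a nearest-neighbour match could look like this:

```javascript
// Hypothetical catalog: one averaged 13-coefficient MFCC vector per
// phoneme/viseme. The zero vectors are placeholders; fill them in
// from your own reference recordings.
const phonemeCatalog = {
  A: new Array(13).fill(0),
  E: new Array(13).fill(0),
  O: new Array(13).fill(0),
};

// Nearest-neighbour match by Euclidean distance over MFCC vectors.
function closestPhoneme(mfcc) {
  let best = null;
  let bestDist = Infinity;
  for (const [label, ref] of Object.entries(phonemeCatalog)) {
    const dist = Math.sqrt(mfcc.reduce((s, c, i) => s + (c - ref[i]) ** 2, 0));
    if (dist < bestDist) {
      bestDist = dist;
      best = label;
    }
  }
  return best;
}

const audioContext = new AudioContext();

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const analyzer = Meyda.createMeydaAnalyzer({
    audioContext,
    source: audioContext.createMediaStreamSource(stream),
    bufferSize: 512,
    featureExtractors: ['mfcc'],
    callback: (features) => {
      // Roughly one guess every ~12 ms at 44.1 kHz.
      const phoneme = closestPhoneme(features.mfcc);
      // Drive the character's mouth shape from `phoneme` here.
    },
  });
  analyzer.start();
});
```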
So here is my problem. I want to play audio from Node.js running on a Raspberry Pi and then adjust the brightness of an LED strip, also connected to the same Pi, based on the frequency readings from the audio file. However, I can't seem to find anything in Node that gives the same functionality as the Web Audio API AnalyserNode.
I found a few libraries (e.g. https://www.npmjs.com/package/audio-render) that come close and are based on the Web Audio API, but the frequency values they produce are completely incorrect. I verified this by comparing them against a browser version I created using the Web Audio API.
I need the audio to play from node while also being analyzed to affect the brightness levels.
Any help would be appreciated. I really thought this would be simpler to handle in Node, but six hours later I'm still without a solution.
Victor Dibiya at IBM has an excellent example that illustrates how to use the web-audio-api module to decode an audio file into a buffer of PCM data, from which one can extract amplitude data and infer beats:
https://github.com/victordibia/beats
I have this working on a Raspberry Pi with LEDs controlled via Fadecandy.
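A minimal sketch of that decode step, assuming the web-audio-api npm module, a placeholder song.mp3, and a hypothetical setLedBrightness() standing in for your Fadecandy/GPIO driver:

```javascript
const fs = require('fs');
const { AudioContext } = require('web-audio-api');

const context = new AudioContext();

fs.readFile('song.mp3', (err, fileBuffer) => {
  if (err) throw err;

  // Decode the whole file into raw PCM, as in the beats example above.
  context.decodeAudioData(fileBuffer, (audioBuffer) => {
    const samples = audioBuffer.getChannelData(0); // Float32Array
    const windowSize = Math.round(audioBuffer.sampleRate / 20); // ~50 ms

    // Precompute one RMS amplitude per 50 ms window, then schedule the
    // LED updates against playback time while the file plays.
    for (let w = 0; (w + 1) * windowSize <= samples.length; w++) {
      let sumSquares = 0;
      for (let i = w * windowSize; i < (w + 1) * windowSize; i++) {
        sumSquares += samples[i] * samples[i];
      }
      const rms = Math.sqrt(sumSquares / windowSize);
      const brightness = Math.min(255, Math.round(rms * 2 * 255));

      // setLedBrightness() is hypothetical -- call your LED driver here.
      setTimeout(() => setLedBrightness(brightness), w * 50);
    }
  });
});
```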
Is it possible to access different microphones at the same time using getUserMedia()?
This would be useful to
filter out background noise;
create some sort of stereoscopic effect;
make available multiple audio tracks for an international streaming conference.
Apparently, it is quite tricky for video sources:
Capture video from several webcams with getUserMedia
I was wondering if, for the audio source, the problem was different.
You should be able to do this but I imagine the browser support will let you down somewhere along the line.
You should be able to create several media sources by specifying a microphone ID when using getUserMedia. You can find the IDs of all connected media devices using MediaDevices.enumerateDevices().
Once you have two separate microphone inputs you should be able to get the data using an AudioContext.
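A sketch of those two steps together, assuming the machine actually exposes two audio inputs and the user grants permission to both:

```javascript
// Open two specific microphones and feed both into one AudioContext.
async function openTwoMics() {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const mics = devices.filter((d) => d.kind === 'audioinput');
  if (mics.length < 2) throw new Error('Need at least two microphones');

  // Request each microphone by its device ID.
  const [streamA, streamB] = await Promise.all(
    mics.slice(0, 2).map((mic) =>
      navigator.mediaDevices.getUserMedia({
        audio: { deviceId: { exact: mic.deviceId } },
      })
    )
  );

  const context = new AudioContext();
  const sourceA = context.createMediaStreamSource(streamA);
  const sourceB = context.createMediaStreamSource(streamB);

  // From here you can route them through GainNodes, an AnalyserNode,
  // or, as below, a ChannelMergerNode for a stereo-style effect.
  const merger = context.createChannelMerger(2);
  sourceA.connect(merger, 0, 0);
  sourceB.connect(merger, 0, 1);
  merger.connect(context.destination);
}
```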
Then it's a case of doing whatever you're doing to the bit data before it's output to the browser.
This is all very high level, and the details of the actual implementation would probably take quite a long time to figure out, but as for your question: yes, it should be possible if the browser support is there.
The Web Audio API seems cool, but I'd really like to use it to process audio files and then save them as wav files again, and I don't really need to listen to them while they are processing. Is this possible? Is there something like encodeAudioData() to turn the audio buffer back into an ArrayBuffer so I could put it back in a file?
Edit: recorderJS seems almost perfect, but it only outputs 16-bit wavs. Any chance there is something that can do pro-audio formats (24-bit or 32-bit float)?
In the Web Audio API specification there is the OfflineAudioContext, which does exactly what you need.
OfflineAudioContext is a particular type of AudioContext for rendering/mixing-down (potentially) faster than real-time. It does not render to the audio hardware, but instead renders as quickly as possible, calling a completion event handler with the result provided as an AudioBuffer.
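A minimal sketch of that workflow, assuming you already have a decoded AudioBuffer (e.g. from decodeAudioData) and mono output is enough. There is no built-in encodeAudioData(), so the 32-bit float WAV header is written by hand:

```javascript
// Render an AudioBuffer offline, then encode the result as 32-bit
// float WAV. The processing graph in the middle is up to you.
async function processToWav(inputBuffer) {
  const offline = new OfflineAudioContext(
    1,                    // channels (mono for brevity)
    inputBuffer.length,   // length in sample frames
    inputBuffer.sampleRate
  );

  const source = offline.createBufferSource();
  source.buffer = inputBuffer;
  // ...insert your processing nodes between source and destination...
  source.connect(offline.destination);
  source.start(0);

  const rendered = await offline.startRendering();
  return encodeFloat32Wav(rendered.getChannelData(0), rendered.sampleRate);
}

// Build a RIFF/WAVE file with format code 3 (IEEE float, 32-bit, mono).
// Some strict readers also expect a 'fact' chunk for float WAVs, but
// most tools accept this minimal 44-byte header.
function encodeFloat32Wav(samples, sampleRate) {
  const dataSize = samples.length * 4;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeStr = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeStr(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);
  writeStr(8, 'WAVE');
  writeStr(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 3, true);              // 3 = IEEE float
  view.setUint16(22, 1, true);              // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 4, true); // byte rate
  view.setUint16(32, 4, true);              // block align
  view.setUint16(34, 32, true);             // bits per sample
  writeStr(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    view.setFloat32(44 + i * 4, samples[i], true);
  }
  return new Blob([buffer], { type: 'audio/wav' });
}
```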
I'm trying to use HTML5 and JavaScript to get the amplitude (and other components) of an MP3. Are there any libraries that would help?
First you need to divide the problem into two cases: real-time playback, and non-linear (offline) access to amplitude and other data.
For real-time playback you can use the Web Audio API:
https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html
An example for beat detection:
https://beatdetektor.svn.sourceforge.net/svnroot/beatdetektor/trunk/core/js/beatdetektor.js
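A minimal sketch of the real-time case, reading amplitude from an AnalyserNode while a hypothetical track.mp3 plays (note that most browsers only allow the AudioContext to start after a user gesture):

```javascript
const audioEl = new Audio('track.mp3'); // placeholder file, same origin
const context = new AudioContext();

const source = context.createMediaElementSource(audioEl);
const analyser = context.createAnalyser();
analyser.fftSize = 2048;

source.connect(analyser);
analyser.connect(context.destination); // keep the sound audible

const timeData = new Uint8Array(analyser.fftSize);

function tick() {
  analyser.getByteTimeDomainData(timeData);

  // Peak amplitude of the current frame, normalised to 0..1
  // (128 is the zero line for byte-valued time-domain data).
  let peak = 0;
  for (let i = 0; i < timeData.length; i++) {
    peak = Math.max(peak, Math.abs(timeData[i] - 128) / 128);
  }

  // Use `peak` to drive your visualisation here.
  requestAnimationFrame(tick);
}

audioEl.play();
tick();
```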
For non-linear, non-real-time access, there are two ways:
If you allow server-side processing, you can write your own proxy that sends the data to the Echo Nest servers and retrieves the information via the Echo Nest Remix API:
Extracting beats out of MP3 music with Python
If you want to avoid server-side processing altogether, you need to decode the MP3 in pure JavaScript to get access to the raw audio data in a non-real-time fashion:
https://github.com/devongovett/mp3.js
Then you need to apply the necessary filters to the raw audio data to extract the information you need from it. This is a signal-processing problem and not directly connected to JavaScript programming. If you specify more carefully what kind of data you are after, people might be able to point you to related JavaScript libraries, like ones for the fast Fourier transform.
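For instance, extracting the energy at one frequency of interest from a window of raw samples takes only a few lines (the Goertzel algorithm, shown here as a sketch); broader spectral analysis would run an FFT library over successive windows instead:

```javascript
// Goertzel: power of one target frequency in a block of PCM samples
// (a Float32Array, e.g. as decoded by mp3.js).
function goertzelPower(samples, sampleRate, targetHz) {
  const w = (2 * Math.PI * targetHz) / sampleRate;
  const coeff = 2 * Math.cos(w);
  let s1 = 0;
  let s2 = 0;
  for (let i = 0; i < samples.length; i++) {
    const s0 = samples[i] + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// e.g. energy around concert A in a 44.1 kHz window:
// const a440 = goertzelPower(pcmWindow, 44100, 440);
```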