The Web Audio API seems cool, but I'd really like to use it to process audio files and then save them as WAV files again, and I don't really need to listen to them while they are processing. Is this possible? Is there something like encodeAudioData() to turn the AudioBuffer back into an ArrayBuffer so I could put it back in a file?
Edit: recorderJS seems almost perfect, but it only outputs 16-bit wavs. Any chance there is something that can do pro-audio formats (24-bit or 32-bit float)?
In the Web Audio API specification, there is the OfflineAudioContext, which does exactly what you need.
OfflineAudioContext is a particular type of AudioContext for rendering/mixing-down (potentially) faster than real-time. It does not render to the audio hardware, but instead renders as quickly as possible, calling a completion event handler with the result provided as an AudioBuffer.
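A minimal sketch of how that looks, assuming you already have a decoded AudioBuffer called sourceBuffer (the channel count and length here are placeholders):

// Render sourceBuffer through an OfflineAudioContext faster than real time,
// then pick up the result as an AudioBuffer from the returned promise.
const channels = 2;
const sampleRate = 44100;
const lengthInSamples = sampleRate * 10; // 10 seconds, adjust to your material

const offlineCtx = new OfflineAudioContext(channels, lengthInSamples, sampleRate);

const source = offlineCtx.createBufferSource();
source.buffer = sourceBuffer; // an AudioBuffer from decodeAudioData, for example
source.connect(offlineCtx.destination); // insert your processing nodes here instead
source.start(0);

offlineCtx.startRendering().then((renderedBuffer) => {
  // renderedBuffer is a plain AudioBuffer: read each channel with getChannelData(n)
  // and encode it to WAV yourself (Web Audio has no built-in encoder).
  console.log('Rendered', renderedBuffer.length, 'samples');
});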
Related
Everything is being done in the front end.
My goal is to be able to create an audio track, in real time, and play it instantly for a user. The track would be roughly 10 minutes long. However, the files are very simple: mostly silence, with a few sound clips (each clip is about 2 kB) sprinkled around. So the process for generating the data (the raw bytes) is very simple: either write the 2 kB sound clip or write n bytes of 0x00 for the silence, just for 10 minutes. But instead of generating the entire file up front and then playing it, I would like to stream the audio; ideally I would be generating more and more of the file while the audio was playing. That would prevent any noticeable delay between when the user clicks play and when the audio starts playing. The process of creating a file can take anywhere from 20 milliseconds to 500 milliseconds, and different files are created based on user input.
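To be concrete, the generation step itself is roughly this simple (the names here are just for illustration, not my actual code):

// Build one long Float32Array from a short decoded clip plus zeros (the silence).
function buildTrack(clipSamples, totalSeconds, sampleRate, clipOffsetsInSeconds) {
  const track = new Float32Array(totalSeconds * sampleRate); // zeros = silence
  for (const offset of clipOffsetsInSeconds) {
    const start = Math.floor(offset * sampleRate);
    track.set(clipSamples.subarray(0, track.length - start), start); // clamp at the end
  }
  return track;
}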
The only problem is: I have no idea how to do the streaming part. I've read ideas about using WebSockets, but that seems to assume the data comes from a server, and I see no reason to bother a server with this when JavaScript can easily generate the audio data on its own.
I've been researching and experimenting with the Web Audio API and the Media Streams API for the past several hours, and I keep going in circles and I'm totally confused. I'm starting to think that these APIs are meant for gathering data from a user's mic or webcam, not for being fed data directly from a readable stream.
Is what I want to do possible? Can it be achieved using something like a MediaStreamAudioSourceNode or is there another simpler way that I haven't noticed?
Any help on this topic would be so greatly appreciated. Examples of a simple working version would be even more appreciated. Thanks!
I'm going to follow this question, because a true streaming solution would be very nice to know about. My experience is limited to using the Web Audio API to play two sounds with a given pause in between them. The data is actually generated at the server and downloaded using Thymeleaf into two JavaScript variables that hold the PCM data to be played, but this data could easily have been generated at the client itself via JavaScript.
The following is not great, but it could almost be workable, given that there are extensive silences. I'm thinking: manage an ordered FIFO queue holding the variable name and some sort of timing value for when you want the associated audio played, and have a function that periodically polls the queue and loads commands into JavaScript setTimeout calls, with the delay calculated from the timing values given in the queue.
For the one limited app I have, the button calls the following (playTone is a method I wrote that plays the sound held in the JavaScript variable):
playTone(pcmData1);
setTimeout(() => playTone(pcmData2), 3500);
I have the luxury of knowing that pcmData1 is 2 seconds long, and a fixed pause interval between the two sounds. I also am not counting on significant timing accuracy. For your continuous playback tool, it would just have the setTimeout part with values for the pcmData variable and the timing obtained from the scheduling FIFO queue.
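A rough sketch of that polling idea, with the queue format and timings made up for illustration:

// Ordered FIFO queue of { pcmData, playAtMs } entries, where playAtMs is
// relative to when scheduling started. New entries can keep arriving while
// earlier audio is already playing.
const queue = [];
const startTime = performance.now();

setInterval(() => {
  // Drain whatever has been queued so far and hand each entry to setTimeout.
  while (queue.length > 0) {
    const { pcmData, playAtMs } = queue.shift();
    const delay = Math.max(0, playAtMs - (performance.now() - startTime));
    setTimeout(() => playTone(pcmData), delay);
  }
}, 250);

// For my two-sound example this would be:
// queue.push({ pcmData: pcmData1, playAtMs: 0 });
// queue.push({ pcmData: pcmData2, playAtMs: 3500 });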
Whether this is helpful and triggers a useful idea, IDK. Hopefully, someone with more experience will show us how to stream data on the fly. This is certainly something that can easily be done in Java, using its SourceDataLine class, which has useful blocking-queue aspects, but I haven't located a JavaScript equivalent yet.
I am trying to trim leading and trailing silence from an audio file recorded in browser before I send it off to be stored by the server.
I have been looking for examples to better understand the Web Audio API, but examples are scattered and cover deprecated methods like ScriptProcessorNode. I thought I was close when I found this example:
HTML Audio recording until silence?
in which I was eager to see at least silence being detected, which I think I could then use to do the trimming. However, after loading the example in a sandbox, it does not appear to detect silence in a way that I can understand.
If anyone has any help or advice it would be greatly appreciated!
While ScriptProcessorNode is deprecated, it's not going away any time soon. You should use AudioWorkletNode if you can (but not all browsers support that).
But since you have the recorded audio in a file, I would decode it using decodeAudioData to get an AudioBuffer. Then use getChannelData(n) to get a Float32Array for the n'th channel. Analyze this array however you want to determine where the silence at the beginning ends and the silence at the end begins. Do this for each n.
Now you know where the non-silent part is. WebAudio has no way of encoding this audio, so you'll either have to do your own encoding, or maybe get MediaRecorder to encode this so you can send it off to your server.
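A sketch of that approach, assuming a mono recording and a fixed amplitude threshold (both assumptions; pick a threshold that matches your noise floor):

async function trimSilence(arrayBuffer, audioCtx, threshold = 0.01) {
  const decoded = await audioCtx.decodeAudioData(arrayBuffer);
  const samples = decoded.getChannelData(0); // channel 0 only, for simplicity

  // Find the first and last samples whose magnitude exceeds the threshold.
  let start = 0;
  while (start < samples.length && Math.abs(samples[start]) < threshold) start++;
  let end = samples.length - 1;
  while (end > start && Math.abs(samples[end]) < threshold) end--;
  if (start >= samples.length) return decoded; // all silence, give up

  // Copy the non-silent region into a new AudioBuffer.
  const trimmed = audioCtx.createBuffer(1, end - start + 1, decoded.sampleRate);
  trimmed.copyToChannel(samples.subarray(start, end + 1), 0);
  return trimmed; // still needs encoding (e.g. via MediaRecorder) before upload
}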
I'm a beginner with Web Audio and wanted to know if compressed (AAC, MP3, OGG) sound effects, and especially music tracks, are expanded in memory into megabytes, like textures in WebGL?
For example, if I have an audio file with some music and the file size is around ~3 MB (in any of the formats MP3, OGG, AAC), will the file be expanded (decompressed) to something like 70 MB for playback, as the original PCM data would take?
Also, is it possible to estimate how much memory an audio file uses with Web Audio during regular playback, without the additional effects of Web Audio's more advanced nodes?
Currently, to use a compressed audio file you must load the compressed audio file into memory and then use decodeAudioData to convert the compressed file to an AudioBuffer, consisting of float arrays internally---essentially PCM. (Firefox, however, has an optimization where, in many cases, it can use arrays of 16-bit integers instead of floats.)
If you use the number of channels and the duration of an AudioBuffer, you can get a pretty good estimate of the memory used.
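As a rough worked example, assuming the usual float32 representation (Firefox's int16 optimization would roughly halve these numbers):

// One float32 (4 bytes) per sample per channel. For ~3 minutes of stereo
// audio at 44.1 kHz, that is about 63 MB, close to the ~70 MB guess above.
const seconds = 180;
const sampleRate = 44100;
const channels = 2;
const bytes = seconds * sampleRate * channels * 4; // 63,504,000 bytes

// Or, from an existing AudioBuffer:
const estimateBytes = (buffer) => buffer.length * buffer.numberOfChannels * 4;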
If this is not suitable for your use, the only alternative is to use a MediaElementAudioSourceNode and friends to stream the compressed file to the browser. You lose sample-accurate control of the source, however.
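A minimal sketch of that streaming route (the URL is a placeholder, and the context should be created or resumed from a user gesture):

// Stream the compressed file instead of decoding it fully into memory.
const audioCtx = new AudioContext();
const audioEl = new Audio('music.mp3'); // placeholder URL
audioEl.crossOrigin = 'anonymous';      // needed if the file is served cross-origin

const source = audioCtx.createMediaElementSource(audioEl);
source.connect(audioCtx.destination);   // or route through other Web Audio nodes
audioEl.play();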
I've discovered that (at least in Chrome) Web Audio resamples WAV files to 48 kHz when using decodeAudioData. Is there any way to prevent this and force it to use the file's original sample rate? I'm sure this is fine for game development, but I'm trying to write some audio editing tools, and this sort of thing isn't cool. I'd like to be fully in control of when/if resampling occurs.
As far as I know, you're just going to get whatever sampling rate your AudioContext is using (which will be determined by your sound card, I believe).
They lay out the steps here: https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#dfn-decodeAudioData
"The decoding thread will take the result, representing the decoded linear PCM audio data, and resample it to the sample-rate of the AudioContext if it is different from the sample-rate of audioData. The final result (after possibly sample-rate converting) will be stored in an AudioBuffer."
Nope, you can't prevent the resampling of decodeAudioData into the AudioContext's sampleRate. Load and create AudioBuffers yourself, or decode the file into a buffer in an OfflineAudioContext that is fixed to the rate it was originally set to (although it's going to be hard to tell what that is, I imagine).
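A sketch of that OfflineAudioContext route, assuming an uncompressed WAV with a canonical 44-byte header so the original rate can be read straight out of the fmt chunk:

async function decodeAtOriginalRate(arrayBuffer) {
  // In a canonical RIFF/WAVE header the channel count is at byte 22 and the
  // sample rate at bytes 24-27 (little-endian). This breaks on non-canonical files.
  const view = new DataView(arrayBuffer);
  const channels = view.getUint16(22, true);
  const fileRate = view.getUint32(24, true);

  // An OfflineAudioContext fixed to the file's own rate will not resample.
  const offlineCtx = new OfflineAudioContext(channels, 1, fileRate);
  return offlineCtx.decodeAudioData(arrayBuffer);
}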
There is discussion on this point - https://github.com/WebAudio/web-audio-api/issues/30.
There is now a web component for loading audio using SoX:
https://www.npmjs.com/package/sox-element
It allows you to decode audio at the original sample rate. The data is unaltered from the original.
I am getting 16-bit audio from the server; that is what I am currently sending from the server. It is interleaved, which means I need to loop over it in JavaScript and separate the left and right channels into two 32-bit float arrays.
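The de-interleaving loop I mean looks roughly like this (simplified):

// Interleaved signed 16-bit PCM (L R L R ...) -> two Float32Arrays in [-1, 1].
function deinterleave(int16Samples) {
  const frames = int16Samples.length / 2;
  const left = new Float32Array(frames);
  const right = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    left[i] = int16Samples[2 * i] / 32768;
    right[i] = int16Samples[2 * i + 1] / 32768;
  }
  return { left, right };
}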
Well, this is just too slow for JavaScript to execute and schedule the play time; things get out of sync. This is a live stream, so the Web Audio API seems to be designed only for local synths and such, and streaming PCM does not seem to be a good approach. I know that you would never send PCM to begin with. Say I want to send Vorbis or something similar instead: it has to be in a container like .ogg or WebM, but then the browser has its own internal buffering and we have very little or no control over it.
So I tried sending ADPCM and decoding it to PCM on the client in JavaScript. That is also too slow.
If I preprocess the data on the server instead (de-interleave it there and convert it to 32-bit floats) and then send it to the client, the data size doubles, from 16 bit to 32 bit.
So what is the best way to render 16-bit audio without processing on the client side?
Also, can you play audio from worker threads? Would implementing the conversions in a worker thread help? I mean, there is all that WebSocket communication going on, and JS is single-threaded.
I would also like to add that the computation works much better in Chrome on a Mac Pro (I hear almost no glitches between samples) compared to a client running on a PC.
No, you cannot currently play audio from Worker threads. However, I really doubt that your problem is in the cost of de-interleaving audio data; have you tried just sending a mono stream? Properly synchronizing and buffering a stream in a real network environment is quite complex.