I'm using the Web Audio API to get the frequency of the dominant note picked up by a microphone.
The problem is that if I use an FFT that is too big, the program can't compute it, and if I use a small FFT, the lower frequencies cannot be computed correctly.
So I guess the solution is to write my own algorithm that computes the FFT in parts.
However, to do that I have to work directly with the values in the ScriptProcessor buffer. Can someone tell me what these values are? I also need to know whether the ScriptProcessor buffer size matters for getting a good result.
Thanks in advance
I am trying to achieve pitch detection and, moreover, learn some basic audio physics along the way. I am really new to this and just trying to understand how this whole thing works...
My question is: what exactly is the AudioBuffer, how is the data coming from getChannelData related to frequencies, and how can I extract frequency data from the AudioBuffer?
Also, if someone can explain a bit about sample rates etc., that would be great.
Thanks!
An AudioBuffer simply represents an audio resource, namely audio samples and additional information about the audio.
For instance, you can access the sampleRate property of an AudioBuffer object to know about the sampling frequency of the audio contained in the AudioBuffer.
Using getChannelData returns an array of audio samples. At every interval given by the sampling rate, you have a number (between -1.0 and +1.0 for 32-bit IEEE float audio samples) corresponding to the amplitude of the audio at that instant. Thus, this array of samples contains time-domain audio information.
To do pitch detection, a common approach is to work in the frequency domain, and to go from the time domain to the frequency domain you need the Fourier transform. If you want to understand the underlying DSP (digital signal processing) principles, you can use a pure FFT library (for instance, node-fft). If you just want to achieve pitch detection, using a turnkey solution such as pitch.js will be easier.
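As a minimal sketch of the time-domain to frequency-domain step, the function below runs a naive discrete Fourier transform over an array of samples and returns the frequency of the strongest bin. The names dominantFrequency and samples are illustrative only; samples stands in for the Float32Array returned by getChannelData. This is not how node-fft or pitch.js work internally, and the strongest bin is not always the perceived pitch (a harmonic can be louder than the fundamental).

```javascript
// Naive DFT scan: O(N^2), so only suitable for short buffers. Illustrative only.
// `samples` stands in for the Float32Array returned by getChannelData().
function dominantFrequency(samples, sampleRate) {
  const n = samples.length;
  let bestBin = 0;
  let bestMagnitude = 0;
  // For a real signal only bins below n / 2 are distinct; bin 0 (DC) is skipped.
  for (let k = 1; k < n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    const magnitude = Math.hypot(re, im);
    if (magnitude > bestMagnitude) {
      bestMagnitude = magnitude;
      bestBin = k;
    }
  }
  // Bin k corresponds to k * sampleRate / n Hz, so the bin width is sampleRate / n.
  return (bestBin * sampleRate) / n;
}
```

The bin width sampleRate / n is why short buffers resolve low notes poorly: with 1000 samples at 8000 Hz each bin is 8 Hz wide, which is coarse compared with the spacing of notes in the bass range.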
As a result of a learning algorithm in JavaScript, I have magnitude data for signals at certain frequency bins, basically spectrogram data. What I want to do is to construct a composite sound signal.
My main question is: Is this possible?
Since a given magnitude can come from infinitely many complex numbers, i.e. any point on a circle in the complex plane, it is not clear which one to use. This might not matter, but to build the signal, in my limited understanding, I would need to know the exact complex number so that I know its phase. This might be very wrong, but that's why I'm here! :)
Then, if this is possible, how would I do this?
I thought about using an inverse FFT, but I think I do not have the right input.
Please correct wrong statements.
Any relevant insights are welcome, I have a lot to learn about this, and I want to.
Thanks!
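To make the phase ambiguity concrete, here is a minimal sketch that builds a signal as a sum of cosines, one per frequency bin. The phases argument is an assumption supplied by the caller: the magnitudes alone do not determine it, and different choices give different waveforms with the same magnitude spectrum. The function name and parameters are hypothetical, not part of any library.

```javascript
// Sum one cosine per bin. `phases` is a caller-supplied assumption (for example
// all zeros), because magnitude data alone does not pin it down.
function synthesize(frequencies, magnitudes, phases, sampleRate, length) {
  const out = new Float32Array(length);
  for (let t = 0; t < length; t++) {
    let value = 0;
    for (let i = 0; i < frequencies.length; i++) {
      const angle = (2 * Math.PI * frequencies[i] * t) / sampleRate + phases[i];
      value += magnitudes[i] * Math.cos(angle);
    }
    out[t] = value;
  }
  return out;
}
```

Calling it twice with the same magnitudes but different phases shows that the first sample, and the waveform in general, changes while the per-bin magnitudes stay the same.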
I am writing a simple MPEG-DASH streaming player using the HTML5 video element.
I am creating a MediaSource and attaching a SourceBuffer to it. Then I am appending DASH fragments to this SourceBuffer and everything is working fine.
Now, what I want to do is pre-fetch those segments dynamically, depending on the current time of the media element.
While doing this I ran into a lot of questions that are not answered by the MediaSource documentation.
Is it possible to know how much data a SourceBuffer can hold at a time? If I have a very large video and append all the fragments to the SourceBuffer, will it accommodate all of them, cause errors, or slow down my browser?
How do I compute the number of fragments in the SourceBuffer?
How do I compute the presentation time or end time of the last segment in the SourceBuffer?
How do we remove only a specific set of fragments from the SourceBuffer and replace them with segments of other resolutions? (I want to do this to support adaptive resolution switching at run time.)
Thanks.
The maximum amount of buffered data is an implementation detail and is not exposed to the developer in any way AFAIK. According to the spec, when appending new data the browser will execute the coded frame eviction algorithm, which removes any buffered data deemed unnecessary by the browser. Browsers tend to remove any part of the stream that has already been played and don't remove parts of the stream that are in the future relative to the current time. This means that if the stream is very large and the DASH player downloads it very quickly, faster than the MSE can play it, then there will be a lot of the stream that cannot be removed by the coded frame eviction algorithm, and this may cause the appendBuffer method to throw a QuotaExceededError. Of course a good DASH player should monitor the buffered amount and not download excessive amounts of data.
In plain text: You have nothing to worry about, unless your player downloads all of the stream as quickly as possible without taking the current buffered amount into account.
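One way to monitor the buffered amount, as a rough sketch: measure how many seconds are buffered ahead of the playhead and only fetch the next segment when that falls below a target. The buffered parameter is assumed to be shaped like the TimeRanges object exposed by videoElement.buffered (length, start(i), end(i)); the function names and the TARGET_BUFFER_SECONDS value are hypothetical.

```javascript
// `buffered` is anything shaped like a TimeRanges object: length, start(i), end(i).
function bufferedAhead(buffered, currentTime) {
  for (let i = 0; i < buffered.length; i++) {
    if (currentTime >= buffered.start(i) && currentTime <= buffered.end(i)) {
      return buffered.end(i) - currentTime;
    }
  }
  return 0; // currentTime is not inside any buffered range
}

// Hypothetical tuning value; pick one that suits the stream's bitrate.
const TARGET_BUFFER_SECONDS = 30;

function shouldFetchNextSegment(buffered, currentTime) {
  return bufferedAhead(buffered, currentTime) < TARGET_BUFFER_SECONDS;
}
```

In a player this could be called from a timeupdate handler with videoElement.buffered and videoElement.currentTime.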
The MSE API works with a stream of data (audio or video). It has no knowledge of segments. Theoretically you could get the buffered time range and map it to a pair of segments using the timing data provided in the MPD. But this is fragile IMHO. It is better to keep track of the segments you have downloaded and fed to the buffer.
Look at the buffered property. If the buffer is one contiguous range, the easiest way to get the end time in seconds of the last appended segment is simply: videoElement.buffered.end(0). If there are gaps, use the last range instead: videoElement.buffered.end(videoElement.buffered.length - 1)
If by presentation time you mean the Presentation TimeStamp of the last buffered frame then there is no way of doing this apart from parsing the stream itself.
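A small sketch of reading the buffered end time in a way that also handles gaps and an empty buffer. As above, buffered is assumed to be shaped like a TimeRanges object, and lastBufferedEnd is a hypothetical helper name.

```javascript
// Returns the end time (in seconds) of the last buffered range, or null if
// nothing is buffered. `buffered` is shaped like a TimeRanges object.
function lastBufferedEnd(buffered) {
  if (buffered.length === 0) {
    return null; // calling end() on an empty TimeRanges would throw
  }
  return buffered.end(buffered.length - 1);
}
```

Note that this is the end of the buffered media timeline, not the presentation timestamp of an individual frame.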
To remove buffered data you can use the remove method.
Quality switching is actually quite easy although the spec doesn't say much about it. To switch the qualities the only thing you have to do is append the init header for the new quality to the SourceBuffer. After that you can append the segments for the new quality as usual.
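The ordering described above can be sketched as a small planning function: whenever the quality changes, the init header for the new quality goes in front of the media segment. All names here (planAppends, state, initSegments) are hypothetical, and the buffers are treated as opaque values.

```javascript
// Returns the buffers to append, in order, for the next media segment.
// `initSegments` maps a quality id to its init header.
// `state.currentQuality` remembers the quality of the last appended segment.
function planAppends(state, quality, mediaSegment, initSegments) {
  const queue = [];
  if (state.currentQuality !== quality) {
    queue.push(initSegments[quality]); // init header first when the quality changes
    state.currentQuality = quality;
  }
  queue.push(mediaSegment);
  return queue;
}
```

Each entry in the returned queue would then be passed to sourceBuffer.appendBuffer one at a time, waiting for the updateend event before appending the next, since appendBuffer throws while the SourceBuffer is still updating.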
I personally find the YouTube DASH MSE test player a good place to learn.
The amount of data a SourceBuffer can hold depends on the MSE implementation and therefore on the browser vendor. Once you reach the maximum, this will of course result in an error.
You cannot directly get the number of segments in a SourceBuffer, but you can get the actual buffered time. Combined with the duration of the segments, you can compute it.
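As a rough sketch of that computation, assuming every segment has the same duration (which is an assumption about the stream, not something MSE guarantees), the segment count can be estimated from the total buffered time. The function names are hypothetical, and buffered is assumed to be shaped like a TimeRanges object.

```javascript
// Sum the lengths of all buffered ranges, in seconds.
function totalBufferedSeconds(buffered) {
  let total = 0;
  for (let i = 0; i < buffered.length; i++) {
    total += buffered.end(i) - buffered.start(i);
  }
  return total;
}

// Estimate only: assumes a constant segment duration and rounds to the
// nearest whole segment, since buffered ranges rarely align exactly.
function estimateBufferedSegments(buffered, segmentDuration) {
  return Math.round(totalBufferedSeconds(buffered) / segmentDuration);
}
```

Because the browser may evict part of a segment, the result is an approximation; keeping a list of appended segments in the player is more reliable.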
I recommend having a look at open-source DASH player projects like dash.js or ExoPlayer, which implement all the functionality you want. Or maybe even use a commercial solution like bitdash.
I'm testing a streaming web application that uses the MediaSource API. Everything works fine; however, when I stream big files (e.g. 240 MB or more), the video buffer behaves strangely. To be clearer, I attached three images you can check. My script creates a MediaSource object, then calls addSourceBuffer, and then calls appendBuffer as many times as there are chunks to append. I suspect I am not configuring the buffer correctly, so the MediaSource API falls back to a default buffer length.
Could you help me please?
Visit https://productforums.google.com/forum/#!category-topic/chrome/report-a-problem-and-get-troubleshooting-help/windows8/Stable/0igRzDJQ7ds
There is a maximum limit on the size of the SourceBuffers; maybe you're exceeding it? When the limit is exceeded, the browser will start evicting buffered segments according to some defined algorithm.
If you are appending as much data to the source buffers as you can, you might want to introduce a limit. E.g. for us, when playing HD video at 4.5 Mbps, we could have a buffer size of about 3-4 minutes before we saw some odd behaviour (e.g. segments being evicted in front of the video's currentTime).
I have a web page that plays MP3s. I would like to create a visual graph of each MP3: volume level vs. time, like SoundCloud does. The only idea I have been able to come up with is to decode the MP3 with the Web Audio API, connect an AnalyserNode, play it through and record the levels at various times. Surely there is a better way. Does anyone know what it is?
You can grab the full AudioBuffer after decodeAudioData and just go through the samples that way (using getChannelData()). The samples will be floats from -1 to +1.
All you really have to do is group the samples into buckets of n length, where n is the total length of the AudioBuffer divided by the total number of pixels you want to render the waveform into. Then just find the maximum absolute value in each bucket and those are the values you'll draw.
No AnalyserNode needed for that, so you can do it all really quickly instead of having to do it in real-time.
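The bucketing described above can be sketched as follows. The function name computePeaks is hypothetical; samples stands in for the Float32Array returned by getChannelData(), and pixelCount is the width in pixels of the waveform you want to draw.

```javascript
// Reduce an array of samples to one peak (maximum absolute value) per pixel.
function computePeaks(samples, pixelCount) {
  const bucketSize = Math.ceil(samples.length / pixelCount);
  const peaks = new Float32Array(pixelCount);
  for (let p = 0; p < pixelCount; p++) {
    const start = p * bucketSize;
    const end = Math.min(start + bucketSize, samples.length);
    let max = 0;
    for (let i = start; i < end; i++) {
      const value = Math.abs(samples[i]);
      if (value > max) {
        max = value;
      }
    }
    peaks[p] = max; // stays 0 if the bucket falls past the end of the samples
  }
  return peaks;
}
```

Each value in the result is between 0 and 1, so it can be multiplied by half the canvas height to get the bar or line height for that pixel column.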