I'm reading an H.264 stream from a camera on a Raspberry Pi. I'm trying to pass this to Broadway via WebSockets to render in a web page.
The stream contains NAL units, and I'm chunking it on the [0,0,0,1] start code prefix, to send and then decode NAL units individually. I think that's working fine, but Broadway can't decode the result I end up with.
Digging into the parsing code I've based this on, though, it seems to expect the 5th byte (straight after the start code prefix) to be one of:
0x65 - an I frame
0x41 - a P frame
0x67 - an SPS frame
0x68 - a PPS frame
I've seen a lot of mention of these elsewhere too. All the units I have coming through, though, seem to start with (in order):
0x27 0x64 (1st unit)
0x28 0xEE (2nd unit)
0x25 0x88 (3rd unit, then intermittently later on)
0x21 0x9A (every single other unit in the stream)
What do these headers mean in an H.264 stream? Do they suggest something about what I need to do to match Broadway's expectations?
(If the full code would be useful to understand this better, see https://github.com/pimterry/pi-cam/tree/d801de9b)
This was a red herring: the actual issue for me here was that some existing frame-dropping logic meant that I wasn't passing Broadway the first few frames of the stream, so it was failing to render. Replaying the SPS and PPS units for all new connections, and making sure they're never dropped, has fixed the issue nicely.
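A rough sketch of that fix on the server side (the WebSocket handling and all names here are hypothetical, not the actual pi-cam code):

// Cache the SPS (type 7) and PPS (type 8) units, and replay them to
// every new client before any other NAL units are sent.
var sps = null, pps = null, clients = [];

function onNalUnit(nal) { // nal: bytes following a 00 00 00 01 start code
    var type = nal[0] & 0x1F; // low 5 bits of the first byte = nal_unit_type
    if (type === 7) sps = nal;
    if (type === 8) pps = nal;
    clients.forEach(function (ws) { ws.send(nal); }); // never drop SPS/PPS
}

function onNewClient(ws) {
    if (sps) ws.send(sps); // Broadway needs these before any slices
    if (pps) ws.send(pps);
    clients.push(ws);
}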
I also worked out what these bytes are, though, which helped and may be useful to others for reference:
Hex    Binary        NAL type   Meaning
0x65 = 0 11 00101 =  type 5     Coded slice of an IDR picture (I-frame)
0x41 = 0 10 00001 =  type 1     Coded slice of a non-IDR picture (P-frame)
0x27 = 0 01 00111 =  type 7     Sequence parameter set (SPS)
0x28 = 0 01 01000 =  type 8     Picture parameter set (PPS)
0x25 = 0 01 00101 =  type 5     Coded slice of an IDR picture (I-frame)
0x21 = 0 01 00001 =  type 1     Coded slice of a non-IDR picture (P-frame)
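For reference, a quick sketch of decoding that header byte in JavaScript (the function name is illustrative): the leading bit is forbidden_zero_bit, the next two bits are nal_ref_idc, and the low five bits are nal_unit_type.

// Decode the byte straight after the 00 00 00 01 start code prefix
function parseNalHeader(b) {
    return {
        forbiddenZeroBit: (b >> 7) & 0x01, // always 0 in a valid stream
        nalRefIdc:        (b >> 5) & 0x03, // 2 bits: reference importance
        nalUnitType:       b       & 0x1F  // 5 bits: 1=non-IDR, 5=IDR, 7=SPS, 8=PPS
    };
}
// parseNalHeader(0x27) => { forbiddenZeroBit: 0, nalRefIdc: 1, nalUnitType: 7 } (an SPS)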
Special thanks to Jaromanda X, though: the NAL units article [here] and the nal_ref_idc article made working this out much easier.
I've read a lot about MIDI resolution and studied some code like Tone.js and heartbeat, but I don't understand why there are different Pulses Per Quarter Note (PPQN) values and what effect they have on playing notes. When I have 960 PPQN, this means 1 quarter note has 960 ticks, 1 eighth note 480 ticks, etc. And if I understand it correctly, the delta time is just a relative value.
What I don't understand is: what should the PPQN be when I play notes in JavaScript, and when I set the PPQN, why should it have that value? For example, I use the WebAudio API for playing notes:
function nextNote() {
    var quarterBeat = 60.0 / tempo;
    nextNoteDuration = nextNoteDuration + (quarterBeat / 32);
    currentNote++;
}
This way I can play different note durations. Now when I read the MIDI file, should I just compare the delta time and convert it to my sequencer's current playback? For example, when I read a MIDI file with these values:
Tempo = 120
PPQN = 960
4 Quarter Notes
I read the MIDI file and save the notes in an array (assume the delta time for each note is a quarter note):
duration = [quarterNote, quarterNote, quarterNote, quarterNote]
And play the notes:
while (nextNoteDuration < audioContext.currentTime) {
    if (duration[i] % 32 == 0) playNote(currentNote, nextNoteDuration);
    nextNote();
    i++;
}
Should I use PPQN only when exporting a MIDI file? If so, in relation to what should I set the PPQN? I hope someone can explain this to me in more detail.
PPQ is about resolution. More specifically about time resolution.
what should the PPQN be when I play notes in JavaScript, and when I set the PPQN why should it have this value?
When your program plays notes, it may use whatever time units you want or need, for instance milliseconds, nanoseconds, movie frames, or ticks, as absolute wall-clock or relative times. It depends on your sequencer's capabilities and your software's features. You are only required to choose a PPQ value when storing MIDI sequences as MIDI files. Of course, you need to be able to convert time units when reading/storing MIDI files.
Now when I read the MIDI file, should I just compare the delta time
and convert it to my sequencers current playback?
Of course, if internally your music events use a different time representation, then you need to be able to translate the delta times from the MIDI file to your internal representation.
You are only asking about PPQ, but this value appears only once, in the MIDI file header. Tempo events, in contrast, may occur multiple times along the file, and each one affects the translation of subsequent delta times into wall-clock times, until the next tempo event. If your sequencer/player allows the user to change or add tempo events, it would be a good idea to use relative units instead of wall-clock units for the internal time representation (or both).
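As a minimal sketch of that translation (the function name is mine, not from any particular library):

// delta ticks -> seconds, given the file's PPQ and the tempo currently
// in force (MIDI stores tempo as microseconds per quarter note)
function ticksToSeconds(deltaTicks, ppq, usPerQuarter) {
    return (deltaTicks / ppq) * (usPerQuarter / 1000000);
}
// e.g. ticksToSeconds(960, 960, 500000) === 0.5, i.e. one quarter note
// at 120 bpm lasts half a second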
Should I use PPQN only when exporting a MIDI file? If so, in relation
to what should I set the PPQN?
Yes, you need to choose a suitable value for PPQ when you export a MIDI file. If your internal time units are relative (as in most sequencers), then use your internal unit resolution for PPQ. If you convert from wall-clock units to delta times, then you need to choose a resolution with which your conversion loses the least detail to quantization. Higher values are better in this sense. Rosegarden always stores MIDI files with 960 PPQ; Steinberg's Cubase used 480. I remember Cakewalk, a long time ago, having only 120 PPQ, though later versions allowed this value to be changed as a configuration setting. In general you need to accommodate not only all note values shorter than a quarter, including subdivisions like triplets, but also effects like swing that rely on fine timing adjustments.
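A small sketch of why higher PPQ values quantize better when converting wall-clock times to ticks (the numbers are illustrative):

// seconds -> ticks at export time; Math.round quantizes to one tick
function secondsToTicks(seconds, ppq, bpm) {
    return Math.round(seconds * (bpm / 60) * ppq);
}
// A swung note at 0.2604 s, tempo 120 bpm:
// PPQ 120 -> 62 ticks  -> reads back as 0.258333 s (error ~2 ms)
// PPQ 960 -> 500 ticks -> reads back as 0.260417 s (error ~0.02 ms)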
In a standard MIDI file, there’s information in the file header about “ticks per quarter note”, a.k.a. “parts per quarter” (or “PPQ”). For the purpose of this discussion, we’ll consider “beat” and “quarter note” to be synonymous, so you can think of a “tick” as a fraction of a beat. The PPQ is stated in the last word of information (the last two bytes) of the header chunk that appears at the beginning of the file. The PPQ could be a low number such as 24 or 96, which is often sufficient resolution for simple music, or it could be a larger number such as 480 for higher resolution, or even something like 500 or 1000 if one prefers to refer to time in milliseconds.
What the PPQ means in terms of absolute time depends on the designated tempo. By default, the time signature is 4/4 and the tempo is 120 beats per minute. That can be changed, however, by a “meta event” that specifies a different tempo. (You can read about the Set Tempo meta event message in the file format description document.) The tempo is expressed as a 24-bit number that designates microseconds per quarter-note. That’s kind of upside-down from the way we normally express tempo, but it has some advantages. So, for example, a tempo of 100 bpm would be 600000 microseconds per quarter note, so the MIDI meta event for expressing that would be FF 51 03 09 27 C0 (the last three bytes are the Hex for 600000). The meta event would be preceded by a delta time, just like any other MIDI message in the file, so a change of tempo can occur anywhere in the music.
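To make that concrete, here's a sketch of building those Set Tempo bytes from a bpm value (a hypothetical helper, not from any library):

// bpm -> Set Tempo meta event: FF 51 03, then 24 bits of us-per-quarter
function tempoMetaEvent(bpm) {
    var usPerQuarter = Math.round(60000000 / bpm);
    return [0xFF, 0x51, 0x03,
            (usPerQuarter >> 16) & 0xFF,
            (usPerQuarter >> 8) & 0xFF,
            usPerQuarter & 0xFF];
}
// tempoMetaEvent(100) => [0xFF, 0x51, 0x03, 0x09, 0x27, 0xC0], as above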
Delta times are always expressed as a variable-length quantity, the format of which is explained in the document. For example, if the PPQ is 480 (standard in most MIDI sequencing software), a delta time of a dotted quarter note (720 ticks) would be expressed by the two bytes 85 50 (hexadecimal).
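A sketch of that encoding (hypothetical helper):

// encode a delta time as a MIDI variable-length quantity: 7 bits per
// byte, high bit set on every byte except the last
function toVLQ(value) {
    var bytes = [value & 0x7F];
    while ((value >>= 7) > 0) bytes.unshift((value & 0x7F) | 0x80);
    return bytes;
}
// toVLQ(720) => [0x85, 0x50], the dotted-quarter example above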
So, bearing all that in mind, there is a correspondence between delta times expressed in terms of ticks and note values as we think of them in human terms. The relationship depends on the PPQ specified in the header chunk. For example, if the PPQ is 96 (hex 60), then a Note On for middle C on MIDI channel 10 with a velocity of 127, lasting a dotted quarter note (1.5 beats), would be expressed as:
00 99 3C 7F    // delta time 0 ticks, then 153 60 127 (Note On, velocity 127)
81 10 99 3C 00 // delta time 144 ticks (VLQ 81 10), then 153 60 0 (velocity 0 = note off)
I want to solve Project Euler Problem 1:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
Here's my code:
\documentclass[10pt,a4paper]{article}
\usepackage{hyperref}
\newcommand*\rfrac[2]{{}^{#1}\!/_{#2}}
\title{Solution to Project Euler Problem 1}
\author{Aadit M Shah}
\begin{document}
\maketitle
We want to find the sum of all the multiples of 3 or 5 below 1000. We can use the formula of the $n^{th}$ triangular number\footnote{\url{http://en.wikipedia.org/wiki/Triangular_number}} to calculate the sum of all the multiples of a number $m$ below 1000. The formula of the $n^{th}$ triangular number is:
\begin{equation}
T_n = \sum_{k = 1}^n k = 1 + 2 + 3 + \ldots + n = \frac{n (n + 1)}{2}
\end{equation}
If the last multiple of $m$ below 1000 is $x$ then $n = \rfrac{x}{m}$. The sum of all the multiples of $m$ below 1000 is therefore:
\begin{equation}
m \times T_{\frac{x}{m}} = m \times \sum_{k = 1}^{\frac{x}{m}} k = \frac{x (\frac{x}{m} + 1)}{2}
\end{equation}
Thus the sum of all the multiples of 3 or 5 below 1000 is equal to:
\begin{equation}
3 \times T_{\frac{999}{3}} + 5 \times T_{\frac{995}{5}} - 15 \times T_{\frac{990}{15}} = \frac{999 \times 334 + 995 \times 200 - 990 \times 67}{2}
\end{equation}
\end{document}
I compiled it successfully using pdflatex:
$ pdflatex Problem1.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014/Arch Linux) (preloaded format=pdflatex)
.
.
.
Output written on Problem1.pdf (1 page, 106212 bytes).
Transcript written on Problem1.log.
It generated the expected output PDF file, along with a bunch of other files with scary extensions.
How do I run this PDF file so that it computes the solution? I know the solution to the problem but I want to know how to execute the PDF file to compute the solution.
The reason why I prefer LaTeX over other programming languages is because it supports literate programming, an approach to programming introduced by Donald Knuth, the creator of TeX and one of the greatest computer scientists of all time.
Edit: It would also be nice to be able to print the computed solution either on the screen or on paper. Computing the solution without printing it is useful for heating the room but it is so hot already with the onset of summer and global warming. In addition, printing the solution would teach me how to write a hello world program in LaTeX.
So, today seems to be a safe day to tackle this problem...
The OP does not seem to be quite so PDF-savvy.
However, he obviously is quite a literate LaTeX guy.
Which means he must also know TeX very well, given he is so much of a Donald Knuth admirer...
So much for the preliminaries.
Now for the real meat.
First, to quote the official PDF-1.7 specification document:
PDF is not a programming language, and a PDF file is not a program.
(p. 92, Section 7.10.1)
However, the predecessor of the PDF format, PostScript, IS a Turing-complete programming language... Turing-complete, just as TeX is, the creation of Donald Knuth, one of the greatest computer scientists of all time.
PostScript files, on the other hand, ARE programs, and can easily be executed by PostScript printers (though this execution time cannot reliably be determined in advance).
Hence, and second, the OP should be able to find a way to convert his high-level LaTeX code to low-level TeX code.
That code needs to emit a PostScript program, which in turn can be executed by a PostScript printer.
Writing that TeX code should be trivial for somebody like the OP, once he is given the PostScript code that should be the result of his TeX code.
I myself am not so well-versed with the TeX aspect of that problem solving procedure.
However, I can help with the PostScript.
The PostScript which the OP's TeX code should produce goes like this (there are for sure more optimized versions possible -- this is only a first, quick'n'dirty shot at it):
%!PS
% define variables
/n1 999 def
/t1 334 def
/n2 995 def
/t2 200 def
/n3 990 def
/s1 67 def
/t3 2 def
% run the computational code
n1 t1 mul      % push 999 * 334
n2 t2 mul      % push 995 * 200
n3 s1 mul      % push 990 * 67
sub            % 995*200 - 990*67
add            % 999*334 + that
t3 div         % divide by 2; the result stays on the stack
% print result on printer, not on <stdout>
/Helvetica findfont
24 scalefont
setfont
30 500 moveto
(Result for 'Project Euler Problem No. 1' :) show
/Helvetica-Bold findfont
48 scalefont
setfont
80 400 moveto
(          ) cvs show   % scratch string for cvs; must be long enough to hold the result's digits
showpage
Send this PostScript code to a PostScript printer, and it will compute and print the solution.
Update
To answer one of the comments: If you replace the last section of PostScript code starting with /Helvetica findfont with a simple print statement, it will not do what you might imagine.
print does not cause the printer to output paper. Instead it asks the PostScript interpreter to write the topmost item on the stack (which must be a (string)!) to the standard output channel. (If the topmost item on the stack is not of type (string), it will trigger a typecheck PostScript error.)
So sending a modified PostScript file (where print has replaced the last section of my PS code) to the printer will not work (unless that printer supports the interactive executive PostScript mode -- which is not a standard part of the PostScript language). It will work however if you feed that file to Ghostscript in a terminal or cmd.exe window.
You can't run PDF files. You need to use the latex command instead of pdflatex, e.g.
latex Problem1.tex
Here's some documentation
I've set up a web page with a theremin and I'm trying to change the color of a web page element based on the frequency of the note being played. The way I'm generating sound right now looks like this:
osc1 = page.audioCX.createOscillator();  // the audible tone
pos = getMousePos(page.canvas, ev);
osc1.frequency.value = pos.x;            // base pitch follows mouse X
gain = page.audioCX.createGain();
gain.gain.value = 60;                    // modulation depth, in Hz
osc2 = page.audioCX.createOscillator();  // 1 Hz LFO
osc2.frequency.value = 1;
osc2.connect(gain);
gain.connect(osc1.frequency);            // modulate osc1's frequency
osc1.connect(page.audioCX.destination);
What this does is oscillate the pitch of the sound created by osc1. I can change the color to the frequency of osc1 by using osc1.frequency.value, but this doesn't factor in the changes applied by the other parts.
How can I get the resultant frequency from those chained elements?
You have to do the addition yourself (osc1.frequency.value + output of gain).
The best current (but see below) way to get access to the output of gain is probably to use a ScriptProcessorNode. You can just use the last sample from each buffer passed to the ScriptProcessorNode, and set the buffer size based on how frequently you want to update the color.
(Note on ScriptProcessorNode: There is a bug in Chrome and Safari that makes ScriptProcessorNode not work if it doesn't have at least one output channel. You'll probably have to create it with one input and one output, have it send all zeros to the output, and connect it to the destination, to get it to work.)
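Here's a minimal sketch of that workaround, reusing the names from the question's code (the 1024-sample buffer size is an arbitrary choice, and updateColor is a hypothetical callback):

// Tap the modulation signal with a ScriptProcessorNode (1 input, 1
// output, connected to the destination, per the bug note above)
var tap = page.audioCX.createScriptProcessor(1024, 1, 1);
gain.connect(tap);
tap.connect(page.audioCX.destination);

tap.onaudioprocess = function (e) {
    var input = e.inputBuffer.getChannelData(0),
        output = e.outputBuffer.getChannelData(0);
    for (var i = 0; i < output.length; i++) output[i] = 0; // keep the tap silent
    // last sample of the block = current frequency offset in Hz
    updateColor(osc1.frequency.value + input[input.length - 1]);
};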
Near-future answer: You can also try using an AnalyserNode, but under the current spec, the time domain data can only be read from an AnalyserNode as bytes, which means the floating point samples are being converted to be in the range [0, 255] in some unspecified way (probably scaling the range [-1, 1] to [0, 255], so the values you need would be clipped). The latest draft spec includes a getFloatTimeDomainData method, which is probably your cleanest solution. It seems to have already been implemented in Chrome, but not Firefox, as far as I can tell.
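If getFloatTimeDomainData is available in your browser, a sketch might look like this (again reusing the question's names):

// Read float samples from an AnalyserNode (latest draft spec only)
var analyser = page.audioCX.createAnalyser(),
    samples = new Float32Array(analyser.fftSize);
gain.connect(analyser);

function currentFrequency() {
    analyser.getFloatTimeDomainData(samples);
    return osc1.frequency.value + samples[samples.length - 1];
}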
For a project, I'm retrieving a live audio stream via WebSockets from a Java server. On the server, I'm processing the samples as 16-bit/8000 Hz/mono in the form of 8-bit signed byte values (with two bytes making up one sample). On the browser, however, the lowest supported sample rate is 22050 Hz. So my idea was to "simply" upsample the existing 8000 Hz to 32000 Hz, which is supported and seems to me like an easy calculation.
So far, I've tried linear upsampling and cosine interpolation, but neither worked. In addition to sounding really distorted, the first one also added some clicking noises. I might also have trouble with the WebAudio API in Chrome, but at least the sound is playing and is barely recognizable as what it should be, so I guess it's not a codec or endianness problem.
Here's the complete code that gets executed when a binary packet with sound data is received. I'm creating new buffers and buffer sources all the time for the sake of simplicity (yeah, not good for performance). data is an ArrayBuffer. First, I'm converting the samples to float, then I'm upsampling.
// endianness-aware buffer view
var bufferView = new DataView(data),
    // the audio buffer to set for output
    buffer = _audioContext.createBuffer(1, 640, 32000),
    // reference to underlying buffer array
    buf = buffer.getChannelData(0),
    floatBuffer8000 = new Float32Array(160);
// 16 bit => float
for (var i = 0, j = null; i < 160; i++) {
    j = bufferView.getInt16(i * 2, false);
    floatBuffer8000[i] = (j > 0) ? j / 32767 : j / -32767;
}
// convert 8000 => 32000
var point1, point2, point3, point4, mu = 0.2, mu2 = (1 - Math.cos(mu * Math.PI)) / 2;
for (var i = 0, j = 0; i < 160; i++) {
    // index for dst buffer
    j = i * 4;
    // the points to interpolate between
    point1 = floatBuffer8000[i];
    point2 = (i < 159) ? floatBuffer8000[i + 1] : point1;
    point3 = (i < 158) ? floatBuffer8000[i + 2] : point1;
    point4 = (i < 157) ? floatBuffer8000[i + 3] : point1;
    // interpolate
    point2 = (point1 * (1 - mu2) + point2 * mu2);
    point3 = (point2 * (1 - mu2) + point3 * mu2);
    point4 = (point3 * (1 - mu2) + point4 * mu2);
    // put data into buffer
    buf[j] = point1;
    buf[j + 1] = point2;
    buf[j + 2] = point3;
    buf[j + 3] = point4;
}
// playback
var node = _audioContext.createBufferSource(0);
node.buffer = buffer;
node.connect(_audioContext.destination);
node.noteOn(_audioContext.currentTime);
//playback
var node=_audioContext.createBufferSource(0);
node.buffer=buffer;
node.connect(_audioContext.destination);
node.noteOn(_audioContext.currentTime);
Finally found a solution for this. The conversion from 16-bit to float is wrong; it just needs to be
floatBuffer8000[i]=j/32767.0;
Also, feeding the API with a lot of small samples doesn't work well, so you need to buffer some samples and play them together.
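A minimal sketch of both fixes together; the four-packet threshold and the helper names (onSamples, and upsampleTo32000 standing in for the interpolation loop above) are illustrative, not from the original code:

// Queue a few converted/upsampled packets and play them as one buffer
var queue = [];

function onSamples(bufferView) { // DataView over one 160-sample packet
    var floatBuffer8000 = new Float32Array(160);
    for (var i = 0; i < 160; i++) {
        // symmetric division; the original ternary flipped the sign
        // of negative samples, which caused the distortion
        floatBuffer8000[i] = bufferView.getInt16(i * 2, false) / 32767.0;
    }
    queue.push(upsampleTo32000(floatBuffer8000)); // Float32Array(640)
    if (queue.length < 4) return; // wait until ~80 ms is buffered
    var buffer = _audioContext.createBuffer(1, queue.length * 640, 32000),
        buf = buffer.getChannelData(0);
    for (var p = 0; p < queue.length; p++) buf.set(queue[p], p * 640);
    queue = [];
    var node = _audioContext.createBufferSource();
    node.buffer = buffer;
    node.connect(_audioContext.destination);
    node.noteOn(_audioContext.currentTime); // start() in the newer API
}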
Is there a way to render a visualization of an audio file?
Maybe with SoundManager2 / Canvas / HTML5 Audio?
Do you know of some techniques?
I want to create something like this:
You have a ton of samples and tutorials here: http://www.html5rocks.com/en/tutorials/#webaudio
For the moment it works in the latest Chrome and the latest Firefox (Opera?).
Demos: http://www.chromeexperiments.com/tag/audio/
To do it now, for all visitors of a web site, you can check out SoundManagerV2.js, which goes through a Flash "proxy" to access audio data: http://www.schillmania.com/projects/soundmanager2/demo/api/ (they are already working on the HTML5 audio engine, to release it as soon as major browsers implement it).
It's up to you to draw the three different kinds of audio data in a canvas: waveform, equalizer, and peak.
soundManager.defaultOptions.whileplaying = function() { // AUDIO analyzer !!!
$document.trigger({ // DISPATCH ALL DATA RELATIVE TO AUDIO STREAM // AUDIO ANALYZER
type : 'musicLoader:whileplaying',
sound : {
position : this.position, // In milliseconds
duration : this.duration,
waveformDataLeft : this.waveformData.left, // Array of 256 floating-point (three decimal place) values from -1 to 1
waveformDataRight: this.waveformData.right,
eqDataLeft : this.eqData.left, // Containing two arrays of 256 floating-point (three decimal place) values from 0 to 1
eqDataRight : this.eqData.right, // ... , the result of an FFT on the waveform data. Can be used to draw a spectrum (frequency range)
peakDataLeft : this.peakData.left, // Floating-point values ranging from 0 to 1, indicating "peak" (volume) level
peakDataRight : this.peakData.right
}
});
};
With HTML5 you can get:
var freqByteData = new Uint8Array(analyser.frequencyBinCount);
var timeByteData = new Uint8Array(analyser.frequencyBinCount);
function onaudioprocess() {
analyser.getByteFrequencyData(freqByteData);
analyser.getByteTimeDomainData(timeByteData);
/* draw your canvas */
}
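For the /* draw your canvas */ part, a minimal sketch (ctx, WIDTH and HEIGHT are assumed to come from your canvas element):

// Draw freqByteData (filled by getByteFrequencyData above) as bars
function drawSpectrum() {
    ctx.clearRect(0, 0, WIDTH, HEIGHT);
    var barWidth = WIDTH / freqByteData.length;
    for (var i = 0; i < freqByteData.length; i++) {
        var barHeight = (freqByteData[i] / 255) * HEIGHT;
        ctx.fillRect(i * barWidth, HEIGHT - barHeight, barWidth, barHeight);
    }
}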
Time to work! ;)
Run samples through an FFT, and then display the energy within a given range of frequencies as the height of the graph at a given point. You'll normally want the frequency ranges going from around 20 Hz at the left to roughly half the sampling rate at the right (or 20 kHz if the sampling rate exceeds 40 kHz).
I'm not so sure about doing this in JavaScript, though. Don't get me wrong: JavaScript is perfectly capable of implementing an FFT, but I'm not at all sure about doing it in real time. OTOH, for user viewing, you can get by with around 5-10 updates per second, which is likely to be a considerably easier target to reach. For example, 20 ms of samples updated every 200 ms might be halfway reasonable to hope for, though I certainly can't guarantee that you'll be able to keep up with that.
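As a sketch of that idea, using the Web Audio AnalyserNode as the FFT (analyser is an AnalyserNode you've created; the band, sample rate, and 200 ms timer are illustrative):

// Average the FFT bin energy over a frequency band, ~5 updates/second
var bins = new Uint8Array(analyser.frequencyBinCount);

function bandLevel(lowHz, highHz, sampleRate) {
    analyser.getByteFrequencyData(bins);
    var hzPerBin = (sampleRate / 2) / bins.length,
        lo = Math.floor(lowHz / hzPerBin),
        hi = Math.min(Math.ceil(highHz / hzPerBin), bins.length - 1),
        sum = 0;
    for (var i = lo; i <= hi; i++) sum += bins[i];
    return sum / (hi - lo + 1) / 255; // 0..1 average level for the band
}

setInterval(function () {
    var level = bandLevel(20, 200, 44100); // e.g. a bass band
    // ... set the graph height at this point from `level` ...
}, 200);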
http://ajaxian.com/archives/amazing-audio-sampling-in-javascript-with-firefox
Check out the source code to see how they're visualizing the audio
This isn't possible yet except by fetching the audio as binary data and unpacking the MP3 (not JavaScript's forte), or maybe by using Java or Flash to extract the bits of information you need (it seems possible but it also seems like more headache than I personally would want to take on).
But you might be interested in Dave Humphrey's audio experiments, which include some cool visualization stuff. He's doing this by making modifications to the browser source code and recompiling it, so this is obviously not a realistic solution for you. But those experiments could lead to new features being added to the <audio> element in the future.
For this you would need to do a Fourier transform (look for FFT), which will be slow in JavaScript, and not possible in real time at present.
If you really want to do this in the browser, I would suggest doing it in Java/Silverlight, since they deliver the fastest number-crunching speed in the browser.