Mapping frequencies to loudness using the Web Audio API


I want to detect played notes and chords using the Web Audio API (using the microphone as an input device). Before I can analyse the data, I need the individual frequencies mapped to their loudness. I started with the following snippet:
const stream = await navigator.mediaDevices.getUserMedia({
  audio: true,
  video: false
});
const context = new AudioContext();
const source = context.createMediaStreamSource(stream);
const analyser = context.createAnalyser();
source.connect(analyser); // without this the analyser never receives the microphone signal
const data = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(data); // a snapshot of the current spectrum; call again for new values
data is now an array of values between 0 and 255. My question is: how can I map the frequencies to the loudness values in the data array?
Ideally, I'd like an object like this:
{
  ...
  438: 128,
  439: 200,
  440: 255,
  441: 200,
  ...
}
Thanks for your help.

The value in data[k] corresponds to the frequency k * Nyquist / frequencyBinCount, where Nyquist is one half of the sampling frequency, AudioContext.sampleRate (frequencyBinCount is half of analyser.fftSize).
I think that's what you're asking for. If not please clarify.
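To build the object from the question, a minimal sketch could look like this. It assumes `data` was filled by `getByteFrequencyData()` and `sampleRate` is `AudioContext.sampleRate`; `binWidth`, `spectrumToObject` and `loudnessAt` are hypothetical helper names. Keep in mind that the bins are `sampleRate / fftSize` Hz apart (about 21.5 Hz at 44100 Hz with the default fftSize of 2048), so 1 Hz steps like 438, 439, 440 are not available: the keys are the bin frequencies, and `loudnessAt` picks the bin nearest to a requested frequency.

```javascript
// Frequency spacing of the bins returned by AnalyserNode.getByteFrequencyData():
// bin k corresponds to k * (sampleRate / 2) / frequencyBinCount Hz.
function binWidth(sampleRate, binCount) {
  return sampleRate / 2 / binCount;
}

// Build { frequencyInHz: loudness } from one snapshot of the byte spectrum.
function spectrumToObject(data, sampleRate) {
  const width = binWidth(sampleRate, data.length); // data.length === frequencyBinCount
  const result = {};
  data.forEach((loudness, k) => {
    result[k * width] = loudness; // numeric key is stored as a string, e.g. "1000"
  });
  return result;
}

// Loudness (0-255) of the bin closest to a requested frequency.
function loudnessAt(data, sampleRate, frequency) {
  const k = Math.round(frequency / binWidth(sampleRate, data.length));
  return data[Math.min(Math.max(k, 0), data.length - 1)];
}
```

In the browser you would call `spectrumToObject(data, context.sampleRate)` right after each `analyser.getByteFrequencyData(data)`, for example once per `requestAnimationFrame` callback.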

Related

Is there a way to get RGB value from a live camera feed in reactJs?

I was wondering whether, in ReactJS (a React app), it is possible to capture the RGB color of the center pixel of a live camera feed displayed on a website (honestly any pixel would do, or we could average the entire video frame, whichever is simpler).
All this camera does is display a feed (from the device's rear camera) shown on the website (there is no need to take pictures or record video).
import React, { useRef, useEffect } from 'react';
function Camera() {
  const videoRef = useRef(null);
  useEffect(() => {
    getVideo();
  }, [videoRef]);
  const getVideo = () => {
    navigator.mediaDevices
      .getUserMedia({ video: { facingMode: 'environment', width: 600, height: 400 } })
      .then(stream => {
        let video = videoRef.current;
        video.srcObject = stream;
        video.play();
      })
      .catch(err => {
        console.error("error:", err);
      });
  };
  return (
    <div>
      <video ref={videoRef} />
    </div>
  );
}
export default Camera;
This camera implementation was adapted from: https://itnext.io/accessing-the-webcam-with-javascript-and-react-33cbe92f49cb

You can use your videoRef and do something like this.
const video = videoRef.current; // the <video> element itself
const canvas = document.createElement('canvas');
canvas.width = 600;
canvas.height = 400;
const ctx = canvas.getContext('2d');
ctx.drawImage(video, 0, 0, 600, 400);
const rgbaData = ctx.getImageData(0, 0, 600, 400).data;
The code above takes the video element from your videoRef, creates a 2D canvas, and draws the current video frame onto it.
From there you just call getImageData with the region of the image you want the pixels from; its .data property is a one-dimensional array of RGBA values for all the pixels in that region.
(note: this also doesn't require drawing the canvas onto the screen)
https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/getImageData
And then to get the average pixel color you can look at this post:
Get average color of image via Javascript
But if you're looking to find pixel brightness you can use this
Formula to determine perceived brightness of RGB color
And lastly for contrast of pixels
How to programmatically calculate the contrast ratio between two colors?
Note that you will probably also want to group the one-dimensional array into [[r,g,b,a], [r,g,b,a], ...] chunks rather than keeping the flat [r,g,b,a,r,g,b,a] structure the array comes in.
const chunk = (a, n) => [...Array(Math.ceil(a.length / n))].map((_, i) => a.slice(n * i, n + n * i));
a - being the array
n - being the chunk size (in this case 4 since rgba)
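As a minimal sketch of the two options from the question (center pixel or whole-frame average), the helpers below work directly on the flat `rgbaData` array without chunking it first. `pixelAt` and `averageColor` are hypothetical names, and the average is a plain per-channel mean that ignores alpha.

```javascript
// Read the [r, g, b, a] values of pixel (x, y) from ImageData.data.
// Pixels are stored row by row, 4 bytes per pixel.
function pixelAt(rgba, width, x, y) {
  const i = (y * width + x) * 4;
  return [rgba[i], rgba[i + 1], rgba[i + 2], rgba[i + 3]];
}

// Mean r, g and b over every pixel in the array (alpha is ignored).
function averageColor(rgba) {
  const pixelCount = rgba.length / 4;
  if (pixelCount === 0) return [0, 0, 0];
  let r = 0, g = 0, b = 0;
  for (let i = 0; i < rgba.length; i += 4) {
    r += rgba[i];
    g += rgba[i + 1];
    b += rgba[i + 2];
  }
  return [Math.round(r / pixelCount), Math.round(g / pixelCount), Math.round(b / pixelCount)];
}
```

For the 600x400 canvas above, `pixelAt(rgbaData, 600, 300, 200)` reads the center pixel. If only that pixel is needed, `ctx.getImageData(300, 200, 1, 1).data` returns just its four values.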

Javascript: How to convert opencv mat to tensor?

Does anybody know how to convert an opencv.js mat into a tensor so I can feed it into my tensorflow.js classifier?
The following code shows what I did to read in and preprocess the image I want to classify:
img_array = cv.imread(document.getElementById('picture1'), cv.IMREAD_GRAYSCALE);
cv.cvtColor(img_array, img_array, cv.COLOR_RGBA2GRAY);
let dsize = new cv.Size(100, 100);
cv.resize(img_array, img_array, dsize);
My classifier needs a tensor of shape (1, 100, 100, 1) as an input and I do not know how to convert the cv mat into a tensorflow.js tensor.

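A cv.Mat has a data property holding all the pixel values in a flattened array. To construct a tensor, the following can be used:
const src = cv.imread(imageSource);
// src.channels() is 4 for the RGBA Mat returned by cv.imread, 1 after COLOR_RGBA2GRAY
const tensor = tf.tensor(src.data, [src.rows, src.cols, src.channels()]);
const batched = tensor.expandDims(0); // adds the batch dimension: [1, rows, cols, channels]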
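For the (1, 100, 100, 1) input from the question, a sketch along these lines could prepare the values. It relies on OpenCV storing pixels row by row with channels interleaved, which matches the NHWC layout TensorFlow.js uses for image tensors, so no reordering is needed. `matToTensorInput` is a hypothetical helper, and dividing by 255 is an assumption: whether to normalize depends on how your classifier was trained.

```javascript
// Turn a Mat-like object ({ rows, cols, data }) into values + shape for tf.tensor4d.
// The channel count is derived from the data length.
function matToTensorInput(mat, { normalize = true } = {}) {
  const { rows, cols, data } = mat;
  const channels = data.length / (rows * cols);
  if (!Number.isInteger(channels)) {
    throw new Error('data length does not match rows * cols');
  }
  // Copy into a Float32Array, optionally scaling 0-255 to 0-1.
  const values = Float32Array.from(data, v => (normalize ? v / 255 : v));
  return { values, shape: [1, rows, cols, channels] };
}
```

With opencv.js and TensorFlow.js loaded, `const { values, shape } = matToTensorInput(img_array); const input = tf.tensor4d(values, shape);` should give a tensor of shape [1, 100, 100, 1] for the resized grayscale Mat.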

How to train a model in nodejs (tensorflow.js)?

I want to make an image classifier, but I don't know Python.
TensorFlow.js works with JavaScript, which I am familiar with. Can models be trained with it, and what would be the steps to do so?
Frankly I have no clue where to start.
The only thing I figured out is how to load "mobilenet", which apparently is a set of pre-trained models, and classify images with it:
const tf = require('@tensorflow/tfjs'),
  mobilenet = require('@tensorflow-models/mobilenet'),
  tfnode = require('@tensorflow/tfjs-node'),
  fs = require('fs-extra');
const imageBuffer = await fs.readFile(......),
  tfimage = tfnode.node.decodeImage(imageBuffer),
  mobilenetModel = await mobilenet.load();
const results = await mobilenetModel.classify(tfimage);
which works, but it's no use to me because I want to train my own model using my images with labels that I create.
=======================
Say I have a bunch of images and labels. How do I use them to train a model?
const myData = JSON.parse(await fs.readFile('files.json'));
for (const data of myData) {
  const image = await fs.readFile(data.imagePath),
    labels = data.labels;
  // how to train, where to pass image and labels ?
}

First of all, the images need to be converted to tensors. The first approach is to create one tensor containing all the features (and, likewise, one tensor containing all the labels). This is the way to go only if the dataset contains few images.
const imageBuffer = await fs.readFile(feature_file);
const tensorFeature = tfnode.node.decodeImage(imageBuffer); // create a tensor for the image
// create an array of all the features
// by iterating over all the images
const tensorFeatures = tf.stack([tensorFeature, tensorFeature2, tensorFeature3]); // all images must have the same shape
The labels would be an array indicating the class of each image:
const labelArray = [0, 1, 2]; // maybe 0 for dog, 1 for cat and 2 for bird
Next, the labels need to be one-hot encoded:
const tensorLabels = tf.oneHot(tf.tensor1d(labelArray, 'int32'), 3);
Once the tensors exist, one needs to create the model for training. Here is a simple model:
const model = tf.sequential();
model.add(tf.layers.conv2d({
  inputShape: [height, width, numberOfChannels], // numberOfChannels = 3 for color images, 1 for grayscale
  filters: 32,
  kernelSize: 3,
  activation: 'relu',
}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 3, activation: 'softmax'}));
The model has to be compiled before it can be trained:
model.compile({optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy']});
await model.fit(tensorFeatures, tensorLabels);
If the dataset contains a lot of images, one needs to create a tf.data.Dataset instead. This answer discusses why.
const genFeatureTensor = imagePath => {
  // readFileSync keeps this function (and the generator below) synchronous
  const imageBuffer = fs.readFileSync(imagePath);
  return tfnode.node.decodeImage(imageBuffer);
};
// one-hot label as a plain array, e.g. labelArray(1) -> [0, 1, 0] for 3 classes
const labelArray = index => Array.from({length: numberOfClasses}, (_, k) => k === index ? 1 : 0);
function* dataGenerator() {
  // samples (hypothetical): an array of {imagePath, classIndex} objects, one per image
  for (const {imagePath, classIndex} of samples) {
    const feature = genFeatureTensor(imagePath);
    const label = tf.tensor1d(labelArray(classIndex));
    yield {xs: feature, ys: label};
  }
}
const ds = tf.data.generator(dataGenerator).batch(1); // specify an appropriate batch size
Then call model.fitDataset(ds, {epochs}) to train the model, where epochs is the number of passes over the dataset.
The above is for training in Node.js. To do the same processing in the browser, genFeatureTensor can be written as follows:
function loadImage(url) {
  return new Promise((resolve, reject) => {
    const im = new Image();
    im.crossOrigin = 'anonymous';
    im.onload = () => resolve(im);
    im.onerror = reject;
    im.src = url; // the variable, not the string 'url'
  });
}
// this version is async, so whatever calls it has to await the result
const genFeatureTensor = async imageUrl => {
  const img = await loadImage(imageUrl);
  return tf.browser.fromPixels(img);
};
One word of caution is that doing heavy processing might block the main thread in the browser. This is where web workers come into play.

Consider the example https://codelabs.developers.google.com/codelabs/tfjs-training-classfication/#0
What they do is:
take a BIG png image (a vertical concatenation of images)
take some labels
build the dataset (data.js)
then train
The building of the dataset is as follows:
images
The big image is divided into n vertical chunks.
(n being chunkSize)
Consider a chunkSize of 2.
Given the pixel matrix of image 1:
1 2 3
4 5 6
Given the pixel matrix of image 2:
7 8 9
1 2 3
The resulting array would be
1 2 3 4 5 6 7 8 9 1 2 3 (the 1D concatenation of both)
So basically at the end of the processing, you have a big buffer representing
[...Buffer(image1), ...Buffer(image2), ...Buffer(image3)]
labels
This kind of encoding (one-hot) is used a lot for classification problems. Instead of representing the class with a number, they use a boolean array.
To encode class 7 out of 10 classes we would use
[0,0,0,0,0,0,0,1,0,0] // 1 at index 7, the array is 0-indexed
What you can do to get started
Take your image (and its associated label)
Load your image onto the canvas
Extract its associated buffer
Concatenate all your images' buffers into one big buffer. That's it for xs.
Take all your associated labels, map them as a boolean array, and concatenate them.
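The steps above can be sketched as one function that produces the flat `xs` and `ys` arrays in the layout the codelab's data.js uses: one image after another for xs, and one NUM_CLASSES-wide one-hot row per image for ys. `buildDataset` is a hypothetical name; it assumes every image is RGBA data of the same size (such as `ctx.getImageData(...).data`) and keeps only the red channel, which is enough for grayscale images.

```javascript
// images: array of RGBA pixel arrays of equal length; labels[i] is the class of images[i].
function buildDataset(images, labels, numClasses) {
  const imageSize = images[0].length / 4; // pixels per image
  const xs = new Float32Array(images.length * imageSize);
  const ys = new Uint8Array(images.length * numClasses);
  images.forEach((rgba, i) => {
    for (let j = 0; j < imageSize; j++) {
      // red channel only, scaled from 0-255 to 0-1
      xs[i * imageSize + j] = rgba[j * 4] / 255;
    }
    ys[i * numClasses + labels[i]] = 1; // one-hot: 1 at the label's index
  });
  return { xs, ys };
}
```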
Below, I subclass MnistData and override its load method (the rest can be left as is, except in script.js, where you need to instantiate your own class instead).
I still generate 28x28 images and write a digit on each one; accuracy is perfect because I don't include noise or deliberately wrong labels.
import {MnistData} from './data.js'
const IMAGE_SIZE = 784; // 28 * 28
const NUM_CLASSES = 10;
const NUM_DATASET_ELEMENTS = 5000;
const NUM_TRAIN_ELEMENTS = 4000;
const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;
function makeImage (label, ctx) {
  ctx.fillStyle = 'black';
  ctx.fillRect(0, 0, 28, 28); // hardcoded, brrr
  ctx.fillStyle = 'white';
  ctx.fillText(label, 10, 20); // print a digit on the canvas
}
export class MyMnistData extends MnistData {
  async load() {
    const canvas = document.createElement('canvas');
    canvas.width = 28;
    canvas.height = 28;
    let ctx = canvas.getContext('2d');
    ctx.font = ctx.font.replace(/\d+px/, '18px');
    let labels = new Uint8Array(NUM_DATASET_ELEMENTS * NUM_CLASSES);
    // in data.js, they load the images in chunks (aka chunkSize);
    // let's drop that here for simplicity
    const datasetBytesBuffer = new ArrayBuffer(NUM_DATASET_ELEMENTS * IMAGE_SIZE * 4);
    for (let i = 0; i < NUM_DATASET_ELEMENTS; i++) {
      const datasetBytesView = new Float32Array(
        datasetBytesBuffer, i * IMAGE_SIZE * 4,
        IMAGE_SIZE);
      // BEGIN our handmade label + its associated image
      // notice that you could instead call loadImage(images[i], datasetBytesView)
      // for all images in bulk and await all the promises after the for loop
      const label = Math.floor(Math.random() * 10);
      labels[i * NUM_CLASSES + label] = 1;
      makeImage(label, ctx);
      const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
      // END you should be able to load an image to canvas :)
      for (let j = 0; j < imageData.data.length / 4; j++) {
        // NOTE: this stores a 4-byte FLOAT in [0, 1] even though we don't need it;
        // a Uint8Array would do for grayscale like ours, without scaling by 1/255.
        // They probably did it so the code can be reused for color images...
        datasetBytesView[j] = imageData.data[j * 4] / 255;
      }
    }
    this.datasetImages = new Float32Array(datasetBytesBuffer);
    this.datasetLabels = labels;
    // below is copy-pasted from data.js
    this.trainIndices = tf.util.createShuffledIndices(NUM_TRAIN_ELEMENTS);
    this.testIndices = tf.util.createShuffledIndices(NUM_TEST_ELEMENTS);
    this.trainImages = this.datasetImages.slice(0, IMAGE_SIZE * NUM_TRAIN_ELEMENTS);
    this.testImages = this.datasetImages.slice(IMAGE_SIZE * NUM_TRAIN_ELEMENTS);
    this.trainLabels =
      this.datasetLabels.slice(0, NUM_CLASSES * NUM_TRAIN_ELEMENTS); // each element takes NUM_CLASSES values
    this.testLabels =
      this.datasetLabels.slice(NUM_CLASSES * NUM_TRAIN_ELEMENTS);
  }
}

I found a tutorial [1] on how to use an existing model to train new classes. Here are the main parts of the code:
index.html head:
<script src="https://unpkg.com/@tensorflow-models/knn-classifier"></script>
index.html body:
<button id="class-a">Add A</button>
<button id="class-b">Add B</button>
<button id="class-c">Add C</button>
index.js:
const classifier = knnClassifier.create();
....
// Reads an image from the webcam and associates it with a specific class
// index.
const addExample = async classId => {
  // Capture an image from the web camera.
  const img = await webcam.capture();
  // Get the intermediate activation of MobileNet 'conv_preds' and pass that
  // to the KNN classifier.
  const activation = net.infer(img, 'conv_preds');
  // Pass the intermediate activation to the classifier.
  classifier.addExample(activation, classId);
  // Dispose the tensor to release the memory.
  img.dispose();
};
// When clicking a button, add an example for that class.
document.getElementById('class-a').addEventListener('click', () => addExample(0));
document.getElementById('class-b').addEventListener('click', () => addExample(1));
document.getElementById('class-c').addEventListener('click', () => addExample(2));
....
The main idea is to use the existing network's intermediate activations (here 'conv_preds') as features and let a KNN classifier assign your own labels to them, instead of using MobileNet's own predictions.
The complete code is in the tutorial. Another promising, more advanced tutorial is [2]; it requires strict preprocessing, so I only link to it here.
Sources:
[1] https://codelabs.developers.google.com/codelabs/tensorflowjs-teachablemachine-codelab/index.html#6
[2] https://towardsdatascience.com/training-custom-image-classification-model-on-the-browser-with-tensorflow-js-and-angular-f1796ed24934
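To make the KNN idea concrete without TensorFlow.js, here is a toy k-nearest-neighbours classifier over plain number arrays. In the tutorial the feature vectors would be the MobileNet 'conv_preds' activations. `TinyKnn` is a hypothetical class, and it uses Euclidean distance, which is not necessarily the measure the knn-classifier package uses.

```javascript
// Toy k-nearest-neighbours classifier over plain feature vectors.
class TinyKnn {
  constructor() {
    this.examples = [];
  }
  // Store a labelled example, like classifier.addExample(activation, classId).
  addExample(features, classId) {
    this.examples.push({ features, classId });
  }
  // Majority vote among the k stored examples closest to `features`.
  predictClass(features, k = 3) {
    if (this.examples.length === 0) throw new Error('no examples added');
    const nearest = this.examples
      .map(({ features: f, classId }) => ({
        classId,
        dist: Math.hypot(...f.map((v, i) => v - features[i])), // Euclidean distance
      }))
      .sort((a, b) => a.dist - b.dist)
      .slice(0, k);
    const votes = new Map();
    for (const { classId } of nearest) votes.set(classId, (votes.get(classId) || 0) + 1);
    return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
  }
}
```

Adding new classes then only means storing more examples; nothing in the underlying network is retrained, which is why this approach is fast enough to run in the browser.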

TL;DR
MNIST is the "Hello World" of image recognition. Once you know it well, the questions on your mind become easy to solve.
Question setting:
Your main question is the one written inside your code block:
// how to train, where to pass image and labels ?
For that, I found the perfect answer in the TensorFlow.js examples: the MNIST example. The links below include its pure JavaScript [1] and Node.js [2] versions, plus a Wikipedia explanation [3]. I will go through them only as far as needed to answer your main question, and I will also show how your own images and labels relate to the MNIST image set and the examples that use it.
First things first:
Code snippets.
where to pass images (Node.js sample)
async function loadImages(filename) {
  const buffer = await fetchOnceAndSaveToDiskWithBuffer(filename);
  const headerBytes = IMAGE_HEADER_BYTES;
  const recordBytes = IMAGE_HEIGHT * IMAGE_WIDTH;
  const headerValues = loadHeaderValues(buffer, headerBytes);
  assert.equal(headerValues[0], IMAGE_HEADER_MAGIC_NUM);
  assert.equal(headerValues[2], IMAGE_HEIGHT);
  assert.equal(headerValues[3], IMAGE_WIDTH);
  const images = [];
  let index = headerBytes;
  while (index < buffer.byteLength) {
    const array = new Float32Array(recordBytes);
    for (let i = 0; i < recordBytes; i++) {
      // Normalize the pixel values into the 0-1 interval, from
      // the original 0-255 interval.
      array[i] = buffer.readUInt8(index++) / 255;
    }
    images.push(array);
  }
  assert.equal(images.length, headerValues[1]);
  return images;
}
Notes:
In the Node.js example, the images file is a binary (IDX) file: a header followed by every 28x28 image stored back to back, one byte per pixel, which is why the while loop reads recordBytes bytes per image. The labels file is built the same way with one byte per label, so the n-th label belongs to the n-th image. (The browser version of the example instead packs all images into one big sprite PNG.) From this example it is not a big step to a one-file-per-image format, where only one picture at a time is handed to the loop.
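The snippet calls `loadHeaderValues`, which is not shown here. In the IDX format the header is a series of big-endian 32-bit integers (magic number 2051, image count, rows and columns for the images file; magic number 2049 and label count for the labels file), so a helper along the lines of the hypothetical `readIdxHeader` below should be enough. `imageAt`, also hypothetical, shows how the n-th record is located right after the header. This is a sketch under those assumptions, not the example's actual code.

```javascript
// Read the big-endian 32-bit integers that make up an IDX header
// (16 bytes for images, 8 bytes for labels).
function readIdxHeader(buffer, headerBytes) {
  const values = [];
  for (let offset = 0; offset < headerBytes; offset += 4) {
    values.push(buffer.readUInt32BE(offset));
  }
  return values;
}

// Pixel values of the n-th image, scaled to [0, 1] like loadImages() does.
function imageAt(buffer, n, rows, cols, headerBytes = 16) {
  const size = rows * cols;
  const start = headerBytes + n * size;
  return Float32Array.from(buffer.subarray(start, start + size), v => v / 255);
}
```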
Labels:
async function loadLabels(filename) {
  const buffer = await fetchOnceAndSaveToDiskWithBuffer(filename);
  const headerBytes = LABEL_HEADER_BYTES;
  const recordBytes = LABEL_RECORD_BYTE;
  const headerValues = loadHeaderValues(buffer, headerBytes);
  assert.equal(headerValues[0], LABEL_HEADER_MAGIC_NUM);
  const labels = [];
  let index = headerBytes;
  while (index < buffer.byteLength) {
    const array = new Int32Array(recordBytes);
    for (let i = 0; i < recordBytes; i++) {
      array[i] = buffer.readUInt8(index++);
    }
    labels.push(array);
  }
  assert.equal(labels.length, headerValues[1]);
  return labels;
}
Notes:
Here, the labels are also stored as bytes in a file. In the JavaScript world, and with the approach from your starting point, the labels could just as well be a JSON array.
train the model:
await data.loadData();
const {images: trainImages, labels: trainLabels} = data.getTrainData();
model.summary();
let epochBeginTime;
let millisPerStep;
const validationSplit = 0.15;
const numTrainExamplesPerEpoch =
  trainImages.shape[0] * (1 - validationSplit);
const numTrainBatchesPerEpoch =
  Math.ceil(numTrainExamplesPerEpoch / batchSize);
await model.fit(trainImages, trainLabels, {
  epochs,
  batchSize,
  validationSplit
});
Notes:
Here model.fit is the actual line of code that does the thing: trains the model.
Results of the whole thing:
const {images: testImages, labels: testLabels} = data.getTestData();
const evalOutput = model.evaluate(testImages, testLabels);
console.log(
  `\nEvaluation result:\n` +
  `  Loss = ${evalOutput[0].dataSync()[0].toFixed(3)}; ` +
  `Accuracy = ${evalOutput[1].dataSync()[0].toFixed(3)}`);
Note:
In data science, here as always, the most fascinating part is finding out how well the model handles new data it did not see during training: can it label that data correctly? That is what the evaluation step measures, and it prints the loss and accuracy.
Loss and accuracy: [4]
The lower the loss, the better a model (unless the model has over-fitted to the training data). The loss is calculated on training and validation and its interpretation is how well the model is doing for these two sets. Unlike accuracy, loss is not a percentage. It is a summation of the errors made for each example in training or validation sets.
..
The accuracy of a model is usually determined after the model parameters are learned and fixed and no learning is taking place. Then the test samples are fed to the model and the number of mistakes (zero-one loss) the model makes are recorded, after comparison to the true targets.
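To make the zero-one loss description concrete, here is a small sketch that computes accuracy from softmax outputs and one-hot labels given as plain arrays; `argMax` and `accuracy` are hypothetical names. When the model is compiled with metrics: ['accuracy'], model.evaluate() already reports this value, so the sketch only illustrates the idea.

```javascript
// Index of the largest value (the predicted or true class).
const argMax = arr => arr.reduce((best, v, i) => (v > arr[best] ? i : best), 0);

// Fraction of examples whose predicted class equals the true class,
// i.e. 1 minus the average zero-one loss.
function accuracy(predictions, oneHotLabels) {
  let correct = 0;
  predictions.forEach((p, i) => {
    if (argMax(p) === argMax(oneHotLabels[i])) correct++;
  });
  return correct / predictions.length;
}
```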
More information:
In the GitHub repository, the README.md file links to a tutorial that explains the example in greater detail.
[1] https://github.com/tensorflow/tfjs-examples/tree/master/mnist
[2] https://github.com/tensorflow/tfjs-examples/tree/master/mnist-node
[3] https://en.wikipedia.org/wiki/MNIST_database
[4] How to interpret "loss" and "accuracy" for a machine learning model

Is it possible to convert frequency hertz array to audiobuffer using javascript?

In the array, the first column is the frequency and the second is how long to play it.
// create web audio api context
var audioCtx = new(window.AudioContext || window.webkitAudioContext)();
function playNote(frequency, duration) {
  // create Oscillator node
  var oscillator = audioCtx.createOscillator();
  oscillator.type = 'sawtooth';
  oscillator.frequency.value = frequency; // value in hertz
  oscillator.connect(audioCtx.destination);
  oscillator.start();
  setTimeout(
    function() {
      oscillator.stop();
      playMelody();
    }, duration);
}
function playMelody() {
  if (notes.length > 0) {
    note = notes.pop();
    playNote(note[0], note[1]);
  }
}
notes = [[67.40,14.84],
[58.60,17.06],
[69.80,14.33],
[69.80,14.33],
[66.30,15.08],
[62.30,16.05],
[66.90,14.95],
[65.00,15.38],
[66.00,15.15],
[88.40,11.31],
[60.60,16.50],
[63.90,15.65],
[114.20,8.76],
[114.20,8.76],
[99.70,10.03],
[344.90,2.90],
[344.90,2.90],
[70.00,14.29],
[310.90,3.22],
[310.90,3.22],
[68.30,14.64],
[71.30,14.03],
[69.40,14.41],
[101.70,9.83],
[70.40,14.20],
[67.20,14.88],
[76.00,13.16],
[59.60,16.78],
[73.30,13.64],
[62.10,16.10],
[72.60,13.77],
[76.60,13.05],
[76.80,13.02],
[52.90,18.90],
[69.50,14.39],
[72.90,13.72],
[69.90,14.31],
[69.60,14.37]];
notes.reverse();
tempo = 100;
playMelody();
I am able to play the frequencies sequentially using the audio context, but I need a way to convert them to an audio file or an AudioBuffer. I want to draw a spectrogram of these frequencies.

I'm not entirely sure the example above actually does what you want. If I understand it correctly, it plays all the notes more or less at once: setTimeout expects milliseconds, so a duration of 14.84 lasts less than 15 ms. Using setTimeout to stop the oscillators is also not very accurate. Therefore I created a simple example which plays the first two bars of Mozart's "Eine kleine Nachtmusik". It is well known, so it is hopefully easier to verify that the code actually does what we want it to do.
const NOTES = [ [ 783.99, 0.5 ], [ 0, 0.25 ], [ 587.33, 0.25 ], [ 783.99, 0.5 ], [ 0, 0.25 ], [ 587.33, 0.25 ], [ 783.99, 0.25 ], [ 587.33, 0.25 ], [ 783.99, 0.25 ], [ 987.77, 0.25 ], [ 1174.7, 0.25] ];
First I created an array of notes. As in your example, each note consists of the frequency and the duration.
Next I defined a function which expects to be called with an AudioContext and an array of notes and then schedules those notes to be played in sequence.
function playNotes (context, notes) {
  notes.reduce((offset, [ frequency, duration ]) => {
    const oscillatorNode = context.createOscillator();
    oscillatorNode.frequency.value = frequency;
    oscillatorNode.start(offset);
    oscillatorNode.stop(offset + duration);
    oscillatorNode.connect(context.destination);
    return offset + duration;
  }, context.currentTime);
}
The function loops through all given notes. It will create an OscillatorNode for each of them. It starts that OscillatorNode when the previous one ends and stops it again when the duration is reached.
The function can now be used like this:
const audioContext = new AudioContext();
playNotes(audioContext, NOTES);
But the big advantage is that we can call the same function with an OfflineAudioContext.
const sampleRate = 44100;
const offlineAudioContext = new OfflineAudioContext({ length: 3.25 * sampleRate, sampleRate });
playNotes(offlineAudioContext, NOTES);
This will only schedule the notes but it will not produce any audible sound. An OfflineAudioContext will instead produce an AudioBuffer which is probably what you want. You can render that buffer by calling startRendering.
offlineAudioContext
.startRendering()
.then((audioBuffer) => console.log(audioBuffer));
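Because the rendered buffer has a fixed length, it can help to compute that length from the notes instead of hardcoding 3.25 seconds. A minimal sketch with hypothetical helper names: `scheduleNotes` reproduces the start and stop times that the reduce() in playNotes() produces, and `renderLength` returns the number of sample frames the OfflineAudioContext needs.

```javascript
// Start/stop time of every [frequency, duration] note, played back to back.
function scheduleNotes(notes, startTime = 0) {
  const schedule = [];
  let offset = startTime;
  for (const [frequency, duration] of notes) {
    schedule.push({ frequency, start: offset, stop: offset + duration });
    offset += duration;
  }
  return schedule;
}

// Sample frames needed to hold all notes at the given sample rate.
function renderLength(notes, sampleRate) {
  const totalSeconds = notes.reduce((sum, [, duration]) => sum + duration, 0);
  return Math.ceil(totalSeconds * sampleRate);
}
```

For NOTES above the durations add up to 3.25 seconds, so `renderLength(NOTES, 44100)` gives the same 143325 frames as `3.25 * sampleRate`, and the context can be created with `new OfflineAudioContext({ length: renderLength(NOTES, sampleRate), sampleRate })`.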
Please note that this code will not work in Safari because it doesn't support the latest version of the Web Audio API. But you can get it to work by writing some additional code. As I'm the author of standardized-audio-context I do of course recommend to use it instead of writing your own polyfill code.

Decibel value in web audio

I'm using cordova-plugin-audioinput plugin for a JavaScript app that I'm developing. I'm trying to get the different decibel values at different frequencies in realtime using this code:
function startCapture() {
  audioinput.start({
    audioSourceType: 9,
    sampleRate: 44100,
    streamToWebAudio: true
  });
  audioCtx = audioinput.getAudioContext();
  analyser = audioCtx.createAnalyser();
  analyser.fftSize = 8192;
  analyser.maxDecibels = 0;
  audioinput.connect(analyser);
  bufferLength = analyser.frequencyBinCount;
  dataArray = new Uint8Array(bufferLength);
}
The data gets saved into the dataArray using analyser.getByteFrequencyData(dataArray);
Even though I set maxDecibels to 0, the dataArray gets filled with positive values, which doesn't make sense to me. I need the end result to be in decibels, and even though the values in dataArray react to the volume in real time, they're not in decibels.

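The values returned by getByteFrequencyData() are always in the range from 0 to 255. They are mapped linearly from the range minDecibels to maxDecibels: 0 corresponds to minDecibels (or quieter) and 255 to maxDecibels (or louder). See the specification of getByteFrequencyData. If you need the values in decibels, use getFloatFrequencyData() with a Float32Array instead.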
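If you prefer to keep the byte data, the linear mapping can be inverted approximately. A sketch, assuming the AnalyserNode defaults of -100 and -30 dB unless you pass the analyser's own minDecibels and maxDecibels; `byteToDecibels` is a hypothetical name. Because the byte values are rounded down and clipped, the result is only accurate to one step of (maxDecibels - minDecibels) / 255, and a value of 0 really means "minDecibels or quieter".

```javascript
// Approximate inverse of the byte mapping used by getByteFrequencyData():
// byte 0 -> minDecibels, byte 255 -> maxDecibels, linear in between.
function byteToDecibels(value, minDecibels = -100, maxDecibels = -30) {
  return minDecibels + (value / 255) * (maxDecibels - minDecibels);
}
```

With the analyser from the question this would be `Array.from(dataArray, v => byteToDecibels(v, analyser.minDecibels, analyser.maxDecibels))`.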
