MediaPipe Pose segmentationMask Python/JavaScript differences - javascript

I am developing a pose recognition webapp using mediapipe pose library (https://google.github.io/mediapipe/solutions/pose.html).
I am using the segmentationMask to find some specific points of the human body that satisfy a constraint (the value in the n-th pixel must be > 0.1).
I am able to do this evaluation in Python. The library returns the segmentation mask as a matrix with the same width and height as the input image, containing values in [0.0, 1.0], where 1.0 and 0.0 indicate high certainty of a "human" and "background" pixel respectively. So I can iterate over the matrix and find the points that satisfy the constraint.
I am trying to do the same thing in JavaScript, but I have a problem. The JavaScript version of the library does not return a matrix; it returns an ImageBitmap used by the HTML canvas to draw the mask.
The problem is that with an ImageBitmap I cannot access every point of the matrix, so I am not able to find the points I am interested in.
Is there a way to transform the JavaScript segmentationMask ImageBitmap to make it similar to the segmentationMask of the Python version of the library, or at least to retrieve the same information? (I need the values in the range [0.0, 1.0] for every pixel of the image.)
Thank you all.

There is unfortunately no direct way to get an ImageData from an ImageBitmap, but you can drawImage() this ImageBitmap on a clear canvas and then call ctx.getImageData(0, 0, canvas.width, canvas.height) to retrieve an ImageData where you'll get access to all the pixel data.
The confidence will be stored in the Alpha channel (every fourth item in imageData.data) as a value between 0 and 255.
function onResults(results) {
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
  canvasCtx.drawImage(results.segmentationMask, 0, 0,
    canvasElement.width, canvasElement.height);
  const imgData = canvasCtx.getImageData(0, 0, canvasElement.width, canvasElement.height);
  let i = 0;
  for (let y = 0; y < imgData.height; y++) {
    for (let x = 0; x < imgData.width; x++) {
      // Each pixel takes 4 entries (RGBA); the confidence is in the alpha channel.
      const confidence = imgData.data[i + 3];
      // do something with confidence here
      i += 4; // advance to the next pixel
    }
  }
}
And since you're gonna read a lot from that context, don't forget to pass the willReadFrequently option when you get it.
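For example, when first creating the context:
const canvasCtx = canvasElement.getContext('2d', { willReadFrequently: true });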
Here it is as a fiddle, since Stack Snippets won't allow the use of the camera.
Note that depending on what you do, you may want to colorize this image from red to black using globalCompositeOperation and treat the data as a Uint32Array, where the confidence would be expressed between 0 and 0xFF000000.
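A rough sketch of that Uint32Array reading (assuming a little-endian platform, where alpha ends up in the high byte of each 32-bit pixel):
const pixels = new Uint32Array(imgData.data.buffer);
for (let p = 0; p < pixels.length; p++) {
  // Masking out RGB leaves the confidence in the high (alpha) byte.
  const confidence = pixels[p] & 0xFF000000; // 0 .. 0xFF000000
  // (pixels[p] >>> 24 would give the plain 0-255 value instead.)
}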

Related

Loading the PNG Image by webgl is not perfect

When using canvas.getContext('2d') to load a PNG file that has a transparent part, it looks exactly the same as the PNG file itself. But when loading with canvas.getContext('webgl'), the transparent part displays as white. If you then add discard in the shader, it gets better, but is still not perfect compared to the PNG file. How can I fix this issue?
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, this.img);
void main() {
  vec4 color = texture2D(u_image, v_texCoord);
  if (color.a < 0.5) {
    discard;
  }
  gl_FragColor = color;
}
It sounds like you may need to activate blending.
gl.enable(gl.BLEND);
And then set the blending function to work with pre-multiplied alpha (the default)
gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA);
Transparency is actually kind of complicated
There is
#1 what the canvas itself needs
The default canvas wants premultiplied alpha. In other words it wants you to provide RGBA values where RGB has been multiplied by A.
You can set the canvas so it does not expect premultiplied alpha when creating the WebGL context by passing in premultipliedAlpha: false, as in
const gl = someCanvas.getContext('webgl', {premultipliedAlpha: false});
Note: IIRC This doesn't work on iOS.
#2 what format you load the images
The default for loading images in WebGL is unpremultiplied alpha. In other words if the image has a pixel that is
255, 128, 64, 128 RGBA
It will be loaded exactly like that (*)
You can tell WebGL to premultiply for you when loading an image by setting
gl.pixelStorei(gl.UNPACK_PREMULTIPLY_ALPHA_WEBGL, true);
Before calling gl.texImage2D.
Now that same pixel above will end up being
128, 64, 32, 128 RGBA
Each of RGB has been multiplied by A (A above is 128, where 128 represents 128/255, or 0.5019607843137255).
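As a quick sanity check in JavaScript (assuming simple rounding; browsers may round slightly differently):
function premultiply(r, g, b, a) {
  const f = a / 255; // alpha as a 0..1 factor
  return [Math.round(r * f), Math.round(g * f), Math.round(b * f), a];
}
console.log(premultiply(255, 128, 64, 128)); // [128, 64, 32, 128]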
#3 what you write out in your shaders
If you loaded un-premultiplied data you might choose to premultiply in your shader
gl_FragColor = vec4(someColor.rgb * someColor.a, someColor.a);
#4 how you blend
If you want to blend what you are drawing into what has already been drawn then you need to turn on blending
gl.enable(gl.BLEND);
But you also need to set how the blending happens. There are multiple functions that affect blending. The most common one to use is gl.blendFunc which sets how the src pixel (the one generated by your shader) and the dst pixel (the one being drawn on top of in the canvas) are affected before being combined. The 2 most common settings are
gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA); // unpremultiplied alpha
and
gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA); // premultiplied alpha
The first argument is how to multiply the src pixel. Above we are either multiplying by the alpha of the src (SRC_ALPHA) or by 1 (ONE). The second argument is how to multiply the dst pixel. ONE_MINUS_SRC_ALPHA is exactly what it says (1 - alpha)
How you put all these together is up to you.
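For example, here is a minimal sketch of the premultiplied-alpha path, combining #2 and #4 (it assumes a texture has already been created and bound):
const gl = canvas.getContext('webgl');
// #2: ask WebGL to premultiply the image data at upload time
gl.pixelStorei(gl.UNPACK_PREMULTIPLY_ALPHA_WEBGL, true);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, img);
// #4: blend premultiplied colors into the (premultiplied) canvas
gl.enable(gl.BLEND);
gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA);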
This article and This one somewhat cover these issues
(*) Images may have color conversion applied.

How to transform a rectangle into a circle

I want to turn a rectangular image into a circle using the HTMLCanvas element. (Finally I only need the upper half of the circle but that could be easily managed by cutting the resulting circle in half.)
From this (the rectangular source image) to this (the desired rounded result); images in the original post.
My idea was to do a simple line-by-line transformation. So far I have just the basic drawing logic, but I'm totally lost with the math for the transformation.
<!DOCTYPE html>
<html>
<body>
  <canvas id="canvas"></canvas>
  <script type="text/javascript">
    var img = new Image();
    img.onload = function init() {
      var img = this;
      var imgH = img.height;
      var imgW = img.width;
      // make the canvas the same size as the image.
      var c = document.getElementById("canvas");
      c.width = imgW;
      c.height = imgH;
      var ctx = c.getContext("2d");
      var halfHeight = imgH / 2;
      // draw the upper part, line by line
      for (var i = 0; i < halfHeight; i++) {
        // I'm totally lost here.
        // current output without transformation
        ctx.drawImage(img, 0, i, imgW, 1, 0, i, imgW, 1);
      }
      // add the second half, which must not be transformed
      ctx.drawImage(img, 0, halfHeight, imgW, halfHeight, 0, halfHeight, imgW, halfHeight);
    };
    img.src = "https://i.stack.imgur.com/52TjZ.png";
  </script>
</body>
</html>
A fiddle: https://jsfiddle.net/kirschkern/amq7t6ru/2/
(I need it in pure JS and 2d. No three.js, no webgl.)
Any help is highly appreciated.
I don't know much about JavaScript, but as this seems more a mathematical question, I'll have a shot at it.
Replace the lines
// I'm totally lost here.
// current output without transformation
ctx.drawImage(img, 0, i, imgW, 1, 0, i, imgW, 1);
with
var xMargin = -Math.sqrt(1 - Math.pow((i - halfHeight) / halfHeight, 2)) * imgW / 2 + imgW / 2;
ctx.drawImage(img, 0, i, imgW, 1, xMargin, i, imgW - (2 * xMargin), 1);
This distorts the upper half of the image into an ellipse (it would be a circle only if your input image were square); see the result image in the original post.
Does this solve your question?
Explanation
I took the equation of a shifted ellipse from Wikipedia, ((x - c1)/a)^2 + ((y - c2)/b)^2 = 1, and set c1 and a equal to imgW/2, and c2 and b to imgH/2. Taking i for y let me solve for x; I saved one of the two solutions as xMargin. The width of the picture at the given vertical coordinate is then the original width minus twice the margin.
In the end, I fed drawImage() with these inputs; see the documentation.
Plain 2D JavaScript does not have primitives to distort images like this, so a simple drawImage will not be enough.
What you can do is approximate things. Write a function which, for every point in the distorted image (the one with the circle), computes the corresponding position in the original image. Then you can do one of five things, in increasing order of effort and resulting quality:
1. Iterate over all the pixels in the destination image, and look up the corresponding pixel value in the source image (a sketch of this follows below).
2. Like before, but with subsampling: take several positions inside the square of the source pixel, and average the resulting colors for a smoother appearance.
3. Approximate the affine transformation at a given point (for this you will likely need partial derivatives of your mapping function) and use it to draw an affinely transformed image.
4. Same as 3 but with projective instead of affine transforms. That would arguably make it 3D in its formulation.
5. Like 1 or 2, but implement all of that in WebGL as a fragment shader. I know you said you don't want that, but in terms of performance and resulting quality this should give the best results.
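Here is a minimal sketch of approach 1 (nearest-neighbor lookup, no subsampling), reusing the ellipse math from the first answer; all names are illustrative:
function distortUpperHalf(ctx, img) {
  const w = img.width, h = img.height, half = h / 2;
  // Read the source pixels from an offscreen canvas.
  const srcCanvas = document.createElement("canvas");
  srcCanvas.width = w;
  srcCanvas.height = h;
  const srcCtx = srcCanvas.getContext("2d");
  srcCtx.drawImage(img, 0, 0);
  const src = srcCtx.getImageData(0, 0, w, h);
  const dst = ctx.createImageData(w, h);
  for (let y = 0; y < half; y++) {
    const margin = -Math.sqrt(1 - Math.pow((y - half) / half, 2)) * w / 2 + w / 2;
    const rowWidth = w - 2 * margin;
    for (let x = 0; x < w; x++) {
      if (x < margin || x >= w - margin) continue; // outside the ellipse
      // Inverse mapping: destination x back to source x.
      const sx = Math.min(w - 1, Math.round((x - margin) / rowWidth * w));
      const si = (y * w + sx) * 4, di = (y * w + x) * 4;
      dst.data[di] = src.data[si];
      dst.data[di + 1] = src.data[si + 1];
      dst.data[di + 2] = src.data[si + 2];
      dst.data[di + 3] = src.data[si + 3];
    }
  }
  ctx.putImageData(dst, 0, 0);
  // The lower half stays untransformed, as in the question.
  ctx.drawImage(img, 0, half, w, half, 0, half, w, half);
}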

How do the getByteTimeDomainData()/getByteFrequencyData() methods work?

I'm trying to develop a pitch detector using the JavaScript Web Audio API. From googling, I've learned that we perceive pitch by frequency, so I found the getByteFrequencyData() method. But I don't know how to use it correctly.
example.js
function draw() {
  // Assumes a <canvas id="canvas"> element on the page.
  var canvas = document.getElementById("canvas");
  var ctx = canvas.getContext("2d");
  var img = new Image();
  img.src = "foo.jpg";
  img.onload = function() {
    ctx.drawImage(img, 0, 0);
    var imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);
    var raster = imgData.data;
    // Invert R, G and B, leaving the alpha channel (every fourth entry) alone.
    for (var i = 0; i < raster.length; i++) {
      if (i % 4 != 3) {
        raster[i] = 255 - raster[i];
      }
    }
    ctx.putImageData(imgData, 0, 0);
  };
}
As we see above, getImageData() returns very obvious, easy-to-access data. In contrast, what does the "buffer" parameter of getByteFrequencyData() save/represent/mean? How does it represent audio frequency data? How can I manipulate it and develop my own program using these methods?
Thanks.
The spec entry for getByteFrequencyData tells you exactly what it is. The analyser node determines the frequency content in a set of bins, where the value of each bin is the magnitude of that frequency component. getByteFrequencyData just converts that to dB and then scales the values to the range of 0 to 255.
I generally recommend people to use getFloatFrequencyData() first because I think it's a bit easier to understand without having to deal with the scaling.
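As a rough sketch of how you might use it (this only picks the loudest FFT bin, which is a naive spectral-peak estimate, not full pitch detection; variable names are illustrative):
const audioCtx = new AudioContext();
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;
// Connect a source, e.g. from getUserMedia:
// audioCtx.createMediaStreamSource(stream).connect(analyser);
const bins = new Uint8Array(analyser.frequencyBinCount);
function estimatePitch() {
  analyser.getByteFrequencyData(bins); // fills bins with 0-255 magnitudes
  let maxVal = 0, maxIndex = 0;
  for (let i = 0; i < bins.length; i++) {
    if (bins[i] > maxVal) { maxVal = bins[i]; maxIndex = i; }
  }
  // Each bin covers sampleRate / fftSize Hz.
  return maxIndex * audioCtx.sampleRate / analyser.fftSize;
}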

HTML Canvas putImageData with transparency causes incorrect RGB to be saved

I am trying to set individual pixels in an HTML canvas using putImageData(). When I do this and then immediately read those pixels back out using getImageData(), the RGB values I just set have been changed! See example:
var ct = canvas.getContext('2d');
var image = ct.getImageData(0,0,1,1);
var data = image.data;
data[0] = 200; //r
data[1] = 100; //g
data[2] = 50; //b
data[3] = 25; //a
console.log(data); //[200, 100, 50, 25] Yeah :)
ct.putImageData(image,0,0);
var debug = ct.getImageData(0,0,1,1);
console.log(debug.data); //[204, 102, 51, 25] Boo :(
If I set the alpha channel to 255 (no transparency), then the RGB values aren't altered. If I set the alpha channel to 0 (transparent), then the RGB comes back as 0,0,0. Obviously it has something to do with transparency, and probably with the RGB color space and math.
I am trying to figure out why this happens or at least be able to predict the results in some way. Can someone please clue me in about what is going on here?
It is due to the process of compositing, and in particular to the premultiplying of the alpha channel. The standard states:
Due to the lossy nature of converting to and from premultiplied alpha colour values, pixels that have just been set using putImageData() might be returned to an equivalent getImageData() as different values.
It's a relatively deep and wide topic, but if you want to dive into the particulars of the math behind it, I would recommend looking at the Porter-Duff algorithms found at this link. See in particular blending and alpha compositing. Consider also that the browser uses 8-bit integer values in these formulas.
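To see where [204, 102, 51] comes from, here is the round trip in JavaScript (assuming simple rounding; actual browser rounding may differ slightly):
function roundTrip(value, alpha) {
  const premultiplied = Math.round(value * alpha / 255); // lossy 8-bit step
  return Math.round(premultiplied * 255 / alpha);
}
console.log(roundTrip(200, 25)); // 204
console.log(roundTrip(100, 25)); // 102
console.log(roundTrip(50, 25));  // 51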

Canvas linear downscaling

I'm trying to downscale an image using canvas, to later use the data for a hash comparison. However, I noticed that the canvas (or at least the simple code I use) applies no mipmap filtering, which produces a very sharp result and makes the test against an existing hash fail (downscaling the image in GIMP using linear interpolation works as expected). The code I use to downscale is:
var canvas = document.createElement("canvas");
canvas.width = width; canvas.height = height;
var context = canvas.getContext('2d');
context.drawImage(image, 0, 0, width, height);
return context.getImageData(0, 0, width, height).data;
This results in the image on the left instead of the expected one on the right (images in the original post).
How can I get the canvas to downscale linearly?
The new canvas draft specifies a way to set re-sampling/interpolation for the canvas. The current method is always bilinear, or nearest-neighbor if imageSmoothingEnabled = false (both methods apply to both up-scaling and down-sampling). The new property is called imageSmoothingQuality:
context . imageSmoothingQuality [ = value ]
The value can be "low", "medium" or "high" (for example, something like bi-cubic/Lanczos). However, no browser has implemented this at the moment of writing, and the actual algorithm used for each value is not mandated.
The alternative approaches are to manually re-sample in multiple steps when you need to scale down by more than 50%, or to implement a re-sampling algorithm yourself.
Here is an example of using multiple steps to achieve a bi-cubic quality level (which also avoids initial CORS problems), as well as one showing the Lanczos algorithm (this needs the CORS requirements to be met).
In addition to that, you can apply a sharpening convolution to compensate for some of the lost sharpness.
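A rough sketch of the multi-step approach (halving repeatedly, so each bilinear pass never skips source pixels; names are illustrative):
function downscale(image, targetW, targetH) {
  let w = image.width, h = image.height;
  let current = image;
  while (w / 2 >= targetW && h / 2 >= targetH) {
    w = Math.floor(w / 2);
    h = Math.floor(h / 2);
    const step = document.createElement("canvas");
    step.width = w;
    step.height = h;
    step.getContext("2d").drawImage(current, 0, 0, w, h);
    current = step; // next pass samples the already-halved image
  }
  const out = document.createElement("canvas");
  out.width = targetW;
  out.height = targetH;
  const ctx = out.getContext("2d");
  ctx.drawImage(current, 0, 0, targetW, targetH);
  return ctx.getImageData(0, 0, targetW, targetH).data;
}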
