Faster web worker messaging - javascript

I've tested web worker messaging in Chrome specifically and I'm getting results of about ~50 ms latency to send and receive a message:
// Sender section
let imageData = mCtx.getImageData(0, 0, w, h);
let bitmapData = await createImageBitmap(imageData);
beforeAddBitmapFrame = performance.now();
videoWorker.postMessage({ action : 'addFrameBitmap', data: bitmapData }, [bitmapData]);

// Receiver section
videoWorker.onmessage = function (e) {
  let blob = e.data.data;
  beforeRenderBlobFrame = performance.now();
  let latency = beforeRenderBlobFrame - beforeAddBitmapFrame; // ~50ms
  if(latency > 10) {
    console.log('=== Max Latency Hit ===');
  }
  renderBlobTest(blob);
};
This is basically a loop test where an image is sent to the web worker and the web worker just sends it back so the latency can be calculated. 50 ms might look like nothing at first glance, but multiply it out for a 30 FPS video: 50 ms × 30 frames = 1500 ms (1.5 seconds) of latency. That's a lot considering this is not a network transfer.
What can be done to lower the latency of Web worker messaging?
[UPDATE]
To test further, I did a simple "ping" test to the web worker at a given interval:
setInterval(function () {
  let pingTime = new Date().getMilliseconds();
  videoWorker.postMessage({ action: 'ping', pingTime : pingTime });
}, 500);
Then, in the message handler, did:
if(e.data.pingTime) {
  let pongTime = new Date().getMilliseconds();
  console.log('Got pong: ' + (pongTime - e.data.pingTime));
}
Similar to the result above, it averages ~50 ms.
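For reference, the worker side of this ping test is just an echo, along these lines (a sketch of the assumed handler, not the original code):

// worker.js (assumed shape): echo the timestamp straight back
onmessage = (e) => {
  if (e.data.action === 'ping') {
    postMessage({ pingTime: e.data.pingTime });
  }
};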

You fell into one of the micro-benchmarking traps:
Never run a single instance of the test.
The first run will always be slower: the engine has to warm up, and in your case the whole Worker thread has to be spawned and a lot of other things have to be initialized (see this Q/A for a list of things delaying the first message).
Also, a single test is prone to reporting completely false results because of external and unrelated events (a background app deciding to perform some operations just at that moment, the Garbage Collector kicking in, a UI event, anything...).
const videoWorker = new Worker( generateWorkerURL() );
let startTime;
const latencies = [];
const max_rounds = 10;

// Receiver section
videoWorker.onmessage = function (e) {
  const endTime = performance.now();
  e.data.close(); // release the ImageBitmap we got back
  const latency = endTime - startTime;
  // store the current latency
  latencies.push( latency );
  if( latencies.length < max_rounds ) {
    performTest();
  }
  else {
    logResults();
  }
};
// initial call
performTest();

// the actual test code
async function performTest() {
  // we'll build a new random image every test
  const w = 1920;
  const h = 1080;
  // make some noise
  const data = Uint32Array.from( { length: w * h }, () => Math.random() * 0xFFFFFF + 0xFF000000 );
  const imageData = new ImageData( new Uint8ClampedArray( data.buffer ), w, h );
  const bitmapData = await createImageBitmap( imageData );
  // start measuring the time it takes to transfer
  startTime = performance.now();
  videoWorker.postMessage( bitmapData, [ bitmapData ] );
}

// when all the tests are done
function logResults() {
  const total = latencies.reduce( (total, lat) => total + lat );
  const avg = total / latencies.length;
  console.log( "average latency (ms)", avg );
  console.log( "first ten absolute values", latencies.slice( 0, 10 ) );
}

function generateWorkerURL() {
  // the worker simply echoes the ImageBitmap back, transferring it again
  const content = `onmessage = e => postMessage( e.data, [ e.data ] );`;
  const blob = new Blob( [ content ], { type: 'text/javascript' } );
  return URL.createObjectURL( blob );
}
Running 1000 tests leads to an average of <1.2 ms per test on my machine (and 0.12 ms when not generating a new ImageData every test, i.e. without GC), while the first run takes about 11 ms.
These results imply that transferring the data takes virtually no time (it's almost as fast as just waiting for the next event loop).
So your bottleneck is in another castle, and there is nothing to speed up in the messaging part.
Remember that if your main thread is blocked, so will be the handlers that fire from that main thread.
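As an illustration of that last point, a contrived sketch (hypothetical, not part of the benchmark above):

// The worker's reply is queued as a task on the main thread's event loop,
// so any synchronous work on the main thread delays the onmessage handler.
videoWorker.postMessage(bitmapData, [bitmapData]);
const blockUntil = performance.now() + 100;
while (performance.now() < blockUntil) {
  // 100 ms of synchronous busy work...
}
// onmessage can only fire after this loop returns to the event loop,
// so the measured "latency" would include those 100 ms.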

Related

JS Worker Performance - Parsing JSON

I'm experimenting with Workers as my user interface is very slow due to big tasks running in the background.
I'm starting with the simplest tasks, such as parsing JSON. See below for my very simple code to create an async function running on a Worker.
Performance wise there is a big difference between:
JSON.parse(jsonStr);
and
await parseJsonAsync(jsonStr);
JSON.parse() takes 1ms whereas parseJsonAsync takes 102ms!
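(The two numbers come from timing both calls the same way; a minimal sketch, assuming the same jsonStr for both:)

let t0 = performance.now();
JSON.parse(jsonStr);
console.log('JSON.parse:', performance.now() - t0); // ~1 ms

t0 = performance.now();
await parseJsonAsync(jsonStr);
console.log('parseJsonAsync:', performance.now() - t0); // ~102 ms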
So my question is: are the overheads really that big for running worker threads, or am I missing something?
const worker = new Worker(new URL('../workers/parseJson.js', import.meta.url));

export async function parseJsonAsync(jsonStr) {
  return new Promise((resolve, reject) => {
    worker.onmessage = ({ data: { jsonObject } }) => {
      resolve(jsonObject);
    };
    worker.postMessage({
      jsonStr: jsonStr,
    });
  });
}
parseJson.js
self.onmessage = ({ data: { jsonStr } }) => {
  let jsonObject = null;
  try {
    jsonObject = JSON.parse(jsonStr);
  } catch (ex) {
  } finally {
    self.postMessage({
      jsonObject: jsonObject
    });
  }
};
I can now confirm that the overhead of transferring messages between threads is pretty big. But the raw performance of the worker (at least in executing JSON.parse) is close to the main thread.
TL;DR: Just compare the numbers in the 2 tables. Without sending a big object via postMessage, worker perf is just fine.
For the test payload jsonStr, I create a string of a long list [{"foo":"bar"}, ...] repeated n times. The number of items in jsonStr can be tuned by changing Array.from({ length: number }).
I then postMessage(jsonStr) to run JSON.parse in the worker; when done parsing, it sends back the parsed jsonObject. In the main thread I just call JSON.parse(jsonStr) directly.
runTest(delay) uses setTimeout to wait for the worker to start up before running the actual test. runTest() without a delay runs immediately, so we can measure the worker's startup time.
Code for the test.
const blobURL = URL.createObjectURL(
  new Blob(
    [
      "(",
      function () {
        self.onmessage = ({ data: jsonStr }) => {
          let jsonObject = null;
          try {
            jsonObject = JSON.parse(jsonStr);
            self.postMessage(["done", jsonObject]);
          } catch (e) {
            self.postMessage(["error", e]);
          }
        };
      }.toString(),
      ")()",
    ],
    { type: "application/javascript" }
  )
);
const worker = new Worker(blobURL);
const jsonStr = "[" + Array.from({ length: 1_000 }, () => `{"foo":"bar"}`).join(",") + "]";

function test(payload) {
  worker.onmessage = ({ data }) => {
    const delta = performance.now() - t0;
    console.log("worker", delta);
    console.log("worker response", data[0]);
  };
  const t0 = performance.now();
  worker.postMessage(payload);
  testParseJsonInMain(payload);
}

function testParseJsonInMain(payload) {
  let obj;
  try {
    const t0 = performance.now();
    obj = JSON.parse(payload);
    const delta = performance.now() - t0;
    console.log("main", delta);
  } catch {}
}

function runTest(delay) {
  if (delay) {
    setTimeout(() => test(jsonStr), delay);
  } else {
    test(jsonStr);
  }
}

runTest(1000);
I observe that it takes around 30 ms to start the worker on my machine. If the test runs after worker startup, I get these numbers (unit: milliseconds):
#items in payload | main | worker
1,000             | 0.2  | 2.1
10,000            | 1.3  | 9.8
100,000           | 15.4 | 73.5
1,000,000         | 165  | 854
10,000,000        | 2633 | 15312
When the payload reaches 10 million items, the worker really struggles (it takes 15 seconds). At 10 million items, jsonStr is around 140 MB (each {"foo":"bar"}, entry is 14 bytes).
But if the worker does not send back the parsed jsonObject, the numbers are much better. Just make a little change to the test code above:
// worker code changed from:
self.postMessage(["done", jsonObject]);
// to:
self.postMessage(["done", typeof jsonObject]);
#items in payload | main | worker
1,000             | 0.2  | 1.2
10,000            | 2.1  | 3.5
100,000           | 15.7 | 26.2
1,000,000         | 196  | 232
10,000,000        | 2249 | 2801
P.S. I've actually done another test. Instead of postMessage(jsonStr), I used TextEncoder to turn the string into an ArrayBuffer, then postMessage(arrayBuffer, [arrayBuffer]), which transfers the underlying memory from the main thread directly to the worker.
I did not see a real difference in time consumed; in fact it got a little bit slower. I guess sending a large string isn't the issue.
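That variant looked roughly like this (a sketch; the worker-side decode is assumed):

// main thread: encode the string, then transfer the bytes instead of copying
const bytes = new TextEncoder().encode(jsonStr);
worker.postMessage(bytes.buffer, [bytes.buffer]); // buffer is detached here

// worker side (sketch): decode back to a string before parsing
self.onmessage = ({ data }) => {
  const str = new TextDecoder().decode(data);
  const jsonObject = JSON.parse(str);
  self.postMessage(["done", typeof jsonObject]);
};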

How to create video srcObject from VideoFrame?

I'm learning WebCodecs now, and I wondered whether it could play video on a video element from several pictures. I tried many times but it still doesn't work.
I create VideoFrames from pictures, and then use MediaStreamTrackGenerator to create a media track. But the video appears black when I call play().
Here is my code:
const test = async () => {
  const imgSrcList = [
    'https://gw.alicdn.com/imgextra/i4/O1CN01CeTlwJ1Pji9Pu6KW6_!!6000000001877-2-tps-62-66.png',
    'https://gw.alicdn.com/imgextra/i3/O1CN01h7tWZr1ZiTEk1K02I_!!6000000003228-2-tps-62-66.png',
    'https://gw.alicdn.com/imgextra/i4/O1CN01CSwWiA1xflg5TnI9b_!!6000000006471-2-tps-62-66.png',
  ];
  const imgEleList: HTMLImageElement[] = [];
  await Promise.all(
    imgSrcList.map((src, index) => {
      return new Promise((resolve) => {
        let img = new Image();
        img.src = src;
        img.crossOrigin = 'anonymous';
        img.onload = () => {
          imgEleList[index] = img;
          resolve(true);
        };
      });
    }),
  );

  const trackGenerator = new MediaStreamTrackGenerator({ kind: 'video' });
  const writer = trackGenerator.writable.getWriter();
  await writer.ready;
  for (let i = 0; i < imgEleList.length; i++) {
    const frame = new VideoFrame(imgEleList[i], {
      duration: 500,
      timestamp: i * 500,
      alpha: 'keep',
    });
    await writer.write(frame);
    frame.close();
  }
  // Call ready again to ensure that all chunks are written before closing the writer.
  await writer.ready.then(() => {
    writer.close();
  });

  const stream = new MediaStream();
  stream.addTrack(trackGenerator);
  const videoEle = document.getElementById('video') as HTMLVideoElement;
  videoEle.onloadedmetadata = () => {
    videoEle.play();
  };
  videoEle.srcObject = stream;
};
Thanks!
Disclaimer:
I am not an expert in this field and it's my first use of this API in this way. The specs and the current implementation don't seem to match, and it's very likely that things will change in the near future. So take this answer with all the salt you can, it is only backed by trial.
There are a few things that seem wrong in your implementation:
duration and timestamp are set in microseconds, that is 1/1,000,000 s. Your 500 duration is thus only half a millisecond; that would be something like 2000 FPS, and your three images would all be displayed within 1.5 ms. You will want to change that.
In Chrome's current implementation, you need to specify the displayWidth and displayHeight members of the VideoFrameInit dictionary (though if I read the specs correctly, these should have defaulted to the source image's width and height).
Then there is something I'm less sure about, but it seems that you can't batch-write many frames, and that the timestamp field is kind of useless in this case (even though it's required, even with nonsensical values). Once again, the specs have changed, so it's hard to know if it's an implementation bug or if it's supposed to work like that, but anyway it is how it is (unless I too missed something).
So to work around that limitation, you'll need to write to the stream periodically, appending frames when you want them to appear.
Here is one example of this, trying to keep it close to your own implementation by writing a new frame to the WritableStream when we want it to be presented.
const test = async() => {
  const imgSrcList = [
    'https://gw.alicdn.com/imgextra/i4/O1CN01CeTlwJ1Pji9Pu6KW6_!!6000000001877-2-tps-62-66.png',
    'https://gw.alicdn.com/imgextra/i3/O1CN01h7tWZr1ZiTEk1K02I_!!6000000003228-2-tps-62-66.png',
    'https://gw.alicdn.com/imgextra/i4/O1CN01CSwWiA1xflg5TnI9b_!!6000000006471-2-tps-62-66.png',
  ];
  // rewrote this part to use ImageBitmaps,
  // using HTMLImageElement works too
  // but it's less efficient
  const imgEleList = await Promise.all(
    imgSrcList.map((src) => fetch(src)
      .then(resp => resp.ok && resp.blob())
      .then(createImageBitmap)
    )
  );
  const trackGenerator = new MediaStreamTrackGenerator({
    kind: 'video'
  });
  const duration = 1000 * 1000; // in µs (1/1,000,000s)
  let i = 0;
  const presentFrame = async() => {
    i++;
    const writer = trackGenerator.writable.getWriter();
    const img = imgEleList[i % imgEleList.length];
    await writer.ready;
    const frame = new VideoFrame(img, {
      duration, // value doesn't mean much, but required
      timestamp: i * duration, // ditto
      alpha: 'keep',
      displayWidth: img.width * 2, // required
      displayHeight: img.height * 2, // required
    });
    await writer.write(frame);
    frame.close();
    await writer.ready;
    // unlock our Writable so we can write again at next frame
    writer.releaseLock();
    setTimeout(presentFrame, duration / 1000);
  };
  presentFrame();
  const stream = new MediaStream();
  stream.addTrack(trackGenerator);
  const videoEle = document.getElementById('video');
  videoEle.srcObject = stream;
};
test().catch(console.error)
<video id=video controls autoplay muted></video>

Audio playback slows down game

I am trying to develop a simple game using nw.js (node.js + chromium page).
<canvas width="1200" height="800" id="main"></canvas>
<script>
var Mouse = {x: 0, y: 0, fire: false};
(async function() {
  "use strict";
  const reload = 25;
  var ireload = 0;
  const audioCtx = new AudioContext();
  let fire = await fetch('shotgun.mp3');
  let bgMusic = await fetch('hard.mp3');
  fire = await fire.arrayBuffer();
  bgMusic = await bgMusic.arrayBuffer();
  const bgMdecoded = await audioCtx.decodeAudioData(bgMusic);
  const fireDecoded = await audioCtx.decodeAudioData(fire);
  const bgM = audioCtx.createBufferSource();
  bgM.buffer = bgMdecoded;
  bgM.loop = true;
  bgM.connect(audioCtx.destination);
  bgM.start(0);
  let shot = audioCtx.createBufferSource();
  shot.buffer = fireDecoded;
  shot.connect(audioCtx.destination);
  document.getElementById('main').onmousedown = function(e) {
    Mouse.x = e.layerX;
    Mouse.y = e.layerY;
    Mouse.fire = true;
  };
  function main(tick) {
    var dt = lastTick - tick;
    lastTick = tick;
    /// take fire
    if(--ireload < 0 && Mouse.fire) {
      ireload = reload;
      shot.start(0);
      shot = audioCtx.createBufferSource();
      shot.buffer = fireDecoded;
      shot.connect(audioCtx.destination);
      Mouse.fire = false;
    }
    /* moving objects, rendering on thread with offscreen canvas */
    requestAnimationFrame(main);
  }
  let lastTick = performance.now();
  main(lastTick);
})();
</script>
I have stripped the code down to a minimal working example.
The problem is with shooting: every time I fire (///take fire), the game drops FPS. Exactly the same happens in Kaiido's example (https://jsfiddle.net/sLpx6b3v/). It works great over long periods, but playing multiple sounds (the game is a shooter) several times causes a framerate drop and, after some time, GC hiccups.
A less than one year old gaming laptop drops from 60 FPS to about 40 FPS, and to about 44 FPS in Kaiido's example.
What can be fixed with the sound?
The desired behaviour is no lagging / no GC / no frame drops due to sound. The music in the background works well.
I will try AudioWorklet, but it is hard to create one and process instantaneous sounds (probably another question).
It is possible to reuse the buffer, in a bit of a hackish way.
First create:
const audioCtx = new AudioContext();
then fetch the resource as usual:
let fire = await fetch('shotgun.mp3');
fire = await fire.arrayBuffer();
fire = await audioCtx.decodeAudioData(fire);
const shot = audioCtx.createBufferSource();
shot.buffer = fire;
shot.loopEnd = 0.00001; // some small value to make it unplayable
shot.start(0);
Then, during the event (mouse down in my case):
shot.loopEnd = 1; // that restarts the sound and plays it in a loop
Next, after it has played, set again:
shot.loopEnd = 0.00001;
In my case, I stop it inside the requestAnimationFrame loop:
<canvas width="1200" height="800" id="main"></canvas>
<script>
var Mouse = {x: 0, y: 0, fire: false};
(async function() {
  "use strict";
  const reload = 25;
  var ireload = 0;
  const audioCtx = new AudioContext();
  let fire = await fetch('shotgun.mp3');
  let bgMusic = await fetch('hard.mp3');
  fire = await fire.arrayBuffer();
  bgMusic = await bgMusic.arrayBuffer();
  const bgMdecoded = await audioCtx.decodeAudioData(bgMusic);
  const fireDecoded = await audioCtx.decodeAudioData(fire);
  const bgM = audioCtx.createBufferSource();
  bgM.buffer = bgMdecoded;
  bgM.loop = true;
  bgM.connect(audioCtx.destination);
  bgM.start(0);
  let shot = audioCtx.createBufferSource();
  shot.buffer = fireDecoded;
  shot.connect(audioCtx.destination);
  shot.loopEnd = 0.00001; // some small value to make it unplayable
  shot.start(0);
  document.getElementById('main').onmousedown = function(e) {
    Mouse.x = e.layerX;
    Mouse.y = e.layerY;
    Mouse.fire = true;
  };
  function main(tick) {
    var dt = lastTick - tick;
    lastTick = tick;
    /// take fire
    // assuming 60fps, which is true in my case, I stop it after a second
    if(ireload < -35) {
      shot.loopEnd = 0.00001;
    }
    if(--ireload < 0 && Mouse.fire) {
      ireload = reload;
      shot.loopEnd = 1; // that restarts the sound and plays it in a loop
      Mouse.fire = false;
    }
    /* moving objects, rendering on thread with offscreen canvas */
    requestAnimationFrame(main);
  }
  let lastTick = performance.now();
  main(lastTick);
})();
</script>
A note about GC: it is true that it handles audio buffers quickly, but I have checked, and GC fires only when there are allocations and memory reallocations. The Garbage Collector interrupts all script execution, so there is jank and lag.
I use a memory pool in tandem with this trick, allocating the pool at initialisation and then only reusing objects, and I get literally no GC after the second sweep: it runs once after initialisation, and kicks in a second time after optimisation to reclaim unused memory. After that, there is no GC at all. Using typed arrays and workers gives a really performant combo, with 60 FPS, crisp sound and no lags at all.
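A minimal sketch of such a pool (the names here are illustrative, not from my game code):

// Allocate once at init, then recycle objects instead of allocating new ones.
function createPool(factory, size) {
  const free = Array.from({ length: size }, factory);
  return {
    acquire() {
      // caller must reset the object's fields; returns undefined when exhausted
      return free.pop();
    },
    release(obj) {
      free.push(obj); // hand it back instead of letting it become garbage
    },
  };
}

// e.g. a pool of 256 projectile objects created up front
const projectiles = createPool(() => ({ x: 0, y: 0, vx: 0, vy: 0, live: false }), 256);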
You may think that locking out the GC is a bad idea. Maybe you are right, but after all, wasting resources just because there is a GC doesn't seem like a good idea either.
After testing, AudioWorklets seem to work as intended, but they are heavy, hard to maintain and consume a lot of resources, and writing a processor that simply copies inputs to outputs defeats its purpose. The postMessage system is a really heavy mechanism, and you have to either connect the standard way and recreate buffers, or copy the data into the Worklet's space and manage it manually via shared arrays and atomic operations.
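For reference, such a pass-through processor looks like this (a sketch; the file name and registration key are placeholders):

// passthrough-processor.js
class PassthroughProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    const input = inputs[0];
    const output = outputs[0];
    for (let channel = 0; channel < input.length; channel++) {
      output[channel].set(input[channel]); // copy samples verbatim
    }
    return true; // keep the processor alive
  }
}
registerProcessor('passthrough-processor', PassthroughProcessor);

It would be loaded with audioCtx.audioWorklet.addModule('passthrough-processor.js') and instantiated via new AudioWorkletNode(audioCtx, 'passthrough-processor'), which illustrates the point: all that machinery just to copy samples.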
You may also be interested in this writeup about WebAudio design, where the author shares the same concerns and hits exactly the same problem. A quote:

I know I'm fighting an uphill battle here, but a GC is not what we need during realtime audio playback.
Keeping a pool of AudioBuffers seems to work, though in my own test app I still see slow growth to 12MB over time before a major GC wipes, according to the Chrome profiler.
And this writeup about GC, where memory leaks in JavaScript are described. A quote:

Consider the following scenario:
1. A sizable set of allocations is performed.
2. Most of these elements (or all of them) are marked as unreachable (suppose we null a reference pointing to a cache we no longer need).
3. No further allocations are performed.
In this scenario, most GCs will not run any further collection passes. In other words, even though there are unreachable references available for collection, these are not claimed by the collector. These are not strictly leaks but still result in higher-than-usual memory usage.

RTCDataChannel's "bufferedamountlow" event not firing in Safari?

I'm working on a project that utilizes WebRTC for file transfers. Recently someone reported an issue: transfers end prematurely for bigger files. I've found the problem, and my solution was to rely on the bufferedamountlow event to coordinate the sending of chunks. I've also stopped closing the connection when the sender thinks it's complete.
For some reason, though, in Safari that event does not fire.
Here is the relevant code:
const connection = new RTCPeerConnection(rtcConfiguration);
const channel = connection.createDataChannel('sendDataChannel');
channel.binaryType = 'arraybuffer';
channel.addEventListener('open', () => {
  const fileReader = new FileReader();
  let offset = 0;
  const nextSlice = (currentOffset: number) => {
    // Do asynchronous thing with FileReader, that will result in
    // channel.send(buffer) getting called.
    // Also, offset gets increased by 16384 (the size of the buffer).
  };
  channel.bufferedAmountLowThreshold = 0;
  channel.addEventListener('bufferedamountlow', () => nextSlice(offset));
  nextSlice(0);
});
The longer version of my code is available here.
While researching the issue, I realized that on Safari, my connection.sctp is undefined. (I noticed because I've switched to connection.sctp.maxMessageSize instead of the fixed 16384 for my buffer size.) I would assume the problem is related to that.
What could be the cause of this problem? Let me add that on Chrome and Firefox everything works just fine without any issues whatsoever.
The bufferedamountlow event is not required for my code to function properly; I would like it to work, though, to get more precise estimates of the current progress and speed on the sending end of the file transfer.
After some investigation, it turns out that Safari has issues with 0 as a value for the bufferedAmountLowThreshold property.
When set to a non-zero value, the code functions properly.
Checking the bufferedAmount inside the nextSlice function also increases the speed at which the chunks are sent:
const bufferSize = connection.sctp?.maxMessageSize || 65535;
channel.addEventListener('open', () => {
  const fileReader = new FileReader();
  let offset = 0;
  const nextSlice = (currentOffset: number) => {
    const slice = file.slice(offset, currentOffset + bufferSize);
    fileReader.readAsArrayBuffer(slice);
  };
  fileReader.addEventListener('load', e => {
    const buffer = e.target.result as ArrayBuffer;
    try {
      channel.send(buffer);
    } catch {
      // Deal with failure...
    }
    offset += buffer.byteLength;
    if (channel.bufferedAmount < bufferSize / 2) {
      nextSlice(offset);
    }
  });
  channel.bufferedAmountLowThreshold = bufferSize / 2;
  channel.addEventListener('bufferedamountlow', () => nextSlice(offset));
  nextSlice(0);
});

NodeJS multi-threaded app: Worker load is maxed out, thread uses <50% core time

I'm porting a game server to JavaScript using NodeJS. Computation tasks are run in dedicated threads; this goes for the physics simulation as well.
The CPU/core load makes me wonder about the resources allocated to NodeJS workers. The computer is an i7 (4 cores, no hyperthreading) running Windows 7.
The code is provided right after listing the highlights of this testbed:
- the main.js code simulates a parent process, i.e. the listener thread. It launches worker threads to handle calculations without blocking.
- the physics.js code simulates a physics calculator, running CPU-intensive busywork.
- the physics simulation is updated every 16 ms.
- each update (step) reports its CPU usage to identify starvation.
- despite running 2 worker threads per CPU core in the testbed, the total CPU load won't exceed 70-80%.
I would have expected that maxing out the workers' computation needs (i.e. a reported load > 100%) would require only 1 thread per CPU core to reach 100% resource usage.
What am I doing/understanding wrong?
Below is the system monitor, showing 8 threads, all reporting >100% usage in the NodeJS console, yet none is above 10% of CPU processing power as per the sysmon report.
Even with 4 threads, none is above 10% of CPU processing power, as if there were a limit on the max total CPU availability a NodeJS thread can consume.
main.js:
var cp = require('child_process');

for(var i = 0; i < 4; i++){
  var child = cp.fork('./physics');
  child.processId = i;
  console.log('Process ' + i + ' created');
  child.on('message', function(m) {
    console.log('Process ' + this.processId + ': ' + m);
    //child.kill();
  });
}
Observe on line 3 that i < 4, meaning it's running 4 threads.
Now the physics.js:
function simulatePhysics(){
  // var world = new b2World(new b2Vec2(0, 10), true);
  process.send("Physics world creation complete");
  update();
}

// Expected step duration: 16.66ms (60Hz)
var stepDuration = 1000 / 60;

function update(dt){
  // Clock in
  var dateIn = Date.now();
  {
    // rem: B2D uses seconds, not milliseconds
    // world.Step(dt * 1000, 10, 10);
    // world.ClearForces();
    // Simulate CPU intensive processing
    // rem: takes ~20ms/loop on this 2.9GHz i7-3520M, each step is over time
    var a;
    for(var i = 0; i < 20000000; i++) a++;
  }
  // Clock out
  var dateOut = Date.now();
  // Delta clock
  var delta = dateOut - dateIn;
  process.send("Core load: " + (100 * delta / stepDuration).toFixed(0) + "%");
  if(delta > stepDuration){
    // Immediately run another step using real step duration
    setTimeout(function(){update(delta)}, 0);
  }
  else{
    // Wait until the 1000/60ms are passed before processing the next step
    var idle = stepDuration - delta;
    setTimeout(function(){update(stepDuration)}, idle);
  }
}

// kick off the simulation (assumed entry point, stripped from the original)
simulatePhysics();
