Promisify cluster worker events on message - javascript

I am trying to find a way to "promisify" event callbacks on workers, so that the master can:
wait for all workers to finish CPU intensive tasks,
then do some calculation based on the results returned.
I came up with the following code and this works fine, however I am not confident with the approach I am taking.
Function to create a promise that wraps the message event on a worker:
_waitAsync(worker) {
// using bluebird for Promise
return Promise.promisify((callback) => {
worker.on('message', callback.bind(undefined, undefined));
})();
}
Master calls as below:
doCPUIntensiveTaskAsync(){
const promises = [];
let k = 0;
for (var wid in cluster.workers) {
promises.push(this._waitAsync(cluster.workers[wid]).bind(this));
cluster.workers[wid].send(messages[k++]);
}
return Promise.all(promises).then(calculate);
}
What are the recommended / better ways to take advantage of multi-core environment (ie. offload CPU intensive tasks to different thread/process) in Node.js?

Start a worker for each CPU-intensive task, then let it exit. Do not reuse workers; start a fresh one for the next task.
Easy way to promisify worker exit:
import cluster from 'cluster';
import * as Q from 'q';
let worker = cluster.fork();
// 'exit' passes (code, signal), so denodeify treats a non-zero exit code as an error and rejects
let workerExit = Q.denodeify(worker.on.bind(worker, 'exit'));
await workerExit(); // workerExit().then(...) works here instead of await
console.log('child exited');
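The same idea can be applied to the 'message' event from the question without a promise library. The following is a minimal sketch, not the asker's or answerer's code: waitForMessage, makeFakeWorker and runAll are hypothetical names, and the fake worker is only an EventEmitter stand-in so that the example runs without forking. A real cluster worker exposes the same on/off/send methods.

```javascript
const { EventEmitter } = require('events');

// Resolve with the first 'message' a worker emits; reject on 'error' or an early 'exit'.
function waitForMessage(worker) {
  return new Promise((resolve, reject) => {
    const onMessage = (msg) => { cleanup(); resolve(msg); };
    const onError = (err) => { cleanup(); reject(err); };
    const onExit = (code) => {
      cleanup();
      reject(new Error(`worker exited with code ${code} before replying`));
    };
    // Remove all listeners once settled so repeated calls do not pile up handlers.
    const cleanup = () => {
      worker.off('message', onMessage);
      worker.off('error', onError);
      worker.off('exit', onExit);
    };
    worker.on('message', onMessage);
    worker.on('error', onError);
    worker.on('exit', onExit);
  });
}

// Hypothetical stand-in for a cluster worker: replies with task * 2 after a delay.
function makeFakeWorker(delayMs) {
  const worker = new EventEmitter();
  worker.send = (task) => setTimeout(() => worker.emit('message', task * 2), delayMs);
  return worker;
}

// Send one task to each worker and wait for every reply.
function runAll(workers, tasks) {
  const pending = workers.map((worker, i) => {
    const reply = waitForMessage(worker); // subscribe before sending
    worker.send(tasks[i]);
    return reply;
  });
  return Promise.all(pending);
}

const demo = runAll([makeFakeWorker(20), makeFakeWorker(5)], [1, 2]);
demo.then((results) => console.log(results)); // results follow the order of `workers`
```

Because Promise.all preserves input order, the results line up with the workers even when a later worker replies first.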

Related

Do microtask queue instructions have higher priority than root-level instructions? [duplicate]

I wrote a simple function that returns Promise so should be non-blocking (in my opinion). Unfortunately, the program looks like it stops waiting for the Promise to finish. I am not sure what can be wrong here.
function longRunningFunc(val, mod) {
return new Promise((resolve, reject) => {
sum = 0;
for (var i = 0; i < 100000; i++) {
for (var j = 0; j < val; j++) {
sum += i + j % mod
}
}
resolve(sum)
})
}
console.log("before")
longRunningFunc(1000, 3).then((res) => {
console.log("Result: " + res)
})
console.log("after")
The output looks like expected:
before // delay before printing below lines
after
Result: 5000049900000
But the program waits before printing second and third lines. Can you explain what should be the proper way to get "before" and "after" printed first and then (after some time) the result?
Wrapping code in a promise (like you've done) does not make it non-blocking. The Promise executor function (the callback you pass to new Promise(fn)) is called synchronously and will block, which is why you see the delay in getting output.
In fact, there is no way to create your own plain Javascript code (like what you have) that is non-blocking except putting it into a child process, using a WorkerThread, using some third party library that creates new threads of Javascript or using the new experimental node.js APIs for threads. Regular node.js runs your Javascript as blocking and single threaded, whether it's wrapped in a promise or not.
You can use things like setTimeout() to change "when" your code runs, but whenever it runs, it will still be blocking (once it starts executing nothing else can run until it's done). Asynchronous operations in the node.js library all use some form of underlying native code that allows them to be asynchronous (or they just use other node.js asynchronous APIs that themselves use native code implementations).
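A small sketch makes the ordering visible. The order array is only an illustration device introduced here: the executor runs before the constructor returns, while the then callback is deferred until the current synchronous code has finished.

```javascript
const order = [];

const p = new Promise((resolve) => {
  // Runs immediately, inside the `new Promise(...)` call.
  order.push('executor');
  resolve('done');
});

// Scheduled as a microtask; it cannot run until the current script finishes.
p.then(() => order.push('then callback'));

order.push('after constructor');
console.log(order.join(' -> '));
```

At the point of the console.log call only the first two entries exist, which shows that the executor itself was never deferred.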
But the program waits before printing second and third lines. Can you explain what should be the proper way to get "before" and "after" printed first and then (after some time) the result?
As I said above, wrapping things in a promise executor function doesn't make them asynchronous. If you want to "shift" the timing of when things run (though they are still synchronous), you can use a setTimeout(), but that's not really making anything non-blocking, it just makes it run later (still blocking when it runs).
So, you could do this:
function longRunningFunc(val, mod) {
return new Promise((resolve, reject) => {
setTimeout(() => {
sum = 0;
for (var i = 0; i < 100000; i++) {
for (var j = 0; j < val; j++) {
sum += i + j % mod
}
}
resolve(sum)
}, 10);
})
}
That would reschedule the time consuming for loop to run later and might "appear" to be non-blocking, but it actually still blocks - it just runs later. To make it truly non-blocking, you'd have to use one of the techniques mentioned earlier to get it out of the main Javascript thread.
Ways to create actual non-blocking code in node.js:
Run it in a separate child process and get an asynchronous notification when it's done.
Use Worker Threads (the worker_threads module, experimental in node.js v11 and stable since v12)
Write your own native code add-on to node.js and use libuv threads or OS level threads in your implementation (or other OS level asynchronous tools).
Build on top of previously existing asynchronous APIs and have none of your own code that takes very long in the main thread.
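As an illustration of the Worker Threads option, here is a sketch that moves the loop from the question into a worker. It assumes a Node.js version where worker_threads is available, and it passes the worker code as a string with the eval option only to keep the example in one file; a separate worker file is the more usual layout.

```javascript
const { Worker } = require('worker_threads');

// Code that runs inside the worker thread (same loop as in the question).
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { val, mod } = workerData;
  let sum = 0;
  for (let i = 0; i < 100000; i++) {
    for (let j = 0; j < val; j++) {
      sum += i + j % mod;
    }
  }
  parentPort.postMessage(sum);
`;

function longRunningFunc(val, mod) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { val, mod } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker stopped with exit code ${code}`));
    });
  });
}

console.log('before');
const result = longRunningFunc(1000, 3);
result.then((res) => console.log('Result: ' + res));
console.log('after');
```

Here "before" and "after" print without waiting, because the main thread only starts the worker and registers callbacks; the loop itself runs on another thread.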
The executor function of a promise is run synchronously, and this is why your code blocks the main thread of execution.
In order to not block the main thread of execution, you need to periodically and cooperatively yield control while the long running task is performed. In effect, you need to split the task into subtasks, and then coordinate the running of subtasks on new ticks of the event loop. In this way you give other tasks (like rendering and responding to user input) the opportunity to run.
You can either write your own async loop using the promise API, or you can use an async function. Async functions enable the suspension and resumption of functions (reentrancy) and hide most of the complexity from you.
The following code uses setTimeout to move subtasks onto new event loop ticks. Of course, this could be generalised, and batching could be used to find a balance between progress through the task and UI responsiveness; the batch size in this solution is only 1, and so progress is slow.
Finally: the real solution to this kind of problem is probably a Worker.
const $ = document.querySelector.bind(document)
const BIG_NUMBER = 1000
let count = 0
// Note that this could also use requestIdleCallback or requestAnimationFrame
const tick = (fn) => new Promise((resolve) => setTimeout(() => resolve(fn), 5))
async function longRunningTask(){
while (count++ < BIG_NUMBER) await tick()
console.log(`A big number of loops done.`)
}
console.log(`*** STARTING ***`)
longRunningTask().then(() => console.log(`*** COMPLETED ***`))
$('button').onclick = () => $('#output').innerHTML += `Current count is: ${count}<br/>`
* {
font-size: 16pt;
color: gray;
padding: 15px;
}
<button>Click me to see that the UI is still responsive.</button>
<div id="output"></div>
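The batching mentioned above could look roughly like the following sketch. The names sumInBatches and yieldToEventLoop are hypothetical, and the summing loop is only a placeholder for real work; the point is that the task yields once per batch rather than once per iteration.

```javascript
// Resolve on a later event loop tick so other callbacks get a chance to run.
const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

async function sumInBatches(total, batchSize) {
  let sum = 0;
  let yields = 0;
  for (let start = 0; start < total; start += batchSize) {
    const end = Math.min(start + batchSize, total);
    // One batch of synchronous work; this part still blocks while it runs.
    for (let i = start; i < end; i++) sum += i;
    // Hand control back before starting the next batch.
    await yieldToEventLoop();
    yields++;
  }
  return { sum, yields };
}

const batched = sumInBatches(1000, 250);
batched.then(({ sum, yields }) => console.log(`sum=${sum} after ${yields} yields`));
```

A larger batch size finishes sooner but leaves longer gaps in which nothing else can run, so the right value depends on how responsive the rest of the program needs to be.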

Optimal way of sharing load between worker threads

What is the optimal way of sharing linear tasks between worker threads to improve performance?
Take the following example of a basic Deno web-server:
Main Thread
// Create an array of four worker threads
// (Array.from builds four separate workers; fill() would reuse a single one)
const workers = Array.from(
  { length: 4 },
  () =>
    new Worker(new URL("./worker.ts", import.meta.url).href, {
      type: "module",
    })
);
for await (const req of server) {
  // Pass this request to a worker thread
}
worker.ts
self.onmessage = async (req) => {
// Perform some linear task on the request and make a response
};
Would the optimal way of distributing tasks be something along the lines of this?
function* generator(): Generator<number> {
let i = 0;
while (true) {
i == 3 ? (i = 0) : i++;
yield i;
}
}
const gen = generator();
const workers = Array.from(
  { length: 4 },
  () =>
    new Worker(new URL("./worker.ts", import.meta.url).href, {
      type: "module",
    })
);
for await (const req of server) {
// Pass this request to a worker thread
workers[gen.next().value].postMessage(req);
}
Or is there a better way of doing this? Say, for example, using Atomics to determine which threads are free to accept another task.
When working with WorkerThread code like this, I found that the best way to distribute jobs was to have the WorkerThread ask the main thread for a job when the WorkerThread knew that it was done with the prior job. The main thread could then send it a new job in response to that message.
In the main thread, I maintained a queue of jobs and a queue of WorkerThreads waiting for a job. If the job queue was empty, then the WorkerThread queue would likely have some workerThreads in it waiting for a job. Then, any time a job is added to the job queue, the code checks to see if there's a workerThread waiting and, if so, removes it from the queue and sends it the next job.
Anytime a workerThread sends a message indicating it is ready for the next job, then we check the job queue. If there's a job there, it is removed and sent to that worker. If not, the worker is added to the WorkerThread queue.
This whole bit of logic was very clean, did not need atomics or shared memory (because everything was gated through the event loop of the main process) and wasn't very much code.
I arrived at this mechanism after trying several other ways that each had their own problems. In one case, I had concurrency issues, in another I was starving the event loop, in another, I didn't have proper flow control to the WorkerThreads and was overwhelming them and not distributing load equally.
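The two-queue scheme described above can be sketched as follows. This is an illustration under assumptions, not the answerer's code: createDispatcher, addJob and workerReady are hypothetical names, and a worker is anything with a postMessage method. In a real program, workerReady would be called from the handler for the worker's "ready for the next job" message.

```javascript
function createDispatcher() {
  const jobs = []; // jobs waiting for a free worker
  const idle = []; // workers waiting for a job

  // Called whenever new work arrives.
  function addJob(job) {
    const worker = idle.shift();
    if (worker) {
      worker.postMessage(job); // a worker was waiting: hand the job over at once
    } else {
      jobs.push(job); // everyone is busy: queue the job
    }
  }

  // Called whenever a worker reports that it can take another job.
  function workerReady(worker) {
    if (jobs.length > 0) {
      worker.postMessage(jobs.shift());
    } else {
      idle.push(worker);
    }
  }

  return {
    addJob,
    workerReady,
    pendingJobs: () => jobs.length,
    idleWorkers: () => idle.length,
  };
}

// Demo with stand-in workers that just record what they receive.
const received = { a: [], b: [] };
const workerA = { postMessage: (job) => received.a.push(job) };
const workerB = { postMessage: (job) => received.b.push(job) };

const dispatcher = createDispatcher();
dispatcher.workerReady(workerA); // A waits for work
dispatcher.addJob('job1');       // goes straight to A
dispatcher.addJob('job2');       // queued, nobody is idle
dispatcher.addJob('job3');       // queued
dispatcher.workerReady(workerB); // B takes job2
dispatcher.workerReady(workerA); // A finished job1, takes job3
dispatcher.workerReady(workerB); // no jobs left, B waits
console.log(received, dispatcher.pendingJobs(), dispatcher.idleWorkers());
```

Since both functions run on the main thread's event loop, only one of them executes at a time, which is why no atomics or locks are needed.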
There are some abstractions in Deno that handle this kind of need very easily, in particular the pooledMap functionality.
So you have a server that is an async iterator, and you would like to leverage threading to generate responses, since each response depends on a time-consuming heavy computation, right?
Simple.
import { serve } from "https://deno.land/std/http/server.ts";
import { pooledMap } from "https://deno.land/std@0.173.0/async/pool.ts";
const server = serve({ port: 8000 }),
ress = pooledMap( window.navigator.hardwareConcurrency - 1
, server
, req => new Promise(v => v(respondWith(req)))
);
for await (const res of ress) {
// respond with res
}
That's it. In this particular case the respondWith function carries the heavy calculation that prepares your response object. If it is already an async function, you don't even need to wrap it in a promise. Here I have used one fewer than the number of available hardware threads as the pool limit, but it's up to you to decide how many to use. Note that pooledMap only caps how many calls are in flight at once; the work leaves the main thread only if respondWith itself hands it to a worker.

Difference between Javascript async functions and Web workers?

Threading-wise, what's the difference between web workers and functions declared as
async function xxx()
{
}
?
I am aware web workers are executed on separate threads, but what about async functions? Are such functions threaded in the same way as a function executed through setInterval is, or are they subject to yet another different kind of threading?
async functions are just syntactic sugar around Promises, and Promises are wrappers for callbacks.
// await is just syntactic sugar
// Promises are just wrappers
// functions taking callbacks (setTimeout here) are the actual source of the asynchronous behavior
await new Promise(resolve => setTimeout(resolve));
Now a callback could be called back immediately by the code, e.g. if you .filter an array, or the engine could store the callback internally somewhere. Then, when a specific event occurs, it executes the callback. One could say that these are asynchronous callbacks, and those are usually the ones we wrap into Promises and await them.
To make sure that two callbacks do not run at the same time (which would make concurrent modifications possible, which causes a lot of trouble) whenever an event occurs the event does not get processed immediately, instead a Job (callback with arguments) gets placed into a Job Queue. Whenever the JavaScript Agent (= thread²) finishes execution of the current job, it looks into that queue for the next job to process¹.
Therefore one could say that an async function is just a way to express a continuous series of jobs.
async function getPage() {
// the first job starts fetching the webpage
const response = await fetch("https://stackoverflow.com"); // callback gets registered under the hood somewhere, somewhen an event gets triggered
// the second job starts parsing the content
const result = await response.json(); // again, callback and event under the hood
// the third job logs the result
console.log(result);
}
// the same series of jobs can also be found here:
fetch("https://stackoverflow.com") // first job
.then(response => response.json()) // second job / callback
.then(result => console.log(result)); // third job / callback
Although two jobs cannot run in parallel on one agent (= thread), the job of one async function might run between the jobs of another. Therefore, two async functions can run concurrently.
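A short sketch of that interleaving, with hypothetical helper names (sleep, task) and a log array added only to record the order: both functions start before either finishes, and the jobs of B run between the jobs of A.

```javascript
const log = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function task(name, ms) {
  log.push(`${name} start`); // first job: runs synchronously when task() is called
  await sleep(ms);           // the function is suspended here; the thread is free
  log.push(`${name} end`);   // second job: runs after the timer event
}

const both = Promise.all([task('A', 30), task('B', 10)]).then(() => log);
both.then((entries) => console.log(entries.join(', ')));
```

The two tasks overlap in time, but at any given moment only one of their jobs is executing on the thread.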
Now who does produce these asynchronous events? That depends on what you are awaiting in the async function (or rather: what callback you registered). If it is a timer (setTimeout), an internal timer is set and the JS-thread continues with other jobs until the timer is done and then it executes the callback passed. Some of them, especially in the Node.js environment (fetch, fs.readFile) will start another thread internally. You only hand over some arguments and receive the results when the thread is done (through an event).
To get real parallelism, that is running two jobs at the same time, multiple agents are needed. WebWorkers are exactly that - agents. The code in the WebWorker therefore runs independently (has its own job queues and executor).
Agents can communicate with each other via events, and you can react to those events with callbacks. For sure you can await actions from another agent too, if you wrap the callbacks into Promises:
const workerDone = new Promise(res => window.onmessage = res);
(async function(){
const result = await workerDone;
//...
})();
TL;DR:
JS <---> callbacks / promises <--> internal Thread / Webworker
¹ There are other terms coined for this behavior, such as event loop / queue and others. The term Job is specified by ECMA262.
² How the engine implements agents is up to the engine, though as one agent may only execute one Job at a time, it very much makes sense to have one thread per agent.
In contrast to WebWorkers, async functions are never guaranteed to be executed on a separate thread.
They just don't block the whole thread until their response arrives. You can think of them as being registered as waiting for a result, let other code execute and when their response comes through they get executed; hence the name asynchronous programming.
This is achieved through a message queue, which is a list of messages to be processed. Each message has an associated function which gets called in order to handle the message.
Doing this:
setTimeout(() => {
console.log('foo')
}, 1000)
will simply register the callback function (that logs to the console) with a timer. When its 1000 ms timer elapses, a message for the callback is added to the message queue, and the callback runs once the messages ahead of it have been processed.
While the timer is ticking, other code is free to execute. This is what gives the illusion of multithreading.
The setTimeout example above uses callbacks. Promises and async work the same way at a lower level — they piggyback on that message-queue concept, but are just syntactically different.
Workers are also accessed by asynchronous code (i.e. Promises) however Workers are a solution to the CPU intensive tasks which would block the thread that the JS code is being run on; even if this CPU intensive function is invoked asynchronously.
So if you have a CPU intensive function like renderThread(duration) and if you do like
new Promise((v,x) => setTimeout(_ => (renderThread(500), v(1)),0))
.then(v => console.log(v));
new Promise((v,x) => setTimeout(_ => (renderThread(100), v(2)),0))
.then(v => console.log(v));
Even if the second one takes less time to complete, it will only be invoked after the first one releases the CPU thread. So we will get first 1 and then 2 on the console.
However, had these two functions been run on separate Workers, the outcome we would expect is 2 and then 1, since they could run in parallel and the second one would finish and return its message earlier.
So for basic IO operations standard single threaded asynchronous code is very efficient and the need for Workers arises from need of using tasks which are CPU intensive and can be segmented (assigned to multiple Workers at once) such as FFT and whatnot.
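The single-threaded ordering from the renderThread example can be checked with a sketch like this one. busyWait is a hypothetical stand-in for renderThread that simply burns CPU for the given number of milliseconds, and run and finished are names introduced here.

```javascript
// Hypothetical CPU-bound stand-in: spins until `ms` milliseconds have passed.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* keep the thread busy */ }
}

const finished = [];

// Schedule the CPU-bound call on a later tick and record when it completes.
const run = (ms, id) =>
  new Promise((resolve) =>
    setTimeout(() => {
      busyWait(ms);
      finished.push(id);
      resolve(id);
    }, 0)
  );

const done = Promise.all([run(50, 1), run(10, 2)]).then(() => finished);
done.then((ids) => console.log(ids)); // the shorter task still finishes second
```

Both callbacks are queued on the same thread, so the second cannot start until the first has finished spinning, regardless of how short it is.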
Async functions have nothing to do with web workers or node child processes - unlike those, they are not a solution for parallel processing on multiple threads.
An async function is just¹ syntactic sugar for a function returning a promise then() chain.
async function example() {
await delay(1000);
console.log("waited.");
}
is just the same as
function example() {
return Promise.resolve(delay(1000)).then(() => {
console.log("waited.");
});
}
These two are virtually indistinguishable in their behaviour. The semantics of await are specified in terms of promises, and every async function returns a promise for its result.
1: The syntactic sugar gets a bit more elaborate in the presence of control structures such as if/else or loops which are much harder to express as a linear promise chain, but it's still conceptually the same.
Are such functions threaded in the same way as a function executed through setInterval is?
Yes, the asynchronous parts of async functions run as (promise) callbacks on the standard event loop. The delay in the example above would be implemented with the normal setTimeout, wrapped in a promise for easy consumption:
function delay(t) {
return new Promise(resolve => {
setTimeout(resolve, t);
});
}
I want to add my own answer to my question, with the understanding I gathered through all the other people's answers:
Ultimately, all but web workers, are glorified callbacks. Code in async functions, functions called through promises, functions called through setInterval and such - all get executed in the main thread with a mechanism akin to context switching. No parallelism exists at all.
True parallel execution with all its advantages and pitfalls, pertains to webworkers and webworkers alone.
(pity - I thought with "async functions" we finally got streamlined and "inline" threading)
Here is a way to call standard functions as workers, enabling true parallelism. It's an unholy hack written in blood with help from satan, and probably there are a ton of browser quirks that can break it, but as far as I can tell it works.
[constraints: the function header has to be as simple as function f(a,b,c) and if there's any result, it has to go through a return statement]
function Async(func, params, callback)
{
// ACQUIRE ORIGINAL FUNCTION'S CODE
var text = func.toString();
// EXTRACT ARGUMENTS
var args = text.slice(text.indexOf("(") + 1, text.indexOf(")"));
args = args.split(",").map(function(arg) { return arg.trim(); });
// ALTER FUNCTION'S CODE:
// 1) DECLARE ARGUMENTS AS VARIABLES
// 2) REPLACE RETURN STATEMENTS WITH THREAD POSTMESSAGE AND TERMINATION
var body = text.slice(text.indexOf("{") + 1, text.lastIndexOf("}"));
for(var i = 0, c = params.length; i<c; i++) body = "var " + args[i] + " = " + JSON.stringify(params[i]) + ";" + body;
body = body + " self.close();";
body = body.replace(/return\s+([^;]*);/g, 'self.postMessage($1); self.close();');
// CREATE THE WORKER FROM FUNCTION'S ALTERED CODE
var code = URL.createObjectURL(new Blob([body], {type:"text/javascript"}));
var thread = new Worker(code);
// WHEN THE WORKER SENDS BACK A RESULT, CALLBACK AND TERMINATE THE THREAD
thread.onmessage =
function(result)
{
if(callback) callback(result.data);
thread.terminate();
}
}
So, assuming you have this potentially cpu intensive function...
function HeavyWorkload(nx, ny)
{
var data = [];
for(var x = 0; x < nx; x++)
{
data[x] = [];
for(var y = 0; y < ny; y++)
{
data[x][y] = Math.random();
}
}
return data;
}
...you can now call it like this:
Async(HeavyWorkload, [1000, 1000],
function(result)
{
console.log(result);
}
);

NodeJS promise blocking requests

I am quite confused about why is my promise blocking the node app requests.
Here is my simplified code:
var express = require('express');
var someModule = require('somemodule');
app = express();
app.get('/', function (req, res) {
res.status(200).send('Main');
});
app.get('/status', function (req, res) {
res.status(200).send('Status');
});
// Init Promise
someModule.doSomething({}).then(function(){},function(){}, function(progress){
console.log(progress);
});
var server = app.listen(3000, function () {
var host = server.address().address;
var port = server.address().port;
console.log('Example app listening at http://%s:%s in %s environment',host, port, app.get('env'));
});
And the module:
var q = require('q');
function SomeModule(){
this.doSomething = function(){
return q.Promise(function(resolve, reject, notify){
for (var i=0;i<10000;i++){
notify('Progress '+i);
}
resolve();
});
}
}
module.exports = SomeModule;
Obviously this is very simplified. The promise function does some work that takes anywhere from 5 to 30 minutes and has to run only when server starts up.
There is NO async operation in that promise function. Its just a lot of data processing, loops etc.
I want to be able to make requests right away, though. So what I expect is that when I run the server, I can go right away to 127.0.0.1:3000 and see Main, and the same for any other requests.
Eventually I want to see the progress of that task by accessing /status, but I'm sure I can make that work once the server works as expected.
At the moment, when I open / it just hangs until the promise job finishes.
Obviously I'm doing something wrong...
If your task is IO-bound go with process.nextTick. If your task is CPU-bound asynchronous calls won't offer much performance-wise. In that case you need to delegate the task to another process. An example solution would be to spawn a child process, do the work and pipe the results back to the parent process when done.
See nodejs.org/api/child_process.html for more.
If your application needs to do this often then forking lots of child processes quickly becomes a resource hog - each time you fork, a new V8 process will be loaded into memory. In this case it is probably better to use one of the multiprocessing modules like Node's own Cluster. This module offers easy creation and communication between master-worker processes and can remove a lot of complexity from your code.
See also a related question: Node.js - Sending a big object to child_process is slow
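A sketch of the child-process approach follows. It is an illustration under assumptions: runInChild is a hypothetical name, the summing loop stands in for the real data processing, and the child code is passed with node -e only so the example fits in one file. The stdio option with an 'ipc' entry gives the parent and child a message channel.

```javascript
const { spawn } = require('child_process');

// Code executed by the child process: do the heavy work, report back, then disconnect.
const childSource = `
  process.on('message', ({ iterations }) => {
    let sum = 0;
    for (let i = 0; i < iterations; i++) sum += i;
    process.send({ sum }, () => process.disconnect());
  });
`;

function runInChild(iterations) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'], // the 'ipc' entry enables send/on('message')
    });
    child.once('message', resolve);
    child.once('error', reject);
    child.once('exit', (code) => {
      if (code !== 0) reject(new Error(`child exited with code ${code}`));
    });
    child.send({ iterations });
  });
}

const childResult = runInChild(1000);
childResult.then(({ sum }) => console.log('sum from child:', sum));
```

While the child is working, the parent's event loop stays free, so an Express server in the parent could keep answering requests in the meantime.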
The main thread of Javascript in node.js is single threaded. So, if you do some giant loop that is processor bound, then that will hog the one thread and no other JS will run in node.js until that one operation is done.
So, when you call:
someModule.doSomething()
and that is all synchronous, then it does not return until it is done executing and thus the lines of code following that don't execute until the doSomething() method returns. And, just so you understand, the use of promises with synchronous CPU-hogging code does not help your cause at all. If it's synchronous and CPU bound, it's just going to take a long time to run before anything else can run.
If there is I/O involved in the loop (like disk I/O or network I/O), then there are opportunities to use async I/O operations and make the code non-blocking. But, if not and it's just a lot of CPU stuff, then it will block until done and no other code will run.
Your opportunities for changing this are:
Run the CPU consuming code in another process. Either create a separate program that you run as a child process that you can pass input to and get output from or create a separate server that you can then make async requests to.
Break the CPU-bound work into chunks where you execute 100ms chunks of work at a time, then yield the processor back to the event loop (using something like setTimeout()) to allow other things in the event queue to be serviced and run before you pick up and run the next chunk of work. You can see Best way to iterate over an array without blocking the UI for ideas on how to chunk synchronous work.
As an example, you could chunk your current loop. This runs up to 100ms of cycles and then breaks execution to give other things a chance to run. You can set the cycle time to whatever you want.
function SomeModule(){
this.doSomething = function(){
return q.Promise(function(resolve, reject, notify){
var cntr = 0, numIterations = 10000, timePerSlice = 100;
function run() {
if (cntr < numIterations) {
var start = Date.now();
while (Date.now() - start < timePerSlice && cntr < numIterations) {
notify('Progress '+cntr);
++cntr;
}
// give some other things a chance to run and then call us again
// setImmediate() is also an option here, but setTimeout() gives all
// other operations a chance to run alongside this operation
setTimeout(run, 10);
} else {
resolve();
}
}
run();
});
}
}
