I have a feature in which I have to receive items from different users (identified by some ID) and process them in relative order (so user A's item 1 is processed before user A's item 2, but with no restriction on ordering relative to user B's items). To improve throughput I'd like to process them asynchronously.
I've done this with multithreaded languages and concurrent queues before, leaving the thread running in a busy wait when there was nothing left in the queue. But I don't know how I would solve this with JavaScript's single thread. I was thinking about using Promises with an infinite loop inside, but I don't know if that's an anti-pattern.
As a restriction, my client is reluctant to use external libraries that haven't been cleared by their audit team, so I'm trying to do this artisanally.
Thank you
You generally don't busy-wait in JavaScript, ever. Instead, most asynchronous things you can do (send a request to a web server, load an image (which is itself a request to a web server), or wait for user input) either take a callback or return a promise (another form of callback).
Processing lists of asynchronous things is pretty easy with async/await:
function doThing(thingMsg) {
  // this just represents some process that runs async
  return new Promise((resolve) => {
    const duration = Math.random() * 1500 + 500; // 0.5 to 2 seconds
    setTimeout(() => {
      resolve(thingMsg);
    }, duration);
  });
}
const userAToDoList = [
  'mix flour',
  'knead',
  'bake',
  'slice',
];

const userBToDoList = [
  'chop',
  'fry',
  'braise',
  'combine',
];
async function doListInOrder(list, userName) {
  for (const item of list) {
    const result = await doThing(item);
    console.log(`${userName} did ${result}`);
  }
  console.log(`${userName} finished`);
}
doListInOrder(userAToDoList, 'userA');
doListInOrder(userBToDoList, 'userB');
Replace doThing with the things you want to do like "wait for a file to download", "wait for an image to download" or "wait for something from a worker" etc.
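Since your items arrive over time rather than as fixed lists, you can get the same per-user ordering without any busy waiting by keeping one promise chain per user ID and appending each new item to that user's chain. Here's a minimal sketch of that idea (processItem and the item shape are placeholders, not anything from your code):

// Minimal per-user serial queue: items for the same user run in order,
// items for different users run concurrently. processItem is hypothetical.
const tails = new Map(); // userId -> promise for that user's last queued item

function enqueue(userId, item) {
  const tail = tails.get(userId) ?? Promise.resolve();
  const next = tail.then(() => processItem(item));
  tails.set(userId, next.catch(() => {})); // keep the chain alive if an item fails
  return next;
}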
What is the optimal way of sharing linear tasks between worker threads to improve performance?
Take the following example of a basic Deno web-server:
Main Thread
// Create an array of four worker threads
const workers = Array.from({ length: 4 }, () =>
  new Worker(new URL("./worker.ts", import.meta.url).href, {
    type: "module",
  })
);
for await (const req of server) {
  // Pass this request to a worker thread
}
worker.ts
self.onmessage = async (req) => {
  // Perform some linear task on the request and make a response
};
Would the optimal way of distributing tasks be something along the lines of this?
function* generator(): Generator<number> {
  let i = 0;
  while (true) {
    i == 3 ? (i = 0) : i++;
    yield i;
  }
}
const gen = generator();
const workers = Array.from({ length: 4 }, () =>
  new Worker(new URL("./worker.ts", import.meta.url).href, {
    type: "module",
  })
);
for await (const req of server) {
  // Pass this request to a worker thread
  workers[gen.next().value].postMessage(req);
}
Or is there a better way of doing this? Say, for example, using Atomics to determine which threads are free to accept another task.
When working with WorkerThread code like this, I found that the best way to distribute jobs was to have the WorkerThread ask the main thread for a job when the WorkerThread knew that it was done with the prior job. The main thread could then send it a new job in response to that message.
In the main thread, I maintained a queue of jobs and a queue of WorkerThreads waiting for a job. If the job queue was empty, then the WorkerThread queue would likely have some workerThreads in it waiting for a job. Then, any time a job is added to the job queue, the code checks to see if there's a workerThread waiting and, if so, removes it from the queue and sends it the next job.
Anytime a workerThread sends a message indicating it is ready for the next job, then we check the job queue. If there's a job there, it is removed and sent to that worker. If not, the worker is added to the WorkerThread queue.
This whole bit of logic was very clean, did not need atomics or shared memory (because everything was gated through the event loop of the main process) and wasn't very much code.
I arrived at this mechanism after trying several other ways that each had their own problems. In one case, I had concurrency issues, in another I was starving the event loop, in another, I didn't have proper flow control to the WorkerThreads and was overwhelming them and not distributing load equally.
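A minimal sketch of that pull-based pattern (assuming a workers array already exists; handleResult and the message shapes are made up for illustration):

// Main thread: workers ask for work when idle, so nothing is ever overwhelmed.
const jobQueue = [];
const idleWorkers = [];

function dispatch(worker) {
  if (jobQueue.length > 0) {
    worker.postMessage({ job: jobQueue.shift() });
  } else {
    idleWorkers.push(worker); // no work yet; park the worker
  }
}

function addJob(job) {
  if (idleWorkers.length > 0) {
    idleWorkers.shift().postMessage({ job }); // a worker is waiting; send it directly
  } else {
    jobQueue.push(job);
  }
}

for (const worker of workers) {
  worker.onmessage = (e) => {
    handleResult(e.data); // hypothetical result handler
    dispatch(worker);     // worker is done: give it the next job, or park it
  };
  dispatch(worker);       // prime each worker once at startup
}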
There are some abstractions in Deno to handle these kind of needs very easily. Especially considering the pooledMap functionality.
So you have a server which is an async iterator and you would like to leverage threading to generate responses since each response depends on a time taking heavy computation right..?
Simple.
import { serve } from "https://deno.land/std/http/server.ts";
import { pooledMap } from "https://deno.land/std@0.173.0/async/pool.ts";
const server = serve({ port: 8000 });
const ress = pooledMap(
  window.navigator.hardwareConcurrency - 1,
  server,
  req => new Promise(v => v(respondWith(req)))
);
for await (const res of ress) {
  // respond with res
}
That's it. In this particular case the respondWith function bears the heavy calculation to prepare your response object. In case it's already an async function then you don't even need to wrap it in a promise. Obviously here I have just used one less than the available concurrency, but it's up to you to decide how many to spawn.
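If you want to see pooledMap in isolation, here is a tiny self-contained illustration of its semantics (the job list and timing are made up):

import { pooledMap } from "https://deno.land/std@0.173.0/async/pool.ts";

// 100 jobs, but never more than 4 running at the same time
const jobs = Array.from({ length: 100 }, (_, i) => i);
const results = pooledMap(4, jobs, async (i) => {
  await new Promise((r) => setTimeout(r, 100)); // simulate ~100ms of work
  return i * 2;
});

for await (const r of results) {
  console.log(r); // results arrive as each pooled job completes
}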
I'm stuck on async programming in JavaScript.
I know that Promise.all() will run in parallel.
Could you please tell me what's wrong with the code below?
It should take 100ms. But actually, it takes 200ms :(
// 1. Define function here
var getFruit = async (name) => {
  const fruits = {
    pineapple: ":pineapple:",
    peach: ":peach:",
    strawberry: ":strawberry:"
  };
  await fetch('https://jsonplaceholder.typicode.com/photos'); // about 100ms
  return fruits[name];
};
var makeSmoothie = async () => {
  const a = getFruit('pineapple');
  const b = getFruit('strawberry');
  const smoothie = await Promise.all([a, b]);
  return smoothie;
  //return [a, b];
};
// 2. Execute code here
var tick = Date.now();
var log = (v) => console.log(`${v} \n Elapsed: ${Date.now() - tick}`);
makeSmoothie().then(log);
Your logic is fine; it is running as parallel as possible from the client side.
You can test this by waiting on setTimeout instead of the fetch:
await new Promise(resolve => setTimeout(resolve, 100));
It's possible the placeholder site is queuing connections on its side or something. Either way you should measure with something predictable, not network.
Edit:
Just to explain a bit more that won't fit in a comment.
Let's put my waiting trick into an actual function:
const wait = ms => new Promise(resolve => setTimeout(resolve, ms));
OK, now you would call this as await wait(100). Now to compare:
await wait(100)
await fetch(...)
In terms of your code setup, they are the same: they perform some async task that takes around 100ms and then return. If Promise.all() were not running these in parallel, the wait(100) version would certainly take 200ms or longer.
fetch is not as reliable a test as setTimeout because it runs over the network. There's a lot you can't control with this call, like:
- Some browsers limit how many parallel connections are made to the same domain
- Lag in DNS or the server itself, typical network hiccups
- The domain itself might force 1 connection at a time from your IP
It's not clear exactly what is causing the apparent synchronous behavior here. Maybe you will find some answers in the network panel. But in terms of code, there is nothing else for you to do. It is provably parallelized with the setTimeout test, which is much more reliable as a test since it is just a local timer.
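If you want to verify this yourself, here is a small self-contained harness using the wait helper; the exact timings are approximate:

(async () => {
  const wait = ms => new Promise(resolve => setTimeout(resolve, ms));

  let tick = Date.now();
  await wait(100);
  await wait(100);
  console.log(`sequential: ${Date.now() - tick}ms`); // ~200ms: second wait starts after the first

  tick = Date.now();
  await Promise.all([wait(100), wait(100)]);
  console.log(`parallel: ${Date.now() - tick}ms`);   // ~100ms: both waits started up front
})();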
Disclaimer: I'm not experienced with programming or with networks in general, so I might be missing something quite obvious.
So I'm making a function in Node.js that should go over an array of image links from my database and check if they're still working. There are thousands of links to check, so I can't just fire off several thousand fetch calls at once and wait for results. Instead I'm staggering the requests, going 10 by 10, and doing HEAD requests to minimize bandwidth usage.
I have two issues.
The first one is that after fetching the first 10-20 links quickly, the other requests take quite a bit longer and 9 or 10 out of 10 of them will time out. This might be due to some sort of network mechanism that throttles my requests when there are many being fired at once, but I'm thinking it's likely due to my second issue.
The second issue is that the checking process slows down after a few iterations. Here's an outline of what I'm doing. I'm taking the string array of image links and slicing it 10 by 10, then checking those 10 posts in 10 promises (ignore the i and j variables, they're just there to track the individual promises and timeouts for logging/debugging):
const partialResult = await Promise.all(postsToCheck.map(async (post, j) => await this.checkPostForBrokenLink(post, i + j)));
Within checkPostForBrokenLink I have a race between the fetch and a 10-second timeout, because I don't want to wait for the connection to time out on its own every time. If a request takes longer than 10 seconds, I flag it as having timed out and move on.
const timeoutPromise = index => {
  let timeoutRef;
  const promise = new Promise<null>((resolve, reject) => {
    const start = new Date().getTime();
    console.log('===TIMEOUT INIT===' + index);
    timeoutRef = setTimeout(() => {
      const end = new Date().getTime();
      console.log('===TIMEOUT FIRE===' + index, end - start);
      resolve(null);
    }, 10 * 1000);
  });
  return { timeoutRef, promise, index };
};
const fetchAndCancelTimeout = timeout => {
  return fetch(post.fileUrl, { method: 'HEAD' })
    .then(result => {
      return result;
    })
    .finally(() => {
      console.log('===CLEAR===' + index); // index is from the parent function
      clearTimeout(timeout);
    });
};
const timeout = timeoutPromise(index);
const videoTest = await Promise.race([fetchAndCancelTimeout(timeout.timeoutRef), timeout.promise]);
If fetchAndCancelTimeout finishes before timeout.promise does, it will cancel that timeout, but if the timeout finishes first, the fetch is still "resolving" in the background, despite the code having moved on. I'm guessing this is why my code is slowing down. The later timeouts take 20-30 seconds from being set up to firing, despite being set to 10 seconds. As far as I know, this has to be because the main process is busy and doesn't have time to execute the event queue, though I don't really know what it could be doing except waiting for the promises to resolve.
So the question is, first off, am I doing something stupid here that I shouldn't be doing and that's causing everything to be slow? Secondly, if not, can I somehow manually stop the execution of the fetch promise if the timeout fires first, so as not to waste resources on a pointless process? Lastly, is there a better way to check whether a large number of links are valid than what I'm doing here?
I found the problem and it wasn't, at least not directly, related to promise buildup. The code shown was for checking video links but, for images, the fetch call was done by a plugin and that plugin was causing the slowdown. When I started using the same code for both videos and images, the process suddenly became orders of magnitude quicker. I didn't think to check the plugin at first because it was supposed to only do a head request and format the results which shouldn't be an issue.
For anyone looking at this trying to find a way to cancel a fetch, #some provided an idea that seems like it might work. Check out https://www.npmjs.com/package/node-fetch#request-cancellation-with-abortsignal
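For completeness, a minimal sketch of that cancellation approach (older Node versions may need the abort-controller polyfill; url is a placeholder):

// Abort the underlying request if it hasn't completed within 10 seconds
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 10 * 1000);

try {
  const res = await fetch(url, { method: 'HEAD', signal: controller.signal });
  console.log(url, res.ok);
} catch (err) {
  // an aborted fetch rejects with an AbortError instead of lingering
  console.log(url, 'timed out or failed:', err.name);
} finally {
  clearTimeout(timer); // don't leave the timer pending if the fetch finished first
}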
Something you might want to investigate here is the Bluebird Promise library.
There are two functions in particular that I believe could simplify your implementation regarding rate limiting your requests and handling timeouts.
Bluebird's Promise.map has a concurrency option (link), which allows you to set the number of concurrent requests, and it also has a .timeout() method (link) which will reject the promise if a certain timeout has elapsed.
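A rough sketch of how these two could combine for the link-checking case (the links array and the result shape are my own assumptions):

const Promise = require('bluebird');
const fetch = require('node-fetch');

// Check all links with at most 10 requests in flight, each capped at 10 seconds
const checkAll = links =>
  Promise.map(
    links,
    url =>
      Promise.resolve(fetch(url, { method: 'HEAD' }))
        .timeout(10 * 1000)
        .then(res => ({ url, ok: res.ok }))
        .catch(() => ({ url, ok: false })), // timeout or network error
    { concurrency: 10 }
  );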
I am using npm ws module (or actually the wrapper called isomorphic-ws) for websocket connection.
NPM Module: isomorphic-ws
I use it to receive some array data from a websocket++ server running on the same PC. This data is then processed and displayed as a series of charts.
Now the problem is that the handling itself takes a very long time. I use one message to calculate 16 charts and for each of them I need to calculate a lot of logarithms and other slow operations and all that in JS. Well, the whole refresh operation takes about 20 seconds.
Now I could actually live with that, but the problem is that when I get a new request, it is processed only after the whole message handler has finished. And if I get several requests in the meantime, all of them are processed in the order they came in. So the requests queue up and the current state gets more and more outdated as time goes on...
I would like to have a way of detecting that there is another message waiting to be processed. If that is the case, I could just stop the current handler at any time and start over... So when using npm ws, is there a way of telling that there is another message waiting to be processed?
Thanks
You need to create some sort of cancelable job wrapper. It's hard to give a concrete suggestion without seeing your code. But it could be something like this.
const processArray = array => {
  let canceled = false;
  const run = async () => {
    const results = [];
    for (let i = 0; i < array.length; i++) {
      // check on each iteration if the job has been canceled
      if (canceled) throw { reason: 'canceled' };
      results.push(doSomething(array[i]));
      // yield to the event loop now and then, so a cancel() call
      // (or an incoming message) actually gets a chance to run
      if (i % 1000 === 0) await new Promise(r => setTimeout(r));
    }
    return results;
  };
  return {
    cancel: () => {
      canceled = true;
    },
    promise: run()
  };
};
const job = processArray(hugeArray); // e.g. an array of a million items
// handle the success
job.promise.then(result => console.log(result))
// Cancel the job
job.cancel()
I'm sure there are libraries to serve this exact purpose. But I just wanted to give a basic example of how it could be done.
Threading-wise, what's the difference between web workers and functions declared as
async function xxx()
{
}
?
I am aware web workers are executed on separate threads, but what about async functions? Are such functions threaded in the same way as a function executed through setInterval is, or are they subject to yet another different kind of threading?
async functions are just syntactic sugar around Promises, and Promises are wrappers for callbacks.
// v      await is just syntactic sugar
//     v      Promises are just wrappers
//                            v      functions taking callbacks are actually the source of the asynchronous behavior
await new Promise(resolve => setTimeout(resolve));
Now a callback could be called back immediately by the code, e.g. if you .filter an array, or the engine could store the callback internally somewhere. Then, when a specific event occurs, it executes the callback. One could say that these are asynchronous callbacks, and those are usually the ones we wrap into Promises and await them.
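A quick illustration of the difference; both snippets use ordinary callbacks, but only the second is asynchronous:

// Synchronous callback: invoked immediately, before filter() returns
const odds = [1, 2, 3].filter(n => n % 2 === 1);

// Asynchronous callback: stored by the engine and invoked later,
// as a job from the queue, after the current job has finished
setTimeout(() => console.log("runs later"), 0);
console.log("runs first");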
To make sure that two callbacks do not run at the same time (which would make concurrent modifications possible, which causes a lot of trouble), an event does not get processed immediately when it occurs; instead, a Job (a callback with its arguments) gets placed into a Job Queue. Whenever the JavaScript Agent (= thread²) finishes executing the current job, it looks into that queue for the next job to process¹.
Therefore one could say that an async function is just a way to express a continuous series of jobs.
async function getPage() {
  // the first job starts fetching the webpage
  const response = await fetch("https://stackoverflow.com"); // a callback gets registered under the hood; at some point an event gets triggered
  // the second job starts parsing the content
  const result = await response.json(); // again, callback and event under the hood
  // the third job logs the result
  console.log(result);
}
// the same series of jobs can also be found here:
fetch("https://stackoverflow.com")       // first job
  .then(response => response.json())     // second job / callback
  .then(result => console.log(result));  // third job / callback
Although two jobs cannot run in parallel on one agent (= thread), the job of one async function might run between the jobs of another. Therefore, two async functions can run concurrently.
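For example, the jobs of these two async functions interleave on a single agent (this ordering follows directly from the timer and job queues described above):

const tick = name =>
  new Promise(resolve => setTimeout(() => {
    console.log(name);
    resolve();
  }, 0));

async function a() { await tick("a1"); await tick("a2"); }
async function b() { await tick("b1"); await tick("b2"); }

a();
b();
// typical output: a1, b1, a2, b2 -- neither function blocks the other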
Now who does produce these asynchronous events? That depends on what you are awaiting in the async function (or rather: what callback you registered). If it is a timer (setTimeout), an internal timer is set and the JS thread continues with other jobs until the timer is done, and then it executes the callback passed. Some APIs, especially in the Node.js environment (fetch, fs.readFile), will start another thread internally. You only hand over some arguments and receive the results when the thread is done (through an event).
To get real parallelism, that is, running two jobs at the same time, multiple agents are needed. WebWorkers are exactly that: agents. The code in the WebWorker therefore runs independently (it has its own job queue and executor).
Agents can communicate with each other via events, and you can react to those events with callbacks. For sure you can await actions from another agent too, if you wrap the callbacks into Promises:
const workerDone = new Promise(res => window.onmessage = res);

(async function() {
  const result = await workerDone;
  //...
})();
TL;DR:
JS <---> callbacks / promises <--> internal Thread / Webworker
¹ There are other terms coined for this behavior, such as event loop / queue and others. The term Job is specified by ECMA262.
² How the engine implements agents is up to the engine, though as one agent may only execute one Job at a time, it very much makes sense to have one thread per agent.
In contrast to WebWorkers, async functions are never executed on a separate thread.
They just don't block the whole thread until their response arrives. You can think of them as being registered as waiting for a result: they let other code execute in the meantime, and when their response comes through, they get executed; hence the name asynchronous programming.
This is achieved through a message queue, which is a list of messages to be processed. Each message has an associated function which gets called in order to handle the message.
Doing this:
setTimeout(() => {
  console.log('foo')
}, 1000)
will simply add the callback function (the one that logs to the console) to the message queue. When its 1000ms timer elapses, the message is popped from the message queue and the callback is executed.
While the timer is ticking, other code is free to execute. This is what gives the illusion of multithreading.
The setTimeout example above uses callbacks. Promises and async work the same way at a lower level — they piggyback on that message-queue concept, but are just syntactically different.
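A small demonstration of that queueing; promise reactions are queued as microtasks, which run before the next timer callback, so the ordering below is deterministic:

console.log('1: synchronous');
setTimeout(() => console.log('4: timer callback (macrotask)'), 0);
Promise.resolve().then(() => console.log('3: promise reaction (microtask)'));
console.log('2: synchronous');
// output order: 1, 2, 3, 4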
Workers are also accessed through asynchronous code (i.e. Promises); however, Workers are a solution for CPU-intensive tasks which would block the thread that the JS code runs on, even if the CPU-intensive function is invoked asynchronously.
So if you have a CPU-intensive function like renderThread(duration) and you do something like:
new Promise((v, x) => setTimeout(_ => (renderThread(500), v(1)), 0))
  .then(v => console.log(v));
new Promise((v, x) => setTimeout(_ => (renderThread(100), v(2)), 0))
  .then(v => console.log(v));
Even though the second one takes less time to complete, it will only be invoked after the first one releases the CPU thread. So we will get 1 first and then 2 on the console.
However, had these two functions been run on separate Workers, the outcome we would expect is 2 and then 1, since they could run concurrently and the second one would finish and post its message earlier.
So for basic I/O operations, standard single-threaded asynchronous code is very efficient, and the need for Workers arises from tasks which are CPU-intensive and can be segmented (assigned to multiple Workers at once), such as FFT and whatnot.
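To make that concrete, here is a minimal sketch of offloading a CPU-bound loop to a standard Worker (the file names and the workload are made up for illustration):

// main.js
const worker = new Worker("worker.js");
worker.onmessage = e => console.log("result:", e.data); // arrives without ever blocking the main thread
worker.postMessage(1e8);

// worker.js
self.onmessage = e => {
  let sum = 0;
  for (let i = 0; i < e.data; i++) sum += i; // CPU-bound work runs on the worker's thread
  self.postMessage(sum);
};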
Async functions have nothing to do with web workers or node child processes - unlike those, they are not a solution for parallel processing on multiple threads.
An async function is just1 syntactic sugar for a function returning a promise then() chain.
async function example() {
  await delay(1000);
  console.log("waited.");
}
is just the same as
function example() {
  return Promise.resolve(delay(1000)).then(() => {
    console.log("waited.");
  });
}
These two are virtually indistinguishable in their behaviour. The semantics of await are specified in terms of promises, and every async function does return a promise for its result.
1: The syntactic sugar gets a bit more elaborate in the presence of control structures such as if/else or loops which are much harder to express as a linear promise chain, but it's still conceptually the same.
Are such functions threaded in the same way as a function executed through setInterval is?
Yes, the asynchronous parts of async functions run as (promise) callbacks on the standard event loop. The delay in the example above would be implemented with the normal setTimeout, wrapped in a promise for easy consumption:
function delay(t) {
  return new Promise(resolve => {
    setTimeout(resolve, t);
  });
}
I want to add my own answer to my question, with the understanding I gathered through all the other people's answers:
Ultimately, everything but web workers is glorified callbacks. Code in async functions, functions called through promises, functions called through setInterval and such all get executed on the main thread with a mechanism akin to context switching. No parallelism exists at all.
True parallel execution with all its advantages and pitfalls, pertains to webworkers and webworkers alone.
(pity - I thought with "async functions" we finally got streamlined and "inline" threading)
Here is a way to call standard functions as workers, enabling true parallelism. It's an unholy hack written in blood with help from satan, and probably there are a ton of browser quirks that can break it, but as far as I can tell it works.
[constraints: the function header has to be as simple as function f(a,b,c) and if there's any result, it has to go through a return statement]
function Async(func, params, callback)
{
    // ACQUIRE ORIGINAL FUNCTION'S CODE
    var text = func.toString();

    // EXTRACT ARGUMENTS
    var args = text.slice(text.indexOf("(") + 1, text.indexOf(")"));
    args = args.split(",").map(arg => arg.trim());

    // ALTER FUNCTION'S CODE:
    // 1) DECLARE ARGUMENTS AS VARIABLES
    // 2) REPLACE RETURN STATEMENTS WITH THREAD POSTMESSAGE AND TERMINATION
    var body = text.slice(text.indexOf("{") + 1, text.lastIndexOf("}"));
    for(var i = 0, c = params.length; i < c; i++)
        body = "var " + args[i] + " = " + JSON.stringify(params[i]) + ";" + body;
    body = body + " self.close();";
    body = body.replace(/return\s+([^;]*);/g, 'self.postMessage($1); self.close();');

    // CREATE THE WORKER FROM FUNCTION'S ALTERED CODE
    var code = URL.createObjectURL(new Blob([body], { type: "text/javascript" }));
    var thread = new Worker(code);

    // WHEN THE WORKER SENDS BACK A RESULT, CALL BACK AND TERMINATE THE THREAD
    thread.onmessage =
        function(result)
        {
            if(callback) callback(result.data);
            thread.terminate();
        }
}
So, assuming you have this potentially CPU-intensive function...
function HeavyWorkload(nx, ny)
{
    var data = [];
    for(var x = 0; x < nx; x++)
    {
        data[x] = [];
        for(var y = 0; y < ny; y++)
        {
            data[x][y] = Math.random();
        }
    }
    return data;
}
...you can now call it like this:
Async(HeavyWorkload, [1000, 1000],
    function(result)
    {
        console.log(result);
    }
);