I am trying to understand how workers work in Node.js.
My understanding is that every time we spawn a worker, it creates a new thread with its own Node/V8 instance.
So will the below code spawn 50 threads?
How is it distributed over the cpu cores?
This is the index.js
const { Worker } = require("worker_threads");
var count = 0;
console.log("Start Program");
const runService = () => {
return new Promise((resolve, reject) => {
const worker = new Worker("./service.js", {});
worker.on("message", resolve);
worker.on("error", reject);
worker.on("exit", code => {
if (code != 0) {
reject(new Error("Worker has stopped"));
}
});
});
};
const run = async () => {
const result = await runService();
console.log(count++);
console.log(result);
};
for (let i = 0; i < 50; i++) {
run().catch(error => console.log(error));
}
setTimeout(() => console.log("End Program"), 2000);
and this is the service.js file
const { workerData, parentPort } = require("worker_threads");
// You can do any heavy stuff here, in a synchronous way
// without blocking the "main thread"
const sleep = () => {
return new Promise(resolve => setTimeout(resolve, 500));
};
let cnt = 0;
for (let i = 0; i < 10e8; i += 1) {
cnt += 1;
}
parentPort.postMessage({ data: cnt });
So will the below code spawn 50 threads?
....
for (let i = 0; i < 50; i++) {
run().catch(error => console.log(error));
}
Yes.
How is it distributed over the cpu cores?
The OS will handle this.
Depending on the OS, there is a feature called processor affinity that allows you to manually set the "affinity", or preference, a task has for a CPU core. On many OSes this is just a hint and the OS will override your preference if it needs to. Some real-time OSes treat this as mandatory, allowing you more control over the hardware (when writing algorithms for self-driving cars or factory robots you sometimes don't want the OS to take control of your carefully crafted software at random times).
Some OSes like Linux allow you to set processor affinity with command line commands, so you can easily write a shell script or use child_process to fine-tune your threads. At the moment there is no built-in way to manage processor affinity for worker threads. There is a third-party module that I'm aware of that does this on Windows and Linux: nodeaffinity, but it doesn't work on Mac OS X (or other OSes like BSD, Solaris/Illumos etc.).
Try to understand it this way: Node.js runs your JavaScript on a single thread, and that thread is in use from the moment the program starts. The number of workers that can usefully run in parallel therefore depends on how many hardware threads your system provides. Spawning more workers (or forking more child processes) than that will not help; it just slows the whole process down and lowers throughput.
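A minimal sketch of that idea, assuming you want to cap the number of simultaneously running workers at the number of logical cores reported by os.cpus(). The names workerSource, runJob and runAll are made up for this example, and the worker body is passed as a string (eval: true) only to keep the snippet self-contained; in a real project it would live in its own file such as service.js.

```javascript
const os = require("os");
const { Worker } = require("worker_threads");

// Worker body as a string (run with { eval: true }) so the sketch needs no second file.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let cnt = 0;
  for (let i = 0; i < workerData.iterations; i += 1) cnt += 1;
  parentPort.postMessage({ data: cnt });
`;

// Hypothetical helper: run one job in its own worker thread.
const runJob = (iterations) =>
  new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { iterations } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });

// Hypothetical helper: process all jobs, but never with more than `limit` workers at once.
const runAll = async (jobs, limit = Math.max(1, os.cpus().length)) => {
  const results = new Array(jobs.length);
  let next = 0;
  // Each "lane" pulls the next unprocessed job until none are left.
  const lane = async () => {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await runJob(jobs[index]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, jobs.length) }, lane));
  return results;
};
```

With this, 50 jobs would still all run, but only as many threads as there are cores exist at any moment; the OS still decides which core each thread lands on.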
In my JS app I'm using the async / await feature. I would like to perform multiple API calls and would like them to be fired one after other. In other words I would like to replace this simple method:
const addTask = async (url, options) => {
return await fetch(url, options)
}
with something more complex.. like:
let tasksQueue = []
const addTask = async (url, options) => {
tasksQueue.push({url, options})
...// perform fetch in queue
return await ...
}
What will be the best way to handle the asynchronous returns?
You could use a Queue data structure as your base and add special behavior in a child class. A Queue has a well-known interface of two methods: enqueue() (add a new item to the end) and dequeue() (remove the first item). In your case dequeue() awaits the async task.
Special behavior:
Each time a new task (e.g. fetch('url')) gets enqueued, this.dequeue() gets invoked.
What dequeue() does:
1. if queue is empty ➜ return false (break out of recursion)
2. if queue is busy ➜ return false (prev. task not finished)
3. else ➜ remove first task from queue and run it
4. on task "complete" (successful or with errors) ➜ recursive call dequeue() (2.), until queue is empty..
class Queue {
constructor() { this._items = []; }
enqueue(item) { this._items.push(item); }
dequeue() { return this._items.shift(); }
get size() { return this._items.length; }
}
class AutoQueue extends Queue {
constructor() {
super();
this._pendingPromise = false;
}
enqueue(action) {
return new Promise((resolve, reject) => {
super.enqueue({ action, resolve, reject });
this.dequeue();
});
}
async dequeue() {
if (this._pendingPromise) return false;
let item = super.dequeue();
if (!item) return false;
try {
this._pendingPromise = true;
let payload = await item.action(this);
this._pendingPromise = false;
item.resolve(payload);
} catch (e) {
this._pendingPromise = false;
item.reject(e);
} finally {
this.dequeue();
}
return true;
}
}
// Helper function for 'fake' tasks
// Returned Promise is wrapped! (tasks should not run right after initialization)
let _ = ({ ms, ...foo } = {}) => () => new Promise(resolve => setTimeout(resolve, ms, foo));
// ... create some fake tasks
let p1 = _({ ms: 50, url: '❪𝟭❫', data: { w: 1 } });
let p2 = _({ ms: 20, url: '❪𝟮❫', data: { x: 2 } });
let p3 = _({ ms: 70, url: '❪𝟯❫', data: { y: 3 } });
let p4 = _({ ms: 30, url: '❪𝟰❫', data: { z: 4 } });
const aQueue = new AutoQueue();
const start = performance.now();
aQueue.enqueue(p1).then(({ url, data }) => console.log('%s DONE %fms', url, performance.now() - start)); // = 50
aQueue.enqueue(p2).then(({ url, data }) => console.log('%s DONE %fms', url, performance.now() - start)); // 50 + 20 = 70
aQueue.enqueue(p3).then(({ url, data }) => console.log('%s DONE %fms', url, performance.now() - start)); // 70 + 70 = 140
aQueue.enqueue(p4).then(({ url, data }) => console.log('%s DONE %fms', url, performance.now() - start)); // 140 + 30 = 170
Full code demo: https://codesandbox.io/s/async-queue-ghpqm?file=/src/index.js
You can play around and watch results in console and/or dev-tools "performance" tab. The rest of this answer is based on it.
Explain:
enqueue() returns a new Promise that will be resolved (or rejected) at some point later. This Promise can be used to handle the response of your async task Fn.
enqueue() actually push()es an Object into the queue that holds the task Fn and the control methods for the returned Promise.
Since the executor of the returned Promise runs immediately, this.dequeue() is invoked each time we enqueue a new task.
With some performance.measure() added to our task, we get good visualization of our queue:
(*.gif animation)
1st row is our queue instance
New enqueued tasks have a "❚❚ waiting.." period (3rd row) (might be < 1ms if queue is empty)
At some point it is dequeued and "▶ runs.." for a while (2nd row)
Log output (console.table()):
Explain:
1st task is enqueue()d at 2.58ms right after queue initialization.
Since our queue is empty there is practically no ❚❚ waiting (0.04ms ➜ ~40μs).
Task runtime 13.88ms ➜ dequeue
Class Queue is just a wrapper for native Array Fn´s!
You can of course implement this in one class. I just want to show, that you can build what you want from already known data structures. There are some good reasons for not using an Array:
A Queue data-structure is defined by an Interface of two public methods. Using an Array might tempt others to use native Array methods on it like .reverse(),.. which would break the definition.
enqueue() and dequeue() are much more readable than push() and shift()
If you already have a fully implemented Queue class, you can extend from it (re-usable code)
You can replace the item Array in class Queue by another data structure: a "Doubly Linked List", which would reduce the time complexity of dequeue() from O(n) [linear] for Array.shift() to O(1) [constant]. (➜ better time complexity than native array Fn!) (➜ final demo)
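To illustrate the last point, here is a rough sketch of a linked-list backed queue with the same enqueue() / dequeue() / size interface. It is not the code from the final demo: the class name LinkedQueue is made up, and it uses a singly linked list, which is already enough for a FIFO queue because we only ever add at the tail and remove at the head.

```javascript
// Hypothetical drop-in replacement for the Array-based Queue above.
class LinkedQueue {
  constructor() {
    this._head = null; // next node to be dequeued
    this._tail = null; // most recently enqueued node
    this._size = 0;
  }

  // O(1): append a node behind the current tail
  enqueue(item) {
    const node = { item, next: null };
    if (this._tail) this._tail.next = node;
    else this._head = node;
    this._tail = node;
    this._size += 1;
  }

  // O(1): unlink the head node, no re-indexing as with Array.shift()
  dequeue() {
    if (!this._head) return undefined; // same result as shift() on an empty Array
    const { item, next } = this._head;
    this._head = next;
    if (!next) this._tail = null;
    this._size -= 1;
    return item;
  }

  get size() {
    return this._size;
  }
}
```

Because the interface is unchanged, AutoQueue could extend such a class instead of Queue without touching its own code.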
Code limitations
This AutoQueue class is not limited to async functions. It handles anything that can be called like await item.action(this):
let task = queue => {..} ➜ sync functions
let task = async queue => {..} ➜ async functions
let task = queue => new Promise(resolve => setTimeout(resolve, 100)) ➜ new Promise()
Note: We already call our tasks with await, where await wraps the response of the task into a Promise.
Nr 2. (async function), always returns a Promise on its own, and the await call just wraps a Promise into an other Promise, which is slightly less efficient.
Nr 3. is fine. Returned Promises will not get wrapped by await
This is how async functions are executed: (source)
The result of an async function is always a Promise p. That Promise is created when starting the execution of the async function.
The body is executed. Execution may finish permanently via return or throw. Or it may finish temporarily via await; in which case execution will usually continue later on.
The Promise p is returned.
The following code demonstrates how that works:
async function asyncFunc() {
console.log('asyncFunc()'); // (A)
return 'abc';
}
asyncFunc().
then(x => console.log(`Resolved: ${x}`)); // (B)
console.log('main'); // (C)
// Output:
// asyncFunc()
// main
// Resolved: abc
You can rely on the following order:
Line (A): the async function is started synchronously. The async function’s Promise is resolved via return.
Line (C): execution continues.
Line (B): Notification of Promise resolution happens asynchronously.
Read more: "Callable values"
Read more: "Async functions"
Performance limitations
Since AutoQueue is limited to handle one task after the other, it might become a bottleneck in our app. Limiting factors are:
Tasks per time: ➜ Frequency of new enqueue()d tasks.
Runtime per task ➜ Blocking time in dequeue() until task complete
1. Tasks per time
This is our responsibility! We can get the current size of the queue at any time: size = queue.size. Your outer script needs a "fail-over" case for a steadily growing queue (check "Stacked wait times" section).
You want to avoid a "queue overflow" like this, where average/mean waitTime increases over time.
+-------+----------------+----------------+----------------+----------------+
| tasks | enqueueMin(ms) | enqueueMax(ms) | runtimeMin(ms) | runtimeMax(ms) |
| 20 | 0 | 200 | 10 | 30 |
+-------+----------------+----------------+----------------+----------------+
➜ Task 20/20 waits for 195ms until exec starts
➜ From the time, our last task was randomly enqueued, it takes another + ~232ms, until all tasks were resolved.
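One possible "fail-over" for a steadily growing queue is to refuse new tasks once too many are waiting. The following is only a sketch under that assumption: BoundedSerialQueue and maxWaiting are invented names, and it is a standalone class built on a plain promise chain, not an extension of the AutoQueue shown earlier.

```javascript
// Hypothetical serial queue that rejects new tasks once `maxWaiting` tasks are pending.
class BoundedSerialQueue {
  constructor(maxWaiting) {
    this._maxWaiting = maxWaiting;
    this._waiting = 0; // tasks enqueued but not yet started
    this._tail = Promise.resolve();
  }

  get size() {
    return this._waiting;
  }

  enqueue(action) {
    if (this._waiting >= this._maxWaiting) {
      // fail-over: the caller decides whether to retry later, drop the task, etc.
      return Promise.reject(new Error(`Queue overflow: ${this._waiting} tasks already waiting`));
    }
    this._waiting += 1;
    const result = this._tail.then(() => {
      this._waiting -= 1; // the task leaves the waiting line as soon as it starts
      return action();
    });
    this._tail = result.catch(() => {}); // a failed task must not block the ones behind it
    return result;
  }
}
```

Whether rejecting is the right reaction depends on your app; you could just as well log a warning or move the task to another queue instance.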
2. Runtime per task
This one is harder to deal with. (Waiting for a fetch() can not be improved and we need to wait until the HTTP request completed).
Maybe your fetch() tasks rely on each others response and a long runtime will block the others.
But there are some things we could do:
Maybe we could cache responses ➜ Reduce runtime on next enqueue.
Maybe we fetch() from a CDN and have an alternative URI we could use. In this case we can return a new Promise from our task that will be settled before the next task is dequeue()d. (see "Error handling"):
queue.enqueue(queue => Promise.race([fetch('url1'), fetch('url2')]));
Maybe you have some kind of "long polling" or periodic ajax task that runs every x seconds and that can not be cached. Even if you can not reduce the runtime itself, you could record the runtimes, which would give you an approximate estimate of the next run. Maybe you can move long running tasks to other queue instances.
Balanced AutoQueue
What is an "efficient" Queue? - Your first thought might be something like:
The most efficient Queue handles most tasks in shortest period of time?
Since we can not improve our task runtime, can we lower the waiting time? The example is a queue with zero (~0ms) waiting time between tasks.
Hint: In order to compare our next examples we need some base stats that will not change:
+-------+----------------+----------------+------------------+------------------+
| count | random fake runtime for tasks | random enqueue() offset for tasks |
+-------+----------------+----------------+------------------+------------------+
| tasks | runtimeMin(ms) | runtimeMax(ms) | msEnqueueMin(ms) | msEnqueueMax(ms) |
| 200 | 10 | 30 | 0 | 4000 |
+-------+----------------+----------------+------------------+------------------+
Avg. task runtime: ⇒ (10ms + 30ms) / 2 = 20ms
Total time: ⇒ 20ms * 200 = 4000ms ≙ 4s
➜ We expect our queue to be resolved after ~4s
➜ For consistent enqueue() frequency we set msEnqueueMax to 4000
AutoQueue finished last dequeue() after ~4.12s (^^ see tooltip).
Which is ~120ms longer than our expected 4s:
Hint: There is a small "Log" block after each task (~0.3ms), where I build/push an Object with log marks to a global 'Array' for the console.table() log at the end. This explains 200 * 0.3ms = 60ms. The missing 60ms are untracked (you see the small gap between the tasks) -> 0.3ms/task for our test loop and probably some delay from open Dev-Tools,..
We come back to these timings later.
The initialization code for our queue:
const queue = new AutoQueue();
// .. get 200 random Int numbers for our task "fake" runtimes [10-30]ms
let runtimes = Array.from({ length: 200 }, () => rndInt(10, 30));
let i = 0;
let enqueue = queue => {
if (i >= 200) {
return queue; // break out condition
}
i++;
queue
.enqueue(
newTask({ // generate a "fake" task with a random runtime
ms: runtimes[i - 1],
url: _(i)
})
)
.then(payload => {
enqueue(queue);
});
};
enqueue(queue); // start recursion
We recursively enqueue() our next task, right after the previous finished. You might have noticed the analogy to a typical Promise.then() chain, right?
Hint: We don´t need a Queue if we already know the order and total number of tasks to run in sequence. We can use a Promise chain and get the same results.
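For comparison, this is roughly what such a Promise chain could look like when all tasks are known up front. fakeTask and runInSequence are names made up for this sketch; fakeTask only stands in for something like fetch().

```javascript
// Hypothetical stand-in for a real async task such as fetch().
const fakeTask = (label, ms) => () =>
  new Promise(resolve => setTimeout(resolve, ms, label));

// Run a fixed list of tasks strictly one after the other and collect their results.
const runInSequence = tasks =>
  tasks.reduce(
    (chain, task) => chain.then(results => task().then(r => [...results, r])),
    Promise.resolve([])
  );

runInSequence([fakeTask("a", 30), fakeTask("b", 10), fakeTask("c", 20)])
  .then(results => console.log(results)); // [ 'a', 'b', 'c' ] (task order, not finish order)
```

No queue instance is needed here because the list of tasks never changes after the chain has been built.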
Sometimes we don´t know all next steps right at the start of our script..
..You might need more flexibility, and the next task we want to run depends on the response of the previous task. - Maybe your app relies on a REST API (multiple endpoints), and you are limited to max X simultaneous API request(s). We can not spam the API with requests from all over our app. You don´t even know when the next request gets enqueue()d (e.g. API requests are triggered by click() events?)..
Ok, for the next example I changed the initialization code a bit:
We now enqueue 200 tasks randomly within [0-4000ms] period. - To be fair, we reduced the range by 30ms (max task runtime) to [0-3970ms]. Now our randomly filled queue has a chance to keep inside 4000ms limit.
What we can get out of the Dev-Tools performance log:
Random enqueue() leads to a big number of "waiting" tasks.
Makes sense, since we enqueued all tasks within the first ~4000ms, they must overlap somehow. Checking the table output we can verify: Max queue.size is 22 at the time task 170/200 was enqueued.
Waiting tasks are not evenly distributed. Right after start there are even some idle sections.
Because of the random enqueue() it is unlikely to get a 0ms offset for our first task.
~20ms runtime for each task lead to the stacking effect over time.
We can sort tasks by "wait ms" (see screen): Longest waiting time was >400ms.
There might be a relation between queue.size (column: sizeOnAdd) and wait ms (see next section).
Our AutoQueue completed its last dequeue() ~4.37s after its initialization (check tooltip in "performance" tab). An average runtime of 20.786ms / task (expected: 20ms) gives us a total runtime of 4157.13ms (expected: 4000ms ≙ 4s).
We still have our "Log" blocks and the exec. time of our test script itself, ~120ms. Still ~37ms longer? Summing up all idle "gaps" right at the start explains the missing ~37ms.
Back to our initial "definition"
The most efficient Queue handles most tasks in shortest period of time?
Assumption: Apart from the random offset at which tasks get enqueue()d in the previous example, both queues handled the same number of tasks (equal avg. runtime) within the same period of time. Neither the waiting time of an enqueued task nor the queue.size affects the total runtime. Both have the same efficiency?
Since a Queue, by its nature, shrinks our coding possibilities it is best not to use a Queue if we talk about efficient code (tasks per time).
A queue helps us to straighten tasks in an async environment into a sync pattern. That is exactly what we want. ➜ "Run an unknown sequence of tasks in a row".
If you find yourself asking things like: "If a new task gets enqueued into an already filled queue, the time we have to wait for our result, is increased by the run times of the others. That´s less efficient!".
Then you are doing it wrong:
You either enqueue tasks that have no dependency (in some way) on each other (logical or programmatical dependency), or there is a dependency which would not increase the total runtime of our script. - We have to wait for the others anyway.
Stacked wait times
We have seen a peak wait time of 461.05ms for a task before it runs. Wouldn't it be great if we could forecast the wait time for a task before we decide to enqueue it?
At first we analyse the behavior of our AutoQueue class over longer times.
(re-post screens)
We can build a chart from the console.table() output:
Beside the wait time of a task, we can see the random [10-30ms] runtime and 3 curves, representing the current queue.size, recorded at the time a task ..
.. is enqueued()
.. starts to run. (dequeue())
.. the task finished (right before the next dequeue())
2 more runs for comparison (similar trend):
chart run 2: https://i.imgur.com/EjpRyuv.png
chart run 3: https://i.imgur.com/0jp5ciV.png
Can we find dependencies among each other?
If we could find a relation between any of those recorded chart lines, it might help us to understand how a queue behaves over time (➜ constantly filled up with new tasks).
Excursus: What is a relation?
We are looking for an equation that projects the wait ms curve onto one of the 3 queue.size records. This would prove a direct dependency between both.
For our last run, we changed our start parameters:
Task count: 200 ➜ 1000 (5x)
msEnqueueMax: 4000ms ➜ 20000ms (5x)
+-------+----------------+----------------+------------------+------------------+
| count | random fake runtime for tasks | random enqueue() offset for tasks |
+-------+----------------+----------------+------------------+------------------+
| tasks | runtimeMin(ms) | runtimeMax(ms) | msEnqueueMin(ms) | msEnqueueMax(ms) |
| 1000 | 10 | 30 | 0 | 20000 |
+-------+----------------+----------------+------------------+------------------+
Avg. task runtime: ⇒ (10ms + 30ms) / 2 = 20ms (like before)
Total time: ⇒ 20ms * 1000 = 20000ms ≙ 20s
➜ We expect our queue to be resolved after ~20s
➜ For consistent enqueue() frequency we set msEnqueueMax to 20000
(interactive chart: https://datawrapper.dwcdn.net/p4ZYx/2/)
We see the same trend. wait ms increases over time (nothing new). Since our 3 queue.size lines at the bottom were drawn into the same chart (Y-axis has ms scale), they are barely visible. Quick switch to a logarithmic scale for better comparison:
(interactive chart: https://datawrapper.dwcdn.net/lZngg/1/)
The two dotted lines for queue.size [on start] and queue.size [on end] pretty much overlap each other and fall down to "0" once our queue gets empty, at the end.
queue.size [on add] looks very similar to the wait ms line. That is what we need.
{queue.size [on add]} * X = {wait ms}
⇔ X = {wait ms} / {queue.size [on add]}
This alone does not help us at runtime because wait ms is unknown for a newly enqueued task (it has not yet been run). So we still have 2 unknown variables: X and wait ms. We need another relation that helps us.
First of all, we plot our new ratio {wait ms} / {queue.size [on add]} into the chart (light green), and its mean/average (light green horizontal dashed). This is pretty close to 20ms (avg. run ms of our tasks), right?
Switch back to linear Y-axes and set its "max scale" to 80ms to get a better view of it. (hint: wait ms is now beyond the view port)
(interactive chart: https://datawrapper.dwcdn.net/Tknnr/4/)
Back to the random runtimes of our tasks (dot cloud). We still have our "total mean" of 20.72ms (dark green dashed horizontal). We can also calc the mean of our previous tasks at runtime (e.g. task 370 gets enqueued ➜ What is the current mean runtime for task [1,.., 369] = mean runtime). But we could be even more precise:
The more tasks we enqueue, the less impact each of them has on the total "mean runtime". So let´s just calc the "mean runtime" of the last e.g. 50 tasks, which leads to a consistent impact of 1/50 per task on the "mean runtime". ➜ Peak runtimes get smoothed out and the trend (up/down) is taken into account. (dark green horizontal path curve next to the light green from our 1st equation).
Things we can do now:
We can eliminate X from our 1st equation (light green). ➜ X can be expressed by the "mean runtime" of the previous n, e.g. 50, tasks (dark green).
Our new equation just depends on variables, that are known at runtime, right at the point of enqueue:
// mean runtime from prev. n tasks (e.g. n = 50):
X = {(runtime[-n] + .. + runtime[-2] + runtime[-1]) / n} ms
// .. replace X in 1st equation:
⇒ {wait ms} = {queue.size [on add]} * {(runtime[-n] + .. + runtime[-2] + runtime[-1]) / n} ms
We can draw a new diagram curve to our chart and check how close it is compared to the recorded wait ms (orange)
(interactive chart: https://datawrapper.dwcdn.net/LFp1d/2/)
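The forecast equation above could be sketched in code like this. WaitForecast, record() and estimateWait() are names invented for this example; the idea is simply to keep the runtimes of the last n finished tasks and multiply their mean by the queue size at the moment of enqueue.

```javascript
// Hypothetical helper implementing {wait ms} = {queue.size [on add]} * {mean runtime of last n tasks}
class WaitForecast {
  constructor(windowSize = 50) {
    this._windowSize = windowSize;
    this._runtimes = []; // runtimes (ms) of the most recently finished tasks
  }

  // Call this whenever a task has finished, with its measured runtime in ms
  record(runtimeMs) {
    this._runtimes.push(runtimeMs);
    if (this._runtimes.length > this._windowSize) this._runtimes.shift();
  }

  // Mean runtime over the sliding window (0 while nothing has been recorded yet)
  get meanRuntime() {
    if (this._runtimes.length === 0) return 0;
    const sum = this._runtimes.reduce((a, b) => a + b, 0);
    return sum / this._runtimes.length;
  }

  // Estimated wait (ms) for a task that would be enqueued right now
  estimateWait(queueSizeOnAdd) {
    return queueSizeOnAdd * this.meanRuntime;
  }
}

const forecast = new WaitForecast(3);
[10, 20, 30, 40].forEach(ms => forecast.record(ms)); // window is now [20, 30, 40]
console.log(forecast.estimateWait(5)); // 5 * 30 = 150
```

Inside an AutoQueue you would call record() after each task completes and estimateWait(queue.size) before deciding to enqueue.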
Conclusion
We can forecast the wait for a task before it gets enqueued, given the fact the run times of our tasks can be determined somehow. So it works best in situations where you enqueue tasks of the same type/function:
Use case: An AutoQueue instance is filled up with render tasks for your UI components. Render time might not change that much (compared to fetch()). Maybe you render 1000 location marks on a map. Each mark is an instance of a class with a render() Fn.
Tips
Queues are used for various tasks. ➜ Implement dedicated Queue class variations for different kinds of logic (do not mix different logic in one class)
Check all tasks that might be enqueued to the same AutoQueue instance (now or in future), they could be blocked by all the others.
An AutoQueue will not improve the runtime; at best it will not make it worse.
Use different AutoQueue instances for different Task types.
Monitor the size of your AutoQueue, particularly ..
.. on heavy usage (high frequency of enqueue())
.. on long or unknown task runtimes
Check your error handling. Errors inside your tasks will just reject the promise returned on enqueue (promise = queue.enqueue(..)) and will not stop the dequeue process. You can handle errors..
.. inside your tasks ➜ try{..} catch(e){ .. }
.. right after it (before the next) ➜ return new Promise()
.. "async" ➜ queue.enqueue(..).catch(e => {..})
.. "global" ➜ error handler inside the AutoQueue class
Depending on the implementation of your Queue you might watch the queue.size. An Array, filled up with 1000 tasks, is less efficient than a decentralized data-structure like the "Doubly Linked List" I used in the final code.
Avoid recursion hell. (It is OK to use tasks that enqueue() others) - But, it is no fun to debug an AutoQueue where tasks are dynamically enqueue()d by others in an async environment..
At first glance a Queue might solve a problem (at a certain level of abstraction). However, in most cases it shrinks existing flexibility. It adds an additional "control layer" to our code (which in most cases, is what we want) at the same time, we sign a contract to accept the strict rules of a Queue. Even if it solves a problem, it might not be the best solution.
Add more features [basic]
Stop "Auto dequeue()" on enqueue():
Since our AutoQueue class is generic and not limited to long running HTTP requests, you can enqueue() any function that has to run in sequence, even functions running for 3min, e.g. "store updates for modules",.. You can not guarantee that, when you enqueue() 100 tasks in a loop, the prev. added task has not already been dequeue()d.
You might want to prevent enqueue() from calling dequeue() until all were added.
enqueue(action, autoDequeue = true) { // new
return new Promise((resolve, reject) => {
super.enqueue({ action, resolve, reject });
if (autoDequeue) this.dequeue(); // new
});
}
.. and then call queue.dequeue() manually at some point.
Control methods: stop / pause / start
You can add more control methods. Maybe your app has multiple modules that all try to fetch() their resources on pageload. An AutoQueue() works like a Controller. You can monitor how many tasks are "waiting.." and add more controls:
class AutoQueue extends Queue {
constructor() {
super(); // required before `this` can be used in a subclass
this._pendingPromise = false;
this._stop = false; // new
this._pause = false; // new
}
enqueue(action) { .. }
async dequeue() {
if (this._pendingPromise) return false;
if (this._pause ) return false; // new
if (this._stop) { // new
this._items = []; // drop all waiting tasks (Queue stores them in _items)
this._stop = false;
return false;
}
let item = super.dequeue();
..
}
stop() { // new
this._stop = true;
}
pause() { // new
this._pause = true;
}
async start() { // new (async, because it awaits dequeue())
this._stop = false;
this._pause = false;
return await this.dequeue();
}
}
Forward response:
You might want to process the "response/value" of a task in the next task. It is not guaranteed that our prev. task has not already finished, at the time we enqueue the 2nd task.
Therefore it might be best to store the response of the prev. task inside the class and forward it to the next: this._payload = await item.action(this._payload)
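A rough sketch of that forwarding idea, with the caveat that it is a standalone class built on a plain promise chain rather than a patch to AutoQueue, and that ForwardingQueue and initialPayload are made-up names:

```javascript
// Hypothetical serial queue that hands the result of each task to the next one.
class ForwardingQueue {
  constructor(initialPayload) {
    this._payload = initialPayload; // result of the most recently finished task
    this._tail = Promise.resolve();
  }

  enqueue(action) {
    const result = this._tail.then(async () => {
      // the payload is read only when the task actually starts, not when it is enqueued
      this._payload = await action(this._payload);
      return this._payload;
    });
    // keep the chain alive; a failed task leaves the stored payload unchanged
    this._tail = result.catch(() => {});
    return result;
  }
}

const fq = new ForwardingQueue(1);
fq.enqueue(async n => n + 1).then(v => console.log(v)); // 2
fq.enqueue(n => n * 10).then(v => console.log(v));      // 20
```

The second task is enqueued before the first has run, yet it still receives the first task's result, because the stored payload is looked up at start time.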
Error handling
Thrown errors inside a task Fn reject the promise returned by enqueue() and will not stop the dequeue process. You might want to handle errors before the next task starts to run:
queue.enqueue(queue => myTask() ).catch(e => { .. }); // async error handling
queue.enqueue(queue =>
myTask()
.then(payload=> otherTask(payload)) // .. inner task
.catch(() => { .. }) // sync error handling
);
Since our Queue is dumb and just awaits our tasks to be resolved (item.action(this)), no one prevents you from returning a new Promise() from the currently running task Fn. - It will be resolved before the next task gets dequeued.
You can throw new Error() inside task Fn´s and handle them "outside"/after the run: queue.enqueue(..).catch().
You can easily add a custom Error handling inside the dequeue() method that calls this.stop() to clear "on hold"(enqueued) tasks..
You can even manipulate the queue from inside your task functions. Check: await item.action(this) invokes with this and gives access to the Queue instance. (this is optional). There are use cases where task Fn´s should not be able to.
Add more features [advanced]
... text limit reached :D
more: https://gist.github.com/exodus4d/6f02ed518c5a5494808366291ff1e206
Read more
Blog: "Asynchronous Recursion with Callbacks, Promises and Async"
Book: "Callable values"
Book: "Async functions"
You could save the previous pending promise and await it before calling the next fetch.
// fake fetch for demo purposes only
const fetch = (url, options) => new Promise(resolve => setTimeout(resolve, 1000, {url, options}))
// task executor
const addTask = (() => {
let pending = Promise.resolve();
const run = async (url, options) => {
try {
await pending;
} finally {
return fetch(url, options);
}
}
// update pending promise so that next task could await for it
return (url, options) => (pending = run(url, options))
})();
addTask('url1', {options: 1}).then(console.log)
addTask('url2', {options: 2}).then(console.log)
addTask('url3', {options: 3}).then(console.log)
Here's one I made earlier, also available in typescript
function createAsyncQueue(opts = { dedupe: false }) {
const { dedupe } = opts
let queue = []
let running
const push = task => {
if (dedupe) queue = []
queue.push(task)
if (!running) running = start()
return running.finally(() => {
running = undefined
})
}
const start = async () => {
const res = []
while (queue.length) {
const item = queue.shift()
res.push(await item())
}
return res
}
return { push, queue, flush: () => running || Promise.resolve([]) }
}
// ----- tests below 👇
const sleep = ms => new Promise(r => setTimeout(r, ms))
async function test1() {
const myQueue = createAsyncQueue()
myQueue.push(async () => {
console.log(100)
await sleep(100)
return 100
})
myQueue.push(async () => {
console.log(10)
await sleep(10)
return 10
})
console.log(await myQueue.flush())
}
async function test2() {
const myQueue = createAsyncQueue({ dedupe: true })
myQueue.push(async () => {
console.log(100)
await sleep(100)
return 100
})
myQueue.push(async () => {
console.log(10)
await sleep(10)
return 10
})
myQueue.push(async () => {
console.log(9)
await sleep(9)
return 9
})
// only 100 and 9 will be executed
// concurrent executions will be deduped
console.log(await myQueue.flush())
}
test1().then(test2)
Example usage:
const queue = createAsyncQueue()
const task1 = async () => {
await fetchItem()
}
queue.push(task1)
const task2 = async () => {
await fetchItem()
}
queue.push(task2)
// task1 will be guaranteed to be executed before task2
I would think there is a simple solution, as follows.
class AsyncQueue {
constructor() {
this.promise = Promise.resolve()
}
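push = (task) => {
const result = this.promise.then(task)
this.promise = result.catch(() => {}) // a rejected task must not block later ones
return result // lets the caller await or .catch() this task
}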
}
let count = 0
let dummy = () =>
new Promise((res) => {
const ms = 400 + Math.ceil(1200 * Math.random())
console.log('will wait', ms, 'ms')
setTimeout(res, ms)
})
const foo = async (args) => {
const s = ++count
console.log('start foo', s)
await dummy()
console.log('end foo', s)
}
console.log('begin')
const q = new AsyncQueue()
q.push(foo)
q.push(foo)
q.push(foo)
q.push(foo)
console.log('end')
For your case you can do something like this:
const q = new AsyncQueue()
const addTask = (url, options) => {
q.push(() => fetch(url, options))
}
If you want to handle some results:
const q = new AsyncQueue()
const addTask = (url, options, handleResults) => {
q.push(async () => handleResults(await fetch(url, options)))
}
Not sure about performance, I just think it is a quick clean solution.
https://stackoverflow.com/a/71239408/8784402
Schedule tasks from an array in parallel, with at most `threads` of them running at a time, starting the next task as soon as any running one finishes:
const fastQueue = async <T, Q>(
x: T[],
threads: number,
fn: (v: T, i: number, a: T[]) => Promise<Q>
) => {
let k = 0;
const result = Array(x.length) as Q[];
await Promise.all(
[...Array(threads)].map(async () => {
while (k < x.length) result[k] = await fn(x[k], k++, x);
})
);
return result;
};
const demo = async () => {
const wait = (x: number) => new Promise(r => setTimeout(r, x, x))
console.time('a')
console.log(await fastQueue([1000, 2000, 3000, 2000, 2000], 4, (v) => wait(v)))
console.timeEnd('a')
}
demo();
I need to control concurrency in a Node.js script I'm making. Currently I'm trying to use npm promise-task-queue but I'm open to other suggestions.
I'm not sure how to implement promise-task-queue into my code. This is my original program:
readURLsfromFile().then( (urls) => {
urls.reduce( (accumulator, current, i) => {
return accumulator.then( () => {
return main(urls[i], i, urls.length)
})
}, Promise.resolve())
})
As you can see I'm reading urls from a file, then using .reduce() to run main() in serial on each one of these urls. Serial was too slow though so I need to do it with controlled concurrency.
Here's the code I started to write using promise-task-queue (It's very wrong, I have no idea what I'm doing):
var taskQueue = require("promise-task-queue");
var queue = taskQueue();
var failedRequests = 0;
queue.on("failed:apiRequest", function(task) {
failedRequests += 1;
});
queue.define("apiRequest", function(task) {
return Promise.try( () => {
return main(urls[i], i, urls.length);
}).then( () => {
return console.log("DONE!");
});
}, {
concurrency: 2
});
Promise.try( () => {
/* The following queues up the actual task. Note how it returns a Promise! */
return queue.push("apiRequest", {url: urls[i], iteration: i, amountToDo: urls.length});
})
As you can see I've put my main() function with its argument after the Promise.try, and I've put my arguments after the return queue.push. Not sure if that's correct or not.
But regardless now I'm stuck, how do I load all the iterations into the queue?
You could use the qew module from npm: https://www.npmjs.com/package/qew.
Install using npm install qew.
To initialise you do
const Qew = require('qew');
const maxConcurrent = 3;
const qew = new Qew(maxConcurrent);
With the above code, qew is a queue onto which you can push asynchronous functions; it runs them with a maximum concurrency of 3.
To push a new async function onto the qew you can do
qew.pushProm(asyncFunc);
So in your case if I understood you correctly you could do something like
readURLsfromFile()
.then(urls => {
return Promise.all(urls.map(url => { // wait for all promises to resolve
return qew.pushProm(() => main(url)); // push function onto queue
}));
})
.then(results => {
// do stuff with results
})
In this snippet you are reading urls from a file, and then loading a bunch of functions into the qew one by one and waiting for them all to resolve before doing something with them.
Full disclosure: I am the author of this package.
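For comparison, here is a dependency-free sketch of the same idea: a fixed number of "lanes" each pull the next URL until none are left. The name mapWithLimit and the stub main below are assumptions made for illustration; this is not how qew or promise-task-queue is implemented.

```javascript
// Run `worker` over every item with at most `limit` calls in flight.
// Resolves with results in input order; rejects on the first failure.
function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each lane claims the next unprocessed index until the list is exhausted.
  async function lane() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await worker(items[i], i);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes = Array.from({ length: laneCount }, () => lane());
  return Promise.all(lanes).then(() => results);
}

// Hypothetical stand-in for main(url, i, total) from the question.
const main = (url, i, total) =>
  new Promise(resolve => setTimeout(() => resolve(`${i + 1}/${total} ${url}`), 10));

const urls = ['a', 'b', 'c', 'd', 'e'];
mapWithLimit(urls, 2, (url, i) => main(url, i, urls.length))
  .then(results => console.log(results));
```

Because JavaScript runs each lane on a single thread, claiming an index with nextIndex++ needs no locking; the lanes only interleave at each await.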
Suppose you need to do some operations that depend on some temp file. Since
we're talking about Node here, those operations are obviously asynchronous.
What is the idiomatic way to wait for all operations to finish in order to
know when the temp file can be deleted?
Here is some code showing what I want to do:
do_something(tmp_file_name, function(err) {});
do_something_other(tmp_file_name, function(err) {});
fs.unlink(tmp_file_name);
But if I write it this way, the third call can be executed before the first two
get a chance to use the file. I need some way to guarantee that the first two
calls already finished (invoked their callbacks) before moving on without nesting
the calls (and making them synchronous in practice).
I thought about using event emitters on the callbacks and registering a counter
as receiver. The counter would receive the finished events and count how many
operations were still pending. When the last one finished, it would delete the
file. But there is the risk of a race condition and I'm not sure this is
usually how this stuff is done.
How do Node people solve this kind of problem?
Update:
Now I would advise to have a look at:
Promises
The Promise object is used for deferred and asynchronous computations.
A Promise represents an operation that hasn't completed yet, but is
expected in the future.
A popular promise library is bluebird. I would advise having a look at why promises.
You should use promises to turn this:
fs.readFile("file.json", function (err, val) {
if (err) {
console.error("unable to read file");
}
else {
try {
val = JSON.parse(val);
console.log(val.success);
}
catch (e) {
console.error("invalid json in file");
}
}
});
Into this:
fs.readFileAsync("file.json").then(JSON.parse).then(function (val) {
console.log(val.success);
})
.catch(SyntaxError, function (e) {
console.error("invalid json in file");
})
.catch(function (e) {
console.error("unable to read file");
});
generators: For example via co.
Generator based control flow goodness for nodejs and the browser,
using promises, letting you write non-blocking code in a nice-ish way.
var co = require('co');
co(function *(){
// yield any promise
var result = yield Promise.resolve(true);
}).catch(onerror);
co(function *(){
// resolve multiple promises in parallel
var a = Promise.resolve(1);
var b = Promise.resolve(2);
var c = Promise.resolve(3);
var res = yield [a, b, c];
console.log(res);
// => [1, 2, 3]
}).catch(onerror);
// errors can be try/catched
co(function *(){
try {
yield Promise.reject(new Error('boom'));
} catch (err) {
console.error(err.message); // "boom"
}
}).catch(onerror);
function onerror(err) {
// log any uncaught errors
// co will not throw any errors you do not handle!!!
// HANDLE ALL YOUR ERRORS!!!
console.error(err.stack);
}
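Applied to the temp-file question, the promise approach might look like the sketch below. It uses Node's built-in util.promisify instead of bluebird, and do_something / do_something_other are hypothetical stand-ins for the question's callback-style functions. Promise.allSettled is used so the file is removed only after both operations have finished, even if one of them fails.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { promisify } = require('util');

// Hypothetical stand-ins for the question's operations: each uses the
// temp file and reports through a Node-style (err, value) callback.
function do_something(file, cb) { fs.readFile(file, 'utf8', cb); }
function do_something_other(file, cb) { fs.stat(file, cb); }

async function useThenDelete(tmpFileName) {
  // Start both operations at once and wait for both to settle, so the
  // file is never deleted while one of them is still using it.
  const outcomes = await Promise.allSettled([
    promisify(do_something)(tmpFileName),
    promisify(do_something_other)(tmpFileName),
  ]);
  await fs.promises.unlink(tmpFileName);

  // Re-throw the first failure, if any, after the cleanup is done.
  const failed = outcomes.find(o => o.status === 'rejected');
  if (failed) throw failed.reason;
  return outcomes.map(o => o.value);
}

// Usage: create a throwaway file, use it twice, then let it be removed.
const tmp_file_name = path.join(os.tmpdir(), `demo-${process.pid}.txt`);
fs.writeFileSync(tmp_file_name, 'hello');
useThenDelete(tmp_file_name)
  .then(([text, stats]) => console.log(text, stats.size))
  .catch(err => console.error(err));
```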
If I understand correctly, you should have a look at the very good async library, especially its series function. Here are the snippets copied from its GitHub page:
async.series([
function(callback){
// do some stuff ...
callback(null, 'one');
},
function(callback){
// do some more stuff ...
callback(null, 'two');
},
],
// optional callback
function(err, results){
// results is now equal to ['one', 'two']
});
// an example using an object instead of an array
async.series({
one: function(callback){
setTimeout(function(){
callback(null, 1);
}, 200);
},
two: function(callback){
setTimeout(function(){
callback(null, 2);
}, 100);
},
},
function(err, results) {
// results is now equal to: {one: 1, two: 2}
});
As a plus this library can also run in the browser.
The simplest way is to increment an integer counter when you start an async operation and then, in the callback, decrement the counter. Depending on the complexity, the callback could check the counter for zero and then delete the file.
A little more complex would be to maintain a list of objects, and each object would have any attributes that you need to identify the operation (it could even be the function call) as well as a status code. The callbacks would set the status code to completed.
Then you would have a loop that waits (using process.nextTick) and checks whether all tasks are completed. The advantage of this method over the counter is that, if all outstanding tasks can complete before all tasks have been issued, the counter technique would delete the file prematurely.
// simple countdown latch
function CDL(countdown, completion) {
this.signal = function() {
if(--countdown < 1) completion();
};
}
// usage
var latch = new CDL(10, function() {
console.log("latch.signal() was called 10 times.");
});
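As a rough illustration, the latch could be applied to the temp-file question like this. do_something and do_something_other are hypothetical stand-ins for the question's operations, and their errors are ignored here to keep the sketch short.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Same countdown latch as above: runs `completion` after `countdown` signals.
function CDL(countdown, completion) {
  this.signal = function () {
    if (--countdown < 1) completion();
  };
}

// Hypothetical stand-ins for the question's two operations.
function do_something(file, cb) { fs.readFile(file, cb); }
function do_something_other(file, cb) { fs.stat(file, cb); }

// Delete the file once both operations have called back.
function useThenDelete(tmpFileName, done) {
  const latch = new CDL(2, () => fs.unlink(tmpFileName, done));
  do_something(tmpFileName, () => latch.signal());
  do_something_other(tmpFileName, () => latch.signal());
}

// Usage: create a throwaway file and let the latch clean it up.
const tmp_file_name = path.join(os.tmpdir(), `latch-${process.pid}.txt`);
fs.writeFileSync(tmp_file_name, 'hello');
useThenDelete(tmp_file_name, (err) => {
  if (err) throw err;
  console.log('temp file removed');
});
```

Because the expected count (2) is fixed before any operation starts, the latch cannot fire early even if the first operation finishes before the second one is issued.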
There is no "native" solution, but there are a million flow control libraries for node. You might like Step:
Step(
function(){
do_something(tmp_file_name, this.parallel());
do_something_other(tmp_file_name, this.parallel());
},
function(err) {
if (err) throw err;
// current Node versions require a callback for fs.unlink
fs.unlink(tmp_file_name, function(err) { if (err) throw err; });
}
)
Or, as Michael suggested, counters could be a simpler solution. Take a look at this semaphore mock-up. You'd use it like this:
do_something1(file, queue('myqueue'));
do_something2(file, queue('myqueue'));
queue.done('myqueue', function(){
// current Node versions require a callback for fs.unlink
fs.unlink(file, function(err) { if (err) throw err; });
});
I'd like to offer another solution that utilizes the speed and efficiency of the programming paradigm at the very core of Node: events.
Everything you can do with Promises or modules designed to manage flow-control, like async, can be accomplished using events and a simple state-machine, which I believe offers a methodology that is, perhaps, easier to understand than other options.
For example assume you wish to sum the length of multiple files in parallel:
const fs = require('fs'); // needed for fs.readFile below
const EventEmitter = require('events').EventEmitter;
// simple event-driven state machine
const sm = new EventEmitter();
// running state
let context={
tasks: 0, // number of total tasks
active: 0, // number of active tasks
results: [] // task results
};
const next = (result) => { // must be called when each task chain completes
if(result) { // preserve result of task chain
context.results.push(result);
}
// decrement the number of running tasks
context.active -= 1;
// when all tasks complete, trigger done state
if(!context.active) {
sm.emit('done');
}
};
// operational states
// start state - initializes context
sm.on('start', (paths) => {
const len=paths.length;
console.log(`start: beginning processing of ${len} paths`);
context.tasks = len; // total number of tasks
context.active = len; // number of active tasks
sm.emit('forEachPath', paths); // go to next state
});
// start processing of each path
sm.on('forEachPath', (paths)=>{
console.log(`forEachPath: starting ${paths.length} process chains`);
paths.forEach((path) => sm.emit('readPath', path));
});
// read contents from path
sm.on('readPath', (path) => {
console.log(` readPath: ${path}`);
fs.readFile(path,(err,buf) => {
if(err) {
sm.emit('error',err);
return;
}
sm.emit('processContent', buf.toString(), path);
});
});
// compute length of path contents
sm.on('processContent', (str, path) => {
console.log(` processContent: ${path}`);
next(str.length);
});
// when processing is complete
sm.on('done', () => {
// start from 0 so an empty results array does not make reduce throw
const total = context.results.reduce((sum,n) => sum + n, 0);
console.log(`The total of ${context.tasks} files is ${total}`);
});
// error state
sm.on('error', (err) => { throw err; });
// ======================================================
// start processing - ok, let's go
// ======================================================
sm.emit('start', ['file1','file2','file3','file4']);
Which will output:
start: beginning processing of 4 paths
forEachPath: starting 4 process chains
readPath: file1
readPath: file2
processContent: file1
readPath: file3
processContent: file2
processContent: file3
readPath: file4
processContent: file4
The total of 4 files is 4021
Note that the ordering of the process chain tasks is dependent upon system load.
You can envision the program flow as:
start -> forEachPath -+-> readPath1 -> processContent1 -+-> done
                      +-> readPath2 -> processContent2 -+
                      +-> readPath3 -> processContent3 -+
                      +-> readPath4 -> processContent4 -+
For reuse, it would be trivial to create a module to support the various flow-control patterns, i.e. series, parallel, batch, while, until, etc.
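As one possible shape for such a module, here is a minimal sketch of a parallel helper built on an EventEmitter. The function name parallel, its signature, and the 'taskDone' event are assumptions for illustration only.

```javascript
const { EventEmitter } = require('events');

// Run Node-style tasks in parallel and call done(err, results) exactly once.
// Results keep the order of `tasks`, not the order of completion.
function parallel(tasks, done) {
  const sm = new EventEmitter();
  const results = new Array(tasks.length);
  let active = tasks.length;
  let finished = false;

  // terminal states
  sm.once('error', (err) => { finished = true; done(err); });
  sm.once('done', () => { finished = true; done(null, results); });

  // one task completed successfully
  sm.on('taskDone', (index, value) => {
    results[index] = value;
    active -= 1;
    if (active === 0) sm.emit('done');
  });

  // nothing to run: finish immediately
  if (active === 0) { sm.emit('done'); return; }

  tasks.forEach((task, index) => {
    task((err, value) => {
      if (finished) return; // ignore callbacks that arrive after a failure
      if (err) sm.emit('error', err);
      else sm.emit('taskDone', index, value);
    });
  });
}

// Usage: two timers that finish out of order.
parallel([
  cb => setTimeout(() => cb(null, 'slow'), 20),
  cb => setTimeout(() => cb(null, 'fast'), 5),
], (err, results) => {
  if (err) throw err;
  console.log(results);
});
```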
The simplest solution is to run the do_something* and unlink in sequence as follows:
do_something(tmp_file_name, function(err) {
do_something_other(tmp_file_name, function(err) {
// current Node versions require a callback for fs.unlink
fs.unlink(tmp_file_name, function(err) { if (err) throw err; });
});
});
Unless, for performance reasons, you want to execute do_something() and do_something_other() in parallel, I suggest keeping it simple and going this way.
Wait.for https://github.com/luciotato/waitfor
using Wait.for:
var wait=require('wait.for');
...in a fiber...
wait.for(do_something,tmp_file_name);
wait.for(do_something_other,tmp_file_name);
fs.unlink(tmp_file_name);
With pure Promises it could be a bit more messy, but if you use Deferred Promises then it's not so bad:
Install:
npm install --save @bitbar/deferred-promise
Modify your code:
const DeferredPromise = require('@bitbar/deferred-promise');
const promises = [
new DeferredPromise(),
new DeferredPromise()
];
do_something(tmp_file_name, (err) => {
if (err) {
promises[0].reject(err);
} else {
promises[0].resolve();
}
});
do_something_other(tmp_file_name, (err) => {
if (err) {
promises[1].reject(err);
} else {
promises[1].resolve();
}
});
Promise.all(promises).then( () => {
// current Node versions require a callback for fs.unlink
fs.unlink(tmp_file_name, (err) => { if (err) throw err; });
});
In the attached code I want to run the function returnFile after all database queries have run. The problem is that, from inside a query's callback, I cannot tell which response will be the last. I thought about separating the loops and having only the last callback run returnFile, but that would dramatically slow things down.
for (var i = 0, len = articleRevisionData.length; i < len; i++) {
tagNames=[]
console.log("step 1, "+articleRevisionData.length+" i:"+i);
if(articleRevisionData[i]["tags"]){
for (var x = 0, len2 = articleRevisionData[i]["tags"].length; x < len2; x++) {
console.log("step 2, I: "+i+" x: "+x+articleRevisionData[i]["articleID"])
tagData.find({"tagID":articleRevisionData[i]["tags"][x]}).toArray( function(iteration,len3,iterationA,error, resultC){
console.log("step 3, I: "+i+" x: "+x+" iteration: "+iteration+" len3: "+len3)
if(resultC.length>0){
tagNames.push(resultC[0]["tagName"]);
}
//console.log("iteration: "+iteration+" len: "+len3)
if(iteration+1==len3){
console.log("step 4, iterationA: "+iterationA+" I: "+iteration)
articleRevisionData[iterationA]["tags"]=tagNames.join(",");
}
}.bind(tagData,x,len2,i));
}
}
if(i==len-1){
templateData={
name:userData["firstName"]+" "+userData["lastName"],
articleData:articleData,
articleRevisionData:articleRevisionData
}
returnFile(res,"/usr/share/node/Admin/anonymousAttempt2/Admin/Articles/home.html",templateData);
}
}
It is rarely a good idea to call an asynchronous function from within a loop since, as you've discovered, you cannot know when all the calls complete (which is the nature of asynchrony.)
In your example, it's important to note that all of your async calls run concurrently, which can consume more system resources than you might wish.
I've found that the best solution to these kinds of problems is to use events to manage execution flow, as in:
const EventEmitter = require('events');
const emitter = new EventEmitter();
let iterations = articleRevisionData.length;
// start up state
emitter.on('start', () => {
// do setup here
emitter.emit('next_iteration');
});
// loop state
emitter.on('next_iteration', () => {
if(iterations--) {
asyncFunc(args, (err,result) => {
if(err) {
emitter.emit('error', err);
return;
}
// do something with result
emitter.emit('next_iteration');
});
return;
}
// no more iterations
emitter.emit('complete');
});
// error state
emitter.on('error', (e) => {
console.error(`processing failed on iteration ${iterations+1}: ${e.toString()}`);
});
// processing complete state
emitter.on('complete', () => {
// do something with all results
console.log('all iterations complete');
});
// start processing
emitter.emit('start');
Note how simple and clean this code is, lacking any "callback hell", and how easy it is to visualize program flow.
It is also worth noting that you can express every kind of execution control (doWhile, doUntil, map/reduce, queue workers, etc.) using events and since event handling is at the very core of Node, you'll find using them in this manner will outperform most, if not all, other solutions.
See Node Events for more information on event handling in Node.
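To make the pattern concrete, here is a small self-contained run of the same loop. lookupTags is a hypothetical stand-in for one database query per article revision, and the onComplete callback marks the point where returnFile would be called; neither name comes from the original code.

```javascript
const EventEmitter = require('events');

// Hypothetical stand-in for one database lookup per article revision;
// it calls back asynchronously with a made-up tag string.
function lookupTags(revision, cb) {
  setImmediate(() => cb(null, `tags-for-${revision.articleID}`));
}

// Process revisions one at a time, then call onComplete(err, revisions).
function processRevisions(revisions, onComplete) {
  const emitter = new EventEmitter();
  let index = 0;

  // loop state: handle the next revision, or finish when none are left
  emitter.on('next_iteration', () => {
    if (index === revisions.length) {
      emitter.emit('complete');
      return;
    }
    const revision = revisions[index++];
    lookupTags(revision, (err, tags) => {
      if (err) {
        emitter.emit('error', err);
        return;
      }
      revision.tags = tags;
      emitter.emit('next_iteration');
    });
  });

  // error state
  emitter.on('error', (err) => onComplete(err));

  // processing complete state: every lookup has called back by now
  emitter.on('complete', () => onComplete(null, revisions));

  emitter.emit('next_iteration');
}

// Usage: this callback is where returnFile(...) would go.
processRevisions([{ articleID: 1 }, { articleID: 2 }], (err, revisions) => {
  if (err) throw err;
  console.log(revisions);
});
```

Note that this version runs the lookups one after another, which trades speed for a predictable load on the database.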