I am working on a NestJS API that receives article codes and saves them in a DB with the status "queue". I want to build a background process that searches for articles with that status, fetches a website using each code, and fills in the information for that article ID.
I want it to fetch codes in 5 parallel routines; each one should get a new code from the list as soon as it has finished fetching its current code.
The way it's built right now, it gets 5 codes and uses Promise.all, but what I don't like about this approach is that if one of the awaited calls is significantly slower than the others, they all wait for the slowest one.
const promiseResult = await Promise.all(promiseList);
Which approach is the best one for me to learn in order to achieve what I am planning to do?
With Promise.all, you basically do your work only once all the promises are fulfilled. You are correct when you say that you wait for the slowest one, as this is exactly what await Promise.all(promiseList) does.
Now you need to ask yourself whether you are able to process the quicker promises while you wait for the slowest one. If so, then you can iterate over your promises and define a .then() for each of them:
for (const promise of promiseList) {
    promise.then((value) => {
        // Do something with value
    });
}
This way, each of your promises has a then handler that runs when that promise is fulfilled, and your JavaScript flows normally, without waiting for any of the promises. Once a promise is fulfilled, the event loop will detect that an event happened and call your callback function (in the example above, the arrow function inside the then call).
Note that the then callback should never assume that the other promises were also completed, but that shouldn't be a problem in your case, because you want to work when your promise list was only partially completed as well.
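For the "5 routines that each pick up the next code as soon as they are free" part of the question, one possible approach is a small pool of async loops that share an index into the list. The following is only a sketch under assumptions: runPool and fetchArticle are hypothetical names (fetchArticle merely simulates the real website fetch with a timer), and nothing here is specific to NestJS.

```javascript
// Minimal sketch: run `worker` over `items` with at most `limit` tasks in flight.
// `runPool` and `fetchArticle` are hypothetical names used for illustration only.
async function runPool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane claims the next item as soon as its current one has finished.
  async function lane() {
    while (next < items.length) {
      const index = next++; // no race: this line runs synchronously
      try {
        results[index] = { status: "fulfilled", value: await worker(items[index]) };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes); // lanes never reject, so this waits for all of them
  return results;
}

// Stand-in for the real fetch; it only simulates a variable delay.
function fetchArticle(code) {
  const delayMs = 10 + (code.length % 3) * 10;
  return new Promise((resolve) => setTimeout(() => resolve(`info for ${code}`), delayMs));
}

runPool(["A1", "B22", "C333", "D4", "E55", "F666", "G7"], 5, fetchArticle)
  .then((results) => console.log(results.map((r) => r.value)));
```

A slow code only occupies its own lane here; the other four lanes keep pulling new codes, which is the behavior Promise.all over fixed batches of 5 does not give you.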
I've seen many people saying that Promise.all can't achieve parallelism, since Node/JavaScript runs in a single-threaded environment. However, if I, for instance, wrap 5 promises inside a Promise.all, in which every single one of the promises resolves after 3 seconds (a simple setTimeout promise), how come Promise.all resolves all of them in 3 seconds instead of something like 15 seconds (5 x 3 sec each)?
See the example below:
function await3seconds () {
    return new Promise(function(res) {
        setTimeout(() => res(), 3000)
    })
}
console.time("Promise.all finished in::")
Promise.all([await3seconds(), await3seconds(), await3seconds(), await3seconds(), await3seconds()])
    .then(() => {
        console.timeEnd("Promise.all finished in::")
    })
It logs:
Promise.all finished in::: 3.016s
How is this behavior possible without parallelism? Concurrent execution wouldn't be able to process all of these promises in 3 seconds either.
It's particularly useful to understand what this line of code actually does:
Promise.all([await3seconds(), await3seconds(), await3seconds(), await3seconds(), await3seconds()]).then(...)
That is fundamentally the same as:
const p1 = await3seconds();
const p2 = await3seconds();
const p3 = await3seconds();
const p4 = await3seconds();
const p5 = await3seconds();
Promise.all([p1, p2, p3, p4, p5]).then(...)
What I'm trying to show here is that ALL your functions are executed serially one after the other in the order declared and they have all returned BEFORE Promise.all() is even executed.
So, some conclusions from that:
Promise.all() didn't "run" anything. It accepts an array of promises and it just monitors all those promises, collects their results in order and notifies you (via .then() or await) when they are all done or tells you when the first one rejects.
Your functions are already executed and have returned a promise BEFORE Promise.all() even runs. So, Promise.all() doesn't determine anything about how those functions run.
If the functions you were calling were blocking, the first would run to completion before the second was even called. Again, this has nothing to do with Promise.all() because the functions are all executed before Promise.all() is even called.
In your particular example, your functions each start a timer and immediately return. So, you essentially start 5 timers within ms of each other that are all set to fire in 3 seconds. setTimeout() is non-blocking. It tells the system to create a timer and gives the system a callback to call when that timer fires and then IMMEDIATELY returns. Sometime later, when the event loop is free, the timer will fire and the callback will get called. So, that's why all the timers are set at once and all fire at about the same time. If you wanted them to each be spread out by 3 seconds apart, you'd have to write this code differently, either to set increasing times for each timer or to not start the 2nd timer until the first one fires and so on.
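To make the difference concrete, here is a small sketch that contrasts starting all timers at once with not starting the next timer until the previous one has fired. The wait helper and both function names are made up for this illustration, and the delay is shortened to 100 ms so the demo finishes quickly.

```javascript
// Hypothetical helper: a promise that resolves after `ms` milliseconds.
function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// All timers are created back to back, so they run down together.
async function allAtOnce(count, ms) {
  const started = Date.now();
  await Promise.all(Array.from({ length: count }, () => wait(ms)));
  return Date.now() - started;
}

// The next timer is not created until the previous one has fired.
async function oneAfterAnother(count, ms) {
  const started = Date.now();
  for (let i = 0; i < count; i++) {
    await wait(ms);
  }
  return Date.now() - started;
}

async function main() {
  console.log("all at once:", await allAtOnce(5, 100), "ms");             // roughly 100 ms
  console.log("one after another:", await oneAfterAnother(5, 100), "ms"); // roughly 500 ms
}
main();
```

In both versions Promise.all (or await) only observes the promises; the total time is decided by when each timer is started.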
So, what Promise.all() allows you to do is to monitor multiple asynchronous operations that are, by themselves (independent of Promise.all()) capable of running asynchronously. Nodejs itself, nothing to do with Promise.all(), has the ability to run multiple asynchronous operations in parallel. For example, you can make multiple http requests or make multiple read requests from the file system and nodejs will run those in parallel. They will all be "in flight" at the same time.
So, Promise.all() isn't enabling parallelism of asynchronous operations. That capability is built into the asynchronous operations themselves and how they interact with nodejs and how they are implemented. Promise.all() allows you to track multiple asynchronous operations and know when they are all done, get their results in order and/or know when there was an error in one of the operations.
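As a rough illustration of the "results in order" and "know when there was an error" parts, consider the sketch below. The delayed and failing helpers are invented for the example and simply wrap setTimeout.

```javascript
// Hypothetical helpers used only for this illustration.
function delayed(value, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}
function failing(message, ms) {
  return new Promise((_, reject) => setTimeout(() => reject(new Error(message)), ms));
}

async function demo() {
  // "slow" finishes last but still comes first, matching the input order.
  const ordered = await Promise.all([delayed("slow", 60), delayed("fast", 10)]);

  // Promise.all rejects as soon as one input rejects; it does not wait for the 60 ms promise.
  let failure;
  try {
    await Promise.all([delayed("slow", 60), failing("broken", 10)]);
  } catch (error) {
    failure = error.message;
  }
  return { ordered, failure };
}

demo().then(console.log); // { ordered: [ 'slow', 'fast' ], failure: 'broken' }
```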
If you're curious how timers work in nodejs, they are managed by libuv which is a cross platform library that nodejs uses for managing the event loop, timers, file system access, networking and a whole bunch of things.
Inside of libuv, it manages a sorted list of pending timers. The timers are sorted by their next time to fire so the soonest timer to fire is at the start of the list.
The event loop within nodejs goes in a cycle to check for a bunch of different things and one of those things is to see if the current head of the timer list has reached its firing time. If so, it removes that timer from the list, grabs the callback associated with that timer and calls it.
Other types of asynchronous operations such as file system access work completely differently. The asynchronous file operations in the fs module, actually use a native code thread pool. So, when you request an asynchronous file operation, it actually grabs a thread from the thread pool, gives it the job for the particular file operation you requested and sends the thread on its way. That native code thread, then runs independently from the Javascript interpreter which is free to go do other things. At some future time when the thread finishes the file operation, it calls a thread safe interface of the event loop to add a file completion event to a queue. When whatever Javascript is currently executing finishes and returns control back to the event loop, one of the things the event loop will do is check if there are any file system completion events waiting to be executed. If so, it will remove it from the queue and call the callback associated with it.
So, while individual asynchronous operations can themselves run in parallel if designed appropriately (usually backed by native code), signaling completion or errors to your JavaScript all runs through the event loop and the interpreter is only running one piece of JavaScript at a time.
Note: this is ignoring for the purposes of this discussion, the WorkerThread capability which actually fires up a whole different interpreter and can run multiple sets of Javascript at the same time. But, each individual interpreter still runs your Javascript single threaded and is still coordinated with the outside world through the event loop.
First, Promise.all has nothing to do with JS code running in parallel. You can think of Promise.all as a code organizer: it groups promises together and waits for their results.
What's responsible for the code to run in a way that "looks like" it's parallel is the event-based nature of JS. So in your case:
Promise.all([await3seconds(), await3seconds(),
    await3seconds(), await3seconds(), await3seconds()])
    .then(() => {
        console.timeEnd("Promise.all finished in::")
    })
Let's call the functions inside Promise.all a1 to a5. What will happen is:
The calls a1 to a5 run sequentially (one after the other), but this takes very little time, because each call only registers a timer and returns a promise; it does not wait for anything.
Each timer starts as soon as its function is called (so there will be a minor delay between the start of each one).
Whenever a timer finishes, its callback is placed in the "Callback Queue/Task Queue", and the Event Loop moves it to the "Call Stack" to be executed once the stack is empty.
Promise.all will resolve after the last callback has run and resolved its promise.
As you can see here:
Promise.all finished in::: 3.016s
The 0.016s delay is a combination of the time it took to call those functions sequentially and register their timers, plus the time each callback took to execute after its timer finished.
So the code is not executed in parallel, it's still sequential, but the event-based nature of JS is used to mimic the behavior of parallelism.
Look at this article to relate more to what I am trying to say.
Synchronous vs Asynchronous JavaScript
No, they are not executed in parallel, but you can conceptualize them this way. This is just how the event queue works. If each promise contained a heavy compute task, they would still be executed one at a time -
function await3seconds(id) {
    return new Promise(res => {
        console.log(id, Date.now())
        setTimeout(_ => {
            console.log(id, Date.now())
            res()
        }, 3000)
    })
}
console.time("Promise.all finished in::")
Promise.all([await3seconds(1), await3seconds(2), await3seconds(3), await3seconds(4), await3seconds(5)])
    .then(() => {
        console.timeEnd("Promise.all finished in::")
    })
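| time | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| 1621997095724 | start | ⌛ | ⌛ | ⌛ | ⌛ |
| 1621997095725 | ↓ | start | ⌛ | ⌛ | ⌛ |
| 1621997095725 | ↓ | ↓ | start | ⌛ | ⌛ |
| 1621997095725 | ↓ | ↓ | ↓ | start | ⌛ |
| 1621997095726 | ↓ | ↓ | ↓ | ↓ | start |
| 1621997098738 | end | ↓ | ↓ | ↓ | ↓ |
| 1621997098740 | ✓ | end | ↓ | ↓ | ↓ |
| 1621997098740 | ✓ | ✓ | end | ↓ | ↓ |
| 1621997098741 | ✓ | ✓ | ✓ | end | ↓ |
| 1621997098742 | ✓ | ✓ | ✓ | ✓ | end |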
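The point about heavy compute tasks can be sketched as follows. Here busyWork is a hypothetical stand-in for CPU-bound work (it just spins until a deadline), so, unlike the timer version, each task fully finishes before the next one can even start.

```javascript
// Hypothetical stand-in for a CPU-heavy task: spins until `ms` milliseconds have passed.
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Nothing else in this interpreter can run while this loop spins.
  }
}

function heavyTask(id, ms, log) {
  return new Promise((resolve) => {
    log.push(`start ${id}`);
    busyWork(ms); // runs synchronously inside the promise executor
    log.push(`end ${id}`);
    resolve(id);
  });
}

async function runHeavy(ms) {
  const log = [];
  const started = Date.now();
  await Promise.all([heavyTask(1, ms, log), heavyTask(2, ms, log), heavyTask(3, ms, log)]);
  return { log, elapsed: Date.now() - started };
}

runHeavy(50).then(({ log, elapsed }) => {
  console.log(log.join(", ")); // start 1, end 1, start 2, end 2, start 3, end 3
  console.log(`elapsed: ${elapsed} ms`); // at least 150 ms: the three tasks add up
});
```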
In this related Q&A we build a batch processing Pool that emulates threads. Check it out if that kind of thing interests you!
Has anyone used Web Workers and async together, or would anyone advise against it?
For example, if I want to send out 1000 async calls and await each one before sending the next, the calls run sequentially, so it is slow. While it doesn't block the main thread, awaiting them one by one is slow.
Can I await a single async method that creates 1000 Web Workers and sends out the 1000 fetches in parallel (one fetch per worker)? Each web worker would wait for its fetch result and post the result back, and the main async method that created the 1000 Web Workers would collect all 1000 results. Once done, it finishes the method and the main thread continues from there.
I am not seeing examples out there. I am wondering why. Is this a bad idea? Or maybe there is a framework for it?
Thanks
You don't need workers since fetch doesn't block the main thread. On top of that, creating that many workers would add significant overhead.
fetch already does what you want by default, you should simply not await every single call.
You can use Promise.allSettled to convert an array of promises to a single promise of results that you can then await.
const promises = urls.map(url => fetch(url));
const results = await Promise.allSettled(promises);
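Each entry returned by Promise.allSettled is an object with a status of "fulfilled" (with a value) or "rejected" (with a reason), so the results usually need to be split afterwards. A minimal sketch, in which summarize is a hypothetical helper and the hand-made promises stand in for the fetch(url) calls:

```javascript
// Hypothetical helper: splits Promise.allSettled output into successes and failures.
async function summarize(promises) {
  const results = await Promise.allSettled(promises);
  const values = [];
  const errors = [];
  for (const result of results) {
    if (result.status === "fulfilled") {
      values.push(result.value);
    } else {
      errors.push(result.reason);
    }
  }
  return { values, errors };
}

// The promises below stand in for fetch(url) calls.
summarize([
  Promise.resolve("page 1"),
  Promise.reject(new Error("timeout")),
  Promise.resolve("page 3"),
]).then(({ values, errors }) => {
  console.log(values);                       // [ 'page 1', 'page 3' ]
  console.log(errors.map((e) => e.message)); // [ 'timeout' ]
});
```

Unlike Promise.all, one failed request does not discard the results of the others.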
I am relatively new to NodeJS. I was wondering whether the await keyword will slow down the entire JavaScript/NodeJS program.
For example,
If I have many express routers written in a single server file, and one router function uses 'await' to wait for a promise to resolve, will all other routers and asynchronous functions stay halted/paused until the promise is resolved? Or will just that thread be paused?
In such a case, will the await call cause performance issues for the JavaScript program?
No. While await sounds like it is blocking, it is fully asynchronous (non-blocking), as is also implied by the async keyword required in the function signature. It is a (much) nicer way to use promises. So go ahead and use it in your code.
You also mentioned threads. I suggest you ignore the thread concept while developing node.js apps and trust the node.js event loop. Just never use blocking IO calls (which are explicitly named so by having 'Sync' in the name).
await waits for a promise to be resolved, but because Node is asynchronous in nature, other requests made to the application are not affected: they will not wait for the previous request's promise to be resolved.
Example:
route 1 -> it will await a query that iterates through a million rows and return the sum in the response
route 2 -> it will only return '1' in the response
Now if we call route 1 first and then route 2, you will still get the response from route 2 immediately, and you will get the response from route 1 once it completes.
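The two routes can be simulated without a server. In this sketch slowRoute, fastRoute and queryRows are hypothetical names, and queryRows stands in for slow but non-blocking I/O (it is only a timer), which is the case where await leaves the event loop free.

```javascript
// Hypothetical stand-in for a slow, non-blocking query.
function queryRows(ms) {
  return new Promise((resolve) => setTimeout(() => resolve([1, 2, 3]), ms));
}

async function slowRoute(order) {
  const rows = await queryRows(50); // only this handler pauses; the event loop stays free
  order.push("slow");
  return rows.reduce((sum, n) => sum + n, 0);
}

async function fastRoute(order) {
  order.push("fast");
  return 1;
}

async function simulate() {
  const order = [];
  const slow = slowRoute(order); // "request" 1 arrives first
  const fast = fastRoute(order); // "request" 2 arrives while request 1 is still waiting
  const responses = await Promise.all([slow, fast]);
  return { order, responses };
}

simulate().then(console.log); // { order: [ 'fast', 'slow' ], responses: [ 6, 1 ] }
```

Note the assumption: if the slow route did its summing in a long synchronous loop instead of awaiting I/O, that loop would hold up every other handler until it finished.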
I have taken this example from the Axios AJAX library but the same goes for Promise.all().
So far I have read that you can use Promise.all() to check whether all promises from an array of promises have been resolved.
This is really neat but what happens if you push a promise and it resolves before the next one has been pushed?
I am guessing that, with the overhead of my average AJAX call being at least 50ms, the push will always happen before any AJAX request resolves, but simply taking this for granted does not really feel right.
There are 2 solutions for this that I could think of:
Use a count to ensure both (in this case) AJAX requests are in the array.
Check for the actual function names being there.
How are others dealing with this, or are most people simply satisfied with the hope that both AJAX requests are pushed before a single one can resolve?
axios.all([getUserAccount(), getUserPermissions()])
.then(axios.spread(function (acct, perms) {
// Both requests are now complete
}));
You do not need to worry about promises that resolve before they are passed to Promise.all: Promise.all will not be called before its (array) argument has been completely evaluated. Only when the array is ready, will Promise.all be called.
Whether or not any of those promises is already resolved at the time Promise.all is called is really not important. Promise.all will check which ones are in a resolved state, and the promise it returns will only call its then callback when all of them have been fulfilled. It might even be that all the promises in the array are already fulfilled: no problem, as soon as Promise.all is executed, it will schedule the execution of the then callback. It does not matter how long ago those promises were resolved either. Even if they were resolved one hour ago, Promise.all will still do its job correctly.
axios.all calls Promise.all, which returns a single promise that resolves when all the promises in the iterable argument are resolved.
axios.spread will receive the resolved values from getUserAccount and getUserPermissions.
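A quick way to convince yourself is to pass Promise.all a promise that settled well before the call. In this sketch lateAll is a made-up function name and the delays are arbitrary.

```javascript
// Sketch: a promise that settled earlier is handled the same as a pending one.
async function lateAll() {
  const early = Promise.resolve("already done");           // fulfilled immediately
  await new Promise((resolve) => setTimeout(resolve, 20)); // time passes
  const late = new Promise((resolve) => setTimeout(() => resolve("done later"), 20));
  return Promise.all([early, late, 42]); // plain values are accepted as well
}

lateAll().then(console.log); // [ 'already done', 'done later', 42 ]
```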
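Push is a sync operation; AJAX is async. JS will always finish the currently running synchronous code before running any async callback, even if the async operation has already finished. For example:
// Schedule the callback first, so the timer is already due while the loop runs.
setTimeout(function () {
    console.log('running async');
}, 0);
for (let i = 0; i < 10000000; i++) {
    console.log('looping');
}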
Although the timeout is set to 0, so it can immediately run, it will wait until the for loop is done and only then run the async operation.
So even if you push a promise and it immediately resolves, it will wait until the push is done and only then will react to the resolved promises.
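The same ordering can be checked for promise callbacks. The sketch below (orderDemo is a hypothetical name) records the order in which synchronous code, an already-resolved promise's callback, and a zero-delay timer run.

```javascript
// Sketch: records the order of sync code, a promise callback, and a 0 ms timer.
function orderDemo() {
  const log = [];
  return new Promise((done) => {
    setTimeout(() => {
      log.push("timeout");
      done(log);
    }, 0);

    const list = [];
    const ready = Promise.resolve("x"); // already resolved when it is pushed
    list.push(ready);                   // push completes synchronously
    ready.then(() => log.push("promise callback"));

    log.push("sync code"); // still runs before either callback above
  });
}

orderDemo().then(console.log); // [ 'sync code', 'promise callback', 'timeout' ]
```

Even though the promise was resolved before it was pushed, its callback cannot run until the synchronous code around the push has finished.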