I need to control concurrency in a Node.js script I'm writing. Currently I'm trying to use the promise-task-queue package from npm, but I'm open to other suggestions.
I'm not sure how to implement promise-task-queue in my code. This is my original program:
readURLsfromFile().then( (urls) => {
urls.reduce( (accumulator, current, i) => {
return accumulator.then( () => {
return main(urls[i], i, urls.length)
})
}, Promise.resolve())
})
As you can see I'm reading urls from a file, then using .reduce() to run main() in serial on each one of these urls. Serial was too slow though so I need to do it with controlled concurrency.
Here's the code I started to write using promise-task-queue (It's very wrong, I have no idea what I'm doing):
var taskQueue = require("promise-task-queue");
var queue = taskQueue();
var failedRequests = 0;
queue.on("failed:apiRequest", function(task) {
failedRequests += 1;
});
queue.define("apiRequest", function(task) {
return Promise.try( () => {
return main(urls[i], i, urls.length);
}).then( () => {
return console.log("DONE!");
});
}, {
concurrency: 2
});
Promise.try( () => {
/* The following queues up the actual task. Note how it returns a Promise! */
return queue.push("apiRequest", {url: urls[i], iteration: i, amountToDo: urls.length});
})
As you can see I've put my main() function with its arguments after the Promise.try, and I've put my arguments after the return queue.push. Not sure if that's correct or not.
But regardless, now I'm stuck: how do I load all the iterations into the queue?
You could use the qew module from npm: https://www.npmjs.com/package/qew.
Install using npm install qew.
To initialise you do
const Qew = require('qew');
const maxConcurrent = 3;
const qew = new Qew(maxConcurrent);
Using the above code, qew will now be a queue onto which you can push asynchronous functions, and they will execute with a maximum concurrency of 3.
To push a new async function onto the qew you can do
qew.pushProm(asyncFunc);
So in your case if I understood you correctly you could do something like
readURLsfromFile()
.then(urls => {
return Promise.all(urls.map(url => { // wait for all promises to resolve
return qew.pushProm(() => main(url)); // push function onto queue
}));
})
.then(results => {
// do stuff with results
})
In this snippet you are reading urls from a file, and then loading a bunch of functions into the qew one by one and waiting for them all to resolve before doing something with them.
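If you also want to count failures, as in your promise-task-queue attempt, you could catch per task before Promise.all collects the results. A rough sketch (assuming the promise returned by pushProm rejects when your pushed function's promise rejects):
let failedRequests = 0;

readURLsfromFile()
    .then(urls => Promise.all(urls.map((url, i) => {
        return qew.pushProm(() => main(url, i, urls.length))
            .catch(() => { failedRequests += 1; }); // a failed task resolves to undefined
    })))
    .then(results => {
        console.log(`Done, ${failedRequests} request(s) failed.`);
    });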
Full disclaimer: I am the author of this package.
I am looking at https://www.promisejs.org/patterns/ and it mentions that Promise.resolve can be used if you need a value in the form of a promise, like:
var value = 10;
var promiseForValue = Promise.resolve(value);
What would be the use of a value in promise form though since it would run synchronously anyway?
If I had:
var value = 10;
var promiseForValue = Promise.resolve(value);
promiseForValue.then(resp => {
myFunction(resp)
})
wouldn't just using value without it being a Promise achieve the same thing:
var value = 10;
myFunction(10);
Say you write a function that sometimes fetches something from a server but other times returns immediately; you will probably want that function to always return a promise:
function myThingy() {
if (someCondition) {
return fetch('https://foo');
} else {
return Promise.resolve(true);
}
}
It's also useful if you receive some value that may or may not be a promise. You can wrap it with Promise.resolve, and then you are sure it's a promise:
const myValue = someStrangeFunction();
// Guarantee that myValue is a promise
Promise.resolve(myValue).then( ... );
In your examples, yes, there's no point in calling Promise.resolve(value). The use case is when you do want to wrap your already existing value in a Promise, for example to maintain the same API from a function. Let's say I have a function that conditionally does something that would return a promise — the caller of that function shouldn't be the one figuring out what the function returned, the function itself should just make that uniform. For example:
const conditionallyDoAsyncWork = (something) => {
if (something == somethingElse) {
return Promise.resolve(false)
}
return fetch(`/foo/${something}`)
.then((res) => res.json())
}
Then users of this function don't need to check if what they got back was a Promise or not:
const doSomethingWithData = () => {
conditionallyDoAsyncWork(someValue)
.then((result) => result && processData(result))
}
As a side note, using async/await syntax both hides that and makes it a bit easier to read, because any value you return from an async function is automatically wrapped in a Promise:
const conditionallyDoAsyncWork = async (something) => {
if (something == somethingElse) {
return false
}
const res = await fetch(`/foo/${something}`)
return res.json()
}
const doSomethingWithData = async () => {
const result = await conditionallyDoAsyncWork(someValue)
if (result) processData(result)
}
Another use case: a dead simple async queue using Promise.resolve() as the starting point.
let current = Promise.resolve();
function enqueue(fn) {
current = current.then(fn);
}
enqueue(async () => { console.log("async task") });
Edit, in response to OP's question.
Explanation
Let me break it down for you step by step.
enqueue(task) adds the task function as a callback via promise.then, and replaces the original current promise reference with the newly returned thenPromise.
current = Promise.resolve()
thenPromise = current.then(task)
current = thenPromise
As per the promise spec, if the task function in turn returns yet another promise (call it task() -> taskPromise), then thenPromise will only resolve when taskPromise resolves. thenPromise is practically equivalent to taskPromise; it's just a wrapper. Let's rewrite the above code as:
current = Promise.resolve()
taskPromise = current.then(task)
current = taskPromise
So if you go like:
enqueue(task_1)
enqueue(task_2)
enqueue(task_3)
it expands into
current = Promise.resolve()
task_1_promise = current.then(task_1)
task_2_promise = task_1_promise.then(task_2)
task_3_promise = task_2_promise.then(task_3)
current = task_3_promise
This effectively forms a linked-list-like structure of promises that will execute the task callbacks in sequential order.
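One caveat worth adding: if any task rejects, current becomes a rejected promise and every later task would be skipped. A small sketch that keeps the chain alive by catching per task:
function enqueue(fn) {
    // catch so one failed task doesn't poison the rest of the chain
    current = current.then(fn).catch(err => console.error(err));
}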
Usage
Let's study a concrete scenario. Imagine you need to handle websocket messages in sequential order.
Let's say you need to do some heavy computation upon receiving messages, so you decide to send it off to a worker thread pool. Then you write the processed result to another message queue (MQ).
But here's the requirement: that MQ expects the write order of the messages to match the order in which they came in from the websocket stream. What do you do?
Suppose you cannot pause the websocket stream; you can only handle the messages locally, ASAP.
Take One:
websocket.on('message', (msg) => {
sendToWorkerThreadPool(msg).then(result => {
writeToMessageQueue(result)
})
})
This may violate the requirement, because sendToWorkerThreadPool may not return results in the original order: it's a pool, and some threads may return faster if their workload is light.
Take Two:
websocket.on('message', (msg) => {
const task = () => sendToWorkerThreadPool(msg).then(result => {
writeToMessageQueue(result)
})
enqueue(task)
})
This time we enqueue (defer) the whole process, so we can ensure the task execution order stays sequential. But there's a drawback: we lose the benefit of using a thread pool, because each sendToWorkerThreadPool call only fires after the last one completes. This model is equivalent to using a single worker thread.
Take Three:
websocket.on('message', (msg) => {
const promise = sendToWorkerThreadPool(msg)
const task = () => promise.then(result => {
writeToMessageQueue(result)
})
enqueue(task)
})
The improvement over take two is that we call sendToWorkerThreadPool ASAP, without deferring, but we still enqueue/defer the writeToMessageQueue part. This way we make full use of the thread pool for computation, yet still ensure the sequential write order to the MQ.
I rest my case.
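One extra caveat with take three: the pool promise is created eagerly, so if it rejects before its task runs, Node may report an unhandled rejection in the meantime. A sketch that pre-attaches a no-op handler to avoid that:
websocket.on('message', (msg) => {
    const promise = sendToWorkerThreadPool(msg)
    promise.catch(() => {}) // mark early rejections as handled; the task below still sees them
    const task = () => promise.then(result => {
        writeToMessageQueue(result)
    })
    enqueue(task)
})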
I can't make my code run in order. I need the connection test to come first, and then the functions resolved in order, to form a text string that will be sent in a tweet with an NPM package. (This is not my real code, it is a summary example.)
I've tried many things and my brain is on fire
// Test DB connection
db.authenticate()
.then(() => {
const server = http.createServer(app)
server.listen(config.port, () => {
console.log(`http://localhost:${config.port}`)
})
reload(app)
})
.catch(err => {
console.log(`Error: ${err}`)
})
// Functions
resumen.man = (numRoom) => {
const registries = Registries.findOne({})
.then((registries) => {
return registries.name+' is good.'
})
}
resumen.man1 = (numRoom) => {
const registries = Registries.findOne({})
.then((registries) => {
return registries.name+' is bad.'
})
}
resumen.man2 = (numRoom) => {
const registries = Registries.findOne({})
.then((registries) => {
return registries.name+' is big.'
})
}
// Execute resumen.man(1) first and save text in $varStringMultiLine ?
// Execute resumen.man1(1) later and save text in the same $varStringMultiLine ?
// Execute resumen.man2(1) last and save text in the same $varStringMultiLine ?
sendTweet($varStringMultiLine)
Thanx.
As commented by @Barmar and @some, you could chain the promises with .then or use async / await. I would recommend the latter, since .then-chaining gets unwieldy fast.
This is a really good explanation for async / await: https://javascript.info/async-await
Basically, you can use
await db.authenticate();
to halt the code so the next line is not executed before the promise is resolved. However, so as not to freeze the whole execution, the await itself needs to happen inside an async function.
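Applied to your example, a condensed sketch could look like this (assuming Registries.findOne returns a promise and each resumen function is changed to return its string):
// Each function must return the promise so callers can await it
resumen.man = (numRoom) => {
    return Registries.findOne({})
        .then((registries) => registries.name + ' is good.');
};
// ... same pattern for resumen.man1 and resumen.man2 ...

async function buildAndSendTweet() {
    await db.authenticate(); // connection test comes first
    const line1 = await resumen.man(1);
    const line2 = await resumen.man1(1);
    const line3 = await resumen.man2(1);
    const varStringMultiLine = [line1, line2, line3].join('\n');
    sendTweet(varStringMultiLine);
}

buildAndSendTweet().catch(err => console.log(`Error: ${err}`));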
Edit: why this is not a duplicate: because of Cypress; just read the question instead of tagging everything as a duplicate.
Edit 2: also, see the answer below for a better understanding of the differences between the usual async for-loop problems and this question.
I am writing cypress tests and I want to create a cypress command that populates my database with a list of users. I want the creation loop to wait for each user to be created before it moves on to the next one (because I want that done in a specific order).
For now, my loop looks like this:
Cypress.Commands.add("populateDb", (users) => {
var createdItems = []
for (const user of users) {
cy.createUser(user, 'passe').then(response => {
createdItems.unshift(response.body.user)
})
}
return createdItems
})
Of course, this loop does not wait for each user to be created before moving on to the next one (I want 'sequential treatment', NOT 'parallel and then wait for all promises to resolve').
I have read the answers about async for-loop here:
JavaScript ES6 promise for loop
Using async/await with a forEach loop
How do I return the response from an asynchronous call?
But I can't seem to find what I want, mainly because Cypress won't allow me to declare my function as async, as follows:
Cypress.Commands.add("populateDb", async (users) => {
//Some code
})
And if I don't declare it async I am not able to use await.
Isn't there some kind of get() method that just synchronously waits for a Promise to resolve?
Using a combination of the wrap and each Cypress commands, I was able to achieve a loop that waits for each iteration and can return the full results without needing a global variable.
The only reason I use the wrap command is that the Cypress each command requires being chained off a previous Cypress command. The each command evaluates each iteration, and the final then returns the complete results array. I am using this to upload multiple files and return the list of keys, but you can modify it for your own needs.
Cypress.Commands.add("uploadMultipleFiles", (files) => {
var uploadedS3Keys = []
cy.wrap(files).each((file, index, list) => {
cy.uploadFileToOrdersApi(file)
.then(s3Key => uploadedS3Keys.push(s3Key))
}).then(() => uploadedS3Keys)
})
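Usage then looks like any other chained command:
cy.uploadMultipleFiles(files).then(s3Keys => {
    // s3Keys is the full array of uploaded keys, in iteration order
});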
It turns out there is a reason why Cypress is so restrictive about waiting for async methods to resolve: it always runs the async commands sequentially, in the order they were called, not in parallel. So this will execute in the right order even if createUser is async:
Cypress.Commands.add("populateDb", (users) => {
for (const user of users) {
cy.createUser(user).then(response => {
console.log(response)
})
}
})
If you want to get the values that are returned (in my case, I need the user IDs to delete the users later), you can just store them in a var at file root level and add a Cypress command that returns that var.
var createdItems = []
Cypress.Commands.add("getCreatedItems", () => {
return createdItems
})
Cypress.Commands.add("populateDb", (users) => {
for (const user of users) {
cy.createUser(user).then(response => {
createdItems.unshift(response) // I unshift() because I need to delete them in FILO order later
})
}
})
Then in the Cypress test file you can just call them in the order you need them to execute:
cy.populateDb(users)
cy.getCreatedItems().then(response => {
//response is my createdItems Array, and cy.getCreatedItems() will run only when cy.populateDb() is resolved
})
You could do it also like:
Cypress.Commands.add("populateDb", (users) => {
for (const user of users) {
cy.createUser(user).then(response => {
createdItems.unshift(response) // I unshift() because I need to delete them in FILO order later
})
}
return createdItems;
})
cy.populateDb(users).then(response => {
// ... runs after populate ...
})
But the other answer is also right. Each cy.xxxxx command is actually added to a command queue and they run one after another. If new cy.xxxxx commands are called during queue execution, they are added to the front of the queue.
Here is a simple example to explain the execution order of the Cypress command queue versus synchronous code:
let a = 1; // a == 1
cy.cmd1().then(() => {
cy.cmd2().then(() => {
a += 1; // a == 5
});
a += 1; // a == 3
cy.cmd3()
a += 1; // a == 4
});
cy.cmd4().then(() => {
a += 1; // a == 6
});
a += 1; // a == 2
// at this point cypress command queue starts running the queue that is cmd1, cmd4
// 1. cmd1 runs and adds cmd2 and cmd3 to front of command queue also adds +2 to a
// 2. cmd2 runs and adds +1 to a
// 3. cmd3 runs
// 4. cmd4 runs and adds +1 to a
// ... all done ...
So from this example you can see that in your case the loop will be executed serially, because each cy.createUser is added to the Cypress command queue and then executed sequentially.
This is my first question and I'm trying to learn javascript/nodejs
I have an array x.
var x = [1,2,3,4];
Also I have a function which takes in a param, does some processing and returns a json.
function funcName (param){
//does some external API calls and returns a JSON
return result;
}
Now, rather than iterating over the array and calling the function again and again, is there a way to call them in parallel and then join the results and return them together?
Also, I'm looking for ways to catch the failed function executions.
For example: funcName(3) fails for some reason
What you could do is create a file that does your heavy lifting, then run a fork of that file.
In this function we do the following:
loop over each value in the array and create a promise that we will store in an array
Next we create a fork
We then send data to the fork using cp.send()
Wait for a response back and resolve the promise
Using promise.all we can tell when all our child processes have completed
The first parameter will be an array of all the child process results
So our main process will look a little something like this:
const { fork } = require('child_process')

let x = [1, 2, 3, 4]

function processAll(x) { // renamed so it doesn't shadow the global process object
    let promises = []
    for (let i = 0; i < x.length; i++) {
        promises.push(new Promise(resolve => {
            let cp = fork('my_process.js')
            cp.on('message', data => {
                cp.kill()
                resolve(data)
            })
            cp.send(x[i]) // send this iteration's value to the child
        }))
    }
    Promise.all(promises).then(data => {
        console.log(data)
    })
}

processAll(x)
Now in our child we can listen for messages, and do our heavy lifting and return the result back like so (very simple example):
// We got some data, let's process it
process.on('message', (value) => {
    let result = []
    switch (value) {
        case 1:
            result = [1, 1, 1, 1, 1, 1]
            break
        case 2:
            result = [2, 2, 2, 2, 2, 2]
            break
    }
    // Send the result back to the main process
    process.send(result)
})
The comments and the other answer are correct. JavaScript has no parallel processing capability of its own (forking processes doesn't count).
However, you can make the API calls in a vaguely parallel fashion: since they are asynchronous, the network I/O can be interleaved.
Consider the following:
const urls = ['api/items/1', 'api/items/2' /* , ... */];

Promise.all(urls.map(url => fetch(url))) // passing fetch directly to map would feed the array index to fetch as its second argument
    .then(results => {
        processResults(results);
    });
While that won't execute JavaScript instructions in parallel, the asynchronous fetch calls will not wait for each other to complete; they will be interleaved, and the results will be collected when all have completed.
With error handling:
const urls = ['api/items/1', 'api/items/2' /* , ... */];

Promise.all(urls.map(url => fetch(url)).map(promise => promise.catch(() => undefined)))
    .then(results => results.filter(result => result !== undefined))
    .then(results => {
        processResults(results);
    });
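On newer runtimes (Node 12.9+ or current browsers), Promise.allSettled expresses the same keep-the-successes pattern directly:
Promise.allSettled(urls.map(url => fetch(url)))
    .then(outcomes => outcomes
        .filter(outcome => outcome.status === 'fulfilled')
        .map(outcome => outcome.value))
    .then(results => {
        processResults(results);
    });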
I'm new to Node.js and am currently trying to code array iterations. I have an array of 1,000 items, which I'd like to iterate through in blocks of 50 items at a time due to problems with server load.
I currently use a forEach loop as seen below (which I'm hoping to transform into the aforementioned block iteration):
//result is the array of 1000 items
result.forEach(function (item) {
//Do some data parsing
//And upload data to server
});
Any help would be much appreciated!
UPDATE (in response to reply)
async function uploadData(dataArray) {
try {
const chunks = chunkArray(dataArray, 50);
for (const chunk of chunks) {
await uploadDataChunk(chunk);
}
} catch (error) {
console.log(error)
// Catch an error here
}
}
function uploadDataChunk(chunk) {
    return Promise.all(
        chunk.map((item) => {
            return new Promise((resolve, reject) => {
                // upload code; call resolve() on success and reject() on failure
            });
        })
    );
}
You should first split your array into chunks of 50. Then you need to make the requests chunk by chunk, not all at once. Promises can be used for this purpose.
Consider this implementation:
function parseData() { } // returns an array of 1000 items
async function uploadData(dataArray) {
try {
const chunks = chunkArray(dataArray, 50);
for(const chunk of chunks) {
await uploadDataChunk(chunk);
}
} catch(error) {
// Catch an error here
}
}
function uploadDataChunk(chunk) {
// return a promise of chunk uploading result
}
const dataArray = parseData();
uploadData(dataArray);
async/await uses promises under the hood, so await waits until the current chunk is uploaded and only then uploads the next one (if no error occurred).
And here is my proposed chunkArray implementation:
function chunkArray(array, chunkSize) {
return Array.from(
{ length: Math.ceil(array.length / chunkSize) },
(_, index) => array.slice(index * chunkSize, (index + 1) * chunkSize)
);
}
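For example:
const chunks = chunkArray([1, 2, 3, 4, 5], 2);
console.log(chunks); // [[1, 2], [3, 4], [5]]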
Note: this code uses ES6 features, so it is desirable to use Babel / TypeScript.
Update
If you create multiple asynchronous database connections, just use some database pooling tool.
Update 2
If you want to upload all the items within a chunk asynchronously, and start uploading the next chunk once the current one is done, you can do it this way:
function uploadDataChunk(chunk) {
return Promise.all(
chunk.map(uploadItemToGoogleCloud) // uploadItemToGoogleCloud should return a promise
);
}
You may chunk your array into the required chunk size as follows:
function chunkArray(a, s){ // a: array to chunk, s: size of chunks
    return Array.from({length: Math.ceil(a.length / s)})
        .map((_, i) => a.slice(i * s, i * s + s)); // slice keeps the last chunk short instead of padding it with undefined
}

var arr = Array(53).fill().map((_,i) => i); // test array of 53 items
console.log(chunkArray(arr, 5)); // chunks of 5 items
var arr = Array(53).fill().map((_,i) => i); // test array of 53 items
console.log(chunkArray(arr,5)) // chunks of 5 items.
There's a library for this that used to be very popular: async.js (not to be confused with the async keyword). I still think it's sometimes the cleaner approach, though these days, with async/await, I tend to do it manually in a for loop.
The async library implements many asynchronous flow-control design patterns. For this case you can use eachLimit:
const eachLimit = require('async/eachLimit');

eachLimit(result, 50,
    function (item, callback) {
        // do your forEach stuff here, then signal completion
        callback();
    },
    function (err) {
        // this will be called when everything is completed
    }
);
Or if you prefer you can use the promisified version so that you can await the loop:
const eachLimit = require('async/eachLimit');

async function processResult (result) {
    // ...
    try {
        // omitting the final callback makes eachLimit return a promise,
        // and an async iteratee needs no completion callback
        await eachLimit(result, 50, async function (item) {
            // do your forEach stuff here
        });
    }
    catch (err) {
        // handle thrown errors
    }
}
In this specific case it's quite easy to manually batch the operations and use await to pause between batches, as sketched below, but the async.js library includes a rich set of functions that are useful to know. Some of them are still quite difficult to do even with async/await, like whilst (an asynchronous while), retry, forever, etc. (see the documentation: https://caolan.github.io/async/v3/docs.html)
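A minimal sketch of the manual version (uploadItem is a hypothetical stand-in for your per-item upload, assumed to return a promise):
async function processInBatches(items, batchSize) {
    for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        // run the whole batch concurrently, then pause until it finishes
        await Promise.all(batch.map(item => uploadItem(item)));
    }
}

processInBatches(result, 50).catch(err => console.error(err));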