I am using Node and axios (with TS, but that's not too important) to query an API. I have a suite of scripts that make calls to different endpoints and log the data (sometimes filtering it). These scripts are used for debugging. I am trying to make them "better" by adding a delay between requests so that I don't "blow up" the API, especially when I'm passing a large array of IDs. So basically I want each GET request to pause for a certain amount of time before the next one fires.
I have played with setTimeout(), but everywhere I have inserted it, it only adds the delay after the requests have already executed. I understand why I am getting this result; I just had to try everything I could to at least improve my understanding of how things work.
I have thought about setting up a queue or using interceptors, but I think I might be straying far from a simpler solution with those ideas.
Additionally, I have another "base script" that I wrote on the fly (sort of the birth point for this batch of scripts) that I built with a for loop instead of map() and Promise.all. I have played with setting the delay in that script as well, but I didn't get anywhere helpful.
var axios = require('axios');
var fs = require('fs');
const Ids = [arrayOfIds];
try {
  // Promise.all takes an array of promises
  Promise.all(Ids.map(id => {
    // Return each request as its own promise
    return axios.get(URL + 'endPoint/' + id, config);
  }))
    .then((vals) => {
      // vals is the array of responses from the resolved Promise.all
      fs.appendFileSync(`${__dirname}/*responseOutput.txt`,
        vals.map((v) => `${JSON.stringify(v.data)} \n \r`).toString());
    })
    .catch((e) => console.log(e));
} catch (err) {
  console.log(err);
}
There are no errors with the above code; I just can't figure out how to add the delay correctly.
You could try Promise.map from bluebird; it has a concurrency option:
var axios = require('axios');
var fs = require('fs');
var Promise = require('bluebird');

const Ids = [arrayOfIds];
let concurrency = 3; // at most 3 HTTP requests will run concurrently

Promise.map(Ids, id => {
  console.log('starting request', id);
  return axios.get(URL + 'endPoint/' + id, config);
}, { concurrency })
  .then(vals => {
    console.log({ vals });
  })
  .catch(err => {
    // a surrounding try/catch cannot catch async rejections, so handle them here
    console.log(err);
  });
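If you would rather not add a dependency, you can also run the requests strictly one at a time with a plain async loop and a small sleep helper. This is a minimal sketch, not from the original code: the sleep helper and the 500 ms delay are illustrative, and URL, config, and arrayOfIds are assumed to be defined as in your script.

var axios = require('axios');

const Ids = [arrayOfIds];

// Resolves after ms milliseconds; awaiting it pauses the loop.
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

async function runSequentially() {
  for (const id of Ids) {
    const res = await axios.get(URL + 'endPoint/' + id, config);
    console.log(JSON.stringify(res.data));
    await sleep(500); // pause before the next request
  }
}

runSequentially().catch(err => console.log(err));

Because each await finishes before the loop continues, the delay lands between requests rather than after all of them have fired.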
For a project, I have to generate a PDF that lists the vehicles in a DB. To do this, I use the JSPDF library.
1st step: I get data from a DB (via asynchronous requests to an API) and images from a server, which I store in an array.
2nd step: I call a function that generates the PDF with JSPDF.
The problem is that I need all my data to be retrieved before calling my generatePDF function; otherwise the fields and images are empty, because they have not yet arrived from the DB or the server.
A solution I found is to use the setTimeout function to put a delay between each call. However, this makes the code very slow and inflexible, because the timeout has to be adjusted manually depending on the amount of data and the number of images to retrieve. Moreover, it is impossible to determine exactly how long retrieval will take, especially since it varies with the state of the network, so you have to allow a margin that is often unnecessary.
Another solution is to use callbacks or to chain fetch/ajax calls with .then/.done, but this becomes very complicated when it comes to retrieving the images, since they are fetched one by one and there are more than a hundred of them.
What would be the easiest way to do this cleanly and flexibly?
Thanks for your help and sorry for the long text, I tried to be as clear as possible :)
To do a series of asynchronous things in order, you start the next operation in the fulfillment handler of the previous operation.
An async function is the easiest way:
async function buildPDF() {
  const response = await fetch("/path/to/the/data");
  if (!response.ok) {
    throw new Error(`HTTP error ${response.status}`);
  }
  const data = await response.json(); // Or `.text()` or whatever
  const pdf = await createPDF(data); // Assuming it returns a promise
  // ...
}
If you can't use async functions in your environment and don't want to transpile, you can write your fulfillment handlers as callbacks:
function buildPDF() {
  return fetch("/path/to/the/data")
    .then(response => {
      if (!response.ok) {
        throw new Error(`HTTP error ${response.status}`);
      }
      return response.json(); // Or `.text()` or whatever
    })
    .then(data => createPDF(data))
    .then(pdf => {
      // ...
    });
}
Note that I'm returning the result of the chain, so that the caller can handle errors.
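For the hundred-plus images you mentioned, you don't have to fetch them one by one: start all the downloads at once and let Promise.all wait for every one of them before the PDF is generated. A minimal sketch, assuming a hypothetical imageUrls array and the createPDF function from above:

async function buildPDF() {
  // Kick off every download at once; Promise.all resolves only when all of them have.
  const imageBlobs = await Promise.all(imageUrls.map(url =>
    fetch(url).then(response => {
      if (!response.ok) {
        throw new Error(`HTTP error ${response.status} for ${url}`);
      }
      return response.blob();
    })
  ));
  // Only now is every image available, so no field in the PDF will be empty.
  const pdf = await createPDF(imageBlobs);
  // ...
}

This keeps the downloads parallel (fast) while still giving you a single point in the code where everything is guaranteed to have arrived.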
I have a large number of images that I want to download. I'm using the request-promise package to download the images from their URLs. Since there was a large number of images, the server was getting overloaded and the downloads would hang or get corrupted, so I decided to use the promise-limit library to set a concurrency limit. Here is my code:
const fs = require('fs');
const request = require('request-promise');
const promiseLimit = require('promise-limit');

var limit = promiseLimit(30);

async function download(data) {
  console.log(data);
  return Promise.all(data.map((obj) => {
    return limit(() => downloadRequest(obj));
  })).then(() => {
    console.log("All images downloaded.");
  });
}

function downloadRequest(obj) {
  var img = obj.dest;
  var url = obj.url;
  var req = request(url);
  req.pipe(fs.createWriteStream(img));
}
I replicated the sample exactly as given on the GitHub page, but the method returns without the Promise.all() ever being fulfilled. I don't understand what I am doing wrong; I thought Promise.all() would wait until all of the promises resolve.
In the backend, this call is used as
...
.then((data) => {
  return downloader.download(data);
})
.then(() => {
  var filename = "Sample.pdf";
  // Pages.json is written earlier in the chain; it contains URLs and the corresponding image paths
  return generator.generate('./Pages.json', filename);
});
Does this mean NodeJS is already trying to generate the file from Pages.json? How can I make the generation step wait until the downloads have finished?
Your function downloadRequest() does not return a promise. It must return a promise that is tied to the asynchronous operation it contains, such that the promise is resolved when that asynchronous operation completes and rejected when it errors. Only when it does that can the limit() package properly do its job.
Since you're using a stream and piping it in downloadRequest(), you will have to manually construct a promise and then monitor the various events in the stream to know when it's done or has an error so you can resolve or reject that promise.
Here's an idea how to make downloadRequest() properly return a promise:
function downloadRequest(obj) {
  return new Promise((resolve, reject) => {
    const img = obj.dest;
    const url = obj.url;
    const req = request(url);
    req.on('error', reject);
    const ws = fs.createWriteStream(img);
    ws.on('error', reject);
    req.pipe(ws).on('finish', resolve);
  });
}
And it is now recommended to use the pipeline() function instead of .pipe(), because it does more complete cleanup in error conditions, and there is a built-in promise version of it as well:
const { pipeline } = require('stream/promises');

function downloadRequest(obj) {
  return pipeline(request(obj.url), fs.createWriteStream(obj.dest));
}
P.S. In case you didn't know, the request() library has been deprecated, and it is recommended that you not use it in new projects any more. There is a list of alternative libraries to choose from here, all of which also have built-in promise support. I've looked at the various choices, tried several, and decided to use got() in my work.
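For example, here is a minimal sketch of the same helper rewritten with got's stream API (assuming a CommonJS-compatible got release such as v11; the shape mirrors the pipeline() version above):

const fs = require('fs');
const got = require('got');
const { pipeline } = require('stream/promises');

function downloadRequest(obj) {
  // got.stream() returns a readable stream of the response body;
  // pipeline() resolves when the file is fully written and rejects on any stream error.
  return pipeline(got.stream(obj.url), fs.createWriteStream(obj.dest));
}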
I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously; they all fail, whereas only some fail when I call execute directly.
I've taken the code straight from the examples and replaced execute with queue, which I expected to work, except the code doesn't wait for the result. Is there a way to achieve this anyway?
So this works:
const screen = await cluster.execute(req.query.url);
But this breaks:
const screen = await cluster.queue(req.query.url);
Here's the full example with queue:
const express = require('express');
const app = express();
const { Cluster } = require('puppeteer-cluster');

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 2,
  });

  await cluster.task(async ({ page, data: url }) => {
    // make a screenshot
    await page.goto('http://' + url);
    const screen = await page.screenshot();
    return screen;
  });

  // setup server
  app.get('/', async function (req, res) {
    if (!req.query.url) {
      return res.end('Please specify url like this: ?url=example.com');
    }
    try {
      const screen = await cluster.queue(req.query.url);
      // respond with image
      res.writeHead(200, {
        'Content-Type': 'image/jpg',
        'Content-Length': screen.length // variable is undefined here
      });
      res.end(screen);
    } catch (err) {
      // catch error
      res.end('Error: ' + err.message);
    }
  });

  app.listen(3000, function () {
    console.log('Screenshot server listening on port 3000.');
  });
})();
What am I doing wrong here? I'd really like to use queuing because without it every incoming request appears to slow down all the other ones.
Author of puppeteer-cluster here.
Quote from the docs:
cluster.queue(..): [...] Be aware that this function only returns a Promise for backward compatibility reasons. This function does not run asynchronously and will immediately return.
cluster.execute(...): [...] Works like Cluster.queue, just that this function returns a Promise which will be resolved after the task is executed. In case an error happens during the execution, this function will reject the Promise with the thrown error. There will be no "taskerror" event fired.
When to use which function:
Use cluster.queue if you want to queue a large number of jobs (e.g. a list of URLs). The task function needs to take care of storing the results itself, by printing them to the console or storing them in a database.
Use cluster.execute if your task function returns a result. This will still queue the job, so it is like calling queue in addition to waiting for the job to finish. In this scenario, there is most often an "idling cluster" present which is used when a request hits the server (as in your example code).
So, you definitely want to use cluster.execute, as you want to wait for the results of the task function. The reason you do not see any errors is (as quoted above) that errors from the cluster.queue function are emitted via a taskerror event, whereas cluster.execute errors are thrown directly (the Promise is rejected). Most likely your jobs fail in both cases, but the failure is only visible with cluster.execute.
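If you do keep cluster.queue somewhere for fire-and-forget jobs, you can still observe failures through the taskerror event. A minimal sketch of the listener:

// Fires whenever a job queued with cluster.queue throws.
cluster.on('taskerror', (err, data) => {
  console.log(`Error processing ${data}: ${err.message}`);
});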
I'm working with Cloud Functions for Firebase, and I get a timeout with some of my functions. I'm pretty new to JavaScript. It looks like I need to put a for loop inside a promise, and I'm running into problems: the code exits the loop too early, and I think it takes a long time to finish. Do you have some way to improve this code and make it faster?
exports.firebaseFunctions = functions.database.ref("mess/{pushId}").onUpdate(event => {
  // First, get the event value and the object stored in Firebase
  const original = event.data.val();
  const users = original.uids; // all of the users' uids

  // Put all of the users' uids into an array
  let usersUids = [];
  for (let key in users) {
    usersUids.push(users[key]);
  }

  // Now make a promise that uses all of these uids to fetch the device tokens
  // and saves them inside another child in Firebase
  return new Promise((resolve) => {
    let userTokens = [];
    usersUids.forEach(element => {
      admin.database().ref('users/' + element).child('token').once('value', snapShot => {
        if (snapShot.val()) { // if the token exists, put it in the array
          userTokens.push(snapShot.val());
        }
      });
    });
    resolve({
      userTokens
    });
  })
  // then take the userTokens and save them in another child in the Firebase database
  .then((res) => {
    return admin.database().ref("USERS/TOKENS").push({
      userTokens: res,
    });
  });
});
You are making network requests with Firebase, one per user, so if you have 100 ids there it may well take a while; that could be part of why it's slow.
But there's another problem that I notice: you are resolving to an effectively empty list. To wait for several promises, create an array of promises, and then use Promise.all to create a promise that waits for all of them in parallel.
When you call resolve, the forEach has already run and every request has started, but none of the results have been added to the list yet. To fix it, change the forEach to a map, collect all the returned promises, and then return Promise.all.
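A minimal sketch of that fix, assuming once('value') is called without a callback so that it returns a promise (which the firebase-admin SDK supports):

// Map each uid to a promise for its token, then wait for all of them.
const tokenPromises = usersUids.map(uid =>
  admin.database().ref('users/' + uid).child('token').once('value')
    .then(snapShot => snapShot.val())
);

return Promise.all(tokenPromises)
  .then(tokens => tokens.filter(Boolean)) // drop users without a token
  .then(userTokens => admin.database().ref("USERS/TOKENS").push({ userTokens }));

Because the push only runs inside the final .then, it is guaranteed to see the full token list rather than an empty one.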
I need to perform lots of findOneAndUpdate() operations using Mongoose, as there is no way to perform this atomic operation in bulk. Therefore I create a promise array in a for loop, which I resolve afterwards. Unfortunately this takes ~2-3 seconds, and during that time my Express application can't process any new requests.
The code:
const promiseArray = []
for (let i = 0; i < 1500; i++) {
  const p = PlayerProfile.findOneAndUpdate(filter, updateDoc)
  promiseArray.push(p)
}
return Promise.all(promiseArray).then((values) => {
  // Process the values
})
Question:
How can I avoid that my Express application becomes completely unresponsive to new requests while it's resolving this promise?
More context information:
I am trying to update and return many documents with an atomic operation, hence the big for loop. It basically selects a document and sets up a lock on it.
Try using update with the multi option:
PlayerProfile.update(filter, updateDoc, { multi: true }, function(err, result) {
  // Do something
})
The signature is:
Model.update(conditions, update, options, callback)
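Note that in newer Mongoose versions, Model.update() is deprecated in favor of updateMany(), which always updates every matching document and returns a promise. A minimal sketch of the equivalent call:

PlayerProfile.updateMany(filter, updateDoc)
  .then(result => {
    // result describes how many documents matched and were modified
  })
  .catch(err => {
    // Handle the error
  });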