I want to constantly sync data from a website; however, I only have 300 calls per 15 minutes. I thought I could put all my sync requests (around 1000) into an array and then resolve just 300 of them every 15 minutes until the requests array is empty, and then start again. However, when I do the following:
let requests = []
const params = 'invoice-items?invoice_id='
let positions = null
for (const invoice of invoices) {
  requests.push(new Promise(async (resolve, reject) => {
    positions = await getBillomatData(params + invoice.id, null, 0, null)
    await updateDatabase(positions, models.billomat.Position)
  }))
}
console.log(requests[0])
await requests[0]
console.log(requests[0])
As soon as I wait for the request at requests[0] it executes all of the requests and I go over the limit of calls.
All of the requests start at once because the executor you pass to new Promise() runs immediately, as soon as each promise is constructed. await only waits for the result; it does not defer execution.
You'll need to use a tool such as bottleneck to rate limit your requests.
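A minimal sketch of what that could look like with bottleneck's reservoir options, reusing the getBillomatData/updateDatabase calls from the question (the exact option values are an assumption, not from the original post):

const Bottleneck = require('bottleneck');

// Refill a reservoir of 300 calls every 15 minutes.
const limiter = new Bottleneck({
  reservoir: 300,                           // calls allowed right now
  reservoirRefreshAmount: 300,              // how many calls to restore
  reservoirRefreshInterval: 15 * 60 * 1000, // every 15 minutes
});

const params = 'invoice-items?invoice_id=';

// Each request is scheduled through the limiter instead of being fired directly.
const jobs = invoices.map(invoice =>
  limiter.schedule(async () => {
    const positions = await getBillomatData(params + invoice.id, null, 0, null);
    await updateDatabase(positions, models.billomat.Position);
  })
);

await Promise.all(jobs);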
Processing 1000 requests at 300 every 15 minutes will take about an hour to complete. That is a long time to keep the Node job running while it mostly does nothing.
A basic limiter might keep track of all the requests you want to make in an external file or database and then use a cron job to run your JavaScript every 15 minutes to process another 300 requests. If there are no requests left, your app can simply terminate; it will wake up and run again every 15 minutes for as long as the cron job keeps running.
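A rough sketch of such a cron-driven script, assuming the pending work is stored as a JSON array of invoice IDs in a file named pending.json (the file name and batch size are illustrative; getBillomatData and updateDatabase are carried over from the question):

const fs = require('fs/promises');

const BATCH_SIZE = 300;
const QUEUE_FILE = 'pending.json';

async function main() {
  // Load whatever is still waiting to be synced.
  const pending = JSON.parse(await fs.readFile(QUEUE_FILE, 'utf8'));
  if (pending.length === 0) return; // nothing to do, exit until the next cron run

  const batch = pending.slice(0, BATCH_SIZE);
  const rest = pending.slice(BATCH_SIZE);

  // Process this run's 300 requests sequentially.
  for (const invoiceId of batch) {
    const positions = await getBillomatData('invoice-items?invoice_id=' + invoiceId, null, 0, null);
    await updateDatabase(positions, models.billomat.Position);
  }

  // Persist what is left for the next cron invocation.
  await fs.writeFile(QUEUE_FILE, JSON.stringify(rest));
}

main().then(() => process.exit(0));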
The simplest approach (not optimised, though) would be to:
make batches of 300 calls
execute one batch and wait for all of its promises to resolve before proceeding to the next batch
let batch = []
// imagine urls is an array of ~900 URLs
for (const url of urls) {
  batch.push(somePromiseFunctionToDoTheApiCall(url))
  if (batch.length >= 300) {
    await Promise.all(batch)
    // sleep is a promisified setTimeout, ref: https://stackoverflow.com/a/56520579/3359432
    await sleep(calculateTimeToWaitBeforeProceedingToNextBatch)
    batch = []
  }
}
// there might be some leftovers at the end of the batch; process them as well
if (batch.length > 0) {
  await Promise.all(batch)
}
If you are OK with using libraries instead of reinventing the wheel, have a look at lodash.chunk, bluebird.map, bluebird.each, and bluebird.delay.
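A sketch of the same batching with those libraries; the 15-minute delay and the somePromiseFunctionToDoTheApiCall name are carried over from the example above as assumptions:

const _ = require('lodash');
const Bluebird = require('bluebird');

// Split the URLs into batches of 300 and process one batch at a time.
const batches = _.chunk(urls, 300);

await Bluebird.each(batches, async batch => {
  await Promise.all(batch.map(url => somePromiseFunctionToDoTheApiCall(url)));
  await Bluebird.delay(15 * 60 * 1000); // wait out the rate-limit window
});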
Related
I am making BDD tests with a Cucumber-Playwright suite. A page I am writing a test for has buttons that trigger a PUT API request and update the page. (Note: the button does not navigate to a new address; it just triggers an API request.)
I want to make sure that all network events have finished before moving on to the next step as it may try to act too soon before the API request has returned and cause the test to fail.
I want to avoid hard waits, so I read the documentation and found a step structure that uses Promise.all([]) to combine two or more steps. From what I understand, it waits for every step in the array to resolve before moving on.
So the steps look like this:
await Promise.all([inviteUserButton.click(), page.waitForLoadState('networkidle')])
await page.goto('https://example/examplepage')
This stage of the test is flaky, however; it works about 2 out of 3 times. From the trace files I read to debug the test, I can see that the network request fails with net::ERR_ABORTED POST https://.....
I believe this is because the page.goto() step interrupts the network request/response. This causes the following assertion to fail, as the request was never completed.
Is there a way to test that all of the page's network events triggered by an onClick event (or similar) have finished before moving on to the next step?
How about this:
The correct practice is to start waiting for the response before clicking the button, to avoid any race condition; that way it will work correctly every time, as long as the API(s) come back with an OK response.
In this approach, you may add any number of API calls, separated by commas, where responses from multiple APIs are expected.
Note: you may need to change the expected response status code from 200 to 204, depending on how the API is implemented.
const [resp] = await Promise.all([
  this.page.waitForResponse(resp => resp.url().includes('/api/odata/any unique sub string of API endpoint') && resp.status() === 200),
  // API2,
  // API3,
  // APIN,
  this.page.click('inviteUSerButtonLocator'),
]);
const body = await resp.json(); // next step of your scenario
If you don't wrap the steps in Promise.all, is it still flaky?
await inviteUserButton.click();
await page.waitForLoadState('networkidle');
await page.goto('https://example/examplepage');
How long do the network requests last? The default timeout is 30 seconds; did you try increasing it?
waitForLoadState(state?: "load"|"domcontentloaded"|"networkidle", options?: {
/**
* Maximum operation time in milliseconds, defaults to 30 seconds, pass `0` to disable timeout. The default value can be
* changed by using the
* [browserContext.setDefaultNavigationTimeout(timeout)](https://playwright.dev/docs/api/class-browsercontext#browser-context-set-default-navigation-timeout),
* [browserContext.setDefaultTimeout(timeout)](https://playwright.dev/docs/api/class-browsercontext#browser-context-set-default-timeout),
* [page.setDefaultNavigationTimeout(timeout)](https://playwright.dev/docs/api/class-page#page-set-default-navigation-timeout)
* or [page.setDefaultTimeout(timeout)](https://playwright.dev/docs/api/class-page#page-set-default-timeout) methods.
*/
timeout?: number;
}): Promise<void>;
You can try waiting for more milliseconds:
await Promise.all([inviteUserButton.click(), page.waitForLoadState('networkidle', { timeout:60_000 })])
await page.goto('https://example/examplepage')
So I've got a React app that creates a video. This is a very long API request, taking between 1 and 10 minutes to resolve. I have a separate API call which I need to run continually every couple of seconds to check the status, until the first promise is resolved (and the video is compiled).
const promise1 = axios.post("/api/create", data);
// promise1 takes between 1 and 10 minutes to resolve (video creation).
const promise2 = axios.get(`/progress-${uniqueId}.txt`);
// I need promise2 (which checks the status of promise1) to continually run
// at an interval (every 5 seconds?) until promise1 resolves.
Promise.race([promise1, promise2]).then(res => {
  // this obviously returns promise2 first, as expected, but
  // once it comes back I need it to refire after 5 seconds,
  // continually, until promise1 resolves
  console.log(res);
});
Any idea how I can repeatedly call promise2 until promise1 resolves?
A promise, by definition, settles with a value at most once, at a later point in time. You can't re-run a promise; the best you can do is create a new one using some factory pattern.
Alongside that, you need a mechanism to check if your create promise has been fulfilled.
// Send the create request
const creationPromise = axios.post("/api/create", data);

// Track the creationPromise state
let isCreated = false;
creationPromise.then(() => (isCreated = true));

// Factory for creating a new progress request on demand
const progressFactory = () => axios.get(`/progress-${uniqueId}.txt`);

// While the create request hasn't completed, keep polling
while (!isCreated) {
  // Send a new progress request
  const progress = await progressFactory();
  console.log("progress", progress);
  // Wait 5 seconds before polling again
  await new Promise(resolve => setTimeout(resolve, 5000));
}

// Done, create returned
console.log("Finished!");
Well, I have another approach. How about, instead of keeping the client hanging for up to ten minutes, you send whatever is needed to the backend and it immediately responds with a 202 status? HTTP 202 Accepted indicates that the request has been accepted for processing, but the processing has not been completed; in fact, processing may not have started yet, and the request might or might not eventually be acted upon. You don't have to send the response only at the end; you can send it at any time, and by doing so you release the client while the server keeps processing.
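A minimal sketch of that pattern, assuming an Express backend; the route names, the in-memory job store, and the createVideo helper are illustrative, not from the question:

const express = require('express');
const app = express();
app.use(express.json());

const jobs = {}; // in-memory job store, purely for illustration

app.post('/api/create', (req, res) => {
  const jobId = Date.now().toString();
  jobs[jobId] = { status: 'processing' };

  // Kick off the long-running work without awaiting it.
  createVideo(req.body)
    .then(() => (jobs[jobId].status = 'done'))
    .catch(() => (jobs[jobId].status = 'failed'));

  // Release the client immediately.
  res.status(202).json({ jobId });
});

// The client polls this endpoint until the job is done.
app.get('/api/progress/:jobId', (req, res) => {
  res.json(jobs[req.params.jobId] || { status: 'unknown' });
});

app.listen(3000);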
I am reading data in realtime from a device through http requests, however the way I am currently doing it is, something like this:
setInterval(async () => {
  for (const request of requests) {
    await request.GetData();
  }
}, 1000);
However, sometimes there is lag in the network, and since there are 4-5 requests, they don't always finish within a second. They start stacking up until the device eventually starts to time out, so I need to somehow get rid of the setInterval. Increasing the interval is not an option.
Essentially, I want to run them in an infinite loop, with an inner timer that lets the requests run again once half a second or a second has passed since the last run. But how do I put them in an infinite loop without blocking the rest of the application?
Or maybe a way to make setInterval wait for all requests to finish before starting to count the 1 second interval?
Try:
(async () => {
  while (true) {
    for (const request of requests) {
      await request.GetData();
    }
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
})();
This will only start waiting once all the requests are finished, preventing them from stacking up.
Alternatively, inside the async function, use:
while (true) {
  await Promise.all(requests.map(request => request.GetData()));
  await new Promise((resolve) => setTimeout(resolve, 1000));
}
This is different because all the calls to request.GetData() will run concurrently, which may or may not be what you want.
Disclaimer: I'm not experienced with programming or with networks in general so I might be missing something quite obvious.
So I'm making a function in Node.js that should go over an array of image links from my database and check if they're still working. There are thousands of links to check, so I can't just fire off several thousand fetch calls at once and wait for results; instead I'm staggering the requests, going 10 by 10, and doing HEAD requests to minimize bandwidth usage.
I have two issues.
The first one is that after fetching the first 10-20 links quickly, the other requests take quite a bit longer and 9 or 10 out of 10 of them will time out. This might be due to some sort of network mechanism that throttles my requests when there are many being fired at once, but I'm thinking it's likely due to my second issue.
The second issue is that the checking process slows down after a few iterations. Here's an outline of what I'm doing. I take the string array of image links and slice it 10 by 10, then I check those 10 posts in 10 promises (ignore the i and j variables; they're just there to track the individual promises and timeouts for logging/debugging):
const partialResult = await Promise.all(postsToCheck.map(async (post, j) => await this.checkPostForBrokenLink(post, i + j)));
Within checkPostForBrokenLink I have a race between the fetch and a 10-second timeout, because I don't want to wait for the connection to time out on its own. Every time timing out is a problem, I give it 10 seconds, then flag it as having timed out and move on.
const timeoutPromise = index => {
  let timeoutRef;
  const promise = new Promise<null>((resolve, reject) => {
    const start = new Date().getTime();
    console.log('===TIMEOUT INIT===' + index);
    timeoutRef = setTimeout(() => {
      const end = new Date().getTime();
      console.log('===TIMEOUT FIRE===' + index, end - start);
      resolve(null);
    }, 10 * 1000);
  });
  return { timeoutRef, promise, index };
};
const fetchAndCancelTimeout = timeout => {
  return fetch(post.fileUrl, { method: 'HEAD' })
    .then(result => {
      return result;
    })
    .finally(() => {
      console.log('===CLEAR===' + index); // index is from the parent function
      clearTimeout(timeout);
    });
};
const timeout = timeoutPromise(index);
const videoTest = await Promise.race([fetchAndCancelTimeout(timeout.timeoutRef), timeout.promise]);
If fetchAndCancelTimeout finishes before timeout.promise does, it cancels that timeout, but if the timeout finishes first, the fetch promise is still "resolving" in the background even though the code has moved on. I'm guessing this is why my code is slowing down. The later timeouts take 20-30 seconds from being set up to firing, despite being set to 10 seconds. As far as I know, this has to be because the main process is busy and doesn't have time to work through the event queue, though I don't really know what it could be doing except waiting for the promises to resolve.
So the question is, first off, am I doing something stupid here that I shouldn't be doing and that's causing everything to be slow? Secondly, if not, can I somehow manually stop the execution of the fetch promise if the timeout fires first, so as not to waste resources on a pointless process? Lastly, is there a better way to check whether a large number of links are valid than what I'm doing here?
I found the problem, and it wasn't, at least not directly, related to promise buildup. The code shown was for checking video links, but for images the fetch call was done by a plugin, and that plugin was causing the slowdown. When I started using the same code for both videos and images, the process suddenly became orders of magnitude quicker. I didn't think to check the plugin at first because it was only supposed to do a HEAD request and format the results, which shouldn't have been an issue.
For anyone looking at this trying to find a way to cancel a fetch, #some provided an idea that seems like it might work. Check out https://www.npmjs.com/package/node-fetch#request-cancellation-with-abortsignal
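A rough sketch of that cancellation approach with node-fetch, using an illustrative 10-second cutoff like the one in the question and assuming a Node version with a global AbortController (or the abort-controller polyfill):

const fetch = require('node-fetch');

async function headWithTimeout(url, ms = 10_000) {
  const controller = new AbortController();
  // Abort the request if it hasn't finished within `ms` milliseconds.
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fetch(url, { method: 'HEAD', signal: controller.signal });
  } catch (err) {
    if (err.name === 'AbortError') return null; // treat as timed out
    throw err;
  } finally {
    clearTimeout(timer);
  }
}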
Something you might want to investigate here is the Bluebird Promise library.
There are two functions in particular that I believe could simplify your implementation regarding rate limiting your requests and handling timeouts.
Bluebird's Promise.map has a concurrency option (link), which allows you to set the number of concurrent requests, and Bluebird promises also have a .timeout() method (link) which rejects the promise if a certain timeout has elapsed.
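A sketch of how those two pieces could fit the link-checking case above; the 10-request concurrency and 10-second timeout are carried over from the question, and the per-post handling is reduced to a simple HEAD fetch:

const Bluebird = require('bluebird');
const fetch = require('node-fetch');

const results = await Bluebird.map(
  posts,
  post =>
    Bluebird.resolve(fetch(post.fileUrl, { method: 'HEAD' }))
      .timeout(10 * 1000)                  // reject if the request takes longer than 10 s
      .then(res => ({ post, ok: res.ok }))
      .catch(() => ({ post, ok: false })), // timeouts and network errors count as broken
  { concurrency: 10 }                      // at most 10 requests in flight at once
);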
With node.js I want to http.get a number of remote urls in a way that only 10 (or n) runs at a time.
I also want to retry a request if an exception occurs locally (up to m times), but when the status code indicates an error (5XX, 4XX, etc.) the request counts as valid.
This is really hard for me to wrap my head around.
Problems:
I cannot try/catch http.get because it is async.
Need a way to retry a request on failure.
I need some kind of semaphore that keeps track of the currently active request count.
When all requests finished I want to get the list of all request urls and response status codes in a list which I want to sort/group/manipulate, so I need to wait for all requests to finish.
It seems like promises are recommended for every async problem, but I end up nesting too many of them and it quickly becomes indecipherable.
There are lots of ways to approach the 10 requests running at a time.
Async Library - Use the async library with the .parallelLimit() method where you can specify the number of requests you want running at one time.
Bluebird Promise Library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise and then use Promise.map() with a concurrency option set to 10.
Manually coded - Code your requests manually to start up 10 and then, each time one completes, start another one (a rough sketch of this is shown below).
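A hedged sketch of the manually coded pool, assuming a promise-returning fetchUrl helper (not part of the question) and ignoring retries for brevity:

// Run the given URLs with at most `limit` requests in flight at once.
async function runWithLimit(urls, limit, fetchUrl) {
  const results = new Array(urls.length);
  let next = 0;

  // Each worker repeatedly pulls the next unstarted URL until none remain.
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      results[index] = await fetchUrl(urls[index]).catch(err => ({ error: err }));
    }
  }

  await Promise.all(Array.from({ length: limit }, () => worker()));
  return results; // results stay in the same order as the input URLs
}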
In all cases, you will have to write some retry code yourself, and as with all retry code, you will have to decide very carefully which types of errors you retry, how soon you retry them, how much you back off between retry attempts, and when you eventually give up (all things you have not specified).
Other related answers:
How to make millions of parallel http requests from nodejs app?
Million requests, 10 at a time - manually coded example
My preferred method is with Bluebird and promises. Including retry and result collection in order, that could look something like this:
const request = require('request');
const Promise = require('bluebird');
const get = Promise.promisify(request.get);

let remoteUrls = [...]; // large array of URLs

const maxRetryCnt = 3;
const retryDelay = 500;

Promise.map(remoteUrls, function(url) {
    let retryCnt = 0;
    function run() {
        return get(url).then(function(result) {
            // do whatever you want with the result here
            return result;
        }).catch(function(err) {
            // decide what your retry strategy is here
            // catch all errors here so other URLs continue to execute
            // isRetryableError() is a placeholder for your own check of which errors to retry
            if (isRetryableError(err) && retryCnt < maxRetryCnt) {
                ++retryCnt;
                // try again after a short delay
                // chain onto the previous promise so Promise.map() is still
                // respecting our concurrency value
                return Promise.delay(retryDelay).then(run);
            }
            // make the value null if no retries succeeded
            return null;
        });
    }
    return run();
}, {concurrency: 10}).then(function(allResults) {
    // everything is done here and allResults contains the results, with null for failed URLs
});
The simple way is to use the async library; it has a .parallelLimit method that does exactly what you need.
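A minimal sketch with async.parallelLimit, assuming Node's built-in http module and HTTP-only URLs; retries are left out, and errors are passed back as results so the other tasks keep running:

const async = require('async');
const http = require('http');

// One task per URL; each task records the URL and the response status code.
const tasks = remoteUrls.map(url => callback => {
  http.get(url, res => {
    res.resume(); // drain the response so the socket is freed
    callback(null, { url, statusCode: res.statusCode });
  }).on('error', err => callback(null, { url, error: err.message }));
});

// Run at most 10 tasks at a time, then collect all results in order.
async.parallelLimit(tasks, 10, (err, results) => {
  if (err) return console.error(err);
  console.log(results); // e.g. [{ url, statusCode }, ...]
});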