Handling 25,000 GET requests to an API - javascript

So basically I'm trying to make a ton of GET requests (about 25,000) to an API. I'm using axios as my library for making HTTP calls.
So I have:
dsHistPromises.push(axios.get(url))
And then I'm using:
axios.all(dsHistPromises)
  .then(function(results) {
    results.forEach(function(response) {
      if (format === lists.formats.highlow) {
        storage.darkskyHistoryHighLow.push(requests.parseDarkskyHighLow(response, city))
      }
      // parse data here and print it to files...
    })
  }).catch(err => {
    throw err
  })
to handle all of my promises.
When I try to run my code, I get errors like
(node:10400) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 2): Error: read ECONNRESET
I have to imagine this is an issue with the API I'm connecting to. The server probably is having a hard time managing my requests, correct?
Are there are any tricks to getting around this?

As you suspect, I'd say it was because of the API you're accessing. Most sane APIs will have some sort of throttle to prevent one user from bringing them down. You're probably running into that.
What you'll need to do is throttle your requests by putting a time delay between calls.
There are a couple of ways to handle this. One would be to do each call in order, one at a time. This will help avoid throttling from the server, but may be slower.
If the server will allow it, you can do multiple requests in parallel.
Basically, it'd look something like this:
const urls = getUrlsToCallFromSomewhere();
const results = [];
const delay = 500; // minimum ms between the start of one request and the next

const next = () => {
  if (!urls.length) {
    return;
  }
  const url = urls.pop();
  const start = Date.now(); // note when this request started
  axios.get(url).then(data => {
    results.push(data);
    const elapsed = Date.now() - start;
    if (elapsed > delay) {
      next();
    } else {
      setTimeout(next, delay - elapsed);
    }
  });
};

next(); // kick off the first request
This isn't perfect, but it should give you an idea. Basically, call them one at a time. When one is done, check how long it took. If you've already hit the delay amount, call the next one right away. If not, use setTimeout() to wait the remaining time.
You can arrange your code to get a few of these going at the same time to speed things up, and adjust the delay to be as long or as short as you need.
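For example, here's a minimal sketch of that, reusing urls, results, delay, and next() from the snippet above: instead of the single next() call at the end, kick off a handful of chains. The worker count here is just an assumption to tune against whatever the API tolerates.
// Start several independent request chains; each one throttles itself via next().
const workers = 5; // assumption: adjust to the API's rate limits
for (let i = 0; i < workers; i++) {
  next();
}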

Can I cancel the execution of a promise? Trying to check thousands of links and don't want to wait for requests to time out

Disclaimer: I'm not experienced with programming or with networks in general so I might be missing something quite obvious.
So I'm making a function in Node.js that should go over an array of image links from my database and check if they're still working. There are thousands of links to check, so I can't just fire off several thousand fetch calls at once and wait for the results; instead I'm staggering the requests, going 10 by 10, and doing HEAD requests to minimize the bandwidth usage.
I have two issues.
The first one is that after fetching the first 10-20 links quickly, the subsequent requests take quite a bit longer and 9 or 10 out of every 10 of them time out. This might be due to some network mechanism that throttles my requests when many are fired at once, but I think it's more likely due to my second issue.
The second issue is that the checking process slows down after a few iterations. Here's an outline of what I'm doing: I take the string array of image links and slice it 10 by 10, then I check those 10 posts in 10 promises (ignore the i and j variables; they're just there to track the individual promises and timeouts for logging/debugging):
const partialResult = await Promise.all(postsToCheck.map(async (post, j) => await this.checkPostForBrokenLink(post, i + j)));
Within checkPostForBrokenLink I have a race between the fetch and a 10-second timeout, because I don't want to wait for the connection to time out on its own every time. If a request takes longer than 10 seconds, I flag it as having timed out and move on.
const timeoutPromise = index => {
  let timeoutRef;
  const promise = new Promise<null>((resolve, reject) => {
    const start = new Date().getTime();
    console.log('===TIMEOUT INIT===' + index);
    timeoutRef = setTimeout(() => {
      const end = new Date().getTime();
      console.log('===TIMEOUT FIRE===' + index, end - start);
      resolve(null);
    }, 10 * 1000);
  });
  return { timeoutRef, promise, index };
};
const fetchAndCancelTimeout = timeout => {
  return fetch(post.fileUrl, { method: 'HEAD' })
    .then(result => {
      return result;
    })
    .finally(() => {
      console.log('===CLEAR===' + index); // index is from the parent function
      clearTimeout(timeout);
    });
};
const timeout = timeoutPromise(index);
const videoTest = await Promise.race([fetchAndCancelTimeout(timeout.timeoutRef), timeout.promise]);
If fetchAndCancelTimeout finishes before timeout.promise does, it will cancel that timeout, but if the timeout finishes first, the fetch promise is still "resolving" in the background, despite the code having moved on. I'm guessing this is why my code is slowing down. The later timeouts take 20-30 seconds from being set up to firing, despite being set to 10 seconds. As far as I know, this has to be because the main process is busy and doesn't have time to execute the event queue, though I don't really know what it could be doing except waiting for the promises to resolve.
So the question is, first off, am I doing something stupid here that I shouldn't be doing and that's causing everything to be slow? Secondly, if not, can I somehow manually stop the execution of the fetch promise if the timeout fires first, so as not to waste resources on a pointless process? Lastly, is there a better way to check whether a large number of links are valid than what I'm doing here?
I found the problem and it wasn't, at least not directly, related to promise buildup. The code shown was for checking video links but, for images, the fetch call was done by a plugin and that plugin was causing the slowdown. When I started using the same code for both videos and images, the process suddenly became orders of magnitude quicker. I didn't think to check the plugin at first because it was supposed to only do a head request and format the results which shouldn't be an issue.
For anyone looking at this trying to find a way to cancel a fetch, another user provided an idea that seems like it might work. Check out https://www.npmjs.com/package/node-fetch#request-cancellation-with-abortsignal
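For reference, here's a minimal sketch of that cancellation approach, following the node-fetch documentation linked above. The 10-second window and post.fileUrl come from the question; abort-controller is the polyfill the node-fetch docs suggest for older Node versions.
const fetch = require('node-fetch');
const AbortController = require('abort-controller');

const controller = new AbortController();
// Abort the request if it hasn't finished within 10 seconds.
const timeout = setTimeout(() => controller.abort(), 10 * 1000);

fetch(post.fileUrl, { method: 'HEAD', signal: controller.signal })
  .then(result => console.log(result.ok))
  .catch(err => {
    if (err.name === 'AbortError') {
      console.log('request was aborted');
    }
  })
  .finally(() => clearTimeout(timeout));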
Something you might want to investigate here is the Bluebird promise library.
There are two functions in particular that I believe could simplify your implementation with regard to rate limiting your requests and handling timeouts.
Bluebird's Promise.map has a concurrency option, which lets you set the number of concurrent requests, and it also has a .timeout method, which rejects the promise if it doesn't settle within a certain time.
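A minimal sketch of how those two pieces could fit the link-checking case, assuming node-fetch and an imageLinks array (both names are assumptions here, not from the question):
const Promise = require('bluebird');
const fetch = require('node-fetch');

// Hypothetical helper: HEAD-request one link, treating anything slower
// than 10 seconds as broken via Bluebird's .timeout().
const checkLink = url =>
  Promise.resolve(fetch(url, { method: 'HEAD' }))
    .timeout(10 * 1000)
    .then(res => ({ url, ok: res.ok }))
    .catch(Promise.TimeoutError, () => ({ url, ok: false, timedOut: true }))
    .catch(() => ({ url, ok: false }));

// Check at most 10 links at a time instead of slicing the array manually.
Promise.map(imageLinks, checkLink, { concurrency: 10 })
  .then(results => console.log(results));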

Cancel current websocket handling when a new websocket request is made

I am using the npm ws module (or actually the wrapper called isomorphic-ws) for a websocket connection.
NPM Module: isomorphic-ws
I use it to receive some array data from a websocket++ server running on the same PC. This data is then processed and displayed as a series of charts.
Now the problem is that the handling itself takes a very long time. I use one message to calculate 16 charts, and for each of them I need to calculate a lot of logarithms and other slow operations, all of it in JS. The whole refresh operation takes about 20 seconds.
Now I could actually live with that, but the problem is that when I get a new request, it is only processed after the whole message handler has finished. And if I get several requests in the meantime, all of them are processed in the order they came in. So the requests queue up, and the current state gets more and more outdated as time goes on...
I would like to have a way of detecting that there is another message waiting to be processed. If that is the case, I could just stop the current handler at any time and start over... So when using npm ws, is there a way of telling that there is another message waiting to be processed?
Thanks
You need to create some sort of cancelable job wrapper. It's hard to give a concrete suggestion without seeing your code. But it could be something like this.
const processArray = array => {
  let canceled = false;
  const promise = new Promise((resolve, reject) => {
    const results = [];
    // do something with the array
    for (let i = 0; i < array.length; i++) {
      // check on each iteration if the job has been canceled
      // (note: for cancel() to take effect mid-run, the per-item work
      // needs to yield to the event loop, e.g. be async or chunked)
      if (canceled) return reject({ reason: 'canceled' });
      results.push(doSomething(array[i]));
    }
    resolve(results);
  });
  return {
    cancel: () => {
      canceled = true;
    },
    promise
  };
};

const job = processArray(hugeArray); // a huge array, e.g. [1, 2, 3, ..., 1000000]
// handle the success
job.promise.then(result => console.log(result));
// cancel the job
job.cancel();
I'm sure there are libraries to serve this exact purpose. But I just wanted to give a basic example of how it could be done.
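One thing worth spelling out for the websocket case: while the 20-second handler runs synchronously, Node can't even deliver the next ws message, so there is nothing to check. A hypothetical way around that, sketched below under the assumption that the work can be split per chart, is to yield back to the event loop between charts and check whether newer data has arrived; splitIntoCharts and computeChart are assumed stand-ins for your own processing code, and ws is assumed to expose the Node-style on('message') API.
const yieldToEventLoop = () => new Promise(resolve => setImmediate(resolve));

let latestMessage = null;
let processing = false;

ws.on('message', msg => {
  // Only remember the newest message; kick off processing if idle.
  latestMessage = msg;
  if (!processing) processMessages();
});

async function processMessages() {
  processing = true;
  while (latestMessage !== null) {
    const msg = latestMessage;
    latestMessage = null;
    for (const chart of splitIntoCharts(msg)) { // hypothetical splitter
      if (latestMessage !== null) break;        // newer data arrived: start over
      computeChart(chart);                      // hypothetical heavy per-chart work
      await yieldToEventLoop();                 // let new ws messages be delivered
    }
  }
  processing = false;
}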

Cloud Functions for Firebase: completing long processes without touching maximum timeout

I have to transcode videos from webm to mp4 when they're uploaded to firebase storage. I have a code demo here that works, but if the uploaded video is too large, firebase functions will time out on me before the conversion is finished. I know it's possible to increase the timeout limit for the function, but that seems messy, since I can't ever confirm the process will take less time than the timeout limit.
Is there some way to stop Firebase from timing out without just increasing the maximum timeout limit?
If not, is there a way to complete time-consuming processes (like video conversion) while still having each process start via a Cloud Functions trigger?
If completing time-consuming processes with Cloud Functions isn't really possible, is there some way to speed up fluent-ffmpeg's conversion without hurting the quality too much? (I realize this part is a lot to ask. I plan on lowering the quality if I absolutely have to, since the reason webms are being converted to mp4 is to support iOS devices.)
For reference, here's the main portion of the demo I mentioned. As I said before, the full code can be seen here, but the section copied below is the part that creates the Promise that makes sure the transcoding finishes. The full code is only 70-something lines, so it should be relatively easy to go through if needed.
const functions = require('firebase-functions');
const mkdirp = require('mkdirp-promise');
const gcs = require('@google-cloud/storage')();
const Promise = require('bluebird');
const ffmpeg = require('fluent-ffmpeg');
const ffmpeg_static = require('ffmpeg-static');
(There's a bunch of text parsing code here, followed by this next chunk of code inside an onChange event)
function promisifyCommand (command) {
  return new Promise((resolve, reject) => {
    command
      .on('end', () => { resolve() })
      .on('error', (error) => { reject(error) })
      .run();
  })
}
return mkdirp(tempLocalDir).then(() => {
  console.log('Directory Created')
  // Download item from bucket
  const bucket = gcs.bucket(object.bucket);
  return bucket.file(filePath).download({destination: tempLocalFile}).then(() => {
    console.log('file downloaded to convert. Location:', tempLocalFile)
    cmd = ffmpeg({source: tempLocalFile})
      .setFfmpegPath(ffmpeg_static.path)
      .inputFormat(fileExtension)
      .output(tempLocalMP4File)
    cmd = promisifyCommand(cmd)
    return cmd.then(() => {
      // Getting here takes forever, because video transcoding takes forever!
      console.log('mp4 created at ', tempLocalMP4File)
      return bucket.upload(tempLocalMP4File, {
        destination: MP4FilePath
      }).then(() => {
        console.log('mp4 uploaded at', filePath);
      });
    })
  });
});
Cloud Functions for Firebase is not well suited (and not supported) for long-running tasks that can go beyond the maximum timeout. Your only real chance at using only Cloud Functions to perform very heavy compute operations is to find a way to split up the work into multiple function invocations, then join the results of all that work into a final product. For something like video transcoding, that sounds like a very difficult task.
Instead, consider using a function to trigger a long-running task in App Engine or Compute Engine.
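As a rough, hypothetical sketch of that hand-off (this is not code from the linked demo; the endpoint URL, payload shape, and the use of node-fetch are all assumptions), the function can simply notify a long-running service and return immediately:
const functions = require('firebase-functions');
const fetch = require('node-fetch'); // assumption: added as a dependency

// Hypothetical trigger: instead of transcoding inside the function, forward the
// file location to a service on App Engine / Compute Engine and return right away.
exports.requestTranscode = functions.storage.object().onChange(event => {
  const object = event.data;
  if (!object.name || !object.name.endsWith('.webm')) return null;

  return fetch('https://your-transcoder.example.com/transcode', { // assumed endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ bucket: object.bucket, filePath: object.name })
  });
});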
As a follow-up for the random anonymous person trying to figure out how to get past transcoding videos or some other long process: here's a version of the same code example I gave, which instead sends an HTTP request to a Google App Engine process that transcodes the file. There's no documentation for it as of right now, but looking at the Firebase functions/index.js code and the app.js code may help you with your issue.
https://github.com/Scew5145/GCSConvertDemo
Good luck.

How can I timeout slow Connect middleware and instead return the Node response?

I have a Node server that uses Connect to insert some middleware which attempt to transform a response stream from node-http-proxy. Occasionally this transformation can be quite slow and it would be preferable in such cases to simply return a response that doesn't include the transformations or alternatively includes their partial application.
In my application I've attempted to use setTimeout to call next after some number of milliseconds from within the transformation middleware. This generally works, but it exposes a race condition: if the middleware has already called next and then the setTimeout fires and does the same, an error occurs that looks like: Error: Can't set headers after they are sent.
Eventually I evolved the setTimeout to invoke next with an Error instance as its first argument; later on in my middleware chain I would catch that error and, assuming res.headersSent was false, start sending the response via res.end.call(res). This worked, and surprisingly I could set the timeout to nearly nothing and the response would arrive significantly faster and still be complete.
I feel like this last method is a bit of a hack and not immune to the same race condition, though it appears to be a little more resilient. So I would like to know what sort of idiomatic approaches Node and Connect have for dealing with this kind of thing.
How can I go about timing out slow middleware and simply return the response stream?
Currently this seems to do what I want, more or less, but again feels a bit gross.
let resTimedout = false;
const timeout = setTimeout(() => {
  if (!resTimedout) {
    resTimedout = true;
    next();
  }
}, 100);

getSelectors(headers, uri, (selectors) => {
  const resSelectors = Object.keys(selectors).map((selector) => {
    ...
  });
  const rewrite = resRewrite(resSelectors);
  rewrite(req, res, () => {
    if (!resTimedout) {
      resTimedout = true;
      clearTimeout(timeout);
      next();
    }
  });
});
setTimeout returns the id of the timeout, so you can then call clearTimeout, passing in that id. When the transformation is complete, just clear the timeout before you call next.
var a = setTimeout(()=>{}, 3000);
clearTimeout(a);
https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/setTimeout
Use async.timeout, or the .timeout() method from the Bluebird or Q promise libraries.
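For instance, a minimal sketch with Bluebird, assuming the transformation can be wrapped as a promise-returning transformResponse(req, res) (a hypothetical name, not from the question):
const Promise = require('bluebird');

// Hypothetical middleware: give the transformation 100 ms, then move on without it.
const transformWithTimeout = (req, res, next) => {
  Promise.try(() => transformResponse(req, res)) // assumed to return a promise
    .timeout(100)
    .then(() => next())
    .catch(Promise.TimeoutError, () => next()) // too slow: skip the transformation
    .catch(next);                              // real error: hand off to error middleware
};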
You could eliminate the need for global variables and decide this per request:
const rewrite = resRewrite(resSelectors);
rewrite(req, res, () => {
  // set a timer to fail the function early
  let timer = setTimeout(() => {
    timer = null;
    next();
  }, 100);

  // do the slow response transformation
  transformResponse((err, data) => { // e.g. a callback handler
    clearTimeout(timer);
    if (timer) next();
  });
});
How it works
If the timer fires first, it sets itself to null and calls next(). When the transform function finishes, it sees that the timer is null and does not call next() again.
If the response transform is faster, it clears the timeout so it can't run next() later on.

Run 1000 requests so that only 10 run at a time

With Node.js I want to http.get a number of remote URLs in a way that only 10 (or n) run at a time.
I also want to retry a request if an exception occurs locally (m times), but when the status code indicates an error (5XX, 4XX, etc.) the request counts as valid.
This is really hard for me to wrap my head around.
Problems:
Cannot try-catch http.get as it is async.
Need a way to retry a request on failure.
I need some kind of semaphore that keeps track of the currently active request count.
When all requests finished I want to get the list of all request urls and response status codes in a list which I want to sort/group/manipulate, so I need to wait for all requests to finish.
It seems like promises are recommended for every async problem, but I end up nesting too many of them and it quickly becomes undecipherable.
There are lots of ways to approach running 10 requests at a time.
Async Library - Use the async library with the .parallelLimit() method where you can specify the number of requests you want running at one time.
Bluebird Promise Library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise and then use Promise.map() with a concurrency option set to 10.
Manually coded - Code your requests manually to start up 10 and then each time one completes, start another one.
In all cases, you will have to write some retry code yourself, and as with all retry code, you will have to decide very carefully which types of errors you retry, how soon you retry them, how much you back off between retry attempts, and when you eventually give up (all things you have not specified).
Other related answers:
How to make millions of parallel http requests from nodejs app?
Million requests, 10 at a time - manually coded example
My preferred method is with Bluebird and promises. Including retry and result collection in order, that could look something like this:
const request = require('request');
const Promise = require('bluebird');
const get = Promise.promisify(request.get);

let remoteUrls = [...]; // large array of URLs

const maxRetryCnt = 3;
const retryDelay = 500;

Promise.map(remoteUrls, function(url) {
    let retryCnt = 0;
    function run() {
        return get(url).then(function(result) {
            // do whatever you want with the result here
            return result;
        }).catch(function(err) {
            // decide what your retry strategy is here
            // catch all errors here so other URLs continue to execute
            if (isRetryable(err) && retryCnt < maxRetryCnt) { // isRetryable() is a placeholder for your own check
                ++retryCnt;
                // try again after a short delay
                // chain onto previous promise so Promise.map() is still
                // respecting our concurrency value
                return Promise.delay(retryDelay).then(run);
            }
            // make value be null if no retries succeeded
            return null;
        });
    }
    return run();
}, {concurrency: 10}).then(function(allResults) {
    // everything done here and allResults contains results with null for err URLs
});
The simple way is to use the async library; it has a .parallelLimit() method that does exactly what you need.
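A rough sketch of that approach (the URL list is a placeholder, and error handling is kept minimal; requests that fail locally simply report the error instead of a status code):
const async = require('async');
const https = require('https');

const urls = ['https://example.com/', /* ... the rest of your URLs ... */];

// One task per URL; each task reports the status code (or the local error).
const tasks = urls.map(url => done => {
  https.get(url, res => {
    res.resume(); // discard the body; we only care about the status code
    done(null, { url, status: res.statusCode });
  }).on('error', err => done(null, { url, error: err.message }));
});

// Run at most 10 tasks at a time; results come back in the same order as urls.
async.parallelLimit(tasks, 10, (err, results) => {
  console.log(results);
});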
