JavaScript: delay after increasing index range in a loop to download images from a server

I have an array with thousands of image links, like this:
let imageList = ["http://img1.jpg", "http://img2.jpg", ...];
I want to loop over imageList and pause after every 20 (n) items before moving on to the next batch, like this:
for (let i = 0; i <= imageList.length; i += 20) {
  // with i from 0 -> 20
  // download images from the server
  downloadImages(0, 20); // [start, end]
  // delay 5s to avoid server timeouts from making too many requests at once
  // with i from 20 -> 40
  // download images from the server
  downloadImages(20, 40);
  // delay 5s again
  // ... and so on until finished
}

Try something like this:
const imageList = ['***'];

downloadImages(imageList)
  .then(/* done */)
  .catch(/* error */);

async function downloadImages(images) {
  for (let i = 0; i < images.length; i += 20) {
    const next20Images = images.slice(i, i + 20);
    await fetchImages(next20Images);
    await delay(5);
  }
}

function fetchImages(images) {
  return Promise.all(
    images.map(image => /* fetch(image) or something else */)
  );
}

function delay(seconds) {
  return new Promise(resolve => setTimeout(resolve, seconds * 1000));
}

You can use the modulus operator.
let imageList = ["http://img1.jpg", "http://img2.jpg", ...];
for (let i = 0; i < imageList.length; i++) {
  if (i % n === 0) { // n is the number of images after which to start the delay
    START_YOUR_DELAY_HERE
    downloadImage(20); // 20 is the number of images you want to download
  }
}
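A concrete sketch of that modulus idea, assuming an async context; `downloadOne`, `downloadAll`, and `delay` are illustrative names, and `downloadOne` stands in for whatever function downloads a single image:

```javascript
// Pause every n images; `downloadOne` is a hypothetical per-image downloader.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function downloadAll(imageList, downloadOne, n = 20, pauseMs = 5000) {
  for (let i = 0; i < imageList.length; i++) {
    // after every n images (but not before the first), wait before continuing
    if (i > 0 && i % n === 0) await delay(pauseMs);
    await downloadOne(imageList[i]);
  }
}
```

This downloads strictly one image at a time; the batched answers above are faster if the server tolerates parallel requests.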

I use async/await, for...of, and chunk from lodash for this kind of situation. It makes the requests in groups of 20 so as not to flood the server.
let i = 0;
const imageListChunks = _.chunk(imageList, 20);
for await (const chunk of imageListChunks) {
  const chunkPromises = downloadImage(0 + i * 20, 20 + i * 20);
  const chunkResp = await Promise.all(chunkPromises);
  i = i + 1;
}
If you need more delay to let the server breathe, you can add a setTimeout with another await to slow it down further.

You can use async and setTimeout to achieve this:
let downloadImage = async url => {
  console.log(`Downloading ${url}`);
  // Simulate a download delay
  return new Promise(r => setTimeout(r, 100 + Math.floor(Math.random() * 500)));
};

let downloadAllImages = async (imageUrls, chunkSize = 20, delayMs = 2000) => {
  for (let i = 0; i < imageUrls.length; i += chunkSize) {
    console.log(`Downloading chunk [${i}, ${i + chunkSize - 1}]`);
    // This `chunk` is a subset of `imageUrls`: the ones to be downloaded next
    let chunk = imageUrls.slice(i, i + chunkSize);
    // Call `downloadImage` on each item in the chunk; wait for all downloads to finish
    await Promise.all(chunk.map(url => downloadImage(url)));
    // Unless this is the last chunk, wait `delayMs` milliseconds before continuing
    // (Note: this step may be poor practice! See explanation at bottom of this answer)
    if ((i + chunkSize) < imageUrls.length) await new Promise(r => setTimeout(r, delayMs));
  }
};
// Create an array of example urls
let imageUrls = [ ...new Array(100) ].map((v, n) => `http://example.com/image${n}.jpg`);

// Call `downloadAllImages`
downloadAllImages(imageUrls)
  // Use `.then` to capture the moment when all images have finished downloading
  .then(() => console.log(`Finished downloading ${imageUrls.length} images!`));
Note that if you implement downloadImage correctly, so that it returns a promise which resolves when the image is downloaded, it may be best practice to forego the timeout. The timeout is a heuristic way of ensuring not too many requests are running at once, but if you have a fine-grained sense of when a request finishes you can simply wait for a batch of requests to finish before beginning the next batch.
There is an even more efficient design to consider (for your further research). To understand, let's think about a problem with this current approach (which I'll call the "batch" approach). The batch approach is incapable of beginning a new batch until the current one completes. Imagine a batch of 20 images where 1 downloads in 1ms, 18 of them download within 5ms, but the final image takes 10+ seconds to download; even though this system ought to have the bandwidth to download 20 images at once, it winds up spending 10 entire seconds with only a single request in progress. A more efficient design (which we can call the "maximal bandwidth approach") would maintain a queue of 20 in-progress requests, and every time one of those requests completes a new request is immediately begun. Imagine that first image which downloads in 1ms; the moment it finishes, and only 19 requests are in progress, the "maximal bandwidth approach" could begin a new request right away without waiting for those other 19 to finish.
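A minimal sketch of that "maximal bandwidth" approach; the names `downloadPool` and `fetchOne` are illustrative, not from any library. It keeps up to `limit` requests in flight and starts the next one the moment any of them finishes:

```javascript
// Keep up to `limit` requests in flight; start a new one as soon as any finishes.
// `fetchOne` is a placeholder for the real per-url download function.
function downloadPool(urls, fetchOne, limit) {
  return new Promise((resolve, reject) => {
    const results = new Array(urls.length);
    let next = 0;    // index of the next url to start
    let active = 0;  // how many requests are currently in flight
    let failed = false;

    function launch() {
      // Top up the pool until we hit the limit or run out of urls
      while (!failed && active < limit && next < urls.length) {
        const i = next++;
        active++;
        fetchOne(urls[i]).then(value => {
          results[i] = value; // preserve input order in the results
          active--;
          if (next === urls.length && active === 0) resolve(results);
          else launch(); // a slot freed up: start the next request
        }, err => {
          failed = true; // stop launching new requests after a failure
          reject(err);
        });
      }
    }

    if (urls.length === 0) resolve(results);
    else launch();
  });
}
```

Usage would look like `downloadPool(imageUrls, url => fetch(url), 20).then(...)`; note this enforces only a concurrency limit, not a requests-per-minute quota.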

Set an offset and advance it in steps of 20:
let offset = 0;
while (offset < imageList.length) {
  downloadImage(offset, offset + 20);
  offset += 20;
}


How to use a promise with setInterval

I'm trying to solve a simple problem here but I have no idea what direction to take.
getAuthNumber() // returns a promise with a number (eg 98765)
// response times can be 5s-20s
<div class="auth"></div>
//code
let counter = 0;
let el = document.getElementsByClassName("auth")[0];
let func = setInterval(function() {
  counter++;
  getAuthNumber().then((num) => {
    return [num, counter];
  }).then((res) => {
    if (counter == res[1])
      el.innerHTML = res[0];
  });
}, 10000);
I need to write a function that gets the auth number every 10s and displays it in the block below. I've tried using setInterval, but getAuthNumber() can take more than 10s to return, in which case I need to discard that response and only show the current value.
Do not use setInterval; use the setTimeout function instead.
setInterval's execution depends on CPU usage: if it increases, the setInterval callback may not complete within the specified interval.
What you can do is run an async function inside setInterval. In the following code snippet, the getAuth function completes after 2s but setInterval runs every 1s. It still works because there is an async function inside setInterval.
const getAuth = () => {
  return new Promise((res, rej) => {
    setTimeout(() => res(Math.random()), 2000);
  });
};

const setDiv = async () => {
  const res = await getAuth();
  console.log(res);
};

setInterval(setDiv, 1000);
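The snippet above still applies every result, including stale ones. To also satisfy the question's requirement of discarding late responses, one possible sketch is to tag each request with an id and only apply the most recent; `pollLatest` and `getNumber` are illustrative names, with `getNumber` standing in for the question's `getAuthNumber`:

```javascript
// Poll on an interval, but only apply the result of the most recent request.
// `apply` would e.g. write to el.innerHTML in the question's setup.
function pollLatest(getNumber, apply, intervalMs = 10000) {
  let latest = 0;
  return setInterval(() => {
    const id = ++latest;
    getNumber().then(num => {
      // If a newer request has been issued since this one started,
      // this response is stale: discard it.
      if (id === latest) apply(num);
    });
  }, intervalMs);
}
```

The returned timer id can be passed to clearInterval to stop polling.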
I have adapted this gist by Jake Archibald (see JavaScript counters the hard way - HTTP 203) into the following code:
function promiseInterval(milliseconds, signal, promiseFactory, callback) {
  const start = performance.now();

  function tick() {
    if (signal.aborted) {
      return;
    }
    promiseFactory().then(
      value => {
        callback(value);
        // setTimeout passes no timestamp, so read the clock here
        scheduleTick(performance.now());
      }
    );
  }

  function scheduleTick(time) {
    const elapsed = time - start;
    const roundedElapsed = Math.round(elapsed / milliseconds) * milliseconds;
    const targetNext = start + roundedElapsed + milliseconds;
    const delay = targetNext - performance.now();
    setTimeout(tick, delay);
  }

  scheduleTick(start);
}
Starting from the gist, I have removed the use of requestAnimationFrame and document.timeline.currentTime (using only performance.now), and I have added the promiseFactory parameter, plus some renaming (animationInterval renamed to promiseInterval, ms renamed to milliseconds and scheduleFrame renamed to scheduleTick) and formatting.
You would use it like this:
const controller = new AbortController(); // This is used to stop

promiseInterval(
  10000,             // 10s
  controller.signal, // signal; to stop the process, call `controller.abort()`
  getAuthNumber,     // the promise factory
  num => { el.innerHTML = num; } // this is what you do with the values
);
It will not really call getAuthNumber every 10 seconds. Instead, it will wait until getAuthNumber completes, schedule the next call on the next 10-second boundary, and repeat. So it is not calling it multiple times and discarding values.

Getting progress from web-worker executing computationally intensive calculation

I have a WebWorker doing a computationally intensive recursive calculation lasting several seconds. I would like to post a progress message to the parent thread (main window), say every 500 milliseconds.
I tried to use setInterval to achieve this, but since the thread is blocked by the main calculation, setInterval was not executed at all during that time.
Web worker code:
// global variable holding some partial information
let temporal = 0;

// time-intensive recursive function. Fibonacci is chosen as an example here.
function fibonacci(num) {
  // store current num into global variable
  temporal = num;
  return num <= 1
    ? 1
    : fibonacci(num - 1) + fibonacci(num - 2);
}

self.onmessage = function(e) {
  // start calculation
  const result = fibonacci(e.data.value);
  postMessage({result});
};

setInterval(function() {
  // post temporal solution at an interval.
  // While the thread is blocked by the recursive calculation, this is not executed
  postMessage({progress: temporal});
}, 500);
Main window code:
worker.onmessage = (e) => {
  if (e.data.progress !== undefined) {
    console.log('progress msg received');
  } else {
    console.log('result msg received');
    console.log(e.data);
  }
};

console.log('starting calculation');
worker.postMessage({
  'value': 42,
});
See jsFiddle example - https://jsfiddle.net/m3geaxbo/36/
Of course, I could add some code to the fibonacci function to check the elapsed time and send a message from there. But I don't like that, because it pollutes the function with irrelevant code.
function fibonacci(num) {
  // such an approach will work, but it is not very nice.
  if (passed500ms()) {
    postMessage({progress: num});
  }
  return num <= 1
    ? 1
    : fibonacci(num - 1) + fibonacci(num - 2);
}
Is there a preferred way to get the progress of an intensive web-worker calculation without polluting the code performing the calculation itself?
There is no way to report progress from a synchronously running algorithm without integrating some sort of yielding into it.
You'd have to adapt your algorithm so that you can pause it and check whether enough time has elapsed, or even let the event loop actually loop.
Letting the event loop perform other tasks is my personal favorite, since it also allows the main thread to communicate with the Worker. However, if you really just want it to report the current progress, a simple, synchronous time check is fine.
Note that recursive functions by their very nature aren't really usable in such a case, because the value the function generates at the 5th nesting level will not reflect the value you would have gotten by calling the main function with 5 as input.
So getting intermediate values from a recursive function is very tedious.
However, a fibonacci calculator can be rewritten inline really easily:
function fibonacci( n ) {
  let a = 1, b = 0, temp;
  while( n >= 0 ) {
    temp = a;
    a = a + b;
    b = temp;
    n--;
  }
  return b;
}
From here it's super easy to add the time-elapsed check and quite simple to rewrite it in a way we can pause it in the middle:
async function fibonacci( n ) {
  let a = 1, b = 0, temp;
  while( n >= 0 ) {
    temp = a;
    a = a + b;
    b = temp;
    n--;
    if( n % batch_size === 0 ) { // we completed one batch
      current_value = b;  // let the outside scripts know where we are
      await nextTask();   // let the event loop loop.
    }
  }
  return b;
}
To pause a function in the middle, the async/await syntax comes in very handy, as it allows us to write linear code instead of several intricate recursive callbacks.
The best way to let the event loop actually loop is, as demonstrated in this answer, to use a MessageChannel as a next-task scheduler.
Now, you can let your preferred scheduling method get in between these pauses and do the messaging to the main port, or listen for updates from the main thread.
But inlining your function also improves performance so much that you can calculate the full sequence up to Infinity in a few ms... (fibonacci(1476) does return Infinity).
So fibonacci is not a great candidate to demonstrate this issue; let's calculate π instead.
I'm borrowing a function to calculate π from this answer (not judging whether it's performant or not; it's simply for the sake of demonstrating how to let the Worker thread pause a long-running function).
// Main thread code
const log = document.getElementById( "log" );
const url = generateWorkerURL();
const worker = new Worker( url );
worker.onmessage = ({data}) => {
  const [ PI, iterations ] = data;
  log.textContent = `π = ${ PI }
after ${ iterations } iterations.`;
};

function generateWorkerURL() {
  const script = document.querySelector( "[type='worker-script']" );
  const blob = new Blob( [ script.textContent ], { type: "text/javascript" } );
  return URL.createObjectURL( blob );
}
<script type="worker-script">
// The worker script
// Will get loaded dynamically in this snippet

// first some helper functions / monkey-patches
if( !self.requestAnimationFrame ) {
  self.requestAnimationFrame = (cb) =>
    setTimeout( cb, 16 );
}
function postTask( cb ) {
  const channel = postTask.channel;
  channel.port2.addEventListener( "message", () => cb(), { once: true } );
  channel.port1.postMessage( "" );
}
(postTask.channel = new MessageChannel()).port2.start();
function nextTask() {
  return new Promise( (res) => postTask( res ) );
}

// Now the actual processing
// borrowed from https://stackoverflow.com/a/50282537/3702797
// [addition]: made async so it can wait easily for the next event loop
async function calculatePI( iterations = 10000 ) {
  let pi = 0;
  let iterator = sequence();
  let i = 0;

  // [addition]: start a new interval task
  // which will report the current values to main,
  // using an rAF loop as it's the best for rendering on screen
  requestAnimationFrame( function reportToMain() {
    postMessage( [ pi, i ] );
    requestAnimationFrame( reportToMain );
  } );

  // [addition]: define a batch_size
  const batch_size = 10000;

  for( ; i < iterations; i++ ){
    pi += 4 / iterator.next().value;
    pi -= 4 / iterator.next().value;
    // [addition]: In case we completed one batch,
    // we'll wait for the next event-loop iteration
    // to let the interval callback fire.
    if( i % batch_size === 0 ) {
      await nextTask();
    }
  }

  function* sequence() {
    let i = 1;
    while( true ){
      yield i;
      i += 2;
    }
  }
}

// Start the *big* job...
calculatePI( Infinity );
</script>
<pre id="log"></pre>

Break out of while loop with Async/Await

Context:
I have a while loop that I want to run 10 times, and within it I have async/await code that runs every 3 seconds. The while loop works as sort of a timeout: if it ever runs 10 times and the async/await check doesn't return the expected value, break out of the while loop because the process timed out.
Problem: The break-out-of-loop portion of the code runs first, with the value of i (the loop variable) maxed out. The way I have it set up, I have no access to the value of i while it is looping, only when i is at its max value.
Question: How can I escape this loop early when the condition is met or i is exhausted?
var i = 0;
//Run 10 times
while (i < 10) {
  //Run every 3 seconds
  ((i) => {
    setTimeout( async () => {
      isAuthenticated = await eel.is_authenticated()();
      sessionStorage.sessionStatus = JSON.stringify(isAuthenticated);
      console.log('------NEW STATUS-------');
      console.log(JSON.parse(sessionStorage.sessionStatus).authenticated);
      console.log('Inside:' + i);
    }, 3000 * i)
  })(i++);
  //Break out of loop early if condition is met or i is exhausted, but it only runs 1 time and i is always max
  if (i === 9 || JSON.parse(sessionStorage.sessionStatus).authenticated) {
    console.log('Outside:' + i);
    checkStatus('retried');
    break;
  }
}
NOTE: In case anyone is wondering, eel.is_authenticated()() is not a typo; eel is a Python library for creating desktop applications, and the double ()() is normal.
Also, if this approach is overly complicated for what it does, any other approaches are welcome :)
Thanks
The issue here is that you are running through all your loop iterations immediately (10 times), setting up 10 timeouts in the process, 3s apart from each other:
Your loop runs 10 times, creating 10 timeouts
You reach the i === 9 case
3s later, 1st timeout runs
3s later, 2nd timeout runs
...
What you want is for the loop to actually wait 3s between iterations. For that, you'd use setTimeout in a different way - you'd create a promise that resolves when the timeout is hit and await that promise:
// As a helper function, for readability
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Your loop
let i;
for (i = 0; i < 10; i++) {
  // Wait 3s
  await delay(3000);

  // Check authentication
  const isAuthenticated = await eel.is_authenticated()();
  sessionStorage.sessionStatus = JSON.stringify(isAuthenticated);
  console.log('------NEW STATUS-------');
  console.log(JSON.parse(sessionStorage.sessionStatus).authenticated);
  console.log('Inside:' + i);

  // Break loop if authenticated
  if (isAuthenticated.authenticated) break;
}

// We were authenticated, or looped 10 times
// Note: Because I moved this outside, i will now actually be 10 if the loop
// ended by itself, not 9.
console.log('Outside:' + i);
checkStatus('retried');
One consequence here though would be that if the call to is_authenticated takes a significant amount of time, the checks will be more than 3s apart, because we are now waiting for 3s and for this call. If this is undesired, we can reduce the delay time based on how much time elapsed since the last call:
// As a helper function, for readability
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// We will save here when the last delay completed, so that the checks are always
// 3s apart (unless the check takes longer than 3s)
// Initially we save the current time so that the first wait is always 3s, as before
let lastIteration = Date.now();

// Your loop
let i;
for (i = 0; i < 10; i++) {
  // Wait until 3s after the last iteration (clamped to 0ms; negative waits won't work)
  await delay(Math.max(lastIteration + 3000 - Date.now(), 0));
  // Update the "last iteration" time so the next delay will wait until 3s from now again
  lastIteration = Date.now();

  // Check authentication
  const isAuthenticated = await eel.is_authenticated()();
  sessionStorage.sessionStatus = JSON.stringify(isAuthenticated);
  console.log('------NEW STATUS-------');
  console.log(JSON.parse(sessionStorage.sessionStatus).authenticated);
  console.log('Inside:' + i);

  // Break loop if authenticated
  if (isAuthenticated.authenticated) break;
}

// We were authenticated, or looped 10 times
// Note: Because I moved this outside, i will now actually be 10 if the loop
// ended by itself, not 9.
console.log('Outside:' + i);
checkStatus('retried');
All of this assumes that the function in which this code is located is async. If it isn't, you need to make it async, but then remember to add a .catch(e => handleTheErrorSomehow(e)) when you call it, to avoid unhandled promise rejections!

Limiting concurrency in JavaScript (Node.js)

I have the following code:
const rl = require('readline').createInterface({
  input: require('fs').createReadStream(__dirname + '/../resources/profiles.txt'),
  terminal: true
});

for await (const line of rl) {
  scrape_profile(line);
}
scrape_profile is a function that makes a request to the web and performs some processing. The issue is that I want to limit it so that 5 scrape_profile calls are executed per 30 seconds. As of now, if I have a text file with 1000 lines, it goes ahead and executes 1000 concurrent requests at once. How do I limit this?
I'm not entirely sure why you're using a readline interface if you're asynchronously reading the entire file into memory at once, so for my answer I've replaced it with a call to fs.readFileSync, as it's much easier to deal with finite values than a stream, and the question didn't explicitly state that the file IO needed to be streamed.
You could try using Bluebird's Promise.reduce:
const fs = require('fs');
const lines = fs.readFileSync('./test.txt').toString().split('\r\n');
const Promise = require('bluebird');
const BATCHES = 5;

const scrape_profile = file => new Promise((resolve, reject) => {
  setTimeout(() => {
    console.log("Done with", file);
    resolve(Math.random());
  }, Math.random() * 1000);
});

const runBatch = batchNo => {
  const batchSize = Math.round(lines.length / BATCHES);
  const start = batchSize * batchNo;
  const end = batchSize * (batchNo + 1);
  return Promise.reduce(lines.slice(start, end), (aggregate, line) => {
    console.log({ aggregate });
    return scrape_profile(line)
      .then(result => {
        aggregate.push(result);
        return aggregate;
      });
  }, []);
};

runBatch(0).then(/* batch 1 done*/);
runBatch(1).then(/* batch 2 done*/);
runBatch(2).then(/* batch 3 done*/);
runBatch(3).then(/* batch 4 done*/);
runBatch(4).then(/* batch 5 done*/);
// ... preferably use a for loop to do this
This is a full example; you should be able to run it locally (with a file called 'test.txt' that has any contents). For each line it will spend a random amount of time generating a random number; it runs 5 separate batches. Change the value of BATCHES to reflect the number of batches you need.
You can use setInterval with a 30-second interval to run a loop of 5 scrape_profile calls per tick. Your current loop runs over all 1000 lines without stopping; instead, put a loop of 5 iterations in a function that you call with setInterval, and keep the index of the current line in a variable so you can continue from where you left off.
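A sketch of that setInterval approach; `scrapeInBatches` is an illustrative name, and `scrapeOne` stands in for the question's scrape_profile:

```javascript
// Dispatch `batchSize` lines every `intervalMs`, remembering where we left off.
function scrapeInBatches(lines, scrapeOne, batchSize = 5, intervalMs = 30000) {
  let index = 0;
  const timer = setInterval(() => {
    // take the next batch and advance the index
    lines.slice(index, index + batchSize).forEach(line => scrapeOne(line));
    index += batchSize;
    // all lines dispatched: stop the timer
    if (index >= lines.length) clearInterval(timer);
  }, intervalMs);
  return timer;
}
```

Note this caps the dispatch rate, not the number of requests in flight: if a batch takes longer than the interval to complete, batches can overlap.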

Make several requests to an API that can only handle 20 request a minute

I've got a method that returns a promise, and internally that method makes a call to an API which can only receive 20 requests every minute. The problem is that I have a large array of objects (around 300), and I would like to make a call to the API for each one of them.
At the moment I have the following code:
const bigArray = [.....];
Promise.all(bigArray.map(apiFetch)).then((data) => {
  ...
});
But it doesn't handle the timing constraint. I was hoping I could use something like _.chunk and _.debounce from lodash, but I can't wrap my mind around it. Could anyone help me out?
If you can use the Bluebird promise library, it has a concurrency feature built in that lets you limit a group of async operations to at most N in flight at a time.
var Promise = require('bluebird');

const bigArray = [....];

Promise.map(bigArray, apiFetch, {concurrency: 20}).then(function(data) {
  // all done here
});
The nice thing about this interface is that it will keep 20 requests in flight. It will start 20, then each time one finishes, it will start another. So this is potentially more efficient than sending 20, waiting for all to finish, sending 20 more, etc.
This also provides the results in the exact same order as bigArray so you can identify which result goes with which request.
You could, of course, code this yourself with generic promises using a counter, but since it is already built into the Bluebird library, I thought I'd recommend that way.
The Async library also has a similar concurrency control though it is obviously not promise based.
Here's a hand-coded version using only ES6 promises that maintains result order and keeps 20 requests in flight at all time (until there aren't 20 left) for maximum throughput:
function pMap(array, fn, limit) {
  return new Promise(function(resolve, reject) {
    var index = 0, cnt = 0, stop = false, results = new Array(array.length);

    function run() {
      while (!stop && index < array.length && cnt < limit) {
        (function(i) {
          ++cnt;
          ++index;
          fn(array[i]).then(function(data) {
            results[i] = data;
            --cnt;
            // see if we are done or should run more requests
            if (cnt === 0 && index === array.length) {
              resolve(results);
            } else {
              run();
            }
          }, function(err) {
            // set stop flag so no more requests will be sent
            stop = true;
            --cnt;
            reject(err);
          });
        })(index);
      }
    }

    run();
  });
}

pMap(bigArray, apiFetch, 20).then(function(data) {
  // all done here
}, function(err) {
  // error here
});
Working demo here: http://jsfiddle.net/jfriend00/v98735uu/
You could send 1 block of 20 requests every minute or space them out 1 request every 3 seconds (latter probably preferred by the API owners).
function rateLimitedRequests(array, chunkSize) {
  var delay = 3000 * chunkSize;
  var remaining = array.length;
  var promises = [];
  var addPromises = function(newPromises) {
    Array.prototype.push.apply(promises, newPromises);
    if ((remaining -= newPromises.length) == 0) {
      Promise.all(promises).then((data) => {
        ... // do your thing
      });
    }
  };
  (function request() {
    addPromises(array.splice(0, chunkSize).map(apiFetch));
    if (array.length) {
      setTimeout(request, delay);
    }
  })();
}
To call 1 every 3 seconds:
rateLimitedRequests(bigArray, 1);
Or 20 every minute:
rateLimitedRequests(bigArray, 20);
If you prefer to use _.chunk and _.throttle1 (rather than _.debounce):
function rateLimitedRequests(array, chunkSize) {
  var delay = 3000 * chunkSize;
  var remaining = array.length;
  var promises = [];
  var addPromises = function(newPromises) {
    Array.prototype.push.apply(promises, newPromises);
    if ((remaining -= newPromises.length) == 0) {
      Promise.all(promises).then((data) => {
        ... // do your thing
      });
    }
  };
  var chunks = _.chunk(array, chunkSize);
  var throttledFn = _.throttle(function() {
    addPromises(chunks.pop().map(apiFetch));
  }, delay, {leading: true});
  for (var i = 0; i < chunks.length; i++) {
    throttledFn();
  }
}
1 You probably want _.throttle, since it executes each function call after a delay, whereas _.debounce groups multiple calls into one call. See this article linked from the docs:
Debounce: Think of it as "grouping multiple events in one". Imagine that you go home, enter in the elevator, doors are closing... and suddenly your neighbor appears in the hall and tries to jump on the elevator. Be polite! and open the doors for him: you are debouncing the elevator departure. Consider that the same situation can happen again with a third person, and so on... probably delaying the departure several minutes.
Throttle: Think of it as a valve, it regulates the flow of the executions. We can determine the maximum number of times a function can be called in certain time. So in the elevator analogy.. you are polite enough to let people in for 10 secs, but once that delay passes, you must go!
