determining the end of an asynchronous loop - javascript

In the attached code I want to run the function returnFile after all database queries have completed. The problem is that, from inside a query callback, I cannot tell which response will be the last one. I considered separating the loops and having the last callback run returnFile, but that would slow things down dramatically.
for (var i = 0, len = articleRevisionData.length; i < len; i++) {
tagNames=[]
console.log("step 1, "+articleRevisionData.length+" i:"+i);
if(articleRevisionData[i]["tags"]){
for (var x = 0, len2 = articleRevisionData[i]["tags"].length; x < len2; x++) {
console.log("step 2, I: "+i+" x: "+x+articleRevisionData[i]["articleID"])
tagData.find({"tagID":articleRevisionData[i]["tags"][x]}).toArray( function(iteration,len3,iterationA,error, resultC){
console.log("step 3, I: "+i+" x: "+x+" iteration: "+iteration+" len3: "+len3)
if(resultC.length>0){
tagNames.push(resultC[0]["tagName"]);
}
//console.log("iteration: "+iteration+" len: "+len3)
if(iteration+1==len3){
console.log("step 4, iterationA: "+iterationA+" I: "+iteration)
articleRevisionData[iterationA]["tags"]=tagNames.join(",");
}
}.bind(tagData,x,len2,i));
}
}
if(i==len-1){
templateData={
name:userData["firstName"]+" "+userData["lastName"],
articleData:articleData,
articleRevisionData:articleRevisionData
}
returnFile(res,"/usr/share/node/Admin/anonymousAttempt2/Admin/Articles/home.html",templateData);
}
}

It is rarely a good idea to call an asynchronous function from within a loop since, as you've discovered, you cannot know when all the calls complete (which is the nature of asynchrony.)
In your example, it's important to note that all of your async calls run concurrently, which can consume more system resources than you might wish.
I've found that the best solution to these kinds of problems is to use events to manage execution flow, as in:
const EventEmitter = require('events');
const emitter = new EventEmitter();
let iterations = articleRevisionData.length;
// start up state
emitter.on('start', () => {
// do setup here
emitter.emit('next_iteration');
});
// loop state
emitter.on('next_iteration', () => {
if(iterations--) {
asyncFunc(args, (err,result) => {
if(err) {
emitter.emit('error', err);
return;
}
// do something with result
emitter.emit('next_iteration');
});
return;
}
// no more iterations
emitter.emit('complete');
});
// error state
emitter.on('error', (e) => {
console.error(`processing failed on iteration ${iterations+1}: ${e.toString()}`);
});
// processing complete state
emitter.on('complete', () => {
// do something with all results
console.log('all iterations complete');
});
// start processing
emitter.emit('start');
Note how simple and clean this code is, lacking any "callback hell", and how easy it is to visualize program flow.
It is also worth noting that you can express every kind of execution control (doWhile, doUntil, map/reduce, queue workers, etc.) using events and since event handling is at the very core of Node, you'll find using them in this manner will outperform most, if not all, other solutions.
See Node Events for more information on event handling in Node.
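For the concurrent case in the question, another option is to count outstanding callbacks instead of guessing which response arrives last. The following is only a minimal sketch: lookupTag is a hypothetical stub standing in for tagData.find(...).toArray(...), and resolveTagNames and the sample articles are made-up names for illustration.

```javascript
// Hypothetical stand-in for tagData.find({tagID: id}).toArray(cb):
// it answers asynchronously and in no guaranteed order.
function lookupTag(tagID, cb) {
  setTimeout(() => cb(null, [{ tagName: 'tag-' + tagID }]), Math.random() * 10);
}

function resolveTagNames(articles, done) {
  let pending = 0;       // queries issued but not yet answered
  let issuedAll = false; // true once both loops have finished issuing queries
  let failed = false;
  const namesPerArticle = articles.map(() => []);

  function finishIfIdle() {
    if (!issuedAll || pending > 0 || failed) return;
    articles.forEach((article, i) => {
      if (article.tags) {
        article.tags = namesPerArticle[i].filter(Boolean).join(',');
      }
    });
    done(null, articles);
  }

  articles.forEach((article, i) => {
    (article.tags || []).forEach((tagID, x) => {
      pending++;
      lookupTag(tagID, (err, rows) => {
        if (failed) return;
        if (err) { failed = true; return done(err); }
        if (rows.length > 0) namesPerArticle[i][x] = rows[0].tagName;
        pending--;
        finishIfIdle(); // whichever callback arrives last triggers done()
      });
    });
  });

  issuedAll = true;
  finishIfIdle(); // covers the case where no query was issued at all
}

const demoArticles = [
  { articleID: 1, tags: [10, 11] },
  { articleID: 2 },
  { articleID: 3, tags: [12] },
];
resolveTagNames(demoArticles, (err, result) => {
  if (err) throw err;
  console.log(result.map((a) => a.tags));
});
```

In the question's code, the call to returnFile would go where done is invoked. Each name is stored by its index, so the joined string keeps the original tag order even when responses arrive out of order.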

Related

Promise not running asynchronously [duplicate]

How can I make a simple, non-block Javascript function call? For example:
//begin the program
console.log('begin');
nonBlockingIncrement(10000000);
console.log('do more stuff');
//define the slow function; this would normally be a server call
function nonBlockingIncrement(n){
var i=0;
while(i<n){
i++;
}
console.log('0 incremented to '+i);
}
outputs
"begin"
"0 incremented to 10000000"
"do more stuff"
How can I form this simple loop to execute asynchronously and output the results via a callback function? The idea is to not block "do more stuff":
"begin"
"do more stuff"
"0 incremented to 10000000"
I've tried following tutorials on callbacks and continuations, but they all seem to rely on external libraries or functions. None of them answer the question in a vacuum: how does one write Javascript code to be non-blocking!?
I have searched very hard for this answer before asking; please don't assume I didn't look. Everything I found is Node.js specific ([1], [2], [3], [4], [5]) or otherwise specific to other functions or libraries ([6], [7], [8], [9], [10], [11]), notably JQuery and setTimeout(). Please help me write non-blocking code using Javascript, not Javascript-written tools like JQuery and Node. Kindly reread the question before marking it as duplicate.
To make your loop non-blocking, you must break it into sections and allow the JS event processing loop to consume user events before carrying on to the next section.
The easiest way to achieve this is to do a certain amount of work, and then use setTimeout(..., 0) to queue the next chunk of work. Crucially, that queueing allows the JS event loop to process any events that have been queued in the meantime before going on to the next piece of work:
function yieldingLoop(count, chunksize, callback, finished) {
var i = 0;
(function chunk() {
var end = Math.min(i + chunksize, count);
for ( ; i < end; ++i) {
callback.call(null, i);
}
if (i < count) {
setTimeout(chunk, 0);
} else {
finished.call(null);
}
})();
}
with usage:
yieldingLoop(1000000, 1000, function(i) {
// use i here
}, function() {
// loop done here
});
See http://jsfiddle.net/alnitak/x3bwjjo6/ for a demo where the callback function just sets a variable to the current iteration count, and a separate setTimeout based loop polls the current value of that variable and updates the page with its value.
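Applied to the question's counting example, the chunked loop could look roughly like this. It is a sketch only: yieldingLoop is repeated from above so the snippet runs on its own, the chunk size of 100000 is an arbitrary choice, and the log array is a hypothetical helper used to make the ordering visible.

```javascript
// yieldingLoop as defined above, repeated so this sketch is self-contained
function yieldingLoop(count, chunksize, callback, finished) {
  var i = 0;
  (function chunk() {
    var end = Math.min(i + chunksize, count);
    for (; i < end; ++i) {
      callback.call(null, i);
    }
    if (i < count) {
      setTimeout(chunk, 0); // yield to the event loop between chunks
    } else {
      finished.call(null);
    }
  })();
}

var log = []; // collected instead of printed, so the order is easy to inspect

function nonBlockingIncrement(n, done) {
  var total = 0;
  yieldingLoop(n, 100000, function () { total++; }, function () { done(total); });
}

log.push('begin');
nonBlockingIncrement(10000000, function (total) {
  log.push('0 incremented to ' + total);
  console.log(log.join('\n'));
});
log.push('do more stuff');
```

Note that the first chunk still runs synchronously inside the call, so "do more stuff" only comes first when the count is larger than the chunk size.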
setTimeout with callbacks is the way to go. Bear in mind, though, that function scopes do not behave as they do in C# or another multi-threaded environment.
Javascript does not wait for your function's callback to finish.
If you say:
function doThisThing(theseArgs) {
// no parameter here: a parameter named theseArgs would shadow the outer one and be undefined
setTimeout(function () { doThatOtherThing(theseArgs); }, 1000);
alert('hello world');
}
The alert fires before the function you passed to setTimeout runs.
The difference is that alert blocks the thread, whereas scheduling the callback does not.
As far as I know, there are two general ways to do this. One is to use setTimeout (or requestAnimationFrame if you are doing this in a supporting environment); @Alnitak showed how to do this in another answer. The other is to use a web worker to run your blocking logic in a separate thread, so that the main UI thread is not blocked.
Using requestAnimationFrame or setTimeout:
//begin the program
console.log('begin');
nonBlockingIncrement(100, function (currentI, done) {
if (done) {
console.log('0 incremented to ' + currentI);
}
});
console.log('do more stuff');
//define the slow function; this would normally be a server call
function nonBlockingIncrement(n, callback){
var i = 0;
function loop () {
if (i < n) {
i++;
callback(i, false);
(window.requestAnimationFrame || window.setTimeout)(loop);
}
else {
callback(i, true);
}
}
loop();
}
Using web worker:
/***** Your worker.js *****/
this.addEventListener('message', function (e) {
var i = 0;
while (i < e.data.target) {
i++;
}
this.postMessage({
done: true,
currentI: i,
caller: e.data.caller
});
});
/***** Your main program *****/
//begin the program
console.log('begin');
nonBlockingIncrement(100, function (currentI, done) {
if (done) {
console.log('0 incremented to ' + currentI);
}
});
console.log('do more stuff');
// Create web worker and callback register
var worker = new Worker('./worker.js'),
callbacks = {};
worker.addEventListener('message', function (e) {
callbacks[e.data.caller](e.data.currentI, e.data.done);
});
//define the slow function; this would normally be a server call
function nonBlockingIncrement(n, callback){
const caller = 'nonBlockingIncrement';
callbacks[caller] = callback;
worker.postMessage({
target: n,
caller: caller
});
}
The web worker solution cannot be run as an inline snippet, because it requires a separate worker.js file to host the worker logic.
You cannot execute two loops at the same time; remember that JS is single-threaded.
So this will never work:
function loopTest() {
var test = 0
for (var i = 0; i <= 100000000000; i++) {
test +=1
}
return test
}
setTimeout(()=>{
//This will block everything, so the second won't start until this loop ends
console.log(loopTest())
}, 1)
setTimeout(()=>{
console.log(loopTest())
}, 1)
If you want multithreading you have to use Web Workers, but they need a separate JS file and you can only pass objects to them.
However, I've managed to use Web Workers without separate files by generating Blobs, and I can pass callback functions to them too.
//A fileless Web Worker
class ChildProcess {
//@param {any} ags, Any kind of arguments that will be used in the callback, functions too
constructor(...ags) {
this.args = ags.map(a => (typeof a == 'function') ? {type:'fn', fn:a.toString()} : a)
}
//@param {function} cb, To be executed, the params must be the same number of passed in the constructor
async exec(cb) {
var wk_string = this.worker.toString();
wk_string = wk_string.substring(wk_string.indexOf('{') + 1, wk_string.lastIndexOf('}'));
var wk_link = window.URL.createObjectURL( new Blob([ wk_string ]) );
var wk = new Worker(wk_link);
wk.postMessage({ callback: cb.toString(), args: this.args });
var resultado = await new Promise((next, error) => {
wk.onmessage = e => (e.data && e.data.error) ? error(e.data.error) : next(e.data);
wk.onerror = e => error(e.message);
})
wk.terminate(); window.URL.revokeObjectURL(wk_link);
return resultado
}
worker() {
onmessage = async function (e) {
try {
var cb = new Function(`return ${e.data.callback}`)();
var args = e.data.args.map(p => (p.type == 'fn') ? new Function(`return ${p.fn}`)() : p);
try {
var result = await cb.apply(this, args); //If it is a promise or async function
return postMessage(result)
} catch (e) { throw new Error(`CallbackError: ${e}`) }
} catch (e) { postMessage({error: e.message}) }
}
}
}
setInterval(()=>{console.log('Not blocked code ' + Math.random())}, 1000)
console.log("starting blocking synchronous code in Worker")
console.time("\nblocked");
var proc = new ChildProcess(blockCpu, 43434234);
proc.exec(function(block, num) {
//This will block for 10 sec, but
block(10000) //This blockCpu function is defined below
return `\n\nbla bla ${num}\n` //Captured in the resolved promise
}).then(function (result){
console.timeEnd("\nblocked")
console.log("End of blocking code", result)
})
.catch(function(error) { console.log(error) })
//random blocking function
function blockCpu(ms) {
var now = new Date().getTime();
var result = 0
while(true) {
result += Math.random() * Math.random();
if (new Date().getTime() > now +ms)
return;
}
}
For very long tasks, a Web Worker should be preferred. However, for small enough tasks (less than a couple of seconds), or when you can't move the task to a Worker (e.g. because you need to access the DOM), Alnitak's solution of splitting the code into chunks is the way to go.
Nowadays, this can be rewritten in a cleaner way thanks to async/await syntax.
Also, instead of waiting for setTimeout() (which is delayed by at least 1ms in Node.js and by 4ms everywhere after the 5th recursive call), it's better to use a MessageChannel.
So this gives us
const waitForNextTask = () => {
const { port1, port2 } = waitForNextTask.channel ??= new MessageChannel();
return new Promise( (res) => {
port1.addEventListener("message", () => res(), { once: true } );
port1.start();
port2.postMessage("");
} );
};
async function doSomethingSlow() {
const chunk_size = 10000;
// do something slow, like counting from 0 to Infinity
for (let i = 0; i < Infinity; i++ ) {
// we've done a full chunk, let the event-loop loop
if( i % chunk_size === 0 ) {
log.textContent = i; // just for demo, to check we're really doing something
await waitForNextTask();
}
}
console.log("Ah! Did it!");
}
console.log("starting my slow computation");
doSomethingSlow();
console.log("started my slow computation");
setTimeout(() => console.log("my slow computation is probably still running"), 5000);
<pre id="log"></pre>
Using ECMAScript async functions, it is easy to write non-blocking code, even for CPU-bound operations. Let's try this on a typical academic task: computing a Fibonacci number for a very large index.
All you need is to insert an operation that allows the event loop to be reached from time to time. Using this approach, you will never freeze the user interface or I/O.
Basic implementation:
const fibAsync = async (n) => {
let lastTimeCalled = Date.now();
let a = 1n,
b = 1n,
sum,
i = n - 2;
while (i-- > 0) {
sum = a + b;
a = b;
b = sum;
if (Date.now() - lastTimeCalled > 15) { // Do we need to poll the eventloop?
lastTimeCalled = Date.now();
await new Promise((resolve) => setTimeout(resolve, 0)); // do that
}
}
return b;
};
And now we can use it:
let ticks = 0;
console.warn("Calculation started");
fibAsync(100000)
.then((v) => console.log(`Ticks: ${ticks}\nResult: ${v}`), console.warn)
.finally(() => {
clearInterval(timer); // the timer was created with setInterval
});
const timer = setInterval(
() => console.log("timer tick - event loop is not frozen", ticks++),
0
);
As we can see, the timer keeps running normally, which indicates that the event loop is not blocked.
I published an improved implementation of these helpers as the antifreeze2 npm package. It uses setImmediate internally, so for maximum performance you need to import a setImmediate polyfill in environments without native support.
import { antifreeze, isNeeded } from "antifreeze2";
const fibAsync = async (n) => {
let a = 1n,
b = 1n,
sum,
i = n - 2;
while (i-- > 0) {
sum = a + b;
a = b;
b = sum;
if (isNeeded()) {
await antifreeze();
}
}
return b;
};
If you are using jQuery, I created a Deferred-based implementation of Alnitak's answer:
function deferredEach (arr, batchSize) {
var deferred = $.Deferred();
var index = 0;
function chunk () {
var lastIndex = Math.min(index + batchSize, arr.length);
for(;index<lastIndex;index++){
deferred.notify(index, arr[index]);
}
if (index >= arr.length) {
deferred.resolve();
} else {
setTimeout(chunk, 0);
}
};
setTimeout(chunk, 0);
return deferred.promise();
}
Then you'll be able to use the returned promise to manage the progress and done callback:
var testArray =["Banana", "Orange", "Apple", "Mango"];
deferredEach(testArray, 2).progress(function(index, item){
alert(item);
}).done(function(){
alert("Done!");
})
I managed to get an extremely short algorithm using functions. Here is an example:
let l=($,a,f,r)=>{f(r||0),$((r=a(r||0))||0)&&l($,a,f,r)};
l
(i => i < 4, i => i+1, console.log)
/*
output:
0
1
2
3
*/
I know this looks very complicated, so let me explain what is really going on here.
Here is a slightly simplified version of the l function.
let l_smpl = (a,b,c,d) => {c(d||0);d=b(d||0),a(d||0)&&l_smpl(a,b,c,d)||0}
First step in the loop, l_smpl calls your callback and passes in d - the index. If d is undefined, as it would be on the first call, it changes it to 0.
Next, it updates d by calling your updater function and setting d to the result. In our case, the updater function would add 1 to the index.
The next step checks if your condition is met by calling the first function and checking if the value is true meaning the loop is not done. If so, it calls the function again, or otherwise, it returns 0 to end the loop.
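The three steps just described can be spelled out in a longer form. This is my own readable expansion of the one-liner, with made-up parameter names, rather than the author's code:

```javascript
// Readable expansion of the one-liner; all names here are illustrative.
function loop(condition, update, body, index) {
  index = index || 0;            // first call: undefined becomes 0
  body(index);                   // 1. run the callback with the current index
  var next = update(index) || 0; // 2. compute the next index
  if (condition(next)) {         // 3. recurse only while the condition holds
    loop(condition, update, body, next);
  }
}

var seen = [];
loop(
  function (i) { return i < 4; },
  function (i) { return i + 1; },
  function (i) { seen.push(i); }
);
console.log(seen); // [ 0, 1, 2, 3 ]
```

Like the original, this runs the body once before the condition is first checked, and it recurses synchronously, so the call stack grows with the number of iterations.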

The program execute commands not by the logic order they are put in [duplicate]

Suppose you need to do some operations that depend on some temp file. Since
we're talking about Node here, those operations are obviously asynchronous.
What is the idiomatic way to wait for all operations to finish in order to
know when the temp file can be deleted?
Here is some code showing what I want to do:
do_something(tmp_file_name, function(err) {});
do_something_other(tmp_file_name, function(err) {});
fs.unlink(tmp_file_name);
But if I write it this way, the third call can be executed before the first two
get a chance to use the file. I need some way to guarantee that the first two
calls already finished (invoked their callbacks) before moving on without nesting
the calls (and making them synchronous in practice).
I thought about using event emitters on the callbacks and registering a counter
as receiver. The counter would receive the finished events and count how many
operations were still pending. When the last one finished, it would delete the
file. But there is the risk of a race condition and I'm not sure this is
usually how this stuff is done.
How do Node people solve this kind of problem?
Update:
I would now advise having a look at:
Promises
The Promise object is used for deferred and asynchronous computations.
A Promise represents an operation that hasn't completed yet, but is
expected in the future.
A popular promises library is bluebird. I would advise having a look at why promises.
You should use promises to turn this:
fs.readFile("file.json", function (err, val) {
if (err) {
console.error("unable to read file");
}
else {
try {
val = JSON.parse(val);
console.log(val.success);
}
catch (e) {
console.error("invalid json in file");
}
}
});
Into this:
fs.readFileAsync("file.json").then(JSON.parse).then(function (val) {
console.log(val.success);
})
.catch(SyntaxError, function (e) {
console.error("invalid json in file");
})
.catch(function (e) {
console.error("unable to read file");
});
generators: For example via co.
Generator based control flow goodness for nodejs and the browser,
using promises, letting you write non-blocking code in a nice-ish way.
var co = require('co');
co(function *(){
// yield any promise
var result = yield Promise.resolve(true);
}).catch(onerror);
co(function *(){
// resolve multiple promises in parallel
var a = Promise.resolve(1);
var b = Promise.resolve(2);
var c = Promise.resolve(3);
var res = yield [a, b, c];
console.log(res);
// => [1, 2, 3]
}).catch(onerror);
// errors can be try/catched
co(function *(){
try {
yield Promise.reject(new Error('boom'));
} catch (err) {
console.error(err.message); // "boom"
}
}).catch(onerror);
function onerror(err) {
// log any uncaught errors
// co will not throw any errors you do not handle!!!
// HANDLE ALL YOUR ERRORS!!!
console.error(err.stack);
}
If I understand correctly, I think you should have a look at the very good async library, and at series in particular. The snippets below are copied from its GitHub page:
async.series([
function(callback){
// do some stuff ...
callback(null, 'one');
},
function(callback){
// do some more stuff ...
callback(null, 'two');
},
],
// optional callback
function(err, results){
// results is now equal to ['one', 'two']
});
// an example using an object instead of an array
async.series({
one: function(callback){
setTimeout(function(){
callback(null, 1);
}, 200);
},
two: function(callback){
setTimeout(function(){
callback(null, 2);
}, 100);
},
},
function(err, results) {
// results is now equals to: {one: 1, two: 2}
});
As a plus this library can also run in the browser.
The simplest way is to increment an integer counter when you start an async operation and then, in the callback, decrement the counter. Depending on the complexity, the callback could check the counter for zero and then delete the file.
A little more complex would be to maintain a list of objects, where each object has whatever attributes you need to identify the operation (it could even be the function call) as well as a status code. The callbacks would set the status code to completed.
Then you would have a loop that waits (using process.nextTick) and checks whether all tasks are completed. The advantage of this method over the counter is that, if all outstanding tasks can complete before all tasks have been issued, the counter technique would cause you to delete the file prematurely.
// simple countdown latch
function CDL(countdown, completion) {
this.signal = function() {
if(--countdown < 1) completion();
};
}
// usage
var latch = new CDL(10, function() {
console.log("latch.signal() was called 10 times.");
});
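Wired up to the temp-file problem from the question, the latch might be used as in the sketch below. The two do_something functions are hypothetical stand-ins (a read and a stat), and useThenDelete and the temp file name are made up for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// simple countdown latch, repeated from above so the sketch runs on its own
function CDL(countdown, completion) {
  this.signal = function () {
    if (--countdown < 1) completion();
  };
}

// Hypothetical stand-ins for the question's do_something / do_something_other.
function do_something(file, cb) {
  fs.readFile(file, 'utf8', (err) => cb(err));
}
function do_something_other(file, cb) {
  fs.stat(file, (err) => cb(err));
}

function useThenDelete(file, done) {
  // The count is fixed up front, so the latch cannot fire before
  // both operations have reported back.
  const latch = new CDL(2, () => fs.unlink(file, done));
  // Signal even on error so the file is always cleaned up.
  do_something(file, (err) => { if (err) console.error(err); latch.signal(); });
  do_something_other(file, (err) => { if (err) console.error(err); latch.signal(); });
}

const tmp_file_name = path.join(os.tmpdir(), 'latch-demo-' + process.pid + '.tmp');
fs.writeFileSync(tmp_file_name, 'scratch data');
useThenDelete(tmp_file_name, (err) => {
  if (err) throw err;
  console.log('temp file removed');
});
```

Because the latch is created with the total number of operations rather than incremented as each one starts, it avoids the premature-completion case described above.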
There is no "native" solution, but there are a million flow control libraries for node. You might like Step:
Step(
function(){
do_something(tmp_file_name, this.parallel());
do_something_else(tmp_file_name, this.parallel());
},
function(err) {
if (err) throw err;
fs.unlink(tmp_file_name);
}
)
Or, as Michael suggested, counters could be a simpler solution. Take a look at this semaphore mock-up. You'd use it like this:
do_something1(file, queue('myqueue'));
do_something2(file, queue('myqueue'));
queue.done('myqueue', function(){
fs.unlink(file);
});
I'd like to offer another solution that utilizes the speed and efficiency of the programming paradigm at the very core of Node: events.
Everything you can do with Promises or modules designed to manage flow-control, like async, can be accomplished using events and a simple state-machine, which I believe offers a methodology that is, perhaps, easier to understand than other options.
For example assume you wish to sum the length of multiple files in parallel:
const fs = require('fs'); // needed for fs.readFile below
const EventEmitter = require('events').EventEmitter;
// simple event-driven state machine
const sm = new EventEmitter();
// running state
let context={
tasks: 0, // number of total tasks
active: 0, // number of active tasks
results: [] // task results
};
const next = (result) => { // must be called when each task chain completes
if(result) { // preserve result of task chain
context.results.push(result);
}
// decrement the number of running tasks
context.active -= 1;
// when all tasks complete, trigger done state
if(!context.active) {
sm.emit('done');
}
};
// operational states
// start state - initializes context
sm.on('start', (paths) => {
const len=paths.length;
console.log(`start: beginning processing of ${len} paths`);
context.tasks = len; // total number of tasks
context.active = len; // number of active tasks
sm.emit('forEachPath', paths); // go to next state
});
// start processing of each path
sm.on('forEachPath', (paths)=>{
console.log(`forEachPath: starting ${paths.length} process chains`);
paths.forEach((path) => sm.emit('readPath', path));
});
// read contents from path
sm.on('readPath', (path) => {
console.log(` readPath: ${path}`);
fs.readFile(path,(err,buf) => {
if(err) {
sm.emit('error',err);
return;
}
sm.emit('processContent', buf.toString(), path);
});
});
// compute length of path contents
sm.on('processContent', (str, path) => {
console.log(` processContent: ${path}`);
next(str.length);
});
// when processing is complete
sm.on('done', () => {
const total = context.results.reduce((sum,n) => sum + n, 0); // initial value 0 so an empty result list does not throw
console.log(`The total of ${context.tasks} files is ${total}`);
});
// error state
sm.on('error', (err) => { throw err; });
// ======================================================
// start processing - ok, let's go
// ======================================================
sm.emit('start', ['file1','file2','file3','file4']);
Which will output:
start: beginning processing of 4 paths
forEachPath: starting 4 process chains
readPath: file1
readPath: file2
processContent: file1
readPath: file3
processContent: file2
processContent: file3
readPath: file4
processContent: file4
The total of 4 files is 4021
Note that the ordering of the process chain tasks is dependent upon system load.
You can envision the program flow as:
start -> forEachPath -+-> readPath1 -> processContent1 -+-> done
                      +-> readPath2 -> processContent2 -+
                      +-> readPath3 -> processContent3 -+
                      +-> readPath4 -> processContent4 -+
For reuse, it would be trivial to create a module to support the various flow-control patterns, i.e. series, parallel, batch, while, until, etc.
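As a rough idea of what such a module could contain, here is a sketch of a parallel helper built on the same event-driven approach. The function name, the event names and the task signature (each task takes a Node-style callback) are my assumptions, not an existing API.

```javascript
const EventEmitter = require('events');

// Hypothetical helper: run callback-style tasks in parallel and
// collect their results in task order.
function parallel(tasks, done) {
  const sm = new EventEmitter();
  const results = new Array(tasks.length);
  let active = tasks.length;
  let failed = false;

  sm.on('taskDone', (index, result) => {
    results[index] = result;
    if (--active === 0 && !failed) sm.emit('done');
  });
  sm.once('error', (err) => { failed = true; done(err); });
  sm.once('done', () => done(null, results));

  if (active === 0) { sm.emit('done'); return; }

  tasks.forEach((task, index) => {
    task((err, result) => {
      if (failed) return; // ignore late results after a failure
      if (err) sm.emit('error', err);
      else sm.emit('taskDone', index, result);
    });
  });
}

parallel([
  (cb) => setTimeout(() => cb(null, 'slow'), 20),
  (cb) => setTimeout(() => cb(null, 'fast'), 5),
], (err, results) => {
  if (err) throw err;
  console.log(results);
});
```

Results are stored by task index, so they come back in the order the tasks were given rather than the order in which they finished.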
The simplest solution is to run the do_something* and unlink in sequence as follows:
do_something(tmp_file_name, function(err) {
do_something_other(tmp_file_name, function(err) {
fs.unlink(tmp_file_name);
});
});
Unless you need to execute do_something() and do_something_other() in parallel for performance reasons, I suggest keeping it simple and going this way.
Wait.for https://github.com/luciotato/waitfor
using Wait.for:
var wait=require('wait.for');
...in a fiber...
wait.for(do_something,tmp_file_name);
wait.for(do_something_other,tmp_file_name);
fs.unlink(tmp_file_name);
With pure Promises it could be a bit more messy, but if you use Deferred Promises then it's not so bad:
Install:
npm install --save @bitbar/deferred-promise
Modify your code:
const DeferredPromise = require('@bitbar/deferred-promise');
const promises = [
new DeferredPromise(),
new DeferredPromise()
];
do_something(tmp_file_name, (err) => {
if (err) {
promises[0].reject(err);
} else {
promises[0].resolve();
}
});
do_something_other(tmp_file_name, (err) => {
if (err) {
promises[1].reject(err);
} else {
promises[1].resolve();
}
});
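Promise.all(promises).then( () => {
// fs.unlink requires a callback in current Node versions
fs.unlink(tmp_file_name, (err) => {
if (err) console.error(err);
});
});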
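For comparison, here is a sketch of the same idea using only built-in promises. It assumes the two operations follow the Node callback convention so that util.promisify can wrap them; the do_something functions, useThenDelete and the temp file name are hypothetical placeholders.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { promisify } = require('util');

// Hypothetical callback-style operations standing in for the question's
// do_something / do_something_other.
function do_something(file, cb) {
  fs.readFile(file, 'utf8', (err, text) => cb(err, err ? undefined : text.length));
}
function do_something_other(file, cb) {
  fs.stat(file, (err, stats) => cb(err, err ? undefined : stats.size));
}

async function useThenDelete(file) {
  // allSettled waits for both operations, even if one of them fails
  const settled = await Promise.allSettled([
    promisify(do_something)(file),
    promisify(do_something_other)(file),
  ]);
  await fs.promises.unlink(file); // nothing is using the file any more
  const failure = settled.find((r) => r.status === 'rejected');
  if (failure) throw failure.reason;
  return settled.map((r) => r.value);
}

const tmp_file_name = path.join(os.tmpdir(), `promise-demo-${process.pid}.tmp`);
fs.writeFileSync(tmp_file_name, 'scratch data');
useThenDelete(tmp_file_name).then(
  (results) => console.log('done, file removed:', results),
  (err) => console.error(err)
);
```

Promise.allSettled is used instead of Promise.all because Promise.all rejects as soon as one operation fails, which could delete the file while the other operation is still reading it.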

Make several requests to an API that can only handle 20 request a minute

I've got a method that returns a promise and internally that method makes a call to an API which can only have 20 requests every minute. The problem is that I have a large array of objects (around 300) and I would like to make a call to the API for each one of them.
At the moment I have the following code:
const bigArray = [.....];
Promise.all(bigArray.map(apiFetch)).then((data) => {
...
});
But it doesn't handle the timing constraint. I was hoping I could use something like _.chunk and _.debounce from lodash, but I can't wrap my mind around it. Could anyone help me out?
If you can use the Bluebird promise library, it has a concurrency feature built in that lets you limit a group of async operations to at most N in flight at a time.
var Promise = require('bluebird');
const bigArray = [....];
Promise.map(bigArray, apiFetch, {concurrency: 20}).then(function(data) {
// all done here
});
The nice thing about this interface is that it will keep 20 requests in flight. It will start up 20, then each time one finishes, it will start another. So, this is potentially more efficient than sending 20, waiting for all to finish, sending 20 more, etc.
This also provides the results in the exact same order as bigArray so you can identify which result goes with which request.
You could, of course, code this yourself with generic promises using a counter, but since it is already built into the Bluebird library, I thought I'd recommend that way.
The Async library also has a similar concurrency control, though it is obviously not promise based.
Here's a hand-coded version using only ES6 promises that maintains result order and keeps 20 requests in flight at all times (until there are fewer than 20 left) for maximum throughput:
function pMap(array, fn, limit) {
return new Promise(function(resolve, reject) {
var index = 0, cnt = 0, stop = false, results = new Array(array.length);
function run() {
while (!stop && index < array.length && cnt < limit) {
(function(i) {
++cnt;
++index;
fn(array[i]).then(function(data) {
results[i] = data;
--cnt;
// see if we are done or should run more requests
if (cnt === 0 && index === array.length) {
resolve(results);
} else {
run();
}
}, function(err) {
// set stop flag so no more requests will be sent
stop = true;
--cnt;
reject(err);
});
})(index);
}
}
run();
});
}
pMap(bigArray, apiFetch, 20).then(function(data) {
// all done here
}, function(err) {
// error here
});
Working demo here: http://jsfiddle.net/jfriend00/v98735uu/
You could send 1 block of 20 requests every minute or space them out 1 request every 3 seconds (latter probably preferred by the API owners).
function rateLimitedRequests(array, chunkSize) {
var delay = 3000 * chunkSize;
var remaining = array.length;
var promises = [];
var addPromises = function(newPromises) {
Array.prototype.push.apply(promises, newPromises);
// parentheses are required: == binds tighter than -=
if ((remaining -= newPromises.length) === 0) {
Promise.all(promises).then((data) => {
// ... do your thing
});
}
};
(function request() {
addPromises(array.splice(0, chunkSize).map(apiFetch));
if (array.length) {
setTimeout(request, delay);
}
})();
}
To call 1 every 3 seconds:
rateLimitedRequests(bigArray, 1);
Or 20 every minute:
rateLimitedRequests(bigArray, 20);
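The same spacing can also be written with async/await. This is only a sketch under my own naming: rateLimitedMap, sleep and fakeApiFetch are hypothetical, and the window length is a parameter so it can be tried with short delays.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Start at most `perWindow` calls, then wait `windowMs` before starting
// the next batch. Resolves with results in the order of `items`.
async function rateLimitedMap(items, fn, perWindow, windowMs) {
  const pending = [];
  for (let i = 0; i < items.length; i += perWindow) {
    if (i > 0) await sleep(windowMs);
    for (const item of items.slice(i, i + perWindow)) {
      const p = Promise.resolve(fn(item));
      // Mark early failures as handled while we are still sleeping;
      // Promise.all below still rejects with the first error.
      p.catch(() => {});
      pending.push(p);
    }
  }
  return Promise.all(pending);
}

// Hypothetical stand-in for apiFetch, using a short window for the demo.
const fakeApiFetch = (n) => sleep(5).then(() => n * 2);
rateLimitedMap([1, 2, 3, 4, 5], fakeApiFetch, 2, 20).then((data) => {
  console.log(data);
});
```

With the question's limits this would be called as rateLimitedMap(bigArray, apiFetch, 1, 3000) for one request every 3 seconds, or rateLimitedMap(bigArray, apiFetch, 20, 60000) for 20 per minute.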
If you prefer to use _.chunk together with _.throttle instead of _.debounce:
function rateLimitedRequests(array, chunkSize) {
var delay = 3000 * chunkSize;
var remaining = array.length;
var promises = [];
var addPromises = function(newPromises) {
Array.prototype.push.apply(promises, newPromises);
// parentheses are required: == binds tighter than -=
if ((remaining -= newPromises.length) === 0) {
Promise.all(promises).then((data) => {
// ... do your thing
});
}
};
var chunks = _.chunk(array, chunkSize);
var throttledFn = _.throttle(function() {
addPromises(chunks.pop().map(apiFetch));
}, delay, {leading: true});
for (var i = 0; i < chunks.length; i++) {
throttledFn();
}
}
You probably want _.throttle rather than _.debounce, since it executes each function call after a delay, whereas _.debounce groups multiple calls into one call. See this article linked from the docs:
Debounce: Think of it as "grouping multiple events in one". Imagine that you go home, enter in the elevator, doors are closing... and suddenly your neighbor appears in the hall and tries to jump on the elevator. Be polite! and open the doors for him: you are debouncing the elevator departure. Consider that the same situation can happen again with a third person, and so on... probably delaying the departure several minutes.
Throttle: Think of it as a valve, it regulates the flow of the executions. We can determine the maximum number of times a function can be called in certain time. So in the elevator analogy.. you are polite enough to let people in for 10 secs, but once that delay passes, you must go!

Populating async array with a function called right before.

var request = require('request'),
requests = [],
values = [],
request("url1", function());
function() {
.....
for (x in list){
requests.push(requestFunction(x));
}
}
requestFunction(x){
request("url2", function (e,r,b) {
....
return function(callback) {
values[i] = b
}
});
}
async.parallel(requests, function (allResults) {
// values array is ready at this point
// the data should also be available in the allResults array
console.log(values);
});
I am new to Node. The first request needs to complete in order to populate the requests array of callbacks, but async.parallel runs before the requests array is full, when it needs to run all of the callbacks. Where do I move the async.parallel call so that it runs after the requests array is full?
Asynchronous programming is all about chaining blocks. This allows node to efficiently run its event queue, while ensuring that your steps are done in order. For example, here's a query from a web app I wrote:
app.get("/admin", isLoggedIn, isVerified, isAdmin, function (req, res) {
User.count({}, function (err, users) {
if (err) throw err;
User.count({"verified.isVerified" : true}, function (err2, verifiedUsers) {
if (err2) throw err2;
Course.count({}, function (err3, courses) {
// and this continues on and on — the admin page
// has a lot of information on all the documents in the database
})
})
})
})
Notice how I chained function calls inside of one another. Course.count({}, ...) could only be called once User.count({"verified.isVerified" : true}, ...) was called. This means the i/o is never blocked and the /admin page is never rendered without the required information.
You didn't really give enough information regarding your problem (so there might be a better way to fix it), but I think you could, for now, do this:
var request = require('request'),
async = require('async'),
requests = [],
values = {}, // response bodies, keyed by x
length; // a counter to store the number of times to run the loop
request("url1", function() {
length = Object.keys(list).length;
// referring to the list below;
// make sure list is available at this level of scope
for (var x in list){
requests.push(requestFunction(x));
length--;
if (length == 0) {
async.parallel(requests, function (err, allResults) {
console.log(values); // prints the collected values
});
}
}
});
// returns a task function that async.parallel can call
function requestFunction(x) {
return function (callback) {
request("url2", function (e,r,b) {
values[x] = b;
callback(e, b); // tell async.parallel that this task is finished
});
};
}
I am assuming that requestFunction() takes a while to load, which is why async.parallel is running before the for (var x in list) loop finishes. To force async.parallel to run after the loop finishes, you'll need a counter.
var length = Object.keys(list).length;
This returns the number of keys in the list associative array (aka object). Now, every time you run through the for loop, you decrement length. When length == 0, you then run your async.parallel process.
edit: You could also write the requests.push() part as:
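requests.push(
(function (key) {
// the outer function runs immediately and returns the task for async.parallel
return function (callback) {
request("url2", function (e,r,b) {
values.push(b);
callback(e, b);
});
};
})(x)
);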
I think it's redundant to store b in both values and requests, but I have kept it as you had it.
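The counter idea can also be sketched without any library. The version below is a minimal illustration, not the original code: fetchValue is a hypothetical stand-in for request("url2", ...) that simulates the network with setTimeout, and fetchAll is a name made up for this example. The counter is decremented inside each callback, so it reaches zero only when the last reply has arrived.

```javascript
// fetchValue is a hypothetical stand-in for request("url2", ...);
// it answers asynchronously via setTimeout instead of hitting the network.
function fetchValue(key, callback) {
  setTimeout(function () {
    callback(null, "body for " + key);
  }, 10);
}

// Starts one call per key of list and invokes done once, after the last reply.
function fetchAll(list, done) {
  var keys = Object.keys(list);
  var values = {};
  var pending = keys.length; // number of replies still outstanding
  var failed = false;

  if (pending === 0) return done(null, values);

  keys.forEach(function (key) {
    fetchValue(key, function (err, body) {
      if (failed) return; // an earlier call already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      values[key] = body;
      pending--; // decrement inside the callback, not in the loop
      if (pending === 0) done(null, values);
    });
  });
}

fetchAll({ a: 1, b: 2 }, function (err, values) {
  if (err) throw err;
  console.log(values);
});
```

Counting in the callback rather than in the loop is the design choice that matters here: the loop itself finishes long before any reply comes back.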

MongoDB and Node js Asynchronous programming

I am trying to solve an exam problem, so I cannot post my exam code as it is. I have simplified it so that it shows only the core concept that I do not understand. Basically, I do not know how to slow down Node's asynchronous execution so that my Mongo code can catch up with it. Here is the code:
MongoClient.connect('mongodb://localhost:27017/somedb', function(err, db) {
if (err) throw err;
var orphans = [];
for (var i = 0; i < 100000; i++) {
var query = { 'images' : i };
db.collection('albums').findOne(query, function(err, doc_album) {
if(err) throw err;
if (doc_album === null) {
orphans.push(i);
}
});
}
console.dir(orphans.length);
return db.close();
});
So I am trying to create an array of those images that do not match my query criteria. I end up with an orphans.length value of 0 since Node does not wait for the callbacks to finish. How can I modify the code so that the callbacks finish executing before I count the number of images in the array that did not meet my query criteria?
Thanks in advance for your time.
Bharat
I assume you want to do 100000 parallel DB calls. To "wait" for all 100000 calls to complete, each call's callback increments a counter of finished calls and invokes the main callback when the last one has finished. Note that a very common mistake here is to use the for loop variable as a closure variable inside the callback. This does not work as expected: all 100000 handlers are scheduled first, and by the time the first one executes, the loop variable already holds its final, maximum value.
function getOrphans(cb) {
MongoClient.connect('mongodb://localhost:27017/somedb', function(err, db) {
if (err) return cb(err);
var orphans = [];
var numResponses = 0;
var maxIndex = 100000;
var firstError = null; // remembered so that cb is called exactly once
for (var i = 0; i < maxIndex; i++) {
// problem: by the time you get a reply, "i" would be 100000.
// closure variable changed to function argument:
(function(index) {
var query = { 'images' : index };
db.collection('albums').findOne(query, function(err, doc_album) {
numResponses++;
if (err) {
firstError = firstError || err;
} else if (doc_album === null) {
orphans.push(index);
}
if (numResponses == maxIndex) {
db.close();
cb(firstError, orphans);
}
});
})(i); // this is an "immediately executed function"
}
});
}
getOrphans(function(err, o) {
if (err)
return console.log('error:', err);
console.log(o.length);
});
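The loop variable pitfall described above can be reproduced without a database. The following is a minimal sketch in which setTimeout stands in for the delayed DB reply; collect and record are names made up for this illustration. It also shows that, on engines supporting ES2015, declaring the loop variable with let gives each iteration its own binding, which is an alternative to the immediately executed function.

```javascript
// Collects the value of the loop variable as seen by each delayed callback.
function collect(useLet, done) {
  var seen = [];
  var total = 3;
  function record(value) {
    seen.push(value);
    if (seen.length === total) done(seen); // the last callback reports the result
  }
  if (useLet) {
    for (let i = 0; i < total; i++) {
      setTimeout(function () { record(i); }, 0); // each iteration has its own i
    }
  } else {
    for (var j = 0; j < total; j++) {
      setTimeout(function () { record(j); }, 0); // all callbacks share one j
    }
  }
}

collect(false, function (seen) { console.log("var:", seen); }); // var: [ 3, 3, 3 ]
collect(true, function (seen) { console.log("let:", seen); });  // let: [ 0, 1, 2 ]
```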
I'm not suggesting this is the best way to handle this specific problem in Mongo, but if you need to wait for the DB to reply before continuing, then just use the callback to start the next request.
This is not obvious at first, but you can refer to the result processing function inside the function itself:
var i = 0;
var orphans = [];
var mycback = function(err, doc_album) {
if (err) return result_cback(err);
// process i-th result
if (doc_album === null) orphans.push(i);
if (++i < 100000) {
db.collection('albums').findOne({'images': i}, mycback);
} else {
// request is complete, "return" result
result_cback(null, orphans);
}
};
db.collection('albums').findOne({'images': 0}, mycback);
This also means that your function itself will be async (i.e. will want a result_cback parameter to call with the result instead of using return).
Writing a sync function that calls an async one is just not possible.
You cannot "wait" for an event in Javascript... you must set up a handler for the result and then terminate.
Waiting for an event is done in event-based processing by writing a "nested event loop" and this is for example how message boxes are handled in most GUI frameworks. This is a capability that Javascript designers didn't want to give to programmers (not really sure why, though).
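On Node versions that support async/await, the same one-at-a-time chaining can be written as an ordinary loop. The function is still asynchronous: it hands back a promise rather than the value itself, so the caller again needs a handler. This is a minimal sketch only; findAlbum is a hypothetical stand-in for a promise-returning findOne, and its rule that every third image has no album is invented for the example.

```javascript
// findAlbum is a hypothetical stand-in for a promise-returning findOne;
// here every third image id has no album.
function findAlbum(imageId) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(imageId % 3 === 0 ? null : { images: [imageId] });
    }, 1);
  });
}

// Issues the lookups strictly one after another.
async function findOrphans(count) {
  const orphans = [];
  for (let i = 0; i < count; i++) {
    const album = await findAlbum(i); // suspends this function, not the event loop
    if (album === null) orphans.push(i);
  }
  return orphans; // delivered through the returned promise
}

findOrphans(10).then(function (orphans) {
  console.log(orphans.length); // 4 (ids 0, 3, 6 and 9)
});
```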
Since you know it does not wait for the call to come back, you can do the console.dir inside your callback function. This should work (although I haven't tested it):
db.collection('albums').findOne(query, function(err, doc_album) {
if(err) throw err;
if (doc_album === null) {
orphans.push(i);
}
console.dir(orphans.length);
});
You don't need to slow anything down. If you are simply trying to look up 100,000 images in the albums collection, you could consider using the async library. This will let you queue tasks and be notified when the whole job is complete.
Also, you probably don't want to request 100,000 records one by one. Instead, you probably want to page them.
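One way to read the paging suggestion is to issue the lookups in fixed-size batches, so that only a bounded number of queries is in flight at any time. The sketch below illustrates that reading under stated assumptions: lookup is a hypothetical stand-in for the database call (it treats even ids as missing), and the batch size is arbitrary.

```javascript
// lookup is a hypothetical stand-in for a promise-returning database query;
// here even ids are treated as missing.
function lookup(id) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(id % 2 === 0 ? null : { images: [id] });
    }, 1);
  });
}

// Processes ids 0..total-1 in batches; each batch runs concurrently,
// and the next batch starts only after the previous one has finished.
async function findOrphansInBatches(total, batchSize) {
  const orphans = [];
  for (let start = 0; start < total; start += batchSize) {
    const ids = [];
    for (let id = start; id < Math.min(start + batchSize, total); id++) {
      ids.push(id);
    }
    const docs = await Promise.all(ids.map(lookup));
    docs.forEach(function (doc, k) {
      if (doc === null) orphans.push(ids[k]);
    });
  }
  return orphans;
}

findOrphansInBatches(10, 4).then(function (orphans) {
  console.log(orphans);
});
```

With a real database, fetching a whole range of documents in a single query per batch would likely be cheaper than one query per id; the batching loop itself would stay the same.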
