Batch requests in Node.js

My program is communicating with a web service that only accepts ~10 requests per second. From time to time, my program sends 100+ concurrent requests to the web service, causing my program to crash.
How do I limit concurrent requests in Node.js to 5 per second? I'm using the request library.
// IF EVENT AND SENDER
if (data.sender[0].events && data.sender[0].events.length > 0) {
    // FIND ALL EVENTS
    for (var i = 0; i < data.sender[0].events.length; i++) {
        // IF TYPE IS "ADDED"
        if (data.sender[0].events[i].type == "added") {
            switch (data.sender[0].events[i].link.rel) {
                case "contact":
                    batch("added", data.sender[0].events[i].link.href);
                    //_initContacts(data.sender[0].events[i].link.href);
                    break;
            }
        // IF TYPE IS "UPDATED"
        } else if (data.sender[0].events[i].type == "updated") {
            switch (data.sender[0].events[i].link.rel) {
                case "contactPresence":
                    batch("updated", data.sender[0].events[i].link.href);
                    //_getContactPresence(data.sender[0].events[i].link.href);
                    break;
                case "contactNote":
                    batch("updated", data.sender[0].events[i].link.href);
                    //_getContactNote(data.sender[0].events[i].link.href);
                    break;
                case "contactLocation":
                    batch("updated", data.sender[0].events[i].link.href);
                    //_getContactLocation(data.sender[0].events[i].link.href);
                    break;
                case "presenceSubscription":
                    batch("updated", data.sender[0].events[i].link.href);
                    //_extendPresenceSubscription(data.sender[0].events[i].link.href);
                    break;
            }
        }
    }
}
And then the homegrown batch method:
var updated = [];
var added = [];

var batch = function(type, url){
    console.log("batch called");
    if (type === "added"){
        console.log("Added batched");
        added.push(url);
        if (added.length > 5) {
            setTimeout(added.forEach(function(req){
                _initContacts(req);
            }), 2000);
            added = [];
        }
    }
    else if (type === "updated"){
        console.log("Updated batched");
        updated.push(url);
        console.log("Updated length is : ", updated.length);
        if (updated.length > 5){
            console.log("Over 5 updated events");
            updated.forEach(function(req){
                setTimeout(_getContactLocation(req), 2000);
            });
            updated = [];
        }
    }
};
And an example of the actual request:
var _getContactLocation = function(url){
    r.get(baseUrl + url,
        { "strictSSL" : false, "headers" : { "Authorization" : "Bearer " + accessToken } },
        function(err, res, body){
            if (err)
                console.log(err);
            else {
                var data = JSON.parse(body);
                self.emit("data.contact", data);
            }
        }
    );
};

Using the async library, the mapLimit function does exactly what you want. I can't provide an example for your specific use case as you did not provide any code.
From the readme:
mapLimit(arr, limit, iterator, callback)
The same as map only no more than "limit" iterators will be simultaneously
running at any time.
Note that the items are not processed in batches, so there is no guarantee that
the first "limit" iterator functions will complete before any others are
started.
Arguments
arr - An array to iterate over.
limit - The maximum number of iterators to run at any time.
iterator(item, callback) - A function to apply to each item in the array.
The iterator is passed a callback(err, transformed) which must be called once
it has completed with an error (which can be null) and a transformed item.
callback(err, results) - A callback which is called after all the iterator
functions have finished, or an error has occurred. Results is an array of the
transformed items from the original array.
Example
async.mapLimit(['file1','file2','file3'], 1, fs.stat, function(err, results){
// results is now an array of stats for each file
});
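For cases where all the tasks are known up front, for example a list of URLs collected before starting, a rough, untested sketch of mapLimit driving the request library might look like this; the urls array is hypothetical, and baseUrl plus the strictSSL option are borrowed from the question:
var async = require('async');
var request = require('request');

// hypothetical list of hrefs collected up front
var urls = ['/contacts/1', '/contacts/2', '/contacts/3'];

async.mapLimit(urls, 5, function (url, callback) {
    request.get(baseUrl + url, { strictSSL: false }, function (err, res, body) {
        if (err) return callback(err);
        callback(null, JSON.parse(body)); // hand the parsed body back to mapLimit
    });
}, function (err, results) {
    // results holds the parsed responses, in the same order as urls
    if (err) console.log(err);
});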
EDIT: Now that you provided code, I see that your use is a bit different from what I assumed. The async library is more useful when you know all the tasks to run up front. I don't know of a library off hand that will easily solve this for you. The above note is likely still relevant to people searching this topic so I'll leave it in.
Sorry, I don't have time to restructure your code, but this is an (un-tested) example of a function that makes an asynchronous request while self-throttling itself to 5 requests per second. I would highly recommend working off of this to come up with a more general solution that fits your code base.
var throttledRequest = (function () {
    var queue = [], running = 0;

    function sendPossibleRequests() {
        var url;
        while (queue.length > 0 && running < 5) {
            url = queue.shift();
            running++;
            r.get(url, { /* YOUR OPTIONS HERE */ }, function (err, res, body) {
                running--;
                sendPossibleRequests();
                if (err)
                    console.log(err);
                else {
                    var data = JSON.parse(body);
                    self.emit("data.contact", data);
                }
            });
        }
    }

    return function (url) {
        queue.push(url);
        sendPossibleRequests();
    };
})();
Basically, you keep a queue of all the data to be asynchronously processed (such as urls to be requested) and then after each callback (from a request) you try to launch off as many remaining requests as possible.
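To wire this into the event-handling code from the question, each batch() call could simply hand the href to the throttled function instead (a sketch; the baseUrl prefixing moves here because the function above passes the url straight to r.get):
// e.g. inside the "added"/"updated" switches, instead of batch(...):
throttledRequest(baseUrl + data.sender[0].events[i].link.href);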

This is precisely what node's Agent class is designed to address. Have you done something silly like require('http').globalAgent.maxSockets = Number.MAX_VALUE or passed agent: false as a request option?
With Node's default behavior (the global agent's maxSockets is 5 per host), your program will not send more than 5 concurrent requests to the service at a time. Additionally, the Agent provides optimizations that a simple queue cannot (namely HTTP keepalives).
If you try to make many requests (for example, issue 100 requests from a loop), the first 5 will begin and the Agent will queue the remaining 95. As requests complete, it starts the next.
What you probably want to do is create an Agent for your web service requests, and pass it in to every call to request (rather than mixing requests in with the global agent).
var http = require('http'),
    svcAgent = new http.Agent();

request({ ... , agent: svcAgent });
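A slightly fuller, untested sketch of the same idea; the maxSockets value and the service host are placeholders, and an https service would need an https.Agent instead:
var http = require('http');
var request = require('request');

// dedicated connection pool for the web service; at most 5 sockets are
// opened to it at any time, extra requests queue inside the agent
var svcAgent = new http.Agent({ maxSockets: 5 });

function callService(path, callback) {
    request({
        url: 'http://api.example.com' + path, // placeholder host
        agent: svcAgent,                      // every call shares this agent's pool
        json: true
    }, callback);
}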

Related

MEAN / Node.js requests interfering with each other under many concurrent requests

I have a MEAN app that works well with single requests, say calling /api/products?pid=500. But I recently discovered that during a "burst" of requests (I'm bulk-updating around 50 products, i.e. 50 requests to /api/products?pid=500 … 550 with POST data), req.body sometimes ends up with the value from a newer, incoming request.
The front-end app makes the calls in a forEach over the selected products:
ds.forEach((d, key) => {
    this.ApiCall.setData('products', { action: 'send-product', data: d })
        .subscribe((result) => {
            // we have results
        });
});
// setData makes a http.post().map
The back end (MEAN) handles the POST; I've tried to condense the code:
router.route('/')
    .post(function (req, response) {
        if (req.body.data) {
            var obj = { id: req.body.data.product_id };
            if (req.body.data.linked_products) {
                req.body.data.linked_products.forEach(function (entry) {
                    obj.linked = entry; // more ifs
                });
            }
            var async = require('async');
            async.series({
                q2: function (cb) {
                    queryProducts.findOne({ id: req.body.data.product_id }, null).exec(cb);
                },
                q3: function (cb) {
                    queryCategories.findOne({ id: req.body.data.category_id }, null).exec(cb);
                }
            }, function (err, qResults) {
                var alreadysent = false;
                if (qResults.q3) qResults.q3.logs.forEach(function (entry) {
                    if (entry.sent) {
                        alreadysent = true;
                    }
                });
                // more ifs
                qResults.q3.external_codes.forEach(function (entry) {
                    obj.external_code = entry; // more ifs
                });
                if (req.body.data.price < 0) {
                    response.json({ message: "Negative price didn't sent" });
                    return;
                }
                if (qResults.q2.status == "inactive") {
                    response.json({ message: "Inactive didn't sent" });
                    return;
                }
                req.body.data.campaigns.forEach(function (entry) {
                    obj.price_offers = entry; // more ifs
                });
                // more ifs and foreach similar
                queryProducts.update({ id: req.body.data.id }, { $push: { synced_products: obj } }, function (err, result) {
                    // HERE I found req.body.data with values of a future request
                    if (!err)
                        response.json({ message: "Sent" });
                    return;
                });
            });
        }
    });
module.exports = router;
I understand that the requests
/api/products?pid=500
/api/products?pid=501
/api/products?pid=502
/api/products?pid=503
...
have different timings, but how is it possible that, while handling one request (pid=501), the last use of req.body ends up seeing the value of req.body from a newer request (pid=503)?
Any ideas how to avoid this? Putting the async call first, right after the POST, or making a
var reqbody = req.body
Thanks!
I believe this is due to the async module initialization. To quote from the node docs:
Caching
Modules are cached after the first time they are loaded. This means (among other things) that every call to require('foo') will get exactly the same object returned, if it would resolve to the same file.
Multiple calls to require('foo') may not cause the module code to be executed multiple times. This is an important feature. With it, "partially done" objects can be returned, thus allowing transitive dependencies to be loaded even when they would cause cycles.
To have a module execute code multiple times, export a function, and call that function.
When a burst of requests causes overlapping execution, you will have two (or more) uses of the async variable being modified "concurrently". I would suggest using some sort of mutex to control access to the async variable.
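Along those lines, here is a minimal sketch of the kind of restructuring suggested above: require async once at module scope and keep everything derived from req in local variables inside the handler. The model names come from the question; this is not a drop-in replacement for the full route:
var express = require('express');
var async = require('async'); // loaded once, when this module is first required

var router = express.Router();

router.route('/').post(function (req, response) {
    // locals: overlapping requests each work on their own copies
    var data = req.body.data;
    if (!data) return response.json({ message: "No data" });

    var obj = { id: data.product_id };

    async.series({
        q2: function (cb) { queryProducts.findOne({ id: data.product_id }).exec(cb); },
        q3: function (cb) { queryCategories.findOne({ id: data.category_id }).exec(cb); }
    }, function (err, qResults) {
        // ... same checks as in the question, always reading from `data` and `obj`
    });
});

module.exports = router;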

Parse Javascript Cloud Code .save() works only on a single user

I use Parse in iOS to run a Cloud Code method that gets an ID in its request and receives a number in the response.
The purpose of the cloud code function is to take the request ID and add it to a field of 3 different users.
Here is the cloud code method in Javascript:
amount = 3;

// Use Parse.Cloud.define to define as many cloud functions as you want.
// For example:
Parse.Cloud.define("addToIDs", function(request, response) {
    var value = request.params.itemId;
    var query = new Parse.Query(Parse.User);
    query.ascending("createdAt");
    query.limit(100);
    query.find({
        success: function(results) {
            var sent = 0;
            for (var i = 0; i < results.length; i++) {
                var idlst = results[i].get("idString");
                if (idlst != null && idlst.indexOf(value) <= -1) {
                    idlst += value + "|";
                    results[i].set("idString", idlst);
                    results[i].save();
                    sent = sent + 1;
                }
                if (sent >= amount) {
                    break;
                }
            }
            response.success(sent);
        },
        error: function() {
            response.error("Test failed");
        }
    });
});
When running this cloud code method I get a response of '3', meaning it called .save for 3 users. The problem is that when I go back to look in the Database viewer on the Parse website, it actually only updated a single user (it's always the same user). No matter how many times I run this code, it will only actually update the first user.
Anyone know why this is happening?
Both save and saveAll are asynchronous, so you should make sure the saving process is done.
Also note that a user object can only be updated by its owner or by a request using the master key.
The following code should work:
var amount = 3;

Parse.Cloud.define("addToIDs", function(request, response) {
    var value = request.params.itemId;
    var query = new Parse.Query(Parse.User);
    query.ascending("createdAt");
    query.limit(100);
    return query.find()
        .then(function(results) { // success
            var toSave = [];
            var promise = new Parse.Promise();
            for (var i = 0; i < results.length; i++) {
                var idlst = results[i].get("idString");
                if (idlst != null && idlst.indexOf(value) <= -1) {
                    idlst += value + "|";
                    results[i].set("idString", idlst);
                    toSave.push(results[i]);
                }
                if (toSave.length >= amount) {
                    break;
                }
            }
            // use saveAll to save multiple objects without bursting multiple requests
            Parse.Object.saveAll(toSave, {
                useMasterKey: true,
                success: function(list) {
                    promise.resolve(list.length);
                },
                error: function() {
                    promise.reject();
                }
            });
            return promise;
        }).then(function(length) { // success
            response.success(length);
        }, function() { // error
            response.error("Test failed");
        });
});
The reason this is happening is two-fold:
save() is an asynchronous method, and
response.success() will immediately kill your running code as soon as it's called.
So what's happening is that inside your for loop you're running save() several times, but since it's asynchronous, they're simply thrown into the processing queue and your for loop continues on through. So it's quickly throwing all of your save()'s into the processing queue, and then it reaches your response.success() call but, by the time it's reached, only one of the save()'s has had a chance to successfully process.
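If you would rather keep the individual save() calls, another sketch (assuming the same Parse.Promise-era SDK as the code above, where save() returns a promise and Parse.Promise.when accepts an array of promises) is to collect those promises and only report success once they have all resolved:
var amount = 3;

Parse.Cloud.define("addToIDs", function(request, response) {
    var value = request.params.itemId;
    var query = new Parse.Query(Parse.User);
    query.ascending("createdAt");
    query.limit(100);
    query.find().then(function(results) {
        var saves = [];
        for (var i = 0; i < results.length && saves.length < amount; i++) {
            var idlst = results[i].get("idString");
            if (idlst != null && idlst.indexOf(value) <= -1) {
                results[i].set("idString", idlst + value + "|");
                // save() returns a promise; collect it instead of firing and forgetting
                saves.push(results[i].save(null, { useMasterKey: true }));
            }
        }
        // resolve only after every queued save has finished
        return Parse.Promise.when(saves).then(function() {
            response.success(saves.length);
        });
    }, function() {
        response.error("Test failed");
    });
});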

Javascript memory consumption with map() over a large set and callbacks

I don't even know how to properly ask this question, but I have concerns about the performance (mostly memory consumption) of the following code. I am anticipating that this code will consume a lot of memory, because of the map over a large set and the many 'hanging' functions that wait for an external service. Are my concerns justified here? What would be a better approach?
var list = fs.readFileSync('./mailinglist.txt', 'utf8') // say 1.000.000 records
    .split("\n")
    .map( processEntry );

var processEntry = function _processEntry(i){
    i = i.split('\t');
    getEmailBody( function(emailBody, name){
        var msg = {
            "message" : emailBody,
            "name" : i[0]
        };
        request(msg, function reqCb(err, result){
            ...
        });
    }); // getEmailBody
}

var getEmailBody = function _getEmailBody(obj, cb){
    // read email template from file;
    // v() returns the correct form for person's name with web-based service
    v(obj.name, function(v){
        cb(obj, v);
    });
}
If you're worried about submitting a million http requests in a very short time span (which you probably should be), you'll have to set up a buffer of some kind.
One simple way to do it:
var lines = fs.readFileSync('./mailinglist.txt', 'utf8').split("\n");

var entryIdx = 0;
var done = false;

var processNextEntry = function () {
    if (entryIdx < lines.length) {
        processEntry(lines[entryIdx++]);
    } else {
        done = true;
    }
};

var processEntry = function _processEntry(i){
    i = i.split('\t');
    getEmailBody( function(emailBody, name){
        var msg = {
            "message" : emailBody,
            "name" : name
        };
        request(msg, function reqCb(err, result){
            // ...
            !done && processNextEntry();
        });
    }); // getEmailBody
}

// getEmailBody didn't change

// you set the ball rolling by calling processNextEntry n times,
// where n is a sensible number of http requests to have pending at once.
for (var i = 0; i < 10; i++) processNextEntry();
Edit: according to this blog post, node has an internal queue system (the HTTP agent); by default it will only allow 5 simultaneous requests. But you can still use this method to avoid filling up that internal queue with a million items if you're worried about memory consumption.
Firstly I would advise against using readFileSync, and instead favour the async equivalent. Blocking on IO operations should be avoided as reading from a disk is very expensive, and whilst that's the sole purpose of your code now, I would consider how that might change in the future - and arbitrarily wasting clock cycles is never a good idea.
For large data files I would read them in defined chunks and process them. If you can come up with some scheme, either sentinels to distinguish data blocks within the file or padding to boundaries, then process the file piece by piece.
This is just rough, untested off the top of my head, but something like:
var fs = require("fs");

function doMyCoolWork(startByteIndex, endByteIndex){
    fs.open("path to your text file", 'r', function(status, fd) {
        var chunkSize = endByteIndex - startByteIndex;
        var buffer = new Buffer(chunkSize);
        // read chunkSize bytes starting at startByteIndex
        fs.read(fd, buffer, 0, chunkSize, startByteIndex, function(err, byteCount) {
            var data = buffer.toString('utf-8', 0, byteCount);
            // process your data here
            if (stillWorkToDo) {
                // recurse
                doMyCoolWork(endByteIndex, endByteIndex + 100);
            }
        });
    });
}
Or look into one of the stream library functions for similar functionality.
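For line-oriented input like the mailing list above, a stream-based sketch using Node's readline module could look like the following; the back-pressure threshold is arbitrary, and processEntry is assumed to take a completion callback, unlike the original:
var fs = require('fs');
var readline = require('readline');

var rl = readline.createInterface({
    input: fs.createReadStream('./mailinglist.txt'), // streamed, never loaded whole
    terminal: false
});

var pending = 0, MAX_PENDING = 10;

rl.on('line', function (line) {
    pending++;
    if (pending >= MAX_PENDING) rl.pause(); // stop reading while too many requests are in flight

    processEntry(line, function done() {    // assumes a callback-taking processEntry
        pending--;
        if (pending < MAX_PENDING) rl.resume();
    });
});

rl.on('close', function () {
    console.log('finished reading file');
});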
H2H
ps. Javascript and Node work extremely well with async and eventing.. using sync is an antipattern in my opinion, and likely to make the code a headache in future

Fill Azure Mobile Services from Scheduler

I created code like this for getting news from an XML export from another website, and I am trying to fill my database with it.
function UpdateLunchTime() {
    var httpRequest = require('request');
    var xml2js = require('xml2js');
    var parser = new xml2js.Parser();
    var url = 'http://www...com/export/xml/actualities';
    httpRequest.get({
        url: url
    }, function(err, response, body) {
        if (err) {
            console.warn(statusCodes.INTERNAL_SERVER_ERROR,
                'Some problem.');
        } else if (response.statusCode !== 200) {
            console.warn(statusCodes.BAD_REQUEST,
                'Another problem');
        } else {
            //console.log(body);
            parser.parseString(body, function (err2, result) {
                //console.log(result.Root.event);
                var count = 0;
                for (var i = 0; i < result.Root.event.length; i++) {
                    //console.log(result.Root.event[i]);
                    InsertActionToDatabase(result.Root.event[i]);
                }
                /*
                result.Root.event.forEach(function(entry) {
                    InsertActionToDatabase(entry);
                });
                */
            });
        }
    });
}

function InsertActionToDatabase(action)
{
    var queryString = "INSERT INTO Action (title, description, ...) VALUES (?, ?, ...)";
    mssql.query(queryString, [action.akce[0], action.description[0], ...], {
        success: function(insertResults) {
        },
        error: function(err) {
            console.log("Problem: " + err);
        }
    });
}
For individual actualities it's working fine, but when I run it over the whole XML I get this error:
Error: [Microsoft][SQL Server Native Client 10.0][SQL Server]Resource ID : 1. The request limit for the database is 180 and has been reached. See 'http://go.microsoft.com/fwlink/?LinkId=267637' for assistance.
And for the last few objects I get this error:
Error: [Microsoft][SQL Server Native Client 10.0]TCP Provider: Only one usage of each socket address (protocol/network address/port) is normally permitted.
Thanks for help
The problem is that you're trying to make too many concurrent (insert) operations in your database. Remember that in node.js (almost) everything is asynchronous, so when you call InsertActionToDatabase for one of the items, this operation will start right away and not wait before it finishes to return. So you're basically trying to insert all of the events at once, and as the error message said there's a limit on the number of concurrent connections which can be made to the SQL server.
What you need to do is to change your loop to run asynchronously, by waiting for one of the operations to complete before starting the next one (you can also "batch" send a smaller number of operations at once, continuing after each batch is complete, but the code is a little more complicated) as shown below.
var count = result.Root.event.length;
var insertAction = function(index) {
    if (index >= count) return;
    InsertActionToDatabase(result.Root.event[index], function() {
        insertAction(index + 1);
    });
}
insertAction(0);
And the InsertActionToDatabase function would take a callback parameter to be called when it's done.
function InsertActionToDatabase(item, done) {
    var table = tables.getTable('event');
    table.insert(item, {
        success: function() {
            console.log('Inserted event: ', item);
            done();
        }
    });
}

Do I ever need to synchronize node.js code like in Java?

I have only recently started developing for node.js, so forgive me if this is a stupid question - I come from Javaland, where objects still live happily sequentially and synchronous. ;)
I have a key generator object that issues keys for database inserts using a variant of the high-low algorithm. Here's my code:
function KeyGenerator() {
    var nextKey;
    var upperBound;

    this.generateKey = function(table, done) {
        if (nextKey > upperBound) {
            require("../sync/key-series-request").requestKeys(function(err, nextKey, upperBound) {
                if (err) { return done(err); }
                this.nextKey = nextKey;
                this.upperBound = upperBound;
                done(nextKey++);
            });
        } else {
            done(nextKey++);
        }
    }
}
Obviously, when I ask it for a key, I must ensure that it never, ever issues the same key twice. In Java, if I wanted to enable concurrent access, I would make this synchronized.
In node.js, is there any similar concept, or is it unnecessary? I intend to ask the generator for a bunch of keys for a bulk insert using async.parallel. My expectation is that since node is single-threaded, I need not worry about the same key ever being issued more than once, can someone please confirm this is correct?
Obtaining a new series involves an asynchronous database operation, so if I do 20 simultaneous key requests, but the series has only two keys left, won't I end up with 18 requests for a new series? What can I do to avoid that?
UPDATE
This is the code for requestKeys:
exports.requestKeys = function (done) {
    var db = require("../storage/db");
    db.query("select next_key, upper_bound from key_generation where type='issue'", function(err, results) {
        if (err) { done(err); } else {
            if (results.length === 0) {
                // Somehow we lost the "issue" row - this should never have happened
                done(new Error("Could not find 'issue' row in key generation table"));
            } else {
                var nextKey = results[0].next_key;
                var upperBound = results[0].upper_bound;
                db.query("update key_generation set next_key=?, upper_bound=? where type='issue'",
                    [ nextKey + KEY_SERIES_WIDTH, upperBound + KEY_SERIES_WIDTH ],
                    function (err, results) {
                        if (err) { done(err); } else {
                            done(null, nextKey, upperBound);
                        }
                    });
            }
        }
    });
}
UPDATE 2
I should probably mention that consuming a key requires db access even if a new series doesn't have to be requested, because the consumed key will have to be marked as used in the database. The code doesn't reflect this because I ran into trouble before I got around to implementing that part.
UPDATE 3
I think I got it using event emitting:
function KeyGenerator() {
    var nextKey;
    var upperBound;
    var emitter = new events.EventEmitter();
    var requesting = true;

    // Initialize the generator with the stored values
    db.query("select * from key_generation where type='use'", function(err, results) {
        if (err) { throw err; }
        if (results.length === 0) {
            throw new Error("Could not get key generation parameters: Row is missing");
        }
        nextKey = results[0].next_key;
        upperBound = results[0].upper_bound;
        console.log("Setting requesting = false, emitting event");
        requesting = false;
        emitter.emit("KeysAvailable");
    });

    this.generateKey = function(table, done) {
        console.log("generateKey, state is:\n nextKey: " + nextKey + "\n upperBound: " + upperBound + "\n requesting: " + requesting);
        if (nextKey > upperBound) {
            if (!requesting) {
                requesting = true;
                console.log("Requesting new series");
                require("../sync/key-series-request").requestSeries(function(err, newNextKey, newUpperBound) {
                    if (err) { return done(err); }
                    console.log("New series available:\n nextKey: " + newNextKey + "\n upperBound: " + newUpperBound);
                    nextKey = newNextKey;
                    upperBound = newUpperBound;
                    requesting = false;
                    emitter.emit("KeysAvailable");
                    done(null, nextKey++);
                });
            } else {
                console.log("Key request is already underway, deferring");
                var that = this;
                emitter.once("KeysAvailable", function() {
                    console.log("Executing deferred call");
                    that.generateKey(table, done);
                });
            }
        } else {
            done(null, nextKey++);
        }
    }
}
I've peppered it with logging outputs, and it does do what I want it to.
As another answer mentions, you will potentially end up with results different from what you want. Taking things in order:
function KeyGenerator() {
    // at first I was thinking you wanted these as 'class' properties
    // and thus would want to precede them with this. rather than as vars
    // but I think you want them as 'private' member variables of the
    // class instance. That's dandy, you'll just want to do things differently
    // down below
    var nextKey;
    var upperBound;

    this.generateKey = function (table, done) {
        if (nextKey > upperBound) {
            // truncated the require path below for readability.
            // more importantly, renamed parameters to function
            require("key-series-request").requestKeys(function(err, nKey, uBound) {
                if (err) { return done(err); }
                // note that thanks to the miracle of closures, you have access to
                // the nextKey and upperBound variables from the enclosing scope
                // but I needed to rename the parameters or else they would shadow/
                // obscure the variables with the same name.
                nextKey = nKey;
                upperBound = uBound;
                done(nextKey++);
            });
        } else {
            done(nextKey++);
        }
    }
}
Regarding the .requestKeys function, you will need to somehow introduce some kind of synchronization. This isn't actually terrible in one sense: with only one thread of execution, you don't need to sweat the challenge of setting your semaphore in a single operation. But it is challenging to deal with multiple callers, because you will want the other callers to effectively (but not really) block while waiting for the first call to requestKeys(), which goes to the DB, to return.
I need to think about this part a bit more. I had a basic solution in mind which involved setting a simple semaphore and queuing the callbacks, but when I was typing it up I realized I was actually introducing a more subtle potential synchronization bug when processing the queued callbacks.
UPDATE:
I was just finishing up one approach as you were writing about your EventEmitter approach, which seems reasonable. See this gist, which illustrates the approach I took. Just run it and you'll see the behavior. It has some console logging to see which calls are getting deferred for a new key block and which can be handled immediately. The primary moving part of the solution is below (note that the keyManager provides the stubbed-out implementation of your require('key-series-request')):
function KeyGenerator(km) {
    this.nextKey = undefined;
    this.upperBound = undefined;
    this.imWorkingOnIt = false;
    this.queuedCallbacks = [];
    this.keyManager = km;

    this.generateKey = function(table, done) {
        if (this.imWorkingOnIt) {
            this.queuedCallbacks.push(done);
            console.log('KG deferred call. Pending CBs: ' + this.queuedCallbacks.length);
            return;
        }
        var self = this;
        if ((typeof(this.nextKey) === 'undefined') || (this.nextKey > this.upperBound)) {
            // set a semaphore & add the callback to the queued callback list
            this.imWorkingOnIt = true;
            this.queuedCallbacks.push(done);
            this.keyManager.requestKeys(function(err, nKey, uBound) {
                if (err) { return done(err); }
                self.nextKey = nKey;
                self.upperBound = uBound;
                var theCallbackList = self.queuedCallbacks;
                self.queuedCallbacks = [];
                self.imWorkingOnIt = false;
                theCallbackList.forEach(function(f) {
                    // rather than making the final callback directly,
                    // call KeyGenerator.generateKey() with the original
                    // callback
                    setImmediate(function() { self.generateKey(table, f); });
                });
            });
        } else {
            console.log('KG immediate call', self.nextKey);
            var z = self.nextKey++;
            setImmediate(function() { done(z); });
        }
    }
};
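A quick, hypothetical way to exercise the sketch above outside the gist; the stub keyManager stands in for your key-series-request module and hands out blocks of three keys after a short delay:
// stub that mimics the async DB round trip
var start = 0;
var keyManager = {
    requestKeys: function (cb) {
        setTimeout(function () {
            start += 3;
            cb(null, start, start + 2); // (err, nextKey, upperBound)
        }, 50);
    }
};

var kg = new KeyGenerator(keyManager);
for (var i = 0; i < 8; i++) {
    kg.generateKey('some_table', function (key) {
        console.log('got key', key);
    });
}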
If your Node.js code to calculate the next key didn't need to execute an async operation then you wouldn't run into synchronization issues because there is only one JavaScript thread executing code. Access to the nextKey/upperBound variables will be done in sequence by only one thread (i.e. request 1 will access first, then request 2, then request 3 et cetera.) In the Java-world you will always need synchronization because multiple threads will be executing even if you didn't make a DB call.
However, in your Node.js code, since you are making an async call to get the nextKey, you could get strange results. There is still only one JavaScript thread executing your code, but it would be possible for request 1 to make the call to the DB, then Node.js might accept request 2 (while request 1 is getting data from the DB) and this second request will also make a request to the DB to get keys. Let's say that request 2 gets its data from the DB quicker than request 1 and updates the nextKey/upperBound variables with values 100/150. Once request 1 gets its data (say values 50/100) it will update nextKey/upperBound again. This scenario wouldn't result in duplicate keys, but you might see gaps in your keys (for example, not all keys from 100 to 150 will be used, because request 1 eventually reset the values to 50/100).
This makes me think that you will need a way to sync access, but I am not exactly sure what will be the best way to achieve this.
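One simple pattern, sketched here with native Promises (so it assumes a newer Node.js or a polyfill, plus a promisified requestKeys), is to funnel every key request through a single promise chain so that at most one series fetch is ever in flight:
function KeyGenerator(requestKeys) {
    var nextKey, upperBound;
    var chain = Promise.resolve(); // every caller queues behind the previous one

    this.generateKey = function () {
        chain = chain.then(function () {
            if (nextKey === undefined || nextKey > upperBound) {
                // fetch a new series; later callers wait on this same chain
                return requestKeys().then(function (series) {
                    nextKey = series.nextKey;
                    upperBound = series.upperBound;
                    return nextKey++;
                });
            }
            return nextKey++;
        });
        return chain; // a promise that resolves to this caller's key
    };
}
A real implementation would also need to recover the chain after a failed series fetch, or every later caller would see the same rejection.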
