I'm making a bunch of calls to a database that contains a large amount of data, on a Windows 7 64-bit OS. As the calls queue up I get the following error (for every HTTP call after the first error):
Error: connect ENOBUFS *omitted* - Local (undefined:undefined)
From my Google searching I've learned that this error means my buffer has grown too large and my system's memory can no longer handle the buffer's size.
But I don't really understand what this means. I'm using Node.js with an HTTPS library to handle my requests. When the requests are queued and the sockets are opening, is the buffer allocated in RAM? What would allow the buffer to grow larger? Is this simply a hardware limitation?
I've also read that some operating systems handle the buffer's size better than others. Is this the case? If so, which OS would be better suited for running a Node script that needs to fetch a lot of data via HTTPS requests?
Here's how I'm doing my requests.
for (let j = 0; j < dataQueries; j++) {
  getData(function (res) { /* handle res */ })
}
function getData(callback) {
  axios.get(url, config)
    .then((res) => {
      // parse res
      callback(parsedRes(res))
    }).catch(function (err) {
      console.log("Spooky problem alert! : " + err);
    })
}
I've omitted some code for brevity, but this is generally how I'm doing my requests: a for loop that launches a GET request via axios on every iteration.
I know there is an axios.all function for collecting the promises that the axios request methods return, but I saw no change in behavior when I set it up to store the promises and then iterate over them via axios.all.
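For reference, my axios.all attempt looked roughly like this (a reconstructed sketch; it still fires every request at once, which is presumably why it made no difference):

let promises = [];
for (let j = 0; j < dataQueries; j++) {
  promises.push(axios.get(url, config)); // same url/config as above
}
axios.all(promises).then(axios.spread((...responses) => {
  responses.forEach((res) => {
    // parse res
  });
}));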
Thanks @Jonasw for your help, but there is a very simple solution to this problem.
I used the small library throttled-queue to get the job done. (If you look at the source code, it would be pretty easy to implement your own queue based on this package.)
My code changed to:
const throttledQueue = require('throttled-queue')
let throttle = throttledQueue(15, 1000) // 15 times per second

for (let j = 0; j < dataQueries; j++) {
  throttle(function () {
    getData(function (res) {
      // do parsing
    })
  })
}
function getData(callback) {
  axios.get(url, config)
    .then((res) => {
      // parse res
      callback(parsedRes(res))
    }).catch(function (err) {
      console.log("Spooky problem alert! : " + err);
    })
}
In my case this got resolved by deleting the autogenerated zip files from my workspace, which were created every time I ran cdk deploy. It turns out my TypeScript compiler treated these files as source files and included them in the tarball.
You're starting a lot of data queries at the same time. You could chain them up using a partly recursive function, so that they're executed one after another:
(function proceedwith(j) {
  getData(function () {
    if (j < dataQueries - 1) proceedwith(j + 1);
  });
})(0)
Experienced the same issue when starting too many requests.
Tried throttled-queue, but it wasn't working correctly.
system-sleep worked for me, effectively slowing down the rate at which the requests were made. Sleep is best used in synchronous code, to block before each sync/async call.
Example (using sleep to limit the rate at which updateAddress() is called):
const sleep = require('system-sleep');

// Asynchronous call (what is important is that forEach is synchronous)
con.query(sql, function (err, result) {
  if (err) throw err;
  result.forEach(function (element) {
    sleep(120); // blocking call, sleep for 120ms
    updateAddress(element.address); // another async call (network request)
  });
});
I was tasked with transferring a large portion of data from one database to another using JavaScript and an API. Yes, I understand that there are better ways of accomplishing this task, but I was asked to try this method.
I wrote some JavaScript that makes a GET call to an API that returns an array of data, which I then turn around and send to another API as individual POST requests.
What I have written so far seems to work fairly well, and I have been able to send over 50k individual POST requests without any errors. But I run into trouble when the number of POST requests increases past around 100k: I end up running out of memory and the browser crashes.
From what I understand so far about promises, the issue may be that promises (or something else?) are still kept in heap memory after they are resolved, which results in running out of memory after too many requests.
I've tried 3 different methods over the past couple of days to get all the records to POST successfully. These included using Bluebird's Promise.map, as well as breaking the array into chunks before sending them as POST requests. Each method works until it has processed about 100k records, then crashes.
async function amGetRequest(controllerName) {
  try {
    const amURL = "http://localhost:8081/api/" + controllerName;
    const amResponse = await fetch(amURL, {
      "method": "GET",
    });
    return await amResponse.json();
  } catch (err) {
    closeModal()
    console.error(err)
  }
};
async function brmPostRequest(controllerName, body) {
  const brmURL = urlBuilderBRM(controllerName);
  const headers = headerBuilderBRM();
  try {
    await fetch(brmURL, {
      "method": "POST",
      "headers": headers,
      "body": JSON.stringify(body)
    });
  }
  catch (error) {
    closeModal()
    console.error(error);
  };
};
//V1.0 Send one by one and resolve all promises at the end.
const amResult = await amGetRequest(controllerName); //(returns an array of ~245,000 records)
let promiseArray = [];
for (let i = 0; i < amResult.length; i++) {
  promiseArray.push(await brmPostRequest(controllerName, amResult[i]));
};
const postResults = await Promise.all(promiseArray);
//V2.0 Use Bluebird's Promise.map with concurrency set to 100
const amResult = await amGetRequest(controllerName); //(returns an array of ~245,000 records)
const postResults = Promise.map(amResult, async data => {
  await brmPostRequest(controllerName, data);
  return Promise.resolve();
}, { concurrency: 100 });
//V3.0 Chunk array into max 1000 records and resolve 1000 promises before looping to the next 1000 records
const amResult = await amGetRequest(controllerName); //(returns an array of ~245,000 records)
const numPasses = Math.ceil(amResult.length / 1000);
for (let i = 0; i <= numPasses; i++) {
  let subset = amResult.splice(0, 1000);
  let promises = subset.map(async (record) => {
    await brmPostRequest(controllerName, record);
  });
  await Promise.all(promises);
  subset.length = 0; //clear out temp array before looping again
};
Is there something that I am missing about getting these promises cleared out of memory after they have been resolved?
Or perhaps a better method of accomplishing this task?
Edit: Disclaimer - I'm still fairly new to JS and still learning.
"Well-l-l-l ... you're gonna need to put a throttle on this thing!"
Without (pardon me ...) attempting to dive too deeply into your code, "no matter how many records you need to transfer, you need to control the number of requests that the browser attempts to do at any one time."
What's probably happening right now is that you're stacking up hundreds or thousands of "promised" requests in local memory – but, how many requests can the browser actually transmit at once? That should govern the number of requests that the browser actually attempts to do. As each reply is returned, your software then decides whether to start another request and if so for which record.
Conceptually, you have so-many "worker bees," according to the number of actual network requests your browser can simultaneously do. Your software never attempts to launch more simultaneous requests than that: it simply launches one new request as each one request is completed. Each request, upon completion, triggers code that decides to launch the next one.
So – you never are "sending thousands of fetch requests." You're probably sending only a handful at a time, even though, in this you-controlled manner, "thousands of requests do eventually get sent."
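To make the idea concrete, here is a minimal sketch of such a pool. postOne is a hypothetical stand-in for whatever sends one record (e.g. the brmPostRequest above), and maxInFlight is the number of "worker bees":

function runPool(records, postOne, maxInFlight) {
  return new Promise(function (resolve) {
    var next = 0;
    var active = 0;
    function launch() {
      if (next >= records.length && active === 0) return resolve();
      // never more than maxInFlight requests in the air at once
      while (active < maxInFlight && next < records.length) {
        active++;
        // each completed request decides whether to launch the next one
        postOne(records[next++])
          .catch(function () { /* count/log the failure here */ })
          .then(function () { active--; launch(); });
      }
    }
    launch();
  });
}
// e.g. await runPool(amResult, rec => brmPostRequest(controllerName, rec), 6);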
As you are not interested in the values delivered by brmPostRequest(), there's no point mapping the original array; neither the promises nor the results need to be accumulated.
Not doing so will save memory and may allow progress beyond the 100k sticking point.
async function foo() {
  const amResult = await amGetRequest(controllerName);
  let counts = { 'successes': 0, 'errors': 0 };
  for (let i = 0; i < amResult.length; i++) {
    try {
      await brmPostRequest(controllerName, amResult[i]);
      counts.successes += 1;
    } catch (err) {
      counts.errors += 1;
    }
  };
  console.log(counts);
}
I'm currently getting into developing Alexa skills. This is in fact the first time I'm trying this, and it has kinda worked out well so far. However, I stumbled upon a problem which seems to be widespread, but I couldn't find an answer for how to solve it.
First things first:
I started this skill by following a tutorial. It might be that this tutorial is outdated and therefore this error appears.
I created a skill from scratch and it works up to the point where the LaunchRequest is invoked:
As you can see, I get my response as expected (it works in the test environment as well as on Alexa itself). Now, when I try to call an IntentRequest, I just get the error message:
The remote endpoint could not be called, or the response it returned was invalid.
As far as I can tell from the picture / request, the correct intent request is called (in my case getSubscriberCount), and this is the point where I have no idea anymore how to resolve this problem.
To keep things short, here is the JS part for the intent:
case "IntentRequest":
// Intent Request
console.log(INTENT REQUEST)
switch(event.request.intent.name) {
case "GetSubscriberCount":
var endpoint = "my url"
var body = ""
https.get(endpoint, (response) => {
response.on('data', (chunk) => { body += chunk })
response.on('end', () => {
var data = JSON.parse(body)
var subscriberCount = data.items[0].statistics.subscriberCount
context.succeed(
generateResponse(
buildSpeechletResponse(`Du hast momentan ${subscriberCount} Abonnenten`, true),
{}
)
)
And this is causing my problems. To test what exactly is wrong, I tried the following:
Called the endpoint in my browser --> Correct output
Adjusted the "response" to the minimum to see if that works --> didn't work
Checked several sources related to this error --> didn't help either
I saw some approaches to get rid of this, since it seems to be a common issue, but I got lost. Someone mentioned an environment variable, which I couldn't find. Another suggested running the JSON request manually, which I tried, but it led to the same error.
Hopefully you can help me out here.
Assuming you use AWS Lambda, it might be because you didn't build your response correctly or your Lambda function threw an error.
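For reference, assuming a raw Lambda handler without the ASK SDK (as in the snippet above), the Alexa service expects a response shaped roughly like this; if generateResponse() produces anything that deviates from it, you get exactly the "response it returned was invalid" error:

context.succeed({
  version: "1.0",
  response: {
    outputSpeech: {
      type: "PlainText",
      text: "Du hast momentan 42 Abonnenten" // "You currently have 42 subscribers"
    },
    shouldEndSession: true
  }
});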
I use the following API in my program to determine a free port, and I provide it to an application to run on:
portscanner.findAPortNotInUse(3000, 65000, '127.0.0.1', function (error, port) {
  console.log('AVAILABLE PORT AT: ' + port)
})
https://github.com/baalexander/node-portscanner
This free port is given to the application to use, and that works OK.
The problem is that if I provide a free port to application A and it hasn't occupied the port yet (sometimes that takes some time...), another application B may come along and request a free port, and it gets given application A's port.
Which causes a problem...
Is there any elegant way to solve this?
My application doesn't have state, so it cannot record which app got which port...
One solution is to randomize the range, but this is not robust...
In my application I'm given the URL of the app that I should provide the free port to.
update
I cannot use a broker or something else that controls this externally. I need to find some algorithm (maybe with some smart randomness) that can help me do it internally, i.e. my program is like a singleton, and I need some trick for how to hand out ports between 50000 and 65000 that reduces the number of collisions among the ports provided to the apps.
update 2
I've decided to try something like the following. What do you think?
Using lodash (https://lodash.com/docs/4.17.2#random) to determine ports, in a loop that provides 3 (or more, if that makes sense) candidate numbers for ranges like the following:
portscanner.findAPortNotInUse([50001, 60000, 60010], '127.0.0.1', function (err, port) {
  if (err) {
    console.log("error!!!-> " + err);
  } else {
    console.log('Port Not in Use ' + port);
  }
});

// generating each candidate in a loop with:
var aa = _.random(50000, 65000);
Then if I get no free port, i.e. all 3 ports are occupied, run this process again with 3 other random numbers. Comments and suggestions are welcome!!!
I'm trying to find some way to avoid collisions as much as possible...
I would simply accept the fact that things can go wrong in a distributed system and retry the operation (i.e., getting a free port) if it failed for whatever reason on the first attempt.
Luckily, there are lots of npm modules out there that do that already for you, e.g. retry.
Using this module you can retry an asynchronous operation until it succeeds, and configure waiting strategies, how many times it should be retried at most, and so on…
To provide a code example, it basically comes down to something such as:
const retry = require('retry');

const operation = retry.operation();

operation.attempt(currentAttempt => {
  findAnUnusedPortAndUseIt(err => {
    if (operation.retry(err)) {
      return;
    }
    callback(err ? operation.mainError() : null);
  });
});
The benefits of this solution are:
Works without locking, i.e. it is efficient and uses few resources if everything is fine.
Works without a central broker or something like that.
Works for distributed systems of any size.
Uses a pattern that you can re-use in distributed systems for all kinds of problems.
Uses a battle-tested and solid npm module instead of handwriting all these things.
Does not require you to change your code in a major way, instead it is just adding a few lines.
Hope this helps :-)
If your applications can open ports with options like SO_REUSEADDR, and the operating system keeps ports in the TIME_WAIT state, you can bind/open the port you want to return with SO_REUSEADDR, instantly close it, and give it back to the application. For the TIME_WAIT period (depending on the operating system it can be 30 seconds; the actual time should be decided/set up or found by experiment/administration) the port list will show this port as occupied.
If your port finder does not return port numbers for ports in the TIME_WAIT state, the problem is solved by this relatively expensive open/close socket operation.
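A rough Node sketch of that reservation idea, assuming the OS really does report the port as occupied for long enough afterwards (reservePort is a name made up for this example):

const net = require('net');

// Briefly bind the candidate port, then close it immediately before
// handing it out, so a subsequent scan sees it as taken for a while.
function reservePort(port, callback) {
  const srv = net.createServer();
  srv.once('error', (err) => callback(err)); // port already taken
  srv.listen(port, '127.0.0.1', () => {
    srv.close(() => callback(null, port)); // released; give it to the app
  });
}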
I'd advise you to look for a way to retain state. Even temporary state, in memory, is better than nothing at all. This way you could at least avoid giving out ports you've already given out, because those are very likely not free anymore. (This could be as simple as saving them and regenerating a random port whenever you notice you've found one you've already given out.) If you don't want collisions, build your module to have state so it can avoid them. If you don't want to do that, you'll have to accept that there will sometimes be collisions when there don't need to be.
If the URLs you get are random, the best you can do is guess randomly. If you can derive some property in which the URLs uniquely and consistently differ, you could design something around that.
Code example:
function getUnusedPort(url) {
  // range is [50000, 65001) (inclusive 50000, exclusive 65001)
  const guessPort = () => Math.floor(Math.random() * 15001) + 50000;
  let randomPort = guessPort();
  while (checkPortInUse(randomPort)) {
    randomPort = guessPort();
  }
  return randomPort;
}
Notes:
checkPortInUse will probably be asynchronous, so you'll have to accommodate for that.
You said 'between 50000 and 65000'. This is from 50000 up to and including 65000.
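For instance, a stateful variant of the snippet above might look like this (still using the hypothetical checkPortInUse, kept synchronous for brevity):

const handedOut = new Set();

function getUnusedPortWithState() {
  const guessPort = () => Math.floor(Math.random() * 15001) + 50000;
  let port = guessPort();
  // skip ports we've already promised to someone, even if not bound yet
  while (handedOut.has(port) || checkPortInUse(port)) {
    port = guessPort();
  }
  handedOut.add(port);
  return port;
}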
When managing multiple applications or multiple servers, where one must be right the first time (without retrying), you need a single source of truth. Applications on the same machine can talk to a database, a broker server, or even a file, as long as the resource is "lockable". (Servers work in similar ways, though not with local files.)
So your flow would be something like:
App A sends request to service to request lock.
When lock is confirmed, start port scanner
When port is used, release lock.
Again, this could be a "PortService" you write that hands out unused ports, or a simple lock on some shared resource, so that no two things get the same port at the same time.
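If all the apps are served from a single process, even an in-process lock gets you most of the way there. A sketch using a promise chain as the lock and the portscanner module from the question (allocatePort is a made-up name; the caller must call release() once its app has actually bound the port, mirroring the request-lock / scan / release flow above):

const portscanner = require('portscanner');

let lock = Promise.resolve(); // the "shared resource" lock, in-process

// Resolves with { port, release }; the next allocation waits until
// the previous caller has called release().
function allocatePort() {
  return new Promise((resolve, reject) => {
    lock = lock.then(() => new Promise((release) => {
      portscanner.findAPortNotInUse(50000, 65000, '127.0.0.1', (err, port) => {
        if (err) { release(); return reject(err); }
        resolve({ port: port, release: release });
      });
    }));
  });
}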
Hopefully you can find something suitable to work for your apps.
As you want to find a port that is not in use by your application, you could run the following command:
netstat -tupln | awk '{print $4}' | cut -d ':' -f2
So in your application you would use it like this:
const exec = require('child_process').exec;
const _ = require('lodash');

exec("netstat -tupln | awk '{print $4}' | cut -d ':' -f2", (error, stdout, stderr) => {
  if (error) {
    console.error(`exec error: ${error}`);
    return;
  }
  var listPorts = stdout.split(/\n/).map(Number); // list of all ports already in use
  console.log(listPorts);
  var aa = _.random(50000, 65000); // generate random port
  var isFree = (listPorts.indexOf(aa) === -1);
  if (isFree) {
    // start your application
  } else {
    // restart the search: wrap this in a function and call it again
  }
});
This should give you a list of all ports that are in use, so use any port except the ones in listPorts.
First of all - I am aware of this answer to a kind of similar problem.
Problem
I have a third-party protocol that uses TCP/IP. This protocol defines that the server replies to every message received. On the client side (which I'm trying to implement) I have to wait for the answer from the server.
The problem occurs when I try to send messages. I need to wait for the answer to the first message before I send the second one (like ping-pong).
I tried to do multiple writes on my Node.js TCP client like this, which understandably fails due to the async nature of the calls:
client.connect(connectOptions, function () {
  client.write(message1);
  client.write(message2);
});
Like I said before, I have a third-party component which responds to both messages with a numeric value. So when
client.on('data',function (data) {});
fires an event, I can't distinguish which message was responsible for the answer. Unlike the linked answer, I don't have the ability to tag the answer on the server side.
I am new to Node.js, so I'm trying to figure out the best way to solve this kind of problem, as it's of the nature: do synchronous things in an async environment.
One way would be to use a common list of handlers to keep track of requests and responses:
var handlers = [];

client.connect(connectOptions, function () {
  client.write(message1);
  handlers.push(function msg1_handler(data) { /* handle reply 1 */ });

  client.write(message2);
  handlers.push(function msg2_handler(data) { /* handle reply 2 */ });
});

client.on('data', function (data) {
  var handler = handlers.shift();
  handler(data);
});
All of this should obviously be wrapped in a separate class containing both the handlers and the client objects; it's just an example of how to do it. The drawback is that if the server fails to respond to some request, you end up with a complete mess that is hard to put right.
Another idea is to buffer requests:
function BufferedClient(cli) {
  this.cli = cli;
  this.buffer = [];
  this.waiting_for_response = false;
  var that = this;
  cli.on('data', function (data) {
    that.waiting_for_response = false;
    var pair = that.buffer.shift();
    var handler = pair[0];
    process.nextTick(function () {
      // we use .nextTick to avoid a potential
      // exception in handler which would break
      // BufferedClient
      handler(data);
    });
    that.flush();
  });
};

BufferedClient.prototype = {
  request: function (msg, handler) {
    this.buffer.push([handler, msg]);
    this.flush();
  },
  flush: function () {
    var pair = this.buffer[0];
    if (pair && !this.waiting_for_response) {
      this.cli.write(pair[1]);
      this.waiting_for_response = true;
    }
  }
};
This time you send requests sequentially (so, synchronously in effect), due to how .request() and the .on('data') handler work together with the .flush() function. Usage:
client.connect(connectOptions, function () {
  var buff_cli = new BufferedClient(client);
  buff_cli.request(message1, function (data) { /* handle reply 1 */ });
  buff_cli.request(message2, function (data) { /* handle reply 2 */ });
});
Now even if the server fails to respond you don't have a mess. However, if you issue buff_cli.request calls in parallel and one of them fails, then you will have a memory leak (this.buffer keeps growing while nothing drains it, because the BufferedClient is waiting for a response that never comes). This can be fixed by adding some timeouts on the socket.
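A minimal sketch of such a timeout, to be added inside the BufferedClient constructor (the 5000ms value is an arbitrary choice):

cli.setTimeout(5000); // ms without socket activity
cli.on('timeout', function () {
  // destroy the socket; pending handlers in this.buffer should also
  // be notified/flushed here so callers aren't left hanging
  cli.destroy(new Error('server did not respond in time'));
});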
Note that both solutions assume that the server never pushes anything to the client without a request.
If I were you I would go with second solution. Note that I haven't tested the code so it might be buggy but the general idea should be ok.
Side note: when you implement a server (and I know that you don't in this case) you should always have a protocol that matches each request with a response in a unique way. One way is to send a unique ID with each request, so that the server responds with the same ID. In such a scenario matching a request with its response is very easy, and you avoid all that mess.
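A sketch of that ID-matching scheme, assuming you control both sides and that each 'data' event carries exactly one complete JSON message (real code would need proper framing):

var pending = {};
var nextId = 0;

function send(payload, handler) {
  var id = nextId++;
  pending[id] = handler;
  client.write(JSON.stringify({ id: id, payload: payload }));
}

client.on('data', function (data) {
  var msg = JSON.parse(data); // expects the server to echo { id, result }
  var handler = pending[msg.id];
  delete pending[msg.id];
  if (handler) handler(msg.result);
});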
I have a node application handling some ZeroMQ events coming from another application utilizing the Node-ZMQ bindings found here: https://github.com/JustinTulloss/zeromq.node
The issue I am running into is that one of the operations triggered by an event takes a long time to process, and this appears to block any other event from being processed during that time. Although the application is not currently clustered, doing so would only afford a few more processes and doesn't really solve the issue. I am wondering if there is a way to allow these async calls to not block other incoming requests while they process, and how I might go about implementing that.
Here is a highly condensed/contrived code example of what I am doing currently:
var zmq = require('zmq');
var Q = require('q');
var zmqResponder = zmq.socket('rep');
var Client = require('node-rest-client').Client;
var client = new Client();

zmqResponder.on('message', function (msg, data) {
  var parsed = JSON.parse(msg);
  logging.info('ZMQ Request received: ' + parsed.event);
  switch (parsed.event) {
    case 'create':
      // Typically short running process, not an issue
      break;
    case 'update':
      // Long running process, this is the issue
      serverRequest().then(function (response) {
        zmqResponder.send(JSON.stringify(response));
      });
      break;
  }
});
function serverRequest() {
  var deferred = Q.defer();
  client.get(function (data, response) {
    if (response.statusCode !== 200) {
      deferred.reject(data.data);
    } else {
      deferred.resolve(data.data);
    }
  });
  return deferred.promise;
}
EDIT** Here's a gist of the code: https://gist.github.com/battlecow/cd0c2233e9f197ec0049
I think, through the comment thread, I've identified your issue. REQ/REP has a strict synchronous message order guarantee... You must receive-send-receive-send-etc. REQ must start with send and REP must start with receive. So, you're only processing one message at a time because the socket types you've chosen enforce that.
If you were using a different, non-event-driven language, you'd likely get an error telling you what you'd done wrong when you tried to send or receive twice in a row, but node lets you do it and just queues the subsequent messages until it's their turn in the message order.
You want to change REQ/REP to DEALER/ROUTER and it'll work the way you expect. You'll have to change your logic slightly for the ROUTER socket to get it to send appropriately, but everything else should work the same.
Rough example code, using the relevant portions of the posted gist:
var zmqResponder = zmq.socket('router');

// with this binding, each frame arrives as a separate argument:
// the first is the peer's socket identity, the second the payload
zmqResponder.on('message', function (peer_id, msg) {
  var parsed = JSON.parse(msg);
  switch (parsed.event) {
    case 'create':
      // build parsedResponse, then...
      zmqResponder.send([peer_id, JSON.stringify(parsedResponse)]);
      break;
  }
});
zmqResponder.bind('tcp://*:5668', function (err) {
  if (err) {
    logging.error(err);
  } else {
    logging.info("ZMQ awaiting orders on port 5668");
  }
});
... you need to grab the peer_id (or whatever you want to call it; in ZMQ nomenclature it's the socket ID of the socket you're sending from, think of it as an "address" of sorts) from the first frame of the message you receive, and then send it as the first frame of the message you send back.
By the way, I just noticed in your gist you are both connect()-ing and bind()-ing on the same socket (zmq.js lines 52 & 143, respectively). Don't do that. Inferring from other clues, you just want to bind() on this side of the process.
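For completeness, the client side would then become a DEALER. A rough sketch (the identity frame is added automatically, so the client just sends and receives plain payloads):

var zmq = require('zmq');
var requester = zmq.socket('dealer');
requester.connect('tcp://localhost:5668');

// replies arrive whenever the ROUTER sends them, no strict ordering
requester.on('message', function (reply) {
  console.log('got reply: ' + reply.toString());
});

requester.send(JSON.stringify({ event: 'create' }));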