How does looping over a lot of URL requests work? - javascript

For example:
for (var i = 0; i < 100000000; i++) {
  requestify.request('http://www.domain.com/' + i)
    .then(function() { /* do something */ });
}
So, how does NodeJS handle this type of code? Will it try to send all 100000000 requests at once, or will it send a few requests at a time and put the rest in a queue, so that when an active request finishes it takes more requests from the queue?
I ask because I have run into a similar problem. I load a database of 1,000,000 URL records and will later make a request to each of those URLs. I don't want my program to hang because it tries to make too many requests at the same time.

Node will fire these requests as fast as it can; it will not wait for one to finish before firing the next.
The requests are asynchronous, so the loop does not block on them. If you want to control the flow, you can use the async module.
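As an illustration of controlling the flow, here is a minimal sketch that caps how many requests are in flight at once. It uses plain promises rather than the async module mentioned above, and the names runWithLimit, lane and worker are made up for this example.

```javascript
// Run `worker(item)` for every item, with at most `limit` calls in flight.
// `worker` must return a promise (for example, one HTTP request).
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been started yet

  // Each "lane" keeps pulling the next unstarted item until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results; // same order as `items`
}
```

With the loop from the question, a call could look like runWithLimit(urls, 10, function (url) { return requestify.request(url); }), where urls is an array built beforehand. If a worker rejects, the returned promise rejects too, so a worker that should not abort the whole run needs to catch its own errors.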

Related

Express server that creates maximum of 2 child/worker processes

I'm experimenting with node and its child_process module.
My goal is to create a server which will run on a maximum of 3 processes (1 main and optionally 2 children).
I'm aware that code below may be incorrect, but it displays interesting results.
const app = require ("express")();
const {fork} = require("child_process")
const maxChildrenRuning = 2
let childrenRunning = 0
app.get("/isprime", (req, res) => {
if(childrenRunning+1 <= maxChildrenRuning) {
childrenRunning+=1;
console.log(childrenRunning)
const childProcess = fork('./isprime.js');
childProcess.send({"number": parseInt(req.query.number)})
childProcess.on("message", message => {
console.log(message)
res.send(message)
childrenRunning-=1;
})
}
})
function isPrime(number) {
...
}
app.listen(8000, ()=>console.log("Listening on 8000") )
I'm launching 3 requests with numbers around 5*10^9.
After 30 seconds I receive 2 responses with correct results.
The CPU stops doing hard work and goes idle.
Surprisingly, after a further 1 minute 30 seconds, one thread starts processing the third request, which was still pending, and finishes 30 seconds later with the correct answer. The console log is shown below:
> node index.js
Listening on 8000
1
2
{ number: 5000000029, isPrime: true, time: 32471 }
{ number: 5000000039, isPrime: true, time: 32557 }
1
{ number: 5000000063, isPrime: true, time: 32251 }
Either express listens and checks pending requests once in a while, or my browser resends the actual request every so often while it is pending. Can anybody explain what is happening here and why? How can I correctly achieve my goal?
The way your server code is written, if you receive a /isprime request and two child processes are already running, your request handler for /isprime does nothing. It never sends any response. You don't pass that first if test and then nothing happens afterwards. So, that request will just sit there with the client waiting for a response. Depending upon the client, it will probably eventually time out as a dead/inactive request and the client will shut it down.
Some clients (like browsers) may assume that something just got lost in the network and they may retry the request by sending it again. It would be my guess that this is what is happening in your case. The browser eventually times out and then resends the request. By the time it retries, there are fewer than two child processes running so it gets processed on the retry.
You could verify that the browser is retrying automatically by going to the network tab in the Chrome debugger and watching exactly what the browser sends to your server and watch that third request, see it timeout and see if it is the browser retrying the request.
Note, this code seems to be only partially implemented because you initially start two child processes, but you don't reuse those child processes. Once they finish and you decrement childrenRunning, your code will then start another child process. Probably what you really want to do is to keep track of the two child processes you started and when one finishes, add it to an array of "available child processes" so when a new request comes in, you can just use an existing child process that is already started, but idle.
You also need to either queue incoming requests when all the child processes are full or you need to send some sort of error response to the http request. Never sending an http response to an incoming request is a poor design that just leads to great inefficiencies (connections hanging around much longer than needed that never actually accomplish anything).
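To illustrate the "send some sort of error response" option, here is a minimal sketch of a guard that always answers the request: it either starts the job or replies with HTTP 503. createBusyGuard, guard, startJob and done are hypothetical names for this example. The only things it assumes about res are the Express-style res.status(code) and res.send(body) methods.

```javascript
// Hypothetical helper: caps the number of running jobs and answers the
// overflow with 503 instead of leaving the request without a response.
function createBusyGuard(maxRunning) {
  let running = 0;

  return function guard(res, startJob) {
    if (running >= maxRunning) {
      // Always send a response, so the client is not left hanging.
      res.status(503).send({ error: 'all workers busy, retry later' });
      return false;
    }
    running += 1;
    // `startJob` receives a `done` callback and must call it exactly once
    // with the result; that frees the slot and answers the request.
    startJob(function done(result) {
      running -= 1;
      res.send(result);
    });
    return true;
  };
}

// Possible wiring with the code from the question (not executed here):
//   const guard = createBusyGuard(2);
//   app.get("/isprime", (req, res) => {
//     guard(res, (done) => {
//       const childProcess = fork('./isprime.js');
//       childProcess.send({ number: parseInt(req.query.number, 10) });
//       childProcess.once("message", done);
//     });
//   });
```

This sketch does not handle a child process that exits without sending a message. In that case the slot is never released, so real code would also listen for the child's exit and error events.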

Can someone explain an ENOBUFS error?

I'm making a bunch of calls to a database that contains a large amount of data on a Windows 7 64 bit OS. As the calls are queuing up I get the error (for every HTTP call after the first error):
Error: connect ENOBUFS *omitted* - Local (undefined:undefined)
From my google searching I've learned that this error means that my buffer has grown too large and my system's memory can no longer handle the buffer's size.
But I don't really understand what this means. I'm using node.js with an HTTPS library to handle my requests. When the requests are getting queued and the sockets are opening, is the buffer's size allocated in RAM? What will allow the buffer to expand to a greater size? Is this simply a hardware limitation?
I've also read that some OS are able to handle the size of the buffer better than other OS's. Is this the case? If so which OS would be better suited for running a node script that needs to fetch a lot of data via HTTPS requests?
Here's how I'm doing my requests.
for (let j = 0; j < dataQueries; j++) {
  getData(function(res) { /* use the parsed result */ })
}
function getData(callback){
axios.get(url, config)
.then((res) => {
// parse res
callback(parsedRes(res))
}).catch(function(err) {
console.log("Spooky problem alert! : " + err);
})
}
I've omitted some code for brevity, but this is generally how I'm doing my requests. I have a for loop that for every iteration launches a GET request via axios.
I know there is an axios.all command that is used for storing the promise the axios.HTTPMethod gives you, but I saw no change in my code when I set it up to store promises and then iterate over the promises via axios.all
Thanks @Jonasw for your help, but there is a very simple solution to this problem.
I used the small library throttled-queue to get the job done. (If you look at the source code, it would be pretty easy to implement your own queue based on this package.)
My code changed to:
const throttledQueue = require('throttled-queue')
let throttle = throttledQueue(15, 1000) // 15 times per second
for (let j = 0; j < dataQueries; j++) {
  throttle(function(){
    getData(function(res){
      // do parsing
    })
  })
}
function getData(callback){
axios.get(url, config)
.then((res) => {
// parse res
callback(parsedRes(res))
}).catch(function(err) {
console.log("Spooky problem alert! : " + err);
})
}
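The answer above remarks that a queue like this is easy to write by hand. Here is a minimal sketch of one possible version. It is not the throttled-queue source, and createThrottle, drain and pending are names invented for the example. It starts at most maxPerInterval queued functions every intervalMs milliseconds, with the first batch starting one interval after the first call.

```javascript
// Hypothetical stand-in for a throttling queue: starts at most
// `maxPerInterval` queued functions every `intervalMs` milliseconds.
function createThrottle(maxPerInterval, intervalMs) {
  const pending = []; // functions waiting to be started
  let timer = null;   // interval handle, null while the queue is idle

  function drain() {
    // Start the next batch, in the order the functions were queued.
    const batch = pending.splice(0, maxPerInterval);
    batch.forEach(function (fn) { fn(); });
    if (pending.length === 0) {
      // Nothing left: stop the timer so the process can exit.
      clearInterval(timer);
      timer = null;
    }
  }

  return function throttle(fn) {
    pending.push(fn);
    if (timer === null) {
      timer = setInterval(drain, intervalMs);
    }
  };
}
```

It would be used like the library call above: const throttle = createThrottle(15, 1000), followed by throttle(function () { ... }) inside the loop. Note that this limits how fast requests start, not how many are open at once, so slow responses can still pile up.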
In my case this got resolved by deleting the autogenerated zip files from my workspace, which got created every time I did cdk deploy. Turns out that my typescript compiler treated these files as source files and counted them into the tarball.
You're starting a lot of data queries at the same time. You could chain them using a recursive function, so that they're executed one after another:
(function proceedwith(j) {
getData(function(){
if(j<dataQueries-1) proceedwith(j+1);
});
})(0)
Experienced the same issue when starting too many requests.
Tried throttled-queue, but wasn't working correctly.
system-sleep worked for me, effectively slowing down the rate at which the requests were made. Sleep is best used in synchronous code, to block before starting the next sync/async call.
Example: (using sleep to limit the rate updateAddress() is called)
var sleep = require('system-sleep');
// Asynchronous call (what is important is that forEach is synchronous)
con.query(sql, function (err, result) {
if (err) throw err;
result.forEach(function(element) {
sleep(120); // Blocking call sleep for 120ms
updateAddress(element.address); // Another async call (network request)
});
});

NodeJS TCP Client communication

First of all - I am aware of this answer to a kind of similar problem.
Problem
I have a third-party protocol that uses TCP/IP. This protocol defines that the server replies to every message received. On the client side (which I am trying to implement) I have to wait for the answer from the server.
The problem occurs when I try to send messages. I need to wait for the answer to the first message before I send the second one (like ping-pong).
I tried to do multiple writes on my NodeJS tcp-client like this, which understandably fails due to async:
client.connect(connectOptions, function () {
client.write(message1);
client.write(message2);
});
Like I said before, I have a third-party component, which responds to both messages with a numeric value. So when
client.on('data',function (data) {});
fires an event, I can't tell which message was responsible for the answer. Unlike in the linked answer, I don't have the ability to tag the answer on the server side.
I am new to node.js, so I am trying to figure out the best way to solve this kind of problem, as it's of the nature: do synchronous things in an async environment.
One way would be to use a common list of handlers to keep track of requests and responses:
var handlers = [];
client.connect(connectOptions, function () {
  client.write(message1);
  handlers.push(function msg1_handler(data) {});
  client.write(message2);
  handlers.push(function msg2_handler(data) {});
});
client.on('data',function(data) {
  var handler = handlers.shift();
  handler(data);
});
All of this should obviously be wrapped in a separate class containing both the handlers and the client object. It's just an example of how to do it. The drawback is that if the server fails to respond to some request, the handlers get out of step with the responses, and that is hard to recover from.
Another idea is to buffer requests:
function BufferedClient(cli) {
this.cli = cli;
this.buffer = [];
this.waiting_for_response = false;
var that = this;
cli.on('data', function(data) {
that.waiting_for_response = false;
var pair = that.buffer.shift();
var handler = pair[0];
process.nextTick(function() {
// we use .nextTick to avoid potential
// exception in handler which would break
// BufferedClient
handler(data);
});
that.flush();
});
};
BufferedClient.prototype = {
request: function(msg, handler) {
this.buffer.push([handler, msg]);
this.flush();
},
flush: function() {
var pair = this.buffer[0];
if (pair && !this.waiting_for_response) {
this.cli.write(pair[1]);
this.waiting_for_response = true;
}
}
};
This time you send requests sequentially (so like synchronous) due to how .request() and .on('data') handler work together with .flush() function. Usage:
client.connect(connectOptions, function () {
var buff_cli = new BufferedClient(client);
buff_cli.request(message1, function(data) { });
buff_cli.request(message2, function(data) { });
});
Now even if the server fails to respond you don't have a mess. However, if you issue several buff_cli.request calls in parallel and one of them fails, you will have a memory leak (this.buffer keeps growing while nothing drains it, because the BufferedClient is still waiting for a response). This can be fixed by adding some timeouts on the socket.
Note that both solutions assume that the server never pushes anything to the client without a request.
If I were you I would go with second solution. Note that I haven't tested the code so it might be buggy but the general idea should be ok.
Side note: When you implement a server (and I know that you don't in this case) you should always have a protocol that matches each request with a response in a unique way. One way would be to send a unique ID with each request so that the server responds with the same ID. In such a scenario matching a request with its response is very easy and you avoid all that mess.
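For a protocol you do control, the unique-ID idea from the side note could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name PendingRequests, the { id, payload } message shape, and the send function that stands in for encoding a message and writing it to the socket.

```javascript
// Hypothetical client-side bookkeeping for a protocol in which every request
// carries an `id` and the server echoes that `id` back in its response.
class PendingRequests {
  constructor(send) {
    this.send = send;         // writes one message object to the socket
    this.nextId = 1;
    this.pending = new Map(); // id -> { resolve, timer }
  }

  // Sends `payload` and returns a promise for the matching response payload.
  request(payload, timeoutMs = 5000) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      // Give up on this id after `timeoutMs` so entries cannot pile up.
      const timer = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error('request ' + id + ' timed out'));
      }, timeoutMs);
      this.pending.set(id, { resolve, timer });
      this.send({ id, payload });
    });
  }

  // Call this with each decoded response read from the socket.
  handleResponse(response) {
    const entry = this.pending.get(response.id);
    if (!entry) return false; // unknown id, or the request already timed out
    clearTimeout(entry.timer);
    this.pending.delete(response.id);
    entry.resolve(response.payload);
    return true;
  }
}
```

Because responses are looked up by ID, they can arrive in any order, and a lost response only fails its own request after the timeout.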

How to wait for the backend in Protractor?

I'm testing a web page where the user can send a message to another user via a text input. A POST request is then sent to the server and the message is dumped on the disk in the var/mail/new folder.
After automating the sending of the message in the page with Protractor, I call browser.waitForAngular() and browser.driver.sleep(4000) to leave time for the backend to write the mail to the disk.
After these calls, the check for the email's presence fails. Looking in the Unix shell, I can confirm that the email was sent, and the next Jasmine test (the next it block) also confirms the presence of the email.
Why is browser.driver.sleep(4000) not effective to wait for the backend to proceed? How can I correct the following code?
it("is possible to send a message", function() {
shared.loginContributor();
var mailsBeforeMessaging =
fs.readdirSync(browser.params.mail.queue_path + "/new");
console.log('mailsBeforeMessaging');
console.log(mailsBeforeMessaging.length);
console.log(fs.lstatSync(browser.params.mail.queue_path + "/new"));
var usersListing = new UserPages.UsersListing().get();
var annotatorPage = usersListing.getUserPage("annotator");
annotatorPage.sendMessage("title5", "content64");
exec("/tmp/check.sh");
// we expect the message widget to disappear
var button = element(by.css(".user-profile-info-button"));
console.log('waiting');
browser.wait(EC.elementToBeClickable(button), 5000);
console.log('waiting is finished');
expect(EC.elementToBeClickable(button)).toBeTruthy();
// wait for mail to be dumped on the disk?
browser.waitForAngular();
browser.driver.sleep(4000);
exec("/tmp/check.sh");
var mailsAfterMessaging =
fs.readdirSync(browser.params.mail.queue_path + "/new");
console.log('mailsAfterMessaging');
// ERROR: here the number of emails is NOT incremented
console.log(mailsAfterMessaging.length);
console.log(fs.lstatSync(browser.params.mail.queue_path + "/new"));
});
it("xyz", function() {
console.log(fs.lstatSync(browser.params.mail.queue_path + "/new"));
// here the number of emails is incremented
var mailsAfterMessaging =
fs.readdirSync(browser.params.mail.queue_path + "/new");
console.log('mailsAfterMessaging');
console.log(mailsAfterMessaging.length);
});
Most of the Protractor functions do not do anything immediately. They queue something up to be done later, and return a promise to do it. After an it block schedules a bunch of things to do, they actually start happening (via the promises they registered in the ControlFlow).
Your checks, however, are all executing immediately. So, they are happening before any of the protractor calls accomplish anything.
Use then to make the waiting and dependencies explicit in your test. Like this:
annotatorPage.sendMessage("title5", "content64").then(function() {
exec("/tmp/check.sh");
});
or:
browser.wait(EC.elementToBeClickable(button), 5000).then(function() {
console.log('wait-for-clickable has completed'); // B
});
console.log('wait-for-clickable has been scheduled'); // A
See the Protractor Control Flow documentation and the Webdriver JS API doc.
It's not you. This is a crazy API to learn because it does not behave at all the way anyone familiar with normal synchronous programming would expect.

setInterval alternative

In my app I am polling the webserver for messages every second and displaying them in the frontend.
I use setInterval to achieve this. However as long as the user stays on that page the client keeps polling the server with requests even if there is no data. The server does give an indication when no more messages are being generated by setting a variable.
I thought of using this variable to clearInterval and stop the timer but that didn't work. What else can I use in this situation?
I am using jquery and django. Here is my code:
jquery:
$(function () {
  var refresh = setInterval(
    function ()
    {
      var toLoad = '/myMonitor'+' #content';
      $('#content').load(toLoad).show();
    }, 1000); // refresh every 1000 milliseconds
});
html:
div id=content is here
I can access the django variable for completion in html with each refresh. How can I set clearInterval if at all ?
Note: stack overflow does not let me put is &gt &lt so html is incomplete
Thanks
Updated 03/16/2010
I must be doing something wrong, but I cannot figure it out. Here is my script with clearInterval, and it does not work.
var timer = null;
$(function(){
if ("{{status}}" == "False")
{
clearInterval(timer);
}
else
{
timer = setInterval(
function(){
var toLoad = '/myMonitor'+' #content';
$('#content').load(toLoad).show();}
,1000); // refresh every 1000 milliseconds
}
});
status is a boolean set in "views.py" (Django).
Thanks a bunch.
A couple people have already answered with specific resources to your problem, so I thought I would provide a bit of background.
In short, you want the server to push data to the browser to avoid extensive client-side polling. There isn't a good cross-browser way to support server push, so a common solution that requires much less polling is to use the Comet (another cleaning product, like AJAX) long-poll technique.
With Comet, the browser makes a request, and the server keeps the connection open without responding until new data is available. When the server does have new data, it sends it over the open connection and the browser receives it right away. If the connection times out, the browser opens a new one. This lets the server send data to the client as soon as it becomes available. As others have indicated, this approach requires special configuration of your web server. You need a script on the server that checks for data at an interval and responds to the client if it exists.
Something to keep in mind with this approach is that most web servers are built to get a request from a client and respond as quickly as possible; they're not intended to be kept alive for a long period of time. With Comet you'll have far more open connections than normal, probably consuming more resources than you expect.
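From the client's side, the long-poll cycle described above can be sketched as a loop that reopens the request whenever it completes or fails. This is only an illustration. fetchMessages is a hypothetical function that wraps whatever AJAX call the page uses and returns a promise resolving to { messages, done }, and the done flag stands in for the server's "no more messages" indication.

```javascript
// Minimal long-poll loop. `fetchMessages` (hypothetical) returns a promise
// that resolves with { messages: [...], done: boolean } once the server has
// data, and rejects if the connection times out or fails.
async function longPoll(fetchMessages, onMessages, retryDelayMs = 1000) {
  for (;;) {
    let reply;
    try {
      reply = await fetchMessages(); // stays open until the server answers
    } catch (err) {
      // Timed out or failed: wait briefly, then open a new connection.
      await new Promise(function (resolve) {
        setTimeout(resolve, retryDelayMs);
      });
      continue;
    }
    onMessages(reply.messages);
    if (reply.done) {
      return; // the server says nothing more is coming, so stop polling
    }
  }
}
```

Unlike setInterval, the next request is only issued after the previous one has finished, so slow responses cannot stack up.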
Your clearInterval check is only checking when the document ready event is fired.
If the code you gave is exactly what's in the browser, then you're comparing the string "{{status}}" to the string "False". I'd rather watch paint dry than wait for that to evaluate as true.
What if your requests take longer than 1 second to complete? You'll flood your server with requests.
function update () {
$('#content').show().load('/myMonitor'+' #content', function (response, status) {
if (!/* whatever you're trying to check*/) {
setTimeout(update, 1000);
};
});
};
$(document).ready(function () {
update();
});
This is closer than where you were, but you still need to work out how you're going to decide when to stop polling.
