I have a Node.js process that makes a large number of client requests to pull information from a website. I am using the request package (https://www.npmjs.com/package/request) since, as it says: "It supports HTTPS and follows redirects by default."
My problem is that after a certain period of time, the requests begin to hang. I haven't been able to determine if this is because the server is returning an infinite data stream, or if something else is going on. I've set the timeout, but after some number of successful requests, some of them eventually get stuck and never complete.
var options = { url: 'some url', timeout: 60000 };
request(options, function (err, response, body) {
  // process
});
My questions are: can I shut down a connection after a certain amount of data is received using this library, and can I stop the request from hanging? Do I need to use the http/https libraries and handle the redirects and protocol switching myself in order to get the kind of control I need? If I do, is there a standardized practice for that?
Edit: Also, if I stop the process and restart it, the requests pick right back up and start working, so I don't think it is related to the server or to the machine the code is running on.
Note that in request(options, callback), the callback will only be fired when the request is completed, and there is no way to break off the request from there.
You should listen for the data event instead:
var request = require('request');

var stream = request(options);
var len = 0;

stream.on('data', function (data) {
  // TODO: process your data here
  len += Buffer.byteLength(data);
  // break the stream once more than 1000 bytes have been received
  if (len > 1000) {
    stream.abort();
  }
});
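If a request hangs before any data arrives at all, you can also guard the same stream with your own timer on top of the timeout option. A minimal sketch along the same lines (the 60-second figure is just an example):

// give up entirely if the request has not finished after 60 seconds
var killTimer = setTimeout(function () {
  stream.abort();
}, 60000);

stream.on('error', function (err) {
  clearTimeout(killTimer); // fires on network errors and request timeouts
});

stream.on('end', function () {
  clearTimeout(killTimer); // normal completion
});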
Related
I have a bulk-create-participants function that uses Promise.allSettled to send 100 axios POST requests. The backend is Express and the frontend is React. Each request calls a single "add new participant" REST API. I have set the backend timeout to 15s using connect-timeout, and the frontend timeout is 10s.
My issue is that when I click the bulk add button, the bulk create is triggered and the Promise.allSettled concurrency starts. However, I cannot send a new request before all of the concurrent requests are done. Because I have set a timeout on the frontend, the new request gets cancelled.
Is there a way I can still make the concurrent requests, but have them not block other new requests?
This is the frontend code; createParticipant is the API request.
const PromiseArr = []
for (let i = 0; i < totalNumber; i++) {
  const participant = participantList[i]
  const participantNewDetail = {
    firstName: participant.firstName,
    lastName: participant.lastName,
    email: participant.email,
  }
  PromiseArr.push(
    createParticipant(participantNewDetail)
      .then((createParticipantResult) => {
        processedTask++
        processMessage = `Processing adding participant`
        dispatch({ type: ACTIVATE_PROCESS_PROCESSING, payload: { processedTask, processMessage } })
      })
      .catch((error) => {
        processedTask++
        processMessage = `Processing adding participant`
        dispatch({ type: ACTIVATE_PROCESS_PROCESSING, payload: { processedTask, processMessage } })
        throw new Error(
          JSON.stringify({
            status: "failed",
            value: error.data.message ? error.data.message : error,
          })
        )
      })
  )
}
const addParticipantResults = await Promise.allSettled(PromiseArr)
PromiseArr is the promise array, with length 100.
Is it possible to split this big request into smaller promise arrays and send them to the backend, so that in the gaps between those requests I can send another new request such as retriveUserDetail?
If you're sending 100 requests at a time to your server, that's just going to take a while for the server to process. It would be best to find a way to combine them all into one request or into a very small number of requests. Some server APIs have efficient ways of doing multiple queries in one request.
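For example, if you control the Express backend, a single combined request could carry the whole list and be processed server-side. The /participants/bulk route and the addParticipant helper below are hypothetical, just to illustrate the shape:

// hypothetical Express route that accepts the whole list in one request
app.post('/participants/bulk', async (req, res) => {
  const results = [];
  for (const participant of req.body.participants) {
    try {
      // addParticipant stands in for your existing single-add logic
      results.push({ status: 'fulfilled', value: await addParticipant(participant) });
    } catch (err) {
      results.push({ status: 'rejected', reason: err.message });
    }
  }
  res.json(results);
});

The client would then make one axios POST with { participants: participantList } instead of 100 separate calls.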
If you can't do that, then you should probably send them 5-10 at a time max so the server isn't being asked to handle so many simultaneous requests, which causes your additional request to go to the end of the line and take too long to process. That will allow you to send other things and get them processed while you're chunking away on the 100, without waiting for all of them to finish.
If this is being done from a browser, you also have some browser safeguard limitations to deal with, where the browser refuses to send more than N requests to the same host at a time. If you send more than that, it queues them up and holds onto them until some prior requests have completed. This keeps one client from massively overwhelming the server, but it also creates a long line of requests that any new request has to go to the end of. The way to deal with that is to never send more than a small number of requests to the same host at a time, so that queue/line stays short when you want to send a new request.
You can look at these snippets of code that let you process an array of data N-at-a-time rather than all at once. Each of these has slightly different control options so you can decide which one fits your problem the best.
mapConcurrent() - Process an array with no more than N requests in flight at the same time
pMap() - Similar to mapConcurrent with more argument checking
rateLimitMap() - Process max of N requestsPerSecond
runN() - Allows you to continue processing upon error
These all replace both Promise.all() and whatever code you had for iterating your data, launching all the requests and collecting the promises into an array. The functions take an input array of data and a function to call; that function gets passed one item of the data and should return a promise that resolves to the result of that request. They return a promise that resolves to an array of results in the original array order (the same return value as Promise.all()).
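As a rough idea of what these helpers do internally, here is a minimal sketch of an N-at-a-time runner (not the exact code behind the links above, and with simpler error handling):

// run fn(item) for every item, with at most `limit` promises in flight at once
async function mapConcurrent(items, limit, fn) {
  const results = new Array(items.length);
  let nextIndex = 0;

  async function worker() {
    while (nextIndex < items.length) {
      const i = nextIndex++;           // claim the next item
      results[i] = await fn(items[i]); // store the result in original order
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// usage sketch: send the participant requests 5 at a time
// const results = await mapConcurrent(participantList, 5, createParticipant);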
I am running a cron job every 5 minutes to get data from a 3rd-party API. It can involve N requests at a time from the Node.js application. Below are the details and code samples:
1> Running cron Job every 5 mins:
const cron = require('node-cron');
const request = require('request');
const otherServices = require('./services/otherServices');

cron.schedule("0 */5 * * * *", function () {
  initiateScheduler();
});
2> Get the list of elements for which I want to initiate the request. It can receive N number of elements. I call the request function (getSingleElementUpdate()) in the forEach loop:
var initiateScheduler = function () {
  // Database call to get the elements list
  otherServices.moduleName()
    .then((arrayList) => {
      arrayList.forEach(function (singleElement, index) {
        getSingleElementUpdate(singleElement, 1);
      }, this);
    })
    .catch((err) => {
      console.log(err);
    });
};
3> Start initiating the request for singleElement. Please note I don't need any callback if I receive a successful (200) response from the request. I just have to update my database entries on success.
var getSingleElementUpdate = function (singleElement, count) {
  var bodyReq = {
    "id": singleElement.elem_id
  };
  var options = {
    method: 'POST',
    url: 'http://example.url.com',
    body: bodyReq,
    dataType: 'json',
    json: true,
    crossDomain: true
  };
  request(options, function (error, response, body) {
    if (error) {
      if (count < 3) {
        count = count + 1;
        initiateScheduler(singleElement, count);
      }
    } else {
      // Request success
      // No callback required
      // Just need to update database entries on a successful response
    }
  });
};
I have already checked this:
request-promise: but I don't need any callback after a successful request, so I didn't find any advantage to adding it to my code. Let me know if you see any positive point in adding it.
I need your help with the following things:
I have checked the performance when I receive 10 elements in the arrayList of step 2. The problem is that I don't have any clear picture of what will happen when I start receiving 100 or 1000 elements in step 2. So I need your help in determining whether I need to update my code for that scenario, or whether there is anything I missed that would degrade performance. Also, how many requests can I make at a time, at maximum? Any help from you is appreciated.
Thanks!
AFAIK there is no hard limit on the number of requests. However, there are (at least) two things to consider: your hardware limits (memory/CPU) and the remote server's latency (is it able to respond to all requests within 5 minutes, before the next batch?). Without knowing the context, it's also impossible to predict what scaling mechanism you might need.
The question is actually more about app architecture than about a specific piece of code, so you might want to try Software Engineering Stack Exchange instead of SO.
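That said, if arrayList grows into the hundreds or thousands, one simple protection for both your process and the remote API is to cap how many requests are in flight at once instead of firing them all from the forEach loop. A rough sketch, assuming getSingleElementUpdate is adapted to return a promise that resolves when its request() call finishes (the batch size of 10 is arbitrary):

// process the elements in limited-size batches instead of all at once
async function processInBatches(arrayList, batchSize) {
  for (let i = 0; i < arrayList.length; i += batchSize) {
    const batch = arrayList.slice(i, i + batchSize);
    // wait for this batch to finish before starting the next one
    await Promise.all(batch.map((singleElement) => getSingleElementUpdate(singleElement, 1)));
  }
}

// in initiateScheduler():
// otherServices.moduleName().then((arrayList) => processInBatches(arrayList, 10));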
I'm building a decentralized application (I don't control the servers, only the client), and want to add some sanity checks and preventative measures to stop bad people from doing malicious things. This involves (among many, many other things), preventing DoS attempts on the client by the use of arbitrary payload data being sent from the servers.
The question is this: How can the client limit the maximum size of data received from a server over JQuery AJAX? If I'm expecting to fetch a few bytes of JSON, and am instead greeted by a 30MB video file when I make the AJAX request, how can I stop the request and throw an error after I've received the first 16 KB?
While I recognize that the nature of my undertaking is unique, any feedback is welcome.
As @Barmar pointed out in the comments, this was a simple case of checking the "onprogress" event of the download and terminating it when it exceeded my desired max size.
Here is the code for any interested parties:
var xhr = $.ajax({
  url: "your-url",
  success: () => {
    // ...
  },
  xhrFields: {
    onprogress: function (progress) {
      if (progress.loaded > config.MAX_HASH_DESCRIPTOR_SIZE) {
        // stop any unreasonably long malicious payload downloads.
        xhr.abort();
      }
    }
  }
});
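As a small refinement (not required for the fix above), the same progress event exposes lengthComputable and total when the server sends a Content-Length header, so an oversized declared payload can be rejected before 16 KB have even arrived. A variant of the handler above (a server that lies about Content-Length will still only be caught by the loaded check):

var xhr = $.ajax({
  url: "your-url",
  xhrFields: {
    onprogress: function (progress) {
      // total is only meaningful when the server sent a Content-Length header
      var declaredTooBig = progress.lengthComputable && progress.total > config.MAX_HASH_DESCRIPTOR_SIZE;
      if (declaredTooBig || progress.loaded > config.MAX_HASH_DESCRIPTOR_SIZE) {
        xhr.abort();
      }
    }
  }
});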
I send JSON requests one by one to the Node.js server. After the 6th request, the server can't reply to the client immediately, and then it takes a while (15 seconds or a bit more) to send back a 200 OK. The request writes a JSON value into MongoDB, and time is an important factor for me for the REST call. How can I find the error in this case? (Which tool or script could help me?) My server-side code is like this:
var controlPathDatabaseSave = "/save";

app.use('/', function (req, res) {
  console.log("req body app use", req.body);
  var str = req.path;
  if (str.localeCompare(controlPathDatabaseSave) == 0) {
    console.log("controlPathDatabaseSave");
    mongoDbHandleSave(req.body);
    res.setHeader('Content-Type', 'application/json');
    res.write('Message taken: \n');
    res.write('Everything all right with database saving');
    res.send("OK");
    console.log("response body", res.body);
  }
});
My client-side code is as below:
function saveDatabaseData() {
  console.log("saveDatabaseData");
  var oReq = new XMLHttpRequest();
  oReq.open("POST", "http://192.168.80.143:2800/save", true);
  oReq.setRequestHeader("Content-type", "application/json;charset=UTF-8");
  oReq.onreadystatechange = function () { // Call a function when the state changes.
    if (oReq.readyState == 4 && oReq.status == 200) {
      console.log("http responseText", oReq.responseText);
    }
  };
  oReq.send(JSON.stringify({links: links, nodes: nodes}));
}
--Mongodb save code
function mongoDbHandleSave(reqParam) {
  // Connect to the db
  MongoClient.connect(MongoDBURL, function (err, db) {
    if (!err) {
      console.log("We are connected in accordance with saving");
    } else {
      return console.dir(err);
    }
    /*
    db.createCollection('user', {strict: true}, function (err, collection) {
      if (err)
        return console.dir(err);
    });
    */
    var collection = db.collection('user');
    // when saving into the database only use req.body; skip the JSON.stringify() function
    var doc = reqParam;
    collection.update(doc, doc, {upsert: true});
  });
}
You can see my REST calls in the Google Chrome developer tools. (The first six calls get 200 OK; the last one is in a pending state.)
--Client output
--Server output
Thanks in advance,
Since it looks like these are Ajax requests from a browser, each browser has a limit on the number of simultaneous connections it will allow to the same host. Browsers have varied that setting over time, but it is likely in the 4-6 range. So, if you are trying to run 6 simultaneous ajax calls to the same host, then you may be running into that limit. What the browser does is hold off on sending the latest ones until the first ones finish (thus avoiding sending too many at once).
The general idea here is to protect servers from getting beat up too much by one single client and thus allow the load to be shared across many clients more fairly. Of course, if your server has nothing else to do, it doesn't really need protecting from a few more connections, but this isn't an interactive system, it's just hard-wired to a limit.
If there are any other requests in process (loading images or scripts or CSS stylesheets) to the same origin, those will count to the limit too.
If you run this in Chrome and you open the network tab of the debugger, you can actually see on the timeline exactly when a given request was sent and when its response was received. This should show you immediately whether the later requests are being held up at the browser or at the server.
Here's an article on the topic: Maximum concurrent connections to the same domain for browsers.
Also, keep in mind that, depending upon what your requests do on the server and how the server is structured, there may be a maximum number of server requests that can be efficiently processed at once. For example, if you had a blocking, threaded server that was configured with one thread for each of four CPUs, then once the server has four requests going at once, it may have to queue the fifth request until the first one is done, causing it to be delayed more than the others.
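If you want to confirm whether the stall is on the browser side, one quick experiment is to serialize the saves yourself so only one request is in flight at a time. A rough sketch reusing the endpoint from the question (sendSequentially and payloads are made up for illustration):

// send the payloads strictly one after another instead of all at once
function sendSequentially(payloads, index) {
  index = index || 0;
  if (index >= payloads.length) return;

  var oReq = new XMLHttpRequest();
  oReq.open("POST", "http://192.168.80.143:2800/save", true);
  oReq.setRequestHeader("Content-type", "application/json;charset=UTF-8");
  oReq.onreadystatechange = function () {
    if (oReq.readyState == 4) {
      // only start the next request once this one has completed
      sendSequentially(payloads, index + 1);
    }
  };
  oReq.send(JSON.stringify(payloads[index]));
}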
I'm new to Node and am having some difficulties with getting the Request library to return an accurate response time.
I have read the thread at "nodejs request library, get the response time" and can see that the request library should be able to return an "elapsed time" for the request.
I am using it in the following way:
request.get({
  url: 'http://example.com',
  time: true
}, function (err, response) {
  console.log('Request time in ms', response.elapsedTime);
});
The response.elapsedTime result is in the region of 500-800 ms; however, I can see the request is actually taking closer to 5000 ms.
I am testing this against an uncached nginx page which takes roughly 5 seconds to render the page when profiling via a browser (Chrome).
Here is an example of the timing within Chrome (although the server is under load, hence the 10s):
Chrome Profiling example
It looks to me like this isn't actually timing the full start to finish of the request, but is "timing" something else. It might be the time taken to download the page once the server starts streaming it.
If this is the case, how can I get the actual start-to-finish time that this request has taken? The time I need is from making the request to receiving the entire body and headers.
I am running the request like this, with listofURLs being an array of URLs to request:
for (var i = 0; i < listofURLs.length; i++) {
  collectSingleURL(listofURLs[i].url.toString(),
    function (rData) {
      console.log(rData['url'] + " - " + rData['responseTime']);
    });
}
function collectSingleURL(urlToCall, cb) {
  var https = require('https');
  var http = require('http');
  https.globalAgent.maxSockets = 5;
  http.globalAgent.maxSockets = 5;
  var request = require('request');
  var start = Date.now();

  // Make the request
  request.get({
    "url": urlToCall,
    "time": true,
    headers: {"Connection": "keep-alive"}
  }, function (error, response, body) {
    // Check for error
    if (error) {
      var result = {
        "errorDetected": "Yes",
        "errorMsg": error,
        "url": urlToCall,
        "timeDate": response.headers['date']
      };
      //callback(error);
      console.log('Error in collectSingleURL:', error);
    }
    // All good - pass the relevant data back to the callback
    var result = {
      "url": urlToCall,
      "timeDate": response.headers['date'],
      "responseCode": response.statusCode,
      "responseMessage": response.statusMessage,
      "cacheStatus": response.headers['x-magento-cache-debug'],
      "fullHeaders": response.headers,
      "bodyHTML": body,
      "responseTime": Date.now() - start
    };
    cb(result);
    //console.log (cb);
  });
}
You are missing a key point: it takes 5 seconds to render, not just to download the page.
The request module of Node is not a full browser; it makes a simple HTTP request. So when you request, for example, www.stackoverflow.com, it will only load the basic HTML returned by the page; it will not load the JS files, CSS files, images, etc.
The browser, on the other hand, will load all of that after the basic HTML of the page is loaded (some parts will load before the page has finished loading, together with the page).
Take a look at the network profile of Stack Overflow below: the render finishes at ~1.6 seconds, but the basic HTML page (the upper bar) has finished loading at around 0.5 seconds. So if you use request to fetch a web page, it is actually only loading the HTML, meaning "the upper bar".
Just time it yourself:
var start = Date.now();
request.get({
  url: 'http://example.com'
}, function (err, response) {
  console.log('Request time in ms', Date.now() - start);
});
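As a side note, depending on which version of request you have installed, passing time: true also exposes a timings breakdown alongside elapsedTime, which can show which phase (DNS, connect, first byte, download) is eating the 5 seconds. A sketch, hedged because field availability varies by version:

var request = require('request');
var start = Date.now();

request.get({ url: 'http://example.com', time: true }, function (err, response) {
  if (err) return console.error(err);
  console.log('Wall-clock ms:', Date.now() - start);
  console.log('elapsedTime:', response.elapsedTime);
  // only present on newer versions of request
  if (response.timingPhases) {
    console.log('phases:', response.timingPhases); // wait, dns, tcp, firstByte, download, total
  }
});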