I'm trying to understand how to achieve better performance with a Node/Express server.
When I send requests with a big JSON payload, limit Node to 40 MB, and use the loadtest library with 20 concurrent requests, memory keeps rising and I eventually get "FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory".
At the moment I'm using this basic synchronous code, with 2 seconds of "starving" (busy-waiting) for each job.
If I understand correctly, the server runs like a single core, handling one job at a time, and I'm wondering: why can't the GC keep up with the load?
In addition, I read that the request payloads waiting in the queue are stored efficiently.
I saw a lot of answers suggesting raising the memory limit, but let's say I can't do that. What can I do to improve the GC/memory performance?
This is the code:
const express = require('express');
var bodyParser = require('body-parser');

const app = express();
const port = 5000;

app.use(bodyParser.json({ limit: "3mb" }));

app.post('/', (req, res) => {
  const t = new Date();
  while (t > new Date() - 2 * 1000) { } // busy-wait for ~2 seconds
  res.send("");
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});
This is how I run Node:
node --trace_gc --inspect --max-old-space-size=40 server-test.js
And finally, this is how I run the loadtest:
loadtest -p loadTestJson2.JSON.txt -T application/json -c 20 http://localhost:5000
The JSON size is about 2-2.5 MB.
Holy cow! With respect, your synchronous "starving" code is a bad idea. It systematically starves your JavaScript instance of the resources it needs to do its work correctly.
while (t > new Date() - 2 * 1000) { } /* grossly wasteful! */
This loop completely ties up your tiny Node.js instance for two seconds each time it answers a request. By the time the two-second spinloop finishes and your POST handler returns, other requests have queued up. It seems likely the GC doesn't get a chance to run often enough; it's usually a background operation.
Plus, 40 MB is far too small a working set for handling a load test with multi-megabyte POST payloads.
Try allowing your code to operate concurrently by using setTimeout for your delay rather than the spinloop. That may give your GC a chance to run correctly.
app.post('/', (req, res) => {
  setTimeout(function () {
    res.send("");
  }, 2000);
});
EDIT: Another possibility: you could try explicitly deleting the large req.body objects when you're done processing each request. (They are large because your POST payloads are large.)
Put delete req.body at the very end of your POST handler. I don't know whether this will help, but it might: it drops a reference to the large object, which may help the GC stay ahead of your workload.
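Putting both suggestions together, the handler might look like the sketch below. The helper name makeHandler is my own, and the delay just stands in for your real per-request work:

```javascript
// Sketch: non-blocking delay instead of the spinloop, plus dropping the
// big parsed payload reference once the response has been sent.
function makeHandler(delayMs) {
  return (req, res) => {
    setTimeout(() => {
      res.send("");
      delete req.body; // release the large parsed JSON for the GC
    }, delayMs);
  };
}

// app.post('/', makeHandler(2000));
```

With the event loop free during the delay, V8 has the idle time it needs to run its background GC passes between requests.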
Related
I have a proprietary math formula written in a JavaScript function that I need to make available for a website to use, without the users actually having access to the code itself.
Is it possible?
The idea is to make the formula available online without people being able to read the code. I have no idea how to do it.
I read about private packages on npm, but that seems to restrict who can use and read the code. I need people to be able to use it, but not read it.
If the code is run on the client's machine in any way, any sufficiently dedicated and persistent user will be able to find it eventually; all code that runs in a page can be found through the browser devtools.
The only way to get true privacy for such a thing is to never send the code that implements the formula to the client in the first place. Set up a server, if you don't already have one, and create a very simple API for it: one that takes the formula's inputs (as query parameters or a POST body, say), runs the formula on the server, and responds to the client with the result.
Use Node.js to create an Express server that listens for incoming requests and sends the result back to the client in the response:
const express = require('express');
const app = express();

function proprietaryFormula(x, y) {
  // the formula goes here
  return x + y;
}

app.get('/formula', (req, res) => {
  // query parameters arrive as strings, so convert them first
  let x = Number(req.query.x);
  let y = Number(req.query.y);
  let result = proprietaryFormula(x, y);
  // res.send(result) with a bare number would be treated as a status
  // code by Express, so send JSON instead
  res.json({ result });
});

app.listen(3000, () => {
  console.log('started listening on port 3000');
});
The website can call this API to use the formula's functionality, and the code for the formula stays on the server, never exposed to the client side.
I'm experimenting with Node and its child_process module.
My goal is to create a server which will run on a maximum of 3 processes (1 main and optionally 2 children).
I'm aware that code below may be incorrect, but it displays interesting results.
const app = require("express")();
const { fork } = require("child_process");

const maxChildrenRuning = 2;
let childrenRunning = 0;

app.get("/isprime", (req, res) => {
  if (childrenRunning + 1 <= maxChildrenRuning) {
    childrenRunning += 1;
    console.log(childrenRunning);
    const childProcess = fork('./isprime.js');
    childProcess.send({ "number": parseInt(req.query.number) });
    childProcess.on("message", message => {
      console.log(message);
      res.send(message);
      childrenRunning -= 1;
    });
  }
});

// in isprime.js:
function isPrime(number) {
  ...
}

app.listen(8000, () => console.log("Listening on 8000"));
I'm launching 3 requests with numbers around 5*10^9.
After 30 seconds I receive 2 responses with correct results.
The CPU stops doing hard work and goes idle.
Surprisingly, after another 1 minute 30 seconds, one process starts working on the still-pending 3rd request and finishes it after another 30 seconds with the correct answer. Console log below:
> node index.js
Listening on 8000
1
2
{ number: 5000000029, isPrime: true, time: 32471 }
{ number: 5000000039, isPrime: true, time: 32557 }
1
{ number: 5000000063, isPrime: true, time: 32251 }
Either Express checks pending requests once in a while, or my browser re-sends the actual request every so often while it's pending. Can anybody explain what is happening here, and why? How can I correctly achieve my goal?
The way your server code is written, if you receive a /isprime request and two child processes are already running, your request handler for /isprime does nothing. It never sends any response. You don't pass that first if test and then nothing happens afterwards. So, that request will just sit there with the client waiting for a response. Depending upon the client, it will probably eventually time out as a dead/inactive request and the client will shut it down.
Some clients (like browsers) may assume that something just got lost in the network and retry the request by sending it again. My guess is that this is what is happening in your case. The browser eventually times out and then resends the request. By the time it retries, there are fewer than two child processes running, so the retry gets processed.
You could verify that the browser is retrying automatically by going to the network tab in the Chrome debugger and watching exactly what the browser sends to your server and watch that third request, see it timeout and see if it is the browser retrying the request.
Note, this code seems to be only partially implemented because you initially start two child processes, but you don't reuse those child processes. Once they finish and you decrement childrenRunning, your code will then start another child process for the next request. Probably what you really want is to keep track of the two child processes you started and, when one finishes, add it to an array of "available child processes" so that when a new request comes in, you can use an existing child process that is already started, but idle.
You also need to either queue incoming requests when all the child processes are busy, or send some sort of error response to the HTTP request. Never sending an HTTP response to an incoming request is a poor design that just leads to great inefficiencies (connections hanging around much longer than needed that never actually accomplish anything).
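The queueing idea can be sketched independently of Express. The JobQueue class and its names here are my own, not from any library: cap how many jobs run at once and hold the rest instead of silently dropping them.

```javascript
// A minimal in-memory job queue: at most maxConcurrent jobs run at a time;
// extra jobs wait in line instead of being dropped.
class JobQueue {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.running = 0;
    this.pending = [];
  }
  push(job) {             // job is a function (done) => void
    this.pending.push(job);
    this._drain();
  }
  _drain() {
    while (this.running < this.maxConcurrent && this.pending.length > 0) {
      const job = this.pending.shift();
      this.running += 1;
      job(() => {         // the job calls done() when it has finished
        this.running -= 1;
        this._drain();    // a slot freed up; start the next waiting job
      });
    }
  }
}
```

In the server above, each pushed job would fork the isprime.js child, and call done() inside the child's "message" handler right after sending the response.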
I'm building a server-side application using Node.js and Express, and I was thinking about how to limit the number of requests per user in a fixed amount of time, to prevent hackers from spamming and trying to bring down the server.
I am a little concerned with people abusing/spamming the available services with a large number of requests.
So, is there a way to build an Express middleware that allows me to control the number of requests sent by a specific user, based on his access_token?
The questions are:
1) How do I build this middleware, and what is the best way to do it?
2) Is there a module that can do the job?
3) Is there any other solution or structure that secures my server against this kind of attack?
All suggestions are welcome.
There's a bunch of existing modules out there, but this seems to be what you're looking for:
https://www.npmjs.com/package/tokenthrottle
In the Node community there is almost always a module to do all or part of what you're looking for. You almost never have to reinvent the wheel.
Just collect the request IPs (or whatever identifier you prefer) in a Map that keeps a counter running. When a counter hits a certain limit, show an error page:
const express = require('express');

const app = express();
const ips = new Map();
const limit = 20;

app.use((req, res, next) => {
  const count = ips.get(req.ip) || 0;
  if (count < limit) {
    ips.set(req.ip, count + 1);
    next();
  } else {
    res.end("spam filter activated, sorry :(");
  }
});

// ... your endpoints

app.listen(80);
That will block those IPs until you restart the server. However, you could also reset the Map at a certain interval:
setInterval(() => ips.clear(), 60 * 60 * 1000);
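The same idea can be packaged as a reusable middleware factory. This is a sketch with names of my own choosing; the 429 status code is the conventional "Too Many Requests" response:

```javascript
// Factory for a per-IP rate-limiting middleware: allow `limit` requests
// per `windowMs` milliseconds, then reject with 429 until the window resets.
function makeRateLimiter({ limit, windowMs }) {
  const hits = new Map();
  const timer = setInterval(() => hits.clear(), windowMs);
  timer.unref(); // don't keep the process alive just for this timer
  return (req, res, next) => {
    const count = hits.get(req.ip) || 0;
    if (count < limit) {
      hits.set(req.ip, count + 1);
      next();
    } else {
      res.statusCode = 429; // Too Many Requests
      res.end("spam filter activated, sorry :(");
    }
  };
}

// app.use(makeRateLimiter({ limit: 20, windowMs: 60 * 60 * 1000 }));
```

To key on an access_token instead of the IP, as the question asks, you would replace req.ip with however the token is read from the request (e.g. a header), since the Map key just needs to identify one user.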
I use the following API in my program to determine a free port and provide it to the application to run on:
portscanner.findAPortNotInUse(3000, 65000, '127.0.0.1', function(error, port) {
  console.log('AVAILABLE PORT AT: ' + port);
});
https://github.com/baalexander/node-portscanner
This free port is given to the application to use, and that works OK.
The problem is that if I provide a free port to application A and it hasn't occupied the port yet (sometimes that takes some time), another application B may come along, request a free port, and be given app A's port.
Which causes a problem...
Is there any elegant way to solve this?
My application doesn't have state, so it cannot record which app got which port...
One solution is to randomize the range, but that is not robust...
In my application I'm given the URL of the app that I should provide the free port to.
Update
I cannot use a broker or something else that controls this externally. I need to find some algorithm (maybe with some smart randomness) that can help me do it internally, i.e. my program is like a singleton, and I need some trick to hand out ports between 50000 and 65000 that reduces the number of collisions with ports already given to other apps.
Update 2
I've decided to try something like the following. What do you think?
Using lodash (https://lodash.com/docs/4.17.2#random) to pick candidate ports, in a loop that provides 3 (or more, if that makes sense) numbers, like this:
// pick 3 random candidate ports in a loop
var aa = _.random(50000, 65000);

portscanner.findAPortNotInUse([50001, 60000, 60010], '127.0.0.1', function(err, port) {
  if (err) {
    console.log("error!!!-> " + err);
  } else {
    console.log('Port Not in Use ' + port);
  }
});
Then, if I get false back, i.e. all 3 ports are occupied, I run the process again with 3 other random numbers. Comments and suggestions are welcome!
I'm trying to find some way to avoid collisions as much as possible...
I would simply accept the fact that things can go wrong in a distributed system and retry the operation (i.e., getting a free port) if it failed for whatever reason on the first attempt.
Luckily, there are lots of npm modules out there that do that already for you, e.g. retry.
Using this module you can retry an asynchronous operation until it succeeds, configure waiting strategies, set how many times it should be retried at most, and so on…
To provide a code example, it basically comes down to something such as:
const retry = require('retry');

const operation = retry.operation();

operation.attempt(currentAttempt => {
  findAnUnusedPortAndUseIt(err => {
    if (operation.retry(err)) {
      return;
    }
    callback(err ? operation.mainError() : null);
  });
});
The benefits of this solution are:
Works without locking, i.e. it is efficient and uses few resources when everything is fine.
Works without a central broker or something like that.
Works for distributed systems of any size.
Uses a pattern that you can re-use in distributed systems for all kinds of problems.
Uses a battle-tested and solid npm module instead of handwriting all these things.
Does not require you to change your code in a major way, instead it is just adding a few lines.
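If you'd rather not pull in a dependency, the core of the retry pattern is only a few lines. This is a sketch with made-up names; the retry module above adds backoff strategies, jitter, and more on top of this:

```javascript
// Retry an error-first async operation up to `retries` extra times,
// waiting `delayMs` between attempts, then report the final outcome.
function retryOperation(fn, { retries, delayMs }, callback) {
  let attemptsLeft = retries;
  const tryOnce = () => {
    fn((err, result) => {
      if (err && attemptsLeft > 0) {
        attemptsLeft -= 1;
        setTimeout(tryOnce, delayMs); // wait, then try again
      } else {
        callback(err, result);        // success, or retries exhausted
      }
    });
  };
  tryOnce();
}
```

Here fn would be your "find a free port and bind to it" operation; binding is what actually detects the collision, so a lost race simply surfaces as an error and triggers the next attempt.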
Hope this helps :-)
If your applications can open ports with an option like SO_REUSEADDR, but the operating system keeps ports in a TIME_WAIT state list, you can bind/open the port you want to hand out with SO_REUSEADDR, instantly close it, and give it to the application. For the TIME_WAIT period (depending on the operating system this can be 30 seconds; the actual time should be configured or found by experiment/administration), the port list will show this port as occupied.
If your port finder does not return ports in the TIME_WAIT state, the problem is solved by this relatively cheap open/close socket operation.
I'd advise you to look for a way to retain state. Even temporary state, in memory, is better than nothing at all. That way you could at least avoid giving out ports you've already given out, because those are very likely not free anymore. (This could be as simple as saving them and generating a new random port whenever you hit one you've already handed out.) If you don't want collisions, build your module to have state so it can avoid them. If you won't do that, you'll have to accept that there will sometimes be collisions that didn't need to happen.
If the URLs you get are random, the best you can do is guess randomly. If you can derive some property in which the URLs uniquely and consistently differ, you could design something around that.
Code example:
function getUnusedPort(url) {
  // range is [50000, 65000], both ends inclusive
  const guessPort = () => Math.floor(Math.random() * 15001) + 50000;
  let randomPort = guessPort();
  while (checkPortInUse(randomPort)) {
    randomPort = guessPort();
  }
  return randomPort;
}
Notes:
checkPortInUse will probably be asynchronous, so you'll have to accommodate for that.
You said 'between 50000 and 65000'. This code treats that as from 50000 up to and including 65000.
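To accommodate an asynchronous checkPortInUse, as the first note says you'll need to, here is the same sketch with async/await, assuming checkPortInUse is supplied as a function returning a Promise of a boolean:

```javascript
// Async variant of the guess-and-check sketch above.
// checkPortInUse(port) should resolve to true when the port is taken.
async function getUnusedPortAsync(checkPortInUse) {
  // range is [50000, 65000], both ends inclusive
  const guessPort = () => Math.floor(Math.random() * 15001) + 50000;
  let port = guessPort();
  while (await checkPortInUse(port)) {
    port = guessPort();
  }
  return port;
}
```

The structure is identical; only the in-use check now waits on the actual (asynchronous) port probe instead of pretending it is synchronous.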
When managing multiple applications or multiple servers, where one must be right the first time (without retrying), you need a single source of truth. Applications on the same machine can talk to a database, a broker server, or even a file, as long as the resource is "lockable". (Servers work in similar ways, though not with local files.)
So your flow would be something like:
App A sends request to service to request lock.
When lock is confirmed, start port scanner
When port is used, release lock.
Again, this could be a "PortService" you write that hands out unused ports, or a simple lock on some shared resource, so that two things don't get the same port at the same time.
Hopefully you can find something suitable to work for your apps.
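Within a single process, the "lockable resource" flow above can be sketched with a promise chain. The Lock class and its names are mine; across machines you would back this with a database or broker as described, but the shape of acquire-scan-release stays the same:

```javascript
// Serializes async critical sections: each acquire() runs only after the
// previous one has finished, so two port lookups can never interleave.
class Lock {
  constructor() {
    this._last = Promise.resolve();
  }
  acquire(criticalSection) {
    const run = this._last.then(() => criticalSection());
    this._last = run.catch(() => {}); // keep the chain alive on errors
    return run;
  }
}

// const lock = new Lock();
// lock.acquire(async () => { /* scan for a port, hand it out */ });
```

The lock is released implicitly when the critical section's promise settles, which matches step 3 of the flow: "when the port is used, release the lock".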
If you want to find a port that is not in use for your application, you could run the following command:
netstat -tupln | awk '{print $4}' | cut -d ':' -f2
So in your application you would use it like this:
const exec = require('child_process').exec;

// note the double quotes around the command, so the single quotes
// inside it don't terminate the string
exec("netstat -tupln | awk '{print $4}' | cut -d ':' -f2", (error, stdout, stderr) => {
  if (error) {
    console.error(`exec error: ${error}`);
    return;
  }
  var listPorts = stdout.split(/\n/).map(Number); // compare numbers, not strings
  console.log(listPorts); // list of all ports already in use
  var aa = _.random(50000, 65000); // generate random port
  var isFree = listPorts.indexOf(aa) === -1;
  if (isFree) {
    // start your application
  } else {
    // restart the search: put this in a function and search again
  }
});
This should give you a list of all the ports already in use, so use any port except the ones in listPorts.
So I'm using cluster to run some chat bots for some friends, and I use Express to serve a single page for each bot. However, cluster doesn't like that. My code (abridged) is something akin to the following:
var configs = {
  bot1: "bot1",
  bot2: "bot2"
};

if (cluster.isMaster) {
  for (var bot in configs) {
    cluster.fork({ file: configs[bot] });
  }
} else {
  var file = process.env["file"];
  var page = "/" + process.env["file"];
  var express = require("express");
  var web = express();
  web.listen(3000);
  web.get(page, function(req, res) {
    res.send(file);
  });
}
And while this works in theory, I'm only getting one bot with an output.
If I go to example.com:3000/bot2 I get bot2 as an output.
If I go to example.com:3000/bot1, I get Cannot GET /bot1.
It seems random which one will work, but never both of them.
Apologies if it's something stupidly simple, or if it can't be done. I just find cluster more effective at restarting itself on exits and generally more stable than child_process. (Sometimes, when I use child_process, I'll end up with multiple instances of the same bot, which is tacky.)
You seem to have misunderstood how cluster works. It will not help in a situation like this; it is primarily designed as a way to start multiple processes listening on the same port for HTTP connections.
What you now have is:
P1 => Master process which starts P2 and P3.
P2 => Listening on port 3000 handling /bot1
P3 => Listening on port 3000 handling /bot2
When a request comes in on port 3000, Node has no idea what the URL will be. It just knows that both P2 and P3 are set up to handle requests on that port, so it randomly chooses one of them to handle the request.
If you send a request to /bot1 and Node randomly assigns it to be handled by P3, then you will get the error you were seeing Cannot GET /bot1, because P3 has no idea what that path means. The same is true the other way around.
Perhaps what you really want is some number of bot processes and a single process that listens on port 3000 and then forwards the messages to the bot processes using worker.send() and such.
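A sketch of that forwarding design is below. The message format and names are made up; with cluster, the workers would be the objects returned by cluster.fork(), whose send() and "message" events carry exactly this kind of traffic:

```javascript
// One process owns the HTTP port; each path maps to one bot worker.
// A worker here is anything with send(msg) and onMessage(handler).
function makeDispatcher(workersByPath) {
  let nextId = 0;
  const waiting = new Map(); // request id -> pending callback
  for (const worker of Object.values(workersByPath)) {
    worker.onMessage(msg => {
      const cb = waiting.get(msg.id);
      if (cb) {
        waiting.delete(msg.id);
        cb(msg.reply);
      }
    });
  }
  return function dispatch(path, payload, callback) {
    const worker = workersByPath[path];
    if (!worker) return callback(undefined); // no bot owns this path
    const id = ++nextId;
    waiting.set(id, callback);
    worker.send({ id, payload });
  };
}
```

The single listening process would call dispatch(req.path, ...) from its Express route handler and send the bot's reply back as the HTTP response, so /bot1 and /bot2 both work no matter which process started first.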