How to implement time to live with socket.io - javascript

I believe I have surfed the web enough but still can't find any resources on the topic. How can I implement a 'time-to-live' function with Socket.IO?
I am using Node.js with express.
The above-mentioned time-to-live function is intended to work as described below:
If I specify timeToLive = 10 seconds, clients that connect less than 10 seconds after the message is emitted should still get the message.
This function is available on some of the cloud messaging libraries like GCM.
Any online resource will be appreciated.

There is no such functionality in Socket.IO; you will have to implement it yourself. Consider using an array of objects, each holding a message and the Date.now() of that message, and looping over it when a user connects: delete any messages that have expired and emit the ones that are still valid.
Minimal code could look like this. Note that Date.now() works in milliseconds, so a TTL given in seconds must be converted, and filtering is safer than calling splice inside an index loop (splice shifts the remaining elements, which makes the loop skip entries):
var messages = [];
var TTL = 10 * 1000; // time to live in milliseconds

io.on('connection', function (socket) {
  // drop expired messages, then send the rest to the new client
  messages = messages.filter(function (message) {
    return message.time + TTL >= Date.now();
  });
  socket.emit('data', messages);
});
Consider using Redis or another high-performance database if you also need to synchronize messages between multiple servers.
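The snippet above only prunes the store on connect; messages must also be recorded with a timestamp at the moment they are broadcast. A minimal sketch of such a store (createMessageStore, addMessage, and getLiveMessages are invented names for illustration, not socket.io APIs):

```javascript
// In-memory store for messages with a time-to-live.
// addMessage records the emit time; getLiveMessages prunes expired
// entries and returns the payloads still inside the TTL window.
function createMessageStore(ttlMs) {
  let messages = [];
  return {
    addMessage(payload, now = Date.now()) {
      messages.push({ time: now, payload: payload });
    },
    getLiveMessages(now = Date.now()) {
      messages = messages.filter(m => m.time + ttlMs >= now);
      return messages.map(m => m.payload);
    },
  };
}
```

On every broadcast you would call addMessage alongside io.emit, and on each new connection emit getLiveMessages() to the connecting socket.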

Better way to schedule cron jobs based on job orders from php script

So I wrote a simple video creator script in Node.js.
It runs as a scheduled cron job.
I have a panel written in PHP; a user enters details and clicks the "Submit new Video Job" button.
This new job is saved to the DB with its details, a jobId, and status="waiting".
The PHP API is responsible for returning one job at a time: it checks for status="waiting", limits the query to 1, and returns the data with the jobId when asked.
The video creation script sends a request to that API every x seconds, asking whether a new job is available.
It has 5 tasks (available=true initially):
1) Check if a new job order is available (with a GET request every 20 seconds); if there is a new job, set available=false.
2) Get the details (name, picture URL, etc.).
3) Create the video with those details.
4) Upload the video to FTP.
5) Post data to the API to update the details and mark the job as "done"; then set available=true again.
These tasks are async, so every task has to wait for the previous one to be done.
Right now, GET- or POST-requesting the API every 20 seconds (the exact interval doesn't matter) to check whether a new job is available seems like a bad approach to me.
So any way / package / system to accomplish this behavior?
Code Example:
const cron = require('node-cron');

let available = true;

var scheduler = cron.schedule(
  '*/20 * * * * *',
  () => {
    if (available) {
      makevideo();
    }
  },
  {
    scheduled: false,
    timezone: 'Europe/Istanbul',
  }
);
let makevideo = async () => {
  available = false;
  let { data } = await axios.get('https://api/checkJob');
  if (data == 0) {
    console.log('No Job');
    available = true;
  } else {
    let jobid = data.id;
    await createvideo();
    await sendToFTP();
    await axios.post('https://api/saveJob', {
      id: jobid,
      videoPath: 'somevideopath',
    });
    available = true;
  }
};
scheduler.start();
RabbitMQ is also a good queueing system.
Why?
It's really well documented (examples for many languages, including JavaScript & PHP).
The tutorials are simple while exposing real use cases.
It has a REST API.
It ships with a monitoring UI.
How to use it to solve your problem?
On the job producer side: send messages (jobs) to a queue by following tutorial 1.
To consume jobs with your Node.js process: see RabbitMQ's tutorial 2.
Other suggestions:
Use a prefetch value of 1 and publisher confirms so you can ensure that a consumer instance will not receive new messages while there's a job running.
Roadmap for a quick prototype: tutorial 1, then tutorial 2. After sending and receiving messages you can explore the options you can set on queues and messages.
Nodejs package : http://www.squaremobius.net/amqp.node/
PHP package : https://github.com/php-amqplib/php-amqplib
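The prefetch-1 suggestion means the broker hands a consumer only one unacknowledged job at a time. That behaviour can be sketched in-process without a broker (createWorkQueue and its method names are made up for illustration; in a real deployment the queue would live in RabbitMQ and the consumer would call channel.prefetch(1) via amqplib):

```javascript
// In-process sketch of a work queue with prefetch = 1:
// the worker takes one job, processes it to completion ("ack"),
// and only then receives the next one.
function createWorkQueue() {
  const jobs = [];
  let busy = false;
  let worker = null;

  function dispatch() {
    if (busy || !worker || jobs.length === 0) return;
    busy = true;
    const job = jobs.shift();
    worker(job, function ack() { // worker calls ack when the job is done
      busy = false;
      dispatch();                // only now hand over the next job
    });
  }

  return {
    publish(job) { jobs.push(job); dispatch(); },
    consume(fn) { worker = fn; dispatch(); },
  };
}
```

In your setup, publish would be driven by the PHP panel and consume by the video-creation script, replacing the 20-second polling.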
While it is possible to use the database as a queue, it is commonly known as an anti-pattern (next to using the database for logging), and as you are looking for:
So any way / package / system to accomplish this behavior?
Given the free-form nature of your question (thanks to the placed bounty), I suggest: Beanstalk.
Beanstalk is a simple, fast work queue.
Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.
It has client libraries in the languages you mention in your question (and many more), is easy to develop with and to run in production.
What you are describing is a very standard system-design paradigm, done with Apache Kafka or any queue-based implementation (e.g. RabbitMQ). You can read up on Kafka/RabbitMQ, but basically, without going into details:
There is a central Queue.
When user submits a job the job gets added to the Queue.
The video processor runs indefinitely subscribing to the queue.
You can go ahead and look at https://www.gentlydownthe.stream/ and you will recognize the similarities to what you are doing.
Here you don't need to poll yourself; you subscribe to an event and everything else is managed by the respective queues.
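If a full queueing system feels like overkill, the overlap problem alone can be solved by self-scheduling: only arm the next poll once the current job has completely finished, so runs can never overlap and the available flag becomes unnecessary. A sketch (poll is an invented helper, not a package API):

```javascript
// Self-scheduling poll loop: the next iteration is only scheduled
// after the current (possibly long-running) task has completed,
// so no "available" flag is needed and runs can never overlap.
function poll(task, intervalMs) {
  let stopped = false;
  (async function loop() {
    while (!stopped) {
      await task(); // e.g. check the API and make the video
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
  })();
  return () => { stopped = true; }; // call to stop polling
}
```

Something like poll(makevideo, 20000) would then replace both the cron schedule and the flag.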

Elastic Beanstalk environment degrading due to high CPU utilisation

I'm trying to sync some Google Ads accounts into my system.
This process pulls the data from the Google Ads account from 2017-01-01 to the last date.
It queries for a single date range, processes it in a for loop to build a proper object, and inserts it into the database.
I also tried with load balancers, but the degradation still occurs for one instance.
code
querying google ads data
var difference = dateDiffInDays(new Date(2017, 0, 1), new Date());
// getting last N days
days = LastNDays(difference + 1);
// making array of date ranges
var result = days.chunk(20);
// querying google ads data
for (var value of result) {
  const list = await customer.report({
    entity: 'keyword_view',
    attributes: adAttributes,
    segments: ['segments.date'],
    from_date: value[0],
    to_date: value[value.length - 1],
  });
  await saveKeywordsData(list, value[value.length - 1]);
}
I think the problem is the following function, because the output of the above query is more than 5000 or 6000 rows (for a single date; here the dates start from 2017-01-01). Handling more than 5000 records continuously for some time leads to high CPU utilisation.
function saveKeywordsData
async function saveKeywordsData(list, cronUntill) {
  let metricsArray = [];
  for await (let element of list) {
    let metrics = element.metrics;
    metrics.criterion_id = element.ad_group_criterion.criterion_id;
    metrics.keyword = element.ad_group_criterion.keyword.text;
    metrics.accId = accId;
    metrics.agencyId = agencyId;
    metrics.accountMId = accountMId;
    metrics.date = element.segments.date;
    metrics.dateTime = new Date(element.segments.date);
    metrics.createdAt = new Date();
    metricsArray.push(metrics);
  }
  // metricsArray length may be more than 5000 for each loop
  await chunkInsertion(metricsArray, 'keywords');
  return 1;
}
function chunkInsertion
async function chunkInsertion(metricsArray, type) {
  let model;
  if (type == 'ads')
    model = app.models.googleAdsInsights;
  else
    model = app.models.googleAdsAuctionInsights;
  var data = metricsArray.chunk(50);
  for (let item of data) {
    await model.create(item);
  }
  return 1;
}
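Note that chunk is not a built-in Array method, so the code above assumes a custom prototype extension (or a library such as lodash's _.chunk). A standalone equivalent for reference:

```javascript
// Split an array into consecutive slices of at most `size` elements.
function chunk(array, size) {
  const out = [];
  for (let i = 0; i < array.length; i += size) {
    out.push(array.slice(i, i + size));
  }
  return out;
}
```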
Based on the comment.
I can provide only generic description on how a worker environment could be used. Exact implementation details are case specific.
EB worker environments are used for executing long running tasks. This could be a good solution for your use cases, as you would decouple your web environment from those heavy processing jobs that elevate your CPU.
In this scenario your web environment would be responsible for initiating the job and collecting the results. It would not be performing the actual processing, which would be handled by the dedicated worker environment.
The worker environment exposes an SQS queue. This is different from a web environment, which gives you a URL to your website; from a worker environment you get only the SQS queue endpoint. The endpoint is used to submit jobs to the worker. Your worker application would receive the jobs from the queue and perform the query independently of the web environment.
Handling of the results can be done in many ways. One way would be for the worker to write the results to, e.g., a DynamoDB table. The web environment in that case would query the database from time to time to check when the results are available. The other way is for your web app to expose a dedicated URL endpoint which the worker calls to signal the completion of a job.
This is how you generally decouple the web environment from long-running, CPU- or memory-intensive tasks. But it would require changing how your application works and developing a worker application to be deployed on EB worker environments.

Limiting the number of requests by user in any given hour [Nodejs]

I'm building a server-side application using Node.js and Express, and I was thinking about how to limit the number of requests per user in a fixed amount of time, to prevent hackers from spamming and trying to bring down the server.
I am a little concerned about people abusing/spamming the available services with a large number of requests.
So, is there any idea on how to build an Express middleware that allows me to control the number of requests sent by a specific user, based on his access_token?
the questions are:
1) how to build this middleware and what is the best way to do this?
2) is there any module that can do the job?
3) is there any other solution or a structure that allows me to secure my server against this kind of attack?
All suggestions are welcome.
There's a bunch of existing modules out there, but this seems to be what you're looking for:
https://www.npmjs.com/package/tokenthrottle
In the Node community there is almost always a module to do all or part of what you're looking for. You almost never have to reinvent the wheel.
Just collect the request IPs (or whatever key you choose) in a Map that keeps a counter running. When the counter hits a certain limit, show an error page:
const app = Express();
const ips = new Map;
const limit = 20;

app.use((req, res, next) => {
  const count = ips.get(req.ip) || 0;
  if (count < limit) {
    ips.set(req.ip, count + 1);
    next();
  } else {
    res.end("spam filter activated, sorry :(");
  }
});

// ... your endpoints
app.listen(80);
That will block those IPs until you restart the server. However, you could also reset the Map at a certain interval:
setInterval(() => ips.clear(), 60 * 60 * 1000);
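The same counter-plus-reset idea can be packaged as a small fixed-window limiter keyed by any identifier (IP, access_token, user id). A sketch with invented names:

```javascript
// Fixed-window rate limiter: allows up to `limit` calls per key
// within each window of `windowMs` milliseconds.
function createRateLimiter(limit, windowMs) {
  const hits = new Map(); // key -> { start: windowStartMs, count }
  return function allow(key, now = Date.now()) {
    const entry = hits.get(key);
    if (!entry || now - entry.start >= windowMs) {
      hits.set(key, { start: now, count: 1 }); // open a new window for this key
      return true;
    }
    if (entry.count < limit) {
      entry.count += 1;
      return true;
    }
    return false; // over the limit inside the current window
  };
}
```

Inside the middleware you would call allow(req.ip) (or the user's token) and reject the request when it returns false; responding with HTTP 429 is the conventional choice.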

Count number of online users with nodeJS and Socket.io without duplicates

I'm having trouble displaying the correct number of online users. A question similar to this one showed me that I can count the users this way:
var count = 0;
socket.on('connection', function(client) {
  count++;
  client.broadcast({count: count});
  client.on('disconnect', function(){
    count--;
  });
});
The issue I constantly run into with that method is that when a user reloads the page too quickly, the counter goes up more than it comes back down.
As you can see, on the right side of the image, a user spammed the reload and it counted more users online than there actually are. (There was only one user on the server at this time.)
My question is: is there a better or more reliable way to count the exact number of users online, without the extra 'virtual users', and without using the users++/users-- method?
If they're logging in as users, then you should authenticate them on the socket. Use that authentication to see if they already have a session, and if so disconnect the old one, decrementing the count, before you increment it again with the new session.
An example below. The clients object stores the connected clients, with the values being the sockets they're connected to.
var clients = {};
socket.on('connection', function(client) {
  // Authenticate the client (using query string parameters, auth tokens, etc...)
  // and return the userId of the user.
  var userId = authenticate(client);
  if (!userId) {
    // Bad authentication, disconnect them
    client.disconnect();
    return;
  }
  if (clients[userId]) {
    // They already have a session, disconnect it
    clients[userId].disconnect();
  }
  // Set session here
  clients[userId] = client;
  client.broadcast({count: Object.keys(clients).length});
  client.on('disconnect', function(){
    delete clients[userId];
  });
});
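The replace-the-old-session logic can be isolated into a tiny presence tracker and exercised without socket.io (createPresence and its method names are invented for illustration):

```javascript
// Track online users by userId. A reconnect replaces the old session
// instead of double-counting, so fast reloads don't inflate the count.
function createPresence() {
  const clients = new Map(); // userId -> session (e.g. a socket)
  return {
    connect(userId, session) {
      const old = clients.get(userId);
      if (old && typeof old.disconnect === 'function') {
        old.disconnect(); // drop the stale session
      }
      clients.set(userId, session);
      return clients.size; // current online count
    },
    disconnect(userId) {
      clients.delete(userId);
      return clients.size;
    },
  };
}
```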
Could do this pretty cleanly with the Observable pattern (using RxJS v5 here):
const { Observable } = require('rxjs')

const connections = Observable.fromEvent(socket, 'connection').mapTo(1)
const disconnections = Observable.fromEvent(socket, 'disconnect').mapTo(-1)

// emit 1 for each connection, -1 for each disconnection
Observable.merge(connections, disconnections)
  .scan((total, change) => total + change, 0) // emit the running total
  .subscribe(count => client.broadcast({ count }))
You can also use Server-Sent Events for this purpose.
Take a look at https://javascript.info/server-sent-events

Find free port not in use for apps - find some algorithm

I use the following API in my program to determine a free port and provide it to an application to run:
portscanner.findAPortNotInUse(3000, 65000, '127.0.0.1', function(error, port) {
  console.log('AVAILABLE PORT AT: ' + port);
});
https://github.com/baalexander/node-portscanner
This free port is given to the application to use, and it works OK.
The problem is: if I provide a free port to application A and it hasn't occupied it yet (sometimes that takes some time), and another application B comes and requests a free port, it may be given app A's port, which causes a problem.
Is there any elegant way to solve this?
My application doesn't have state, so it cannot save which app got which port.
There is a solution where we randomize the range, but this is not robust.
In my application I'm given the URL of the app that I should provide the free port to run.
update
I cannot use a broker or anything else to control this externally. I need to find some algorithm (maybe with some smart randomization) that can help me do it internally; i.e., my program is like a singleton, and I need some trick for handing out ports between 50000 and 65000 that reduces the number of collisions between ports provided to the apps.
update 2
I've decided to try something like the following. What do you think?
Using lodash (https://lodash.com/docs/4.17.2#random) to determine ports, with a loop that provides 3 (or more, if that makes sense) random numbers as candidates, like the following:
portscanner.findAPortNotInUse([50001, 60000, 60010], '127.0.0.1', function(err, port) {
  if (err) {
    console.log("error!!!-> " + err);
  } else {
    console.log('Port Not in Use ' + port);
  }
});
// using that in a loop
var aa = _.random(50000, 65000);
Then, if I got false for the port, i.e. all 3 ports are occupied, I run this process again for 3 other random numbers. Comments and suggestions are welcome!
I'm trying to find some way to avoid collisions as much as possible.
I would simply accept the fact that things can go wrong in a distributed system and retry the operation (i.e., getting a free port) if it failed for whatever reason on the first attempt.
Luckily, there are lots of npm modules out there that do that already for you, e.g. retry.
Using this module you can retry an asynchronous operation until it succeeds, configure waiting strategies, set how many times it should be retried at most, and so on…
To provide a code example, it basically comes down to something such as:
const retry = require('retry');

const operation = retry.operation();

operation.attempt(currentAttempt => {
  findAnUnusedPortAndUseIt(err => {
    if (operation.retry(err)) {
      return;
    }
    callback(err ? operation.mainError() : null);
  });
});
The benefits of this solution are:
Works without locking, i.e. it is efficient and makes low usage of resources if everything is fine.
Works without a central broker or something like that.
Works for distributed systems of any size.
Uses a pattern that you can re-use in distributed systems for all kinds of problems.
Uses a battle-tested and solid npm module instead of handwriting all these things.
Does not require you to change your code in a major way, instead it is just adding a few lines.
Hope this helps :-)
If your applications can open ports with options like SO_REUSEADDR, and the operating system keeps ports in its list in the TIME_WAIT state, you can bind/open the port you want to return with SO_REUSEADDR, instantly close it, and give it back to the application. For the TIME_WAIT period (depending on the operating system it can be 30 seconds; the actual time should be decided/set up or found by experiment/administration) the port list will show this port as occupied.
If your port finder does not return port numbers for ports in the TIME_WAIT state, the problem is solved by a relatively expensive open/close socket operation.
I'd advise you to look for a way to retain state. Even temporary state, in memory, is better than nothing at all. This way you could at least avoid giving out ports you've already given out, because those are very likely not free anymore. (This could be as simple as saving them and regenerating a random port if you notice you found one you've already given out.) If you don't want collisions, build your module to have state so it can avoid them. If you don't want to do that, you'll have to accept that there will sometimes be collisions when there didn't need to be.
If the URLs you get are random, the best you can do is guess randomly. If you can derive some property by which the URLs uniquely and consistently differ, you could design something around that.
Code example:
function getUnusedPort(url) {
  // range is [50000, 65001), i.e. 50000 up to and including 65000
  const guessPort = () => Math.floor(Math.random() * 15001) + 50000;
  let randomPort = guessPort();
  while (checkPortInUse(randomPort)) {
    randomPort = guessPort();
  }
  return randomPort;
}
Notes:
checkPortInUse will probably be asynchronous, so you'll have to accommodate for that.
You said 'between 50000 and 65000'; the code above covers 50000 up to and including 65000.
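Since most port checks are asynchronous, the loop needs an async form with a bounded number of tries. A sketch, assuming checkPortInUse resolves to true when the port is occupied (the function and parameter names here are illustrative):

```javascript
// Async variant: retry random guesses until a free port is found,
// giving up after maxTries to avoid looping forever.
async function getUnusedPort(checkPortInUse, maxTries = 50) {
  // range covers 50000 up to and including 65000
  const guessPort = () => Math.floor(Math.random() * 15001) + 50000;
  for (let i = 0; i < maxTries; i++) {
    const port = guessPort();
    if (!(await checkPortInUse(port))) {
      return port; // first free port wins
    }
  }
  throw new Error('no free port found after ' + maxTries + ' tries');
}
```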
When managing multiple applications or multiple servers, where one must be right the first time (without retrying), you need a single source of truth. Applications on the same machine can talk to a database, a broker server or even a file, so long as the resource is "lockable". (Servers work in similar ways, though not with local files).
So your flow would be something like:
App A sends request to service to request lock.
When lock is confirmed, start port scanner
When port is used, release lock.
Again, this could be a "PortService" you write that hands out unused ports, or a simple lock on some shared resource, so that two things aren't getting the same port at the same time.
Hopefully you can find something suitable to work for your apps.
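Within a single process, the "lockable" resource can be as small as a promise chain acting as a mutex, so concurrent requests are serialized instead of scanning ports at the same time. A sketch (createLock and withLock are invented names):

```javascript
// Minimal promise-based mutex: callers are queued on a promise chain,
// so port lookups run one at a time and two requests can never be
// granted the same port concurrently.
function createLock() {
  let tail = Promise.resolve();
  return function withLock(fn) {
    const result = tail.then(() => fn());
    tail = result.catch(() => {}); // keep the chain alive on errors
    return result;
  };
}
```

You would wrap each find-and-reserve operation, e.g. withLock(() => findAndReservePort(url)), so the second caller only starts after the first has claimed its port.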
As you want to find a port that is not in use by any application, you could run the following command:
netstat -tupln | awk '{print $4}' | cut -d ':' -f2
So in your application you would use it like this (note the command itself contains single quotes, so it must be wrapped in double quotes, and netstat prints port numbers as strings, so they are converted before comparison):
const exec = require('child_process').exec;
const _ = require('lodash');

exec("netstat -tupln | awk '{print $4}' | cut -d ':' -f2", (error, stdout, stderr) => {
  if (error) {
    console.error(`exec error: ${error}`);
    return;
  }
  var listPorts = stdout.split(/\n/).map(Number); // list of all ports already in use
  console.log(listPorts);
  var aa = _.random(50000, 65000); // generate random port
  var isFree = listPorts.indexOf(aa) === -1;
  if (isFree) {
    // start your application
  } else {
    // restart the search: wrap this in a function and call it again
  }
});
This should give you a list of all ports that are in use, so use any port except the ones in listPorts.
