Nodejs manage different threads

I'm fairly new to Node.js.
I'm working on a Node.js/Express solution.
I want to send an e-mail when some information is added to an MSSQL database.
This is working well for me. The problem is that I want to check every five minutes whether the information added to the database has been modified and, if not, send another e-mail.
The call to add information to the db is this route:
router.post('/postlinevalidation', function(req, res) {
//insert function into mssql
silkcartCtrl.sendMail(req, res);
});
The controller part for sending the e-mail:
exports.sendMail = function(req, res) {
var emails = "";
fs.readFile('./config/email.conf', 'utf8', function (err,data) {
if (err) {
return logger.error(err);
}
emails = data;
});
var minutes = 5, the_interval = minutes * 60 * 1000;
var refreshId = setInterval(function() {
logger.info("I am doing my 5 minutes check FL_PENDIENTE");
var request = new sql.Request(req.dbsqlserver);
var sqlpendinglinesvalidation = "SELECT [FK_IDCHECK],[FK_IDPEDIDO],[BK_IDPROVEEDOR],[DE_PROVEEDOR]"+
",[FK_FAMILIA],[BK_FAMILIA],[FK_SUBFAMILIA],[BK_SUBFAMILIA],[FK_ARTICULO]"+
",[BK_ARTICULO],[FL_VALIDAR],[DT_FECHA],[FL_PENDIENTE],[DES_CHECK],[QNT_PROPUESTA],[FECHA]"+
"FROM TABLE"+
" WHERE [FL_PENDIENTE] = 1";
request.query(sqlpendinglinesvalidation, function (err, recordset) {
if (recordset.length > 0) {
var transporter = nodemailer.createTransport('smtps://user%40gmail.com:pwd#smtp.gmail.com');
var mailOptions = {
from: '"Mailer" <mail#mail.com>', // sender address
to: emails, // list of receivers
subject: 'Tienes compras pendientes de validar', // Subject line
text: 'Tienes compras pendientes de validar', // plaintext body
html: '<b>Tienes compras pendientes de validar.</b>' // html body
};
// send mail with defined transport object
transporter.sendMail(mailOptions, function(error, info){
if(error){
return logger.error(error);
}
logger.info('Message sent: ' + info.response);
});
} else {
clearInterval(refreshId);
return true;
}
});
}, the_interval);
};
As I said, this is working well.
I control the five minutes with setInterval.
But I supposed that every time the route postlinevalidation is called, a new thread is opened, so I will have several setInterval processes running.
I want to know how to manage this. If the controller function exports.sendMail is already running when the route is called again, I want to "kill this process" and start exports.sendMail again.
Thanks in advance.

But I supposed that every time the route postlinevalidation is called, a
new thread is opened, so I will have several setInterval processes
running.
No, this is not how node.js works. You don't get multiple threads because of multiple setInterval() timers.
node.js by itself is single threaded. So, each time a route is called, it just creates an event in the node.js event queue and they are served FIFO, one at a time. At any point that one of the route handlers makes an async call, it essentially "yields" control back and the next item in the event queue gets to run until it yields or finishes.
Timers like setInterval() also use the event queue, so no additional threads are created by setInterval(). It is possible that node.js modules that use native code may themselves use threads, and node.js has a small thread pool that it uses for disk I/O, but neither of those has anything to do with setInterval().
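To make the single-threaded behaviour concrete, here is a minimal sketch (the `order` array and the busy loop are illustrative and not part of the question's code). A timer scheduled for 0 ms still cannot fire until the synchronous code that is currently running has finished:

```javascript
// Illustrative only: records the order in which code actually runs.
const order = [];

// Ask for a callback "as soon as possible".
setTimeout(() => order.push('timer callback'), 0);

// Busy-wait for roughly 50 ms on the one and only JavaScript thread.
const start = Date.now();
while (Date.now() - start < 50) { /* keep the thread busy */ }

// The timer is overdue by now, but it has not been allowed to run yet.
order.push('synchronous code finished');
console.log(order); // [ 'synchronous code finished' ]
```

The timer callback only runs once this script yields back to the event loop, which is why several setInterval() timers never execute in parallel.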
If you explicitly want to create another execution context for a long running operation in node.js to separate it from the single node.js thread, then that is usually done with the child process module that is part of node.js. You create a new process (which can be a node.js process or some other program running in the process) and you can then communicate with that other process.
If the controller function exports.sendMail is running, when the route
is called again, "kill this process", and start again
exports.sendMail
This is something that would need to be an explicit feature of the nodemailer module in order for you to cancel an operation in process. How "in process" asynchronous operations are implemented and controlled is not a generic node.js thing, but is specific to how that specific module implements things and keeps track of things.
Looking into the code for the node-mailer and more specifically the smtp-connection module, it looks like it uses plain async node.js socket code. That means it does not create any new threads or processes on its own.
As for your setInterval() calls, you need to make sure that any body of code that creates a setInterval() keeps track of the interval timer ID and eventually clears the interval so it stops and you don't keep piling up more and more interval timers. Another possibility is that you have only one interval and it does checking for all outstanding operations (rather than have a separate interval for each one).
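One way to apply this to the question could look like the sketch below. The names `pendingCheckTimer`, `restartPendingCheck` and `stopPendingCheck` are made up for illustration; the idea is simply to keep the timer ID in module scope so that a new request replaces the running interval instead of adding another one:

```javascript
// Hypothetical helper: keeps at most one "pending check" interval alive.
let pendingCheckTimer = null;

// Cancel the running interval (if any) and start a fresh one.
function restartPendingCheck(checkFn, intervalMs) {
  if (pendingCheckTimer !== null) {
    clearInterval(pendingCheckTimer);
  }
  pendingCheckTimer = setInterval(checkFn, intervalMs);
  return pendingCheckTimer;
}

// Stop checking, for example once no rows are pending any more.
function stopPendingCheck() {
  if (pendingCheckTimer !== null) {
    clearInterval(pendingCheckTimer);
    pendingCheckTimer = null;
  }
}
```

In the route handler you would call something like restartPendingCheck(checkPendingLines, 5 * 60 * 1000), where checkPendingLines is assumed to run the query and call stopPendingCheck() when the recordset is empty.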

From a quick look, I think you don't really need to put the sendMail function inside postlinevalidation. If you want to control it, you could run it in a different script from the express app. You can use something like pm2 or parallelshell to run multiple scripts at the same time.

If you are using setInterval, you can use clearInterval to stop it based on your condition. Whenever you call setInterval, it returns an id that you can use to stop the interval.
var interval = setInterval(doStuff, 5000);
function doStuff() {
if(your_condition) {
clearInterval(interval);
}
}

Related

Send continuous updates to connected clients using socket.io and nodejs

Can anyone tell me how to send continuous updates to connected clients every second using nodejs and socket.io?
NOTE: I don't want to use the setInterval() function as it is unfit for my current scenario.

You can do this with setTimeout in a function that references itself in the setTimeout. Basically the same result as doing setInterval but will always wait for the function to finish (assuming synchronous code) before running the timeout function again.
function thingToRepeat() {
let shouldCancel = false;
// send messages, do stuff,
// set shouldCancel to true to stop looping if needed
if (!shouldCancel) {
setTimeout(thingToRepeat, 1000);
}
}
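If the repeated work is asynchronous (for example, emitting to clients after a database call), the same pattern can wait for the work to settle before scheduling the next run. This is only a sketch: `repeatAfterCompletion` is a hypothetical helper, and `task` is assumed to be a function that returns a promise (or a plain value):

```javascript
// Hypothetical helper: runs `task`, waits for it to settle, then waits
// `delayMs` before running it again. Returns a function that stops the loop.
function repeatAfterCompletion(task, delayMs) {
  let stopped = false;
  let timer = null;

  async function run() {
    try {
      await task(); // works for sync and async tasks alike
    } catch (err) {
      console.error('task failed:', err);
    }
    if (!stopped) {
      timer = setTimeout(run, delayMs); // schedule only after completion
    }
  }

  timer = setTimeout(run, delayMs);
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Unlike setInterval(), runs can never overlap here, because the next timeout is only created after the previous run has finished.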

Semaphore equivalent in Node js , variable getting modified in concurrent request?

I am facing this issue for the past 1 week and I am just confused about this.
Keeping it short and simple to explain the problem.
We have an in-memory model which stores values like budget etc. When a call is made to the API, it has a spend associated with it.
We then check the in-memory model, add the new spend to the existing spend, and compare it to the budget; if it exceeds the budget, we do not accept any more clicks for that model. For each call we also update the db, but that is an async operation.
A short example
api.get('/clk/:spent/:id', function(req, res) {
checkbudget(Number(req.params.spent), req.params.id);
});
function checkbudget(spent, id) {
var obj = inMemoryModel[id]; // inMemoryModel: the in-memory model described above
obj.spent += spent;
if (obj.spent > obj.budget) { // if greater
obj.status = 11; // 11 is the stopped status
}
// update db and rebuild model.
}
This used to work fine, but now with concurrent requests we are getting wrong spends: the spend increases beyond the budget and the campaign only stops after some time. We simulated the calls with JMeter and found this.
As far as we could find, node is async, so by the time the status is updated to 11 many threads have already updated the spent for the campaign.
How can I have semaphore-like logic in Node.js so that the variable budget stays in sync with the model?
update
db.addSpend(campaignId, spent, function(err, data) {
campaign.spent += spent;
var totalSpent = (+camp.spent) + (+camp.cpb);
if (totalSpent > camp.budget) {
logger.info('Stopping it..');
camp.status = 11; // in-memory stop
var History = [];
History.push(some data);
db.stopCamp(campId, function(err, data) {
if (err) {
logger.error('Error while stopping');
}
model.campMAP = buildCatMap(model);
model.campKeyMap = buildKeyMap(model);
db.campEventHistory(cpcHistory, false, function(err) {
if (err) {
logger.error(err);
}
})
});
}
});
This is the gist of the code. Can anyone help now, please?

Q: Is there semaphore or equivalent in NodeJs?
A: No.
Q: Then how do NodeJs users deal with race condition?
A: In theory you shouldn't have to as there is no thread in javascript.
Before going deeper into my proposed solution I think it is important for you to know how NodeJs works.
NodeJs is driven by an event-based architecture. This means that in the Node process there is an event queue that contains all the "to-do" events.
When an event gets popped from the queue, node executes all of the required code until it is finished. Any async calls made during that run are handed off, and their callbacks are queued up as new events in the event queue once a response is heard back and it is time to run them.
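A consequence of this run-to-completion behaviour is that a purely synchronous check-and-update on the in-memory model cannot be interleaved with another request. The sketch below illustrates that; `campaigns`, `STOPPED` and `tryAddSpend` are hypothetical names, and the numbers are made up:

```javascript
// Hypothetical in-memory model, keyed by campaign id.
const campaigns = { c1: { spent: 0, budget: 10, status: 1 } };
const STOPPED = 11; // same "stopped" status code as in the question

// Everything in this function runs synchronously, so no other request
// can observe or modify the campaign halfway through.
function tryAddSpend(id, spent) {
  const campaign = campaigns[id];
  if (!campaign || campaign.status === STOPPED) {
    return false; // reject clicks for unknown or stopped campaigns
  }
  campaign.spent += spent;
  if (campaign.spent > campaign.budget) {
    campaign.status = STOPPED; // stop in memory right away
  }
  return true; // caller can now persist the spend asynchronously
}
```

Races only appear when the in-memory update is split across an async boundary, for example when the spend is added inside a database callback as in the updated question code.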
Q: So what can I do to ensure that only 1 request can perform updates to the database at a time?
A: I believe there are many ways you can achieve this, but one of the easier ways out is to use the setTimeout API.
Example:
api.get('/clk/:spent/:id', function(req, res) {
var data = {
id: req.params.id,
spending: Number(req.params.spent)
};
canProceed(data, /*function to exec once the lock is free=*/ checkbudget);
});
var canProceed = function(data, next) {
var model = inMemoryModel[data.id]; // inMemoryModel: your in-memory model
if (model.is_updating) {
// lock is held: pass a function (not a call) and try again in 1000 milliseconds
setTimeout(function() { canProceed(data, next); }, 1000);
}
else {
// lock is released. Proceed.
next(data.spending, data.id);
}
};
function checkbudget(spent, id) {
var obj = inMemoryModel[id];
obj.is_updating = true; // Lock this model
obj.spent += spent;
if (obj.spent > obj.budget) { // if greater
obj.status = 11; // 11 is the stopped status
}
// update db and rebuild model, then (in the db callback):
obj.is_updating = false; // Unlock the model
}
Note: What I have here is pseudo code as well, so you may have to tweak it a bit.
The idea here is to have a flag in your model to indicate whether an HTTP request can proceed to the critical code path, in this case your checkbudget function and beyond.
When a request comes in, it checks the is_updating flag to see if it can proceed. If the flag is true, it schedules a retry to be fired a second later; this setTimeout basically becomes an event that gets placed into node's event queue for later processing.
When this event fires later, it checks again. This repeats until the is_updating flag becomes false; then the request goes on to do its stuff, and is_updating is set to false again when all the critical code is done.
Not the most efficient way, but it gets the job done; you can always revisit the solution when performance becomes a problem.
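If polling with a timer becomes a bottleneck, one possible alternative is to chain the critical sections per campaign on a promise, so each one starts as soon as the previous one has settled. This swaps the retry flag for a promise queue; it is a sketch, and `queues` and `runExclusive` are hypothetical names rather than an existing API:

```javascript
// Hypothetical per-key queue: maps a key to the tail of its promise chain.
const queues = new Map();

// Runs `task` only after every earlier task for the same key has settled.
// `task` may return a value or a promise; the result is passed through.
function runExclusive(key, task) {
  const previous = queues.get(key) || Promise.resolve();
  const result = previous.then(() => task());
  // The chain itself must never reject, or later tasks would be skipped.
  const tail = result.catch(() => {});
  queues.set(key, tail);
  // Drop the entry once the queue for this key has drained.
  tail.then(() => {
    if (queues.get(key) === tail) queues.delete(key);
  });
  return result;
}
```

A request handler could then wrap the whole "update db, then update model" sequence in runExclusive(campaignId, ...), assuming that sequence is exposed as a function returning a promise.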

Calling socket.disconnect in a forEach loop doesn't actually call disconnect on all sockets

I am new to javascript world. Recently I was working on a chat application in nodejs. So I have a method called gracefulshutdown as follows.
var gracefulShutdown = function() {
logger.info("Received kill signal, shutting down gracefully.");
server.close();
logger.info('Disconnecting all the socket.io clients');
if (Object.keys(io.sockets.sockets).length == 0) process.exit();
var _map = io.sockets.sockets,
_socket;
for (var _k in _map) {
if (_map.hasOwnProperty(_k)) {
_socket = _map[_k];
_socket.disconnect(true);
}
}
...code here...
setTimeout(function() {
logger.error("Could not close connections in time, shutting down");
process.exit();
}, 10 * 1000);
}
Here is what is happening in the disconnect listener. The removeDisconnectedClient method simply updates an entry in the db to indicate the removed client.
socket.on('disconnect', function() {
removeDisconnectedClient(socket);
});
So in this case the disconnect event wasn't fired for all sockets. It was fired for only a few sockets, seemingly at random. I was able to fix it using setTimeout(fn, 0) with the help of a teammate.
I read about it online and understood only this much: setTimeout defers the execution of code by adding it to the end of the event queue. I read about JavaScript context, the call stack, and the event loop, but I couldn't put it all together in this context. I really don't understand why and how this issue occurred. Could someone explain it in detail? And what is the best way to solve or avoid such issues?

It is hard to say for sure without a little more context about the rest of the code in gracefulShutdown but I'm surprised it is disconnecting any of the sockets at all:
_socket = _map[ _k ];
socket.disconnect(true);
It appears that you are assigning an item from _map to the variable _socket but then calling disconnect on socket, which is a different variable. I'm guessing it is a typo and you meant to call disconnect on _socket?
Some of the sockets might be disconnecting for other reasons and the appearance that your loop is disconnecting some but not all the sockets is probably just coincidence.
As far as I can tell from the code you posted, socket should be undefined and you should be getting errors about trying to call the disconnect method on undefined.

From the method name where you use it I can suppose that application exits after attempts to disconnect all sockets. The nature of socket communication is asynchronous, so given you have a decent amount of items in _map it can occur that not all messages with disconnect will be sent before the process exits.
You can increase chances by calling exit after some timeout after disconnecting all sockets. However, why would you manually disconnect? On connection interruption remote sockets will automatically get disconnected...
UPDATE
Socket.io for Node.js doesn't have a callback to know for sure that packet with disconnect command was sent. At least in v0.9. I've debugged that and came to conclusion that without modification of sources it is not possible to catch that moment.
In file "socket.io\lib\transports\websocket\hybi-16.js" a method write is called to send the disconnect packet
WebSocket.prototype.write = function (data) {
...
this.socket.write(buf, 'binary');
...
}
Whereas socket.write is defined in Node.js core transport "nodejs-{your-node-version}-src\core-modules-sources\lib\net.js" as
Socket.prototype.write = function(chunk, encoding, cb)
//cb is a callback to be called on writeRequest complete
However as you see this callback is not provided, so socket.io will not know about the packet having been sent.
At the same time, when disconnect() is called for a websocket, the member disconnected is set to true and the "disconnect" event is indeed broadcast, but synchronously. So the .on('disconnect') handler on the server socket doesn't give any valuable information about whether the packet was sent or not.
Solution
I can make a general conclusion from this. If it is so critical to make sure that all clients are immediately informed (and not wait for a heartbeat timeout or if heartbeat is disabled) then this logic should be implemented manually.
You can send an ordinary message which will mean for the client that server is shutting down and call socket disconnect as soon as the message is received. At the same time server will be able to accept all acknowledgements
Server-side:
var sockets = [];
for (var _k in _map) {
if (_map.hasOwnProperty(_k)) {
sockets.push(_map[_k]);
}
}
sockets.map(function (socket) {
socket.emit('shutdown', function () {
socket.isShutdown = true;
var all = sockets.every(function (skt) {
return skt.isShutdown;
});
if (all) {
//wrap in timeout to let current tick finish before quitting
setTimeout(function () {
process.exit();
});
}
})
})
Clients should behave simply
socket.on('shutdown', function () {
socket.disconnect();
});
Thus we make sure each client has explicitly disconnected. We don't care about server. It will be shutdown shortly.

In the example code it looks like io.sockets.sockets is an Object; however, at least in the library version I am using, it is a mutable array which the socket.io library is free to modify each time you remove a socket with disconnect(true).
Thus, when you call disconnect(true) and the currently iterated item at index i is removed, an effect like this happens:
var a = [1,2,3,4];
for( var i in a) {
a.splice(i,1); // remove item from array
alert(i);
}
// alerts 0,1
Thus, the disconnect(true) call will ask the socket.io to remove the item from the array - and because you are both holding reference to the same array, the contents of the array are modified during the loop.
The solution is to create a copy of the _map with slice() before the loop:
var _map = io.sockets.sockets.slice(); // copy of the original
It would create a copy of the original array and thus should go through all the items in the array.
The reason why calling setTimeout() would also work is that it would defer the removal of the items from the array, allowing the whole loop to iterate without modifying the sockets array.
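The snapshot approach can be tried without socket.io at all. In the sketch below, `registry`, `makeFakeSocket` and `disconnectAll` are hypothetical stand-ins: each fake socket removes itself from the shared array when disconnected, which mimics the behaviour described above.

```javascript
// Hypothetical stand-in for a socket list that shrinks on disconnect.
const registry = [];

function makeFakeSocket(id) {
  return {
    id,
    disconnect() {
      // Mimic the library removing the socket from its own array.
      const index = registry.indexOf(this);
      if (index !== -1) registry.splice(index, 1);
    },
  };
}

// Iterate over a copy, so removals from the original cannot skip items.
function disconnectAll(sockets) {
  const snapshot = sockets.slice();
  snapshot.forEach((socket) => socket.disconnect(true));
  return snapshot.length; // number of sockets that were asked to disconnect
}

for (let i = 0; i < 4; i++) registry.push(makeFakeSocket(i));
```

Calling disconnectAll(registry) visits all four fake sockets and leaves the registry empty, whereas looping over registry directly would skip entries as it shrinks.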

The problem here is that sockjs and socket.io use asynchronous "disconnect" methods, i.e. when you call disconnect, the socket is not immediately terminated. The call is just a promise that it WILL be terminated. This has the following effect (assuming 3 sockets):
Your for loop grabs the first socket
The disconnect method is called on the first socket
Your for loop grabs the second socket
The disconnect method is called on the second socket
The disconnect method on the first socket finishes
Your for loop grabs the third socket
The disconnect method is called on the third socket
Program kills itself
Notice, that sockets 2 and 3 haven't necessarily finished yet. This could be for a number of reasons.
Finally, setTimeout(fn, 0) is, as you said, blocking the final call, but it may not be consistent (I haven't dug into this too much). By that I mean, you've set the final termination to be AFTER all your sockets have disconnected. The setTimeout and setInterval methods essentially act more like a queue. Your position in the queue is dictated by the timer you set. Two intervals set for 10s each, where they both run synchronously will cause one to run AFTER the other.

After Socket.io 1.0, the library does not expose an array of the connected sockets. You can check that io.socket.sockets.length is not equal to the number of open socket objects. Your best bet is to broadcast a 'disconnect' message to all the clients that you want to turn off, and have each client close the actual WebSocket in its on('disconnect') handler.

Efficient closure structure in node.js

I'm starting to write a server in node.js and wondering whether or not I'm doing things the right way...
Basically my structure is like the following pseudocode:
function processStatus(file, data, status) {
...
}
function gotDBInfo(dbInfo) {
var myFile = dbInfo.file;
function gotFileInfo(fileInfo) {
var contents = fileInfo.contents;
function sentMessage(status) {
processStatus(myFile, contents, status);
}
sendMessage(myFile.name + contents, sentMessage);
}
checkFile(myFile, gotFileInfo);
}
checkDB(query, gotDBInfo);
In general, I'm wondering if this is the right way to code for node.js, and more specifically:
1) Is the VM smart enough to run "concurrently" (i.e. switch contexts) between each callback to not get hung up with lots of connected clients?
2) When garbage collection is run, will it clear the memory completely if the last callback (processStatus) finished?

Node.js is event-based; all code is basically a handler for some event. The V8 engine will run any synchronous code in the handler to completion and then process the next event.
An async call (network/file IO) will post an event to another thread to do the blocking IO (that was in libev/libeio AFAIK, I may be wrong on this; current Node.js versions use libuv for this). Your app can then handle other clients. When the IO task is done, an event is fired and your callback function is called.
Here's an example of async call flow, simulating a Node app handling a client request:
function onRequest(req, res) {
// we have to do some IO and CPU intensive task before responding the client
asyncCall(function callback1() {
// callback1() triggers after asyncCall() has done its part
// *note that some other code might have been executed in between*
moreAsyncCall(function callback2(data) {
// callback2() triggers after moreAsyncCall() has done its part
// note that some other code might have been executed in between
// res is in scope thanks to closure
res.end(data);
// callback2() returns here, Node can execute other code
// the client should receive a response
// the TCP connection may be kept alive though
});
// callback1() returns here, Node can execute other code
// we could have done the processing of asyncCall() synchronously
// in callback1(), but that would block for too long
// so we used moreAsyncCall() to *yield to other code*
// this is kind of like cooperative scheduling
});
// tasks are scheduled by calling asyncCall()
// onRequest() returns here, Node can execute other code
}
When V8 does not have enough memory, it will do garbage collection. It knows whether a chunk of memory is reachable by live JavaScript object. I'm not sure if it will aggressively clean up memory upon reaching end of function.
References:
This Google I/O presentation discussed the GC mechanism of Chrome (hence V8).
http://platformjs.wordpress.com/2010/11/24/node-js-under-the-hood/
http://blog.zenika.com/index.php?post/2011/04/10/NodeJS

How to develop node.js run-time strategy?

Node.js approach is event driven and I was wondering how would you tackle the problem of when to fire off an event?
Lets say that we have some actions on a web application: create some data, serve pages, receive data etc.
How would you lay out these events? In a threaded system the design is rather "simple". You dedicate threads to a specific set of tasks and you go down the road of thread synchronization. While these tasks are low on demand, the threads sit idle and do nothing. When they are needed, they run their code. While this road has issues, it's well documented and kind of solved.
I find it hard to wrap my head around the node.js event way of doing things.
I have 10 requests coming in, but I haven't created any data so I can't serve anything; creating data is a long action, and another 5 clients want to send data. What now?
I've created the following untested code, which is basically a pile of callbacks which get registered and should be executed. There will be some kind of pile manager that will run and decide which code it wants to execute now. All the callbacks created by that callback can be added "naturally" to the event loop. It should also register itself so the event loop can give control back to it. Other things like static content and whatever can be bound differently.
How can I register a call back to be the last call in the current event loop state?
Is this a good way to solve this issue?

The most important thing to remember when coming from a threaded environment is that in node you don't wait for an action to finish; instead, you tell it what to do when it is done. To do this you use a callback: a variable which contains a function to execute, or a pointer to a function if you like.
For example:
app.get('/details/:id?', function (req, res) {
var id = req.params.id, // matches the :id route parameter
publish = function (data) {
res.send(data);
};
service.getDetail(id, publish);
});
You can then invoke the publish method from within your get details method once you have created the required data.
getDetail : function (id, callback) {
var data = makeMyData(id);
callback(data)
}
Which will then publish your data back to the response object. Because of the event loop node will continue to serve requests to this url without interrupting the data generation from the first request

The chosen answer is mostly correct; there is just one minor code change:
Change this function from this:
getDetail : function (id, callback) {
var data = makeMyData(id);
callback(data)
}
To that:
getDetail : function (id, callback) {
var data = makeMyData(id);
setTimeout(callback, 0, data);
}
Update 2019:
In order to comply with community standard I've broken off an update to a new answer.
I've used setTimeout because I wanted to defer the callback to the back of the event loop. Another option I've used was process.nextTick(), this helped to defer the callback to the end of the current event processed.
For example:
getDetail : function (id, callback) {
var data = makeMyData(id);
process.nextTick(callback, data); // pass the function and its argument; do not call it here
}
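The difference between the two deferral options can be observed with a small sketch. The helper names `deferWithTimeout` and `deferWithNextTick` and the `callOrder` array are illustrative only; both helpers rely on the fact that setTimeout and process.nextTick forward extra arguments to the callback:

```javascript
// Illustrative only: records when each callback actually runs.
const callOrder = [];

// Defers the callback to a later turn of the event loop (timers phase).
function deferWithTimeout(callback, data) {
  setTimeout(callback, 0, data);
}

// Defers the callback until the current operation completes,
// before the event loop moves on to timers or I/O.
function deferWithNextTick(callback, data) {
  process.nextTick(callback, data);
}

deferWithTimeout((d) => callOrder.push('timeout:' + d), 'a');
deferWithNextTick((d) => callOrder.push('nextTick:' + d), 'b');
callOrder.push('sync');
// Once everything has run, callOrder is ['sync', 'nextTick:b', 'timeout:a'].
```

Even though the timeout was registered first, the nextTick callback runs before it, and both run only after the synchronous code has finished.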
