Semaphore equivalent in Node.js, variable getting modified by concurrent requests? - javascript

I have been facing this issue for the past week and I am just confused about it.
Keeping it short and simple to explain the problem:
We have an in-memory model which stores values like the budget etc. Now when a call is made to the API, it has a spend amount associated with it.
We then check the in-memory model, add that spend to the existing spend, and compare it against the budget; if the budget is exceeded we do not accept any more clicks for that model. For each call we also update the db, but that is an async operation.
A short example:
api.get('/clk/:spent/:id', function(req, res) {
  checkbudget(spent, id);
});

function checkbudget(spent, id) {
  var obj = inMemoryModel[id];     // look up the campaign in the in-memory model
  obj.spent += spent;
  if (obj.spent > obj.budget) {
    obj.status = 11;               // 11 is the stopped status
    // update db and rebuild model (async)
  }
}
This used to work fine, but with concurrent requests we are getting false spends: the spend grows beyond the budget and the campaign only stops some time later. We simulated the calls with JMeter and confirmed this.
As far as we can tell, Node is async, so by the time the status is updated to 11 many requests have already added to the spent of the campaign.
How can we have semaphore-like logic in Node.js so that the spend stays in sync with the budget in the model?
Update:
db.addSpend(campaignId, spent, function(err, data) {
  camp.spent += spent;
  var totalSpent = (+camp.spent) + (+camp.cpb);
  if (totalSpent > camp.budget) {
    logger.info('Stopping it..');
    camp.status = 11; // in-memory stop
    var cpcHistory = [];
    cpcHistory.push(/* some data */);
    db.stopCamp(campaignId, function(err, data) {
      if (err) {
        logger.error('Error while stopping');
      }
      model.campMAP = buildCatMap(model);
      model.campKeyMap = buildKeyMap(model);
      db.campEventHistory(cpcHistory, false, function(err) {
        if (err) {
          logger.error(err);
        }
      });
    });
  }
});
That is the gist of the code. Can anyone help?

Q: Is there a semaphore or equivalent in NodeJs?
A: No.
Q: Then how do NodeJs users deal with race conditions?
A: In theory you shouldn't have to, as there are no threads in JavaScript.
Before going deeper into my proposed solution, I think it is important for you to know how NodeJs works.
NodeJs is driven by an event-based architecture. This means that in the Node process there is an event queue that contains all the "to-do" events.
When an event gets popped from the queue, Node executes all of the required code until it is finished. Any async calls made during that run are spawned as other events and queued up in the event queue until a response is heard back and it is their turn to run again.
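To make that concrete, here is a minimal, self-contained sketch (not from the original post, names are illustrative) showing why the lost update only appears when an async call sits between the read and the write:
var spent = 0;

function handleClick(amount) {
  var current = spent;          // read the in-memory value
  setImmediate(function() {     // stand-in for the async db call
    spent = current + amount;   // write later, based on a stale read
  });
}

handleClick(5);
handleClick(5);

setTimeout(function() {
  console.log(spent);           // prints 5, not 10: one update was lost
}, 10);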
Q: So what can I do to ensure that only 1 request can perform updates to the database at a time?
A: I believe there are many ways to achieve this, but one of the easier ways out is to use the setTimeout API.
Example:
api.get('/clk/:spent/:id', function(req, res) {
  var data = {
    id: id,
    spending: spent
  };
  canProceed(data, /*function to exec after canProceed=*/ checkbudget);
});

var canProceed = function(data, next) {
  var model = inMemoryModel[data.id];
  if (model.is_updating) {
    // someone else holds the lock; try again later
    setTimeout(function() { canProceed(data, next); }, /*try again in=*/ 1000 /*milliseconds*/);
  } else {
    // lock is released. Proceed.
    next(data.spending, data.id);
  }
};

function checkbudget(spent, id) {
  var obj = inMemoryModel[id];
  obj.is_updating = true;       // Lock this model
  obj.spent += spent;
  if (obj.spent > obj.budget) {
    obj.status = 11;            // 11 is the stopped status
    // update db and rebuild model
  }
  obj.is_updating = false;      // Unlock the model (after the async db work completes)
}
Note: What I have here is pseudocode as well, so you may have to tweak it a bit.
The idea here is to have a flag in your model to indicate whether an HTTP request can proceed into the critical code path, in this case your checkbudget function and beyond.
When a request comes in, it checks the is_updating flag to see if it can proceed. If the flag is true it schedules an event to be fired a second later; this setTimeout callback basically becomes an event and gets placed into Node's event queue for later processing.
When that event fires later, it checks again. This repeats until the is_updating flag becomes false; the request then goes on to do its work, and is_updating is set back to false once all the critical code is done.
Not the most efficient way, but it gets the job done; you can always revisit the solution when performance becomes a problem.
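If the one-second polling retry ever becomes a problem, a rough alternative sketch (not part of the answer above, names are illustrative) is to queue waiting callers per campaign id, so each request runs the critical section exactly once and in arrival order:
var locks = {}; // campaignId -> array of waiting critical sections

function withCampaignLock(id, criticalSection) {
  if (!locks[id]) {
    locks[id] = [];                   // nobody holds the lock: take it and run now
    run(criticalSection);
  } else {
    locks[id].push(criticalSection);  // somebody is running: wait in line
  }

  function run(fn) {
    // fn must call release() once the db update and model rebuild are done
    fn(function release() {
      var next = locks[id].shift();
      if (next) {
        run(next);                    // hand the lock to the next waiter
      } else {
        delete locks[id];             // queue drained: free the lock
      }
    });
  }
}

// Usage inside the route handler (illustrative):
// withCampaignLock(id, function(release) {
//   checkbudget(spent, id);   // the read-modify-write happens here
//   release();                // or call release() in the db callback
// });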

Related

SailsJS test if process is already running

Say I have a long task that starts running when a person connects to InitializeDB (of course with authorization in the future, but I left that out for now).
'post /initializeDB':'OrderController.initializeAll',
Now the problem is: the initialize function should never be run twice. I know that ideally I would set up a task manager which just starts a task in the background that I could poll.
However, for simplicity (and to show a proof of concept), is it possible for a Sails route to "know" that another connection/route is already running, so that if I connect twice to /initializeDB it won't try to initialize the database twice?
You can use a variable in your controller - just toggle it to true when the process is running, something like that. So, in OrderController.js:
var initializeRunning = false;

module.exports = {
  initializeAll: function(req, res) {
    // return benign result if already running
    if (initializeRunning) {
      return res.send({alreadyRunning: true});
    }
    // start running
    initializeRunning = true;
    // using setTimeout as a stand-in for a long async process
    setTimeout(function() {
      // finished the process
      res.send({complete: true});
      // if you want to allow this method to run again later, unset your toggle
      initializeRunning = false;
    }, 3000);
  },
};
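One hedged follow-up, not part of the answer above: if the long task can fail, unset the flag on the error path too, otherwise a single failure leaves the route locked until the process restarts. Assuming a hypothetical async helper runLongInitialization does the real work:
var initializeRunning = false;

module.exports = {
  initializeAll: function(req, res) {
    if (initializeRunning) {
      return res.send({alreadyRunning: true});
    }
    initializeRunning = true;
    runLongInitialization(function(err) { // hypothetical async helper
      initializeRunning = false;          // unset whether it failed or succeeded
      if (err) {
        return res.serverError(err);
      }
      res.send({complete: true});
    });
  },
};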

Nodejs manage different threads

I'm a bit of a newbie with Nodejs.
I'm working on a Nodejs - express solution.
I want to send an e-mail when some information is added to an MSSQL database.
This is working well for me. The problem is that I want to check every five minutes whether the information added to the database has been modified, and if not, send another e-mail.
The call to add information to the db is this route:
router.post('/postlinevalidation', function(req, res) {
  // insert function into mssql
  silkcartCtrl.sendMail(req, res);
});
The controller part for sending the e-mail:
exports.sendMail = function(req, res) {
var emails = "";
fs.readFile('./config/email.conf', 'utf8', function (err,data) {
if (err) {
return logger.error(err);
}
emails = data;
});
var minutes = 5, the_interval = minutes * 60 * 1000;
var refreshId = setInterval(function() {
logger.info("I am doing my 5 minutes check FL_PENDIENTE");
var request = new sql.Request(req.dbsqlserver);
var sqlpendinglinesvalidation = "SELECT [FK_IDCHECK],[FK_IDPEDIDO],[BK_IDPROVEEDOR],[DE_PROVEEDOR]"+
",[FK_FAMILIA],[BK_FAMILIA],[FK_SUBFAMILIA],[BK_SUBFAMILIA],[FK_ARTICULO]"+
",[BK_ARTICULO],[FL_VALIDAR],[DT_FECHA],[FL_PENDIENTE],[DES_CHECK],[QNT_PROPUESTA],[FECHA]"+
"FROM TABLE"+
" WHERE [FL_PENDIENTE] = 1";
request.query(sqlpendinglinesvalidation, function (err, recordset) {
if (recordset.length > 0) {
var transporter = nodemailer.createTransport('smtps://user%40gmail.com:pwd@smtp.gmail.com');
var mailOptions = {
from: '"Mailer" <mail#mail.com>', // sender address
to: emails, // list of receivers
subject: 'Tienes compras pendientes de validar', // Subject line
text: 'Tienes compras pendientes de validar', // plaintext body
html: '<b>Tienes compras pendientes de validar.</b>' // html body
};
// send mail with defined transport object
transporter.sendMail(mailOptions, function(error, info){
if(error){
return logger.error(error);
}
logger.info('Message sent: ' + info.response);
});
} else {
clearInterval(refreshId);
return true;
}
});
}, the_interval);
};
As I said this is working well.
I control the five minutes with setInterval.
But I supposed that every time the route postlinevalidation is called, a new thread is opened, so I will have several setInterval processes running.
I want to know how to manage this: if the controller function exports.sendMail is running when the route is called again, "kill that process" and start exports.sendMail again.
Thanks in advance
But I supposed that every time the route postlinevalidation is called, a new thread is opened, so I will have several setInterval processes running.
No, this is not how node.js works. You don't get multiple threads because of multiple setInterval() timers.
node.js by itself is single threaded. So, each time a route is called, it just creates an event in the node.js event queue and they are served FIFO, one at a time. At any point that one of the route handlers makes an async call, it essentially "yields" control back and the next item in the event queue gets to run until it yields or finishes.
Timers like setInterval() also use the event queue, so no additional threads are created by setInterval(). It is possible that node.js modules that use native code may themselves use threads, and node.js itself uses a small thread pool for disk management, but neither of those has anything to do with setInterval().
If you explicitly want to create another execution context for a long running operation in node.js to separate it from the single node.js thread, then that is usually done with the child process module that is part of node.js. You create a new process (which can be a node.js process or some other program running in the process) and you can then communicate with that other process.
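For example, a rough sketch of that child-process approach (the worker filename and message format are assumptions, not from the answer):
var fork = require('child_process').fork;

// Spawn a separate Node process for the long-running work.
var worker = fork(__dirname + '/mailWorker.js');

// Messages from the worker arrive over the built-in IPC channel.
worker.on('message', function(msg) {
  console.log('worker reported:', msg);
});

// Ask the worker to do something; it receives this via process.on('message', ...).
worker.send({ cmd: 'checkPendingLines' });

// "Kill and restart" is possible at the process level if you really need it:
// worker.kill();
// worker = fork(__dirname + '/mailWorker.js');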
If the controller function exports.sendMail is running when the route is called again, "kill that process" and start exports.sendMail again.
This is something that would need to be an explicit feature of the nodemailer module in order for you to cancel an operation in process. How "in process" asynchronous operations are implemented and controlled is not a generic node.js thing, but is specific to how that specific module implements things and keeps track of things.
Looking into the code for node-mailer, and more specifically the smtp-connection module, it looks like it uses plain async node.js socket code. That means it does not create any new threads or processes on its own.
As for your setInterval() calls, you need to make sure that any body of code that creates a setInterval() keeps track of the interval timer ID and eventually clears the interval so it stops and you don't keep piling up more and more interval timers. Another possibility is to have only one interval that checks all outstanding operations (rather than a separate interval for each one), as sketched below.
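A rough sketch of that single-interval option (the checkAllPendingLines helper is hypothetical, not from the answer): keep one module-level timer that every request reuses.
var checkTimer = null; // one timer per process, shared by all requests

function ensurePendingCheck() {
  if (checkTimer) {
    return; // a checker is already running; don't start another
  }
  checkTimer = setInterval(function() {
    // hypothetical helper running the FL_PENDIENTE query for all rows
    checkAllPendingLines(function(err, stillPending) {
      if (!err && !stillPending) {
        clearInterval(checkTimer); // nothing left to watch, stop polling
        checkTimer = null;
      }
    });
  }, 5 * 60 * 1000);
}

// Call ensurePendingCheck() from the /postlinevalidation route; repeated calls
// while the timer is active are no-ops, so intervals never pile up.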
From a quick look, I think you don't really need to put the sendMail function inside postlinevalidation. If you want to control it, you could run it in a different script from the express app. You can use something like pm2 or parallelshell to run multiple scripts at the same time.
If you are using setInterval then you can use clearInterval to stop it based on your condition. Whenever you call setInterval, it returns an id which you can use to stop that interval.
var interval = setInterval(doStuff, 5000);

function doStuff() {
  if (your_condition) {
    clearInterval(interval);
  }
}

Calculating when multiple writes to a file will cause inaccuracies?

In my node server I have a variable:
var clicks = 0;
Each time a user clicks in the webapp, a websocket event sends a message. On the server:
clicks++;
if (clicks % 10 == 0) {
  saveClicks();
}

function saveClicks() {
  var placementData = JSON.stringify({'clicks' : clicks});
  fs.writeFile(__dirname + '/clicks.json', placementData, function(err) {
  });
}
At what rate do I have to start worrying about overwrites? How would I calculate this math?
(I'm looking at creating a MongoDB json object for each click but I'm curious what a native solution can offer).
From the node.js doc for fs.writeFile():
Note that it is unsafe to use fs.writeFile() multiple times on the same file without waiting for the callback. For this scenario, fs.createWriteStream() is strongly recommended.
This isn't a math problem where you calculate when it will go wrong - it's just code that risks a conflict under circumstances that cannot be predicted. The node.js doc clearly states that this can cause a conflict.
To make sure you don't have a conflict, write the code in a different way so a conflict cannot happen.
If you want to make sure that all writes happen in the proper order of incoming requests, so the last request to arrive is always the one that ends up in the file, then you may need to queue your data as it arrives (so order is preserved), write to the file in a way that opens it for exclusive access so no other request can write while the prior one is still writing, and handle contention errors appropriately.
This is something that databases mostly handle for you automatically, so it may be one reason to use a database.
Assuming you weren't using clustering and thus do not have multiple processes trying to write to this file and that you just want to make sure the last value sent is the one written to the file by this process, you could do something like this:
var saveClicks = (function() {
  var isWriting = false;
  var lastData;

  return function() {
    // always save most recent data here
    lastData = JSON.stringify({'clicks' : clicks});

    if (!isWriting) {
      writeData(lastData);
    }

    function writeData(data) {
      isWriting = true;
      lastData = null;
      fs.writeFile(__dirname + '/clicks.json', data, function(err) {
        isWriting = false;
        if (err) {
          // decide what to do if an error occurs
        }
        // if more data arrived while we were writing this, then write it now
        if (lastData) {
          writeData(lastData);
        }
      });
    }
  }
})();
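A hedged usage sketch (the 'click' event name is an assumption; the question does not show its websocket handler): the replacement saveClicks() is called exactly as before, it is just safe to call while a previous write is still in flight.
socket.on('click', function() {
  clicks++;
  if (clicks % 10 === 0) {
    saveClicks(); // coalesces writes instead of letting them overlap
  }
});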
@jfriend00 is definitely right about createWriteStream and already made a point about the database, and everything has pretty much been said, but I would like to emphasize the point about databases, because the file-saving approach seems odd to me.
So, use databases.
Not only would this save you the headache of tracking such things, it would also speed things up significantly (remember that in Node the numerous file read/write operations are interleaved on a single thread, so if one of them takes ages it can affect the overall performance).
Redis is a perfect solution for storing key-value data, so you can keep data like clicks per user in a Redis database, which you'll probably end up running alongside anyway once you get enough traffic :)
If you're not convinced yet, take a look at this simple benchmark:
Redis:
var async = require('async');
var redis = require("redis"),
    client = redis.createClient();

console.time("To Redis");
async.mapLimit(new Array(100000).fill(0), 1, (el, cb) => client.set("./test", 777, cb), () => {
  console.timeEnd("To Redis");
});
To Redis: 5410.383ms
fs:
var async = require('async');
var fs = require('fs');

console.time("To file");
async.mapLimit(new Array(100000).fill(0), 1, (el, cb) => fs.writeFile("./test", 777, cb), () => {
  console.timeEnd("To file");
});
To file: 20344.749ms
And, by the way, you can significantly increase the number of clicks after which the progress is stored (now it's 10) by also calling this "click-saver" in the socket's disconnect handler: socket.on('disconnect', ...).
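As a small sketch of that idea (not from the answer above): with Redis you can drop the in-memory counter entirely and let INCR do the atomic counting.
var redis = require('redis');
var client = redis.createClient();

// On each click (e.g. inside the websocket handler):
client.incr('clicks', function(err, total) {
  if (err) {
    return console.error(err);
  }
  console.log('clicks so far:', total);
});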

node.js handling stream back pressure when using setTimeout

This is a follow-on question about a further issue I've encountered after this earlier question:
nodejs: read from file and store to db, limit maximum concurrent db operations
Problem:
I want to conditionally reschedule some operations for a later time; however, this is breaking my method for handling back-pressure.
Detail:
I have a CSV file that I am reading in as a stream, and using transforms to convert to JSON and then asynchronously store each line to a DB.
As lines are processed by the transform, they are placed onto an async queue which is responsible for issuing the DB operations.
E.g.
parser._transform = function(data, encoding, done) {
  var tick = this._parseRow(data);

  dbQueue.push(tick, function(err, result) {
    if (typeof(err) != 'undefined') { console.log(err); }
  });

  this.push(tick);
  done();
}
Back pressure is handled by pausing and resuming the parser when the queue is saturated/empty:
dbQueue.saturated = function() {
  parser.pause();
}

dbQueue.empty = function() {
  parser.resume();
}
The change I have been trying to make is that when an item is pulled off the queue, it is conditionally rescheduled for some time (100ms) in future:
var dbQueue = async.queue(function(data, callback) {
  if (condition) {
    // re-schedule the operation by adding it back to the queue 100ms later
    setTimeout(function(data, callback) {
      dbQueue.push(data, function(err, result) {
      });
    }, 100, data, callback);
  } else {
    // execute the db store
    ... ...
  }
});
I believe my problem is that many operations now spend most of their time in setTimeout, so the dbQueue will be empty, and the back-pressure on the transform stream is not being handled as desired.
I have made a few attempts at using counters such as max_ops and running_ops to decide when the stream is paused/resumed, but without success.
Is there a more idiomatic way of handling this in node.js?
Since this looks like an external condition and not something related to what dbQueue is doing, instead of re-inserting the data into the queue when the condition occurs, I would simply pause dbQueue. For example, let's say your condition is that the database disconnected for some reason and there's an event you can listen to for that. In that case you can do something similar to what you're doing when dbQueue is saturated/empty:
db.on('disconnect', function() {
  dbQueue.pause();
});

db.on('connect', function() {
  dbQueue.resume();
});
This is usually a better approach than waiting for some pre-determined timeout. That being said, sometimes waiting for a timeout is the only option. In that case you could do something similar but, instead of waiting for a separate event to trigger the resume(), simply use setTimeout():
db.on('disconnect', function() {
  dbQueue.pause();
  setTimeout(function() {
    dbQueue.resume();
  }, 100); // e.g. the same 100ms delay used in the question
});
Note: If we are really talking about db disconnects here, then you might also want to pause/resume dbQueue if there's a db error in the case that 100ms isn't enough time for the db to re-connect
If you have a more specific condition you're looking for, and you're willing to share what that is, I may be able to give you a better example :)

Best way to prevent race condition in multiple chrome.storage API calls?

1. Something requests a task.
2. Something else pulls the task list out of storage and checks if there are tasks there.
3. If there are tasks, it removes one and the smaller "task list" is put back in storage.
Between steps 2 and 3 a race condition can occur if multiple requests occur, and the same task will be served twice.
Is the correct resolution to "lock" the "tasks table" while a single task is "checked out", to prevent any other requests?
What is the solution with the least performance impact, such as delay of execution, and how should it be implemented in JavaScript with the chrome.storage API?
Some code, for example:
function decide_response() {
  if (script.replay_type == "reissue") {
    function next_task(tasks) {
      var no_tasks = (tasks.length == 0);
      if (no_tasks) {
        target_complete_responses.close_requester();
      } else {
        var next_task = tasks.pop();
        function notify_execute() {
          target_complete_responses.notify_requester_execute(next_task);
        }
        setTable("tasks", tasks, notify_execute);
      }
    }
    getTable("tasks", next_task);
    ...
  }
  ...
}
I think you can manage without a lock by taking advantage of the fact that javascript is single-threaded within a context, even with the asynchronous chrome.storage API. As long as you're not using chrome.storage.sync, that is - if there may or may not be changes from the cloud I think all bets are off.
I would do something like this (written off the cuff, not tested, no error handling):
var getTask = (function() {
  // Private list of requests.
  var callbackQueue = [];

  // This function is called when chrome.storage.local.set() has
  // completed storing the updated task list.
  var tasksWritten = function(nComplete) {
    // Remove completed requests from the queue.
    callbackQueue = callbackQueue.slice(nComplete);
    // Handle any newly arrived requests.
    if (callbackQueue.length)
      chrome.storage.local.get('tasks', distributeTasks);
  };

  // This function is called via chrome.storage.local.get() with the
  // task list.
  var distributeTasks = function(items) {
    // Invoke callbacks with tasks.
    var tasks = items['tasks'];
    for (var i = 0; i < callbackQueue.length; ++i)
      callbackQueue[i](tasks[i] || null);
    // Capture the number of requests handled now and pass it to the
    // set() handler, because the queue length may change by the time
    // the handler is invoked.
    var nHandled = callbackQueue.length;
    chrome.storage.local.set(
      { 'tasks': tasks.slice(nHandled) },
      function() {
        tasksWritten(nHandled);
      }
    );
  };

  // This is the public function task consumers call to get a new
  // task. The task is returned via the callback argument.
  return function(callback) {
    if (callbackQueue.push(callback) === 1)
      chrome.storage.local.get('tasks', distributeTasks);
  };
})();
This stores task requests from consumers as callbacks in a queue in local memory. When a new request arrives, the callback is added to the queue and the task list is fetched iff this is the only request in the queue. Otherwise we can assume that the queue is already being processed (this is an implicit lock that allows only one strand of execution to access the task list).
When the task list is fetched, tasks are distributed to requests. Note that there may be more than one request if more have arrived before the fetch completed. This code just passes null to a callback if there are more requests than tasks. To instead block requests until more tasks arrive, hold unused callbacks and restart request processing when tasks are added. If tasks can be dynamically produced as well as consumed, remember that race conditions will need to be prevented there as well but is not shown here.
It's important to prevent reading the task list again until the updated task list is stored. To accomplish this, requests aren't removed from the queue until the update is complete. Then we need to make sure to process any requests that arrived in the meantime (it's possible to short-circuit the call to chrome.storage.local.get() but I did it this way for simplicity).
This approach should be pretty efficient in the sense that it should minimize updates to the task list while still responding as quickly as possible. There is no explicit locking or waiting. If you have task consumers in other contexts, set up a chrome.extension message handler that calls the getTask() function.
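A rough sketch of that message handler, using chrome.runtime.onMessage (the message shape is an assumption):
chrome.runtime.onMessage.addListener(function(message, sender, sendResponse) {
  if (message && message.type === 'getTask') {
    getTask(function(task) {
      sendResponse({ task: task }); // task is null when none was available
    });
    return true; // keep the channel open for the asynchronous sendResponse
  }
});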
