Firebase Web SDK - Transaction causing on('child_added', ...) to be called - javascript

See jsbin.com/ceyiqi/edit?html,console,output for a verifiable example.
I have a reference listening to a database location
jobs/<key>/list
where <key> is the team's unique number.
Under this entry point is a list of jobs
I have a listener on this point with
this.jobsRef.orderByChild('archived')
    .equalTo(false)
    .on('child_added', function(data) {
      // ...add the job to the local list
    });
I also have a method that does the following transaction:
ref.transaction(function(post) {
  // correct the counter
  if (post) {
    // console.log(post);
    if (active) {
      // if toggling on
    } else {
      // if toggling off
    }
  }
  return post;
})
When invoking the transaction the child_added is also invoked again, giving me duplicate jobs.
Is this expected behavior?
Should I simply check whether the item has already been received and add it to the array accordingly?
Or am I doing something wrong?
Thanks in advance for your time

You're hitting an interesting edge case in how the Firebase client handles transactions. Your transaction function is running twice: first on null, then on the actual data. This is easy to see if you listen for value events on /jobs, since you'll then see:
1. the initial value
2. null (when the transaction starts and runs on null)
3. the initial value again (when the transaction runs again on the real data)
The null value in step 2 is the client's initial guess for the current value. Since the client doesn't have cached data for /jobs (it ignores the cached data for /jobs/list), it guesses null. That guess is clearly wrong, but the behavior has unfortunately been like this for a long time, so it's unlikely to change in the current major version of the SDK.
And because of the null in step 2, you'll get child_removed events (which you're not handling right now) and then in step 3 you'll get child_added events to re-add them.
If you handled the child_removed events, your items wouldn't end up duplicated, but they would still disappear and reappear, which probably isn't desirable. Another workaround in the current setup is to explicitly tell the transaction not to run with the local estimate, which you can do by passing false as the third parameter:
function transact() {
  var path = 'jobs/';
  var ref = firebase.database().ref(path);
  ref.transaction(function(post) {
    return post;
  }, function(error, committed, snapshot) {
    if (error) {
      console.log('Transaction failed abnormally!', error);
    } else if (!committed) {
      console.log('Transaction aborted.');
    } else {
      console.log('Transaction completed.');
    }
  }, false);
}
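For reference, handling child_removed alongside child_added might look something like this (a sketch; jobs is assumed to be the local array the question is building, and each entry is tracked by its snapshot key):
// Sketch: keep a local array in sync with the query, keyed by snapshot key.
var jobs = [];
var query = jobsRef.orderByChild('archived').equalTo(false);

query.on('child_added', function(snap) {
  jobs.push({ key: snap.key, job: snap.val() });
});

query.on('child_removed', function(snap) {
  // Remove the job again when the client's null guess fires child_removed
  jobs = jobs.filter(function(entry) { return entry.key !== snap.key; });
});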
I'm sorry I don't have a better solution. But as I said: it's very unlikely that we'll be able to change this behavior in the current generation of SDKs.

Related

How can I use mongodb to only allow one entry when multiple servers are accessing the database?

I have multiple "worker" servers processing jobs and accessing the same MongoDB database, but I only want one message to be created, and I never want two servers running the same job to create the same message.
When a message is sent, its status field is set to "sent" or if it's disabled it's set to "disabled". So first it checks if there are any sent or disabled messages. Then it creates a document with a lockedAt field set to the current time and checks if the same message has already been locked. The reason I use the lockedAt field is so that if the job fails for some reason, it will allow the lock to expire and run again.
This seems to work most of the time, but a few messages get through if two "workers" run the same job within a few milliseconds of each other, so my logic isn't perfect, and I can't figure out how the duplicate messages are being created.
How can I use MongoDB to prevent the same job from running at the exact same time and creating the same message twice?
// Check if there is a locked message.
// Insert a new message, or update if one is found but return the old message (or nothing if one didn't exist)
const messageQuery = {
  listingID: listing._id,
  messageRuleID: messageRule._id,
  reservationID: reservation._id
};
let message = await Message.findOne({
  ...messageQuery,
  status: {$in: ["disabled", "sent"]}
});
// If the message has been sent or is disabled, don't continue
if (message) {
  return;
}
message = await Message.findOneAndUpdate(
  messageQuery,
  {
    listingID: listing._id,
    messageRuleID: messageRule._id,
    reservationID: reservation._id,
    lockedAt: moment().toDate() // check out the message with the current date
  },
  {upsert: true}
);
// If no message is defined, then it's new and not locked; move on.
if (message) {
  // If the message has been locked for less than X minutes, don't continue
  const cutoff = moment().subtract(
    Config.messageSendLock.amount,
    Config.messageSendLock.unit
  );
  if (message.lockedAt && moment(message.lockedAt).isAfter(cutoff)) {
    // Put the last lock time back
    await Message.findOneAndUpdate(messageQuery, {lockedAt: message.lockedAt});
    return;
  }
}
It's much simpler than that. In Mongo, write operations are atomic at the document level.
All you need to do is define a unique key for your message.
Assuming the key is:
const messageId = {
  listingID: listing._id,
  messageRuleID: messageRule._id,
  reservationID: reservation._id
};
Then all you need are two write queries. The first is the "insert" half of your upsert:
try {
  await Message.create(messageId);
} catch (err) {
  if (err.code !== 11000) { // rethrow anything but the duplicate key error
    throw err;
  }
}
This is idempotent and can be executed by multiple workers at the same time. The result is always the same: exactly one document in the collection.
The unique index this relies on can be defined at the schema level as follows:
Message.index({
  listingID: 1,
  messageRuleID: 1,
  reservationID: 1
}, { unique: true })
IMPORTANT: Ensure that you not only define the unique index on your model, but also that the index has actually been applied to the collection on the DB side. Mongoose has some optimisations and does not always apply indexes; otherwise the code above will create duplicates. The relevant piece from the Mongoose documentation:
When your application starts up, Mongoose automatically calls createIndex for each defined index in your schema. Mongoose will call createIndex for each index sequentially, and emit an 'index' event on the model when all the createIndex calls succeeded or when there was an error. While nice for development, it is recommended this behavior be disabled in production since index creation can cause a significant performance impact. Disable the behavior by setting the autoIndex option of your schema to false, or globally on the connection by setting the option autoIndex to false.
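If you want to be sure the index exists before any worker starts inserting, you can wait for index creation explicitly. A sketch (Model.init and Model.syncIndexes are standard Mongoose APIs, but check the version you're running):
// Sketch: wait for the declared indexes to actually exist on the collection,
// e.g. during application startup, before workers begin processing.
await Message.init();        // resolves once Mongoose has built the schema's indexes
// or, on Mongoose >= 5.2:
await Message.syncIndexes(); // creates missing indexes and drops stray ones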
The second query is the "update" part of your upsert:
const cutoff = moment().subtract(
  Config.messageSendLock.amount,
  Config.messageSendLock.unit
);
message = await Message.findOneAndUpdate(
  {
    ...messageId,
    status: {$nin: ["disabled", "sent"]},
    $or: [
      { lockedAt: {$exists: false} },
      { lockedAt: {$lte: cutoff} }
    ]
  },
  { lockedAt: moment().toDate() }
);
// If no message is returned, then it's either sent, disabled, or locked
if (!message) {
  return;
}
Running this query from multiple workers at the same time will result in a queue of IX locks on the matching document on the DB side. The first query will add/update the lockedAt field and return the document to its worker. All subsequent queries won't match the filter and will return nothing.
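Putting the two halves together, a worker's send path might look roughly like this (a sketch; sendTheMessage and the final status update are illustrative, not part of the answer above):
// Sketch: idempotent insert, then an atomic lock claim; only one worker
// can get past both steps for a given messageId.
async function sendMessageOnce(messageId) {
  try {
    await Message.create(messageId); // step 1: at most one document can ever exist
  } catch (err) {
    if (err.code !== 11000) throw err; // rethrow anything but the duplicate key error
  }
  const cutoff = moment().subtract(Config.messageSendLock.amount, Config.messageSendLock.unit);
  const message = await Message.findOneAndUpdate(
    {
      ...messageId,
      status: {$nin: ["disabled", "sent"]},
      $or: [{ lockedAt: {$exists: false} }, { lockedAt: {$lte: cutoff} }]
    },
    { lockedAt: moment().toDate() } // step 2: claim the lock atomically
  );
  if (!message) return; // sent, disabled, or locked by another worker
  await sendTheMessage(message); // hypothetical sender
  await Message.updateOne(messageId, { status: "sent" });
}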
Another approach: create a separate collection, say jobs.
Whenever you want to start processing a document, insert into jobs with _id equal to the document's _id.
When processing is done, remove the job.
Set the equivalent of lockedAt on the job so a stale lock can expire; a sketch follows.
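A minimal sketch of that approach, using the native driver (db is assumed to be a connected Db handle; names are illustrative):
// Sketch: use a dedicated `jobs` collection as a lock table.
// _id is unique by definition, so only one insert can ever succeed.
async function tryLock(docId) {
  try {
    await db.collection('jobs').insertOne({ _id: docId, lockedAt: new Date() });
    return true; // we hold the lock
  } catch (err) {
    if (err.code === 11000) return false; // another worker holds it
    throw err;
  }
}

async function unlock(docId) {
  await db.collection('jobs').deleteOne({ _id: docId });
}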

Semaphore equivalent in Node js , variable getting modified in concurrent request?

I have been facing this issue for the past week and I am just confused by it.
Keeping it short and simple to explain the problem.
We have an in-memory model which stores values like budget, etc. When a call is made to the API, it has a spend associated with it.
We then check the in-memory model, add the spend to the existing spend, and compare the result to the budget; if it exceeds the budget we do not accept any more clicks for that campaign. For each call we also update the DB, but that is an async operation.
A short example
api.get('/clk/:spent/:id', function(req, res) {
  checkbudget(spent, id);
});

function checkbudget(spent, id) {
  var obj = inMemoryModel[id];
  obj.spent += spent;
  if (obj.spent > obj.budget) {
    obj.status = 11; // 11 is the stopped status
  }
  // update db and rebuild model
}
This used to work fine, but now with concurrent requests we are getting false spends: the spend increases beyond the budget and only stops after some time. We simulated the calls with JMeter and confirmed this.
As far as we can tell, Node is async, so by the time the status is updated to 11 many requests have already updated the spend for the campaign.
How can I have semaphore-like logic in Node.js so that the spend stays in sync with the budget in the model?
Update:
db.addSpend(campaignId, spent, function(err, data) {
  camp.spent += spent;
  var totalSpent = (+camp.spent) + (+camp.cpb);
  if (totalSpent > camp.budget) {
    logger.info('Stopping it..');
    camp.status = 11; // in-memory stop
    var History = [];
    History.push(/* some data */);
    db.stopCamp(campId, function(err, data) {
      if (err) {
        logger.error('Error while stopping');
      }
      model.campMAP = buildCatMap(model);
      model.campKeyMap = buildKeyMap(model);
      db.campEventHistory(cpcHistory, false, function(err) {
        if (err) {
          logger.error(err);
        }
      });
    });
  }
});
That's the gist of the code. Can anyone help, please?
Q: Is there semaphore or equivalent in NodeJs?
A: No.
Q: Then how do NodeJs users deal with race condition?
A: In theory you shouldn't have to, as there is only a single thread in JavaScript.
Before going deeper into my proposed solution I think it is important for you to know how NodeJs works.
For NodeJs it is driven by an event based architecture. This means that in the Node process there is an event queue that contains all the "to-do" events.
When an event gets popped from the queue, Node will execute all of the required code until it is finished. Any async calls made during that run are spawned as other events and queued up in the event queue until a response comes back and it is time to run them again.
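A tiny illustration of that queueing behavior:
console.log('first');            // the current run executes to completion
setTimeout(function() {
  console.log('third');          // queued as a separate, later event
}, 0);
console.log('second');           // still part of the current run
// Output order: first, second, third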
Q: So what can I do to ensure that only 1 request can perform updates to the database at a time?
A: I believe there are many ways you can achieve this, but one of the easier ways out is to use the setTimeout API.
Example:
api.get('/clk/:spent/:id', function(req, res) {
  var data = {
    id: id,
    spending: spent
  };
  canProceed(data, /* function to exec after canProceed = */ checkbudget);
});

var canProceed = function(data, next) {
  var model = inMemoryModel[data.id];
  if (model.is_updating) {
    // Locked: try again in 1000 milliseconds
    setTimeout(function() { canProceed(data, next); }, 1000);
  }
  else {
    // Lock is released. Proceed.
    next(data.spending, data.id);
  }
};

function checkbudget(spent, id) {
  var obj = inMemoryModel[id];
  obj.is_updating = true; // Lock this model
  obj.spent += spent;
  if (obj.spent > obj.budget) {
    obj.status = 11; // 11 is the stopped status
  }
  // update db and rebuild model, then:
  obj.is_updating = false; // Unlock the model
}
Note: what I have here is pseudocode as well, so you may have to tweak it a bit.
The idea here is to have a flag in your model to indicate whether an HTTP request can proceed down the critical code path, in this case your checkbudget function and beyond.
When a request comes in, it checks the is_updating flag to see if it can proceed. If the flag is true, it schedules an event to be fired a second later; this setTimeout callback becomes an event and gets placed into Node's event queue for later processing.
When that event fires, the check runs again. This repeats until the is_updating flag becomes false, at which point the request sets the flag, does its work, and sets is_updating back to false when all the critical code is done.
It's not the most efficient way, but it gets the job done; you can always revisit the solution when performance becomes a problem.
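An alternative worth knowing (not from the answer above, just a sketch) is to serialize updates per campaign with a promise chain instead of polling, so each update simply waits for the previous one:
// Sketch: one promise chain per campaign id acts as a FIFO mutex.
var chains = {};

function withCampaignLock(id, task) {
  var prev = chains[id] || Promise.resolve();
  var next = prev.then(function() { return task(); });
  chains[id] = next.catch(function() {}); // keep the chain alive if a task fails
  return next;
}

// Usage: every spend update for a campaign runs strictly in order.
withCampaignLock(id, function() {
  return applySpend(id, spent); // applySpend is a hypothetical async update
});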

Firebase transactions bug

Consider the following:
function useCredits(userId, amount) {
  var userRef = firebase.database().ref().child('users').child(userId);
  userRef.transaction(function(user) {
    if (!user) {
      return user;
    }
    user.credits -= amount;
    return user;
  }, NOOP, false);
}

function notifyUser(userId, message) {
  var notificationId = Math.random();
  var userNotificationRef = firebase.database().ref().child('users').child(userId).child('notifications').child(notificationId);
  userNotificationRef.transaction(function(notification) {
    return message;
  }, NOOP, false);
}
These are called from the same Node.js process.
A user looks like this:
{
  "name": "Alex",
  "age": 22,
  "credits": 100,
  "notifications": {
    "1": "notification 1",
    "2": "notification 2"
  }
}
When I run my stress tests I notice that sometimes the user object passed to the userRef transaction update function is not the full user; it is only the following:
{
  "notifications": {
    "1": "notification 1",
    "2": "notification 2"
  }
}
This obviously causes an Error because user.credits does not exist.
Suspiciously, the user object passed to the userRef transaction's update function is the same as the data returned by the userNotificationRef transaction's update function.
Why is this the case? The problem goes away if I run both transactions on the user parent location, but that is a less optimal solution, as I am then effectively locking on and reading the whole user object, which is redundant when adding a write-once notification.
In my experience, you can't rely on the initial value passed into a transaction update function. Even if the data is populated in the datastore, the function might be called with null, a partial value, or a stale old value (in case of a local update in flight). This is not usually a problem as long as you take a defensive approach when writing the function (and you should!), since the bogus update will be refused and the transaction retried.
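In code, taking a defensive approach means writing the update function so that any base value, including null, produces a safe result. A sketch in the spirit of the question's useCredits, operating on the credits child directly (assumption: credits can be treated as a plain number):
// Sketch: never trust the base value; return a best guess and let the
// server-side compare-and-set refuse it and retry with the real data.
userRef.child('credits').transaction(function(credits) {
  return (credits || 0) - amount; // credits may be null, stale, or partial on the first run
});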
But beware: if you abort the transaction (by returning undefined) because the data doesn't make sense, then it's not checked against the server and won't get retried. For this reason, I recommend never aborting transactions. I built a monkey patch to apply this fix (and others) transparently; it's browser-only but could be adapted to Node trivially.
Another thing you can do to help a bit is to insert an on('value') call on the same ref just before the transaction and keep it alive until the transaction completes. This will usually cause the transaction to run on the correct data on the first try, doesn't affect bandwidth too much (since the current value would need to be transmitted anyway), and increases local latency a little if you have applyLocally set or defaulting to true. I do this in my NodeFire library, among many other optimizations and tweaks.
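The listener trick might look like this (a sketch; updateFunc stands for your transaction update function):
// Sketch: pin the ref's data in the local cache so the transaction's
// first run usually sees the real value instead of null.
function noop() {}
userRef.on('value', noop); // start caching the current value
userRef.transaction(updateFunc, function(error, committed, snapshot) {
  userRef.off('value', noop); // release the pin once the transaction completes
});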
On top of all the above, as of this writing there's still a bug in the SDK where very rarely the wrong base value will get "stuck" and the transaction retry continuously (failing with maxretry every so often) until you restart the process.
Good luck! I still use transactions in my server, where failures can be retried easily and I have multiple processes running, but have given up on using them on the client -- they're just too unreliable. In my opinion it's often better to redesign your data structures so that transactions aren't needed.

Firebase transactions coming back multiple times much later?

This is a fairly weird thing, and it's hard to reproduce. Not the best state of a bug report, I apologize.
I'm using .transaction() to write a value to a location in Firebase. Here's some pseudo-code:
var ref = firebase.child('/path/to/location');

var storeSafely = function(val) {
  ref.transaction(
    function updateFunc(currentData) {
      console.log('Attempting update: ' + JSON.stringify(val));
      if (currentData) return;
      return val;
    },
    function onTransactionCompleteFunc(err, isCommitted, snap) {
      if (err) {
        console.log('Error in onTransactionCompleteFunc: ' + JSON.stringify(err));
        return;
      }
      if (!isCommitted) {
        console.log('Not committed');
        return;
      }
      ref.onDisconnect().remove();
      doSomeStuff();
    });
};

var doSomeStuff = function() {
  // Things get done, time passes.
  console.log('Cleaning up');
  ref.onDisconnect().cancel();
  ref.set(
    null,
    function onSetCompleteFunc(err) {
      if (err) {
        console.log('Error in onSetCompleteFunc: ' + JSON.stringify(err));
      }
    });
};

storeSafely(1);
// later...
storeSafely(2);
// even later...
storeSafely(3);
I'm effectively using Firebase transactions as a sort of mutex lock:
1. Store a value at a location via a transaction.
2. Set the onDisconnect for the location to remove the value in case my app dies while working.
3. Do some stuff.
4. Remove the onDisconnect for the location, because I'm done with the stuff.
5. Remove the value at the location.
I do this every few minutes, and it all works great. Things get written and removed perfectly, and the logs show me creating the lock, doing stuff, and then releasing the lock.
The weird part is what happens hours later. Occasionally Firebase has maintenance, and my app gets a bunch of permission denied errors. At the same time this happens, I suddenly start getting a bunch of this output in the logs:
Attempting update 1
Attempting update 2
Attempting update 3
...in other words, it looks like the transactions never fully completed, and they're trying to retry now that the location can't be read any more. It's almost like there's a closure in the transaction() code that never completed, and it's getting re-executed now for some reason.
Am I missing something really important here about how to end a transaction?
(Note: I originally posted this to the Firebase Google Group, but was eventually reminded that code questions are supposed to go to Stack Overflow. I apologize for the cross-posting.)
Just a guess, but I wonder if your updateFunc() function is being called with null when your app gets the permission-denied errors from Firebase. (If so, I could believe that's part of their "Offline Writes" support.)
In any case, you should handle null as a possible state. Saving Transactional Data says:
transaction() will be called multiple times and must be able to handle null data. Even if there is existing data in your database it may not be locally cached when the transaction function is run.
I don't know the intricacies of Firebase's transaction mechanism, but I would try changing your .set(null) to set the value to 0 instead, changing your .remove() to also write the value with .set(0), and changing your line in updateFunc() to:
if (currentData === null || currentData) return;
Unfortunately, that assumes that '/path/to/location' is initially set to 0 at some point. If that's a problem, maybe you can muck around with null versus undefined. For example, it would be nice if Firebase used one of those for non-existent data and the other when it's offline.
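Put together, the 0-as-unlocked variant might look like this (a sketch of the suggestion above; it assumes the location is seeded with 0 once):
// Sketch: treat 0 as "unlocked" and null as "state unknown", so the
// transaction only takes the lock when it can see a real 0.
ref.transaction(function updateFunc(currentData) {
  if (currentData === null || currentData) return; // unknown or already locked: abort
  return val; // currentData === 0: take the lock
});
// ...and release by writing 0 back instead of removing the value:
ref.set(0);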

What happens with unhandled socket.io events?

Does socket.io ignore/drop them?
The reason I'm asking is the following.
There is a client with several states. Each state has its own set of socket handlers. At different moments the server notifies the client of a state change and after that sends several state-dependent messages.
But! It takes some time for the client to change state and to set the new handlers. During that window the client can miss some messages, because there are no handlers for them at that moment.
If I understand correctly, unhandled messages are lost to the client.
Maybe I'm missing the concept or doing something wrong... How do I handle this issue?
Unhandled messages are just ignored. It's just like when an event occurs and there are no event listeners for that event. The socket receives the message, doesn't find a handler for it, and nothing happens with it.
You could avoid missing messages by always having the handlers installed and then deciding in the handlers (based on other state) whether to do anything with the message or not.
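That could look something like this on the client (a sketch; the event name and state checks are illustrative):
// Sketch: install handlers once, and gate the work on the current state.
socket.on('jobUpdate', function(msg) {
  if (state !== 'viewingJobs') return; // not in the right state: ignore the message
  renderJobUpdate(msg); // hypothetical state-specific handling
});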
jfriend00's answer is a good one, and you are probably fine just leaving the handlers in place and using logic in the callback to ignore events as needed. If you really want to manage the unhandled packets though, read on...
You can get the list of callbacks from the socket internals, and use it to compare to the incoming message header. This client-side code will do just that.
// Save a copy of the onevent function
socket._onevent = socket.onevent;

// Replace the onevent function with a handler that captures all messages
socket.onevent = function (packet) {
  // Compare the list of callbacks to the incoming event name
  if( !Object.keys(socket._callbacks).map(x => x.substr(1)).includes(packet.data[0]) ) {
    console.log(`WARNING: Unhandled Event: ${packet.data}`);
  }
  socket._onevent.apply(socket, Array.prototype.slice.call(arguments));
};
The object socket._callbacks contains the callbacks, and its keys are the event names. They have a $ prepended to them, so you can trim that off the entire list by mapping substr(1) onto it. That results in a nice clean list of event names.
IMPORTANT NOTE: Normally you should not attempt to externally modify any object member starting with an underscore. Also, expect that any data in it is unstable. The underscore indicates it is for internal use in that object, class or function. Though this object is not stable, it should be up to date enough for us to use it, and we aren't modifying it directly.
The event name is stored in the first entry under packet.data. Just check to see if it is in the list, and raise the alarm if it is not. Now when you send an event from the server that the client does not know, it will be noted in the browser console.
Now you need to save the unhandled messages in a buffer, to play back once the handlers are available again. So to expand on our client-side code from before...
// Save a copy of the onevent function
socket._onevent = socket.onevent;

// Make a buffer and configure buffer timings
socket._packetBuffer = [];
socket._packetBufferWaitTime = 1000; // in milliseconds
socket._packetBufferPopDelay = 50;   // in milliseconds

function isPacketUnhandled(packet) {
  return !Object.keys(socket._callbacks).map(x => x.substr(1)).includes(packet.data[0]);
}

// Define the function that will process the buffer
socket._packetBufferHandler = function(packet) {
  if( isPacketUnhandled(packet) ) {
    // The packet can't be processed yet; restart the wait cycle
    socket._packetBuffer.push(packet);
    console.log(`packet handling not completed, retrying`);
    setTimeout(socket._packetBufferHandler, socket._packetBufferWaitTime, socket._packetBuffer.pop());
  }
  else {
    // The packet can be processed now; start going through the buffer
    socket._onevent.apply(socket, Array.prototype.slice.call(arguments));
    if(socket._packetBuffer.length > 0) {
      setTimeout(socket._packetBufferHandler, socket._packetBufferPopDelay, socket._packetBuffer.pop());
    }
    else {
      console.log(`all packets in buffer processed`);
      socket._packetsWaiting = false;
    }
  }
};

// Replace the onevent function with a handler that captures all messages
socket.onevent = function (packet) {
  // Compare the list of callbacks to the incoming event name
  if( isPacketUnhandled(packet) ) {
    console.log(`WARNING: Unhandled Event: ${packet.data}`);
    socket._packetBuffer.push(packet);
    if(!socket._packetsWaiting) {
      socket._packetsWaiting = true;
      setTimeout(socket._packetBufferHandler, socket._packetBufferWaitTime, socket._packetBuffer.pop());
    }
  }
  socket._onevent.apply(socket, Array.prototype.slice.call(arguments));
};
Here the unhandled packets get pushed into the buffer and a timer is set running. Once the given amount of time has passed, it starts checking whether the handlers for each item are ready. Each one is handled until all are exhausted or a handler is missing, which triggers another wait.
This can and will stack up unhandled calls until you blow out the client's allotted memory, so make sure that those handlers DO get loaded in a reasonable time span. And take care not to send it anything that will never get handled, because it will keep trying forever.
I tested it with really long strings and it was able to push them through, so what they are calling 'packet' is probably not a standard packet.
Tested with SocketIO version 2.2.0 on Chrome.
