Consider the following:
function useCredits(userId, amount) {
  var userRef = firebase.database().ref().child('users').child(userId);
  userRef.transaction(function(user) {
    if (!user) {
      return user;
    }
    user.credits -= amount;
    return user;
  }, NOOP, false);
}

function notifyUser(userId, message) {
  var notificationId = Math.random();
  var userNotificationRef = firebase.database().ref().child('users').child(userId)
    .child('notifications').child(notificationId);
  userNotificationRef.transaction(function(notification) {
    return message;
  }, NOOP, false);
}
These are called from the same Node.js process.
A user looks like this:
{
  "name": "Alex",
  "age": 22,
  "credits": 100,
  "notifications": {
    "1": "notification 1",
    "2": "notification 2"
  }
}
When I run my stress tests, I notice that sometimes the user object passed to the userRef transaction's update function is not the full user; it is only the following:
{
  "notifications": {
    "1": "notification 1",
    "2": "notification 2"
  }
}
This obviously causes an error because user.credits does not exist.
It is suspicious that the user object passed to the update function of the userRef transaction is the same as the data returned by the userNotificationRef transaction's update function.
Why is this the case? This problem goes away if I run both transactions on the user parent location, but this is a less optimal solution, as I am then effectively locking on and reading the whole user object, which is redundant when adding a write-once notification.
In my experience, you can't rely on the initial value passed into a transaction update function. Even if the data is populated in the datastore, the function might be called with null, a partial value, or a stale old value (in case of a local update in flight). This is not usually a problem as long as you take a defensive approach when writing the function (and you should!), since the bogus update will be refused and the transaction retried.
But beware: if you abort the transaction (by returning undefined) because the data doesn't make sense, then it's not checked against the server and won't get retried. For this reason, I recommend never aborting transactions. I built a monkey patch to apply this fix (and others) transparently; it's browser-only but could be adapted to Node trivially.
Another thing you can do to help a bit is to insert an on('value') call on the same ref just before the transaction and keep it alive until the transaction completes. This will usually cause the transaction to run on the correct data on the first try, doesn't affect bandwidth too much (since the current value would need to be transmitted anyway), and increases local latency a little if you have applyLocally set or defaulting to true. I do this in my NodeFire library, among many other optimizations and tweaks.
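To illustrate, here's a rough sketch of that pattern applied to the question's useCredits() (this is just the general idea, not the exact NodeFire implementation):
function useCredits(userId, amount) {
  var userRef = firebase.database().ref().child('users').child(userId);

  // Keep the location synced while the transaction runs so the update
  // function is more likely to see the real current value on its first try.
  var keepSynced = function() {};
  userRef.on('value', keepSynced);

  userRef.transaction(function(user) {
    if (!user) {
      // Defensive: return the (possibly bogus) value unchanged instead of
      // aborting, so the server rejects the write and the transaction
      // retries with the real data.
      return user;
    }
    user.credits -= amount;
    return user;
  }, function(error, committed, snapshot) {
    userRef.off('value', keepSynced); // drop the listener once the transaction settles
  }, false);
}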
On top of all the above, as of this writing there's still a bug in the SDK where very rarely the wrong base value will get "stuck" and the transaction retries continuously (failing with maxretry every so often) until you restart the process.
Good luck! I still use transactions in my server, where failures can be retried easily and I have multiple processes running, but have given up on using them on the client -- they're just too unreliable. In my opinion it's often better to redesign your data structures so that transactions aren't needed.
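As one concrete illustration of that last point: the notifyUser() in the question doesn't need a transaction at all. push() generates a unique key, so plain writes can't collide (a sketch against the same SDK as the question):
function notifyUser(userId, message) {
  var notificationsRef = firebase.database()
    .ref().child('users').child(userId).child('notifications');
  // push() creates a unique, chronologically ordered key, so there is no
  // lock and no transaction needed for a write-once notification.
  notificationsRef.push(message);
}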
Related
I have an SPA and for technical reasons I have different elements potentially firing the same fetch() call pretty much at the same time.[1]
Rather than going insane trying to orchestrate loading across multiple unrelated elements, I am thinking about creating a globalFetch() call where:
the init argument is serialised (along with the resource parameter) and used as hash
when a request is made, it's queued and its hash is stored
when another request comes, and the hash matches (which means it's in-flight), another request will NOT be made, and it will piggyback on the previous one
async function globalFetch(resource, init) {
  const sigObject = { ...init, resource }
  const sig = JSON.stringify(sigObject)

  // If it's already happening, return that one
  if (globalFetch.inFlight[sig]) {
    // NOTE: I know I don't yet have sig.timeStamp, this is just to show
    // the logic
    if (Date.now - sig.timeStamp < 1000 * 5) {
      return globalFetch.inFlight[sig]
    } else {
      delete globalFetch.inFlight[sig]
    }
  }

  const ret = globalFetch.inFlight[sig] = fetch(resource, init)
  return ret
}
globalFetch.inFlight = {}
It's obviously missing a way to have the requests' timestamps. Plus, it's missing a way to delete old requests in batch. Other than that... is this a good way to go about it?
Or, is there something already out there, and I am reinventing the wheel...?
[1] If you are curious, I have several location-aware elements which will reload data independently based on the URL. It's all nice and decoupled, except that it's a little... too decoupled. Nested elements (with partially matching URLs) needing the same data potentially end up making the same request at the same time.
Your concept will generally work just fine.
Some things missing from your implementation:
Failed responses should either not be cached in the first place or removed from the cache when you see the failure. And failure is not just rejected promises, but also any request that doesn't return an appropriate success status (probably a 2xx status).
JSON.stringify(sigObject) is not a canonical representation of the exact same data because properties might not be stringified in the same order depending upon how the sigObject was built. If you grabbed the properties, sort them and inserted them in sorted order onto a temporary object and then stringified that, it would be more canonical.
I'd recommend using a Map object instead of a regular object for globalFetch.inFlight because it's more efficient when you're adding/removing items regularly and will never have any name collision with property names or methods (though your hash would probably not conflict anyway, but it's still a better practice to use a Map object for this kind of thing).
Items should be aged from the cache (as you apparently know already). You can just use a setInterval() that runs every so often (it doesn't have to run very often - perhaps every 30 minutes) that just iterates through all the items in the cache and removes any that are older than some amount of time. Since you're already checking the time when you find one, you don't have to clean the cache very often - you're just trying to prevent non-stop build-up of stale data that isn't going to be re-requested - so it isn't getting automatically replaced with newer data and isn't being used from the cache.
If you have any case insensitive properties or values in the request parameters or the URL, the current design would see different case as different requests. Not sure if that matters in your situation or not or if it's worth doing anything about it.
When you write the real code, you need Date.now(), not Date.now.
Here's a sample implementation that implements all of the above (except for case sensitivity because that's data-specific):
function makeHash(url, obj) {
  // put properties in sorted order to make the hash canonical
  // the canonical sort is top level only,
  // does not sort properties in nested objects
  let items = Object.entries(obj).sort((a, b) => b[0].localeCompare(a[0]));
  // add URL on the front
  items.unshift(url);
  return JSON.stringify(items);
}

async function globalFetch(resource, init = {}) {
  const key = makeHash(resource, init);

  const now = Date.now();
  const expirationDuration = 5 * 1000;
  const newExpiration = now + expirationDuration;

  const cachedItem = globalFetch.cache.get(key);
  // if we found an item and it expires in the future (not expired yet)
  if (cachedItem && cachedItem.expires >= now) {
    // update expiration time
    cachedItem.expires = newExpiration;
    return cachedItem.promise;
  }

  // couldn't use a value from the cache
  // make the request
  let p = fetch(resource, init);
  p.then(response => {
    if (!response.ok) {
      // if response not OK, remove it from the cache
      globalFetch.cache.delete(key);
    }
  }, err => {
    // if promise rejected, remove it from the cache
    globalFetch.cache.delete(key);
  });
  // save this promise (will replace any expired value already in the cache)
  globalFetch.cache.set(key, { promise: p, expires: newExpiration });
  return p;
}

// initialize cache
globalFetch.cache = new Map();

// clean up interval timer to remove expired entries
// does not need to run that often because .expires is already checked above
// this just cleans out old expired entries to avoid memory increasing
// indefinitely
globalFetch.interval = setInterval(() => {
  const now = Date.now();
  for (const [key, value] of globalFetch.cache) {
    if (value.expires < now) {
      globalFetch.cache.delete(key);
    }
  }
}, 10 * 60 * 1000); // run every 10 minutes
Implementation Notes:
Depending upon your situation, you may want to customize the cleanup interval time. This is set to run a cleanup pass every 10 minutes just to keep it from growing unbounded. If you were making millions of requests, you'd probably run that interval more often or cap the number of items in the cache. If you aren't making that many requests, this can be less frequent. It is just to clean up old expired entries sometime so they don't accumulate forever if never re-requested. The check for the expiration time in the main function already keeps it from using expired entries - that's why this doesn't have to run very often.
This looks at response.ok from the fetch() result and promise rejection to determine a failed request. There could be some situations where you want to customize what is and isn't a failed request with some different criteria than that. For example, it might be useful to cache a 404 to prevent repeating it within the expiration time if you don't think the 404 is likely to be transitory. This really depends upon your specific use of the responses and behavior of the specific host you are targeting. The reason not to cache failed results is for cases where the failure is transitory (either a temporary hiccup or a timing issue) and you want a new, clean request to go out if the previous one failed.
There is a design question for whether you should or should not update the .expires property in the cache when you get a cache hit. If you do update it (like this code does), then an item could stay in the cache a long time if it keeps getting requested over and over before it expires. But, if you really want it to only be cached for a maximum amount of time and then force a new request, you can just remove the update of the expiration time and let the original result expire. I can see arguments for either design depending upon the specifics of your situation. If this is largely invariant data, then you can just let it stay in the cache as long as it keeps getting requested. If it is data that can change regularly, then you may want it to be cached no more than the expiration time, even if it's being requested regularly.
Consider using a ServiceWorker or Workbox to separate caching logic from your application. The Stale-While-Revalidate strategy could apply here.
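If you go that route, a minimal Workbox sketch of the Stale-While-Revalidate strategy might look like this (the '/api/' route pattern and cache name are assumptions, not something from your question):
// service-worker.js (Workbox v6-style module imports)
import { registerRoute } from 'workbox-routing';
import { StaleWhileRevalidate } from 'workbox-strategies';

// Answer matching requests from the cache immediately, then refresh
// the cached copy in the background for next time.
registerRoute(
  ({ url }) => url.pathname.startsWith('/api/'),
  new StaleWhileRevalidate({ cacheName: 'api-cache' })
);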
I have a wasm process (compiled from c++) that processes data inside a web application. Let's say the necessary code looks like this:
std::vector<JSONObject> data;
for (size_t i = 0; i < data.size(); i++)
{
  process_data(data[i]);
  if (i % 1000 == 0) {
    bool is_cancelled = check_if_cancelled();
    if (is_cancelled) {
      break;
    }
  }
}
This code basically "runs/processes a query" similar to a SQL query interface.
However, queries may take several minutes to run/process, and at any given time the user may cancel their query. The cancellation would occur in the normal JavaScript/web application, outside of the Web Worker running the wasm.
My question then is what would be an example of how we could know that the user has clicked the 'cancel' button and communicate it to the wasm process so that it knows the process has been cancelled and can exit? Using worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill it (it needs to stay alive with its stored data, so another query can be run...).
What would be an example way to communicate here between the javascript and worker/wasm/c++ application so that we can know when to exit, and how to do it properly?
Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.
Update: I think there are the following possible solutions here based on some research (and the initial answers/comments below) with some feedback on them:
Use two workers, with one worker storing the data and another worker processing the data. In this way the processing-worker can be terminated, and the data will always remain. Feasible? Not really, as it would take way too much time to copy over ~ 500MB of data to the webworker whenever it starts. This could have been done (previously) using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...
Use a single worker using Emterpreter and using emscripten_sleep_with_yield. Feasible? No, destroys performance when using Emterpreter (mentioned in the docs above), and slows down all queries by about 4-6x.
Always run a second worker and in the UI just display the most recent. Feasible? No, would probably run into quite a few OOM errors if it's not a shared data structure and the data size is 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).
Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to long-poll with network requests every second from every running query.
Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but it would also require a tremendous amount of re-writing of the parsing functions so that every function supports this (the actual data parsing is handled in several functions -- filter, search, calculate, group by, sort, etc.).
Use IndexedDB and store the state in javascript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read database there and fill the pointer. Then process your data in C++. Feasible? Not sure, though this seems like the best solution if it can be implemented.
[Anything else?]
In the bounty then I was wondering three things:
Do the above six analyses seem generally valid?
Are there other (perhaps better) approaches I'm missing?
Would anyone be able to show a very basic example of doing #6 -- seems like that would be the best solution if it's possible and works cross-browser.
For Chrome (only) you may use shared memory (a SharedArrayBuffer as the module's memory) and raise a flag in that memory when you want to halt. I'm not a big fan of this solution (it's complex and supported only in Chrome). It also depends on how your query works, and whether there are places where the lengthy query can check the flag.
Instead you should probably call the C++ function multiple times (e.g. for each query) and check if you should halt after each call (just send a message to the worker to halt).
What I mean by multiple times is to make the query in stages (multiple function calls for a single query). It may not be applicable in your case.
Regardless, AFAIK there is no way to send a signal to a WebAssembly execution (e.g. Linux kill). Therefore, you'll have to wait for the operation to finish in order to complete the cancellation.
I'm attaching a code snippet that may explain this idea.
worker.js:
... init webassembly
onmessage = function(q) {
  // query received from main thread.
  const result = ... call webassembly(q);
  postMessage(result);
}
main.js:
const worker = new Worker("worker.js");
let cancel = false;
let processing = false;

worker.onmessage = function(r) {
  // when the worker has finished processing the query.
  // r is the result of the processing.
  processing = false;
  if (cancel === true) {
    // processing is done, but the result is not required.
    // instead of showing the results, update that the query was canceled.
    cancel = false;
    ... update UI "canceled".
    return;
  }
  ... update UI "results r".
};

function onCancel() {
  // Occurs when the user clicks on the cancel button.
  if (cancel) {
    // sanity test - prevent this in UI.
    throw "already cancelling";
  }
  cancel = true;
  ... update UI "canceling".
}

function onQuery(q) {
  if (processing === true) {
    // sanity test - prevent this in UI.
    throw "already processing";
  }
  processing = true;
  // Send the query to the worker.
  // When the worker receives the message it will process the query via webassembly.
  worker.postMessage(q);
}
An idea from a user-experience perspective:
You may create two workers. This will take twice the memory, but will allow you to "cancel" "immediately" once (it will just mean that behind the scenes the 2nd worker runs the next query, and when the 1st finishes the cancelled one, cancellation will again become immediate).
Shared Thread
Since the worker and the C++ function that it called share the same thread, the worker will also be blocked until the C++ loop is finished, and won't be able to handle any incoming messages. I think a solid option would be to minimize the amount of time that the thread is blocked by instead initiating one iteration at a time from the main application.
It would look something like this.
main.js -> worker.js -> C++ function -> worker.js -> main.js
Breaking up the Loop
Below, C++ has a variable initialized at 0, which will be incremented at each loop iteration and stored in memory.
The C++ function then performs one iteration of the loop, increments the variable to keep track of the loop position, and immediately breaks.
int x = 0; // counter, initialized at 0 (kept in memory so it persists between calls)
std::vector<JSONObject> data;
for (size_t i = x; i < data.size(); i++)
{
  process_data(data[i]);
  x++;   // increment counter
  break; // stop function until told to iterate again starting at x
}
Then you should be able to post a message to the web worker, which then sends a message to main.js that the thread is no longer blocked.
Canceling the Operation
From this point, main.js knows that the web worker thread is no longer blocked, and can decide whether or not to tell the web worker to execute the C++ function again (with the C++ variable keeping track of the loop increment in memory.)
let continueOperation = true
// here you can set it to false at any time since the thread is not blocked here
worker.expensiveThreadBlockingFunction()
// results in one iteration of the loop being run until a message is received below
worker.onmessage = function(e) {
  if (continueOperation) {
    worker.expensiveThreadBlockingFunction()
    // execute worker function again, ultimately continuing the increment in C++
  } else {
    return false
    // or send message to worker to reset C++ counter to prepare for next execution
  }
}
Continuing the Operation
Assuming all is well, and the user has not cancelled the operation, the loop should continue until finished. Keep in mind you should also send a distinct message for whether the loop has completed, or needs to continue, so you don't keep blocking the worker thread.
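A rough sketch of what the worker side of that message protocol could look like (the exported function names _process_one_chunk and _reset_counter are hypothetical, not from your code):
// worker.js -- assumes the wasm module has already been initialized as Module
onmessage = function(e) {
  if (e.data.cmd === 'step') {
    // Run one iteration of the C++ loop; the (hypothetical) export returns 1
    // once the whole data set has been processed.
    var finished = Module._process_one_chunk();
    postMessage({ done: finished === 1 });
  } else if (e.data.cmd === 'reset') {
    // Query was cancelled: reset the C++ counter so the next query starts at 0.
    Module._reset_counter();
    postMessage({ reset: true });
  }
};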
See jsbin.com/ceyiqi/edit?html,console,output for a verifiable example.
I have a reference listening to a database point
jobs/<key>/list
where <key> is the team's unique number
Under this entry point is a list of jobs
I have a listener on this point with
this.jobsRef.orderByChild('archived')
.equalTo(false)
.on('child_added', function(data) {
I also have a method that does the following transaction:
ref.transaction(function(post) {
  // correct the counter
  if (post)
  {
    // console.log(post);
    if (active)
    {
      // if toggling on
    }
    else
    {
      // if toggling off
    }
  }
  return post;
})
When invoking the transaction, the child_added callback is also invoked again, giving me duplicate jobs.
Is this expected behavior?
Should I simply check whether the item has already been received before adding it to the array?
Or am I doing something wrong?
Thanks in advance for your time
You're hitting an interesting edge case in how the Firebase client handles transactions. Your transaction function is running twice, first on null and then on the actual data. This is easy to see if you listen for value events on /jobs, since then you'll see:
initial value
null (when transaction starts and runs on null)
initial value again (when transaction runs again on real data)
The null value in step 2 is the client's initial guess for the current value. Since the client doesn't have cached data for /jobs (it ignores the cached data for /jobs/list), it guesses null. It is clearly wrong, but it has unfortunately been like this for a long time, so it's unlikely to change in the current major version of the SDK.
And because of the null in step 2, you'll get child_removed events (which you're not handling right now) and then in step 3 you'll get child_added events to re-add them.
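For completeness, handling both events might look roughly like this (assuming you keep the jobs in a local map keyed by the snapshot key; your actual bookkeeping may differ):
var jobs = {};
var query = this.jobsRef.orderByChild('archived').equalTo(false);

query.on('child_added', function(snapshot) {
  jobs[snapshot.key] = snapshot.val(); // overwrite rather than duplicate
});
query.on('child_removed', function(snapshot) {
  delete jobs[snapshot.key]; // keeps the local list consistent during the transaction's null run
});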
If you handled the child_removed events, your items wouldn't end up duplicated, but they would still disappear / reappear, which probably isn't desirable. Another workaround in the current setup is to explicitly tell the transaction to not run with the local estimate, which you can do by passing in false as the third parameter:
function transact() {
  var path = 'jobs/';
  var ref = firebase.database().ref(path);
  ref.transaction(function(post) {
    return post;
  }, function(error, committed, snapshot) {
    if (error) {
      console.log('Transaction failed abnormally!', error);
    } else if (!committed) {
      console.log('Transaction aborted.');
    } else {
      console.log('Transaction completed.');
    }
  }, false);
}
I'm sorry I don't have a better solution. But as I said: it's very unlikely that we'll be able to change this behavior in the current generation of SDKs.
I am working on a socket.io IRC and I don't want users to have a long username. I wrote the following (mocha) test to verify that the server doesn't send out a response to every connected socket when a longer username is provided:
it("should not accept usernames longer than 15 chars", function (done) {
var username = "a".repeat(server.getMaxUsernameLength() + 1);
client1.emit("username change", username);
client2.on("chat message", function (data) {
throw Error("Fail, server did send a response.");
});
setTimeout(function () {
done();
}, 50);
});
This currently does work, but it's far from optimal. What if my CI platform is slower or the server does respond after more than 50 ms? What's the best way to fail a test when a response is given, or should I structure my tests differently?
Thanks!
P.s. This question is different from Testing asynchronous function with mocha, because while the problem does have to do with asynchronous testing, I am aware of the done() method (and I'm using it obviously).
What you're trying to do is verify that the callback to client2.on("chat message"... is never called. Testing for negative cases can be tough, and your case seems to be exacerbated by the fact that you're trying to do a complete end-to-end (client-to-server-to-client) integration test. Personally, I would try to test this in a unit case suite and avoid introducing the complexity of asynchronicity to the test.
However, if it must be done, here's a tip from Eradicating Non-Determinism in Tests:
This is the trickiest case since you can test for your expected response, but there's nothing to do to detect a failure other than timing-out. If the provider is something you're building you can handle this by ensuring the provider implements some way of indicating that it's done - essentially some form of callback. Even if only the testing code uses it, it's worth it - although often you'll find this kind of functionality is valuable for other purposes too.
Your server should send some sort of notice to client1 that it's going to ignore the name change, even if you aren't testing, but since you are you could use such a notification to verify that it really didn't send a notification to the other client. So something like:
it("should not accept usernames longer than 15 chars", function (done) {
var chatSpy = sinon.spy();
client2.on("chat message", chatSpy);
client1.on('error', function(err) {
assertEquals(err.msg, 'Username too long');
assert(chatSpy.neverCalledWith(...));
done();
});
var username = "a".repeat(server.getMaxUsernameLength() + 1);
client1.emit("username change", username);
});
would be suitable.
Also, if for whatever reason server.getMaxUsernameLength() ever starts returning something other than 15, the best-case scenario is that your test description becomes wrong. It can become worse if getMaxUsernameLength and the server code for handling the name-change event don't get their values from the same place. A test probably should not rely on the system under test to provide test values.
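For example (a sketch reusing the client1/client2/sinon setup from the snippet above, with the limit hard-coded so it matches the test description):
// The limit the test asserts about no longer comes from the system under test.
var MAX_USERNAME_LENGTH = 15;

it("should not accept usernames longer than 15 chars", function (done) {
  var chatSpy = sinon.spy();
  client2.on("chat message", chatSpy);
  client1.on('error', function (err) {
    assert(chatSpy.notCalled); // no broadcast went out to the other client
    done();
  });
  client1.emit("username change", "a".repeat(MAX_USERNAME_LENGTH + 1));
});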
This is a fairly weird thing, and it's hard to reproduce. Not the best state of a bug report, I apologize.
I'm using .transaction() to write a value to a location in Firebase. Here's some pseudo-code:
var ref = firebase.child('/path/to/location');

var storeSafely = function(val) {
  ref.transaction(
    function updateFunc(currentData) {
      console.log('Attempting update: ' + JSON.stringify(val));
      if (currentData) return;
      return val;
    },
    function onTransactionCompleteFunc(err, isCommitted, snap) {
      if (err) {
        console.log('Error in onTransactionCompleteFunc: ' + JSON.stringify(err));
        return;
      }
      if (! isCommitted) {
        console.log('Not committed');
        return;
      }
      ref.onDisconnect().remove();
      doSomeStuff();
    });
};

var doSomeStuff = function() {
  // Things get done, time passes.
  console.log('Cleaning up');
  ref.onDisconnect().cancel();
  ref.set(
    null,
    function onSetCompleteFunc(err) {
      if (err) {
        console.log('Error in onSetCompleteFunc: ' + JSON.stringify(err));
      }
    });
};

storeSafely(1);
// later...
storeSafely(2);
// even later...
storeSafely(3);
I'm effectively using Firebase transactions as a sort of mutex lock:
Store a value at a location via transaction.
Set the onDisconnect for the location to remove the value in case my app dies while working.
Do some stuff.
Remove the onDisconnect for the location, because I'm done with the stuff.
Remove the value at the location.
I do this every few minutes, and it all works great. Things get written and removed perfectly, and the logs show me creating the lock, doing stuff, and then releasing the lock.
The weird part is what happens hours later. Occasionally Firebase has maintenance, and my app gets a bunch of permission denied errors. At the same time this happens, I suddenly start getting a bunch of this output in the logs:
Attempting update 1
Attempting update 2
Attempting update 3
...in other words, it looks like the transactions never fully completed, and they're trying to retry now that the location can't be read any more. It's almost like there's a closure in the transaction() code that never completed, and it's getting re-executed now for some reason.
Am I missing something really important here about how to end a transaction?
(Note: I originally posted this to the Firebase Google Group, but was eventually reminded that code questions are supposed to go to Stack Overflow. I apologize for the cross-posting.)
Just a guess, but I wonder if your updateFunc() function is being called with null when your app gets the permission-denied errors from Firebase. (If so, I could believe that's part of their "Offline Writes" support.)
In any case, you should handle null as a possible state. Saving Transactional Data says:
transaction() will be called multiple times and must be able to handle
null data. Even if there is existing data in your database it may not
be locally cached when the transaction function is run.
I don't know the intricacies of Firebase's transaction mechanism, but I would try changing your .set(null) to set the value to 0 instead, change your .remove() to also set the value with .set(0), and change your line in updateFunc() to:
if (currentData === null || currentData) return;
Unfortunately, that assumes that '/path/to/location' is initially set to 0 at some point. If that's a problem, maybe you can muck around with null versus undefined. For example, it would be nice if Firebase used one of those for non-existent data and another when it's offline.
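Putting those suggestions together, the 0-as-"unlocked" variant of the original pseudo-code would look roughly like this (still a sketch, and it assumes the location gets seeded with 0 at some point):
var storeSafely = function(val) {
  ref.transaction(
    function updateFunc(currentData) {
      // 0 means "lock free"; null means "unknown / possibly offline";
      // anything truthy means another writer already holds the lock.
      if (currentData === null || currentData) return; // abort
      return val;
    },
    function onTransactionCompleteFunc(err, isCommitted, snap) {
      if (err || !isCommitted) return;
      ref.onDisconnect().set(0); // release the lock if the app dies mid-work
      doSomeStuff();
    });
};

var doSomeStuff = function() {
  // Things get done, time passes.
  ref.onDisconnect().cancel();
  ref.set(0); // back to "lock free" instead of removing the value
};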