Database cluster - asynchronous tasks

Database cluster - asynchronous tasks - javascript

Let's say I have a couchDB database called "products" and a frontend with a form.
Now if a user opens a document from this database in the form I want to prevent other user from editing this specific document.
Usually pretty simple:
-> read document from couchDB
-> set a variable to true like: { edit : true }
-> save (merge) document to couchDB
-> if someone else tries to open the document he will receive an error, becaus of edit:true.
BUT, what if two user open the document at the exact same time?
The function would be called twice and when the second one opens the document he would falsely receive an edit:false because the first didn't had enough time to save his edit:true. So how to prevent this behaviour?
First solution would be:
Build an array as a cue for database requests and dont allow parallel requests, so all requests would be worked off one after another. But in my opinion this is a bad solution because the system would be incredible slow at some point.
Second solution:
Store the documentIDs of the currently edited documents in an local array in the script. This would work because this is no asynchronous process and the second user would receive his error immediately.
So far so good, BUT, what if some day there are too many user and this system should run in a cluster (the node client server, not the database) - now the second solution would not work anymore because every cluster slave would have its own array of documentIDs. Sharing there would end in another asynchronous task and result in the same problem above.
Now i'm out of ideas, how do big clustered systems usually handle problems like that?

CouchDB uses MVCC to maintain consistency in your database. When a document is being updated, you must supply both the ID (_id) and revision number (_rev) otherwise your change will be rejected.
This means that if 2 clients read the document at revision 1 and both attempt to write a change using that same revision number, only the first will be accepted by the database. The 2nd client will receive an error, and it should fetch the latest revision of the document in order to proceed.
In a single-node environment, this model prevents conflicts outright. However, in cases where replication is occurring, it is still possible to get conflicts, even when using MVCC. This is because conflicting revisions can technically be written to different nodes before they have been replicated to one another. In this case, CouchDB will record the conflict and your application is responsible to resolve them.
CouchDB has stellar documentation, in particular they have an article all about conflicts and replication that I highly recommend for this subject.

Related

How can I check if there is an instance running for a certain block of code in Cloud Functions for Firebase?

I would like to know if it is possible to detect that a thread is already running a Cloud Functions, and if possible to also detect if it is running on a particular ID's data. I think I could have a variable stored in firebase memory of the ID in Firebase Database that the function is being run on from the Database, and to remove the variable when the function is done running,but the concern is of two writes to the database happening subsequently and very rapidly, causing the initial thread to not be able to write to memory fast enough before the second thread checks if the variable is there, especially on a cold start from the firebase thread - which in my understanding is a variable amount of time in which either thread could potentially spin up first.
My use case is this:
Let's say a write to the realtime database happens from the client side that causes a trigger for Cloud Functions to run a handler. This handlers job is to loop through and do work with the snapshot of records that was just written to by the client, and using a loop will parse each record in the snapshot, and when it is done, delete them. The handler works great until another record is written to the same group of records in the database before the handler's job is done, which causes a second handler thread to spin up, and start moving through the records in the same group of records, which would cause records to be iterated over twice, and possibly the data to be handled twice.
I have other solutions for my particular case which I can use instead, but it involves just allowing each record to trigger a separate thread like normal.
Thanks in advance!

There is no way to track running instances "in-memory" for Cloud Functions, as each function invocation may be running in entirely different virtual infra. Instead, what you'd most likely want to do here is have some kind of lock persisted in e.g. the Firebase Realtime Database using a transaction. So you'd do something like:
When the function invocation starts, generate a random "worker ID".
Run a transaction to check a DB path derived from the file you're processing. If it's empty, or populated with a timestamp that is older than a function timeout, write your worker ID and the current timestamp to the location. If it's not empty or the timestamp is fresh, exit your function immediately because there's already an active worker.
Do your file processing.
Run another transaction that deletes the lock from the DB if the worker ID in the DB still matches your worker ID.
This will prevent two functions from processing the same file at the same time. It will mean, however, that any functions that execute while a path is locked will be discarded (which may or may not be what you want).

Can Firebase transform data server-side before writing it?

According to this documentation, and this accompanying example, Firebase tends to follow the following flow when transforming newly written data:
Client writes data to Firebase, which is immediately accepted
The supplied Cloud Function is triggered, which transforms the data (in the example above, it removes swear words)
The transformed data is written again, overwriting the original data written in step 1
Maybe I'm missing something here, but this flow seems to present some problems. For example, if there is an error in step 2 above, and step 3 is never fired, the un-transformed data will just linger in the database. It seems like it would be better to transform the data as soon as it hits the server, but before writing. This would be followed by a single write operation, which will leave no loose artifacts behind if it fails. Is there any way in the current Firebase + Google Cloud Functions stack to add these types of pre-write data transforms?
My (tentative and weird) solution so far is to have a "shadow" /_temp/{endpoint} area in my Firebase db, so that when I want to write to /{endpoint}, I write there instead, which then triggers the relevant cloud function to do the transformation before writing to /{endpoint}. This at least prevents potentially incomplete data from leaking into my database, but it seems very inelegant and "hacky."
I'd also be interested to know if there are any server-side methods for transforming data before responding to read requests.

There is no hook in the Firebase Database (neither through Cloud Functions nor elsewhere) that allows you to modify values before they're written to the database. The temporary queue is the idiomatic way to address this use-case. It functions pretty similar to a moderator queue in most forum software.
You could use a HTTP Function to create an endpoint that your code calls and then perform the transformation there. You could use a similar pattern for reading data, although you'd have to rebuild the realtime synchronization capabilities of Firebase yourself.

Slow app with LocalStorage and AngularJS

I created an app that stores, compares, filters and takes statistics out of a collection of records. I've done it so it works offline, as in some user cases the user might not have constant (or at all) access to internet.
My problem is that after I've included ~60 records, the app starts to behave really slow. For instance, I list a collection of simple objects from LocalStorage into a ng-model (Select list), and after those ~60 records are in, to open the Select box will be seriously slowed down.
What could the problem be? I'm thinking, either some function is sucking more resources than necessary, or LocalStorage is not intended for such uses?
I'm starting to get into PouchDB, would you say that migrating all to Pouch instead of LocalStorage would be a good move?
I can't paste the whole controller here as it's huge, but I've put an online version for testing. You can see it here.
For you not to have to create 60 records just to see the effect, you can download this CSV and import it in the app.
In order to import, the pass for Edit Mode is: admin
Let's see if someone has a tip for this one!

I see you are storing all your records inside a single LocalStorage value (with the key being recordspax). So yeah, that will get quite slow, because your app has to 1) JSON parse/stringify and 2) store/retrieve the entire list every time you read/write data to the database.
Basically you are reading your entire database in and out of disk for every operation. Since both LocalStorage and JSON stringify/parse happen synchronously on the main thread, it can block DOM rendering and will thus slow down your app.
PouchDB could be a help here, but you could also benefit from something simpler like LocalForage, or simply changing your DB design so that every record has its own key/value rather than storing everything into a single key with a single value.
(Both LocalForage and PouchDB use IndexedDB/WebSQL rather than LocalStorage, meaning that database operations are not synchronous and do not block the DOM. However, you still don't want to stuff everything into a single document and therefore read the entire DB in and out of disk. :))

Save to 3 firebase locations with a slow internet connection

Sometimes I'm having issues with firebase when the user is on a slow mobile connection. When the user saves an entry to firebase I actually have to write to 3 different locations. Sometimes, the first one works, but if the connection is slow the 2nd and 3rd may fail.
This leaves me with entries in the first location that I constantly need to clean up.
Is there a way to help prevent this from happening?
var newTikiID = ref.child("tikis").push(tiki, function(error){
if(!error){
console.log("new tiki created")
var tikiID = newTikiID.key()
saveToUser(tikiID)
saveToGeoFire(tikiID, tiki.tikiAddress)
} else {
console.log("an error occurred during tiki save")
}
});

There is no Firebase method to write to multiple paths at once. Some future tools planned by the team (e.g. Triggers) may resolve this in the future.
This topic has been explored before and the firebase-multi-write README contains a lot of discussion on the topic. The repo also has a partial solution to client-only atomic writes. However, there is no perfect solution without a server process.
It's important to evaluate your use case and see if this really matters. If the second and third writes failed to write to a geo query, chances are, there's really no consequence. Most likely, it's essentially the same as if the first write had failed, or if all writes had failed; it won't appear in searches by geo location. Thus, the complexity of resolving this issue is probably a time sink.
Of course, it does cost a few bytes of storage. If we're working with millions of records, that may matter. A simple solution for this scenario would be to run and audit report that detects broken links between the data and geofire tables and cleans up old data.
If an atomic operation is really necessary, such as gaming mechanics where fairness or cheating could be an issue, or where integrity is lost by having partial results, there are a couple options:
1) Master Record approach
Pick a master path (the one that must exist) and use security rules to ensure other records cannot be written, unless the master path exists.
".write": "root.child('maste_path').child(newData.child('master_record_id')).exists()"
2) Server-side script approach
Instead of writing the paths separately, use a queue strategy.
Create an single event by writing a single event to a queue
Have a server-side process monitor the queue and process events
The server-side process does the multiple writes and ensures they
all succeed
If any fail, the server-side process handles
rollbacks or retries
By using the server-side queue, you remove the risk of a client going offline between writes. The server can safely survive restarts and retry events or failures when using the queue model.

I have had the same problem and I ended up choosing to use condition Conditional Request with the Firebase REST API in order to write data transactionally. See my question and answer. Firebase: How to update multiple nodes transactionally? Swift 3 .
If you need to write concurrently (but not transactionally) to several paths, you can do that now as Firebase supports multi-path updates. https://firebase.google.com/docs/database/rest/save-data
https://firebase.googleblog.com/2015/09/introducing-multi-location-updates-and_86.html

Node.js, MongoDB, and Concurrency

I'm working on a game prototype and worried about the following case: Browser does AJAX to Node.JS, which has to do several MongoDB operations using async.series.
What prevents multiple requests at the same time causing the database issues? New events (i.e. db operations) seem like they could be run out of order or in between the async.series steps.
In other words, what happens if a user does AJAX calls very quickly, before the prior ones have finished their async.series. Hopefully that makes sense.
If this is indeed an issue, what is the proper way to handle it?

First and foremost, #fmodos's comment should be completely disregarded. It is wrong on many levels but most simply you could have any number of nodes running (say on Heroku) and there is no guarantee that subsequent requests will hit the same node.
Now, I'm going to answer your question by asking more questions. (You really didn't give me a choice here)
What are these operations doing? Inserting documents? Updating existing documents? Removing documents? This is very important because if all you're doing is simply inserting documents then why does it matter if one finishes for before the other? If you're updating documents then you should NOT be issuing a find, grabbing a ref to the object, and then calling save. (I'm making the assumption you're using mongoose, if you're not, I would) Instead what you should be doing is using built in mongo functions like $inc which properly handle concurrent requests.
http://docs.mongodb.org/manual/reference/operator/update/inc/
Does that help at all? If not, please let me know and I will give it another shot.

Mongo has database wide read/write locks. It gives preference to writes of the same collection first then fulfills reads. So, if by chance, you have Bill writing to the db and Joe is reading at the same time, Bill's write will execute first while Joe waits until the write is complete and then he is given all the data (including Bill's).

We Keep Coding

JavaScript is the programming language of the Web.