Save to 3 firebase locations with a slow internet connection - javascript

I sometimes have issues with Firebase when the user is on a slow mobile connection. When the user saves an entry to Firebase, I actually have to write to 3 different locations. Sometimes the first write works, but if the connection is slow the 2nd and 3rd may fail.
This leaves me with entries in the first location that I constantly need to clean up.
Is there a way to help prevent this from happening?
var newTikiID = ref.child("tikis").push(tiki, function (error) {
  if (!error) {
    console.log("new tiki created");
    var tikiID = newTikiID.key();
    saveToUser(tikiID);
    saveToGeoFire(tikiID, tiki.tikiAddress);
  } else {
    console.log("an error occurred during tiki save");
  }
});

There is no Firebase method to write to multiple paths at once. Some tools planned by the team (e.g. Triggers) may resolve this in the future.
This topic has been explored before, and the firebase-multi-write README contains a lot of discussion on it. The repo also has a partial solution for client-only atomic writes. However, there is no perfect solution without a server process.
It's important to evaluate your use case and see if this really matters. If the second and third writes fail, the entry simply won't appear in searches by geo location; for most purposes that's the same outcome as if the first write, or all of the writes, had failed. In that case, the complexity of fully resolving this probably isn't worth the time.
Of course, it does cost a few bytes of storage. If we're working with millions of records, that may matter. A simple solution for this scenario would be to run an audit report that detects broken links between the data and GeoFire tables and cleans up orphaned data.
If an atomic operation is really necessary, such as gaming mechanics where fairness or cheating could be an issue, or where integrity is lost by having partial results, there are a couple options:
1) Master Record approach
Pick a master path (the one that must exist) and use security rules to ensure other records cannot be written unless the master path exists.
".write": "root.child('master_path').child(newData.child('master_record_id').val()).exists()"
2) Server-side script approach
Instead of writing the paths separately, use a queue strategy:
Create a single event by writing one event to a queue
Have a server-side process monitor the queue and process events
The server-side process does the multiple writes and ensures they all succeed
If any fail, the server-side process handles rollbacks or retries
By using the server-side queue, you remove the risk of a client going offline between writes. The server can safely survive restarts and retry failed events when using the queue model.
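A minimal sketch of such a worker, assuming the legacy Firebase SDK and a hypothetical /queue path that clients push events to:

// Queue worker sketch (legacy Firebase SDK; the /queue path is an assumption)
var Firebase = require("firebase");
var ref = new Firebase("https://your-app.firebaseio.com");

ref.child("queue").on("child_added", function (snap) {
  var event = snap.val();
  var tikiRef = ref.child("tikis").push();

  tikiRef.set(event.tiki, function (error) {
    if (error) { return; } // leave the event queued so it can be retried
    // saveToUser/saveToGeoFire are the question's own helpers
    saveToUser(tikiRef.key());
    saveToGeoFire(tikiRef.key(), event.tiki.tikiAddress);
    snap.ref().remove(); // all writes done; dequeue the event
  });
});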

I have had the same problem and I ended up using Conditional Requests with the Firebase REST API in order to write data transactionally. See my question and answer: Firebase: How to update multiple nodes transactionally? Swift 3.
If you need to write concurrently (but not transactionally) to several paths, you can do that now as Firebase supports multi-path updates. https://firebase.google.com/docs/database/rest/save-data
https://firebase.googleblog.com/2015/09/introducing-multi-location-updates-and_86.html
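A sketch of such a multi-path update with the JavaScript SDK; the paths mirror the question's layout and the userID/geoData names are assumptions:

// A single update() call writes all paths together.
var updates = {};
updates["tikis/" + tikiID] = tiki;
updates["users/" + userID + "/tikis/" + tikiID] = true;
updates["geofire/" + tikiID] = geoData;

ref.update(updates, function (error) {
  if (error) {
    console.log("multi-path update failed", error);
  }
});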

Related

Can Firebase transform data server-side before writing it?

According to this documentation, and this accompanying example, Firebase follows this flow when transforming newly written data:
1) Client writes data to Firebase, which is immediately accepted
2) The supplied Cloud Function is triggered, which transforms the data (in the example above, it removes swear words)
3) The transformed data is written again, overwriting the original data written in step 1
Maybe I'm missing something here, but this flow seems to present some problems. For example, if there is an error in step 2 above, and step 3 is never fired, the un-transformed data will just linger in the database. It seems like it would be better to transform the data as soon as it hits the server, but before writing. This would be followed by a single write operation, which will leave no loose artifacts behind if it fails. Is there any way in the current Firebase + Google Cloud Functions stack to add these types of pre-write data transforms?
My (tentative and weird) solution so far is to have a "shadow" /_temp/{endpoint} area in my Firebase db, so that when I want to write to /{endpoint}, I write there instead, which then triggers the relevant cloud function to do the transformation before writing to /{endpoint}. This at least prevents potentially incomplete data from leaking into my database, but it seems very inelegant and "hacky."
I'd also be interested to know if there are any server-side methods for transforming data before responding to read requests.
There is no hook in the Firebase Database (neither through Cloud Functions nor elsewhere) that allows you to modify values before they're written to the database. The temporary queue is the idiomatic way to address this use case. It functions much like a moderation queue in most forum software.
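A minimal sketch of that temporary-queue (shadow path) pattern using the Cloud Functions v1 API; the /_temp/posts and /posts paths and the sanitize transform are assumptions:

const functions = require("firebase-functions");
const admin = require("firebase-admin");
admin.initializeApp();

exports.transformPost = functions.database
  .ref("/_temp/posts/{postId}")
  .onCreate((snapshot, context) => {
    const transformed = sanitize(snapshot.val()); // hypothetical transform
    return admin
      .database()
      .ref(`/posts/${context.params.postId}`)
      .set(transformed)
      .then(() => snapshot.ref.remove()); // clean up the shadow copy
  });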
You could use an HTTP Function to create an endpoint that your code calls and then perform the transformation there. You could use a similar pattern for reading data, although you'd have to rebuild Firebase's realtime synchronization capabilities yourself.
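For the write side, a sketch of that HTTP endpoint; again, the /posts path and the sanitize transform are assumptions:

exports.createPost = functions.https.onRequest(async (req, res) => {
  const transformed = sanitize(req.body); // hypothetical transform
  const postRef = await admin.database().ref("/posts").push(transformed);
  res.json({ id: postRef.key }); // return the new ID to the caller
});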

Database cluster - asynchronous tasks

Let's say I have a couchDB database called "products" and a frontend with a form.
Now if a user opens a document from this database in the form, I want to prevent other users from editing this specific document.
Usually pretty simple:
-> read document from couchDB
-> set a variable to true like: { edit : true }
-> save (merge) document to couchDB
-> if someone else tries to open the document he will receive an error, because of edit:true.
BUT, what if two users open the document at the exact same time?
The function would be called twice, and when the second one opens the document he would falsely receive edit:false because the first one didn't have enough time to save his edit:true. So how to prevent this behaviour?
First solution would be:
Build an array as a queue for database requests and don't allow parallel requests, so all requests would be worked off one after another. But in my opinion this is a bad solution, because the system would become incredibly slow at some point.
Second solution:
Store the documentIDs of the currently edited documents in a local array in the script. This would work because it is not an asynchronous process, and the second user would receive his error immediately.
So far so good, BUT, what if some day there are too many users and this system has to run in a cluster (the node client server, not the database)? Now the second solution would no longer work, because every cluster slave would have its own array of documentIDs. Sharing it would be another asynchronous task and would result in the same problem as above.
Now I'm out of ideas. How do big clustered systems usually handle problems like that?
CouchDB uses MVCC to maintain consistency in your database. When a document is being updated, you must supply both the ID (_id) and revision number (_rev) otherwise your change will be rejected.
This means that if 2 clients read the document at revision 1 and both attempt to write a change using that same revision number, only the first will be accepted by the database. The 2nd client will receive an error, and it should fetch the latest revision of the document in order to proceed.
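A sketch of what this looks like against CouchDB's HTTP API (database and document names are placeholders; uses Node's built-in fetch):

// Optimistic locking via MVCC: the PUT must carry the _rev we read.
const url = "http://localhost:5984/products/some-doc-id";

async function markEditing() {
  const doc = await (await fetch(url)).json(); // read doc, including _rev
  doc.edit = true;
  const res = await fetch(url, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc), // includes _id and _rev
  });
  if (res.status === 409) {
    // Another client updated the document first; re-fetch and retry or bail.
    console.log("conflict: document changed since we read it");
  }
}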
In a single-node environment, this model prevents conflicts outright. However, in cases where replication is occurring, it is still possible to get conflicts even when using MVCC, because conflicting revisions can be written to different nodes before they have been replicated to one another. In this case, CouchDB will record the conflict, and your application is responsible for resolving them.
CouchDB has stellar documentation, in particular they have an article all about conflicts and replication that I highly recommend for this subject.

Why observing oplog takes so much time in meteor / mongo?

I have a MongoLab cluster, which allows me to use Oplog tailing in order to improve performances, availability and redundancy in my Meteor.js app.
Problem is: since I've been using it, all my publications take more time to finish. When it only takes around 200ms that's not a problem, but it often takes much more, like here, where I'm subscribing to the publication I described here.
This publication already has too long a response time, and the oplog observations slow it down even more, though it's far from the only publication where observing the oplog takes that much time.
Could anyone explain to me what's happening? Nowhere on the web have I found any explanation of why observing the oplog slows my publications down that much.
Here are some screenshots from Kadira to illustrate what I'm saying:
Here is a screenshot from another pub/sub:
And finally, one where observing the oplog takes a reasonable time (but still slows my pub/sub a bit):
Oplog tailing is very fast. Oplog tailing isn't the issue here.
You're probably doing a lot of things that you don't realize make publications slow:
One-by-one document update loops: You're probably doing a document update inside the body of a Collection.forEach call. This is incredibly slow, and the origin of your poor performance in method bodies. Every time you do a single document update that's listened to by hundreds of concurrent connections, each of those needs to be updated; by doing a query followed by an update one document at a time, neither Mongo nor Meteor can optimize, and they must wait for every single user to be updated on every change. The cost grows along two axes at once: the number of documents you update and the number of connections listening. Solution: Think about how to do the update using {multi:true} (see the sketch after this list).
Unique queries for every user: If you make a single change to a user document that has, say, 100 concurrent unique subscriptions connected to it, the connections will be notified serially. That means the first connection will be notified in 90ms, while the last will be notified roughly 90ms * 100 later. That's the other reason your observeChanges are slow. Solution: Think about whether you really need a unique subscription on each user's document. Meteor has optimizations for identical subscriptions shared between multiple concurrent connections.
Lots of documents: You're probably encoding each thread comment, post, chat message, etc. as its own document. Each document needs to be sent individually to each client, introducing some related overhead. This is the right schema for a relational database, and the wrong one for a document-based database. Solution: Try to hold every single thing you need to render a page to a user in a single document (de-normalization). With regards to chat, you should have a single "conversation" document that contains all the messages between two+ users.
Database isn't co-located with your host: If you're using MongoLab, your database may not be in the same datacenter as your web host (which I assume is Galaxy or Modulus). Intra-datacenter latencies can be very, very high, and this is probably the explanation for your poor collection reads. Indices, as other commenters have noticed, might play a role, but my bet is that you have fewer than a hundred records in any of these collections so it won't really matter.
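As referenced in the first point above, a sketch of collapsing a per-document update loop into one multi-document update; the collection and field names are hypothetical:

// Slow: one round trip and one change notification per document.
Posts.find({ flagged: true }).forEach(function (post) {
  Posts.update(post._id, { $set: { hidden: true } });
});

// Better: a single server-side update touching all matching documents at once.
Posts.update({ flagged: true }, { $set: { hidden: true } }, { multi: true });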

dealing with long server side calculations in meteor

I am using jimp (https://www.npmjs.com/package/jimp) in meteor JS to generate an image server side. In other words I am 'calculating' the pixels of the image using a recursive algorithm. The algorithm takes quite some time to complete.
The issue I am having is that this seems to completely block the meteor server. Users trying to visit the webpage while an image is being generated are forced to wait. The website is therefore not rendered at all.
Is there any (meteor) way to run the heavy recursive algorithm in a thread or something so that it does not block the entire website?
Node (and consequently Meteor) runs your JavaScript on a single thread, so CPU-bound work blocks everything else. In short, node works really well when you are IO-bound, but as soon as you do anything compute-bound you need another approach.
As was suggested in the comments above, you'll need to offload this CPU-intensive activity to another process which could live on the same server (if you have multiple cores) or a different server.
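For the same-server case, a minimal sketch using Node's built-in child_process module; the worker file name is hypothetical:

var fork = require("child_process").fork;

function generateImageAsync(params, callback) {
  var worker = fork("image-worker.js"); // runs in a separate Node process
  worker.send(params);                  // hand the job description to the worker
  worker.once("message", function (result) {
    callback(null, result);             // worker sends the finished pixels back
    worker.kill();
  });
  worker.once("error", callback);
}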
We have a similar problem at Edthena, where we need to transcode a subset of our video files. For now I decided to use a meteor-based solution, because it was easy to set up. Here's what we did:
When new transcode jobs need to happen, we insert a "video job" document into the database.
On a separate server (we max out the full CPU when transcoding), we have an app which calls observe like this:
Meteor.startup(function () {
  // Listen for non-failed transcode jobs in creation order. Use a limit of 1 to
  // prevent multiple jobs of this type from running concurrently.
  var selector = {
    type: 'transcode',
    state: { $ne: 'failed' },
  };
  var options = {
    sort: { createdAt: 1 },
    limit: 1,
  };

  VideoJobs.find(selector, options).observe({
    added: function (videoJob) {
      transcode(videoJob);
    },
  });
});
As the comments indicate, this allows only one job to run at a time, which may or may not be what you want. It has the further limitation that you can only run it on one app instance (multiple instances calling observe would each pick up and process the same job). So it's a pretty simplistic job queue, but it may work for your purposes for a while.
As you scale, you could use a more robust mechanism for dequeuing and processing the tasks, like Amazon's SQS service. You can also explore other meteor-based solutions like job-collection.
I believe you're looking for Meteor.defer(yourFunction).
Relevant Kadira article: https://kadira.io/academy/meteor-performance-101/content/make-your-app-faster
Thanks for the comments and answers! It seems to be working now. I did what David suggested: I am running a second meteor app on the same server, and this app handles generating the images. However, that still resulted in the app eating away all the processing power.
As a result, I set a slightly lower priority on the generating process with the renice command on its PID (https://www.nixtutor.com/linux/changing-priority-on-linux-processes/). This works! Any time a user logs into the website, the other (client-facing) meteor application gains priority over the generating algorithm. Absolutely no delay at all anymore.
The only issue I am having now is that whenever the server restarts I have to run the renice command again.
Since I am using meteor up for deployment, both apps run as the same user with the same command: node main.js. I am currently trying to figure out how to run the nice command within the startup script of meteor up (located at /etc/init/.conf).

How does NodeJS handle async file IO?

Having worked with NodeJS for some time now, I've been wondering about how node handles file operations internally.
Considering the following pseudo code:
initialize http server
on connection:
    modify_some_file:
        on success:
            print "it worked"
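In concrete Node terms, that pseudo code corresponds roughly to the following; the file name and port are placeholders:

var http = require("http");
var fs = require("fs");

http.createServer(function (req, res) {
  // Kick off the async file operation; Node returns to the event loop meanwhile.
  fs.appendFile("some_file.txt", "modified\n", function (err) {
    if (!err) {
      console.log("it worked");
    }
    res.end();
  });
}).listen(8080);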
Let's consider two users A & B that try to access the page nearly simultaneously. Let's further assume A is the first one to connect, then the following happens:
A connects
NodeJS initializes the file operation and tells the operating system to be notified once it is done
And here's what I'm wondering about: Let's say, the file operation isn't done yet and B connects, what does node do? How and when does it access the file when it is still in the process of "being modified"?
I hope my question is somewhat clear ;)
Looking forward to your answers!
AFAIK, Node won't care.
At least on Unix, it's perfectly legal to have multiple writers to the same file. Sometimes that's not a problem (say your file consists of fixed-size records, where writer #1 writes to record X and writer #2 writes to record Y, with X !== Y), and sometimes it is (same example: when both writers want to write to record X).
In Node, the problems are mitigated because I/O operations "take turns", but I think there's still potential for two writers getting in each other's way. It's up to the programmer to make sure that doesn't happen.
With Node, you could use the *Sync() versions of the fs operations (but those will block your app during the operation), use append mode (which is only atomic up to certain write sizes, I think, and whether appending is useful depends on your requirements), use some form of locking, or use something like a queue, where write operations are put onto the queue and a single consumer handles the writes.
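A minimal sketch of that last, queue-based option, using only Node's built-in fs module; the file name is a placeholder:

var fs = require("fs");

var queue = [];
var busy = false;

function enqueueWrite(data) {
  queue.push(data);
  drain();
}

function drain() {
  if (busy || queue.length === 0) return;
  busy = true;
  // Only one appendFile is ever in flight, so writers cannot interleave.
  fs.appendFile("some_file.txt", queue.shift(), function (err) {
    if (err) console.error("write failed:", err);
    busy = false;
    drain(); // process the next queued write, if any
  });
}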
