Can Firebase transform data server-side before writing it?


According to this documentation and this accompanying example, Firebase follows this flow when transforming newly written data:
1) The client writes data to Firebase, which is immediately accepted
2) The supplied Cloud Function is triggered, which transforms the data (in the example above, it removes swear words)
3) The transformed data is written again, overwriting the original data written in step 1
Maybe I'm missing something here, but this flow seems to present some problems. For example, if there is an error in step 2 above, and step 3 is never fired, the un-transformed data will just linger in the database. It seems like it would be better to transform the data as soon as it hits the server, but before writing. This would be followed by a single write operation, which will leave no loose artifacts behind if it fails. Is there any way in the current Firebase + Google Cloud Functions stack to add these types of pre-write data transforms?
My (tentative and weird) solution so far is to have a "shadow" /_temp/{endpoint} area in my Firebase db, so that when I want to write to /{endpoint}, I write there instead, which then triggers the relevant cloud function to do the transformation before writing to /{endpoint}. This at least prevents potentially incomplete data from leaking into my database, but it seems very inelegant and "hacky."
I'd also be interested to know if there are any server-side methods for transforming data before responding to read requests.

There is no hook in the Firebase Database (either through Cloud Functions or elsewhere) that allows you to modify values before they're written to the database. The temporary queue is the idiomatic way to address this use case. It works much like a moderator queue in most forum software.
You could use an HTTP Function to create an endpoint that your code calls and then perform the transformation there. You could use a similar pattern for reading data, although you'd have to rebuild the realtime synchronization capabilities of Firebase yourself.
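As a rough sketch of the HTTP Function idea, the transformation can live in a plain function that runs before a single write. The word list, sanitizeText, makeHandler and the "messages" path are all hypothetical names chosen for illustration, and the write itself is injected so the example does not depend on any SDK:

```javascript
// Hypothetical word list; the real sample uses its own profanity filter.
const BLOCKED_WORDS = ["darn", "heck"];

// Pure transform: mask each blocked word with asterisks of the same length.
function sanitizeText(text) {
  return BLOCKED_WORDS.reduce(
    (result, word) =>
      result.replace(new RegExp(`\\b${word}\\b`, "gi"), (m) => "*".repeat(m.length)),
    text
  );
}

// Builds an Express-style (req, res) handler that transforms the payload and
// then performs exactly one write. `writeMessage` is an injected async function.
function makeHandler(writeMessage) {
  return async (req, res) => {
    const text = req.body && req.body.text;
    if (typeof text !== "string") {
      res.status(400).send("Missing text");
      return;
    }
    try {
      // Only the transformed value ever reaches the database.
      await writeMessage({ text: sanitizeText(text) });
      res.status(200).send("ok");
    } catch (err) {
      // Nothing was written, so there is no untransformed data to clean up.
      res.status(500).send("Write failed");
    }
  };
}

// Possible wiring in index.js (assumes firebase-functions and firebase-admin):
// exports.addMessage = functions.https.onRequest(
//   makeHandler((msg) => admin.database().ref("messages").push(msg)));
```

With this shape the client calls the endpoint instead of writing to the database directly, so security rules would also need to deny direct client writes to that path.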

Related

Firebase Realtime Database document ordering

I am listening for new Firebase Realtime Database documents with code something like this:
firebase.database().ref(path)
.orderByChild('timestamp')
.on('child_added', snap => {
...
});
where timestamp is set on the server with firebase.database.ServerValue.TIMESTAMP. I would like to have documents always handled in timestamp order, but I am aware that documents I add locally may arrive in the above code out of order.
I can check for and fix mis-ordered arrivals but I'd prefer not to if there is some way to have this not happen. I know about this answer (and answers that link to it) but I believe that applies to an earlier API without ordering methods like orderByChild.
I believe that I should be able to get timestamp order if I always add documents using a transaction and pass false in the applyLocally argument. I am wondering if it also works to add documents from a separate Javascript context on the same client (e.g. from a Web Worker) without a transaction.
Will either or both of these approaches guarantee timestamp ordering? Is there any other way to achieve this? Among approaches that work, is one clearly superior or are there trade-offs among them?

The local estimate/latency compensation event is only fired on the client that performs the write operation. So if you perform a write operation in a different context, the original context will only see the operation when it comes from the server.
You might even be able to accomplish this by using two FirebaseApp instances, although I couldn't get that working in a quick test here myself.

Should updates to Firestore items in AngularFire be done through the AngularFirestoreCollection?

In my app, I have a list that requires an "or" condition. But, as the docs say:
In this case, you should create a separate query for each OR condition and merge the query results in your app.
As a result, in my service, I'm managing two queries and surfacing them as a single observable list to consumers.
The problem comes in with updating. I have the choice of doing extra work to match up the item needing update to the correct collection so I can do the following:
myCollection.doc(item.id).update(item);
or I can make this much more simple and just:
angularFirestore.doc(`path/to/${item.id}`).update(item);
I'm operating under the assumption that the first method will result in faster updates, since I'm using the same reference and it would optimistically update instantly. I assume the latter will be slower because it is more roundabout: it updates the persistence layer first, and the collection reference gets notified later (probably still after only a short time).
All of the above is assumption, however. I base this on just a few random instances where I've seen it take a second or two for an update or delete to show up in another part of the view, but I haven't been able to actually inspect the process.
Does anyone know if the above is correct? Should I be doing the extra work to write through the collection references, or does AngularFire (and/or Firestore) handle this and make them effectively the same operation under the hood?

AngularFire2 is a thin wrapper around RxFire, which itself is a relatively thin wrapper around the Firebase JavaScript SDK.
There should be no significant performance difference between updating a document through AngularFire or updating it directly through the JavaScript SDK. In both cases the majority of the time is spent in the JavaScript SDK, and on the wire between the client and server. For this reason I typically update directly through the JavaScript SDK, since it's often a bit more direct and the AngularFire abstraction has little advantage for me in write operations. Given that AngularFire is built on top of this SDK, it picks up the changes instantly even when they're not made through AngularFire.
If you have an instance where this does not seem to be the case, I recommend creating a question with the minimal, complete/standalone code that reproduces that problem.

Firebase "update" operation downloads data?

I was profiling a "download leak" in my Firebase database (I'm using the JavaScript SDK and Firebase functions on Node.js) and finally narrowed it down to the "update" function, which surprisingly causes data download (this impacts billing in my case quite significantly: ~50% of the bill comes from this leak):
Firebase functions index.js:
exports.myTrigger = functions.database.ref("some/data/path").onWrite((data, context) => {
  var dbRootRef = data.after.ref.root;
  return dbRootRef.child("/user/gCapeUausrUSDRqZH8tPzcrqnF42/wr").update({field1: "val1", field2: "val2"});
});
This function generates downloads at "/user/gCapeUausrUSDRqZH8tPzcrqnF42/wr" node
If I change the paths to something like this:
exports.myTrigger = functions.database.ref("some/data/path").onWrite((data, context) => {
  var dbRootRef = data.after.ref.root;
  return dbRootRef.child("/user/gCapeUausrUSDRqZH8tPzcrqnF42").update({"wr/field1": "val1", "wr/field2": "val2"});
});
It generates download at "/user/gCapeUausrUSDRqZH8tPzcrqnF42" node.
Here are the results of firebase database:profile
How can I get rid of the download while updating data or reduce the usage since I only need to upload it?

I don't think this is possible in a Firebase Cloud Functions trigger.
The onWrite((data, context) => ...) handler receives a data argument that carries the complete DataSnapshot for the watched path, and there is no way to configure the trigger so that it does not fetch that value.
Still, there are two things you might do to help reduce the data cost:
Watch a smaller set of data for the trigger, e.g. functions.database.ref("some/data/path") rather than ("some").
Use a more specific hook, i.e. onCreate() or onUpdate() rather than onWrite().

You should expect that all operations will round trip with your client code. Otherwise, how would the client know when the work is complete? It's going to take some space to express that. The screenshot you're showing (which is very tiny and hard to read - consider copying the text directly into your question) indicates a very small amount of download data.
To get a better sense of what the real cost is, run multiple tests and see if that tiny cost is itself actually just part of the one-time handshake between the client and server when the connection is established. That cost might not be an issue as your function code maintains a persistent connection over time as the Cloud Functions instance is reused.

How to tell whether my firebase node is used?

Firebase offers some overall Analytics in their App Dashboard, however, I need to know whether my stored data are ever used or they are just lying idly on a per node basis.
Why? It's simple: we are learning while developing, which makes the app a very fast evolving one. Not only the logic changes, but also the data stored need to be refactored from time to time. I would like to get rid of abandoned and forgotten data. Any ideas?
In the best case, I would like to know this:
When was a node used last time? (was it used at all?)
How many times was it used in 1h/24h/1w/1M?
Differentiate between read/write operations

2017 update
Cloud Functions trigger automatically and run on server.
https://firebase.google.com/docs/functions/
https://howtofirebase.com/firebase-cloud-functions-753935e80323
2016 answer
So apparently Firebase itself doesn't provide any of this.
The only way I can think of right now is to create wrappers for the Firebase query and write functions, and either do the statistics in the client app or create a dedicated node for storing the statistical data.
If the data is stored in Firebase, the wrapper for the writing functions (set, update, push, remove, setWithPriority) is relatively easy. The query functions (on, once) will have to record their statistics inside the success callback.
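A minimal sketch of such a wrapper is below. trackUsage and the shape of the stats object are hypothetical, and the statistics are kept in a plain in-memory object here rather than in a dedicated Firebase node. Write methods are counted when called; read methods are counted when the callback actually fires:

```javascript
// Hypothetical helper: wraps a ref-like object and counts operations per path.
// `stats` maps path -> { reads, writes, lastUsed }.
function trackUsage(ref, path, stats, now = Date.now) {
  const record = (kind) => {
    const entry = stats[path] || (stats[path] = { reads: 0, writes: 0, lastUsed: null });
    entry[kind] += 1;
    entry.lastUsed = now();
  };
  const wrapped = {};
  // Write methods: count first, then delegate to the real reference.
  for (const name of ["set", "update", "push", "remove"]) {
    wrapped[name] = (...args) => {
      record("writes");
      return ref[name](...args);
    };
  }
  // Read methods: count each time the success callback is invoked.
  for (const name of ["on", "once"]) {
    wrapped[name] = (eventType, callback, ...rest) =>
      ref[name](eventType, (...cbArgs) => {
        record("reads");
        if (callback) return callback(...cbArgs);
      }, ...rest);
  }
  return wrapped;
}
```

One trade-off of this approach: because on() receives a wrapping callback, detaching a listener with off(eventType, originalCallback) would no longer match, so a fuller version would need to keep track of the wrapped callbacks as well.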

Save to 3 firebase locations with a slow internet connection

Sometimes I'm having issues with firebase when the user is on a slow mobile connection. When the user saves an entry to firebase I actually have to write to 3 different locations. Sometimes, the first one works, but if the connection is slow the 2nd and 3rd may fail.
This leaves me with entries in the first location that I constantly need to clean up.
Is there a way to help prevent this from happening?
var newTikiID = ref.child("tikis").push(tiki, function(error) {
  if (!error) {
    console.log("new tiki created");
    var tikiID = newTikiID.key();
    saveToUser(tikiID);
    saveToGeoFire(tikiID, tiki.tikiAddress);
  } else {
    console.log("an error occurred during tiki save");
  }
});

There is no Firebase method to write to multiple paths at once. Tools planned by the team (e.g. Triggers) may resolve this in the future.
This topic has been explored before and the firebase-multi-write README contains a lot of discussion on the topic. The repo also has a partial solution to client-only atomic writes. However, there is no perfect solution without a server process.
It's important to evaluate your use case and see if this really matters. If the second and third writes failed to write to a geo query, chances are, there's really no consequence. Most likely, it's essentially the same as if the first write had failed, or if all writes had failed; it won't appear in searches by geo location. Thus, the complexity of resolving this issue is probably a time sink.
Of course, it does cost a few bytes of storage. If we're working with millions of records, that may matter. A simple solution for this scenario would be to run an audit report that detects broken links between the data and geofire tables and cleans up old data.
If an atomic operation is really necessary, such as gaming mechanics where fairness or cheating could be an issue, or where integrity is lost by having partial results, there are a couple options:
1) Master Record approach
Pick a master path (the one that must exist) and use security rules to ensure other records cannot be written, unless the master path exists.
".write": "root.child('master_path').child(newData.child('master_record_id').val()).exists()"
2) Server-side script approach
Instead of writing the paths separately, use a queue strategy.
Create a single event by writing it to a queue
Have a server-side process monitor the queue and process events
The server-side process does the multiple writes and ensures they all succeed
If any fail, the server-side process handles rollbacks or retries
By using the server-side queue, you remove the risk of a client going offline between writes. The server can safely survive restarts and retry events or failures when using the queue model.
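The retry part of the queue strategy can be sketched as below. processEvent, the writers list and the maxAttempts limit are hypothetical; each writer stands in for one of the server-side writes, and rollbacks are not covered:

```javascript
// Hypothetical queue worker step. `writers` is a list of async functions, each
// performing one of the writes for an event; `maxAttempts` is an assumed limit.
async function processEvent(event, writers, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Run every write for this event; any rejection triggers another attempt.
      await Promise.all(writers.map((write) => write(event)));
      return { ok: true, attempts: attempt };
    } catch (err) {
      if (attempt === maxAttempts) {
        // Give up and report the failure so the event can stay in the queue.
        return { ok: false, attempts: attempt, error: err };
      }
    }
  }
}
```

Because a retry re-runs writes that may already have succeeded, each writer should be idempotent, for example a set() to a path derived from the event's ID rather than a fresh push().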

I had the same problem and ended up using Conditional Requests with the Firebase REST API to write data transactionally. See my question and answer: "Firebase: How to update multiple nodes transactionally? Swift 3".
If you need to write concurrently (but not transactionally) to several paths, you can do that now as Firebase supports multi-path updates. https://firebase.google.com/docs/database/rest/save-data
https://firebase.googleblog.com/2015/09/introducing-multi-location-updates-and_86.html
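For the three-location save in the question, a multi-path update could look roughly like this. buildTikiFanOut and the "users/..." and "tikiLocations/..." paths are hypothetical, and GeoFire's own storage format is not reproduced here:

```javascript
// Hypothetical paths modelled on the question's three write locations.
// Returns one object whose keys are full paths, suitable for a single update().
function buildTikiFanOut(tikiId, userId, tiki) {
  return {
    [`tikis/${tikiId}`]: tiki,
    [`users/${userId}/tikis/${tikiId}`]: true,
    [`tikiLocations/${tikiId}`]: { address: tiki.tikiAddress },
  };
}

// Possible usage, assuming `ref` is the database root reference and `uid` the user:
// const tikiId = ref.child("tikis").push().key; // key is a property in SDK 3.x+;
//                                               // the question's older SDK uses key()
// ref.update(buildTikiFanOut(tikiId, uid, tiki)).catch(console.error);
```

Calling push() with no arguments only generates a key on the client, so all three locations can then be sent in one update() call instead of three separate writes.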
