I have a Node.js application that I'm switching from a single-tenant database to a multi-tenant database. The application code is called from an Express API, but there are also services that run through different entry points, so req.session is not always available.
Currently I have database function calls all throughout the app like:
database.select.users.findByUserId(123, callback)
Since the app is changing to a multi-tenant database, I need to be able to send the PostgreSQL schemaName to the database functions. I know I can edit the signature of every database call to this:
database.select.users.findByUserId(schemaName, 123, callback)
But that's very labor intensive and broad sweeping, and it's going to create a lot of bugs. I'm hoping to find a safe way to pass the Postgres schemaName to the database wrapper without a race condition of some kind where this "global" schemaName variable is somehow overwritten by another caller, thus sending the wrong data.
Here's some pseudo-code of what I'm considering writing, but I'm worried it won't be "thread-safe" once we deploy.
// as early as possible in the app call:
database.session.initSchema('schema123');
//session.js
let schema = null;
module.exports.initSchema = function (s) {
schema = s;
};
module.exports.getSchema = function () {
return schema;
};
// before I hit the database, I would call getSchema() and pass it to PostgreSQL
This approach works, but what if Caller2 calls initSchema() with a different value while Caller1 hasn't finished executing? How can I distinguish which caller is asking for the data when using one variable like this? Is there any way for me to solve this problem safely without editing the signature of every database function call? Thanks for the advice.
Edit:
I'm leaning towards this solution:
database.session.initSchema('schema123');
//then immediately call
database.select.users.findByUserId(123, callback);
The advantage here is that nothing asynchronous happens between the two calls, which should nullify the race condition possibility, while keeping the original findByUserId signature.
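If you go this route, a small wrapper can enforce the "nothing asynchronous in between" invariant so callers can't accidentally interleave the two calls. A minimal sketch (withSchema is a hypothetical helper name, not part of any library):

// pairs initSchema with the database call on the same tick
function withSchema(schemaName, fn) {
  database.session.initSchema(schemaName);
  return fn(); // fn must call into the database synchronously
}

withSchema('schema123', function () {
  database.select.users.findByUserId(123, callback);
});

This only stays safe as long as the wrapped function reads the schema synchronously, before yielding to the event loop.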
I don't think doing what you're thinking will work because I don't see a way you're going to get around those race conditions. If you do:
app.use((request, response, next) => {
// initialize database schema
next()
})
That would be ideal, because then you could do it only once across all routes, but another request might hit the server a millisecond later and change the schema again.
Alternatively you can do it in each separate route, which would work, but then you're doing just as much work as passing the schema in the database call in the first place: if you have to reinitialize the schema in each route, it's the same as doing it in the call itself.
I was thinking about a solution for a while, and the best I can come up with is doing it in the connection pool itself. I have no idea what package you're using or how it's creating DB connections, but something like this:
const database = connection.getInstance('schemaExample')
// continue to do database things here
Just to show an example of what I'm thinking. That way you can create multiple connection pools for the different schemas on startup, and you can just query on the one with the correct schema, avoiding all the race conditions.
The idea being that even if another request comes in now and uses a different schema, it will be executing on a different database connection.
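For example, with the node-postgres (pg) package that idea could look like the sketch below. getPoolForSchema is a hypothetical name, and setting search_path per connection is one way to scope unqualified table names to a tenant's schema; the schema name is validated because it's interpolated into SQL:

// pool-registry.js - a sketch assuming node-postgres (pg)
const { Pool } = require('pg');

const pools = new Map(); // one pool per tenant schema

function getPoolForSchema(schemaName) {
  // guard against injection: the name is interpolated into SQL below
  if (!/^[A-Za-z0-9_]+$/.test(schemaName)) {
    throw new Error('invalid schema name: ' + schemaName);
  }
  if (!pools.has(schemaName)) {
    const pool = new Pool(); // connection config comes from env vars
    // every new physical connection in this pool resolves unqualified
    // table names against the tenant's schema
    pool.on('connect', (client) => {
      client.query(`SET search_path TO "${schemaName}"`);
    });
    pools.set(schemaName, pool);
  }
  return pools.get(schemaName);
}

module.exports = { getPoolForSchema };

A request handler or background service then grabs the pool for its tenant and queries it directly, so concurrent callers for different tenants never share connection state.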
Related
I was profiling a "download leak" in my Firebase database (I'm using the JavaScript SDK / Firebase Functions on Node.js) and finally narrowed it down to the update function, which surprisingly caused data download - and that download impacts billing in my case quite significantly (~50% of the bill comes from this leak):
Firebase functions index.js:
exports.myTrigger = functions.database.ref("some/data/path").onWrite((data, context) => {
var dbRootRef = data.after.ref.root;
return dbRootRef.child("/user/gCapeUausrUSDRqZH8tPzcrqnF42/wr").update({field1:"val1", field2:"val2"})
});
This function generates a download at the "/user/gCapeUausrUSDRqZH8tPzcrqnF42/wr" node.
If I change the paths to something like this:
exports.myTrigger = functions.database.ref("some/data/path").onWrite((data, context) => {
var dbRootRef = data.after.ref.root;
return dbRootRef.child("/user/gCapeUausrUSDRqZH8tPzcrqnF42").update({"wr/field1":"val1", "wr/field2":"val2"})
});
It generates a download at the "/user/gCapeUausrUSDRqZH8tPzcrqnF42" node.
Here are the results of firebase database:profile
How can I get rid of the download while updating data or reduce the usage since I only need to upload it?
I don't think it is possible in a Firebase Cloud Functions trigger.
The .onWrite((data, context) => ...) handler receives a data argument containing the complete DataSnapshot, and there is no way to configure it not to fetch its value.
Still, there are two things that you might do to help reduce the data cost:
1. Watch a smaller set for the trigger, e.g. functions.database.ref("some/data/path") rather than ("some").
2. Use a more specific hook, i.e. onCreate() or onUpdate() instead of onWrite(), as in the sketch below.
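For instance, the trigger from the question rewritten with onUpdate() - a minimal sketch, assuming the same firebase-functions Change API the question already uses:

exports.myTrigger = functions.database
  .ref("some/data/path")
  .onUpdate((change, context) => {
    // change.after is the snapshot after the write; only updates
    // (not creates or deletes) invoke this handler
    return change.after.ref.root
      .child("/user/gCapeUausrUSDRqZH8tPzcrqnF42/wr")
      .update({ field1: "val1", field2: "val2" });
  });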
You should expect that all operations will round trip with your client code. Otherwise, how would the client know when the work is complete? It's going to take some space to express that. The screenshot you're showing (which is very tiny and hard to read - consider copying the text directly into your question) indicates a very small amount of download data.
To get a better sense of what the real cost is, run multiple tests and see if that tiny cost is itself actually just part of the one-time handshake between the client and server when the connection is established. That cost might not be an issue as your function code maintains a persistent connection over time as the Cloud Functions instance is reused.
First question here, but I really don't know where else to go; I cannot find anything that helps me on Google.
I'm doing heavy processing server side and I would like to keep track of its state and show it on the client side.
For that purpose I have a variable that I'm updating as the process goes along. To keep track of it I'm using this on the client side:
Template.importJson.onCreated(function () {
Session.set('import_datas', null);
this.autorun(function(){
Meteor.call('readImportState', function(err, response) {
console.log(response);
if (response !== undefined) {
Session.set('importingMessage',response);
}
});
})
});
I'm reading it from the template this way (in Template.myTemplate.helpers):
readImportState: function() {
return Session.get('importingMessage');
},
And here is the server side code to be called by meteor.call:
readImportState: function() {
console.log(IMPORT_STATE);
return IMPORT_STATE;
}
The client grabs the value at start, but it is never updated later.
What am I missing here?
If somebody could point me in the right direction that would be awesome.
Thank you :)
TL;DR
As of this writing, the only easy way to share reactive state between the server and the client is to use the publish/subscribe mechanism. Other solutions will be like fighting an uphill battle.
In-memory State
Here's the (incorrect) solution you are looking for:
1. When the job starts, write to some in-memory state on the server. This probably looks like a global or file-scoped variable like jobStates, where jobStates is an object with user ids as its keys and state strings as its values.
2. The client should periodically poll the server for the current state. Note that an autorun doesn't work for Meteor.call (there is no reactive state forcing the autorun to execute again) - you'd need to actually poll every N seconds via setInterval, as in the sketch at the end of this section.
3. When the job completes, modify jobStates.
4. When the client sees a completed state, inform the user and cancel the setInterval.
Because the server could restart for any number of reasons while the job is running (and consequently forget its in-memory state), we'll need to build in some fault tolerance for both the state and the job itself. Write the job state to the database whenever it changes. When the server starts, we'll read this state back into jobStates.
The model above assumes only a single server is running. If there exist multiple server instances, each one will need to observe the collection in order to write to its own jobStates. Alternatively, the method from (2) should just read the database instead of actually keeping jobStates in memory.
This approach is complicated and error prone. Furthermore, it requires writing the state to the database anyway in order to handle restarts and multiple server instances.
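Here is what the polling from step 2 could look like on the client, rewritten from the question's autorun - a minimal sketch reusing the question's readImportState method:

Template.importJson.onCreated(function () {
  Session.set('importingMessage', null);
  // Meteor.call is not reactive, so an autorun never reruns it;
  // poll explicitly instead
  this.pollHandle = Meteor.setInterval(function () {
    Meteor.call('readImportState', function (err, response) {
      if (!err && response !== undefined) {
        Session.set('importingMessage', response);
      }
    });
  }, 2000); // every 2 seconds
});

Template.importJson.onDestroyed(function () {
  // cancel the poll when the template goes away (step 4)
  Meteor.clearInterval(this.pollHandle);
});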
Publish/Subscribe
1. As the job state changes, write the current state to the database. This could be a separate collection just for job states, or a collection with all the metadata used to execute the job (helpful for fault tolerance), or the document the job is producing (if any).
2. Publish the necessary document(s) to the client.
3. Subscribe for the document(s) on the client and use a simple find or findOne in a template to display the state to the user.
4. Optional: clean up the state document(s) periodically using something like synced-cron.
As you can see, the publish/subscribe mechanism is considerably easier to implement because most of the work is done for you by Meteor.
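Put together, a minimal sketch of that flow, assuming a hypothetical JobStates collection keyed by user id:

// lib (shared between client and server)
JobStates = new Mongo.Collection('jobStates');

// server: publish the current user's job state
Meteor.publish('myJobState', function () {
  return JobStates.find({ userId: this.userId });
});

// server: inside the long-running job, as the state changes
// (userId here is whatever id the job is running for)
JobStates.upsert({ userId: userId }, { $set: { state: 'importing' } });

// client
Meteor.subscribe('myJobState');

Template.importJson.helpers({
  readImportState: function () {
    var doc = JobStates.findOne();
    return doc && doc.state; // reactively updates the template
  }
});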
I just noticed that Meteor.call, the mechanism that prevents users from invoking a collection's insert, update, and remove methods directly, can still be invoked from the JavaScript console.
Here's the client-side example:
// client
...
Meteor.call('insertProduct', productInfo);
...
Here's the server part:
// server
Meteor.methods({
insertProduct: function( productInfo ){
Product.insert(...);
}
})
OK, I know people can't invoke Product.insert() directly from their JavaScript console.
But if they try a little bit more, they'd find Meteor.call() in the client's JavaScript from the developer tools' resources tab.
So now they can try to invoke Meteor.call from their console, then try guessing what productInfo's properties should be.
So I wonder: how can we prevent this final activity?
Does Meteor.call do the job well enough?
Or am I missing something important?
Meteor.call is a global function, just like window.alert(). Unfortunately, there is nothing you can do to prevent a user from calling Meteor.call. However, you can validate the schema of the data, and the actual data, that a user is sending. I'd recommend https://github.com/aldeed/meteor-simple-schema (aldeed:simple-schema as the Meteor package name) to ensure you don't get garbage data in your project.
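For example, here is a minimal sketch of validating the method's argument server side; SimpleSchema instances can be used as patterns for Meteor's check, and the schema fields here are made up:

var productSchema = new SimpleSchema({
  name:  { type: String, max: 200 },
  price: { type: Number, min: 0 }
});

Meteor.methods({
  insertProduct: function (productInfo) {
    // throws a Match.Error, rejecting the call, if the client
    // sends anything that doesn't match the schema
    check(productInfo, productSchema);
    return Product.insert(productInfo);
  }
});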
As others pointed out, "Meteor.call" can surely be used from the console. The subtle issue here is that there could be a legal user of a meteor app who can in turn do bad things on the server. So even if one checks on the server if the user is legal, that by itself does not guarantee that the data is protected.
This is not an issue only with Meteor. I think all such apps potentially need to protect against corruption of their data, even by legal users.
One way to protect against such corruption is by using an IIFE (Immediately Invoked Function Expression).
Wrap your module in an IIFE. Inside the closure keep a private variable which stores a unique, one-time-use key (k1). That key needs to be placed there via another route - maybe by ensuring that a collection observer gets fired in the client at startup. One can use other strategies here too. The idea is to squirrel the value of k1 in from the server and deposit it in the private variable.
Then each time you invoke a Meteor.call from inside your code, pass k1 along as one of the parameters. The server in turn checks whether k1 was indeed legal for that browser connection.
As k1 was stored inside a private variable in the closure created by the IIFE, it would be quite difficult for someone at the browser console to determine its value. Hence, even though Meteor.call can indeed be called from the browser console, it would not cause any harm. This approach should be quite a good deterrent against data corruption.
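A minimal sketch of the idea - the Handshake collection and the server-side key check are hypothetical, and delivering k1 securely is the part you would need to design for your own app:

var api = (function () {
  var k1 = null; // private: unreachable from the browser console

  // squirrel the one-time key in via another route, e.g. a
  // subscribed collection whose observer fires at startup
  Handshake.find().observe({
    added: function (doc) { k1 = doc.key; }
  });

  return {
    insertProduct: function (productInfo, callback) {
      // every call carries the key; the server verifies it against
      // this connection before touching the collection
      Meteor.call('insertProduct', k1, productInfo, callback);
    }
  };
})();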
As mentioned by @Faysal, you have several ways to ensure your calls are legit. An easy step is to implement alanning:roles and do role checks from within your method, like the following:
Meteor.methods({
  methodName: function () {
    if (!Roles.userIsInRole(this.userId, 'admin')) {
      throw new Meteor.Error(403, 'not authorized');
    }
    // your code here
  }
});
This way, only admin users can call the method.
Note that you can also check this.connection from within the method to determine whether the call comes from the server (this.connection is null in that case) or from a client.
Generally speaking, doing checks and data manipulations from your methods is a nice way to go. Allow/deny rules are nice to begin with but become really hard to maintain when your collections get heavier and your edge cases expand.
You cannot block Meteor.call from the console, just like you can't block CollectionName.find().count() from the console. These are global functions in Meteor.
But there are simple steps you can take to secure your methods.
1. Use aldeed:simple-schema to set the types of data your collection can accept. This lets you set the specific keys that your collection takes as well as their types (string, boolean, array, object, integer). https://github.com/aldeed/meteor-simple-schema
2. Ensure that only logged-in users can update from your method, or set global allow/deny rules. https://www.meteor.com/tutorials/blaze/security-with-methods && https://www.discovermeteor.com/blog/allow-deny-a-security-primer/
3. Remove the insecure and autopublish packages.
The simple combo of schema and allow/deny should do you just fine.
As you know by now, you can't really block calling Meteor.call from the JavaScript console. What I'd like to add as a suggestion, along with @Stephen and @thatgibbyguy, is to be sure to check your user's role when adding documents into the collection. simple-schema will help you prevent inserting/updating garbage data into the collection, and the alanning:roles package makes your app secure by controlling who has permission to write/read/update your collection documents.
Alanning:roles Package
I've been following lots of Meteor examples and working through Discover Meteor, and now I'm left with lots of questions. I understand subscribe and fetch are ways to get "reactivity" to work properly, but I still feel unsure about the relationship between find operations and subscriptions/fetch. I'll try to ask some questions in order to probe for some holistic/conceptual answers.
Question Set 1:
In the following example we are fetching 1 object and we are subscribing to changes on it:
Meteor.subscribe('mycollection', someID);
Mycollection.findOne(someID);
Does order of operations matter here?
When does this subscription "expire"?
Question Set 2:
In some cases we want to foreign key lookup and use fetch like this:
MyCollection2.find({myCollection1Id: doc1Id}).fetch();
Do we also need a MyCollection2 subscription when using fetch?
How does subscribe work with "foreign keys"?
Is fetch ~= to a subscription?
Question Set 3:
What is an appropriate use of Tracker.autorun?
Why/when should I use it instead of subscribe or fetch?
what happens when you subscribe and find/fetch
1. The client calls subscribe, which informs the server that the client wants to see a particular set of documents.
2. The server accepts or rejects the subscription request and publishes the matching set of documents.
3. Some time later (after network delay), the documents arrive on the client. They are stored in a browser-side database called minimongo.
4. A subsequent fetch/find on the collection in which those documents are stored will query minimongo (not the server).
5. If the subscribed document set changes, the server will publish a new copy to the client.
Recommended reading: understanding meteor publications and subscriptions.
question 1
The order matters. You can't see documents that you haven't subscribed for (assuming autopublish is off). However, as I point out in common mistakes, subscriptions don't block, so a subscription followed by an immediate fetch should return undefined.
Subscriptions don't stop on their own. Here's the breakdown:
A global subscription (one made outside of your router or template) will never stop until you call its stop method.
A route subscription (iron router) will stop when the route changes (with a few caveats).
A template subscription will stop when the template is destroyed.
question 2
This should be mostly explained by the first section of my answer. You'll need both sets of documents in order to join them on the client. You may publish both sets at once from the server, or individually - this is a complex topic and depends on your use case.
question 3
These two are somewhat orthogonal. An autorun is a way for you to create a reactive computation (a function which runs whenever its reactive variables change) - see the section on reactivity in the docs. A find/fetch or a subscribe could happen inside of an autorun depending on your use case. This will probably become clearer once you learn more about how Meteor works.
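For instance, a minimal autorun sketch in which a subscription tracks a reactive variable (the session key name is made up):

Tracker.autorun(function () {
  // reruns whenever 'selectedUserId' changes; a subscription made
  // inside the computation is stopped automatically on each rerun
  var id = Session.get('selectedUserId');
  if (id) {
    Meteor.subscribe('mycollection', id);
  }
});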
Essentially, when you subscribe to a dataset, it fills minimongo (the in-memory database in the browser) with that data. This is what populates the client's instance of Mongo; otherwise, basically all queries would return undefined data or empty lists.
To summarize: Subscribe and Publish are used to give different data to different users. The most common example would be giving different data based on roles. Say, for instance, you have a web application where you can see a "public" and a "friend" profile.
Meteor.publish('user_profile', function (userId) {
if (Roles.userIsInRole(this.userId, 'can-view', userId)) {
return Meteor.users.find(userId, {
fields: {
public: 1,
profile: 1,
friends: 1,
interests: 1
}
});
} else {
return Meteor.users.find(userId, {
fields: { public: 1 }
});
}
});
Now if you logged in as a user who was not friends with this user, and did Meteor.subscribe('user_profile', 'theidofuser'), and did Meteor.users.findOne(), you would only see their public profile. If you added yourself to the can-view role of the user group, you would be able to see public, profile, friends, and interests. It's essentially for security.
Knowing that, here's how the answers to your questions breaks down:
1. Order of operations matters, in the sense that you will get undefined unless the fetch happens in a reactive block (like Tracker.autorun or a template helper).
2. You still need the subscription when using fetch; all fetch really does is return an array instead of a cursor. Publishing with foreign keys is a pretty advanced problem at times, so I recommend using reywood:publish-composite, which handles the annoying details for you (see the sketch below).
3. Tracker.autorun watches the reactive variables within its block and reruns the function when one of them changes. You don't really use it instead of subscribing; you use it to watch the variables in your scope.
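A minimal publish-composite sketch for the foreign-key case from question set 2 (collection and publication names follow the question; treat this as an illustration, not the only way to use the package):

// server
Meteor.publishComposite('docWithChildren', function (doc1Id) {
  return {
    find: function () {
      // the parent document
      return MyCollection1.find({ _id: doc1Id });
    },
    children: [{
      find: function (doc) {
        // all children pointing at the parent via the foreign key
        return MyCollection2.find({ myCollection1Id: doc._id });
      }
    }]
  };
});

// client
Meteor.subscribe('docWithChildren', doc1Id);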
If I write a plugin which requires a very large initialization (14 MB of JavaScript which takes 1 minute to set itself up), how can I make this object persistent (for lack of a better word) across the JavaScript files used in a Meteor project?
After the initialization of the plugin, I have an object LargeObject and when I add a file simple_todo.js, I want to use LargeObject without it taking a minute to load after EVERY change.
I cannot find any solution.
I tried making a separate package to store this in Package object, but that is cleared after every change and reinitialized.
What would be the proper way of doing that? I imagine there should be something internal in Meteor which survives hot code push.
Here are three possible solutions:
Cache some of its properties inside Session
Cache some of its properties inside a simple collection
Use a stub in your local environment.
Session can only be used client side. You can use a collection anywhere.
Session
client
example = function () {
if(!(this.aLotOfData = Session.get('aLotOfData'))) {
this.aLotOfData = computeALotOfData()
Session.set('aLotOfData', this.aLotOfData)
}
}
Here, no data has to be transferred between client and server. For every new client that connects, the code is rerun.
Collection
lib
MuchDataCollection = new Mongo.Collection('MuchDataCollection')
server
Meteor.publish('MuchData', function (query) {
return MuchDataCollection.find(query)
})
server
example = function () {
  // findOne returns undefined when the cache is empty, so check
  // the document before reading its data
  var doc = MuchDataCollection.findOne({ name: 'aLotOfData' })
  if (doc) {
    this.aLotOfData = doc.data
  } else {
    this.aLotOfData = computeALotOfData()
    MuchDataCollection.insert({
      name: 'aLotOfData',
      data: this.aLotOfData
    })
  }
}
Even though you can access the collection anywhere, you don't want anyone to be able to make changes to it, because all clients share the same collection. Collections are cached client side. Read this for more info.
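To lock it down, you could deny all client-side writes so that only trusted server code fills the cache - a minimal sketch:

// server (or lib): reject every write attempted from a client;
// server code bypasses allow/deny and can still insert
MuchDataCollection.deny({
  insert: function () { return true; },
  update: function () { return true; },
  remove: function () { return true; }
});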
Stub
A stub is probably the easiest to implement. But it's the worst solution. You'll probably have to use a settings variable and still end up having the code for the stub inside the production environment.
What to choose
It depends on your exact use case. If the contents of the object depend on the client or user, it's probably best to use a session var. If they don't, go for a collection. You'll probably need to build some cache-invalidation mechanisms, but I'd say it's worth it.