I have a Node / Mongoose / MongoDB project for which I use Selenium WebDriver for integration tests. Part of my test-case setup is to wipe the database before each test. I do this from the command line using the widely accepted method from Delete everything in a MongoDB database:
mongo [database] --eval "db.dropDatabase();"
The problem, however, is that after running this command Mongo no longer enforces any unique fields, such as the following one defined in my Mongoose schema:
new Mongoose.Schema({
  name: {
    type: String,
    required: true,
    unique: true
  }
})
If I restart mongod after the dropDatabase, the unique constraints are enforced again.
However, since my mongod is run by a separate process, restarting it as part of my test-case setup would be cumbersome, and I'm hoping unnecessary. Is this a bug, or am I missing something?
As you stated, your mongod (or multiple ones) is a separate process from your application, and as such your application has no knowledge of what happened when you dropped the database. While the collections, and indeed the database, will be created when accessed by default, this is not true of all the indexes you defined.
Index creation actually happens on application start-up, as is noted in this passage from the documentation:
When your application starts up, Mongoose automatically calls ensureIndex for each defined index in your schema. ...
It does not automatically happen each time a model is accessed, mostly because that would be excessive overhead: even if the index is already in place and the server does nothing, these are still additional calls that you don't want preceding every read, update, insert, or delete.
So in order to have this automatic behavior fire again, what you need to do is restart your application, and this process will be run again on connection. In the same way, when your application cannot connect to the mongod process, it retries the connection, and once reconnected the automatic code is run.
Alternatively, you can write a function to do this and expose it from an API, so you can tie it to any other external operation that does something like dropping the database:
// Assumes `async` and `mongoose` are already required, e.g.:
//   var async = require('async');
//   var mongoose = require('mongoose');

function reIndexAllModels(callback) {
  console.log("Re-Indexing all models");
  // Limit to 10 concurrent ensureIndexes operations
  async.eachLimit(
    mongoose.connection.modelNames(),
    10,
    function(name, done) {
      console.log("Indexing %s", name);
      mongoose.model(name).ensureIndexes(done);
    },
    function(err) {
      if (err) return callback(err);
      console.log("Re-Indexing done");
      callback();
    }
  );
}
That will cycle through all registered models and re-create all indexes that are assigned to them. Depending on the number of registered models, you probably don't want to run all of these operations in parallel, so to help a little here I use async.eachLimit to keep the concurrent operations to a manageable number.
Thanks to Neil Lunn for pointing out the pitfalls of dropDatabase and re-indexing. Ultimately, I went with the following command-line solution instead, removing the dropDatabase command.
mongo [database] --eval "db.getCollectionNames().forEach(function(n){db[n].remove()});"
Let's say I have a couchDB database called "products" and a frontend with a form.
Now if a user opens a document from this database in the form, I want to prevent other users from editing this specific document.
Usually pretty simple:
-> read document from couchDB
-> set a variable to true like: { edit : true }
-> save (merge) document to couchDB
-> if someone else tries to open the document, he will receive an error because of edit:true.
BUT, what if two users open the document at the exact same time?
The function would be called twice, and when the second user opens the document he would wrongly receive edit:false, because the first user hadn't had enough time to save his edit:true. So how do I prevent this behaviour?
First solution would be:
Build an array as a queue for database requests and don't allow parallel requests, so all requests are worked off one after another. But in my opinion this is a bad solution, because at some point the system would become incredibly slow.
Second solution:
Store the documentIDs of the currently edited documents in a local array in the script. This would work because it is not an asynchronous process, so the second user would receive his error immediately.
So far so good. BUT, what if some day there are too many users and this system has to run in a cluster (the Node client server, not the database)? Now the second solution would no longer work, because every cluster worker would have its own array of documentIDs; sharing it would end up being another asynchronous task and result in the same problem as above.
Now I'm out of ideas. How do big clustered systems usually handle problems like this?
CouchDB uses MVCC to maintain consistency in your database. When a document is being updated, you must supply both the ID (_id) and revision number (_rev) otherwise your change will be rejected.
This means that if 2 clients read the document at revision 1 and both attempt to write a change using that same revision number, only the first will be accepted by the database. The 2nd client will receive an error, and it should fetch the latest revision of the document in order to proceed.
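To illustrate, here is a rough sketch against CouchDB's HTTP API. It assumes an environment where fetch is available and a local CouchDB with a products database; both are just for illustration.

const BASE = 'http://localhost:5984/products';

async function updateProduct(id, changes) {
  // Read the current document so we know its _rev
  const doc = await (await fetch(`${BASE}/${id}`)).json();

  // Write back, sending the _rev we just read
  const res = await fetch(`${BASE}/${id}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(Object.assign({}, doc, changes)),
  });

  if (res.status === 409) {
    // Someone else wrote first and our _rev is stale:
    // fetch the latest revision and decide how to merge or retry.
    throw new Error('conflict: re-fetch the document and retry');
  }
  return res.json();
}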
In a single-node environment, this model prevents conflicts outright. However, in cases where replication is occurring, it is still possible to get conflicts, even when using MVCC. This is because conflicting revisions can technically be written to different nodes before they have been replicated to one another. In this case, CouchDB will record the conflict, and your application is responsible for resolving them.
CouchDB has stellar documentation, in particular they have an article all about conflicts and replication that I highly recommend for this subject.
I am using jimp (https://www.npmjs.com/package/jimp) in Meteor JS to generate an image server-side. In other words, I am 'calculating' the pixels of the image using a recursive algorithm. The algorithm takes quite some time to complete.
The issue I am having is that this seems to completely block the Meteor server. Users trying to visit the webpage while an image is being generated are forced to wait; the website is not rendered at all until the image is finished.
Is there any (meteor) way to run the heavy recursive algorithm in a thread or something so that it does not block the entire website?
Node (and consequently Meteor) runs your JavaScript on a single thread, so CPU-bound work blocks the event loop. In short, Node works really well when you are IO-bound, but as soon as you do anything compute-bound you need another approach.
As was suggested in the comments above, you'll need to offload this CPU-intensive activity to another process which could live on the same server (if you have multiple cores) or a different server.
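For example, a rough sketch of that idea using Node's built-in child_process module might look like this; the file name, message shapes, and runHeavyRecursiveAlgorithm are all made up for illustration.

// --- in the web app ---
var child = require('child_process').fork('generate-image.js');

child.send({ width: 512, height: 512 });        // kick off a job
child.on('message', function (result) {
  console.log('image ready at', result.path);   // respond to the user here
});

// --- in generate-image.js (the worker process) ---
process.on('message', function (params) {
  // runHeavyRecursiveAlgorithm is the hypothetical CPU-bound routine
  var path = runHeavyRecursiveAlgorithm(params);
  process.send({ path: path });
});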
We have a similar problem at Edthena, where we need to transcode a subset of our video files. For now I decided to use a meteor-based solution, because it was easy to set up. Here's what we did:
When new transcode jobs need to happen, we insert a "video job" document into the database.
On a separate server (we max out the full CPU when transcoding), we have an app which calls observe like this:
Meteor.startup(function () {
  // Listen for non-failed transcode jobs in creation order. Use a limit of 1 to
  // prevent multiple jobs of this type from running concurrently.
  var selector = {
    type: 'transcode',
    state: { $ne: 'failed' },
  };
  var options = {
    sort: { createdAt: 1 },
    limit: 1,
  };

  VideoJobs.find(selector, options).observe({
    added: function (videoJob) {
      transcode(videoJob);
    },
  });
});
As the comments indicate, this allows only one job to run at a time, which may or may not be what you want. It has the further limitation that you can only run it on one app instance (multiple instances calling observe would each try to process the same job). So it's a pretty simplistic job queue, but it may work for your purposes for a while.
As you scale, you could use a more robust mechanism for dequeuing and processing the tasks, like Amazon's SQS service. You can also explore other meteor-based solutions like job-collection.
I believe you're looking for Meteor.defer(yourFunction).
Relevant Kadira article: https://kadira.io/academy/meteor-performance-101/content/make-your-app-faster
Thanks for the comments and answers! It seems to be working now. I did what David suggested: I am running a second Meteor app on the same server, and this app handles generating the images. However, that app was still eating up all the processing power.
As a result, I set a slightly lower priority on the generating algorithm with the renice command on its PID (https://www.nixtutor.com/linux/changing-priority-on-linux-processes/). This works! Any time a user visits the website, the other (client-facing) Meteor application gains priority over the generating algorithm, and there is absolutely no delay anymore.
The only issue I have now is that whenever the server restarts I somehow have to rerun the (re)nice command.
Since I am using Meteor Up for deployment, both apps run as the same user with the same command: node main.js. I am currently trying to figure out how to run the nice command from within the Meteor Up startup script (located at /etc/init/.conf).
I just noticed that Meteor.call, the mechanism that keeps users from invoking a collection's insert, update, and remove methods directly, can itself still be invoked from the JavaScript console.
Here's the client part:
// client
...
Meteor.call('insertProduct', productInfo);
...
Here's the server part:
// server
Meteor.methods({
insertProduct: function( productInfo ){
Product.insert(...);
}
})
OK, I know people can't invoke Product.insert() directly from their JavaScript console.
But if they dig a little deeper, they'll find Meteor.call() in the client's JavaScript under the developer tools' Resources tab.
So now they can try to invoke Meteor.call from their console and guess what productInfo's properties should be.
So I wonder: how can we prevent this? Does Meteor.call do the job well enough, or am I missing something important?
Meteor.call is a global function, just like window.alert(). Unfortunately, there is nothing you can do to prevent a user from calling Meteor.call. However, you can validate both the schema and the actual data of what a user is sending. I'd recommend https://github.com/aldeed/meteor-simple-schema (aldeed:simple-schema as the Meteor package name) to ensure you don't get garbage data into your project.
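For instance, here is a minimal sketch of validating method arguments on the server. It uses Meteor's built-in check package rather than simple-schema (which offers richer validation along the same lines), and the field names are made up for illustration.

Meteor.methods({
  insertProduct: function (productInfo) {
    // Reject anything that doesn't match the expected shape
    check(productInfo, {
      name: String,
      price: Number,
    });
    Product.insert(productInfo);
  }
});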
As others pointed out, Meteor.call can certainly be used from the console. The subtle issue here is that a legitimate user of a Meteor app could still do bad things on the server. So even if one checks on the server that the user is legitimate, that by itself does not guarantee that the data is protected.
This is not an issue only with Meteor. I think all such apps need to protect against corruption of their data, even by legitimate users.
One way to protect against such corruption is by using an IIFE (Immediately Invoked Function Expression).
Wrap your module in an IIFE. Inside the closure, keep a private variable which stores a unique, one-time-use key (k1). That key needs to be placed there via another route, perhaps by ensuring that a collection observer gets fired in the client at startup. One can use other strategies here too; the idea is to squirrel the value of k1 in from the server and deposit it in a private variable.
Then, each time you invoke a Meteor.call from inside your code, pass k1 along as one of the parameters. The server in turn checks whether k1 is indeed valid for that browser connection.
As k1 is stored inside a private variable in the closure created by the IIFE, it would be quite difficult for someone at the browser console to determine its value. Hence, even though Meteor.call can indeed be invoked from the browser console, it would not cause any harm. This approach should be a good deterrent against data corruption.
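A rough sketch of that idea on the client might look like the following; the SessionKeys collection, the productForm template, and collectProductInfo are all hypothetical names.

(function () {
  var k1 = null; // private to this closure, not reachable from the console

  // Squirrel the one-time key in from the server, e.g. via an observer
  // on a published collection that fires at client startup.
  SessionKeys.find().observe({
    added: function (doc) { k1 = doc.key; },
  });

  // The app's own UI code calls the method from inside the closure,
  // passing k1 along; the server verifies k1 for this connection.
  Template.productForm.events({
    'submit form': function (event) {
      event.preventDefault();
      Meteor.call('insertProduct', collectProductInfo(event), k1);
    },
  });
})();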
As mentioned by @Faysal, you have several ways to ensure your calls are legit. An easy first step is to implement alanning:roles and do role checks from within your method, like the following:
Meteor.methods({
  methodName: function() {
    if (!Roles.userIsInRole(this.userId, 'admin')) {
      throw new Meteor.Error(403, 'not authorized');
    }
    // your code here
  }
});
This way, only admin users can call the method.
Note that you can also check this.connection from within the method to determine whether the call comes from the server (in which case this.connection is not set) or from a client.
Generally speaking, doing checks and data manipulations from your methods is a nice way to go. Allow/deny rules are nice to begin with, but become really hard to maintain when your collections get heavier and your edge cases expand.
You cannot block Meteor.call from the console, just like you can't block CollectionName.find().count() from the console. These are global functions in Meteor.
But there are simple steps you can take to secure your methods.
Use aldeed:simple-schema to set the types of data your collection can accept. This will allow you to set the specific keys that your collection takes as well as their types (string, boolean, array, object, integer). https://github.com/aldeed/meteor-simple-schema
Ensure that only logged-in users can update from your method, or set global allow/deny rules. https://www.meteor.com/tutorials/blaze/security-with-methods && https://www.discovermeteor.com/blog/allow-deny-a-security-primer/
Remove the insecure and autopublish packages.
The simple combo of schema and allow/deny should do you just fine.
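For completeness, here is a minimal sketch of allow rules on a collection; the ownerId field is made up for illustration, so adapt the checks to your own documents.

Product.allow({
  insert: function (userId, doc) {
    // only logged-in users may insert
    return !!userId;
  },
  update: function (userId, doc, fieldNames, modifier) {
    // only the owner may update
    return doc.ownerId === userId;
  },
  remove: function (userId, doc) {
    return doc.ownerId === userId;
  }
});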
As you know by now, you can't really block calls to Meteor.call from the JavaScript console. What I'd like to add to @Stephen's and @thatgibbyguy's answers is: be sure to check your user's role when adding documents to the collection. Simple-Schema will help you prevent inserting/updating garbage data in the collection, and the alanning:roles package makes your app more secure by controlling who has permission to write/read/update your collection documents.
Sometimes I have issues with Firebase when the user is on a slow mobile connection. When the user saves an entry to Firebase, I actually have to write to 3 different locations. Sometimes the first one works, but if the connection is slow the 2nd and 3rd may fail.
This leaves me with entries in the first location that I constantly need to clean up.
Is there a way to help prevent this from happening?
var newTikiID = ref.child("tikis").push(tiki, function(error) {
  if (!error) {
    console.log("new tiki created");
    var tikiID = newTikiID.key();
    saveToUser(tikiID);
    saveToGeoFire(tikiID, tiki.tikiAddress);
  } else {
    console.log("an error occurred during tiki save");
  }
});
There is no Firebase method to write to multiple paths at once. Some tools planned by the team (e.g. Triggers) may resolve this in the future.
This topic has been explored before and the firebase-multi-write README contains a lot of discussion on the topic. The repo also has a partial solution to client-only atomic writes. However, there is no perfect solution without a server process.
It's important to evaluate your use case and see if this really matters. If the second and third writes (the geo query data) fail, chances are there's really no consequence. Most likely it's essentially the same as if the first write had failed, or if all writes had failed: the entry simply won't appear in searches by geo location. Thus, the complexity of resolving this issue is probably just a time sink.
Of course, it does cost a few bytes of storage. If we're working with millions of records, that may matter. A simple solution for this scenario would be to run an audit report that detects broken links between the data and geofire tables and cleans up old data.
If an atomic operation is really necessary, such as gaming mechanics where fairness or cheating could be an issue, or where integrity is lost by having partial results, there are a couple options:
1) Master Record approach
Pick a master path (the one that must exist) and use security rules to ensure other records cannot be written, unless the master path exists.
".write": "root.child('maste_path').child(newData.child('master_record_id')).exists()"
2) Server-side script approach
Instead of writing the paths separately, use a queue strategy:
- Create a single event by writing it to a queue
- Have a server-side process monitor the queue and process the events
- The server-side process does the multiple writes and ensures they all succeed
- If any fail, the server-side process handles rollbacks or retries
By using the server-side queue, you remove the risk of a client going offline between writes. The server can safely survive restarts and retry events or failures when using the queue model.
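A rough sketch of that queue idea with the legacy Firebase JS SDK (matching the SDK style used in the question) could look like this; the queue path and the callback-style saveToUser / saveToGeoFire helpers are assumptions.

// server-side worker process
var queueRef = new Firebase('https://your-app.firebaseio.com/queue');

queueRef.on('child_added', function (snap) {
  var event = snap.val();
  saveToUser(event.tikiID, function (err1) {
    saveToGeoFire(event.tikiID, event.tikiAddress, function (err2) {
      if (!err1 && !err2) {
        snap.ref().remove(); // all writes succeeded: retire the event
      }
      // otherwise leave the event in the queue so it can be retried
    });
  });
});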
I had the same problem, and I ended up choosing to use a Conditional Request with the Firebase REST API in order to write data transactionally. See my question and answer: Firebase: How to update multiple nodes transactionally? Swift 3.
If you need to write concurrently (but not transactionally) to several paths, you can do that now as Firebase supports multi-path updates. https://firebase.google.com/docs/database/rest/save-data
https://firebase.googleblog.com/2015/09/introducing-multi-location-updates-and_86.html
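Here is a rough sketch of such a multi-path ("fan-out") update in the legacy SDK style used in the question; the paths, the userId variable, and the tikiLocations node are made up for illustration. All the listed locations are written atomically, so they either all succeed or all fail.

var newTikiID = ref.child("tikis").push().key(); // generate an id without writing yet

var updates = {};
updates["tikis/" + newTikiID] = tiki;
updates["users/" + userId + "/tikis/" + newTikiID] = true;
updates["tikiLocations/" + newTikiID] = tiki.tikiAddress;

ref.update(updates, function (error) {
  if (error) {
    console.log("an error occurred during tiki save", error);
  } else {
    console.log("new tiki created");
  }
});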
Having worked with NodeJS for some time now, I've been wondering about how node handles file operations internally.
Considering the following pseudo code:
initialize http server
on connection:
modify_some_file:
on success:
print "it worked"
Let's consider two users A & B that try to access the page nearly simultaneously. Let's further assume A is the first one to connect, then the following happens:
A connects
NodeJS initializes the file operation and tells the operating system to be notified once it is done
And here's what I'm wondering about: Let's say, the file operation isn't done yet and B connects, what does node do? How and when does it access the file when it is still in the process of "being modified"?
I hope my question is somewhat clear ;)
Looking forward to your answers!
AFAIK, Node won't care.
At least on Unix, it's perfectly legal to have multiple writers to the same file. Sometimes that's not a problem (say your file consists of fixed-size records, where writer #1 writes to record X and writer #2 writes to record Y, with X !== Y), and sometimes it is (same example: when both writers want to write to record X).
In Node, the problems are mitigated because I/O operations "take turns", but I think there's still potential for two writers to get in each other's way. It's up to the programmer to make sure that doesn't happen.
With Node, you could use the *Sync() versions of the fs operations (but those will block your app during the operation), use append mode (which is only atomic up to certain write sizes, I think, and it depends on your requirements whether appending is actually useful), use some form of locking, or use something like a queue where write operations are put onto the queue and a single queue consumer handles the writes.
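As an illustration of that last option, here is a minimal sketch of serializing writes to a single file through an in-process queue (the file name is arbitrary):

var fs = require('fs');

var queue = [];
var busy = false;

function queueWrite(data) {
  queue.push(data);
  drain();
}

function drain() {
  if (busy || queue.length === 0) return;
  busy = true;
  fs.appendFile('data.log', queue.shift(), function (err) {
    if (err) console.error('write failed', err);
    busy = false;
    drain(); // move on to the next queued write
  });
}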