How to handle many to many relationships in mongo

How to handle many to many relationships in mongo - javascript

What is the best way to structure many-to-many models in a mongoose schema?
I have two models that have a many-to-many relationship with eachother. Users can belong to many organistaions and Organisations can have many users.
Options:
Define the relationship in each model by referencing the other model
Define the relationship in one model by referencing the other model
Option 1
const UserSchema = new Schema({
organiations: { type: Schema.Types.ObjectId, ref: "Organiation" }, // references organisation
})
mongoose.model("User", UserSchema)
const OrganiationSchema = new Schema({
users: { type: Schema.Types.ObjectId, ref: "User" }, // references users
})
mongoose.model("Organiation", OrganiationSchema)
This seems like a good idea at first and it means I can query the Organisation model to get all users and I can also query the user model to get all relative organisations.
The only problem with this is I have to maintain 2 sources of truth. If I create a organisation, I must update the user witht the orgs it belongs to and I must update the organisation with the users it has.
This leads me to option 2 which is to have one source of truth by only defining the relationship in one model.
option 2:
const UserSchema = new Schema({
organiations: { type: Schema.Types.ObjectId, ref: "Organiation" }, // references organistion
})
mongoose.model("User", UserSchema)
const OrganiationSchema = new Schema({}) // no referencces
mongoose.model("Organiation", OrganiationSchema)
This means when I create a new organisation I only need to update user with the organisations they belong to. There is no risk of the 2 sources getting out of sync. However, it does mean when it comes to querying data, it makes it more tricky. If I want to get all users that belong to an organisation, I have to query the user document. This would mean my organisation controller has to then be aware of both user and organisation models and as I start adding more relationships and models, I get tight coupling between all of these modules which I want to avoid.
How would you recommend handling many-to-many relationship in a mongoose schema?

There is no fixed solution to this.
If the organization can have orders of magnitudes more users than users can have organizations, option 2 might be a better solution.
Performance wise, populating the referenced data would be about the same as long as the referenced ids are indexed.
Having said that you might still go for option 1, even if you organization collection has the potential to have "huge" arrays. Especially if you want to make simple computation such as number of organizations users or use the "organiztion's current userIds to some other collection". In this case option 1 would be way better.
But if you opt of option 1 and if your array has the potential to become very large, consider bucket design pattern. Basically you limit the max length of your nested array. If the array reaches its max length, you make another document that holds newly added ids(or nested documents). Think of it as pagination.

Related

Database modelling (mongoDB)

I am trying to model a database for a fitness app. Currently the 4 main entities are as follows:
Exercise
User
Workout
UserWorkout
id
id
id
id
name
email
name
userId (fk)
body_part
name
description
workoutId (fk)
category
password
level
date
age
exerciseIds (fk)
time_taken
The app will have default workouts as well as default exercises.
I would like the user to be able to add their own custom workouts/exercises that only they can see (in addition to the default ones) but I'm not sure on how to best structure the data?

Kris, MongoDB is a schemaless database, what makes it really flexible when it comes down to data modelling. There are different ways of achieving what you described, the one I would recommend would be adding nested documents to the user document if they belong to it. You would have something like this:
User {
firstName: ...,
lastName: ...,
age: ...,
weight: ...,
exercises: [
// User's exercise objects
],
workout: [
// User's workout objects
],
}
This way you can easily have access to information related to the user and avoid using expensive operations like $lookup or querying the database multiple times.
To handle the default exercises/workouts you can have a property in the respective objects like isDefault: true.

How do I retrieve all records that are NOT in a many to many association using Sequelize

Sequelize gives you the ability to define a many to many association between tables which adds some extra functionality to a Model instance.
I have a Users table and I have defined a self-association on the table like so:
User.belongsToMany(models.User, { through: 'Friends', as: 'friends', foreignKey: 'userId' });
This gives the instance of the User model a couple of extra methods like user.getFriends(). So far so good.
What I want to do is to get all users who aren't friends of our instance. Something like user.getNonFriends(). Would that be possible using Sequelize?

A quick solution I can think of is, you could get the list of friends of the user A from the database. Using that result you can get the friends list that is not in the user's A list. Here is an example in code
const friends = user.getFriends();
const friendIds friends.map(friend => friend.id)
Friend.findAll({ where: {
id: { $notIn: [...friendIds] }
}
})

Sequelize dynamic seeding

I'm currently seeding data with Sequelize.js and using hard coded values for association IDs. This is not ideal because I really should be able to do this dynamically right? For example, associating users and profiles with a "has one" and "belongs to" association. I don't necessarily want to seed users with a hard coded profileId. I'd rather do that in the profiles seeds after I create profiles. Adding the profileId to a user dynamically once profiles have been created. Is this possible and the normal convention when working with Sequelize.js? Or is it more common to just hard code association IDs when seeding with Sequelize?
Perhaps I'm going about seeding wrong? Should I have a one-to-one number of seeds files with migrations files using Sequelize? In Rails, there is usually only 1 seeds file you have the option of breaking out into multiple files if you want.
In general, just looking for guidance and advice here. These are my files:
users.js
// User seeds
'use strict';
module.exports = {
up: function (queryInterface, Sequelize) {
/*
Add altering commands here.
Return a promise to correctly handle asynchronicity.
Example:
return queryInterface.bulkInsert('Person', [{
name: 'John Doe',
isBetaMember: false
}], {});
*/
var users = [];
for (let i = 0; i < 10; i++) {
users.push({
fname: "Foo",
lname: "Bar",
username: `foobar${i}`,
email: `foobar${i}#gmail.com`,
profileId: i + 1
});
}
return queryInterface.bulkInsert('Users', users);
},
down: function (queryInterface, Sequelize) {
/*
Add reverting commands here.
Return a promise to correctly handle asynchronicity.
Example:
return queryInterface.bulkDelete('Person', null, {});
*/
return queryInterface.bulkDelete('Users', null, {});
}
};
profiles.js
// Profile seeds
'use strict';
var models = require('./../models');
var User = models.User;
var Profile = models.Profile;
module.exports = {
up: function (queryInterface, Sequelize) {
/*
Add altering commands here.
Return a promise to correctly handle asynchronicity.
Example:
return queryInterface.bulkInsert('Person', [{
name: 'John Doe',
isBetaMember: false
}], {});
*/
var profiles = [];
var genders = ['m', 'f'];
for (let i = 0; i < 10; i++) {
profiles.push({
birthday: new Date(),
gender: genders[Math.round(Math.random())],
occupation: 'Dev',
description: 'Cool yo',
userId: i + 1
});
}
return queryInterface.bulkInsert('Profiles', profiles);
},
down: function (queryInterface, Sequelize) {
/*
Add reverting commands here.
Return a promise to correctly handle asynchronicity.
Example:
return queryInterface.bulkDelete('Person', null, {});
*/
return queryInterface.bulkDelete('Profiles', null, {});
}
};
As you can see I'm just using a hard coded for loop for both (not ideal).

WARNING: after working with sequelize for over a year, I've come to realize that my suggestion is a very bad practice. I'll explain at the bottom.
tl;dr:
never use seeders, only use migrations
never use your sequelize models in migrations, only write explicit SQL
My other suggestion still holds up that you use some "configuration" to drive the generation of seed data. (But that seed data should be inserted via migration.)
vv DO NOT DO THIS vv
Here's another pattern, which I prefer, because I believe it is more flexible and more readily understood. I offer it here as an alternative to the accepted answer (which seems fine to me, btw), in case others find it a better fit for their circumstances.
The strategy is to leverage the sqlz models you've already defined to fetch data that was created by other seeders, use that data to generate whatever new associations you want, and then use bulkInsert to insert the new rows.
In this example, I'm tracking a set of people and the cars they own. My models/tables:
Driver: a real person, who may own one or more real cars
Car: not a specific car, but a type of car that could be owned by someone (i.e. make + model)
DriverCar: a real car owned by a real person, with a color and a year they bought it
We will assume a previous seeder has stocked the database with all known Car types: that information is already available and we don't want to burden users with unnecessary data entry when we can bundle that data in the system. We will also assume there are already Driver rows in there, either through seeding or because the system is in-use.
The goal is to generate a whole bunch of fake-but-plausible DriverCar relationships from those two data sources, in an automated way.
const {
Driver,
Car
} = require('models')
module.exports = {
up: async (queryInterface, Sequelize) => {
// fetch base entities that were created by previous seeders
// these will be used to create seed relationships
const [ drivers , cars ] = await Promise.all([
Driver.findAll({ /* limit ? */ order: Sequelize.fn( 'RANDOM' ) }),
Car.findAll({ /* limit ? */ order: Sequelize.fn( 'RANDOM' ) })
])
const fakeDriverCars = Array(30).fill().map((_, i) => {
// create new tuples that reference drivers & cars,
// and which reflect the schema of the DriverCar table
})
return queryInterface.bulkInsert( 'DriverCar', fakeDriverCars );
},
down: (queryInterface, Sequelize) => {
return queryInterface.bulkDelete('DriverCar');
}
}
That's a partial implementation. However, it omits some key details, because there are a million ways to skin that cat. Those pieces can all be gathered under the heading "configuration," and we should talk about it now.
When you generate seed data, you usually have requirements like:
I want to create at least a hundred of them, or
I want their properties determined randomly from an acceptable set, or
I want to create a web of relationships shaped exactly like this
You could try to hard-code that stuff into your algorithm, but that's the hard way. What I like to do is declare "configuration" at the top of the seeder, to capture the skeleton of the desired seed data. Then, within the tuple-generation function, I use that config to procedurally generate real rows. That configuration can obviously be expressed however you like. I try to put it all into a single CONFIG object so it all stays together and so I can easily locate all the references within the seeder implementation.
Your configuration will probably imply reasonable limit values for your findAll calls. It will also probably specify all the factors that should be used to calculate the number of seed rows to generate (either by explicitly stating quantity: 30, or through a combinatoric algorithm).
As food for thought, here is an example of a very simple config that I used with this DriverCar system to ensure that I had 2 drivers who each owned one overlapping car (with the specific cars to be chosen randomly at runtime):
const CONFIG = {
ownership: [
[ 'a', 'b', 'c', 'd' ], // driver 1 linked to cars a, b, c, and d
[ 'b' ], // driver 2 linked to car b
[ 'b', 'b' ] // driver 3 has two of the same kind of car
]
};
I actually used those letters, too. At runtime, the seeder implementation would determine that only 3 unique Driver rows and 4 unique Car rows were needed, and apply limit: 3 to Driver.findAll, and limit: 4 to Car.findAll. Then it would assign a real, randomly-chosen Car instance to each unique string. Finally, when generating association tuples, it uses the string to look up the chosen Car from which to pull foreign keys and other values.
There are undoubtedly fancier ways of specifying a template for seed data. Skin that cat however you like. Hopefully this makes it clear how you'd marry your chosen algorithm to your actual sqlz implementation to generate coherent seed data.
Why the above is bad
If you use your sequelize models in migration or seeder files, you will inevitably create a situation in which the application will not build successfully from a clean slate.
How to avoid madness:
Never use seeders, only use migrations
(Anything you can do in a seeder, you can do in a migration. Bear that in mind as I enumerate the problems with seeders, because that means none of these problems gain you anything.)
By default, sequelize does not keep records of which seeders have been run. Yes, you can configure it to keep records, but if the app has already been deployed without that setting, then when you deploy your app with the new setting, it'll still re-run all your seeders one last time. If that's not safe, your app will blow up. My experience is that seed data can't and shouldn't be duplicated: if it doesn't immediately violate uniqueness constraints, it'll create duplicate rows.
Running seeders is a separate command, which you then need to integrate into your startup scripts. It's easy for that to lead to a proliferation of npm scripts that make app startup harder to follow. In one project, I converted the only 2 seeders into migrations, and reduced the number of startup-related npm scripts from 13 to 5.
It's been hard to pin down, but it can be hard to make sense of the order in which seeders are run. Remember also that the commands are separate for running migrations and seeders, which means you can't interleave them efficiently. You'll have to run all migrations first, then run all seeders. As the database changes over time, you'll run into the problem I describe next:
Never use your sequelize models in your migrations
When you use a sequelize model to fetch records, it explicitly fetches every column it knows about. So, imagine a migration sequence like this:
M1: create tables Car & Driver
M2: use Car & Driver models to generate seed data
That will work. Fast-forward to a date when you add a new column to Car (say, isElectric). That involves: (1) creating a migraiton to add the column, and (2) declaring the new column on the sequelize model. Now your migration process looks like this:
M1: create tables Car & Driver
M2: use Car & Driver models to generate seed data
M3: add isElectric to Car
The problem is that your sequelize models always reflect the final schema, without acknowledging the fact that the actual database is built by ordered accretion of mutations. So, in our example, M2 will fail because any built-in selection method (e.g. Car.findOne) will execute a SQL query like:
SELECT
"Car"."make" AS "Car.make",
"Car"."isElectric" AS "Car.isElectric"
FROM
"Car"
Your DB will throw because Car doesn't have an isElectric column when M2 executes.
The problem won't occur in environments that are only one migration behind, but you're boned if you hire a new developer or nuke the database on your local workstation and build the app from scratch.

Instead of using different seeds for Users and Profiles you could seed them together in one file using sequelizes create-with-association feature.
And additionaly, when using a series of create() you must wrap those in a Promise.all(), because the seeding interface expects a Promise as return value.
up: function (queryInterface, Sequelize) {
return Promise.all([
models.Profile.create({
data: 'profile stuff',
users: [{
name: "name",
...
}, {
name: 'another user',
...
}]}, {
include: [ model.users]
}
),
models.Profile.create({
data: 'another profile',
users: [{
name: "more users",
...
}, {
name: 'another user',
...
}]}, {
include: [ model.users]
}
)
])
}
Not sure if this is really the best solution, but thats how I got around maintaining foreign keys myself in seeding files.

Object metadata (schema) design in mongodb

First of all excuse me since I don't know how it is called in computer since:
For each of my document types in my mongo app I want to define a structure, with every field defined with its constraints, validation patterns and, generally, roles that can view modify and delete this document.
For example: Book:
{
name: "Book",
viewRoles: ["Admin","User"],
createRoles: ["Admin"],
modifyRoles: ["Admin", "User"],
fields: [
{
id:"title",
name:"Book Title",
validation: "",
maxLength: 50,
minLength: 3,
required: true
},
{
id:"authorEmail",
name:"Email of the Author",
validation: "email",
maxLength: 50,
minLength: 3,
required: false
}
]
}
Then if I have this "schema" for all of my documents, I can have one view for creating modifying and showing this "entities".
I also want to have the ability to create new document types, modify their fields through admin panel of my application.
When I google "mongo dynamic schema", "mongo document meta design" I get useless information.
My question is how it is called -- when I want to have predefined schema of my documents and have the ability to modify it. Where I can get more information about how to design such systems?

Since you tagged this as having a Meteor connection, I'll point you to Simple Schema: https://github.com/aldeed/meteor-simple-schema/. I use it, along with the related collection2 package. I find it's a nice way to document and enforce schema design. When used with the autoform package, it also provides a way to create validated forms directly from your schema.

I think you are looking for how to model your data. The below link might be helpful:
http://docs.mongodb.org/manual/data-modeling/
I also want to have the ability to create new document types, modify
their fields through admin panel of my application.
For Administrative activities you may look into the options given in:
http://docs.mongodb.org/ecosystem/tools/administration-interfaces/
And once you are done, you might want to read this as a kick off:
https://blog.serverdensity.com/mongodb-schema-design-pitfalls/
In Mongo DB you don't create collections. You just start using them. So you can't define schemas before hand. The collection is created on the first insert you make to the collection. Just make sure to ensure Index on the collection before inserting documents into it:
db.collection.ensureIndex({keyField: 1})
So it all depends on maintaining the structure of the documents inserted to the collection rather than defining the collection.

Self-Join with Ember-Data

Does anyone have any suggestions on how to manually create a self-join relationship using ember-data?
If, for example, a user had many followers (other users), what would be the simplest way to build this data structure into ember-data?

Best way that we could find without going crazy was to proxy the self-join relationship with the relationship object, then just map that to the user.
So if a user has many "users" through follows then you can do:
App.User = DS.Model.extend
name: DS.attr('string')
follows: DS.hasMany('App.Follow')
followers:(->
#get('follows').map((data)-> App.User.find(data.get('followedUserId')))
).property('follows.#each')
App.Follow = Ds.Model.extend
user: DS.belongsTo('App.User')
followedUserId: DS.attr('string')
Hope that helps!

We Keep Coding

JavaScript is the programming language of the Web.