I have a GET all products endpoint which is taking an extremely long time to return responses:
Product.find(find, function (err, _products) {
  if (err) {
    res.status(400).json({ error: err })
    return
  }
  res.json({ data: _products })
}).sort([['_id', -1]]).populate([
  { path: 'colors', model: 'Color' },
  { path: 'size', model: 'Size' },
  { path: 'price', model: 'Price' }
]).lean()
This query is taking up to 4 seconds, despite there only being 60 documents in the products collection.
This query came from a previous developer, and I'm not so familiar with Mongoose.
What are the performance consequences of sort and populate? I assume populate is to blame here? I am not really sure what populate is doing, so I'm unclear how to either avoid it or index at a DB level to improve performance.
From the Mongoose docs, "Population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s)"
So your ObjectId reference on your model gets replaced by an entire Mongoose document. Doing this on multiple paths in one query will therefore slow down your app. If you want to keep the same code structure, you can use select to specify which fields of the populated document should be returned, e.g. { path: 'colors', model: 'Color', select: 'name' }. Instead of returning all the data of the Color document, you just get the name.
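For example, here is a minimal sketch of the same query with select applied to each populated path, so only the fields you actually render are fetched (the field names name, label and value are assumptions — substitute the real fields of your Color/Size/Price schemas):

Product.find(find)
  .sort({ _id: -1 })
  .populate([
    { path: 'colors', model: 'Color', select: 'name' },
    { path: 'size', model: 'Size', select: 'label' },
    { path: 'price', model: 'Price', select: 'value' }
  ])
  .lean()
  .exec(function (err, _products) {
    if (err) return res.status(400).json({ error: err });
    res.json({ data: _products });
  });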
You can also call cursor() to stream query results from MongoDB:
var cursor = Person.find().cursor();

cursor.on('data', function (doc) {
  // Called once for every document
});

cursor.on('close', function () {
  // Called when done
});
You can read more about the cursor() function in the Mongoose documentation.
In general, try to only use populate for specific tasks like getting the name of a color for only one product.
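For instance, a sketch of that kind of targeted lookup (populating color names for a single product rather than on every list request; productId is a placeholder):

Product.findById(productId)
  .populate({ path: 'colors', model: 'Color', select: 'name' })
  .lean()
  .exec(function (err, product) {
    // product.colors is now [{ name: '...' }, ...]
  });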
sort will not cause any major performance issues until you reach much larger collections; sorting on _id is cheap because MongoDB indexes it by default.
Hope it helps!
It seems I have misunderstood Sequelize's .hasMany() and .belongsTo() associations and how to use them in a service. I have two models:
const User = db.sequelize.define("user", {
  uid: { /*...*/ },
  createdQuestions: {
    type: db.DataTypes.ARRAY(db.DataTypes.UUID),
    unique: true,
    allowNull: true,
  },
});

const Question = db.sequelize.define("question", {
  qid: { /*...*/ },
  uid: {
    type: db.DataTypes.TEXT,
  },
});
Given that one user can have many questions and each question belongs to only one user, I have the following associations:
User.hasMany(Question, {
  sourceKey: "createdQuestions",
  foreignKey: "uid",
  constraints: false,
});

Question.belongsTo(User, {
  foreignKey: "uid",
  targetKey: "createdQuestions",
  constraints: false,
});
What I want to achieve is this: after creating a question, its qid should reside in the user object under createdQuestions, just as the uid resides in the question object under uid. What I thought Sequelize associations would do for me is save me from individually calling and updating the user object. Is there a corresponding method? What I have so far is:
const create_question = async (question_data) => {
  const question = { /*... question body containing uid and so forth*/ };
  return new Promise((resolve, rejected) => {
    Question.sync({ alter: true }).then(
      async () =>
        await db.sequelize
          .transaction(async (t) => {
            const created_question = await Question.create(question, {
              transaction: t,
            });
          })
          .then(() => resolve())
          .catch((e) => rejected(e))
    );
  });
};
This however only creates a question object but does not update the user. What am I missing here?
Modelling a One-to-many relationship in SQL
SQL vs NoSQL
In SQL, unlike in NoSQL, every attribute has a fixed data type with a fixed maximum size. You can see this in the SQL command for creating a new table:
CREATE TABLE teachers (
  name VARCHAR(32),
  department VARCHAR(64),
  age INTEGER
);
The reason behind this is to allow us to easily access any attribute in the database by knowing the length of each row. In our case, each row will need the space to store:
32 bytes (name) + 64 bytes (department) + 4 bytes (age) = 100 bytes
This is a very powerful feature of relational databases, as it reduces data retrieval to constant time: we know exactly where each piece of data is located in memory.
One-to-Many Relationship: Case Study
Now, let's say we want to create a one-to-many relation between classes and teachers, where a teacher can give many classes.
A first thought is to add a column to the teachers table holding a list of the classes each teacher gives. But this model is not possible for 2 main reasons:
It will make us lose our constant-time retrieval since we don't know the size of the list anymore
We fear that the amount of space given to the list attribute won't be enough for future data. Let's say we allocate space needed for 10 classes and we end up with a teacher giving 11 classes. This will push us to recreate our database to increase the column size.
Another way would be to repeat the teacher's row once per class they give. While this approach fixes the limited column size problem, we no longer have a single source of truth: the same data is duplicated and stored multiple times.
That's why, for this one-to-many relationship, we need to store the id of the teacher inside the classes table.
This way, we can still find all the classes a teacher teaches by running
SELECT *
FROM classes
WHERE teacher_id = ?  -- bind the teacher's id here
And we'll avoid all the problems discussed earlier.
Your relation is a one-to-many relation: one User can have multiple Questions. In SQL, this kind of relation is modelled by adding an attribute to Question called userId (or uid, as you did). In Sequelize, this is achieved through hasMany and belongsTo like this:
User.hasMany(Question)
Question.belongsTo(User, {
  foreignKey: 'userId',
  constraints: false
})
In other words, I don't think you need the createdQuestions attribute on User. Only one foreign key is needed to model the one-to-many relation.
Now, when creating a new question, you just need to add the userId this way
createNewQuestion = async (userId, title, body) => {
  const question = await Question.create({
    userId: userId, // or just userId
    title: title,   // or just title
    body: body      // or just body
  })
  return question
}
Remember, we do not store arrays in SQL. Even if we can find a way to do it, it is not what we need; there is always a better way.
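As a sketch of what the association gives you for free (assuming Sequelize 5+, where findByPk is available; createQuestion is one of the helper methods hasMany generates on User instances, and it fills in the foreign key for you):

const createQuestionForUser = async (userId, title, body) => {
  const user = await User.findByPk(userId);
  // createQuestion() is generated by User.hasMany(Question);
  // it sets question.userId on the new row automatically.
  return user.createQuestion({ title, body });
};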
I am writing code in Node.js/MongoDB and am encountering a particular issue which I was hoping to get help with:
I have various schemas defined within my Models and I note that for each of these MongoDB automatically populates a unique id, which is represented by the _id field.
However, I am interested in creating a customized ID, which is basically an integer that auto-increments. I have done so by utilizing the 'mongoose-auto-increment' npm package. For example, in my UserSchema:
UserSchema.plugin(passportLocalMongoose);

autoIncrement.initialize(mongoose.connection);
UserSchema.plugin(autoIncrement.plugin, {
  model: 'User', // should match the model name passed to mongoose.model()
  field: 'user_id',
  startAt: 1,
  incrementBy: 1
});

// Compile the model only after all plugins are registered.
module.exports = mongoose.model("User", UserSchema);
To speed up my application, I have a seeds.js file which loads a bunch of data on application initialization. To make this fully functional, I need a way to access my models and reference them from other models (for cases where there is a one-to-one or one-to-many relationship). Since the default MongoDB _id is extremely long and there is no way to get it unless I am actually on the HTML page and can use req.params.id, I have been trying to use MongoDB's findOne function to look records up by my custom id, without success.
For example:
var myDocument = User.findOne({user_id: {$type: 25}});
if (myDocument) {
  var myName = myDocument.user_id;
  console.log(myName);
}
However, the result is always 'undefined' even though I know there is a User model saved in my database with a user_id of 25.
Any help would be much appreciated :)
User.findOne({ user_id: 25 }).exec(function (err, record) {
  if (err) {
    console.log(err);
  } else {
    console.log(record);
  }
});
You need to understand the async nature of Node.js.
Mongoose queries execute asynchronously, so calling findOne without a callback just returns a Query object rather than the document — which is why myDocument.user_id is undefined here.
You need to handle the result in one of these ways:
use a callback
use a promise
use async/await (ES2017) — see the sketch after the callback example below
Try this:
User.findOne({user_id: 25}, function (err, myDocument) {
  // Note: {$type: 25} (as in the original code) matches a BSON *type*,
  // not the value 25 — query the value directly instead.
  if (myDocument) {
    var myName = myDocument.user_id;
    console.log(myName);
  } else {
    console.log(err);
  }
});
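Or, with async/await — a minimal sketch, assuming a Mongoose version where queries are thenable:

async function getUserName(userId) {
  const myDocument = await User.findOne({ user_id: userId }).exec();
  if (myDocument) {
    console.log(myDocument.user_id);
  }
  return myDocument;
}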
I"m loading products via an infinite scroll in chunks of 12 at a time.
At times, I may want to sort these by how many followers they have.
Below is how i'm tracking how many followers each product has.
Follows are in a separate collection, because of the 16mb data cap, and the amount of follows should be unlimited.
follow schema:
var FollowSchema = new mongoose.Schema({
  user: {
    type: mongoose.Schema.ObjectId,
    ref: 'User'
  },
  product: {
    type: mongoose.Schema.ObjectId,
    ref: 'Product'
  },
  timestamp: {
    type: Date,
    default: Date.now
  }
});
Schema for the product being followed:
var ProductSchema = new mongoose.Schema({
  name: {
    type: String,
    unique: true,
    required: true
  },
  followers: {
    type: Number,
    default: 0
  }
});
Whenever a user follows / unfollows a product, I run this function:
ProductSchema.statics.updateFollowers = function (productId, val) {
  return Product
    .findOneAndUpdateAsync({
      _id: productId
    }, {
      $inc: {
        'followers': val
      }
    }, {
      upsert: true,
      'new': true
    })
    .then(function (updatedProduct) {
      return updatedProduct;
    })
    .catch(function (err) {
      console.log('Product follower update err : ', err);
    })
};
My questions about this:
1: Is there a chance that the incremented followers value on the product could hit some sort of error, resulting in mismatched / inconsistent data?
2: Would it be better to write an aggregate to count followers for each Product, or would that be too expensive / slow?
Eventually, I'll probably rewrite this in a graphDB, as it seems better suited, but for now -- this is an exercise in mastering MongoDB.
1. If you increment after inserting or decrement after removing, there is a chance of ending up with inconsistent data. For example, the insertion succeeds but the increment fails.
2. Intuitively, aggregation is much more expensive than find in this case. I did a benchmark to prove it.
First, generate 1000 users, 1000 products and 10000 follows randomly. Then use this code to benchmark:
import timeit
from pymongo import MongoClient

db = MongoClient('mongodb://127.0.0.1/test', tz_aware=True).get_default_database()

def foo():
    result = list(db.products.find().sort('followers', -1).limit(12).skip(12))

def bar():
    result = list(db.follows.aggregate([
        {'$group': {'_id': '$product', 'followers': {'$sum': 1}}},
        {'$sort': {'followers': -1}},
        {'$skip': 12},
        {'$limit': 12}
    ]))

if __name__ == '__main__':
    t = timeit.timeit('foo()', 'from __main__ import foo', number=100)
    print('time: %f' % t)
    t = timeit.timeit('bar()', 'from __main__ import bar', number=100)
    print('time: %f' % t)
output:
time: 1.230138
time: 3.620147
Creating an index can speed up the find query:
db.products.createIndex({followers: 1})
time: 0.174761
time: 3.604628
And if you need attributes from the product, such as name, you need another O(n) query.
I guess that as the data scales up, the aggregation will become much slower. If needed, I can benchmark on larger data.
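If you are on MongoDB 3.2+, a $lookup stage can pull the product attributes into the same pipeline instead of a second query — a sketch in mongo shell syntax, using the collections from the benchmark above:

db.follows.aggregate([
  { $group: { _id: "$product", followers: { $sum: 1 } } },
  { $sort: { followers: -1 } },
  { $skip: 12 },
  { $limit: 12 },
  { $lookup: {
      from: "products",
      localField: "_id",      // the grouped product ObjectId
      foreignField: "_id",
      as: "product"
  } }
])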
For number 1, if the only operations on that field are incrementing and decrementing, I think you'd be okay. If you start replicating that data or using it in joins for some reason, you'd run the risk of inconsistent data.
For number 2, I'd recommend you run both scenarios in the mongo shell to test them out. You can also review the individual explain plans for both queries to get an idea of which one would perform better. I'm just guessing, but it seems like the update route would perform well.
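For example, in the mongo shell (a sketch; collection and field names follow the question):

// Plan for the pre-counted followers field
db.products.find().sort({ followers: -1 }).limit(12).explain("executionStats")

// Plan for the aggregation approach
db.follows.explain().aggregate([
  { $group: { _id: "$product", followers: { $sum: 1 } } },
  { $sort: { followers: -1 } },
  { $limit: 12 }
])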
Also, the amount of expected data makes a difference. It might initially perform well one way, but after a million records the other route might be the way to go. If you have a test environment, that'd be a good thing to check.
1) This relies on the application layer to enforce consistency, and as such there is going to be a chance that you end up with inconsistencies. The questions I would ask are: how important is consistency in this case, and how likely is it that there will be a large inconsistency? My thought is that being off by one follower isn't as important as making your infinite scroll load as fast as possible to improve the user's experience.
2) Probably worth looking at the performance, but if I had to guess I would say this approach is going to be too slow.
I'm using $pull to pull a subdocument within an array of a document.
Don't know if it matters but in my case the subdocuments contain _id so they are indexed.
Here are JSONs that describe the schemas:
user: {
  _id: String,
  items: [UserItem]
}

UserItem: {
  _id: String,
  score: Number
}
Now my problem is this: I am using $pull to remove certain UserItem's from a User.
var delta = {$pull: {}};
delta.$pull.items = {};
delta.$pull.items._id = {$in: ["mongoID1", "mongoID2" ...]};

User.findOneAndUpdate(query, delta, {multi: true}, function (err, data) {
  //whatever...
});
What I get in data here is the User object after the change, when what I wish to get is the items that were removed from the array (satellite data).
Can this be done with one call to Mongo, or do I have to make 2 calls: 1 find and 1 $pull?
Thanks for the help.
You really cannot do this, or at least there is nothing that is going to return the "actual" elements that were "pulled" from the array in any response, even with the newer WriteResponse objects available from the Bulk Operations API (which is kind of the way forward).
The only way you can really do this is by "knowing" the elements you are "intending" to "pull", and then comparing that to the "original" state of the document before it was modified. The basic MongoDB .findAndModify() method allows this, as do the mongoose wrappers .findByIdAndUpdate() and .findOneAndUpdate().
Basic usage premise:
var removing = [ "MongoId1", "MongoId2" ];

Model.findOneAndUpdate(
  query,
  { "$pull": { "items": { "_id": { "$in": removing } } } },
  { "new": false },
  function (err, doc) {
    // doc is the document *before* the update, so any item whose _id
    // is in the removing list is an element that was just pulled.
    var removed = doc.items.filter(function (item) {
      return removing.indexOf(item._id) !== -1;
    });
    if (removed.length > 0)
      console.log(removed);
  }
);
Or something along those lines. You basically "turn around" the default mongoose .findOneAndUpdate() ( same for the other methods ) behavior and ask for the "original" document before it was modified. If the elements you asked to "pull" were present in the array then you report them, or inspect / return true or whatever.
So the mongoose methods differ from the reference implementation by returning the "new" document by default. Turn this off, and then you can "compare".
Further notes: "multi" is not a valid option here; the method modifies "one" document by definition. Also, you state that the array sub-documents contain an _id. That field is added by "mongoose" by default, but those _id values in the array are "not indexed" unless you specifically define an index on that field. The only default index is the "primary document" _id.
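If you do query the embedded _id values and want them indexed, you can declare the index on the schema yourself — a sketch, assuming the parent schema object is named UserSchema:

// Index the _id field of the embedded items sub-documents.
UserSchema.index({ 'items._id': 1 });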
I'm linking the FB Graph API to Meteor so that I can retrieve a user's photos, and I'm having trouble setting the Meteor id to the Facebook id for each photo. Right now, when the function is called, it returns the same photo multiple times in the database, since Meteor assigns a new _id to each photo each time.
For example, one entry might look like this:
Object {_id: "cnMsxSkmMXTjnhwRX", id: "1015160259999999", from: Object, picture: "https://photoSmall.jpg", source: "https://photoBig.jpg"…}
And a second, after the call has been performed again, like this:
Object {_id: "acMegKenftmnaefSf", id: "1015160259999999", from: Object, picture: "https://photoSmall.jpg", source: "https://photoBig.jpg"…}
Thereby creating duplicate entries for the same photo in MongoDB.
The code I am using is below. I've tried a number of things to fix the code to no avail.
Meteor.methods({
  getUserData: function () {
    var fb = new Facebook(Meteor.user().services.facebook.accessToken);
    var data = fb.getUserData();
    _.forEach(data.data, function (photo) {
      Photos.insert(photo, function (err) {
        if (err) console.error(err);
      });
    });
  }
});
Thanks in advance!
Check if the photo exists prior to inserting it:
...
_.forEach(data.data, function (photo) {
  if (Photos.findOne({id: photo.id})) return;
  ...
Another option is to add a unique key index to the id field, or even to use the _id field to store the id value (be sure to use try/catch to ensure it doesn't cause an error on the second insert).
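A minimal sketch of that second approach: reuse the Facebook id as the document _id, so a repeat insert fails instead of duplicating (field names follow the Graph API response shown in the question):

_.forEach(data.data, function (photo) {
  try {
    photo._id = photo.id;  // make the Facebook id the primary key
    Photos.insert(photo);  // synchronous on the server; throws on a duplicate _id
  } catch (err) {
    // Duplicate _id: this photo is already stored, so skip it.
  }
});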
Wouldn't there be something different in each of these documents, given they have different ids?
You could also de-duplicate the photos with _.uniq before inserting them.