mongodb - get first X results and overall count of aggregate function - javascript

I have an aggregate query I'm using to get the first 10 results of a lookup between 2 collections.
I'm only getting the first 10 because, with no limit, getting 50 results makes the query slow (4-5 secs) (any suggestions on that would also be great).
Because I'm doing some kind of scan, I need to let the client know the total number of results so it can query for more when needed. Currently I'm running the pipeline twice, and I'm sure that is not ideal.
const grades = database.collection('grades');
const match = { userId };
const aggregationPipeline = [
  { $match: match },
  { $addFields: { userIdObj: { $toObjectId: '$userId' } } },
  {
    $lookup: {
      from: 'users',
      localField: 'userIdObj',
      foreignField: '_id',
      as: 'userDetails',
    },
  },
];
// apply the limit as a pipeline stage rather than on the aggregation cursor
const aggCursor = grades.aggregate([...aggregationPipeline, { $limit: 10 }]);
const aggCursorCount = grades.aggregate([...aggregationPipeline, { $count: 'count' }]);
const count = await aggCursorCount.toArray();
const allValues = await aggCursor.toArray();
// guard against an empty result set before reading count[0].count
res.json({ grades: allValues, count: count[0]?.count ?? 0 });
I'm sure there is a more efficient way to get what I need. Still learning all the MongoDB stuff.
Thanks!
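One idea I'm considering is folding both results into a single $facet round trip (a rough sketch reusing the aggregationPipeline above; I haven't benchmarked it):
const [result] = await grades.aggregate([
  ...aggregationPipeline,
  {
    $facet: {
      grades: [{ $limit: 10 }],      // first 10 documents after the lookup
      count: [{ $count: 'count' }],  // total matching count in the same pass
    },
  },
]).toArray();
res.json({ grades: result.grades, count: result.count[0]?.count ?? 0 });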

Mongodb addFields from an async/await function

I am trying to aggregate rooms data and then add new fields to it. However, the forEach callbacks are asynchronous, so the value of rooms is returned before they complete. How should I address this issue? console.log prints:
a, rooms
b, rooms
(notification_count value)
Below is the code that I am trying to use:
let rooms = await this.aggregate([
  {
    $match: {
      userIds: { $all: [userId] },
    },
  },
  {
    $lookup: {
      from: "users",
      localField: "userIds",
      foreignField: "_id",
      as: "userProfiles",
    },
  },
]).sort({ updatedAt: -1 });
console.log("a", rooms);
rooms.forEach(async (room) => {
const notification_count =
await ChatMessageModel.getUnreadMessagesCountByRoomId(
room._id,
userId,
"is_read"
);
room["notification_count"] = notification_count;
console.log(notification_count);
});
console.log("b", rooms);
UPDATE
Using a for loop instead of forEach worked with my code workflow, so I can use the code as is for now. However, if there is a way to move this directly into the aggregate pipeline ($addFields), that would be great; see the sketch below. getUnreadMessagesCountByRoomId() only returns an int.
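If getUnreadMessagesCountByRoomId() is itself just a query against a chat-messages collection, the count can be folded into the pipeline with a correlated $lookup (MongoDB 3.6+) plus $addFields. A sketch only; the chatmessages collection name and the room_id/is_read field names are guesses based on the code above:
let rooms = await this.aggregate([
  { $match: { userIds: { $all: [userId] } } },
  {
    $lookup: {
      from: "users",
      localField: "userIds",
      foreignField: "_id",
      as: "userProfiles",
    },
  },
  {
    // hypothetical collection/field names; adjust to the real schema
    $lookup: {
      from: "chatmessages",
      let: { roomId: "$_id" },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ["$room_id", "$$roomId"] },
                { $eq: ["$is_read", false] },
              ],
            },
          },
        },
        { $count: "count" },
      ],
      as: "unread",
    },
  },
  {
    $addFields: {
      notification_count: {
        $ifNull: [{ $arrayElemAt: ["$unread.count", 0] }, 0],
      },
    },
  },
  { $project: { unread: 0 } },
  { $sort: { updatedAt: -1 } },
]);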

Bulk patch operation @azure/cosmos javascript sdk

I have a container of cost guides in my Azure Cosmos DB (using the core SQL API). Each cost guide has an array of materials. I need to add a material to this array in every document in the container. Is this possible with JavaScript in a single transaction? I am familiar with partially updating documents individually using the patch operation, but I would prefer to do it all at once if possible. I'm using the @azure/cosmos package, version 3.15.
This is how I update individual documents in my function app:
const CosmosClient = require('@azure/cosmos').CosmosClient;
const config = require('../config/config');
const { endpoint, key, databaseId } = config;
const client = new CosmosClient({ endpoint, key });
const database = client.database(databaseId);
module.exports = async function (context, req) {
  const containerId = req.query.containerId;
  const container = database.container(containerId);
  const id = req.query.id;
  const updates = req.body;
  // parameterized query (safer than interpolating the id into the SQL string)
  const querySpec = {
    query: 'SELECT * FROM c WHERE c.id = @id',
    parameters: [{ name: '@id', value: id }],
  };
  const { resources: items } = await container.items
    .query(querySpec)
    .fetchAll();
  // build one replace operation per field in the request body
  const patchOps = Object.keys(updates).map((field) => ({
    op: 'replace',
    path: `/${field}`,
    value: updates[field],
  }));
  const { resource: patchSource } = await container
    .item(items[0].id, items[0].id)
    .patch(patchOps);
};
As of now, no such single-transaction operation is available in the JavaScript SDK. Other options, like the .NET SDK's bulk executor, are used for similar requirements.
Other languages like Java and Python have partial implementations. Terraform can also help a bit with a partial implementation.
https://learn.microsoft.com/en-us/azure/cosmos-db/sql/sql-api-sdk-bulk-executor-dot-net
The closest I have seen is using the bulk or batch operation on items within a container. For example:
const container = database.container(containerId); // container from the database defined above
const operations = [
  {
    operationType: "Create",
    resourceBody: { id: "doc1", name: "sample", key: "A" }
  },
  {
    operationType: "Upsert",
    partitionKey: "A",
    resourceBody: { id: "doc2", name: "other", key: "A" }
  }
];
// batch is transactional, but every operation must target the same partition key
await container.items.batch(operations, "A");
Link to the Azure documentation: https://learn.microsoft.com/en-us/javascript/api/@azure/cosmos/items?view=azure-node-latest#azure-cosmos-items-batch
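For the original ask (appending one material to every document), a non-transactional option is to read the document ids and send bulk Patch operations. A sketch only; it assumes the installed SDK version supports the Patch operation type in bulk, and that /id is the container's partition key (both need checking), and the container and material names are hypothetical:
const container = database.container('costGuides'); // hypothetical container name
// read the id of every document in the container
const { resources: docs } = await container.items
  .query('SELECT c.id FROM c')
  .fetchAll();
const newMaterial = { name: 'example material' }; // hypothetical payload
const operations = docs.map((doc) => ({
  operationType: 'Patch',
  id: doc.id,
  partitionKey: doc.id, // assumes /id is the partition key
  resourceBody: {
    // 'add' with the '-' index appends to the materials array
    operations: [{ op: 'add', path: '/materials/-', value: newMaterial }],
  },
}));
// bulk accepts at most 100 operations per call and is NOT a single transaction
for (let i = 0; i < operations.length; i += 100) {
  await container.items.bulk(operations.slice(i, i + 100));
}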

How to populate an array inside a map function in js and send it to the server?

This is my ObjectIds array -
obj_ids = [
  "5ee71cc94be8d0180c1b63db",
  "5ee71c884be8d0180c1b63d9",
  "5ee71c494be8d0180c1b63d6",
  "5ee71bfd4be8d0180c1b63d4"
]
I am using these ObjectIds to search whether they exist in the db or not, and based on that I want to send the response.
This is the code I am trying, but I don't know how to populate the array and send it to the server.
var msg = [];
obj_ids.map((ele) => {
  // Lead.find is asynchronous, so these callbacks run after res.json(msg) below
  // (note: ele is already an id string, so ele._id is undefined)
  Lead.find({ _id: ele._id }, async function (error, docs) {
    if (docs.length) {
      msg.push(
        `Lead already exists for Lead id - ${ele._id} assigned to ${docs[0].salesPerson}`
      );
    } else {
      msg.push(`Lead doesn't exist for Lead id: ${ele._id}`);
      const newDuty = new AssignedDuty({
        duty: ele._id,
        salesPerson: req.body.salesPerson,
      });
      await newDuty.save();
    }
  });
});
res.json(msg);
With this approach I am getting an empty array. I cannot put res.json(msg) inside the loop. If it is possible using async-await, please guide me through it.
You don't need to make multiple queries to find whether the given ObjectIds exist in the database.
Using the $in operator, you can make one query that will return all the documents where _id is equal to one of the ObjectIds in the list.
const docs = await Lead.find({
  _id: {
    $in: [
      "5ee71cc94be8d0180c1b63db",
      "5ee71c884be8d0180c1b63d9",
      "5ee71c494be8d0180c1b63d6",
      "5ee71bfd4be8d0180c1b63d4"
    ]
  }
});
After this query, you can check which ObjectId is present in the docs array and which is absent.
For details on the $in operator, see the $in comparison operator.
Your code can be simplified as shown below:
const obj_ids = [
  "5ee71cc94be8d0180c1b63db",
  "5ee71c884be8d0180c1b63d9",
  "5ee71c494be8d0180c1b63d6",
  "5ee71bfd4be8d0180c1b63d4"
];
const docs = await Lead.find({
  _id: { $in: obj_ids }
});
const msg = [];
// use for...of instead of forEach so the awaits actually
// complete before the response is sent
for (const id of obj_ids) {
  // _id is an ObjectId, so compare its string form
  const doc = docs.find(d => String(d._id) === id);
  if (doc) {
    msg.push(
      `Lead already exists for Lead id - ${doc._id} assigned to ${doc.salesPerson}`
    );
  } else {
    msg.push(`Lead doesn't exist for Lead id: ${id}`);
    const newDuty = new AssignedDuty({
      duty: id,
      salesPerson: req.body.salesPerson
    });
    await newDuty.save();
  }
}
res.json(msg);
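If the inserts are independent of one another, they can also run concurrently; a sketch of the same logic with Promise.all (same models and variables as above):
const results = await Promise.all(
  obj_ids.map(async (id) => {
    const doc = docs.find(d => String(d._id) === id);
    if (doc) {
      return `Lead already exists for Lead id - ${doc._id} assigned to ${doc.salesPerson}`;
    }
    // insert the missing duty, then report it
    await new AssignedDuty({
      duty: id,
      salesPerson: req.body.salesPerson,
    }).save();
    return `Lead doesn't exist for Lead id: ${id}`;
  })
);
res.json(results);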

MongoDB select random 5 documents via find query

I need to find 5 random documents in MongoDB using the find function. I am using the LoopBack 4 framework. I already tried to use sample (it is in the comments below):
const userParties: IndividualParty[] = (await find(
  this.logger,
  {
    where: {
      and: [
        { _id: { nin: ids.map(id => id) } },
        { gender: { inq: gender } },
      ],
    },
    //sample: { size: 5 },
    //limit: 5,
  } as Filter<IndividualParty>,
  this.partyRepository,
)) as IndividualParty[];
I'm not familiar with LoopBack, but using pure Node and the MongoDB Node driver, here's the shortest example I can come up with:
var run = async function () {
  const conn = await require('mongodb').MongoClient.connect('mongodb://localhost:27017', { useNewUrlParser: true })
  let agg = [
    { '$match': { '_id': { '$gte': 50 } } },
    { '$sample': { 'size': 5 } }
  ]
  let res = await conn.db('test').collection('test').aggregate(agg).toArray()
  console.log(res)
  await conn.close()
}()
In a collection containing _id values from 0 to 99, this will randomly output 5 documents having an _id of 50 or greater. Example output:
[ { _id: 60 }, { _id: 77 }, { _id: 84 }, { _id: 96 }, { _id: 63 } ]
You would need to make the above example work with LoopBack, but the basic idea is there (see the sketch below).
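To stay inside LoopBack 4, one option is to reach the underlying MongoDB collection through the repository's connector and run the same pipeline there. A sketch only; connector access is not part of the public repository API, so the property path and model name below are assumptions:
const connector = this.partyRepository.dataSource.connector as any;
const collection = connector.collection('IndividualParty'); // model/collection name assumed
const parties = await collection
  .aggregate([
    { $match: { _id: { $nin: ids }, gender: { $in: gender } } },
    { $sample: { size: 5 } },
  ])
  .toArray();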
Note:
You need aggregation instead of find().
Have a read through the $sample documentation and note especially its behavior:
$sample uses one of two methods to obtain N random documents, depending on the size of the collection, the size of N, and $sample’s position in the pipeline.
The position of $sample in the pipeline is important. If you need to select a subset of the collection to do $sample on via a $match stage (as with the example above), then you will need to ensure that the subset to be sampled is within 16 MB (the limit of MongoDB in-memory sort).

How to count the total number of pages? [duplicate]

I am interested in optimizing a "pagination" solution I'm working on with MongoDB. My problem is straightforward. I usually limit the number of documents returned using the limit() functionality. This forces me to issue a redundant query without limit() so I can also capture the total number of documents and pass that to the client, letting them know they'll have to issue additional request(s) to retrieve the rest of the documents.
Is there a way to condense this into one query: get the total number of documents while at the same time only retrieving a subset using limit()? Is there a different way to think about this problem than I am approaching it?
MongoDB 3.4 introduced the $facet aggregation stage,
which processes multiple aggregation pipelines within a single stage
on the same set of input documents.
Using $facet and $group you can find documents with $limit and get the total count.
You can use the aggregation below in MongoDB 3.4:
db.collection.aggregate([
  { "$facet": {
    "totalData": [
      { "$match": { }},
      { "$skip": 10 },
      { "$limit": 10 }
    ],
    "totalCount": [
      { "$group": {
        "_id": null,
        "count": { "$sum": 1 }
      }}
    ]
  }}
])
You can also use the $count aggregation stage, introduced in MongoDB 3.6.
You can use the aggregation below in MongoDB 3.6:
db.collection.aggregate([
  { "$facet": {
    "totalData": [
      { "$match": { }},
      { "$skip": 10 },
      { "$limit": 10 }
    ],
    "totalCount": [
      { "$count": "count" }
    ]
  }}
])
No, there is no other way. Two queries - one for the count, one with limit(). Or you have to use a different database. Apache Solr, for instance, works the way you want: every query there is limited and returns totalCount.
MongoDB allows you to use cursor.count() even when you pass limit() or skip().
Let's say you have a db.collection with 10 items.
You can do:
async function getQuery() {
  let query = db.collection.find({}).skip(5).limit(5); // cursor over the last 5 items in db (find() returns the cursor synchronously)
  let countTotal = await query.count(); // returns 10 -- does not take `skip` or `limit` into consideration
  let countWithConstraints = await query.count(true); // returns 5 -- takes `skip` and `limit` into consideration
  return { query, countTotal, countWithConstraints };
}
Here's how to do this with MongoDB 3.4+ (with Mongoose) using $facet. This example returns a $count based on the documents after they have been matched.
const facetedPipeline = [
  // $match and $project must be separate stages
  { "$match": { "dateCreated": { $gte: new Date('2021-01-01') } } },
  { "$project": { 'exclude.some.field': 0 } },
  {
    "$facet": {
      "data": [
        { "$skip": 10 },
        { "$limit": 10 }
      ],
      "pagination": [
        { "$count": "total" }
      ]
    }
  }
];
const results = await Model.aggregate(facetedPipeline);
This pattern is useful for getting pagination information to return from a REST API.
Reference: MongoDB $facet
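If this runs inside an Express-style handler, the faceted result can be flattened before it is returned; a small sketch (response shape is just an example):
const [{ data, pagination }] = results;
res.json({ items: data, total: pagination[0]?.total ?? 0 });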
Times have changed, and I believe you can achieve what the OP is asking by using aggregation with $sort, $group and $project. For my system, I needed to also grab some user info from my users collection; hopefully this can answer any questions around that as well. Below is the aggregation pipeline. The last three stages (sort, group and project) handle getting the total count and then providing pagination capabilities.
db.posts.aggregate([
  { $match: { public: true } },
  { $lookup: {
    from: 'users',
    localField: 'userId',
    foreignField: 'userId',
    as: 'userInfo'
  } },
  { $project: {
    postId: 1,
    title: 1,
    description: 1,
    updated: 1,
    userInfo: {
      $let: {
        vars: {
          firstUser: {
            $arrayElemAt: ['$userInfo', 0]
          }
        },
        in: {
          username: '$$firstUser.username'
        }
      }
    }
  } },
  { $sort: { updated: -1 } },
  { $group: {
    _id: null,
    postCount: { $sum: 1 },
    posts: {
      $push: '$$ROOT'
    }
  } },
  { $project: {
    _id: 0,
    postCount: 1,
    posts: {
      $slice: [
        '$posts',
        currentPage ? (currentPage - 1) * RESULTS_PER_PAGE : 0,
        RESULTS_PER_PAGE
      ]
    }
  } }
])
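Note that $push: '$$ROOT' collects every matching post into a single grouped document, so this approach is bounded by MongoDB's 16 MB document limit; for large result sets the $facet approaches above scale better.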
There is a way in MongoDB 3.4: $facet.
You can do:
db.collection.aggregate([
  {
    $facet: {
      data: [{ $match: {} }],
      // each $facet output field must be a pipeline (an array of stages)
      total: [{ $count: 'total' }]
    }
  }
])
Then you will be able to run two aggregations at the same time.
By default, the count() method ignores the effects of the
cursor.skip() and cursor.limit() (MongoDB docs)
As the count method excludes the effects of limit and skip, you can use cursor.count() to get the total count
const cursor = database.collection(collectionName).find(query).skip(offset).limit(limit); // find() returns the cursor synchronously
return {
  data: await cursor.toArray(),
  count: await cursor.count() // gives the count of all matching documents, ignoring skip() and limit()
};
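In recent drivers cursor.count() is deprecated; an equivalent using countDocuments(), as a sketch with the same query/offset/limit variables as above:
const data = await database.collection(collectionName)
  .find(query).skip(offset).limit(limit).toArray();
const count = await database.collection(collectionName).countDocuments(query);
return { data, count };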
It all depends on the pagination experience you need as to whether or not you need to do two queries.
Do you need to list every single page, or even a range of pages? Does anyone even go to page 1051 - conceptually, what does that actually mean?
There's been lots of UX work on pagination patterns - Avoid the pains of pagination covers various types of pagination and their scenarios, and many don't need a count query to know if there's a next page. For example, if you display 10 items on a page and you limit to 13, you'll know if there's another page; the sketch below shows the idea.
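A minimal sketch of that over-fetch trick (collection and variable names are placeholders):
const pageSize = 10;
// request one document beyond the page size purely as a has-more signal
const docs = await db.collection('items')
  .find(query)
  .skip(offset)
  .limit(pageSize + 1)
  .toArray();
const hasNextPage = docs.length > pageSize;
const page = docs.slice(0, pageSize);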
MongoDB has since introduced countDocuments(), a method for getting only the count of the documents matching a given query, and it goes as follows:
const result = await db.collection('foo').countDocuments({ name: 'bar' });
console.log('result:', result); // prints the matching doc count
Recipe for usage in pagination:
const query = { name: 'bar' };
const skip = (pageNo - 1) * pageSize; // assuming pageNo starts from 1
const limit = pageSize;
const [listResult, countResult] = await Promise.all([
  db.collection('foo')
    .find(query)
    .skip(skip)
    .limit(limit)
    .toArray(), // without toArray() this resolves to a cursor, not the documents
  db.collection('foo').countDocuments(query)
]);
return {
  totalCount: countResult,
  list: listResult
};
For more details on db.collection.countDocuments, visit this page.
It is possible to get the total result size without the effect of limit() using count() as answered here:
Limiting results in MongoDB but still getting the full count?
According to the documentation you can even control whether limit/pagination is taken into account when calling count():
https://docs.mongodb.com/manual/reference/method/cursor.count/#cursor.count
Edit: in contrast to what is written elsewhere - the docs clearly state that "The operation does not perform the query but instead counts the results that would be returned by the query". Which - from my understanding - means that only one query is executed.
Example:
> db.createCollection("test")
{ "ok" : 1 }
> db.test.insert([{name: "first"}, {name: "second"}, {name: "third"},
{name: "forth"}, {name: "fifth"}])
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 5,
"nUpserted" : 0,
"nMatched" : 0,
"nModified" : 0,
"nRemoved" : 0,
"upserted" : [ ]
})
> db.test.find()
{ "_id" : ObjectId("58ff00918f5e60ff211521c5"), "name" : "first" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c6"), "name" : "second" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c7"), "name" : "third" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c8"), "name" : "forth" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c9"), "name" : "fifth" }
> db.test.count()
5
> var result = db.test.find().limit(3)
> result
{ "_id" : ObjectId("58ff00918f5e60ff211521c5"), "name" : "first" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c6"), "name" : "second" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c7"), "name" : "third" }
> result.count()
5 (total result size of the query without limit)
> result.count(1)
3 (result size with limit(3) taken into account)
Try as below:
cursor.count(false, function(err, total){ console.log("total", total) })
core.db.users.find(query, {}, { skip: 0, limit: 1 }, function (err, cursor) {
  if (err)
    return callback(err);
  cursor.toArray(function (err, items) {
    if (err)
      return callback(err);
    // count(false) ignores skip/limit, so this is the total match count
    cursor.count(false, function (err, total) {
      if (err)
        return callback(err);
      console.log("cursor", total);
      callback(null, { items: items, total: total });
    });
  });
});
A word of caution about using aggregation for pagination: it is better to use two queries if the API is called frequently by users to fetch data. In my experience, on a production server with many users accessing the system online, two queries were at least 50 times faster than getting the data with an aggregate. aggregate and $facet are better suited for dashboards, reports and cron jobs that are called less frequently.
We can do it using 2 queries.
const limit = parseInt(req.query.limit || 50, 10);
let page = parseInt(req.query.page || 0, 10);
if (page > 0) { page = page - 1; }
// skip whole pages (page * limit), not single documents
let doc = await req.db.collection('bookings').find().sort({ _id: -1 }).skip(page * limit).limit(limit).toArray();
let count = await req.db.collection('bookings').find().count();
res.json({ data: [...doc], count: count });
I took the two queries approach, and the following code has been taken straight out of a project I'm working on, using MongoDB Atlas and a full-text search index:
return new Promise(async (resolve, reject) => {
  try {
    const search = {
      $search: {
        index: 'assets',
        compound: {
          should: [{
            text: {
              query: args.phraseToSearch,
              path: [
                'title', 'note'
              ]
            }
          }]
        }
      }
    }
    const project = {
      $project: {
        _id: 0,
        id: '$_id',
        userId: 1,
        title: 1,
        note: 1,
        score: {
          $meta: 'searchScore'
        }
      }
    }
    const match = {
      $match: {
        userId: args.userId
      }
    }
    const skip = {
      $skip: args.skip
    }
    const limit = {
      $limit: args.first
    }
    const group = {
      $group: {
        _id: null,
        count: { $sum: 1 }
      }
    }
    const searchAllAssets = await Models.Assets.schema.aggregate([
      search, project, match, skip, limit
    ])
    const [ totalNumberOfAssets ] = await Models.Assets.schema.aggregate([
      search, project, match, group
    ])
    resolve({
      searchAllAssets: searchAllAssets,
      // guard against an empty result set
      totalNumberOfAssets: totalNumberOfAssets ? totalNumberOfAssets.count : 0
    })
  } catch (exception) {
    reject(new Error(exception))
  }
})
I had the same problem and came across this question. The correct solution to this problem is posted here.
You can do this in one query. First you run count(), and within its callback you run the limit() function.
In Node.js and Express.js, you will have to use it like this to be able to use the count function along with toArray's result:
var curFind = db.collection('tasks').find({query});
Then you can run two functions after it like this (one nested in the other):
curFind.count(function (e, count) {
  // use count here
  curFind.skip(0).limit(10).toArray(function (err, result) {
    // use result and count here
  });
});
