I have a collection that has the following documents:
{
_id: ObjectId("000000000000000000059734"),
locations: ["A", "B", "C"]
},
{
_id: ObjectId("000000000000000000059735"),
locations: ["A", "D", "K"]
},
{
_id: ObjectId("000000000000000000059736"),
locations: ["1", "3", "C"]
}
Now what I want is to count the total of documents based on the following array items:
let array = ['A', 'B', '1'];
my desired result is:
{
'A': 2,
'B': 1,
'1': 1
}
What I have tried:
db.getCollection('mycollection').aggregate([
{$group: {
"_id": {
"location": {
"A": { "$sum": { "$cond": [{ "$in": [ "A", "$locations" ] },1,0] } },
"B": { "$sum": { "$cond": [{ "$in": [ "B", "$locations" ] },1,0] } },
"1": { "$sum": { "$cond": [{ "$in": [ "1", "$locations" ] },1,0] } },
}
}
}}
])
But my query's result format is not what I want.
Thanks for any help and guidance.
If you have MongoDB 3.4.4 at least, then you can do something like this:
var array = ['A', 'B', '1'];
db.getCollection('mycollection').aggregate([
{ "$project": {
"locations": {
"$map": {
"input": {
"$filter": {
"input": "$locations",
"cond": { "$in": [ "$$this", array ] }
}
},
"in": { "k": "$$this", "v": 1 }
}
}
}},
{ "$unwind": "$locations" },
{ "$group": {
"_id": "$locations.k",
"v": { "$sum": "$locations.v" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": null,
"obj": { "$push": { "k": "$_id", "v": "$v" } }
}},
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$obj" }
}}
])
For an older version without things like $arrayToObject, you would transform the results "after" they are returned from the server, like this:
var array = ['A', 'B', '1'];
db.getCollection('mycollection').aggregate([
{ "$project": {
"locations": {
"$map": {
"input": {
"$filter": {
"input": "$locations",
"cond": {
// "$in": [ "$$this", array ]
"$or": array.map(a => ({ "$eq": [ "$$this", a ] }) )
}
}
},
"in": { "k": "$$this", "v": 1 }
}
}
}},
{ "$unwind": "$locations" },
{ "$group": {
"_id": "$locations.k",
"v": { "$sum": "$locations.v" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": null,
"obj": { "$push": { "k": "$_id", "v": "$v" } }
}},
/*
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$obj" }
}}
*/
]).map(d =>
d.obj.reduce((acc,curr) => Object.assign(acc,{ [curr.k]: curr.v }),{})
)
In either case, the very first stage is a $project with a $map, in order to look at each value in the document array and compare it to the comparison array. In fact we use $filter to return only the "matches", and the $map then attaches a value of 1 so each occurrence can be counted.
There are two basic approaches to the "filtering": either using $in for versions that support the operator, or using $or in older versions before it was introduced.
Frankly, we could simply use $setIntersection to get the matches, as long as your document data is "unique", in that no document array contains more than one occurrence of a value. So I'm playing safe here with $filter because I don't know your data. Choose whichever suits.
// If the "locations" content is meant to be "unique"
{ "$project": {
"locations": {
"$map": {
"input": {
"$setIntersection": [ "$locations", array ]
},
"in": { "k": "$$this", "v": 1 }
}
}
}},
Note the $map output in k and v property form. This will continue as a pattern through the rest of the pipeline.
Because you want to "aggregate" on the k values from the array items, we use $unwind so we can add these together across documents. Then feed that through a $group on the values of k, using $sum on each v to effectively "count" the occurrences.
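For the sample documents, the k/v output of that first stage, and the documents emitted by $unwind, would look roughly like this (illustration only):
// After the $project stage, e.g. for the first sample document:
{ "_id": ObjectId("000000000000000000059734"), "locations": [ { "k": "A", "v": 1 }, { "k": "B", "v": 1 } ] }
// After $unwind, one document per matched location:
{ "_id": ObjectId("000000000000000000059734"), "locations": { "k": "A", "v": 1 } }
{ "_id": ObjectId("000000000000000000059734"), "locations": { "k": "B", "v": 1 } }
{ "_id": ObjectId("000000000000000000059735"), "locations": { "k": "A", "v": 1 } }
{ "_id": ObjectId("000000000000000000059736"), "locations": { "k": "1", "v": 1 } }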
The $sort is completely optional; really you should not care about the order of keys in a single output document. Note the difference from your desired output, but that is simply because "1" sorts lexically before "A". You cannot fight that; it's just how string ordering works.
The next stage is simply a $group to a single document. Here we continue by reconstructing an "array" of objects containing k and v.
The reason for this is the final handling. Where you have a MongoDB with $arrayToObject support (actually included since 3.4.4, though the documentation claims 3.6), we simply provide the generated "array" as input to that operator inside a $replaceRoot stage in order to return the final output.
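To illustrate, $arrayToObject simply turns the accumulated k/v array into an object, roughly:
// Document entering $replaceRoot after the final $group:
{ "_id": null, "obj": [ { "k": "1", "v": 1 }, { "k": "A", "v": 2 }, { "k": "B", "v": 1 } ] }
// { "$arrayToObject": "$obj" } as the new root yields:
{ "1": 1, "A": 2, "B": 1 }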
Where you don't have that feature, you can process the cursor results ( here shown using the shell Cursor.map() ) and transform the document before further processing. Any iterator method will do, and most drivers have a Cursor.map(). Not that it really matters that much here, since the aggregation pipeline results in one document in this case.
The JavaScript way, which works in modern shell releases, is to simply apply a .reduce() over the array and transform it into the desired output object. It's basically the exact same operation the server does, just in client code.
Either form returns the desired result:
{
"1" : 1.0,
"A" : 2.0,
"B" : 1.0
}
db.mycollection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path: '$locations'
}
},
// Stage 2
{
$match: {
locations: {
$in: ['A', 'B', '1']
}
}
},
// Stage 3
{
$group: {
_id: '$locations',
total: {
$sum: 1
}
}
}
]
);
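Note that this simpler form returns one document per matched location rather than a single keyed object; for the sample data above the result would be along the lines of (in no particular order):
{ "_id" : "A", "total" : 2 }
{ "_id" : "B", "total" : 1 }
{ "_id" : "1", "total" : 1 }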
Related
Currently, I am struggling with how the MongoDB document system works. I want to fetch array elements by their auto-generated id, but I don't know how to fetch that specific data.
my current schema is
const ItemPricesSchema = new mongoose.Schema({
_id : {
type: String
},
ItemsPrices: {
type: [{
barcode : {
type: String
},
itemName : {
type: String
},
price : {
type: String
}
}]
}
});
current data is stored in this way
{
"_id": "sha#c.c",
"ItemsPrices": [
{
"barcode": "345345",
"itemName": "maggie",
"price": "45",
"_id": "620a971e11120abbde5f4c3a"
},
{
"barcode": "356345",
"itemName": "monster",
"price": "70",
"_id": "620a971e11120abbde5f4c3b"
}
],
"__v": 0
}
What I want to achieve is to find array elements by their ids.
If I want the specific array element with id "620a971e11120abbde5f4c3b", what should I do?
I have tried $unwind, $in, $match...
the result should be like
{
"_id": "sha#c.c",
"ItemsPrices": [
{
"barcode": "356345",
"itemName": "monster",
"price": "70",
"_id": "620a971e11120abbde5f4c3b"
}
],
"__v": 0
}
What I tried, following the answer, is this:
router.get('/filter/:id', async (req, res) => {
try {
const item = await ItemPricesSchema.aggregate([
{$project: {
"ItemsPrices": {
$filter: {
input: "$ItemsPrices",
as: "item",
cond: {
$eq: [
"$$item._id",
"620a8dd1c88ae3eb88a8107a"
]
}
}
}
}
}
])
res.json(item);
console.log(item);
} catch (error) {
res.status(500).json({message: error.message});
}
})
and it returns something like this (empty arrays):
[
{
"_id": "xvz#zyx.z",
"ItemsPrices": []
},
{
"_id": "zxc#xc.czx",
"ItemsPrices": []
},
{
"_id": "asd#asd.asd",
"ItemsPrices": []
},
{
"_id": "qwe#qwe.qwe",
"ItemsPrices": []
}
]
But if I search by price, $$item.price:
cond: {
$eq: [
"$$item.price",
"30"
]
}
it returns the perfect output
[
{
"_id": "xvz#zyx.z",
"ItemsPrices": []
},
{
"_id": "zxc#xc.czx",
"ItemsPrices": []
},
{
"_id": "asd#asd.asd",
"ItemsPrices": []
},
{
"_id": "qwe#qwe.qwe",
"ItemsPrices": [
{
"barcode":"234456345",
"price":"30",
"itemName":"monster",
"_id":"620a8dd1c88ae3eb88a8107a"
}
]
}
]
You can do an aggregation with $project and apply $filter on the array part. In Mongoose you can run the aggregation query in much the same way: https://mongoosejs.com/docs/api/aggregate.html
db.collection.aggregate([
{
$project: {
"ItemsPrices": {
$filter: {
input: "$ItemsPrices",
as: "item",
cond: {
$eq: [
"$$item._id",
mongoose.Types.ObjectId("620a971e11120abbde5f4c3b")
]
}
}
},
"__v": 1 // projecting a field with 1 means it appears in the final result
}
}
])
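In the route from the question, the fix is then just to cast the incoming string parameter to an ObjectId before comparing. A minimal sketch, assuming the same router and model set-up as in the question (mongoose imported, ItemPricesSchema being the compiled model):
router.get('/filter/:id', async (req, res) => {
    try {
        const item = await ItemPricesSchema.aggregate([
            {$project: {
                "ItemsPrices": {
                    $filter: {
                        input: "$ItemsPrices",
                        as: "item",
                        cond: {
                            // cast the URL string to an ObjectId so $eq can match the stored ObjectId values
                            $eq: [
                                "$$item._id",
                                mongoose.Types.ObjectId(req.params.id)
                            ]
                        }
                    }
                }
            }}
        ])
        res.json(item);
    } catch (error) {
        res.status(500).json({message: error.message});
    }
})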
Option 1:
Use $filter in an aggregation query as explained by cmgchess
Option 2:
If you only want one object from the array, you can use $elemMatch like this:
db.collection.find({
"ItemsPrices._id": "620a971e11120abbde5f4c3b"
},
{
"ItemsPrices": {
"$elemMatch": {
"_id": "620a971e11120abbde5f4c3b"
}
}
})
But take care: with $elemMatch only the first matching element is returned. Check this other example, where there are two objects with the desired _id but only one is returned.
As said before, if you only want one (or only one exists), you can use find with $elemMatch to avoid filtering the entire array. But if there can be multiple matching values, use $filter.
For the following dataset example:
lists
{ _id: 1, included_lists: [ 2 ], items: [ "i1" ]}
{ _id: 2, included_lists: [], items: [ "i2", "i3" ]}
items
{ _id: "i1", details: [{}, {}, {}]}
{ _id: "i2", details: [{}, {}, {}]}
{ _id: "i3", details: [{}, {}, {}]}
I want to grab all the items for a list, including the ones attached to the included_lists
For example: if we're looking at list _id 1, we should get items i1, i2, i3
I have an idea how to do this, which involves using populate or $lookup, but I'm not sure how to unwind the nested items inside the included_lists and join them with the items in the original list.
In the end, I would like to have a dataset where I am able to use limit, skip and match.
I'm using mongoose, but vanilla mongodb code would also be fine.
Update
My current idea of how to do this is to retrieve all of the list ids first in one query i.e.
List.find({ _id: id}, { included_lists: 1})
Then, with the list ids, make an array of that i.e.
var all_ids = [id, ...included_lists]
Then just find the items and unwind
Pseudo-code:
List
.aggregate([
{
$match: {
_id: {
$in: all_ids
}
}
},
{ $lookup: {} },
{
$unwind: "$items"
},
{
$project: {
"list.name": 1,
"list._id": 1,
"items": 1
}
}
])
But I don't want to have to do a first query just to retrieve all the list ids; I should be able to retrieve all related items through one _id, which would then pull in the rest of the items through included_lists.
You can try the below aggregation with MongoDB 3.6 and above:
List.aggregate([
{ "$match": { "_id": id }},
{ "$lookup": {
"from": Items.collection.name,
"let": { "items": "$items" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$items" ] } } }
],
"as": "items"
}},
{ "$lookup": {
"from": Lists.collection.name,
"let": { "included_lists": "$included_lists", "items": "$items" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$included_lists" ] } } },
{ "$lookup": {
"from": Items.collection.name,
"let": { "items": "$items" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$items" ] } } }
],
"as": "items"
}},
{ "$project": { "allItems": { "$concatArrays": [ "$$items", "$items" ]}}}
],
"as": "included_lists"
}},
{ "$unwind": "$included_lists" },
{ "$replaceRoot": { "newRoot": "$included_lists" }}
])
You can try the below aggregation in 3.4.
An initial $lookup gets the items values for the included_lists, followed by $concatArrays to merge the looked-up items with the document's own items.
A second $lookup then gets the item details, followed by $unwind to flatten the results.
List.aggregate([
{"$lookup":{
"from": "lists", // name of the list collection
"localField":"included_lists",
"foreignField":"_id",
"as":"included_items"
}},
{"$unwind":"$included_items"},
{"$project":{"allItems":{"$concatArrays":["$items","$included_items.items"]}}},
{"$lookup":{
"from": "items", // name of the item collection
"localField":"allItems",
"foreignField":"_id",
"as":"lookedup_items"
}},
{"$unwind":"$lookedup_items"},
{"$skip": some number},
{"$limit": some number}
])
I have created an aggregate function and I feel it's pretty long and non-DRY. I'm wondering what ways I can improve it.
My Thread model has a sub-document called revisions. The function tries to get the most recent revision that has the status of APPROVED.
Here is the full model.
{
"_id": ObjectId("56dc750769faa2393a8eb656"),
"slug": "my-thread",
"title": "my-thread",
"created": 1457249482555.0,
"user": ObjectId("56d70a491128bb612c6c9220"),
"revisions": [
{
"body": "This is the body!",
"status": "APPROVED",
"_id": ObjectId("56dc750769faa2393a8eb657"),
"comments": [
],
"title": "my-thread"
}
]
}
And here is the aggregate function I want to improve.
Thread.aggregate([
{ $match: {
slug: thread
} },
{ $project: {
user: '$user',
created: '$created',
slug: '$slug',
revisions: {
$filter: {
input: '$revisions',
as: 'revision',
cond: { $eq: [ '$$revision.status', 'APPROVED' ] }
}
}
} },
{ $sort: { 'revisions.created': -1 } },
{ $project: {
user: '$user',
created: '$created',
slug: '$slug',
revisions: { $slice: ["$revisions", 0, 1] }
} },
{ $unwind: '$revisions'},
{ $project: {
body: '$revisions.body',
title: '$revisions.title',
user: '$user',
slug: '$slug',
created: '$created'
}}
])
Well, you cannot really, since there are $sort and $unwind stages in between on purpose. It's also basically "wrong" as written, since the $sort cannot re-order the array until you $unwind it first.
Then it is better to use $group and $first instead, to just get the first element from the sort in each document:
Thread.aggregate([
{ "$match": {
"slug": thread
} },
{ "$project": {
"user": 1,
"created": 1,
"slug": 1,
"revisions": {
"$filter": {
"input": "$revisions",
"as": "revision",
"cond": { "$eq": [ "$$revision.status", "APPROVED" ] }
}
}
} },
// Cannot sort until you $unwind
{ "$unwind": "$revisions" },
// Now that will sort the elements
{ "$sort": { "_id": 1, "revisions.created": -1 } },
// And just grab the $first boundary for everything
{ "$group": {
"_id": "$_id",
"body": { "$first": "$revisions.body" },
"title": { "$first": "$revisions.title" },
"user": { "$first": "$user" },
"slug": { "$first": "$slug" },
"created": { "$first": "$created" }
}}
])
You could always reform the array with $push and then apply $arrayElemAt instead of the $slice to yield just a single element, but it's kind of superfluous considering it would need another $project after the $group in the first place.
So even though there are "some" operations you can do without using $unwind, "sorting" an array generated by operators like $filter is unfortunately not something that can presently be done until you $unwind the array first.
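For completeness, the $push / $arrayElemAt variation mentioned above would look something like this (a sketch only; note the extra $project it requires):
Thread.aggregate([
  { "$match": { "slug": thread } },
  { "$project": {
    "user": 1,
    "created": 1,
    "slug": 1,
    "revisions": {
      "$filter": {
        "input": "$revisions",
        "as": "revision",
        "cond": { "$eq": [ "$$revision.status", "APPROVED" ] }
      }
    }
  }},
  { "$unwind": "$revisions" },
  { "$sort": { "_id": 1, "revisions.created": -1 } },
  // Re-form the array in sorted order
  { "$group": {
    "_id": "$_id",
    "user": { "$first": "$user" },
    "created": { "$first": "$created" },
    "slug": { "$first": "$slug" },
    "revisions": { "$push": "$revisions" }
  }},
  // The extra $project that makes this approach superfluous
  { "$project": {
    "user": 1,
    "created": 1,
    "slug": 1,
    "revision": { "$arrayElemAt": [ "$revisions", 0 ] }
  }}
])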
If you didn't "need" the $sort on "revisions.created" (notably missing from your sample document), then you could just use a normal projection instead:
Thread.find(
{ "slug": slug, "revisions.status": "APPROVED" },
{ "revisions.$": 1 },
)
Only when sorting array elements would you need anything else, since the $ positional operator will just return the first matched element anyway.
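For the sample document, that projection returns just the matched element, roughly:
{
  "_id" : ObjectId("56dc750769faa2393a8eb656"),
  "revisions" : [
    {
      "body" : "This is the body!",
      "status" : "APPROVED",
      "_id" : ObjectId("56dc750769faa2393a8eb657"),
      "comments" : [ ],
      "title" : "my-thread"
    }
  ]
}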
I am trying to aggregate a collection whose documents contain a reminders array. A document might look like this:
{
_id: "1234",
dates: {
start: ISODate(),
end: ISODate()
},
reminders: [{
sendAt: ISODate(),
status: 'closed'
}, {
sendAt: ISODate(),
status: 'open'
}]
}
Say the first one is before today and the next one is after today. What I want is the array of all reminders that come before today, OR an empty array if none came before today. I tried the following aggregation:
db.reminders.aggregate([
{ $match: { 'dates.end': { $gt: new Date } } },
{ $unwind: '$reminders' },
{
$match: {
reminders: {
$elemMatch: {
sendAt: { $lt: new Date() },
status: { $ne: 'open' }
}
}
}
}
])
However, if there are no reminders before today, it will fail and give nothing back.
Is there a way to construct this structure with mongodb aggregation?
NOTE: I can't use $filter because that is in 3.2
You can use the $redact operator to filter out sub-documents, for versions >= 2.6.
It also avoids the unnecessary $unwind stage.
$match all the documents that have their dates.end attribute greater than the search criteria.
$redact through all sub-documents: $$DESCEND into those that match the conditions, else $$PRUNE.
sample code:
var endDateToMatch = ISODate("2014-01-01T00:00:00Z");
var currentDate = ISODate();
db.t.aggregate([
{
$match:{"dates.end":{$gt:endDateToMatch}}
},
{
$redact:{$cond:[
{$and:[
{$ne:[{$ifNull:["$status",""]},
"open"]},
{$lt:[{$ifNull:["$sendAt",currentDate-1]},
currentDate]}
]
},
"$$DESCEND","$$PRUNE"]}
}
])
This would give you one output document per input document that matches the $match stage. If you need to accumulate all the sub-documents across documents, then you need to $unwind "reminders" and $group with _id set to null.
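A rough sketch of that accumulation, reusing the variables and the same $redact stage as above:
db.t.aggregate([
  { $match: { "dates.end": { $gt: endDateToMatch } } },
  { $redact: { $cond: [
      { $and: [
        { $ne: [ { $ifNull: [ "$status", "" ] }, "open" ] },
        { $lt: [ { $ifNull: [ "$sendAt", currentDate - 1 ] }, currentDate ] }
      ]},
      "$$DESCEND", "$$PRUNE"
  ]}},
  // flatten the surviving reminders and collect them all into one array
  { $unwind: "$reminders" },
  { $group: { _id: null, reminders: { $push: "$reminders" } } }
])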
So you basically want $filter behavior but need to do it in an earlier version, with your main case being returning documents even if the array content ends up empty.
For MongoDB 2.6 you can do "almost" the same thing with $map and $setDifference:
db.reminders.aggregate([
{ "$match": { "dates.end": { "$gt": new Date() } } },
{ "$project": {
"dates": 1,
"reminders": {
"$setDifference": [
{ "$map": {
"input": "$reminders",
"as": "reminder",
"in": {
"$cond": [
{ "$and": [
{ "$lt": [ "$$reminder.sendAt", new Date() ] },
{ "$ne": [ "$$reminder.status", "open" ] }
]},
"$$reminder",
false
]
}
}},
[false]
]
}
}}
])
And that is okay as long as the resulting "set" from $setDifference contains only uniquely identified items. The $map applies the test, returning either the content or false where the conditions did not match. The $setDifference essentially removes any false elements from the results, but of course, being a "set", it treats any identical items as one.
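To illustrate the "set" caveat with some hypothetical data, two identical matching reminders would collapse into one:
// Hypothetical input:
// "reminders": [ { "sendAt": ISODate("2015-01-01"), "status": "closed" }, { "sendAt": ISODate("2015-01-01"), "status": "closed" } ]
// After the $map and $setDifference, only one copy survives:
// "reminders": [ { "sendAt": ISODate("2015-01-01"), "status": "closed" } ]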
If your MongoDB is older than 2.6 (or the case of "sets" makes the above unusable), it just requires being a bit more careful when looking at the content to filter:
db.reminders.aggregate([
{ "$match": { "dates.end": { "$gt": new Date() } } },
{ "$unwind": "$reminders" },
// Count where condition matched
{ "$group": {
"_id": "$_id",
"dates": { "$first": "$dates" },
"reminders": { "$push": "$reminders" },
"matched": { "$sum": {
"$cond": [
{ "$and": [
{ "$lt": [ "$reminders.sendAt", new Date() ] },
{ "$ne": [ "$reminders.status", "open" ] }
]},
1,
0
]
}}
}},
// Substitute a placeholder array where nothing was counted, just for brevity
{ "$project": {
"dates": 1,
"reminders": { "$cond": [
{ "$eq": [ "$matched", 0 ] },
{ "$const": [false] },
"$reminders"
]},
"matched": 1
}},
// Unwind again
{ "$unwind": "$reminders" },
// Filter for matches "or" where there were no matches to keep
{ "$match": {
"$or": [
{
"reminders.sendAt": { "$lt": new Date() },
"reminders.status": { "$ne": "open" }
},
{ "matched": 0 }
]
}},
// Group again
{ "$group": {
"_id": "$_id",
"dates": { "$first": "$dates" },
"reminders": { "$push": "$reminders" }
}},
// Replace the [false] array with an empty one
{ "$project": {
"dates": 1,
"reminders": { "$cond": [
{ "$eq": [ "$reminders", [false] ] },
{ "$const": [] },
"$reminders"
]}
}}
])
It's a bit long-winded, but it's basically doing the same thing.
Also note that $elemMatch does not apply after processing $unwind, since the content is in fact no longer an array. Simple dot notation applies to the elements that are now in individual documents.
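In other words, after $unwind the question's $match would be written with plain dot notation, along these lines:
{ "$unwind": "$reminders" },
{ "$match": {
  "reminders.sendAt": { "$lt": new Date() },
  "reminders.status": { "$ne": "open" }
}}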
Assuming I have a schema that looks something like this:
{
field: [{
subDoc: ObjectId,
...
}],
...
}
and I have some list of ObjectIds (user input), how would I get a count of those specific ObjectIds? For example, if I have data like this:
[
{field: [ {subDoc: 123}, {subDoc: 234} ]},
{field: [ {subDoc: 234}, {subDoc: 345} ]},
{field: [ {subDoc: 123}, {subDoc: 345}, {subDoc: 456} ]}
]
and the list of ids given by the user is 123, 234, 345, I need to get a count of the given ids, so a result approximating this:
{
123: 2,
234: 2,
345: 2
}
What would be the best way to go about this?
The aggregation framework itself is not going to dynamically name keys the way you have presented in your proposed output, and that is probably a good thing really. But you can probably just do a query like this:
db.collection.aggregate([
// Match documents that contain the elements
{ "$match": {
"field.subDoc": { "$in": [123,234,345] }
}},
// De-normalize the array field content
{ "$unwind": "$field" },
// Match just the elements you want
{ "$match": {
"field.subDoc": { "$in": [123,234,345] }
}},
// Count by the element as a key
{ "$group": {
"_id": "$field.subDoc",
"count": { "$sum": 1 }
}}
])
That gives you output like this:
{ "_id" : 345, "count" : 2 }
{ "_id" : 234, "count" : 2 }
{ "_id" : 123, "count" : 2 }
But if you really want to go nuts on this, you are specifying the "keys" that you want as part of your query, so you could form a pipeline like this:
db.collection.aggregate([
{ "$match": {
"field.subDoc": { "$in": [123,234,345] }
}},
{ "$unwind": "$field" },
{ "$match": {
"field.subDoc": { "$in": [123,234,345] }
}},
{ "$group": {
"_id": "$field.subDoc",
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"123": {
"$max": {
"$cond": [
{ "$eq": [ "$_id", 123 ] },
"$count",
0
]
}
},
"234": {
"$max": {
"$cond": [
{ "$eq": [ "$_id", 234 ] },
"$count",
0
]
}
},
"345": {
"$max": {
"$cond": [
{ "$eq": [ "$_id", 345 ] },
"$count",
0
]
}
}
}}
])
That last stage is a relatively simple thing to construct in code by just processing the list of arguments:
var list = [123,234,345];
var group2 = { "$group": { "_id": null } };
list.forEach(function(id) {
group2["$group"][id] = {
"$max": {
"$cond": [
{ "$eq": [ "$_id", id ] },
"$count",
0
]
}
};
});
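Then just append the generated stage to the rest of the pipeline, using the same stages as above:
db.collection.aggregate([
  { "$match": { "field.subDoc": { "$in": list } } },
  { "$unwind": "$field" },
  { "$match": { "field.subDoc": { "$in": list } } },
  { "$group": { "_id": "$field.subDoc", "count": { "$sum": 1 } } },
  group2
])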
And that comes out more or less how you want it.
{
"_id" : null,
"123" : 2,
"234" : 2,
"345" : 2
}
Not exactly what you're asking for but it can give you an idea:
db.test.aggregate([
{
$unwind: '$field'
},
{
$group: {
_id: {
subDoc: '$field.subDoc'
},
count: {
$sum: 1
}
}
},
{
$project: {
subDoc: '$subDoc.subDoc',
count: '$count'
}
}
]);
Output:
{
"result": [
{
"_id": {
"subDoc": 456
},
"count": 1
},
{
"_id": {
"subDoc": 345
},
"count": 2
},
{
"_id": {
"subDoc": 234
},
"count": 2
},
{
"_id": {
"subDoc": 123
},
"count": 2
}
],
"ok": 1
}