MongoDB Aggregate Sum Each Key on a Subdocument - javascript

I have multiple documents with this schema, each document is per product per day:
{
    _id: {},
    app_id: 'DHJFK67JDSJjdasj909',
    date: '2014-08-07',
    event_count: 32423,
    event_count_per_type: {
        0: 322,
        10: 4234,
        20: 653,
        30: 7562
    }
}
I would like to get the sum of each event_type for a particular date_range.
This is the output I am looking for, where each event type has been summed across all the documents. The keys for event_count_per_type can be anything, so I need something that can loop through each of them rather than having to name each one explicitly.
{
    app_id: 'DHJFK67JDSJjdasj909',
    event_count: 324236456,
    event_count_per_type: {
        0: 34234222,
        10: 242354,
        20: 456476,
        30: 56756
    }
}
I have been trying several queries; this is the best I have so far, but the subdocument values are not summed:
db.events.aggregate(
    {
        $match: { app_id: 'DHJFK67JDSJjdasj909' }
    },
    {
        $group: {
            _id: {
                app_id: '$app_id'
            },
            event_count: { $sum: '$event_count' },
            event_count_per_type: { $sum: '$event_count_per_type' }
        }
    },
    {
        $project: {
            _id: 0,
            app_id: '$_id.app_id',
            event_count: 1,
            event_count_per_type: 1
        }
    }
)
The output I am seeing is a value of 0 for the event_count_per_type key, instead of an object. I could modify the schema so the keys are at the top level of the document, but that would still mean having an entry in the group statement for each key, which I cannot do since I do not know the key names in advance.
Any help would be appreciated. I am willing to change my schema if need be, and also to try mapReduce (although from the documentation it seems like the performance is bad).

As stated, processing documents like this is not possible with the aggregation framework unless you are actually going to supply all of the keys, such as:
db.events.aggregate([
    { "$group": {
        "_id": "$app_id",
        "event_count": { "$sum": "$event_count" },
        "0": { "$sum": "$event_count_per_type.0" },
        "10": { "$sum": "$event_count_per_type.10" },
        "20": { "$sum": "$event_count_per_type.20" },
        "30": { "$sum": "$event_count_per_type.30" }
    }}
])
But you do of course have to explicitly specify every key you wish to work on. This is true of both the aggregation framework and general query operations in MongoDB: to access elements notated in this "sub-document" form you need to specify the "exact path" to the element in order to do anything with it.
The aggregation framework and general queries have no concept of "traversal", which means they cannot process "each key" of a document. That requires a language construct which is not provided in these interfaces.
Generally speaking though, using a "key name" as a data point where its name actually represents a "value" is a bit of an "anti-pattern". A better way to model this would be to use an array and represent your "type" as a value by itself:
{
    "app_id": "DHJFK67JDSJjdasj909",
    "date": ISODate("2014-08-07T00:00:00.000Z"),
    "event_count": 32423,
    "events": [
        { "type": 0, "value": 322 },
        { "type": 10, "value": 4234 },
        { "type": 20, "value": 653 },
        { "type": 30, "value": 7562 }
    ]
}
Also noting that the "date" is now a proper date object rather than a string, which is also something that is good practice to do. This sort of data though is easy to process with the aggregation framework:
db.events.aggregate([
    { "$unwind": "$events" },
    { "$group": {
        "_id": {
            "app_id": "$app_id",
            "type": "$events.type"
        },
        "event_count": { "$sum": "$event_count" },
        "value": { "$sum": "$events.value" }
    }},
    { "$group": {
        "_id": "$_id.app_id",
        "event_count": { "$sum": "$event_count" },
        "events": { "$push": { "type": "$_id.type", "value": "$value" } }
    }}
])
That shows a two stage grouping that first gets the totals per "type" without specifying each "key" since you no longer have to, then returns as a single document per "app_id" with the results in an array as they were originally stored. This data form is generally much more flexible for looking at certain "types" or even the "values" within a certain range.
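For instance ( a minimal sketch against the remodeled documents above; the value threshold is just an assumption for illustration ), finding days where event type 10 recorded more than 1000 events becomes a plain query:
db.events.find({
    "events": { "$elemMatch": { "type": 10, "value": { "$gt": 1000 } } }
})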
Where you cannot change the structure, your only option is mapReduce. This allows you to "code" the traversal of the keys, but since this requires JavaScript interpretation and execution it is not as fast as the aggregation framework:
db.events.mapReduce(
    function() {
        emit(
            this.app_id,
            {
                "event_count": this.event_count,
                "event_count_per_type": this.event_count_per_type
            }
        );
    },
    function(key, values) {
        var reduced = { "event_count": 0, "event_count_per_type": {} };
        values.forEach(function(value) {
            for ( var k in value.event_count_per_type ) {
                if ( !reduced.event_count_per_type.hasOwnProperty(k) )
                    reduced.event_count_per_type[k] = 0;
                reduced.event_count_per_type[k] += value.event_count_per_type[k];
            }
            reduced.event_count += value.event_count;
        });
        return reduced;
    },
    {
        "out": { "inline": 1 }
    }
)
That will essentially traverse and combine the "keys" and sum up the values for each one found.
So your options are either:
Change the structure and work with standard queries and aggregation.
Stay with the structure and require JavaScript processing and mapReduce.
It depends on your actual needs, but in most cases restructuring yields benefits.
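For completeness: since this answer was written, MongoDB 3.4.4 introduced $objectToArray, which gives the aggregation framework a limited form of "traversal" and arguably adds a third option on modern versions. A sketch against the original schema:
db.events.aggregate([
    { "$match": { "app_id": "DHJFK67JDSJjdasj909" } },
    { "$project": {
        "app_id": 1,
        "types": { "$objectToArray": "$event_count_per_type" }
    }},
    { "$unwind": "$types" },
    { "$group": {
        "_id": { "app_id": "$app_id", "type": "$types.k" },
        "total": { "$sum": "$types.v" }
    }}
])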

Related

MongoDB: Efficiency of operation pushing to a nested array or updating it when identifier found, using aggregation pipeline

I have a document that holds lists containing nested objects. The document simplified looks like this:
{
    "username": "user",
    "listOne": [
        { "name": "foo", "qnty": 5 },
        { "name": "bar", "qnty": 3 }
    ],
    "listTwo": [
        { "id": 1, "qnty": 13 },
        { "id": 2, "qnty": 9 }
    ]
}
And I need to update the quantity in the lists based on an identifier. For list one it was easy. I was doing something like this:
db.collection.findOneAndUpdate(
    {
        "username": "user",
        "listOne.name": name
    },
    {
        $inc: { "listOne.$.qnty": qntyChange }
    }
)
Then I would catch whenever the find failed because there was no object in the list with that name and nothing was updated, and do a new operation with $push. Since this is the rarer case, it didn't bother me to do two queries against the collection.
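That $push fallback isn't shown above; a minimal sketch of it ( reusing the name and qntyChange placeholders from the question ) might look like:
db.collection.updateOne(
    { "username": "user", "listOne.name": { "$ne": name } },
    { "$push": { "listOne": { "name": name, "qnty": qntyChange } } }
)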
But now I had to also add list two to the document. And since the identifiers are not the same I would have to query them individually. Meaning four searches in the database collection, in the worst case scenario, if using the same strategy I was using before.
So, to avoid this, I wrote an update using an aggregation pipeline. What it does is:
1. Look whether there is an object in list one with the queried identifier.
2. If true, map through the entire array and:
2.1) Return the same object if the identifier is different.
2.2) Return the object with the quantity changed when the identifier matches.
3. If false, push a new object with this identifier to the list.
4. Repeat for list two.
This is the pipeline for list one:
db.coll1.updateOne(
    { "username": "user" },
    [{
        "$set": {
            "listOne": {
                "$cond": {
                    "if": { "$in": [name, "$listOne.name"] },
                    "then": {
                        "$map": {
                            "input": "$listOne",
                            "as": "one",
                            "in": {
                                "$cond": {
                                    "if": { "$eq": ["$$one.name", name] },
                                    "then": {
                                        "$mergeObjects": [
                                            "$$one",
                                            { "qnty": { "$add": ["$$one.qnty", qntyChange] } }
                                        ]
                                    },
                                    "else": "$$one"
                                }
                            }
                        }
                    },
                    "else": {
                        "$concatArrays": [
                            "$listOne",
                            [{ "name": name, "qnty": qntyChange }]
                        ]
                    }
                }
            }
        }
    }]
);
The entire pipeline can be found on this Mongo Playground.
So my question is about how efficient this is. As I am paying for server time, I would like to use an efficient solution to this problem. Querying the collection four times, or even just twice on every call, seems like a bad idea, as the collection will have thousands of entries. The two lists, on the other hand, are not that big, and should not exceed a thousand elements each. But the way it's written, it looks like it will iterate over each list about two times.
And besides, what worries me the most is: when I use map to change the list and return the same object, in cases where the identifier does not match, does MongoDB rewrite those elements too? Because not only would that increase my time on the server rewriting the entire list with the same objects, but it would also count towards the byte size of my write operation, which is also charged by MongoDB.
So if anyone has a better solution to this, I'm all ears.
According to this SO answer,
What you actually do inside of the document (push around an array, add a field) should not have any significant impact on the total cost of the operation
So, in your case, your array operations should not be causing a heavy impact on the total cost.
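If you do keep the pipeline approach, both lists can at least be updated in a single round trip by setting both fields in one pipeline update. A sketch, where upsertIntoList is a hypothetical helper that builds the same $cond/$map/$concatArrays expression shown above for any list and identifier field:
// Hypothetical helper: builds the question's $cond/$map/$concatArrays
// expression for a given list field and identifier field.
function upsertIntoList(listField, idField, idValue, qntyChange) {
    return {
        "$cond": {
            "if": { "$in": [idValue, "$" + listField + "." + idField] },
            "then": {
                "$map": {
                    "input": "$" + listField,
                    "as": "el",
                    "in": {
                        "$cond": {
                            "if": { "$eq": ["$$el." + idField, idValue] },
                            "then": { "$mergeObjects": ["$$el", { "qnty": { "$add": ["$$el.qnty", qntyChange] } }] },
                            "else": "$$el"
                        }
                    }
                }
            },
            "else": {
                "$concatArrays": ["$" + listField, [{ [idField]: idValue, "qnty": qntyChange }]]
            }
        }
    };
}

// One update, one round trip, both lists:
db.coll1.updateOne(
    { "username": "user" },
    [{ "$set": {
        "listOne": upsertIntoList("listOne", "name", "foo", 2),
        "listTwo": upsertIntoList("listTwo", "id", 1, 4)
    }}]
)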

How to calculate difference of lowest date and highest date among array of sub documents?

I have the following sub-documents:
experiences: [
    {
        "workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
        "workType" : "Full Time",
        "functionalArea" : "Law",
        "company" : "Company A",
        "title" : "new",
        "from" : ISODate("2010-10-13T00:00:00.000Z"),
        "to" : ISODate("2012-10-13T00:00:00.000Z"),
        "_id" : ObjectId("59f8064e68d1f61441bec94b"),
        "currentlyWorking" : false
    },
    ...
    ...
    {
        "workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
        "workType" : "Full Time",
        "functionalArea" : "Law",
        "company" : "Company A",
        "title" : "new",
        "from" : ISODate("2014-10-14T00:00:00.000Z"),
        "to" : ISODate("2015-12-13T00:00:00.000Z"),
        "_id" : ObjectId("59f8064e68d1f61441bec94c"),
        "currentlyWorking" : false
    },
    {
        "workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
        "workType" : "Full Time",
        "functionalArea" : "Law",
        "company" : "Company A",
        "title" : "new",
        "from" : ISODate("2017-10-13T00:00:00.000Z"),
        "to" : null,
        "_id" : ObjectId("59f8064e68d1f61441bec94d"),
        "currentlyWorking" : true
    },
    {
        "workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
        "workType" : "Full Time",
        "functionalArea" : "Law",
        "company" : "Company A",
        "title" : "new",
        "from" : ISODate("2008-10-14T00:00:00.000Z"),
        "to" : ISODate("2009-12-13T00:00:00.000Z"),
        "_id" : ObjectId("59f8064e68d1f61441bec94c"),
        "currentlyWorking" : false
    }
]
As you can see, the entries are not necessarily ordered by date. This data is stored per user, so what I want is the total experience for each user, expressed in years. When the "to" field is null and "currentlyWorking" is true, it means I am currently working at that company.
Aggregation
Using the aggregation framework you could apply $indexOfArray where you have it available:
Model.aggregate([
    { "$addFields": {
        "difference": {
            "$subtract": [
                { "$cond": [
                    { "$eq": [{ "$indexOfArray": ["$experiences.to", null] }, -1] },
                    { "$max": "$experiences.to" },
                    new Date()
                ]},
                { "$min": "$experiences.from" }
            ]
        }
    }}
])
Failing that as long as the "latest" is always the last in the array, using $arrayElemAt:
Model.aggregate([
    { "$addFields": {
        "difference": {
            "$subtract": [
                { "$cond": [
                    { "$eq": [{ "$arrayElemAt": ["$experiences.to", -1] }, null] },
                    new Date(),
                    { "$max": "$experiences.to" }
                ]},
                { "$min": "$experiences.from" }
            ]
        }
    }}
])
Those are pretty much the most efficient ways to do this, as a single pipeline stage applying the $min and $max operators. For $indexOfArray you would need at least MongoDB 3.4, and for simply using $arrayElemAt you can have MongoDB 3.2, which is the minimal version you should be running in production environments anyway.
One pass means it gets done fast with little overhead.
In brief, $min and $max allow you to extract the appropriate values directly from the array elements: the "smallest" value of "from" and the "largest" value of "to" within the array. Where available, the $indexOfArray operator can return the matched index from a provided array ( in this case the "to" values ) where a specified value ( null here ) exists. If the value is there, the index of that value is returned, and where it is not, -1 is returned to indicate it was not found.
We use $cond, which is a "ternary" or if..then..else operator, to determine that when null is not found you want the $max value from "to". When it is found, this is the else case, and the current Date, fed into the aggregation pipeline as an external parameter on execution, is returned instead.
The alternative case for MongoDB 3.2 is that you instead "presume" the last element of your array is the most recent employment history item. It would generally be best practice to order these items so the most recent is either the "last" ( as seems to be indicated in your question ) or the "first" entry of the array. It is logical to keep these entries in such order, as opposed to relying on sorting the list at runtime.
So when using a "known" position such as "last", we can use the $arrayElemAt operator to return the value from the array at the specified position. Here it is -1 for the "last" element. The "first" element would be 0, and could arguably be applied to getting the "smallest" value of "from" as well, since you should have your array in order. Again $cond is used to transpose the values depending on whether null is returned. As an alternative to $max you can even use $ifNull to swap the values instead:
Model.aggregate([
    { "$addFields": {
        "difference": {
            "$subtract": [
                { "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
                { "$min": "$experiences.from" }
            ]
        }
    }}
])
That operator essentially switches out the values returned if the response of the first condition is null. So since we are grabbing the value from the "last" element already, we can "presume" that this does mean the "largest" value of "to".
The $subtract is what actually returns the "difference", since when you "subtract" one date from another the difference is returned as the milliseconds value between the two. This is how BSON Dates are actually internally stored, and it's the common internal date storage of date formats being the "milliseconds since epoch".
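For example, taking just the first entry of the sample data:
// ISODate("2012-10-13") - ISODate("2010-10-13") = 63158400000 ms
// 63158400000 / (1000 * 60 * 60 * 24 * 365) ≈ 2.003, which $floor brings down to 2 years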
If you want the interval in a specific duration such as "years", then it's a simple matter of applying the "date math" to change from the milliseconds difference between the date values. So adjust by dividing out from the interval ( also showing $arrayElemAt on the "from" just for completeness ):
Model.aggregate([
    { "$addFields": {
        "difference": {
            "$floor": {
                "$divide": [
                    { "$subtract": [
                        { "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
                        { "$arrayElemAt": ["$experiences.from", 0] }
                    ]},
                    1000 * 60 * 60 * 24 * 365
                ]
            }
        }
    }}
])
That uses $divide as a math operator, with a divisor of 1000 milliseconds × 60 seconds × 60 minutes × 24 hours × 365 days. The $floor "rounds down" the number from decimal places. You can do whatever you want there, but it "should" be used "inline" and not in separate stages, which simply add processing overhead.
Of course, the presumption of 365 days is an "approximation" at best. If you want something more complete, then you can instead apply the date aggregation operators to the values to get a more accurate reading. So here, also applying $let to declare "variables" for later manipulation:
Model.aggregate([
    { "$addFields": {
        "difference": {
            "$let": {
                "vars": {
                    "to": { "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
                    "from": { "$arrayElemAt": ["$experiences.from", 0] }
                },
                "in": {
                    "years": {
                        "$subtract": [
                            { "$subtract": [{ "$year": "$$to" }, { "$year": "$$from" }] },
                            { "$cond": {
                                "if": { "$gt": [{ "$month": "$$to" }, { "$month": "$$from" }] },
                                "then": 0,
                                "else": 1
                            }}
                        ]
                    },
                    "months": {
                        "$add": [
                            { "$subtract": [{ "$month": "$$to" }, { "$month": "$$from" }] },
                            { "$cond": {
                                "if": { "$gt": [{ "$month": "$$to" }, { "$month": "$$from" }] },
                                "then": 0,
                                "else": 12
                            }}
                        ]
                    },
                    "days": {
                        "$add": [
                            { "$subtract": [{ "$dayOfYear": "$$to" }, { "$dayOfYear": "$$from" }] },
                            { "$cond": {
                                "if": { "$gt": [{ "$month": "$$to" }, { "$month": "$$from" }] },
                                "then": 0,
                                "else": 365
                            }}
                        ]
                    }
                }
            }
        }
    }}
])
Again that's a slight approximation on the days of the year. MongoDB 3.6 actually would allow you to test the "leap year" by implementing $dateFromParts to determine if 29th February was valid in the current year or not by assembling from the "pieces" we have available.
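That test isn't shown here, but as a sketch of the idea: $dateFromParts "carries" an invalid 29th February over into 1st March, so if the assembled date is still in month 2 the year is a leap year ( here checking the year of the earliest "from" ):
db.collection.aggregate([
    { "$addFields": {
        "fromIsLeapYear": {
            "$eq": [
                { "$month": { "$dateFromParts": {
                    "year": { "$year": { "$min": "$experiences.from" } },
                    "month": 2,
                    "day": 29
                }}},
                2
            ]
        }
    }}
])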
Work with returned data
Of course all the above is using the aggregation framework to determine the intervals from the array for each person. This would be the advised course if you were intending to "reduce" the data returned by essentially not returning the array items at all, or if you wanted these numbers for further aggregation in reporting to a larger "sum" or "average" statistic from the data.
If on the other hand you actually do want all the data returned for the person including the complete "experiences" array, then it's probably the best course of action to simply apply the calculations "after" all the data is returned from the server as you process each item returned.
The simple application of this would be to "merge" a new field into the results, just like $addFields does but on the "client" side instead:
Model.find().lean().cursor().map( doc =>
    Object.assign(doc, {
        "difference":
            ((doc.experiences.map(e => e.to).indexOf(null) === -1)
                ? Math.max.apply(null, doc.experiences.map(e => e.to))
                : new Date() )
            - Math.min.apply(null, doc.experiences.map(e => e.from))
    })
).toArray((err, result) => {
    // do something with result
})
That's just applying the same logic represented in the first aggregation example to "client" side processing of the result cursor. Since you are using mongoose, the .cursor() method actually returns us a Cursor object from the underlying driver, which mongoose normally hides away for "convenience". Here we want it because it gives us access to some handy methods.
The Cursor.map() is one such handy method, which allows us to apply a "transform" on the content returned from the server. Here we use Object.assign() to "merge" a new property into the returned document. We could alternately use Array.map() on the "array" returned by mongoose by "default", but processing inline looks a little cleaner, as well as being a bit more efficient.
In fact, Array.map() is the main tool of manipulation here: where we applied statements like "$experiences.to" in the aggregation, on the "client" we use doc.experiences.map(e => e.to), which does the same thing, "transforming" the array of objects into an "array of values" for the specified field instead.
This allows the same checking using Array.indexOf() against the array of values, and Math.min() and Math.max() are used in the same way, with apply() passing those "mapped" array values as arguments to the functions.
Finally of course, since we still have a Cursor being returned, we convert this back into the more typical form you would work with from mongoose results as an "array", using Cursor.toArray(), which is exactly what mongoose does "under the hood" for you on its default requests.
The Query.lean() is a mongoose modifier which basically says to return and expect "plain JavaScript Objects" as opposed to the "mongoose documents" matched to the schema with applied methods that are the default return. We want that because we are "manipulating" the result. Again, the alternative is to do the manipulation "after" the default array is returned, and convert via .toObject(), which is present on all mongoose documents, in the event that "serializing virtual properties" is important to you.
So this is essentially a "mirror" of that first aggregation approach, yet applied to "client side" logic instead. As stated, it generally makes more sense to do it this way when you actually want ALL of the properties in the document in the results anyway. The simple reason being that it makes no real sense to add "additional" data to the results "before" you return them from the server. So instead, simply apply the transform "after" the database returns them.
Also much like above, the same client transformation approaches can be applied as was demonstrated in ALL the aggregation examples. You can even employ external libraries for date manipulation which give you "helpers" for some of the "raw math" approaches here.
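For instance, with a library such as moment.js the same calculation reads ( illustrative only; doc is one returned document, and the array ordering assumptions are as discussed above ):
const moment = require('moment');

// earliest "from", and the last "to" ( or "now" when still employed ):
const from = moment.min(doc.experiences.map(e => moment(e.from)));
const last = doc.experiences[doc.experiences.length - 1].to;
const to = (last === null) ? moment() : moment(last);

const years = to.diff(from, 'years'); // whole calendar years between the two dates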
You can achieve this with the aggregation framework, like this:
db.collection.aggregate([
    { $unwind: "$experiences" },
    { $sort: { "experiences.from": 1 } },
    { $group: {
        _id: null,
        "from": { $first: "$experiences.from" },
        "to": { $last: { $ifNull: ["$experiences.to", new Date()] } }
    }},
    { $project: {
        "diff": { $subtract: ["$to", "$from"] }
    }}
])
This returns:
{ "_id" : null, "diff" : NumberLong("65357827142") }
Which is the difference in ms between the two dates, see $subtract for details
You can get the year by adding this additional stage to the end of the pipeline:
{
    $project: {
        "year": {
            $floor: {
                $divide: ["$diff", 1000 * 60 * 60 * 24 * 365]
            }
        }
    }
}
This would then return:
{ "_id" : null, "year" : 2 }

Aggregate totals by Keys in a document

I have a mapreduce function I want to write in mongoDB to count how many times a character has been played with. The relevant part from my json looks like this:
"playerInfo": {
"Player 1": {
"info":{
"characterId":17
}
},
"Player 2": {
"info":{
"characterId":20
}
}
}
I want to count how many times every "characterId" appears across my documents; there are 10 players, from Player 1 to Player 10.
Two questions:
1. How do I use mapReduce in mongo when I have a number as part of my key?
2. How do I concatenate strings in mapReduce so the code shown below can be correct?
db.LoL.mapReduce(
    function() {
        for (var i in this.playerInfo) {
            emit(this.playerInfo.'Player '+(i).info.characterId, 1);
        }
    },
    function(keys, values) {
        return Array.sum(values)
    },
    { out: { merge: "map_reduce_example5" } }
)
Thank you very much for your answers!
So there are really a couple of things wrong with the structure here, and you really "should" change it.
The mapReduce is pretty simple, since you can just iterate the key names via Object.keys():
db.LoL.mapReduce(
    function() {
        var doc = this;
        Object.keys(this.playerInfo).forEach(function(key) {
            emit({ "player": key, "characterId": doc.playerInfo[key].info.characterId }, 1)
        })
    },
    function(key, values) { return Array.sum(values) },
    {
        "query": { "playerInfo": { "$exists": true } },
        "out": { "inline": 1 }
    }
)
If you instead change the data format to use an array, and properties with values instead of named keys:
{
    "playerInfo": [
        { "player": "Player 1", "characterId": 17 },
        { "player": "Player 2", "characterId": 20 }
    ]
}
Then the .aggregate() method is much faster in processing this, and returns a cursor for large result sets:
db.collection.aggregate([
    { "$unwind": "$playerInfo" },
    { "$group": {
        "_id": "$playerInfo",
        "count": { "$sum": 1 }
    }}
])
With MongoDB 3.4 and greater you can even use $objectToArray on your present structure:
db.LoL.aggregate([
    { "$project": {
        "playerInfo": { "$objectToArray": "$playerInfo" }
    }},
    { "$unwind": "$playerInfo" },
    { "$group": {
        "_id": {
            "player": "$playerInfo.k",
            "characterId": "$playerInfo.v.info.characterId"
        },
        "count": { "$sum": 1 }
    }}
])
Which is basically the same as the mapReduce, only a lot faster due to the native operators used as opposed to JavaScript evaluation, which runs much slower.
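Run against the single sample document in the question, the aggregation would produce output along these lines:
{ "_id" : { "player" : "Player 1", "characterId" : 17 }, "count" : 1 }
{ "_id" : { "player" : "Player 2", "characterId" : 20 }, "count" : 1 }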

Using mongo aggregate in meteor to total unlocked/locked in collection

So, I have one collection that stores docs related to a user with a structure like:
{_id: "hofjew233332j4", userId: "fhewojfw34324", achievementUnlocked: true };
What I want to do is use the aggregate and underscore to be able to group the docs by user id and then calculate what percentage of their records have unlocked set to true such that a resulting doc would look like:
{_id: "fhewojfw34324(userId)", unlockPercentage: 40 (achievementUnlocked: true / all docs) }
Would I be able to do this while only retrieving the documents once?
First, group by userId and count the documents where achievementUnlocked is true, then use that count in a $project stage to calculate the percentage, as in the aggregation below:
db.collectionName.aggregate([{
    "$group": {
        "_id": "$userId",
        "achievementUnlockedTrueCount": {
            "$sum": {
                "$cond": {
                    "if": { "$eq": ["$achievementUnlocked", true] }, // count achievementUnlocked = true
                    "then": 1,
                    "else": 0
                }
            }
        },
        "totalCount": { "$sum": 1 } // get total count of grouped documents
    }
}, {
    "$project": {
        "unlockPercentage": {
            "$multiply": [
                { "$divide": ["$achievementUnlockedTrueCount", "$totalCount"] }, // used in project to calculate %
                100
            ]
        }
    }
}]).pretty()
I would personally not even bother with aggregation here as the data would seem trivial. It would be far more efficient to maintain an array of "locked" and "unlocked" achievements per user and/or "game" data.
Take a document like this:
{
    "_id": "hofjew233332j4",
    "userId": "fhewojfw34324",
    "gameId": "v3XWHHvFSHwYxxk6H",
    "achievementsCount": 5,
    "locked": ["One","Two","Four","Five"],
    "lockedCount": 4,
    "unlocked": ["Three"],
    "unlockedCount": 1
}
So you would initialize per user and "game" here with all "locked" achievements normally, but in this case we will show one already within "unlocked". Also note the "count" fields reflect the number of entries present in each array.
To "unlock" another achievement, then you would simply perform an update to remove from the "locked" array and insert into the "unlocked" array, all while maintaining the "count" values:
Achievements.update(
    {
        "userId": "fhewojfw34324",
        "gameId": "v3XWHHvFSHwYxxk6H",
        "locked": "Four",
        "unlocked": { "$ne": "Four" }
    },
    {
        "$push": { "unlocked": "Four" },
        "$pull": { "locked": "Four" },
        "$inc": {
            "lockedCount": -1,
            "unlockedCount": 1
        }
    }
)
Which alters the document to this state:
{
    "_id": "hofjew233332j4",
    "userId": "fhewojfw34324",
    "gameId": "v3XWHHvFSHwYxxk6H",
    "achievementsCount": 5,
    "locked": ["One","Two","Five"],
    "lockedCount": 3,
    "unlocked": ["Three","Four"],
    "unlockedCount": 2
}
It's a very simple pattern to follow as each update maintains the correct values and data. If you wanted information such as a "percentage" then this is a simple matter of:
Achievements.aggregate([
    { "$project": {
        "userId": 1,
        "gameId": 1,
        "percentUnlocked": { "$divide": [ "$unlockedCount", "$achievementsCount" ] }
    }}
])
Or just apply that math in client code instead.
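For the document shown above that is just:
// 2 unlocked out of 5 achievements total:
var percentUnlocked = ( doc.unlockedCount / doc.achievementsCount ) * 100; // 40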
This model also makes the "real" aggregations you might want to do a lot simpler, and with a lot more scope for information. Plus, it's much more efficient to calculate as you go than to require something to "add up" the data as a separate process.

Concatenate string values in array in a single field in MongoDB

Suppose that I have a series of documents with the following format:
{
    "_id": "3_0",
    "values": ["1", "2"]
}
and I would like to obtain a projection of the array's values concatenated in a single field:
{
    "_id": "3_0",
    "values": "1_2"
}
Is this possible? I have tried $concat but I guess I can't use $values as the array for $concat.
In Modern MongoDB releases you can. You still cannot "directly" apply an array to $concat, however you can use $reduce to work with the array elements and produce this:
db.collection.aggregate([
    { "$addFields": {
        "values": {
            "$reduce": {
                "input": "$values",
                "initialValue": "",
                "in": {
                    "$cond": {
                        "if": { "$eq": [{ "$indexOfArray": ["$values", "$$this"] }, 0] },
                        "then": { "$concat": ["$$value", "$$this"] },
                        "else": { "$concat": ["$$value", "_", "$$this"] }
                    }
                }
            }
        }
    }}
])
Combining of course with $indexOfArray in order to not "concatenate" with the "_" underscore when it is the "first" index of the array.
Also my additional "wish" has been answered with $sum:
db.collection.aggregate([
    { "$addFields": {
        "total": { "$sum": "$items.value" }
    }}
])
This kind of thing gets raised a bit in general with aggregation operators that take an array of items. The distinction here is that it means an "array" of "arguments" provided in the coded representation, as opposed to an "array element" present in the current document.
The only way you can really do that kind of concatenation of items within an array present in the document is to use some form of JavaScript, as with this example in mapReduce:
db.collection.mapReduce(
    function() {
        emit( this._id, { "values": this.values.join("_") } );
    },
    function() {},
    { "out": { "inline": 1 } }
)
Of course if you are not actually aggregating anything, then possibly the best approach is to simply do that "join" operation within your client code in post processing your query results. But if it needs to be used in some purpose across documents then mapReduce is going to be the only place you can use it.
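For example, a post-processing "join" over plain query results might look like ( a sketch ):
db.collection.find().toArray().map(function(doc) {
    doc.values = doc.values.join("_");
    return doc;
})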
I could add that "for example" I would love for something like this to work:
{
    "items": [
        { "product": "A", "value": 1 },
        { "product": "B", "value": 2 },
        { "product": "C", "value": 3 }
    ]
}
And in aggregate:
db.collection.aggregate([
    { "$project": {
        "total": { "$add": [
            { "$map": {
                "input": "$items",
                "as": "i",
                "in": "$$i.value"
            }}
        ]}
    }}
])
But it does not work that way, because $add expects arguments as opposed to an array from the document. Sigh! :(. Part of the "by design" reasoning could be argued to be that "just because" it is an array or "list" of singular values being passed in from the result of the transformation, it is not "guaranteed" that those are actually "valid" singular numeric type values that the operator expects. At least not with the currently implemented methods of "type checking".
That means for now we still have to do this:
db.collection.aggregate([
    { "$unwind": "$items" },
    { "$group": {
        "_id": "$_id",
        "total": { "$sum": "$items.value" }
    }}
])
And also sadly there would be no way to apply such a grouping operator to concatenate strings either.
So you can hope for some sort of change on this, or hope for some change that allows an externally scoped variable to be altered within the scope of a $map operation in some way. Better yet a new $join operation would be welcome as well. But these do not exist as of writing, and probably will not for some time to come.
You can use the $reduce operator together with the $substr operator.
db.collection.aggregate([
    {
        $project: {
            values: {
                $reduce: {
                    input: '$values',
                    initialValue: '',
                    in: {
                        $concat: ['$$value', '_', '$$this']
                    }
                }
            }
        }
    },
    {
        $project: {
            values: { $substr: ['$values', 1, -1] }
        }
    }
])
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom javascript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to concatenate an array of strings:
// { "_id" : "3_0", "values" : [ "1", "2" ] }
db.collection.aggregate(
    { $set: {
        "values": {
            $function: {
                body: function(values) { return values.join('_'); },
                args: ["$values"],
                lang: "js"
            }
        }
    }}
)
// { "_id" : "3_0", "values" : "1_2" }
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to join.
args, which contains the fields from the record that the body function takes as parameter. In our case "$values".
lang, which is the language in which the body function is written. Only js is currently available.
