Suppose that I have a series of documents with the following format:
{
"_id": "3_0",
"values": ["1", "2"]
}
and I would like to obtain a projection of the array's values concatenated in a single field:
{
"_id": "3_0",
"values": "1_2"
}
Is this possible? I have tried $concat but I guess I can't use $values as the array for $concat.
In modern MongoDB releases you can. You still cannot "directly" apply an array to $concat, but you can use $reduce to work with the array elements and produce this:
db.collection.aggregate([
{ "$addFields": {
"values": {
"$reduce": {
"input": "$values",
"initialValue": "",
"in": {
"$cond": {
"if": { "$eq": [ { "$indexOfArray": [ "$values", "$$this" ] }, 0 ] },
"then": { "$concat": [ "$$value", "$$this" ] },
"else": { "$concat": [ "$$value", "_", "$$this" ] }
}
}
}
}
}}
])
This combines with $indexOfArray so that the "_" underscore is not prepended when the element is at the "first" index of the array.
Also my additional "wish" has been answered with $sum:
db.collection.aggregate([
{ "$addFields": {
"total": { "$sum": "$items.value" }
}}
])
This question comes up a bit in general with aggregation operators that take an array of items. The distinction here is that they mean an "array" of "arguments" provided in the coded representation, as opposed to an "array element" present in the current document.
The only other way you can really do this kind of concatenation of items within an array present in the document is with some kind of JavaScript approach, as with this example in mapReduce:
db.collection.mapReduce(
function() {
emit( this._id, { "values": this.values.join("_") } );
},
function() {},
{ "out": { "inline": 1 } }
)
Of course, if you are not actually aggregating anything, then possibly the best approach is to simply do that "join" operation within your client code when post-processing your query results. But if it needs to be used for some purpose across documents, then mapReduce is going to be the only place you can use it.
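A minimal client-side sketch of that post-processing approach, assuming the sample documents at the top of the question and done in the shell purely for illustration:
db.collection.find({}, { "values": 1 }).forEach(function(doc) {
    // The join happens in the client, not on the server.
    printjson({ "_id": doc._id, "values": doc.values.join("_") });
});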
I could add that "for example" I would love for something like this to work:
{
"items": [
{ "product": "A", "value": 1 },
{ "product": "B", "value": 2 },
{ "product": "C", "value": 3 }
]
}
And in aggregate:
db.collection.aggregate([
{ "$project": {
"total": { "$add": [
{ "$map": {
"input": "$items",
"as": "i",
"in": "$$i.value"
}}
]}
}}
])
But it does not work that way because $add expects arguments as opposed to an array from the document. Sigh! :( Part of the "by design" reasoning could be argued to be that "just because" it is an array or "list" of singular values being passed in from the result of the transformation, it is not "guaranteed" that those are actually "valid" singular numeric values that the operator expects. At least not with the currently implemented methods of "type checking".
That means for now we still have to do this:
db.collection.aggregate([
{ "$unwind": "$items" },
{ "$group": {
"_id": "$_id",
"total": { "$sum": "$items.value" }
}}
])
And also, sadly, there is no way to apply such a grouping operator to concatenate strings either.
So you can hope for some sort of change on this, or hope for some change that allows an externally scoped variable to be altered within the scope of a $map operation in some way. Better yet, a new $join operator would be welcome as well. But these do not exist as of writing, and probably will not for some time to come.
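That said, later releases can at least emulate a string "join" across grouped documents by collecting the values with $push and then applying the same $reduce trick over the collected array. A hedged sketch, where the grouping key "category" and string field "label" are hypothetical, not from the question:
db.collection.aggregate([
    // Note: the order of $push output is not guaranteed without a prior $sort.
    { "$group": { "_id": "$category", "labels": { "$push": "$label" } } },
    { "$addFields": {
        "labels": {
            "$reduce": {
                "input": "$labels",
                "initialValue": "",
                "in": {
                    "$cond": [
                        { "$eq": [ "$$value", "" ] },
                        "$$this",
                        { "$concat": [ "$$value", "_", "$$this" ] }
                    ]
                }
            }
        }
    }}
])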
You can use the $reduce operator together with the $substr operator.
db.collection.aggregate([
{
$project: {
values: {
$reduce: {
input: '$values',
initialValue: '',
in: {
$concat: ['$$value', '_', '$$this']
}
}
}
}
},
{
$project: {
values: { $substr: ['$values', 1 , -1]}
}
}])
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom JavaScript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to concatenate an array of strings:
// { "_id" : "3_0", "values" : [ "1", "2" ] }
db.collection.aggregate(
{ $set:
{ "values":
{ $function: {
body: function(values) { return values.join('_'); },
args: ["$values"],
lang: "js"
}}
}
}
)
// { "_id" : "3_0", "values" : "1_2" }
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to join.
args, which contains the fields from the record that the body function takes as parameter. In our case "$values".
lang, which is the language in which the body function is written. Only js is currently available.
Related
If I have a document that looks like this:
{
"firstName": "John",
"lastName": "Doe",
"favoriteFoods": [{"name": "Cheeseburgers"}, {"name": "Broccoli"}]
}
And I want to create a search expression in NodeJS to return just the elements of favoriteFoods whose name matches req.body.term. How could I implement this? I have tried the code below, but this returns the entire document, which I don't want because I have to filter the array.
User.find({"favoriteFoods.title": {$regex: req.body.term, $options: "i"}})
.then((food) => {
res.status(200).send(food);
})
You can use Array.filter() to match the values.
res.status(200).send(food.favoriteFoods
.filter(f => f.title.match(new RegExp(req.body.term, 'i')))
);
You used name in the example JSON but title in the code, so make sure you're using whichever of those is actually correct.
Also, allowing users to supply their own regular expressions can open you up to regex denial-of-service (ReDoS) attacks, so be warned of that.
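A minimal sketch of one way to guard against that, escaping the user-supplied term before building the RegExp (the escapeRegex helper is hypothetical, not from any particular library):
// Escape regex metacharacters so the term is matched literally.
function escapeRegex(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const term = new RegExp(escapeRegex(req.body.term), 'i');
const matches = food.favoriteFoods.filter(f => f.title.match(term));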
I'm not sure of the exact desired result format, so here are multiple ways to get it:
Using $elemMatch in the projection:
db.collection.find({},
{
"favoriteFoods": {
"$elemMatch": {
"name": {
"$regex": "chee",
"$options": "i"
}
}
}
})
But be careful: $elemMatch only returns the first matching element.
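For illustration, assume a hypothetical document (not from the question) whose array has two entries matching the "chee" pattern; the projection above still returns only the first:
// Input (hypothetical):
// { "_id": 1, "favoriteFoods": [ { "name": "Cheeseburgers" }, { "name": "Cheesecake" }, { "name": "Broccoli" } ] }
// Output of the $elemMatch projection:
// { "_id": 1, "favoriteFoods": [ { "name": "Cheeseburgers" } ] }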
Using $filter in an aggregation stage: this query will return an array called "food" containing only the objects that match the regex.
db.collection.aggregate([
{
"$project": {
"food": {
"$filter": {
"input": "$favoriteFoods",
"cond": {
"$regexMatch": {
"input": "$$this.name",
"regex": "chee",
"options": "i"
}
}
}
}
}
}
])
It will return more than one object if multiple elements match.
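With the same hypothetical document as above, the projected food array keeps every match:
// { "_id": 1, "food": [ { "name": "Cheeseburgers" }, { "name": "Cheesecake" } ] }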
Using $unwind and $match: this query uses $unwind, which is not always the stage you want, but it is very useful here. Combined with $match and $project you can get the result as a single value instead of an array (keeping in mind that the mongo result is always an array of documents, but each document will have a food property that is not an array).
db.collection.aggregate([
{
"$unwind": "$favoriteFoods"
},
{
"$match": {
"favoriteFoods.name": {
"$regex": "chee",
"$options": "i"
}
}
},
{
"$project": {
"food": "$favoriteFoods.name"
}
}
])
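With the same hypothetical document, each matching element becomes its own result document:
// { "_id": 1, "food": "Cheeseburgers" }
// { "_id": 1, "food": "Cheesecake" }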
Sorry, I don't understand MongoDB aggregation well.
How can I achieve this with an aggregation:
[
{array: [1,2,3] },
{array: [4,5,6] },
{array: [7,8,9] }
]
desired result:
[1,2,3,4,5,6,7,8,9]
Does the performance change if, instead of using MongoDB aggregation, I treat the documents as normal objects in application code?
Aggregation is generally a better option than post-processing in application code; that is why the database provides these operators, so you can get the result in one go.
db.collection.aggregate([
{ "$group": {
"_id": null,
"data": { "$push": "$array" }
}},
{ "$project": {
"_id": 0,
"data": {
"$reduce": {
"input": "$data",
"initialValue": [],
"in": { "$concatArrays": ["$$this", "$$value"] }
}
}
}}
])
The only thing you have to take care of here is that the single returned document must not exceed the 16MB BSON limit. You can learn more in the MongoDB documentation on BSON document size limits.
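To the second part of the question: doing the same thing client side means transferring every document first, so the aggregation generally wins as the data grows, but for small result sets the client-side equivalent is trivial. A sketch in the shell, assuming the three sample documents:
// Client-side equivalent: fetch the documents and flatten in application code.
var docs = db.collection.find({}, { "_id": 0, "array": 1 }).toArray();
var flattened = docs.reduce(function(acc, doc) { return acc.concat(doc.array); }, []);
// flattened -> [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]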
You can $group by null to get an array of arrays as a single document and then you can run $reduce with $concatArrays to flatten that array:
db.col.aggregate([
{
$group: {
_id: null,
array: { $push: "$array" }
}
},
{
$project: {
_id: 0,
array: {
$reduce: {
input: "$array",
initialValue: [],
in: { $concatArrays: [ "$$value", "$$this" ] }
}
}
}
}
])
I want to find all key names from a collection that partially match a certain string.
The closest I got was to check if a certain key exists, but that's an exact match:
db.collection.find({ "fkClientID": { $exists:1 }})
I'd like to get all keys that start with fk instead.
You can do that using mapReduce:
To get just the field names at root level:
db.collection.mapReduce(function () {
Object.keys(this).map(function(key) {
if (key.match(/^fk/)) emit(key, null);
// OR: key.indexOf("fk") === 0
});
}, function(/* key, values */) {
// No need for params or to return anything in the
// reduce, just pass an empty function.
}, { out: { inline: 1 }});
This will output something like this:
{
"results": [{
"_id": "fkKey1",
"value": null
}, {
"_id": "fkKey2",
"value": null
}, {
"_id": "fkKey3",
"value": null
}],
"timeMillis": W,
"counts": {
"input": X,
"emit": Y,
"reduce": Z,
"output": 3
},
"ok" : 1
}
To get the field names along with any or all of their values (or the whole doc):
db.test.mapReduce(function () {
var obj = this;
Object.keys(this).map(function(key) {
// With `obj[key]` you will get the value of the field as well.
// You can change `obj[key]` for:
// - `obj` to return the whole document.
// - `obj._id` (or any other field) to return its value.
if (key.match(/^fk/)) emit(key, obj[key]);
});
}, function(key, values) {
// We can't return values or an array directly yet:
return { values: values };
}, { out: { inline: 1 }});
This will output something like this:
{
"results": [{
"_id": "fkKey1",
"value": {
"values": [1, 4, 6]
}
}, {
"_id": "fkKey2",
"value": {
"values": ["foo", "bar"]
}
}],
"timeMillis": W,
"counts": {
"input": X,
"emit": Y,
"reduce": Z,
"output": 2
},
"ok" : 1
}
To get field names in subdocuments (without path):
To do that you will have to store JavaScript functions on the server:
db.system.js.save({ _id: "hasChildren", value: function(obj) {
return typeof obj === "object";
}});
db.system.js.save({ _id: "getFields", value: function(doc) {
Object.keys(doc).map(function(key) {
if (key.match(/^fk/)) emit(key, null);
if (hasChildren(doc[key])) getFields(doc[key])
});
}});
And change your map to:
function () {
getFields(this);
}
Now run db.loadServerScripts() to load them.
To get field names in subdocuments (with path):
The previous version will just return field names, not the whole path to get them, which you will need if what you want to do is rename those keys. To get the path:
db.system.js.save({ _id: "getFields", value: function(doc, prefix) {
Object.keys(doc).map(function(key) {
if (key.match(/^fk/)) emit(prefix + key, null);
if (hasChildren(doc[key]))
getFields(doc[key], prefix + key + '.')
});
}});
And change your map to:
function () {
getFields(this, '');
}
To exclude overlapping path matches:
Note that if you have a field fkfoo.fkbar, it will return fkfoo and fkfoo.fkbar. If you don't want overlapping path matches, then:
db.system.js.save({ _id: "getFields", value: function(doc, prefix) {
Object.keys(doc).map(function(key) {
if (hasChildren(doc[key]))
getFields(doc[key], prefix + key + '.')
else if (key.match(/^fk/)) emit(prefix + key, null);
});
}});
Going back to your question, renaming those fields:
With this last option, you get all the paths that include keys that start with fk, so you can use $rename for that.
However, $rename doesn't work for those that contain arrays, so for those you could use forEach to do the update. See MongoDB rename database field within array
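A hedged sketch of both cases (the items and fkCode names below are hypothetical; fkClientID is the field from the question): a top-level key can be renamed directly with $rename, while a key inside an array needs a per-document loop:
// Top-level key: $rename works directly.
db.collection.update(
    { "fkClientID": { "$exists": true } },
    { "$rename": { "fkClientID": "clientID" } },
    { "multi": true }
)

// Key inside array elements: $rename does not apply, so rewrite each document.
db.collection.find({ "items.fkCode": { "$exists": true } }).forEach(function(doc) {
    doc.items.forEach(function(item) {
        if (item.hasOwnProperty("fkCode")) {
            item.code = item.fkCode;
            delete item.fkCode;
        }
    });
    db.collection.save(doc);
});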
Performance note:
MapReduce is not particularly fast though, so you may want to specify { out: "fk_fields" } to output the results into a new collection called fk_fields and query those results later, but that will depend on your use case.
Possible optimisations for specific cases (consistent schema):
Also, note that if you know that the schema of your documents is always the same, then you only need to check one of them to get its fields, so you can do that by adding limit: 1 to the options object, or just retrieve one document with findOne and read its fields at the application level.
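For that consistent-schema case, a minimal sketch that inspects a single document instead of scanning the whole collection:
// Grab one document and filter its top-level keys in the shell or application layer.
var doc = db.collection.findOne();
var fkFields = Object.keys(doc).filter(function(k) { return /^fk/.test(k); });
printjson(fkFields);   // e.g. [ "fkClientID" ]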
If you have the latest MongoDB 3.4.4 then you can use $objectToArray in an aggregate statement with $redact, as the fastest this can possibly be done with native operators. Not that scanning the collection is "fast", but it is as fast as you get for this:
db[collname].aggregate([
{ "$redact": {
"$cond": {
"if": {
"$gt": [
{ "$size": { "$filter": {
"input": { "$objectToArray": "$$ROOT" },
"as": "doc",
"cond": {
"$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
}
}}},
0
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
The presently undocumented $objectToArray translates an "object" into "key" and "value" form in an array. So this:
{ "a": 1, "b": 2 }
Becomes this:
[{ "k": "a", "v": 1 }, { "k": "b", "v": 2 }]
Used with $$ROOT which is a special variable referring to the current document "object", we translate to an array so the values of "k" can be inspected.
Then it's just a matter of applying $filter and using $substr to get the preceding characters of the "key" string.
For the record, this would be the MongoDB 3.4.4 optimal way of obtaining a unique list of the matching keys:
db[collname].aggregate([
{ "$redact": {
"$cond": {
"if": {
"$gt": [
{ "$size": { "$filter": {
"input": { "$objectToArray": "$$ROOT" },
"as": "doc",
"cond": {
"$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
}
}}},
0
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$project": {
"j": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"as": "doc",
"cond": {
"$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
}
}
}
}},
{ "$unwind": "$j" },
{ "$group": { "_id": "$j.k" }}
])
That's the safe approach, considering that the key may not be present in all documents and that there could possibly be multiple matching keys in a document.
If you are absolutely certain that you "always" have the key present in the document and that there will only be one, then you can shorten to just $group:
db[colname].aggregate([
{ "$group": {
"_id": {
"$arrayElemAt": [
{ "$map": {
"input": { "$filter": {
"input": { "$objectToArray": "$$ROOT" },
"as": "doc",
"cond": {
"$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
}
}},
"as": "el",
"in": "$$el.k"
}},
0
]
}
}}
])
The most efficient way in earlier versions would be using the $where syntax that allows a JavaScript expression to evaluate. Not that anything that evaluates JavaScript is the "most" efficient thing you can do, but analyzing "keys" as opposed to "data" is not optimal for any data store:
db[collname].find(function() { return Object.keys(this).some( k => /^fk/.test(k) ) })
The inline function there is just shell shorthand and this could also be written as:
db[collname].find({ "$where": "return Object.keys(this).some( k => /^fk/.test(k) )" })
The only requirement for $where is that the expression returns a true value for any document you want to return, so the documents return unaltered.
I have a MongoDB collection that has data like:
{
"_id": "a",
"reply": "<",
"criterion": "story"
},
{
"_id": "b",
"reply": "<",
"criterion": "story"
},
{
"_id": "c",
"reply": ">",
"criterion": "story"
}
And I want the result as:
{
"criterion": "story",
"result" : {
">" : 1,
"<" : 2
}
}
I want to aggregate on "criterion", so there will be 1 document. However, I want to count the number of "<" and ">" replies and write that into a new key as shown in the JSON above. That is the logic behind this. Could anyone with good MongoDB knowledge help me with this?
You'd need to use the aggregation framework where you would run an aggregation pipeline that has a $group operator pipeline stage which aggregates the documents to create the desired counts using the accumulator operator $sum.
For the desired result, you would need to use a ternary operator like $cond to create the independent count fields, since that is what feeds the number of documents to the $sum expression depending on the reply value. The $cond operator can be used effectively to evaluate the counts based on the reply field value. It takes a logical condition as its first argument (if) and then returns the second argument when the evaluation is true (then) or the third argument when false (else). This converts the true/false boolean result into the 1 and 0 that feed into $sum respectively:
"$cond": [
{ "$eq": ["$reply", ">"] },
1, 0
]
So, if within the document being processed the reply field has a ">" value, the $cond operator feeds the value 1 to $sum; otherwise it sums a zero value.
Use $project as your final pipeline step, as it allows you to reshape each document in the stream: include, exclude or rename fields, inject computed fields and create sub-document fields, using mathematical, date, string and/or logical (comparison, boolean, control) expressions. It is similar to SELECT in SQL.
The following pipeline should return the desired result:
Model.aggregate([
{
"$group": {
"_id": "$criterion",
">": {
"$sum": {
"$cond": [
{ "$eq": [ "$reply", ">" ] },
1, 0
]
}
},
"<": {
"$sum": {
"$cond": [
{ "$eq": [ "$reply", "<" ] },
1, 0
]
}
}
}
},
{
"$project": {
"_id": 0,
"criterion": "$_id",
"result.>": "$>",
"result.<": "$<"
}
}
]).exec(function(err, result) {
console.log(JSON.stringify(result, null, 4));
});
Sample Console Output
{
"criterion" : "story",
"result" : {
">" : 1,
"<" : 2
}
}
Note: this approach assumes the values for the reply field are fixed and known, hence it's not flexible where the values are dynamic or unknown.
For a more flexible alternative which performs better than the above and also handles unknown values for the count fields, I would suggest running the pipeline as follows:
Model.aggregate([
{
"$group": {
"_id": {
"criterion": "$criterion",
"reply": "$reply"
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.criterion",
"result": {
"$push": {
"reply": "$_id.reply",
"count": "$count"
}
}
}
}
]).exec(function(err, result) {
console.log(JSON.stringify(result, null, 4));
});
Sample Console Output
{
"_id" : "story",
"result" : [
{
"reply" : "<",
"count" : 2
},
{
"reply" : ">",
"count" : 1
}
]
}
I have multiple documents with this schema, each document is per product per day:
{
_id:{},
app_id:'DHJFK67JDSJjdasj909',
date:'2014-08-07',
event_count:32423,
event_count_per_type: {
0:322,
10:4234,
20:653,
30:7562
}
}
I would like to get the sum of each event_type for a particular date_range.
This is the output I am looking for, where each event type has been summed across all the documents. The keys for event_count_per_type can be anything, so I need something that can loop through each of them, as opposed to having to list their names explicitly.
{
app_id:'DHJFK67JDSJjdasj909',
event_count:324236456,
event_count_per_type: {
0:34234222,
10:242354,
20:456476,
30:56756
}
}
I have been trying several queries so far, this is the best I have got so far but the sub document values are not summed:
db.events.aggregate(
{
$match: {app_id:'DHJFK67JDSJjdasj909'}
},
{
$group: {
_id: {
app_id:'$app_id',
},
event_count: {$sum:'$event_count'},
event_count_per_type: {$sum:'$event_count_per_type'}
}
},
{
$project: {
_id:0,
app_id:'$_id.app_id',
event_count:1,
event_count_per_type:1
}
}
)
The output I am seeing is a value of 0 for the event_count_per_type key, instead of an object. I could modify the schema so the keys are on the top level of the document, but that would still mean I need an entry in the group statement for each key, which I cannot do since I do not know what the key names will be.
Any help would be appreciated, I am willing to change my schema if need be and also to try mapReduce (although from the documentation it seems like the performance is bad.)
As stated, processing documents like this is not possible with the aggregation framework unless you are actually going to supply all of the keys, such as:
db.events.aggregate([
{ "$group": {
"_id": "$app_id",
"event_count": { "$sum": "$event_count" },
"0": { "$sum": "$event_count_per_type.0" },
"10": { "$sum": "$event_count_per_type.10" }
"20": { "$sum": "$event_count_per_type.20" }
"30": { "$sum": "$event_count_per_type.30" }
}}
])
But you do of course have to explicitly specify every key you wish to work on. This is true of both the aggregation framework and general query operations in MongoDB, as to access elements notated in this "sub-document" form you need to specify the "exact path" to the element in order to do anything with it.
The aggregation framework and general queries have no concept of "traversal", which mean they cannot process "each key" of a document. That requires a language construct in order to do which is not provided in these interfaces.
Generally speaking though, using a "key name" as a data point where its name actually represents a "value" is a bit of an "anti-pattern". A better way to model this would be to use an array and represent your "type" as a value by itself:
{
"app_id": "DHJFK67JDSJjdasj909",
"date: ISODate("2014-08-07T00:00:00.000Z"),
"event_count": 32423,
"events": [
{ "type": 0, "value": 322 },
{ "type": 10, "value": 4234 },
{ "type": 20, "value": 653 },
{ "type": 30, "value": 7562 }
]
}
Also noting that the "date" is now a proper date object rather than a string, which is also something that is good practice to do. This sort of data though is easy to process with the aggregation framework:
db.events.aggregate([
{ "$unwind": "$events" },
{ "$group": {
"_id": {
"app_id": "$app_id",
"type": "$events.type"
},
"event_count": { "$sum": "$event_count" },
"value": { "$sum": "$value" }
}},
{ "$group": {
"_id": "$_id.app_id",
"event_count": { "$sum": "$event_count" },
"events": { "$push": { "type": "$_id.type", "value": "$value" } }
}}
])
That shows a two stage grouping that first gets the totals per "type" without specifying each "key" since you no longer have to, then returns as a single document per "app_id" with the results in an array as they were originally stored. This data form is generally much more flexible for looking at certain "types" or even the "values" within a certain range.
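To illustrate that flexibility, a hedged sketch of a query against the restructured form, filtering an arbitrary (made-up) date range and a single type before summing its values:
db.events.aggregate([
    { "$match": {
        "date": { "$gte": ISODate("2014-08-01"), "$lt": ISODate("2014-09-01") },
        "events.type": 10
    }},
    { "$unwind": "$events" },
    { "$match": { "events.type": 10 } },
    { "$group": {
        "_id": "$app_id",
        "value": { "$sum": "$events.value" }
    }}
])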
Where you cannot change the structure then your only option is mapReduce. This allows you to "code" the traversal of the keys, but since this requires JavaScript interpretation and execution it is not as fast as the aggregation framework:
db.events.mapReduce(
function() {
emit(
this.app_id,
{
"event_count": this.event_count,
"event_count_per_type": this.event_count_per_type
}
);
},
function(key,values) {
var reduced = { "event_count": 0, "event_count_per_type": {} };
values.forEach(function(value) {
// Sum the per-type counts key by key, then the overall count.
for ( var k in value.event_count_per_type ) {
if ( !reduced.event_count_per_type.hasOwnProperty(k) )
reduced.event_count_per_type[k] = 0;
reduced.event_count_per_type[k] += value.event_count_per_type[k];
}
reduced.event_count += value.event_count;
});
return reduced;
},
{
"out": { "inline": 1 }
}
)
That will essentially traverse and combine the "keys" and sum up the values for each one found.
So you options are either:
Change the structure and work with standard queries and aggregation.
Stay with the structure and require JavaScript processing and mapReduce.
It depends on your actual needs, but in most cases restructuring yields benefits.