I have a MongoDB collection with documents like the ones below. I want to count, cumulatively over all documents, how many non-null subdocuments the events field has.
{
    name: "name1",
    events: {
        created: {
            timestamp: 1512477520951
        },
        edited: {
            timestamp: 1512638551022
        },
        deleted: null
    }
}
{
    name: "name2",
    events: {
        created: {
            timestamp: 1512649915779
        },
        edited: null,
        deleted: null
    }
}
So the result of the query on these two documents should be 3, because there are 3 events that are not null in the collection. I cannot change the format of the documents to make the events field an array.
You want $objectToArray, available from MongoDB 3.4.4 or greater, in order to do this as an aggregation statement:
db.collection.aggregate([
    { "$group": {
        "_id": null,
        "total": {
            "$sum": {
                "$size": {
                    "$filter": {
                        "input": { "$objectToArray": "$events" },
                        "cond": { "$ne": [ "$$this.v", null ] }
                    }
                }
            }
        }
    }}
])
The $objectToArray part is needed to look at the "events" object and translate each of its "key/value" pairs into array entries. In this way you can apply the $filter operation in order to remove the null "values" ( the "v" property ) and then use $size in order to count the matching list.
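For illustration, applied to the first sample document above, { "$objectToArray": "$events" } produces an array like this, which $filter then reduces to the two non-null entries:
[
    { "k": "created", "v": { "timestamp": 1512477520951 } },
    { "k": "edited", "v": { "timestamp": 1512638551022 } },
    { "k": "deleted", "v": null }
]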
All of that is done under a $group pipeline stage using the $sum accumulator.
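Run against the two sample documents, the pipeline should return a single result along these lines:
{ "_id" : null, "total" : 3 }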
Or if you don't have a supporting version, you need mapReduce and JavaScript execution in order to do the same "object to array" operation:
db.collection.mapReduce(
    function() {
        emit(null,
            Object.keys(this.events).filter(k => this.events[k] != null).length);
    },
    function(key, values) {
        return Array.sum(values);
    },
    { out: { inline: 1 } }
)
That uses the same basic process by obtaining the object keys as an array and rejecting those where the value is found to be null, then obtaining the length of the resulting array.
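Against the same two documents, the inline output should contain the same total (timing and count fields omitted here):
{ "results" : [ { "_id" : null, "value" : 3 } ], "ok" : 1 }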
Because of the JavaScript evaluation, this is much slower than the aggregation framework counterpart. But it's really a question of what server version you have available to support what you need.
I am really new to MongoDB and NoSQL databases.
I have this userSchema:
const postSchema = {
    title: String,
    posted_on: Date
}
const userSchema = {
    name: String,
    posts: [postSchema]
}
I want to retrieve the posts by a user in a given range (/api/users/:userId/posts?from=date&to=date&limit=limit) using a MongoDB query. In a relational database, we would generally create two different tables and query the second table (posts) with some condition to get the required result.
How can we achieve the same in MongoDB? I have tried using $elemMatch by referring to this, but it doesn't seem to work.
There are two ways to do it with the aggregation framework, which can do much more than a find can.
With find we mostly select documents from a collection, or project to keep some fields of a selected document; here you need only some members of an array, so aggregation is used.
Local way (solution at document level, no unwind etc.)
Query
filter the array and keep only posts with posted_on > 1 and < 5
(I used numbers for simplicity; with dates it's the same)
take the first 2 elements of the array (limit 2)
db.collection.aggregate([
    { "$match": { "name": { "$eq": "n1" } } },
    { "$set": {
        "posts": {
            "$slice": [
                { "$filter": {
                    "input": "$posts",
                    "cond": {
                        "$and": [
                            { "$gt": [ "$$this.posted_on", 1 ] },
                            { "$lt": [ "$$this.posted_on", 5 ] }
                        ]
                    }
                }},
                2
            ]
        }
    }}
])
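As a sketch of what this does, assume a stored document like the following (the titles and numeric posted_on values are made up for illustration):
{
    "name": "n1",
    "posts": [
        { "title": "t1", "posted_on": 1 },
        { "title": "t2", "posted_on": 2 },
        { "title": "t3", "posted_on": 3 },
        { "title": "t4", "posted_on": 4 }
    ]
}
$filter keeps the posts with posted_on 2, 3 and 4, and $slice trims that down to the first 2, so the document comes back with posts holding only t2 and t3.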
Unwind solution (solution at collection level)
(it's a bit smaller, but keeping things local is better; in your case it doesn't matter)
Query
match the user
unwind the array, and make each member the ROOT
match the dates > 1 and < 5
limit 2
db.collection.aggregate([
    { "$match": { "name": { "$eq": "n1" } } },
    { "$unwind": { "path": "$posts" } },
    { "$replaceRoot": { "newRoot": "$posts" } },
    { "$match": {
        "$and": [
            { "posted_on": { "$gt": 1 } },
            { "posted_on": { "$lt": 5 } }
        ]
    }},
    { "$limit": 2 }
])
Is there a way to get an index in the aggregation pipeline? I have this result from a long aggregate query:
[
    {
        "_id": "59ed949227ec482044b2671e",
        "points": 300,
        "fan_detail": [
            {
                "_id": "59ed949227ec482044b2671e",
                "name": "mila ",
                "email": "mila#gmail.com ",
                "password": "$2a$10$J0.KfwVnZkaimxj/BiqGW.D40qXhvrDA952VV8x.xdefjNADaxnSW",
                "username": "mila 0321",
                "updated_at": "2017-10-23T07:04:50.004Z",
                "created_at": "2017-10-23T07:04:50.004Z",
                "celebrity_request_status": 0,
                "push_notification": [],
                "fan_array": [],
                "fanLength": 0,
                "celeb_bio": null,
                "is_admin": 0,
                "is_blocked": 2,
                "notification_setting": [1, 2, 3, 4, 5, 6, 7],
                "total_stars": 0,
                "total_points": 134800,
                "user_type": 2,
                "poster_pic": null,
                "profile_pic": "1508742289662.jpg",
                "facebook_id": "alistnvU79vcc81PLW9o",
                "is_user_active": 1,
                "is_username_selected": "false",
                "__v": 0
            }
        ]
    }
]
so I want to find the index of _id in the aggregate query, and the above array can contain hundreds of objects.
Depending on the version of MongoDB you have available, there are different approaches:
$indexOfArray - MongoDB 3.4
The best operator for this is simply $indexOfArray where you have it available. The name says it all really:
Model.aggregate([
    { "$match": { "fan_detail._id": mongoose.Types.ObjectId("59ed949227ec482044b2671e") } },
    { "$addFields": {
        "fanIndex": {
            "$indexOfArray": [
                "$fan_detail._id",
                mongoose.Types.ObjectId("59ed949227ec482044b2671e")
            ]
        }
    }}
])
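Given the sample result above, where the matching element is the first entry in fan_detail, the returned document would simply gain "fanIndex": 0 alongside its existing fields.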
$unwind with includeArrayIndex - MongoDB 3.2
Going back a version in releases, you can get the index of the array element from the syntax of $unwind. This does require you to $unwind the array, and includeArrayIndex names the output field that will hold the index:
Model.aggregate([
    { "$match": { "fan_detail._id": mongoose.Types.ObjectId("59ed949227ec482044b2671e") } },
    { "$unwind": { "path": "$fan_detail", "includeArrayIndex": "fanIndex" } },
    { "$match": { "fan_detail._id": mongoose.Types.ObjectId("59ed949227ec482044b2671e") } }
])
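Each unwound document then carries its original array position in "fanIndex" (emitted as a long in the shell), so with the sample data the document that survives the second $match would show "fanIndex": NumberLong(0).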
mapReduce - Earlier versions
Versions of MongoDB earlier than 3.2 don't have a way of returning an array index in an aggregation pipeline. So if you want the matched index instead of all the data, then you use mapReduce instead:
Model.mapReduce({
    map: function() {
        emit(
            this._id,
            this['fan_detail']
                .map( f => f._id.valueOf() )
                .indexOf("59ed949227ec482044b2671e")
        )
    },
    reduce: function() {},
    query: { "fan_detail._id": mongoose.Types.ObjectId("59ed949227ec482044b2671e") }
})
In all cases, we essentially "query" for the presence of the element "somewhere" in the array beforehand. The "indexOf" variants return -1 where nothing was found.
Also, $addFields is here just for example. If your real intent is to not return the array of hundreds of items, then you're probably using $project or other output anyway.
I want to find all key names from a collection that partially match a certain string.
The closest I got was to check if a certain key exists, but that's an exact match:
db.collection.find({ "fkClientID": { $exists:1 }})
I'd like to get all keys that start with fk instead.
You can do that using mapReduce:
To get just the field names at root level:
db.collection.mapReduce(function () {
    Object.keys(this).map(function(key) {
        if (key.match(/^fk/)) emit(key, null);
        // OR: key.indexOf("fk") === 0
    });
}, function(/* key, values */) {
    // No need for params or to return anything in the
    // reduce, just pass an empty function.
}, { out: { inline: 1 }});
This will output something like this:
{
    "results": [{
        "_id": "fkKey1",
        "value": null
    }, {
        "_id": "fkKey2",
        "value": null
    }, {
        "_id": "fkKey3",
        "value": null
    }],
    "timeMillis": W,
    "counts": {
        "input": X,
        "emit": Y,
        "reduce": Z,
        "output": 3
    },
    "ok": 1
}
To get the field names and any or all of their values (or the whole doc):
db.test.mapReduce(function () {
    var obj = this;
    Object.keys(this).map(function(key) {
        // With `obj[key]` you will get the value of the field as well.
        // You can change `obj[key]` for:
        // - `obj` to return the whole document.
        // - `obj._id` (or any other field) to return its value.
        if (key.match(/^fk/)) emit(key, obj[key]);
    });
}, function(key, values) {
    // We can't return values or an array directly yet:
    return { values: values };
}, { out: { inline: 1 }});
This will output something like this:
{
    "results": [{
        "_id": "fkKey1",
        "value": {
            "values": [1, 4, 6]
        }
    }, {
        "_id": "fkKey2",
        "value": {
            "values": ["foo", "bar"]
        }
    }],
    "timeMillis": W,
    "counts": {
        "input": X,
        "emit": Y,
        "reduce": Z,
        "output": 2
    },
    "ok": 1
}
To get field names in subdocuments (without path):
To do that you will have to store JavaScript functions on the server:
db.system.js.save({ _id: "hasChildren", value: function(obj) {
return typeof obj === "object";
}});
db.system.js.save({ _id: "getFields", value: function(doc) {
Object.keys(doc).map(function(key) {
if (key.match(/^fk/)) emit(key, null);
if (hasChildren(doc[key])) getFields(doc[key])
});
}});
And change your map to:
function () {
    getFields(this);
}
Now run db.loadServerScripts() to load them.
To get field names in subdocuments (with path):
The previous version will just return field names, not the whole path to get them, which you will need if what you want to do is rename those keys. To get the path:
db.system.js.save({ _id: "getFields", value: function(doc, prefix) {
Object.keys(doc).map(function(key) {
if (key.match(/^fk/)) emit(prefix + key, null);
if (hasChildren(doc[key]))
getFields(doc[key], prefix + key + '.')
});
}});
And change your map to:
function () {
    getFields(this, '');
}
To exclude overlapping path matches:
Note that if you have a field fkfoo.fkbar, it will return fkfoo and fkfoo.fkbar. If you don't want overlapping path matches, then:
db.system.js.save({ _id: "getFields", value: function(doc, prefix) {
Object.keys(doc).map(function(key) {
if (hasChildren(doc[key]))
getFields(doc[key], prefix + key + '.')
else if (key.match(/^fk/)) emit(prefix + key, null);
});
}});
Going back to your question, renaming those fields:
With this last option, you get all the paths that include keys that start with fk, so you can use $rename for that.
However, $rename doesn't work for those that contain arrays, so for those you could use forEach to do the update. See MongoDB rename database field within array
Performance note:
MapReduce is not particularly fast though, so you may want to specify { out: "fk_fields" } to output the results into a new collection called fk_fields and query those results later, but that will depend on your use case.
Possible optimisations for specific cases (consistent schema):
Also, note that if you know that the schema of your documents is always the same, then you just need to check one of them to get its fields. You can do that by adding limit: 1 to the options object, or just retrieving one document with findOne and reading its fields at the application level.
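For instance, a minimal shell sketch of the findOne approach (assuming the collection has at least one document):
// Fetch one document and filter its top-level keys client-side
Object.keys(db.collection.findOne()).filter(function(key) {
    return key.match(/^fk/);
});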
If you have the latest MongoDB 3.4.4, then you can use $objectToArray in an aggregate statement with $redact as the most blazing fast way this can possibly be done with native operators. Not that scanning the collection is "fast", but it's as fast as you get for this:
db[collname].aggregate([
    { "$redact": {
        "$cond": {
            "if": {
                "$gt": [
                    { "$size": { "$filter": {
                        "input": { "$objectToArray": "$$ROOT" },
                        "as": "doc",
                        "cond": {
                            "$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
                        }
                    }}},
                    0
                ]
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }}
])
The presently undocumented $objectToArray translates an "object" into "key" and "value" form in an array. So this:
{ "a": 1, "b": 2 }
Becomes this:
[{ "k": "a", "v": 1 }, { "k": "b", "v": 2 }]
Used with $$ROOT, which is a special variable referring to the current document "object", this translates the whole document to an array so the values of "k" can be inspected.
Then it's just a matter of applying $filter and using $substr to get the preceding characters of the "key" string.
For the record, this would be the MongoDB 3.4.4 optimal way of obtaining a unique list of the matching keys:
db[collname].aggregate([
    { "$redact": {
        "$cond": {
            "if": {
                "$gt": [
                    { "$size": { "$filter": {
                        "input": { "$objectToArray": "$$ROOT" },
                        "as": "doc",
                        "cond": {
                            "$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
                        }
                    }}},
                    0
                ]
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }},
    { "$project": {
        "j": {
            "$filter": {
                "input": { "$objectToArray": "$$ROOT" },
                "as": "doc",
                "cond": {
                    "$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
                }
            }
        }
    }},
    { "$unwind": "$j" },
    { "$group": { "_id": "$j.k" }}
])
That's the safe provision, which considers that the key may not be present in all documents and that there could be multiple matching keys in a document.
If you are absolutely certain that you "always" have the key present in the document and that there will only be one, then you can shorten to just $group:
db[collname].aggregate([
    { "$group": {
        "_id": {
            "$arrayElemAt": [
                { "$map": {
                    "input": { "$filter": {
                        "input": { "$objectToArray": "$$ROOT" },
                        "as": "doc",
                        "cond": {
                            "$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
                        }
                    }},
                    "as": "el",
                    "in": "$$el.k"
                }},
                0
            ]
        }
    }}
])
The most efficient way in earlier versions would be using the $where syntax that allows a JavaScript expression to evaluate. Not that anything that evaluates JavaScript is the "most" efficient thing you can do, but analyzing "keys" as opposed to "data" is not optimal for any data store:
db[collname].find(function() { return Object.keys(this).some( k => /^fk/.test(k) ) })
The inline function there is just shell shorthand and this could also be written as:
db[collname].find({ "$where": "return Object.keys(this).some( k => /^fk/.test(k) )" })
The only requirement for $where is that the expression returns a true value for any document you want to return, so the documents return unaltered.
Suppose that I have a series of documents with the following format:
{
    "_id": "3_0",
    "values": ["1", "2"]
}
and I would like to obtain a projection of the array's values concatenated in a single field:
{
    "_id": "3_0",
    "values": "1_2"
}
Is this possible? I have tried $concat but I guess I can't use $values as the array for $concat.
In modern MongoDB releases you can. You still cannot "directly" apply an array to $concat, but you can use $reduce to work with the array elements and produce this:
db.collection.aggregate([
    { "$addFields": {
        "values": {
            "$reduce": {
                "input": "$values",
                "initialValue": "",
                "in": {
                    "$cond": {
                        "if": { "$eq": [ { "$indexOfArray": [ "$values", "$$this" ] }, 0 ] },
                        "then": { "$concat": [ "$$value", "$$this" ] },
                        "else": { "$concat": [ "$$value", "_", "$$this" ] }
                    }
                }
            }
        }
    }}
])
Combining of course with $indexOfArray in order to not "concatenate" with the "_" underscore when it is the "first" index of the array.
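Applied to the sample document from the question, this produces exactly the requested projection:
{ "_id": "3_0", "values": "1_2" }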
Also my additional "wish" has been answered with $sum:
db.collection.aggregate([
    { "$addFields": {
        "total": { "$sum": "$items.value" }
    }}
])
This kind of thing gets raised a bit in general with aggregation operators that take an array of items. The distinction here is that such operators mean an "array" of "arguments" provided in the coded representation, as opposed to an "array element" present in the current document.
The only way you can really do this kind of concatenation of items within an array present in the document is with some kind of JavaScript operation, as with this example in mapReduce:
db.collection.mapReduce(
    function() {
        emit( this._id, { "values": this.values.join("_") } );
    },
    function() {},
    { "out": { "inline": 1 } }
)
Of course, if you are not actually aggregating anything, then possibly the best approach is to simply do that "join" operation within your client code, in post-processing of your query results. But if it needs to be used for some purpose across documents, then mapReduce is going to be the only place you can use it.
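As a minimal sketch of that client-side "join", in the shell:
db.collection.find().forEach(function(doc) {
    // Concatenate in application code after the query returns
    doc.values = doc.values.join("_");
    printjson(doc);
});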
I could add that "for example" I would love for something like this to work:
{
    "items": [
        { "product": "A", "value": 1 },
        { "product": "B", "value": 2 },
        { "product": "C", "value": 3 }
    ]
}
And in aggregate:
db.collection.aggregate([
    { "$project": {
        "total": { "$add": [
            { "$map": {
                "input": "$items",
                "as": "i",
                "in": "$$i.value"
            }}
        ]}
    }}
])
But it does not work that way, because $add expects arguments as opposed to an array from the document. Sigh! :(. Part of the "by design" reasoning for this could be argued that "just because" it is an array or "list" of singular values being passed in from the result of the transformation, it is not "guaranteed" that those are actually "valid" singular numeric type values that the operator expects. At least not with the currently implemented methods of "type checking".
That means for now we still have to do this:
db.collection.aggregate([
    { "$unwind": "$items" },
    { "$group": {
        "_id": "$_id",
        "total": { "$sum": "$items.value" }
    }}
])
And also sadly there would be no way to apply such a grouping operator to concatenate strings either.
So you can hope for some sort of change on this, or hope for some change that allows an externally scoped variable to be altered within the scope of a $map operation in some way. Better yet a new $join operation would be welcome as well. But these do not exist as of writing, and probably will not for some time to come.
You can use the $reduce operator together with the $substr operator.
db.collection.aggregate([
    {
        $project: {
            values: {
                $reduce: {
                    input: '$values',
                    initialValue: '',
                    in: {
                        $concat: ['$$value', '_', '$$this']
                    }
                }
            }
        }
    },
    {
        $project: {
            values: { $substr: ['$values', 1, -1] }
        }
    }
])
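Note that the first $project stage yields "_1_2" for the sample document, since this $reduce prepends the separator before every element, including the first. The second stage strips that leading underscore: in $substr, a length of -1 means "to the end of the string", so it returns everything from index 1 onward, leaving "1_2".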
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom JavaScript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to concatenate an array of strings:
// { "_id" : "3_0", "values" : [ "1", "2" ] }
db.collection.aggregate([
    { $set: {
        "values": {
            $function: {
                body: function(values) { return values.join('_'); },
                args: ["$values"],
                lang: "js"
            }
        }
    }}
])
// { "_id" : "3_0", "values" : "1_2" }
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to join.
args, which contains the fields from the record that the body function takes as parameter. In our case "$values".
lang, which is the language in which the body function is written. Only js is currently available.
I have multiple documents with this schema, each document is per product per day:
{
    _id: {},
    app_id: 'DHJFK67JDSJjdasj909',
    date: '2014-08-07',
    event_count: 32423,
    event_count_per_type: {
        0: 322,
        10: 4234,
        20: 653,
        30: 7562
    }
}
I would like to get the sum of each event type for a particular date range.
This is the output I am looking for, where each event type has been summed across all the documents. The keys of event_count_per_type can be anything, so I need something that can loop through each of them, as opposed to having to name them explicitly.
{
    app_id: 'DHJFK67JDSJjdasj909',
    event_count: 324236456,
    event_count_per_type: {
        0: 34234222,
        10: 242354,
        20: 456476,
        30: 56756
    }
}
I have been trying several queries; this is the best I have so far, but the subdocument values are not summed:
db.events.aggregate(
    {
        $match: { app_id: 'DHJFK67JDSJjdasj909' }
    },
    {
        $group: {
            _id: { app_id: '$app_id' },
            event_count: { $sum: '$event_count' },
            event_count_per_type: { $sum: '$event_count_per_type' }
        }
    },
    {
        $project: {
            _id: 0,
            app_id: '$_id.app_id',
            event_count: 1,
            event_count_per_type: 1
        }
    }
)
The output I am seeing is a value of 0 for the event_count_per_type key, instead of an object. I could modify the schema so the keys are at the top level of the document, but that would still mean I need an entry in the group statement for each key, which I cannot do since I do not know what the key names will be.
Any help would be appreciated. I am willing to change my schema if need be, and also to try mapReduce (although from the documentation it seems like the performance is bad).
As stated, processing documents like this is not possible with the aggregation framework unless you are actually going to supply all of the keys, such as:
db.events.aggregate([
    { "$group": {
        "_id": "$app_id",
        "event_count": { "$sum": "$event_count" },
        "0": { "$sum": "$event_count_per_type.0" },
        "10": { "$sum": "$event_count_per_type.10" },
        "20": { "$sum": "$event_count_per_type.20" },
        "30": { "$sum": "$event_count_per_type.30" }
    }}
])
But you do of course have to explicitly specify every key you wish to work on. This is true of both the aggregation framework and general query operations in MongoDB, as to access elements notated in this "sub-document" form you need to specify the "exact path" to the element in order to do anything with it.
The aggregation framework and general queries have no concept of "traversal", which mean they cannot process "each key" of a document. That requires a language construct in order to do which is not provided in these interfaces.
Generally speaking though, using a "key name" as a data point where its name actually represents a "value" is a bit of an "anti-pattern". A better way to model this would be to use an array and represent your "type" as a value by itself:
{
    "app_id": "DHJFK67JDSJjdasj909",
    "date": ISODate("2014-08-07T00:00:00.000Z"),
    "event_count": 32423,
    "events": [
        { "type": 0, "value": 322 },
        { "type": 10, "value": 4234 },
        { "type": 20, "value": 653 },
        { "type": 30, "value": 7562 }
    ]
}
Also note that the "date" is now a proper date object rather than a string, which is good practice. This sort of data is easy to process with the aggregation framework:
db.events.aggregate([
    { "$unwind": "$events" },
    { "$group": {
        "_id": {
            "app_id": "$app_id",
            "type": "$events.type"
        },
        "event_count": { "$sum": "$event_count" },
        "value": { "$sum": "$events.value" }
    }},
    { "$group": {
        "_id": "$_id.app_id",
        // $avg here, not $sum: each per-type group already summed the
        // same per-document event_count values, so summing again would
        // multiply the total by the number of types.
        "event_count": { "$avg": "$event_count" },
        "events": { "$push": { "type": "$_id.type", "value": "$value" } }
    }}
])
That shows a two-stage grouping that first gets the totals per "type" without specifying each "key", since you no longer have to, then returns a single document per "app_id" with the results in an array as they were originally stored. Note the $avg in the second stage: assuming each document carries every type (as in the sample), every per-type group holds the same summed event_count, so averaging recovers the overall total instead of multiplying it. This data form is generally much more flexible for looking at certain "types" or even the "values" within a certain range.
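Run against just the single sample document shown above, the output has this shape (the order of the events array is not guaranteed):
{
    "_id": "DHJFK67JDSJjdasj909",
    "event_count": 32423,
    "events": [
        { "type": 0, "value": 322 },
        { "type": 10, "value": 4234 },
        { "type": 20, "value": 653 },
        { "type": 30, "value": 7562 }
    ]
}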
If you cannot change the structure, then your only option is mapReduce. This allows you to "code" the traversal of the keys, but since this requires JavaScript interpretation and execution, it is not as fast as the aggregation framework:
db.events.mapReduce(
    function() {
        emit(
            this.app_id,
            {
                "event_count": this.event_count,
                "event_count_per_type": this.event_count_per_type
            }
        );
    },
    function(key, values) {
        var reduced = { "event_count": 0, "event_count_per_type": {} };
        values.forEach(function(value) {
            for ( var k in value.event_count_per_type ) {
                if ( !reduced.event_count_per_type.hasOwnProperty(k) )
                    reduced.event_count_per_type[k] = 0;
                reduced.event_count_per_type[k] += value.event_count_per_type[k];
            }
            reduced.event_count += value.event_count;
        });
        return reduced;
    },
    {
        "out": { "inline": 1 }
    }
)
That will essentially traverse and combine the "keys" and sum up the values for each one found.
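For instance, with just the single sample document shown earlier, the inline result would contain:
{
    "_id": "DHJFK67JDSJjdasj909",
    "value": {
        "event_count": 32423,
        "event_count_per_type": { "0": 322, "10": 4234, "20": 653, "30": 7562 }
    }
}
(With only one value emitted per key, the reduce function is never called, so the mapped value is returned as-is.)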
So your options are either:
Change the structure and work with standard queries and aggregation.
Stay with the structure and require JavaScript processing with mapReduce.
It depends on your actual needs, but in most cases restructuring yields benefits.