Well, this is my collection:
{
"company" : "500010"
"eqtcorp" : {
"306113" : {
"DATE" : "2014-05-05 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "16:43"
},
"306118" : {
"DATE" : "2014-05-08 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "18:43"
},
"306114" : {
"DATE" : "2014-06-02 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
"306116" : {
"DATE" : "2014-03-02 12:30:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
"306115" : {
"DATE" : "2014-08-02 04:45:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
"306117" : {
"DATE" : "2014-07-02 10:16:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
...
}
}
If I query like
db.collection_name.find({"company": "500010"})
I will get the whole document. Since there are many subdocuments under "eqtcorp", I need only the 3 subdocuments with the latest dates: simply a reverse sort on the "DATE" field of every subdocument under "eqtcorp", taking the first 3. It is really a challenge since I am new to MongoDB and mapReduce.
What I am expecting as output is
{
"company" : "500010"
"eqtcorp" : {
"306113" : {
"DATE" : "2014-05-05 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "16:43"
},
"306118" : {
"DATE" : "2014-05-08 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "18:43"
},
"306116" : {
"DATE" : "2014-03-02 12:30:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
}
}
}
Any thoughts?
There are several things not really helping you here, which essentially complicate what is otherwise a simple operation.
You have dates that are actually strings, which you really should change to proper BSON date types; that will help you later when you likely want to do real date operations with them. Fortunately they are at least lexically ordered as "YYYY-MM-DD", so they will sort correctly as strings, but just don't expect much other use out of them.
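As a rough idea of what that conversion could look like, here is a minimal shell sketch (assuming the collection is named collection and the strings always follow the "YYYY-MM-DD HH:mm:ss.SSS" layout shown):
// one-off conversion pass: walk each named sub-document and rewrite its DATE string
db.collection.find({ "eqtcorp": { "$exists": true } }).forEach(function(doc) {
    var set = {};
    for (var k in doc.eqtcorp) {
        // "2014-05-05 16:43:00.000" -> "2014-05-05T16:43:00.000" so Date() parses it as ISO
        set["eqtcorp." + k + ".DATE"] = new Date(doc.eqtcorp[k].DATE.replace(" ", "T"));
    }
    db.collection.update({ "_id": doc._id }, { "$set": set });
});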
You also really should be using an array rather than nesting sub-documents by keys. Named keys are really hard to query, as you need to specify the exact path to each element. As such you are almost always restricted to JavaScript processing, which is much slower than the alternatives. I'll cover that later, but moving on:
You can approach this with mapReduce as follows:
db.collection.mapReduce(
function () {
for ( var k in this.eqtcorp ) {
this.eqtcorp[k].key = k;
emit( 1, this.eqtcorp[k] );
}
},
function (key,values) {
var reduced = {};
values.sort(function(a,b) {
return (( a.DATE > b.DATE ) ? 1 : (( a.DATE < b.DATE ) ? -1 : 0));
}).slice(-3).forEach(function(doc) {
reduced[doc.key] = doc;
});
return reduced;
},
{
"query": { "company": "50010" },
"finalize": function(key,vaue) {
for (var k in value) {
delete value[k].key;
}
return value;
},
"out": { "inline": 1 },
})
)
In the mapper I am currently emitting a key of 1. The reason for this is so the statement works to "aggregate" all results across multiple documents. But if you really only wanted to do this per your "company" value, then you can emit that as the key instead, as in:
emit( this.company, this.eqtcorp[k] );
Essentially the mapper breaks apart each document to output each sub-key of "eqtcorp" as its own document. These are then passed on to the reducer.
The reducer, which can be invoked multiple times, takes its input array of "values" for the same "key" and first sorts that array. Once sorted (in ascending order) you slice the last three items off the array and add each of them to the reduced result.
As I say, the reducer can be invoked several times, so each pass does not necessarily get the "whole" list of values per grouping key. This is the essential part of the "reduce" phase: it "incrementally" takes each input set and returns the combination of results reduced so far, until there is only one value per "key" that contains just the three results you want.
Then there is the finalize function, which cleans up the convenience housekeeping that was done to track each result by its original sub-document key. The remaining options are just the selection query and the choice of output, which depending on your needs may be to another collection. Or of course you can omit the selection query to process all documents.
As stated earlier, the document structure does not help, and would be better suited to arrays. So you should rather have a document like this:
{
"company" : "500010",
"eqtcorp" : [
{
"key": "306113"
"DATE" : "2014-05-05 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "16:43"
},
{
"key": "306118",
"DATE" : "2014-05-08 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "18:43"
},
{
"key": "306114",
"DATE" : "2014-06-02 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
{
"key:"306116",
"DATE" : "2014-03-02 12:30:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
{
"key": "306115",
"DATE" : "2014-08-02 04:45:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
{
"key": "306117",
"DATE" : "2014-07-02 10:16:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
}
]
}
Though leaving the date formats alone for now, this makes things much cleaner, as you can simplify processing and indeed use the aggregation framework for much faster processing. If you were intending to, say, "find the top three values" across the entire collection, this would be as simple as:
db.collection.aggregate([
// Unwind the array
{ "$unwind": "$eqtcorp" },
// Sort the results by the dates
{ "$sort": "eqtcorp.DATE" -1 },
// Limit the top three results
{ "$limit": 3 },
// Optionally group back as an array
{ "$group": {
"_id": null,
"eqtcorp": { "$push": "$eqtcorp" }
}}
])
That would be for the whole collection. Getting the top three per company value is not impossible, but it is a bit more involved, as there is no aggregation equivalent of $slice:
db.collection.aggregate([
// Unwind the array
{ "$unwind": "$eqtcorp" },
// Sort the results by company and date
{ "$sort": "company": 1, "eqtcorp.DATE" -1 },
// Group back keeping the top value
{ "$group": {
"_id": "$company",
"all": { "$push": "$eqtcorp" },
"one": { "$first": "$eqtcorp" }
}},
// Unwind again
{ "$unwind": "$all" },
// match the "seen" value
{ "$project": {
"all": 1,
"one": 1,
"seen": {
"$eq": [ "$all", "$one" ]
}
}},
// Filter out "seen"
{ "$match": { "seen": false } },
// Group back keeping the new top
{ "$group": {
"_id": "$_id",
"all": { "$push": "$all },
"one": { "$first": "$one" },
"two": { "$first": "$all }
}},
// Unwind again
{ "$unwind": "$all" },
// Match the seen value
{ "$project": {
"all": 1,
"one": 1,
"two": 1,
"seen": {
"$eq": [ "$all", "$two" ]
}
}},
// Filter the seen value
{ "$match": { "seen": false } },
// Group back again
{ "$group": {
"_id": "$_id",
"one": { "$first": "$one" },
"two": { "$first": "$two },
"three": { "$first": "$three" }
}}
])
Or modify the mapReduce above at the mapper, since there we were really only artificially producing the array:
function () {
    var company = this.company;
    this.eqtcorp.forEach(function(doc) {
        emit( company, doc );
    });
}
It still makes sense to split the array up this way when you are combining results across keys.
Of course, if there is no actual aggregation going on between documents and your basic intent is just to get the last three values of the array in each document, then the clear approach is to keep them "sorted" as documents are updated and items are added to the array. So your approach to adding new items becomes:
db.collection.update(
{ _id: document_id },
{
"$push": {
"eqtcorp": {
"$each": [ { new document }, { optionally more} ],
"$sort": { "DATE": 1 }
}
}
}
);
Prior to MongoDB 2.6 this also required a $slice modifier, which basically imposes an upper limit on the number of items in the array; this is no longer required. With earlier versions you would have to supply an upper limit value, such as 500 or some other number larger than your expected results, unless you actually wanted to "prune" results, in which case set your limit accordingly.
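On those earlier versions the same update might look like this (a minimal sketch; the -500 is an arbitrary bound as discussed, and my understanding is that MongoDB 2.4 required the $slice value to be negative):
db.collection.update(
    { _id: document_id },
    {
        "$push": {
            "eqtcorp": {
                "$each": [ { /* new document */ } ],
                "$sort": { "DATE": 1 },
                "$slice": -500   // arbitrary upper bound larger than expected results
            }
        }
    }
);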
The point here is that, without any aggregation going on, when you just want the last three values of that array from a document you can do this with projection and the $slice operator available there:
db.collection.find({},{ "eqtcorp": { "$slice": -3 } })
As the array items in the document are already sorted, you just get the last three values, and you're done.
So really, while you can process your existing documents using mapReduce, unless you really want to aggregate results it is a much slower process. Changing the data to arrays and maintaining the sort order will immediately get you the results you want with a very simple, fast query.
Even if your intention is aggregation, the options available to you when using arrays are much wider, and it is generally easier to do more complex things.
If the subdocuments under "eqtcorp" are stored as an array, as shown below:
{
"name" : "306113", // assigned it to a node to create an array
"DATE" : "2014-05-05 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "16:43"
}
To update a single document
db.collection_name.update(
{ company : "500010"},
{ $push : {
eqtcorp : {
$each: [ ],
$sort : { "DATE" : -1},
$slice : 3
}
}
})
To update all the documents
db.collection_name.update(
{}, // query all documents
{
$push : {
eqtcorp : {
$each: [ ],
$sort : { "DATE" : -1},
$slice : 3
}
}
},
false, // upsert
true // update multiple documents
)
The simplest query would be to sort the subdocument array based on date and use the $slice operator to get the required data via projection:
db.collection_name.find({"company": "500010"},{ "eqtcorp": { "$slice": -3 } }).sort({"eqtcorp.DATE":-1})
Related
I have a document structure something along the lines of the following:
{
"_id" : "777",
"someKey" : "someValue",
"someArray" : [
{
"name" : "name1",
"someNestedArray" : [
{
"name" : "value"
},
{
"name" : "delete me"
}
]
}
]
}
I want to delete the nested array element with the value "delete me".
I know I can find documents which match this description using nested $elemMatch expressions. What is the query syntax for removing the element in question?
To delete the item in question you're actually going to use an update. More specifically you're going to do an update with the $pull command which will remove the item from the array.
db.temp.update(
{ _id : "777" },
{$pull : {"someArray.0.someNestedArray" : {"name":"delete me"}}}
)
There's a little bit of "magic" happening here. Using .0 indicates that we know that we are modifying the 0th item of someArray. Using {"name":"delete me"} indicates that we know the exact data that we plan to remove.
This process works just fine if you load the data into a client and then perform the update. This process works less well if you want to do "generic" queries that perform these operations.
I think it's easiest to simply recognize that updating arrays of sub-documents generally requires that you have the original in memory at some point.
In response to the first comment below, you can probably help your situation by changing the data structure a little
"someObjects" : {
"name1": {
"someNestedArray" : [
{
"name" : "value"
},
{
"name" : "delete me"
}
]
}
}
Now you can do {$pull : { "someObjects.name1.someNestedArray" : ...
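Spelled out in full, that update might look like this (a sketch against the restructured document above):
db.temp.update(
    { _id : "777" },
    { $pull : { "someObjects.name1.someNestedArray" : { "name" : "delete me" } } }
)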
Here's the problem with your structure. MongoDB does not have very good support for manipulating "sub-arrays". Your structure has an array of objects and those objects contain arrays of more objects.
If you have the following structure, you are going to have a difficult time using things like $pull:
array [
{ subarray : array [] },
{ subarray : array [] },
]
If your structure looks like that and you want to update subarray you have two options:
Change your structure so that you can leverage $pull.
Don't use $pull. Load the entire object into a client and use findAndModify.
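As a rough sketch of the second option, using the document from the question (the filtering happens in the client, and the modified array is written back):
// load the document into the client
var doc = db.temp.findOne({ _id: "777" });

// strip the offending entries from each nested array
doc.someArray.forEach(function(item) {
    item.someNestedArray = item.someNestedArray.filter(function(nested) {
        return nested.name !== "delete me";
    });
});

// write the whole array back in one operation
db.temp.findAndModify({
    query: { _id: "777" },
    update: { $set: { someArray: doc.someArray } }
});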
MongoDB 3.6 added the $[] operator, which facilitates updates to arrays that contain embedded documents. So the problem can be solved by:
db.test.update(
{ _id : "777" },
{$pull : {"someArray.$[].someNestedArray" : {"name":"delete me"}}}
)
As @Melkor has commented (it should probably be an answer in itself), if you do not know the index use:
db.collection.update(
    {
        _id: TheMainID,
        "theArray._id": TheArrayID
    },
    {
        $pull: {
            "theArray.$.theNestedArray": {
                _id: theNestedArrayID
            }
        }
    }
)
From MongoDB 3.6 on you can use arrayFilters to do this:
db.test.update(
{ _id: "777" },
{ $pull: { "someArray.$[elem].someNestedArray": { name: "delete me" } } },
{ arrayFilters: [{ "elem.name": "name1"}] }
)
see also https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/index.html#update-all-documents-that-match-arrayfilters-in-an-array
Another example of usage could be this:
{
"company": {
"location": {
"postalCode": "12345",
"Address": "Address1",
"city": "Frankfurt",
"state": "Hessen",
"country": "Germany"
},
"establishmentDate": "2019-04-29T14:12:37.206Z",
"companyId": "1",
"ceo": "XYZ"
},
"items": [{
"name": "itemA",
"unit": "kg",
"price": "10"
},
{
"name": "itemB",
"unit": "ltr",
"price": "20"
}
]
}
1. DELETE: MongoDB query to delete itemB:
db.getCollection('test').update(
{"company.companyId":"1","company.location.city":"Frankfurt"},
{$pull : {"items" : {"name":"itemB"}}}
)
2. FIND: find query for itemB:
db.getCollection('test').find(
{"company.companyId":"1","company.location.city":"Frankfurt","items.name":"itemB"},
{ "items.$": 1 }
)
3. UPDATE: update query for itemB:
db.getCollection('test').update(
    {"company.companyId":"1","company.location.city":"Frankfurt","items.name":"itemB"},
    { $set: { "items.$.price" : 90 }},
    { multi: true });
I have the following sub-documents:
experiences: [
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2010-10-13T00:00:00.000Z"),
"to" : ISODate("2012-10-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94b"),
"currentlyWorking" : false
},
...
...
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2014-10-14T00:00:00.000Z"),
"to" : ISODate("2015-12-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94c"),
"currentlyWorking" : false
},
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2017-10-13T00:00:00.000Z"),
"to" : null,
"_id" : ObjectId("59f8064e68d1f61441bec94d"),
"currentlyWorking" : true
},
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2008-10-14T00:00:00.000Z"),
"to" : ISODate("2009-12-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94c"),
"currentlyWorking" : false
},
]
As you can see, the entries are not necessarily ordered by date. The above data is for one user. What I want is the total experience for each user, in years. When the to field is null and currentlyWorking is true, it means that I am currently working at that company.
Aggregation
Using the aggregation framework you could apply $indexOfArray where you have it available:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$cond": [
{ "$eq": [{ "$indexOfArray": ["$experiences.to", null] }, -1] },
{ "$max": "$experiences.to" },
new Date()
]},
{ "$min": "$experiences.from" }
]
}
}}
])
Failing that, as long as the "latest" is always the last item in the array, using $arrayElemAt:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$cond": [
{ "$eq": [{ "$arrayElemAt": ["$experiences.to", -1] }, null] },
new Date(),
{ "$max": "$experiences.to" }
]},
{ "$min": "$experiences.from" }
]
}
}}
])
That's pretty much the most efficient way to do this, as a single pipeline stage applying the $min and $max operators. For $indexOfArray you need at least MongoDB 3.4, and for simply using $arrayElemAt you can have MongoDB 3.2, which is the minimal version you should be running in production environments anyway.
One pass means it gets done fast with little overhead.
The brief parts are that $min and $max allow you to extract the appropriate values directly from the array elements, being the "smallest" value of "from" and the largest value of "to" within the array. Where available, the $indexOfArray operator can return the matched index from a provided array (in this case the "to" values) where a specified value (null here) exists. If it's there, the index of that value is returned, and where it is not, the value -1 is returned, indicating that it was not found.
We use $cond, which is a "ternary" or if..then..else operator, to determine that when the null is not found you want the $max value from "to". Of course when it is found, this is the else case, where the current Date, fed into the aggregation pipeline as an external parameter on execution, is returned instead.
The alternate case for MongoDB 3.2 is that you instead "presume" the last element of your array is the most recent employment history item. It would generally be best practice to order these items so the most recent is either the "last" (as seems to be indicated in your question) or the "first" entry of the array. It is logical to keep these entries in such an order, as opposed to relying on sorting the list at runtime.
So when using a "known" position such as "last", we can use the $arrayElemAt operator to return the value from the array at the specified position. Here it is -1 for the "last" element. The "first" element would be 0, and could arguably be applied to getting the "smallest" value of "from" as well, since you should have your array in order. Again $cond is used to transpose the values depending on whether null is returned. As an alternative to $max you can even use $ifNull to swap the values instead:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
{ "$min": "$experiences.from" }
]
}
}}
])
That operator essentially switches out the values returned if the response of the first condition is null. So since we are grabbing the value from the "last" element already, we can "presume" that this does mean the "largest" value of "to".
The $subtract is what actually returns the "difference", since when you "subtract" one date from another the difference is returned as the milliseconds value between the two. This is how BSON Dates are actually stored internally, the common date storage format of "milliseconds since epoch".
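For instance, subtracting two BSON dates from the sample data in the shell shows the raw milliseconds:
new Date("2012-10-13") - new Date("2010-10-13")   // 63158400000, i.e. roughly two years in milliseconds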
If you want the interval in a specific duration such as "years", then it's a simple matter of applying the "date math" to change from the milliseconds difference between the date values. So adjust by dividing out from the interval ( also showing $arrayElemAt on the "from" just for completeness ):
Model.aggregate([
{ "$addFields": {
"difference": {
"$floor": {
"$divide": [
{ "$subtract": [
{ "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
{ "$arrayElemAt": ["$experiences.from", 0] }
]},
1000 * 60 * 60 * 24 * 365
]
}
}
}}
])
That uses $divide as a math operator, with 1000 milliseconds multiplied by 60 for each of seconds and minutes, then 24 hours and 365 days, as the divisor value. The $floor "rounds down" the number from decimal places. You can do whatever you want there, but it "should" be used "inline" and not in separate stages, which simply add to processing overhead.
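Written out, that divisor is just:
1000 * 60 * 60 * 24 * 365   // 31536000000 milliseconds in a 365-day year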
Of course, the presumption of 365 days is an "approximation" at best. If you want something more complete, then you can instead apply the date aggregation operators to the values to get a more accurate reading. So here, also applying $let to declare "variables" for later manipulation:
Model.aggregate([
{ "$addFields": {
"difference": {
"$let": {
"vars": {
"to": { "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
"from": { "$arrayElemAt": ["$experiences.from", 0] }
},
"in": {
"years": {
"$subtract": [
{ "$subtract": [
{ "$year": "$$to" },
{ "$year": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 1
}}
]
},
"months": {
"$add": [
{ "$subtract": [
{ "$month": "$$to" },
{ "$month": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 12
}}
]
},
"days": {
"$add": [
{ "$subtract": [
{ "$dayOfYear": "$$to" },
{ "$dayOfYear": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 365
}}
]
}
}
}
}
}}
])
Again that's a slight approximation on the days of the year. MongoDB 3.6 would actually allow you to test for the "leap year" by implementing $dateFromParts to determine whether 29th February is valid in the current year or not, assembling the date from the "pieces" we have available.
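A sketch of that check (assuming MongoDB 3.6; as I understand it, $dateFromParts rolls an invalid 29th February over into March, so the month of the result reveals whether the year is a leap year):
Model.aggregate([
    { "$addFields": {
        "isLeapYear": {
            "$eq": [
                { "$month": {
                    "$dateFromParts": { "year": { "$year": new Date() }, "month": 2, "day": 29 }
                }},
                2   // still February, so 29th February exists in this year
            ]
        }
    }}
])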
Work with returned data
Of course all the above is using the aggregation framework to determine the intervals from the array for each person. This would be the advised course if you were intending to "reduce" the data returned by essentially not returning the array items at all, or if you wanted these numbers for further aggregation in reporting to a larger "sum" or "average" statistic from the data.
If on the other hand you actually do want all the data returned for the person including the complete "experiences" array, then it's probably the best course of action to simply apply the calculations "after" all the data is returned from the server as you process each item returned.
The simple application of this would be to "merge" a new field into the results, just like $addFields does but on the "client" side instead:
Model.find().lean().cursor().map( doc =>
Object.assign(doc, {
"difference":
((doc.experiences.map(e => e.to).indexOf(null) === -1)
? Math.max.apply(null, doc.experiences.map(e => e.to))
: new Date() )
- Math.min.apply(null, doc.experiences.map(e => e.from))
})
).toArray((err, result) => {
// do something with result
})
That's just applying the same logic represented in the first aggregation example to "client side" processing of the result cursor. Since you are using mongoose, the .cursor() method actually returns us a Cursor object from the underlying driver, which mongoose normally hides away for "convenience". Here we want it because it gives us access to some handy methods.
The Cursor.map() is one such handy method, which allows us to apply a "transform" on the content returned from the server. Here we use Object.assign() to "merge" a new property into the returned document. We could alternately use Array.map() on the "array" returned by mongoose by "default", but processing inline looks a little cleaner, as well as being a bit more efficient.
In fact Array.map() is the main tool here in manipulation since where we applied statements like "$experiences.to" in the aggregation statement, we apply on the "client" using doc.experiences.map(e => e.to), which does the same thing "transforming" the array of objects into an "array of values" for the specified field instead.
This allows the same checking using Array.indexOf() against the array of values, and also the Math.min() and Math.max() are used in the same way, implementing apply() to use those "mapped" array values as the argument values to the functions.
Finally of course, since we still have a Cursor being returned, we convert this back into the more typical "array" form you would work with for mongoose results, using Cursor.toArray(), which is exactly what mongoose does "under the hood" for you on its default requests.
The Query.lean() is a mongoose modifier which basically says to return and expect "plain JavaScript objects", as opposed to "mongoose documents" matched to the schema with applied methods, which are again the default return. We want that because we are "manipulating" the result. Again the alternative is to do the manipulation "after" the default array is returned, and convert via .toObject(), which is present on all mongoose documents, in the event that "serializing virtual properties" is important to you.
So this is essentially a "mirror" of that first aggregation approach, yet applied to "client side" logic instead. As stated, it generally makes more sense to do it this way when you actually want ALL of the properties in the document in the results anyway. The simple reason is that it makes no real sense to add "additional" data to the results "before" you return them from the server. So instead, simply apply the transform "after" the database returns them.
Also much like above, the same client transformation approaches can be applied, as was demonstrated with ALL the aggregation examples. You can even employ external libraries for date manipulation, which give you "helpers" for some of the "raw math" approaches here.
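For example, with something like moment.js the interval math collapses to a one-liner; here minFrom and maxTo are hypothetical names standing in for the values extracted in the snippet above:
const moment = require('moment');

// whole years between the earliest "from" and the latest "to"
const years = moment(maxTo).diff(moment(minFrom), 'years');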
You can achieve this with the aggregation framework like this:
db.collection.aggregate([
{
$unwind:"$experiences"
},
{
$sort:{
"experiences.from":1
}
},
{
$group:{
_id:null,
"from":{
$first:"$experiences.from"
},
"to":{
$last:{
$ifNull:[
"$to",
new Date()
]
}
}
}
},
{
$project:{
"diff":{
$subtract:[
"$to",
"$from"
]
}
}
}
])
This returns:
{ "_id" : null, "diff" : NumberLong("65357827142") }
Which is the difference in ms between the two dates; see $subtract for details.
You can get the year by adding this additional stage to the end of the pipeline:
{
$project:{
"year":{
$floor:{
$divide:[
"$diff",
1000*60*60*24*365
]
}
}
}
}
This would then return:
{ "_id" : null, "year" : 2 }
After performing some aggregation magic, I have arrived at this data:
{ "_id" : "5700edfe03fcdb000347bebb", "size" : 3, "count" : 2 }
{ "_id" : "5700edfe03fcdb000347bebf", "size" : 2, "count" : 2 }
Now, I want to eliminate all the entries where size is equal to count.
So I ran this aggregation instruction:
match3 = { "$match" : { "size" : { "$ne" : "count"} } }
But it doesn't eliminate anything and returns the two lines as it is.
I want the result to be just this one line as it is the only one where size is not equal to count:
{ "_id" : "5700edfe03fcdb000347bebb", "size" : 3, "count" : 2 }
You need to add a $redact stage to your aggregation pipeline:
{ "$redact": {
"$cond": [
{ "$eq": [ "$size", "$count" ] },
"$$PRUNE",
"$$KEEP"
]
}}
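Appended to the end of your existing pipeline, that might look like this (a sketch; the earlier stages stand in for whatever produced the size and count values):
db.collection.aggregate([
    // ...the stages that produced { size, count }...
    { "$redact": {
        "$cond": [
            { "$eq": [ "$size", "$count" ] },
            "$$PRUNE",
            "$$KEEP"
        ]
    }}
])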
You can use the $where operator for this:
db.collection.find({ $where: "this.size != this.count" })
db.collection.remove({ $where: "this.size == this.count" })
UPDATE:
After I got downvoted I decided to compare the 2 solutions.
Both use a COLLSCAN and both return the same results.
So please enlighten me what is so wrong about my solution? :)
I have the following data. I want to get objects from the od array based on some condition, along with the em and name fields.
I am not very familiar with MongoDB's aggregate, so I need help to solve my problem.
{
_id : 1,
em : 'abc#12s.net',
name : 'NewName',
od :
[
{
"oid" : ObjectId("1234"),
"ca" : ISODate("2016-05-05T13:20:10.718Z")
},
{
"oid" : ObjectId("2345"),
"ca" : ISODate("2016-05-11T13:20:10.718Z")
},
{
"oid" : ObjectId("57766"),
"ca" : ISODate("2016-05-13T13:20:10.718Z")
}
]
},
{
_id : 2,
em : 'ab6c#xyz.net',
name : 'NewName2',
od :
[
{
"oid" : ObjectId("1234"),
"ca" : ISODate("2016-05-11T13:20:10.718Z")
},
{
"oid" : ObjectId("2345"),
"ca" : ISODate("2016-05-12T13:20:10.718Z")
},
{
"oid" : ObjectId("57766"),
"ca" : ISODate("2016-05-05T13:20:10.718Z")
}
]
}
I have tried using $match, $project and $unwind in aggregate to get the desired result. My query is given below:
db.collection.aggregate([
{
$match : {
"od.ca" : {
'$gte': '10/05/2016',
'$lte': '15/05/2016'
}
}
},
{
$project:{
_id: '$_id',
em: 1,
name : 1,
od : 1
}
},
{
$unwind : "$od"
},
{
$match : {
"od.ca" : {
'$gte': '10/05/2016',
'$lte': '15/05/2016'
}
}
}])
The result I got has em and name and an od array with only one of the objects from od in each document, i.e. there are multiple records for the same email id.
{
_id : 1,
em : 'abc#12s.net',
name : 'NewName',
od :
[
{
"oid" : ObjectId("57766"),
"ca" : ISODate("2016-05-13T13:20:10.718Z")
}
]
}
{
_id : 1,
em : 'abc#12s.net',
name : 'NewName',
od :
[
{
"oid" : ObjectId("2345"),
"ca" : ISODate("2016-05-11T13:20:10.718Z")
}
]
}
But what I want is: for each email id, the od array should contain all the objects matching the condition. One sample output that I want is:
{
_id : 1,
em : 'abc#12s.net',
name : 'NewName',
od :
[
{
"oid" : ObjectId("2345"),
"ca" : ISODate("2016-05-11T13:20:10.718Z")
},
{
"oid" : ObjectId("57766"),
"ca" : ISODate("2016-05-13T13:20:10.718Z")
}
]
}
Is there anything wrong I am doing in the query? If the query is supposed to return results like this, how can I get the result I want? Can someone tell me what I should try, or what changes in the query can help me get the result I want?
You don't necessarily need a cohort of those aggregation operators, except when your MongoDB version is older than the 2.6.X releases. The $filter operator will do the job just fine.
Consider the following example where the $filter operator when applied in the $project pipeline stage will filter the od array to only include documents that have a ca date greater than or equal to '2016-05-10' and less than or equal to '2016-05-15':
var start = new Date('2016-05-10'),
end = new Date('2016-05-15');
db.collection.aggregate([
{
"$match": {
"od.ca": { "$gte": start, "$lte": end }
}
},
{
"$project": {
"em": 1,
"name": 1,
"od": {
"$filter": {
"input": "$od",
"as": "o",
"cond": {
"$and": [
{ "$gte": [ "$$o.ca", start ] },
{ "$lte": [ "$$o.ca", end ] }
]
}
}
}
}
}
])
Bear in mind this operator is only available for MongoDB versions 3.2.X and newer.
Otherwise, for versions 2.6.X up to 3.0.X, you can combine the use of the $map and $setDifference operators to "filter" the documents in the od array.
The $map operator basically maps some values evaluated by the $cond operator to a set of either false values or the documents which pass the given condition. The $setDifference operator then returns the difference of the sets from the previous computation. Check how this pans out with the preceding example:
var start = new Date('2016-05-10'),
end = new Date('2016-05-15');
db.collection.aggregate([
{
"$match": {
"od.ca": { "$gte": start, "$lte": end }
}
},
{
"$project": {
"em": 1,
"name": 1,
"od": {
"$setDifference": [
{
"$map": {
"input": "$od",
"as": "o",
"in": {
"$cond": [
{
"$and": [
{ "$gte": [ "$$o.ca", start ] },
{ "$lte": [ "$$o.ca", end ] }
]
},
"$$o",
false
]
}
}
},
[false]
]
}
}
}
])
For versions 2.4.X and older, where the above operators do not exist, you may have to use the concoction of $match, $unwind and $group operators to achieve the same.
The following example demonstrates this; it is what you were attempting, but left short of a $group pipeline step to group all the flattened documents back into the original document schema, albeit minus the filtered array elements:
db.collection.aggregate([
{
"$match": {
"od.ca": { "$gte": start, "$lte": end }
}
},
{ "$unwind": "$od" },
{
"$match": {
"od.ca": { "$gte": start, "$lte": end }
}
},
{
"$group": {
"$_id": "$_id",
"em": { "$first": "$em" },
"name": { "$first": "$name" },
"od": { "$push": "$od" }
}
}
])
Here is my document:
{
"_id" : "2",
"account" : "1234",
"positions" : {
"APPL" : { "quantity" : "13", "direction" : "long" },
"GOOG" : { "quantity" : "24", "direction" : "long" }
}
}
I would like to get the whole positions object, but only the quantity field and ignore the direction field. Is it possible to do that? Or should I consider this other schema (array of objects):
{
"_id" : "2",
"account" : "1234",
"positions" : [
{ "name" : "APPL", "quantity" : "13", "direction" : "long" },
{ "name" : "GOOG", "quantity" : "24", "direction" : "long" }
]
}
Many thanks!
For the "array" form, all you really need to do is specify the field using "dot notation" to the array member:
db.collection.find({}, { "positions.quantity": 1 })
Which would return:
{
"_id" : "2",
"positions" : [
{ "quantity" : "13" },
{ "quantity" : "24" }
]
}
Or for multiple fields but excluding the "direction" just use both in projection:
db.collection.find({},{ "positions.name": 1, "positions.quantity": 1 })
Which returns the named fields still:
{
"_id" : "2",
"positions" : [
{
"name" : "APPL",
"quantity" : "13"
},
{
"name" : "GOOG",
"quantity" : "24"
}
]
}
For the "named keys" form you need to specify each path:
db.collection.find({},{ "positions.APPL.quantity": 1, "positions.GOOG.quantity": 1 })
Which would return of course:
{
"_id" : "2",
"positions" : {
"APPL" : {
"quantity" : "13"
},
"GOOG" : {
"quantity" : "24"
}
}
}
And that kind of "nastiness" is pervasive with basically ALL MongoDB operations, query or projection or otherwise. When you use "named keys" the "database" has no sane option other than to require you to "name the path". Doing that is of course not really a practical exercise when the names of keys are likely to differ between documents in the collection.
Traversing keys can only really be done with JavaScript evaluation from a MongoDB standpoint. Since JavaScript evaluation requires the interpreter cost of launching and translating data from BSON into a workable JavaScript format, not to mention the actual cost of evaluating the coded expressions themselves, that is not an ideal approach.
Moreover, from a "query" perspective, such handling requires the use of $where to evaluate such an expression when you just want to look for things under each "key" of the "positions" data. This is a "bad" thing, since such an expression cannot possibly use an "index" to optimize the query search. Only with a "directly named path" can you actually use or even "create" an index under those conditions.
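To illustrate, even a simple lookup such as "find any position with a quantity of "13" under any key" forces a JavaScript traversal (a sketch; note this cannot use an index):
db.collection.find({
    "$where": function() {
        var positions = this.positions;
        // walk every named key, since no fixed path can be given
        return Object.keys(positions).some(function(k) {
            return positions[k].quantity === "13";
        });
    }
})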
From a "projection" perspective, the usage of "named keys" means that by similar "traversal" concepts, you actually need JavaScript processing again to do so. And the only mechanism in which MongoDB can use a JavaScript expression to "alter" the output document is by using mapReduce so again this is "super horrible" and you would be using this "aggregation method" for nothing more than document manipulation in this case:
db.collection.mapReduce(
    function() {
        var id = this._id;
        var positions = this.positions;
        delete this._id;
        // strip the "direction" field from each named sub-document
        Object.keys(positions).forEach(function(key) {
            delete positions[key].direction;
        });
        emit(id,this);
    },
    function() {}, // reducer never gets called when all keys are unique
    { "out": { "inline": 1 } }
)
Even after you do that just to avoid naming paths, the output of mapReduce cannot be a "cursor". So this limits you to either the size of a BSON document in the response, or actually outputting to a "collection" instead. So this is as far from "practical" as you can get.
There are numerous reasons "why" using an array with "common paths" is so much better than a "named keys" structure, which are also far too broad to go into here. The one thing you should accept is that "named keys are bad, okay!", and just move forward with consistent object naming that actually makes quite a lot of sense.