How to exclude a document if two fields are the same?

After performing some aggregation magic, I have arrived at this data:
{ "_id" : "5700edfe03fcdb000347bebb", "size" : 3, "count" : 2 }
{ "_id" : "5700edfe03fcdb000347bebf", "size" : 2, "count" : 2 }
Now, I want to eliminate all the entries where size is equal to count.
So I ran this aggregation instruction:
match3 = { "$match" : { "size" : { "$ne" : "count"} } }
But it doesn't eliminate anything and returns both documents unchanged.
I want the result to be just this one line as it is the only one where size is not equal to count:
{ "_id" : "5700edfe03fcdb000347bebb", "size" : 3, "count" : 2 }

You need to add a $redact stage to your aggregation pipeline:
{ "$redact": {
"$cond": [
{ "$eq": [ "$size", "$count" ] },
"$$PRUNE",
"$$KEEP"
]
}}
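To see why the original $match returned both documents: "$ne" : "count" compares size with the literal string "count", not with the field of that name, so every document passes. The sketch below mirrors this in plain JavaScript (no database needed), next to the field-to-field comparison that the $redact stage performs. The variable names are illustrative only, and the exprMatchStage variant assumes MongoDB 3.6 or later, where $expr is available inside $match.

```javascript
// The two documents from the question.
const docs = [
  { _id: "5700edfe03fcdb000347bebb", size: 3, count: 2 },
  { _id: "5700edfe03fcdb000347bebf", size: 2, count: 2 }
];

// What { size: { $ne: "count" } } effectively tests: size against the STRING "count".
const literalCompare = docs.filter(function (d) { return d.size !== "count"; });

// What the $redact stage tests: one field against the other.
const fieldCompare = docs.filter(function (d) { return d.size !== d.count; });

// Assumption: on MongoDB 3.6+ the same test can be written as a $match with $expr.
const exprMatchStage = { $match: { $expr: { $ne: ["$size", "$count"] } } };

console.log(literalCompare.length);
console.log(fieldCompare.map(function (d) { return d._id; }));
```

The first filter keeps both documents, which matches what the question observed; the second keeps only the document where size and count differ.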

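You can use the $where operator for this. The find() returns the documents whose two fields differ; the remove() deletes the ones whose fields are equal:
db.collection.find({ $where: "this.size != this.count" })
db.collection.remove({ $where: "this.size == this.count" })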
UPDATE:
After I got downvoted I decided to compare the 2 solutions.
Both use a COLLSCAN and both return the same results.
So please enlighten me what is so wrong about my solution? :)

Related

Expand a variable in a MongoDB aggregation pipeline

In TypeScript code, I would like to use a variable's value in a MongoDB aggregation pipeline. The problem is that the "keyToCheck" field is a variable that is set by the TypeScript code and can therefore change depending on many conditions.
Is there a way to expand the variable "keyToCheck"?
I have tried $$keyToCheck, $keyToCheck with no result (compilation errors).
Thanks.
...
const pipeline = [
{
$match: {
[this.countryOriginFieldName!]: {
$in: members
},
**keyToCheck**: {
$nin: dictionaryNotAbsoluteFieldList
}
}
},
...
UPDATE: try with this example:
var keyToCheck = "indicator";
var queryMatch = {"`$${keyToCheck}`": "US$millions"}
printjson(queryMatch);
db.getCollection("temp_collection").aggregate([
{
$match: queryMatch
},
{$project: {indicator: 1, value: 1}}
]
);
db.getCollection("temp_collection").insertMany([
{
"indicator" : "US$millions",
"value" : 1.0
},
{
"indicator" : "US$millions",
"value" : 2.0
},
{
"indicator" : "EUROmillions",
"value" : 3
}
]);
Desired output:
{
"indicator" : "US$millions",
"value" : 1.0
}
{
"indicator" : "US$millions",
"value" : 2.0
}

Query
The brackets in [keyToCheck] are a JavaScript computed property name: they use the value of the variable as the key. They do not create an array.
This assumes you also want to project the field named by keyToCheck, rather than always projecting indicator.
var keyToCheck = "indicator";
db.getCollection("temp_collection").aggregate([
{
$match: {[keyToCheck]: "US$millions"}
},
{$project: {[keyToCheck]: 1, value: 1}}
]
);
This works because the key is just a string, both in $match and in $project.
You don't need $ or $$ in this query.
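The difference between the two attempts can be checked without a database, since it is purely a matter of how JavaScript builds object keys. A minimal sketch, reusing the names from the question (matchStage, projectStage and literalKey are illustrative names):

```javascript
const keyToCheck = "indicator"; // set at runtime by application code

// Computed property name: the brackets evaluate the variable and use its value as the key.
const matchStage = { $match: { [keyToCheck]: "US$millions" } };
const projectStage = { $project: { [keyToCheck]: 1, value: 1 } };

// The attempt from the question: a quoted key is a literal string, so nothing is substituted.
const literalKey = { "`$${keyToCheck}`": "US$millions" };

console.log(Object.keys(matchStage.$match));
console.log(Object.keys(literalKey));
```

The first object ends up with the key indicator; the second keeps the backticks, dollar signs and braces verbatim, so it matches no field in the collection.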

mongodb project count number of sizes objects with condition

Hello I have a posts collection in mongodb where there's an authors field and when I run the following command:
db.posts.aggregate( [ {$project:{ size: {$size: {$ifNull:["$authors", []] }}}} ] )
I get a result like that:
{ "_id" : ObjectId("58c917fe48ad625ee8f49714"), "size" : 30 }
{ "_id" : ObjectId("58c91b83895efc5f0f67ba1a"), "size" : 0 }
{ "_id" : ObjectId("58c91cfd2971c05f310fccb8"), "size" : 30 }
{ "_id" : ObjectId("58c91eb7a826965f85571656"), "size" : 30 }
{ "_id" : ObjectId("58c921a1cb2bc85fa77e593a"), "size" : 30 }
How can I count the number of times when size is not equal to 0?
So in that case the result would be 4.
I tried db.posts.aggregate( [ {$project:{ size: {$size: {$not:{"$authors": 0} }}}} ] ) with no success.

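You want to add the $match and $count aggregation stages after your $project stage, something like the below. $count requires MongoDB 3.4 or later; see the MongoDB documentation for the $count stage.
db.posts.aggregate(
[ {
$project:{ size: {$size: {$ifNull:["$authors", []] }}}
}, {
$match:{ size: {$gt: 0}}
}, {
$count:'total'
} ] )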
This should return something like:
{ "total" : 4 }
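For readers who want to check the logic without a server, here is a rough plain-JavaScript equivalent of the three steps (project the array size, keep the non-zero ones, count them). The posts array is a hypothetical stand-in for the collection, not data from the question.

```javascript
// Hypothetical stand-ins for documents in the posts collection.
const posts = [
  { _id: 1, authors: ["a", "b"] },
  { _id: 2 },                      // no authors field at all
  { _id: 3, authors: [] },
  { _id: 4, authors: ["c"] }
];

// $project with $ifNull + $size: a missing array counts as empty.
const projected = posts.map(function (p) {
  return { _id: p._id, size: (p.authors || []).length };
});

// $match { size: { $gt: 0 } } followed by $count: "total".
const total = projected.filter(function (p) { return p.size > 0; }).length;

console.log({ total: total });
```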

Merging MongoDB fields of documents into one document

I'm using MongoDB 2.6.6
I have these documents in a MongoDB collection and here is an example:
{ ..., "field3" : { "one" : [ ISODate("2014-03-18T05:47:33Z"),ISODate("2014-06-02T20:00:25Z") ] }, ...}
{ ..., "field3" : { "two" : [ ISODate("2014-03-18T05:47:33Z"),ISODate("2014-06-02T20:00:25Z") ] }, ...}
{ ..., "field3" : { "three" : [ ISODate("2014-03-18T05:47:39Z"),ISODate("2014-03-19T20:18:38Z") ] }, ... }
I would like to merge these documents into one field. For example, I would like the result to be as follows:
{ "field3" : { "all" : [ ISODate("2014-03-18T05:47:39Z"),ISODate("2014-03-19T20:18:38Z"),...... ] } }
I'm not sure how to get that result.

The question doesn't leave much to go on, but you can arguably get this kind of merged result with mapReduce:
db.collection.mapReduce(
function() {
var field = this.field3;
Object.keys(field).forEach(function(key) {
field[key].forEach(function(date) {
emit( "field3", { "all": [date] } )
});
});
},
function (key,values) {
var result = { "all": [] };
values.forEach(function(value) {
value.all.forEach(function(date) {
result.all.push( date );
});
});
result.all.sort(function(a,b) { return a.valueOf()-b.valueOf() });
return result;
},
{ "out": { "inline": 1 } }
)
Being mapReduce, the output is not in exactly the requested format, given its own restrictions:
{
"results" : [
{
"_id" : "field3",
"value" : {
"all" : [
ISODate("2014-03-18T05:47:33Z"),
ISODate("2014-03-18T05:47:33Z"),
ISODate("2014-03-18T05:47:39Z"),
ISODate("2014-03-19T20:18:38Z"),
ISODate("2014-06-02T20:00:25Z"),
ISODate("2014-06-02T20:00:25Z")
]
}
}
],
"timeMillis" : 86,
"counts" : {
"input" : 3,
"emit" : 6,
"reduce" : 1,
"output" : 1
},
"ok" : 1
}
Since merging everything into a single document is fairly arbitrary, you could argue for simply taking the same approach in client code.
At any rate, this is only going to be useful over a relatively small set of data, and client processing has much the same restrictions. The inline mapReduce result is capped by the 16MB BSON limit for MongoDB; client code can go beyond that, but is still limited by the memory it consumes.
I presume you would need to add a "query" argument, but that is not clear from your question. Whether you use mapReduce or client code, you are going to need to follow this sort of process to "mash" the arrays together.
I would personally go with the client code here.
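A client-side version of the same merge might look like the sketch below. It assumes the documents have already been fetched into an array (here hard-coded as docs, with the elided fields left out) and that the dates arrive as JavaScript Date objects, as they do in the mongo shell.

```javascript
// Stand-ins for the three documents in the question (other fields omitted).
const docs = [
  { field3: { one: [new Date("2014-03-18T05:47:33Z"), new Date("2014-06-02T20:00:25Z")] } },
  { field3: { two: [new Date("2014-03-18T05:47:33Z"), new Date("2014-06-02T20:00:25Z")] } },
  { field3: { three: [new Date("2014-03-18T05:47:39Z"), new Date("2014-03-19T20:18:38Z")] } }
];

// Collect every date under every key of field3, whatever the key is called.
const all = [];
docs.forEach(function (doc) {
  Object.keys(doc.field3).forEach(function (key) {
    doc.field3[key].forEach(function (date) { all.push(date); });
  });
});

// Sort ascending by timestamp, as the mapReduce reducer does.
all.sort(function (a, b) { return a.valueOf() - b.valueOf(); });

const merged = { field3: { all: all } };
console.log(merged.field3.all.length);
```

Like the mapReduce version, this keeps duplicate dates; removing them would be an extra step.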

reducing the output of the mongodb by taking latest documents

Well, this is my collection
{
"company" : "500010",
"eqtcorp" : {
"306113" : {
"DATE" : "2014-05-05 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "16:43"
},
"306118" : {
"DATE" : "2014-05-08 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "18:43"
},
"306114" : {
"DATE" : "2014-06-02 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
"306116" : {
"DATE" : "2014-03-02 12:30:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
"306115" : {
"DATE" : "2014-08-02 04:45:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
"306117" : {
"DATE" : "2014-07-02 10:16:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
}
.
.
.
.
.
}
}
if I query like
db.collection_name.find({"company": "500010"})
I will get the whole document. Since there are many subdocuments under "eqtcorp", I need only the 3 subdocuments with the latest dates. I simply need a reverse sort on the "DATE" field of every subdocument under "eqtcorp", taking the first 3. It is really a challenge since I am new to MongoDB and mapReduce.
What I am expecting as output is
{
"company" : "500010",
"eqtcorp" : {
"306113" : {
"DATE" : "2014-05-05 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "16:43"
},
"306118" : {
"DATE" : "2014-05-08 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "18:43"
},
"306116" : {
"DATE" : "2014-03-02 12:30:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
}
}
}
Any ideas?

Several things here are not helping you, and they complicate what is otherwise a simple operation.
Your dates are actually strings, which you really should change to proper BSON date types. It will help you later, when you will likely want real dates. Fortunately they are at least in "YYYY-MM-DD" form, so they sort correctly as strings, but don't expect much other use out of them.
You also really should be using an array rather than nesting sub-documents by key. Keyed sub-documents are hard to query, as you need to specify the exact path to each element. As such you are almost always restricted to JavaScript processing, which is much slower than the alternatives. I'll cover that later, but moving on:
You can approach this with mapReduce as follows:
db.collection.mapReduce(
function () {
for ( var k in this.eqtcorp ) {
this.eqtcorp[k].key = k;
emit( 1, this.eqtcorp[k] );
}
},
function (key,values) {
var reduced = {};
// Sort ascending by date string so the newest entries end up last
values.sort(function(a,b) {
return (( a.DATE < b.DATE ) ? -1 : (( a.DATE > b.DATE ) ? 1 : 0));
}).slice(-3).forEach(function(doc) {
reduced[doc.key] = doc;
});
return reduced;
},
{
"query": { "company": "500010" },
"finalize": function(key,value) {
for (var k in value) {
delete value[k].key;
}
return value;
},
"out": { "inline": 1 }
}
)
In the mapper I currently emit 1 as the key, so that the statement aggregates all results across multiple documents. If you only want to do this per "company" value, emit that as the key instead:
emit( this.company, this.eqtcorp[k] );
Essentially, the mapper breaks each document apart and outputs each sub-key of "eqtcorp" as its own document. These are then passed on to the reducer.
The reducer, which can be invoked multiple times, takes its input array of "values" for the same "key" and first sorts that array. Once it is sorted in ascending order, it slices the last three items off the array and adds each of them to the reduced result.
As I say, the reducer can be invoked several times, so each pass does not necessarily get the whole list of values per grouping key. This is the essential part of the "reduce" phase: it incrementally takes each input set and returns a reduced result, and those results are combined and reduced again until there is only one value per key, containing just the three results you want.
Then there is the finalize function, which removes the temporary "key" property that was added to each sub-document to simplify processing. The remaining options are the selection query and the choice of output, which depending on your needs may be another collection. You can also omit the selection query to process all documents.
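The sort-then-slice idea at the heart of the reducer can be tried on a single document in plain JavaScript. This is only a sketch of that idea, not the mapReduce job itself: doc is a trimmed copy of the question's document, and records and latest are illustrative names.

```javascript
// A trimmed copy of the document from the question.
const doc = {
  company: "500010",
  eqtcorp: {
    "306113": { DATE: "2014-05-05 16:43:00.000" },
    "306118": { DATE: "2014-05-08 16:43:00.000" },
    "306114": { DATE: "2014-06-02 16:43:00.000" },
    "306116": { DATE: "2014-03-02 12:30:00.000" },
    "306115": { DATE: "2014-08-02 04:45:00.000" },
    "306117": { DATE: "2014-07-02 10:16:00.000" }
  }
};

// "Map": turn each sub-key into its own record, remembering the key.
const records = Object.keys(doc.eqtcorp).map(function (k) {
  return { key: k, value: doc.eqtcorp[k] };
});

// "Reduce": sort ascending by the date string, then keep the last three.
records.sort(function (a, b) {
  return a.value.DATE < b.value.DATE ? -1 : (a.value.DATE > b.value.DATE ? 1 : 0);
});
const latest = {};
records.slice(-3).forEach(function (r) { latest[r.key] = r.value; });

console.log(Object.keys(latest));
```

Note that a JavaScript object with integer-like keys lists them in numeric order, so the date order is lost once the three entries are put back into an object. That is one more reason to prefer an array for this data.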
As stated earlier the document structure does not help and would be better suited to arrays. So you should rather have a document like this:
{
"company" : "500010",
"eqtcorp" : [
{
"key": "306113",
"DATE" : "2014-05-05 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "16:43"
},
{
"key": "306118",
"DATE" : "2014-05-08 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "18:43"
},
{
"key": "306114",
"DATE" : "2014-06-02 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
{
"key": "306116",
"DATE" : "2014-03-02 12:30:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
{
"key": "306115",
"DATE" : "2014-08-02 04:45:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
},
{
"key": "306117",
"DATE" : "2014-07-02 10:16:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "20:43"
}
]
}
Leaving the date formats alone for now, this makes things much cleaner: processing is simpler, and you can use the aggregation framework for much faster processing if you intend to, say, find the top three values across the entire collection. That would be as simple as:
db.collection.aggregate([
// Unwind the array
{ "$unwind": "$eqtcorp" },
// Sort the results by the dates
{ "$sort": { "eqtcorp.DATE": -1 } },
// Limit the top three results
{ "$limit": 3 },
// Optionally group back as an array
{ "$group": {
"_id": null,
"eqtcorp": { "$push": "$eqtcorp" }
}}
])
That is for the whole collection. Getting the top three per company value is not impossible, but it is a bit more involved, as there is no equivalent of slice in the aggregation framework as of MongoDB 2.6:
db.collection.aggregate([
// Unwind the array
{ "$unwind": "$eqtcorp" },
// Sort the results by company and date
{ "$sort": { "company": 1, "eqtcorp.DATE": -1 } },
// Group back keeping the top value
{ "$group": {
"_id": "$company",
"all": { "$push": "$eqtcorp" },
"one": { "$first": "$eqtcorp" }
}},
// Unwind again
{ "$unwind": "$all" },
// match the "seen" value
{ "$project": {
"all": 1,
"one": 1,
"seen": {
"$eq": [ "$all", "$one" ]
}
}},
// Filter out "seen"
{ "$match": { "seen": false } },
// Group back keeping the new top
{ "$group": {
"_id": "$_id",
"all": { "$push": "$all" },
"one": { "$first": "$one" },
"two": { "$first": "$all" }
}},
// Unwind again
{ "$unwind": "$all" },
// Match the seen value
{ "$project": {
"all": 1,
"one": 1,
"two": 1,
"seen": {
"$eq": [ "$all", "$two" ]
}
}},
// Filter the seen value
{ "$match": { "seen": false } },
// Group back again
{ "$group": {
"_id": "$_id",
"one": { "$first": "$one" },
"two": { "$first": "$two" },
"three": { "$first": "$all" }
}}
])
Or modify the map reduce above at the mapper since we are really only artificially producing the array:
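function () {
var company = this.company; // "this" is not the document inside the callback
this.eqtcorp.forEach(function(doc) {
emit( company, doc );
});
}
It still makes sense to split the array up like this when combining keys across documents.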
Of course, if there is no actual aggregation going on between documents and your basic intent is just to get the last three values of the array in each document, then the clear approach is to sort the array as documents are updated and items are added to it. So your approach to adding new items becomes:
db.collection.update(
{ _id: document_id },
{
"$push": {
"eqtcorp": {
"$each": [ { new document }, { optionally more} ],
"$sort": { "DATE": 1 }
}
}
}
);
Prior to MongoDB 2.6 this also requires a $slice modifier, which imposes an upper limit on the number of items in the array; from 2.6 onward it is no longer required. With earlier versions you might have to supply an upper limit such as 500, or some other number larger than your expected results, unless you actually want to prune the array, in which case set the limit you need.
The point is that when no aggregation is going on and you just want the last three values of that array from a document, you can do this with projection and the $slice operator that is available there:
db.collection.find({},{ "eqtcorp": { "$slice": -3 } })
As the array items in the document are already sorted, you just get the last three values, and you're done.
So really, while you can process your existing document using mapReduce, unless you really want to aggregate results it is a much slower process. Changing the data to be arrays and maintaining the sort order will immediately get you the results you want with a very simple query that is fast.
Even if your intention is aggregation, then the options available to you when using arrays are much wider and it is generally easier to do more complex things.
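The "keep it sorted on write, slice on read" approach can be mimicked in plain JavaScript to see how the two halves fit together. The pushSorted helper below is hypothetical and only imitates what $push with $each and $sort does on the server; the sample entries are taken from the question.

```javascript
// Hypothetical in-memory stand-in for one document's eqtcorp array.
const eqtcorp = [
  { key: "306113", DATE: "2014-05-05 16:43:00.000" },
  { key: "306116", DATE: "2014-03-02 12:30:00.000" }
];

// Mimics $push with $each + $sort: { DATE: 1 } (append, then re-sort ascending).
function pushSorted(array, newItems) {
  newItems.forEach(function (item) { array.push(item); });
  array.sort(function (a, b) {
    return a.DATE < b.DATE ? -1 : (a.DATE > b.DATE ? 1 : 0);
  });
  return array;
}

pushSorted(eqtcorp, [
  { key: "306115", DATE: "2014-08-02 04:45:00.000" },
  { key: "306114", DATE: "2014-06-02 16:43:00.000" }
]);

// Mimics the { "$slice": -3 } projection: the last three of an ascending array.
const lastThree = eqtcorp.slice(-3);
console.log(lastThree.map(function (d) { return d.key; }));
```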

If the subdocument "eqtcorp" is stored as an array, with each element shaped as below:
{
"name" : "306113", // assigned it to a node to create an array
"DATE" : "2014-05-05 16:43:00.000",
"subsection_name" : "CORPORATE NEWS",
"time" : "16:43"
}
To update a single document
db.collection_name.update(
{ company : "500010"},
{ $push : {
eqtcorp : {
$each: [ ],
$sort : { "DATE" : -1},
$slice : 3
}
}
})
To update all the documents
db.collection_name.update(
{}, // query all documents
{
$push : {
eqtcorp : {
$each: [ ],
$sort : { "DATE" : -1},
$slice : 3
}
}
},
false,
true // update multiple documents
)

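The simplest query would be to use the $slice operator in the projection to get the required data. Note that sort() on the cursor orders the matched documents, not the elements inside the array, so this relies on the array already being stored in date order: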
db.collection_name.find({"company": "500010"},{ "eqtcorp": { "$slice": -3 } }).sort({"eqtcorp.DATE":-1})

How to extract an array of fields from an array of JSON documents?

I have 2 mongodb collections, stu_creds and stu_profile. I first want to retrieve all the student records from stu_creds where stu_pref_contact is the email and then for those stu_ids I want to retrieve the complete profile from stu_profile. The problem is, the first query returns an array of JSON documents, with each document holding one field, the stu_id. Here is my query and the result:
db.stu_creds.find({"stu_pref_contact" : "email"}, {'_id' : 1})
Result:
[{ "_id" : ObjectId("51927cc93080baac04000001") },
{ "_id" : ObjectId("51927d7b3080baac04000002") },
{ "_id" : ObjectId("519bb011c5c5035b2a000002") },
{ "_id" : ObjectId("519ce3d09f047a192b000010") },
{ "_id" : ObjectId("519e6dc0f919cfdc66000003") },
{ "_id" : ObjectId("51b39be0c74f0e3d23000012") },
{ "_id" : ObjectId("51b39ca9c74f0e3d23000014") },
{ "_id" : ObjectId("51b39cb7c74f0e3d23000016") },
{ "_id" : ObjectId("51b39e87c74f0e3d23000018") },
{ "_id" : ObjectId("51b39f2fc74f0e3d2300001a") },
{ "_id" : ObjectId("51b39f47c74f0e3d2300001c") },
{ "_id" : ObjectId("518d454deb1e3a525e000009") },
{ "_id" : ObjectId("51bc8381dd10286e5b000002") },
{ "_id" : ObjectId("51bc83f7dd10286e5b000004") },
{ "_id" : ObjectId("51bc85cbdd10286e5b000006") },
{ "_id" : ObjectId("51bc8630dd10286e5b000008") },
{ "_id" : ObjectId("51bc8991dd10286e5b00000a") },
{ "_id" : ObjectId("51bc8a43dd10286e5b00000c") },
{ "_id" : ObjectId("51bc8a7ddd10286e5b00000e") },
{ "_id" : ObjectId("51bc8acadd10286e5b000010") }]
The thing is, I don't think I can use the above array as part of an $in clause for my second query to retrieve the student profiles. I have to walk through the array and create a new array which is just an array of object ids rather than an array of JSON docs.
Is there an easier way to do this?

Use Array.map (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map). This allows you to perform a transform on each element of the array, returning you a new array of the transformed items.
var arrayOfIds = result.map(function(item){ return item._id; });
Array.map was introduced in ECMAScript 5. If you're using node.js, a modern browser, or an Array polyfill, it should be available to use.
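Putting the mapped ids into the second query could then look like the sketch below. Plain strings stand in for the ObjectId values so that it runs outside the shell, and it assumes that stu_profile documents share their _id with the matching stu_creds record, which the question does not state.

```javascript
// Stand-in for the result of the first query; plain strings replace ObjectId values here.
const result = [
  { _id: "51927cc93080baac04000001" },
  { _id: "51927d7b3080baac04000002" },
  { _id: "519bb011c5c5035b2a000002" }
];

// Pull the bare ids out of the wrapper documents.
const arrayOfIds = result.map(function (item) { return item._id; });

// Assumption: stu_profile documents share the same _id as their stu_creds record.
const profileQuery = { _id: { $in: arrayOfIds } };

console.log(JSON.stringify(profileQuery));
```

In the shell, profileQuery would be passed to db.stu_profile.find(). Since find() returns a cursor, call toArray() on the first query's result before mapping over it.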

Ummm, am I missing something or is all you want the following:
var results = [];
for(var i = 0; i < yourArray.length; i++) {
results.push(yourArray[i]._id);
}

You could use $or:
db.stu_profile.find({ $or : results }) // `results` is the array of { _id: ... } documents from your first query
But it's considerably slower than $in, so I would suggest using one of the other answers ;)
