How to $bucket only unique documents based on some field - javascript

I am using the MongoDB aggregation framework. Suppose I have a collection structured like this:
[
  {
    _id: ObjectId(123),
    name: "john",
    sessionDuration: 29
  },
  {
    _id: ObjectId(456),
    name: "moore",
    sessionDuration: 45
  },
  {
    _id: ObjectId(789),
    name: "john",
    sessionDuration: 25
  },
  {
    _id: ObjectId(910),
    name: "john",
    sessionDuration: 45
  },
  // etc...
]
Documents with the same name belong to one user who has several sessions. In this example, John uses the service from three devices, so he has three session durations: two below 30 (29 and 25) and one below 50 (45).
I want to run a $bucket query with boundaries [0, 30, 50], but each range must count only unique names: a user with several sessions in the same range must be counted once in that range. The result should look like this:
{
time: Unique_Name_Users_Only_Lies_In_This_Boundary,
'30': 1,
'50': 2,
}
John has two sessions shorter than 30, so he should be counted only once in that range.
What I tried:
I first grouped all the documents by unique name and then applied $bucket, but this approach also drops John's session with a sessionDuration of 45.
How can I count only the documents with unique names within each $bucket boundary?

One option is to use $bucket with $addToSet, and then $group with $arrayToObject to get your format:
db.collection.aggregate([
  {$bucket: {
    groupBy: "$sessionDuration",
    boundaries: [0, 30, 50],
    default: "Other",
    output: {res: {$addToSet: "$name"}}
  }},
  {$group: {
    _id: 0,
    res: {$push: {k: {$toString: "$_id"}, v: {$size: "$res"}}}
  }},
  {$replaceRoot: {newRoot: {$arrayToObject: "$res"}}}
])
See how it works on the playground example
Notice that the _id of a bucket is its lower boundary, so the keys in the result are 0 and 30 rather than 30 and 50. You can manipulate this if you really want, but I don't recommend it.
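To make the bucket-then-deduplicate logic easier to follow outside the database, here is a minimal plain-JavaScript sketch of what the pipeline computes. The helper name countUniqueNamesPerBucket and the inline sample data are assumptions for illustration only; this is not MongoDB API.

```javascript
// Hypothetical helper: mirrors $bucket + $addToSet + $size in plain JavaScript.
function countUniqueNamesPerBucket(docs, boundaries) {
  const namesPerBucket = new Map(); // bucket key -> Set of names
  for (const doc of docs) {
    let key = "Other"; // plays the role of the `default` option of $bucket
    for (let i = 0; i < boundaries.length - 1; i++) {
      // A bucket covers [lower, upper): lower inclusive, upper exclusive.
      if (doc.sessionDuration >= boundaries[i] && doc.sessionDuration < boundaries[i + 1]) {
        key = String(boundaries[i]); // labelled by its lower boundary, like $bucket
        break;
      }
    }
    if (!namesPerBucket.has(key)) namesPerBucket.set(key, new Set());
    namesPerBucket.get(key).add(doc.name); // a Set ignores repeated names
  }
  const result = {};
  for (const [key, names] of namesPerBucket) result[key] = names.size;
  return result;
}

// Sample data taken from the question (without the _id fields).
const sampleDocs = [
  { name: "john", sessionDuration: 29 },
  { name: "moore", sessionDuration: 45 },
  { name: "john", sessionDuration: 25 },
  { name: "john", sessionDuration: 45 },
];
const bucketCounts = countUniqueNamesPerBucket(sampleDocs, [0, 30, 50]);
console.log(bucketCounts);
```

John's two sessions below 30 collapse into a single entry, while his 45 session is still counted in the second bucket together with moore's.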

Related

How to get an array of field values in a Mongo aggregate

I am using the MongoDB aggregation framework. Suppose I have a collection structured like this:
[
  {
    _id: ObjectId(123),
    name: "john",
    sessionDuration: 29
  },
  {
    _id: ObjectId(456),
    name: "moore",
    sessionDuration: 45
  },
  {
    _id: ObjectId(789),
    name: "cary",
    sessionDuration: 25
  },
]
I want to build a pipeline that returns something like this:
{
durationsArr: [29, 45, 25, '$sessionDuration_Field_From_Document']
}
I am doing this because I want the average duration across all documents: first I collect all the values into an array, then I add a last stage that applies $avg.
Any idea how I can get the array of sessionDuration values? Or is there a better approach to calculate the average sessionDuration for the collection? Please explain thoroughly, as I am new to Mongo aggregation.
1. $group - Group all documents.
1.1. $avg - Calculate the average of sessionDuration for all documents.
db.collection.aggregate([
  {
    $group: {
      _id: null,
      avgSessionDuration: {
        $avg: "$sessionDuration"
      }
    }
  }
])
Demo # Mongo Playground
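If the array itself is also wanted, as the question describes, a variant is to $push the values into an array and then apply $avg to that array. The sketch below only defines such a pipeline (it is not executed here) and mirrors it in plain JavaScript; the names arrayThenAvgPipeline and averageSessionDuration are made up for illustration.

```javascript
// Pipeline variant: collect the values first, then average the array.
// Defined for reference only; it would be passed to db.collection.aggregate(...).
const arrayThenAvgPipeline = [
  { $group: { _id: null, durationsArr: { $push: "$sessionDuration" } } },
  { $project: { _id: 0, durationsArr: 1, avgSessionDuration: { $avg: "$durationsArr" } } },
];

// Hypothetical plain-JavaScript equivalent of the two stages above.
function averageSessionDuration(docs) {
  const durationsArr = docs.map((doc) => doc.sessionDuration);
  const sum = durationsArr.reduce((acc, value) => acc + value, 0);
  // Return null for an empty input instead of dividing by zero.
  const avgSessionDuration = durationsArr.length > 0 ? sum / durationsArr.length : null;
  return { durationsArr, avgSessionDuration };
}

const avgResult = averageSessionDuration([
  { name: "john", sessionDuration: 29 },
  { name: "moore", sessionDuration: 45 },
  { name: "cary", sessionDuration: 25 },
]);
console.log(avgResult);
```

For the average alone, the single $group stage with $avg is simpler and avoids building one large array in memory.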

How to remove duplicates based on a condition in Mongodb?

{
"_id" : ObjectId("5d3acf79ea99ef80dca9bcca"),
"memberId" : "123",
"generatedId" : "00000d2f-9922-457a-be23-731f5fefeb14",
"memberType" : "premium"
},
{
"_id" : ObjectId("5e01554cea99eff7f98d7eed"),
"memberId" : "123",
"generatedId" : "34jkd2092sdlk02kl23kl2309k2309kr",
"memberType" : "premium"
}
I have 1 million documents in this format. How can I remove the duplicated documents based on "memberId"?
I need to remove the duplicated documents whose "generatedId" value does not contain "-". In this example, the bottom document should be deleted because its "generatedId" value does not contain "-".
Can someone share an idea of how to do this?
Well, there can be a strategy, but it still depends a lot on your data.
Take your documents and group them by memberId to count them and find the duplicates. Then, from the duplicates, separate out the entries whose generatedId does not contain a hyphen "-". Once you have the documents that are duplicates and also have no "-" in their generatedId, you can delete them.
const result = await Collection.aggregate([
  {
    $project: {
      _id: 1, // keep the _id field where it is anyway
      doc: "$$ROOT", // store the entire document in the "doc" field
    },
  },
  {
    $group: {
      _id: "$doc.memberId", // group the documents by memberId
      count: { $sum: 1 }, // count the number of documents in this group
      generatedId: { $first: "$doc.generatedId" }, // keep this value so it can be passed to later stages
      memberType: { $first: "$doc.memberType" }, // keep this value so it can be passed to later stages
    },
  },
  {
    $match: {
      count: { $gt: 1 }, // only keep duplicates, which have a count greater than 1
      // match only the groups whose generatedId contains no "-"
      generatedId: { $regex: /^((?!-).)*$/ },
    },
  },
]);
The result now contains the entries that are memberId duplicates and have no "-" in their generatedId. You can query them for deletion.
Warning:
Depending on your data, some duplicated memberIds may have no "-" in any of their generatedIds, so you might delete all of their documents.
Always take a backup before performing operations that might behave in an uncertain way.
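The warning above can be turned into an explicit guard. The following plain-JavaScript sketch picks the _id values to delete, but only for memberIds where at least one document with a "-" would survive. The helper name findIdsToDelete and the short string _id values are assumptions for illustration, and the same rule would still need to be expressed against your real data.

```javascript
// Hypothetical helper: returns the _id values of duplicates that are safe to delete.
function findIdsToDelete(docs) {
  // Group the documents by memberId.
  const byMember = new Map();
  for (const doc of docs) {
    if (!byMember.has(doc.memberId)) byMember.set(doc.memberId, []);
    byMember.get(doc.memberId).push(doc);
  }

  const idsToDelete = [];
  for (const group of byMember.values()) {
    if (group.length < 2) continue; // not duplicated, leave it alone
    const hasHyphenated = group.some((doc) => doc.generatedId.includes("-"));
    if (!hasHyphenated) continue; // deleting would remove every copy, so skip
    for (const doc of group) {
      if (!doc.generatedId.includes("-")) idsToDelete.push(doc._id);
    }
  }
  return idsToDelete;
}

const members = [
  { _id: "a", memberId: "123", generatedId: "00000d2f-9922-457a-be23-731f5fefeb14" },
  { _id: "b", memberId: "123", generatedId: "34jkd2092sdlk02kl23kl2309k2309kr" },
  { _id: "c", memberId: "456", generatedId: "nohyphen1" },
  { _id: "d", memberId: "456", generatedId: "nohyphen2" },
  { _id: "e", memberId: "789", generatedId: "single" },
];
const idsToDelete = findIdsToDelete(members);
console.log(idsToDelete);
```

The resulting list could then be used in a delete filter, for example deleteMany with { _id: { $in: idsToDelete } }, after checking it against a backup.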
db.collection.aggregate([
  {
    // first match all records having - in generatedId
    "$match": { "generatedId": { "$regex": "[-]" } }
  },
  // then group them
  {
    "$group": {
      "_id": "$memberId",
    }
  }
])

How to find next N elements from a cursor with MongoDB, without _id and on a sorted cursor

Let's say I have three person documents in a MongoDB, inserted in a random order.
{
"firstName": "Hulda",
"lastName": "Lamb",
},
{
"firstName": "Austin",
"lastName": "Todd",
},
{
"firstName": "John",
"lastName": "Doe",
}
My goal is to obtain, let's say, the next person after Austin when the list is in alphabetical order. So I would like to get the person with firstName = Hulda.
We can assume that I know Austin's _id.
My first attempt was to rely on the fact that _id is incremental, but it won't work because the persons can be added in any order in the database. Hulda's _id field has a value less than Austin's. I cannot do something like {_id: {$gt: <Austin's _id here>}};
And I also need to limit the number of returned elements, so N is a dynamic value.
Here is the code I have now, but as I mentioned, the ID trick is not working.
let cursor: any = this.db.collection(collectionName).find({_id: {$gt: startId}});
cursor = cursor.sort({firstName: 1});
cursor = cursor.limit(limit);
return cursor.toArray();
Some clarifications:
startId is a valid, existing _id of an object
limit is a variable holding a positive integer value
sorting and limiting work as expected; only the selection of the next elements is wrong, so the {_id: {$gt: startId}} filter messes up the selection.
The context of each Aggregation Framework operation is restricted to a single document, and MongoDB versions before 5.0 have no mechanism like window functions in SQL. One way to do it is to use $group to build an array that contains all your documents, and then find Austin's index so you can apply $slice:
db.collection.aggregate([
  {
    $sort: { firstName: 1 }
  },
  {
    $group: {
      _id: null,
      docs: { $push: "$$ROOT" }
    }
  },
  {
    $project: {
      nextNPeople: {
        $slice: [ "$docs", { $add: [ { $indexOfArray: [ "$docs.firstName", "Austin" ] }, 1 ] }, 1 ]
      }
    }
  },
  { $unwind: "$nextNPeople" },
  {
    $replaceRoot: {
      newRoot: "$nextNPeople"
    }
  }
])
Mongo Playground
Depending on your data size and MongoDB performance, the above solution may or may not be acceptable. It's up to you to decide whether to deploy such code in production, since the $group operation can be pretty heavy.
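The sort, index lookup, and slice steps of the pipeline can be sketched in plain JavaScript with N as a parameter. The helper name nextNAfter and the numeric _id values are assumptions for illustration; in the pipeline itself, N corresponds to the last argument of $slice.

```javascript
// Hypothetical helper: the n documents that follow the one with startId,
// when all documents are ordered by firstName.
function nextNAfter(docs, startId, n) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...docs].sort((a, b) => {
    if (a.firstName < b.firstName) return -1;
    if (a.firstName > b.firstName) return 1;
    return 0;
  });
  const index = sorted.findIndex((doc) => doc._id === startId);
  if (index === -1) return []; // unknown startId: nothing to return
  return sorted.slice(index + 1, index + 1 + n);
}

// Inserted in "random" order, as in the question.
const people = [
  { _id: 1, firstName: "Hulda", lastName: "Lamb" },
  { _id: 2, firstName: "Austin", lastName: "Todd" },
  { _id: 3, firstName: "John", lastName: "Doe" },
];
const nextOne = nextNAfter(people, 2, 1);
const nextTwo = nextNAfter(people, 2, 2);
console.log(nextOne, nextTwo);
```

Like the pipeline, this loads every document before slicing, so it only illustrates the logic and carries the same cost concern on large collections.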

Is it possible to find random documents in a collection without repeated field values? (mongodb\node.js)

For example, I have a collection users with the following structure:
{
_id: 1,
name: "John",
from: "Amsterdam"
},
{
_id: 2,
name: "John",
from: "Boston"
},
{
_id: 3,
name: "Mia",
from: "Paris"
},
{
_id: 4,
name: "Kate",
from: "London"
},
{
_id: 5,
name: "Kate",
from: "Moscow"
}
How can I get 3 random documents in which names will not be repeated?
Using the function getThreeNumbers(1, 5), I get an array of 3 non-repeating numbers and search by _id:
var random_nums = getThreeNumbers(1, 5); // [2,3,1]
users.find({_id: {$in: random_nums}}, function (err, data) {...}); // [John, Mia, John]
But the result can contain two Johns or two Kates, which is unwanted behavior. How can I get three random documents ([John, Mia, Kate], not [John, Kate, Kate] or [John, Mia, John]) with one or at most two queries? Which Kate or John document is picked (the names are duplicated) should be random, but a name should not be repeated.
There you go - see the comments in the code for further explanation of what the stages do:
users.aggregate(
  [
    { // eliminate duplicates based on "name" field and keep track of first document of each group
      $group: {
        "_id": "$name",
        "doc": { $first: "$$ROOT" }
      }
    },
    {
      // restore the original document structure
      $replaceRoot: {
        newRoot: "$doc"
      }
    },
    {
      // select 3 random documents from the result
      $sample: {
        size: 3
      }
    }
  ])
As always with the aggregation framework, you can run the query with more or fewer stages to see the transformations step by step.
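To see what the three stages do without a database, here is a minimal plain-JavaScript sketch: keep the first document per name, then take a random sample. The helper name sampleUniqueByName is an assumption for illustration, and the shuffle only stands in for $sample rather than reproducing its internals.

```javascript
// Hypothetical helper: mirrors $group (first doc per name) followed by $sample.
function sampleUniqueByName(docs, size) {
  // Keep only the first document seen for each name, like $first in the $group stage.
  const firstByName = new Map();
  for (const doc of docs) {
    if (!firstByName.has(doc.name)) firstByName.set(doc.name, doc);
  }
  const unique = Array.from(firstByName.values());

  // Fisher-Yates shuffle, then take the first `size` entries.
  for (let i = unique.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = unique[i];
    unique[i] = unique[j];
    unique[j] = tmp;
  }
  return unique.slice(0, size);
}

const users = [
  { _id: 1, name: "John", from: "Amsterdam" },
  { _id: 2, name: "John", from: "Boston" },
  { _id: 3, name: "Mia", from: "Paris" },
  { _id: 4, name: "Kate", from: "London" },
  { _id: 5, name: "Kate", from: "Moscow" },
];
const picked = sampleUniqueByName(users, 3);
console.log(picked.map((user) => user.name));
```

In this sketch the order of the result is random, but which John or Kate is kept is not: it is always the first one encountered.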
I think what you are looking for is the $group stage, which will give you the distinct values in the collection. The grouping key must be passed as _id, so it can be used as:
db.users.aggregate( [ { $group : { _id : "$name" } } ] );
MongoDB docs: Retrieve Distinct Values

How could I aggregate across multiple collections at once with a loop

Suppose each collection has these common fields: birthday and gender.
How can I get the count per birth year for each group of collections?
Expected result:
group r01: {id: 1987, count: 21121}, {id: 1988, count: 22}, ...
The output should count documents from both user_vip_r01 and user_general_r01.
group r15: {id: 1986, count: 2121}, {id: 1985, count: 220}, ...
The output should count documents from both user_vip_r15 and user_general_r15.
I know how to write the group-by-year query,
but I don't know how to write a loop in JavaScript that iterates over all my collections.
Also, what if part of the collection name is irregular?
For example, user_old_r01, user_new_r01, and user_bad_r01 should all be processed in group r01.
Is it possible to use a regex for that?
Group by year:
var pipeline = [
  { $group: {
    _id: { $year: "$birthday" }, // group by the year of the birthday
    count: { $sum: 1 } // count the documents for each year
  } },
  { $sort: { _id: 1 } }
];
var cur = db[source_collection].runCommand("aggregate", {
  pipeline: pipeline.concat([{ $out: output_collection_name }]),
  allowDiskUse: true
});
Collection list:
"user_vip_r01",
"user_vip_r15",
"user_vip_r16",
"user_vip_r17",
"user_vip_r18",
"user_vip_r19",
"user_vip_r20",
"user_vip_r201",
....
"user_general_r01",
"user_general_r15",
"user_general_r16",
"user_general_r17",
"user_general_r18",
"user_general_r19",
"user_general_r20",
"user_general_r201",
...
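The regex part of the question can be sketched on its own in plain JavaScript: group the collection names by their rNN suffix, so that user_old_r01 or user_bad_r01 land in the same group as user_vip_r01. The helper name groupCollectionsBySuffix and the pattern are assumptions based on the names listed above, not a tested solution for the real database.

```javascript
// Hypothetical helper: groups collection names such as "user_vip_r01" by their "rNN" suffix.
function groupCollectionsBySuffix(names) {
  const groups = {};
  for (const name of names) {
    // Accept any middle part ("vip", "general", "old", ...) followed by _r<digits>.
    const match = /^user_.+_(r\d+)$/.exec(name);
    if (match === null) continue; // ignore collections that do not follow the pattern
    const suffix = match[1];
    if (!groups[suffix]) groups[suffix] = [];
    groups[suffix].push(name);
  }
  return groups;
}

const collectionGroups = groupCollectionsBySuffix([
  "user_vip_r01",
  "user_general_r01",
  "user_old_r01",
  "user_vip_r15",
  "user_general_r15",
  "orders",
]);
console.log(collectionGroups);
```

In the mongo shell the input list could come from db.getCollectionNames(). Each group could then be processed by running the group-by-year pipeline on every collection in it and adding up the counts per year.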
