MongoDB: Get a sample of a nested array using aggregate

$slice allows me to get a slice of a nested array. I used it successfully like this:
const user = await User.aggregate([
{ $match: { _id: ObjectId(user_id) } },
{
$lookup: {
from: "users",
let: { friends: "$friends" },
pipeline: [
{ $match: { $expr: { $in: ["$_id", "$$friends"] } } },
{
$lookup: {
from: "profiles",
localField: "profile",
foreignField: "_id",
as: "profile",
},
},
{
$match: {
"profile.online": true,
},
},
{
$project: {
name: "$name",
surname: "$surname",
profile: { $arrayElemAt: ["$profile", 0] },
},
},
],
as: "friends",
},
},
{
$addFields: {
friends: {
$slice: ["$friends", skip, limit],
},
},
},
]);
Now, instead of taking a slice, I would like to take a random sample of the array field friends.
I could not find a way to do this inside $addFields. As a top-level pipeline stage, however, I can use $sample like this:
const pipeline = [
{
$lookup: {
from: "profiles",
let: { profiles_id: "$profile" },
pipeline: [
{
$match: {
online: true,
$expr: { $eq: ["$_id", "$$profiles_id"] },
},
},
],
as: "profile",
},
},
{ $unwind: "$profile" },
{ $sample: { size: 10 } },
];
const users = await User.aggregate(pipeline);

Change the last $addFields stage to this.
Pros: It "works."
Cons: You are not guaranteed unique random entries in the list; getting that is a lot more work. If the friends array is much larger than the number of picks, duplicates are unlikely and you are probably OK.
{$addFields: {friends: {$reduce: { // overwrite friends array...
// $range sets the number of things you want to pick (here 4):
input: {$range:[0,4]},
initialValue: [],
in: {
$let: {
// qq will be a random integer between 0 and size-1 thanks to
// $multiply and $floor, so it is already a zero-based index
// into the $friends array. $rand requires MongoDB 4.4.2+.
vars: {qq: {$floor:{$multiply:[{$rand: {}},{$size:"$friends"}]}} },
// $concat only works for strings, but $concatArrays works on
// arrays of any type. Here $slice returns an array of
// 1 item which we easily pass to $concatArrays to build
// the overall result:
in: {$concatArrays: [ "$$value", {$slice:["$friends","$$qq",1]} ]}
}}
}}
}}
UPDATED
This version keeps state in the $reduce chain and will not pick dupes. It does so by iteratively shrinking the candidate list as each item is randomly chosen. The output is a little nested (friends is set not to the random sample itself but to an object containing picks and the remnant aa list), but this is easily reformatted after the fact. In MongoDB 5.0 we could finish it off by wrapping the $reduce in $getField:
{$addFields: {friends: {$getField: {field: "picks", input: {$reduce: {
but many people are not yet on 5.0.
{$addFields: {friends: {$reduce: {
// $range is the number of things you want to pick:
input: {$range:[0,6]},
// This is classic use of $reduce to iterate over something AND
// preserve state. We start with picks as empty and aa being the
// original friends array:
initialValue: {aa: "$friends", picks: []},
in: {
$let: {
// idx will be a random # between 0 and size-1 thanks to mult
// and floor, so we do not have to do idx-1 to get to zero-based
// indexing on the $friends array. idx and sz will be eval'd
// each time reduce turns the crank through the input range:
vars: {idx: {$floor:{$multiply:[{$rand: {}},{$size:"$$value.aa"}]}},
// cannot set sz and then use it in same vars; oh well
sz: {$size:"$$value.aa"}
},
in: {
// Add to our picks list:
picks: {$concatArrays: [ "$$value.picks", {$slice:["$$value.aa","$$idx",1]} ]},
// And now shrink the input candidate array.
// Sadly, we cannot do $slice:[array,pos,0] to yield an empty
// array and keep the $concatArrays logic tight; thus we have to
// test for the front and end special cases.
// This whole bit extracts the chosen item from the aa array by
// splicing together a new one MINUS the target. The size of
// $$value.aa drops by 1 on each turn of the crank, which ensures
// we only pick UNPICKED items. Caveat: for the same $slice
// reason, the number of picks must stay below the array size.
aa: {$cond: [{$eq:["$$idx",0]}, // if
// idx 0: Take from idx 1 and count size - 1:
{$slice:["$$value.aa",1,{$subtract:["$$sz",1]}]}, // then
// idx last: Take from idx 0 and ALSO count size - 1:
{$cond: [ // else
{$eq:["$$idx",{$subtract:["$$sz",1]}]}, // if
{$slice:["$$value.aa",0,{$subtract:["$$sz",1]}]}, // then
// else not 0 or last item, e.g. idx = 3
{$concatArrays: [
// Start at 0, count idx; this will land
// us BEFORE the target item (because idx
// is zero-based):
{$slice:["$$value.aa",0,"$$idx"]},
// Jump over the target (+1), and take up to
// sz-2 items; $slice simply stops at the end
// of the array if fewer remain:
{$slice:["$$value.aa",{$add:["$$idx",1]},{$subtract:["$$sz",2]}]}
]}
]}
]}
}
}}
}}
}}
]);
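The nested output described above can also be flattened without $getField, by following the $reduce stage with one more $addFields. This is a minimal sketch, assuming the $reduce stage has left friends as an object of the form {aa: [...], picks: [...]}; the helper name flattenPicksStage is made up for illustration.

```javascript
// Hypothetical helper: returns a stage that replaces the {aa, picks}
// object produced by the $reduce stage with just the picks array.
function flattenPicksStage() {
  // "$friends.picks" is a plain field path into the embedded object.
  return { $addFields: { friends: "$friends.picks" } };
}

// Intended use: append it directly after the $reduce stage, e.g.
//   pipeline.push(flattenPicksStage());
console.log(JSON.stringify(flattenPicksStage()));
```

Because it only uses a field path, this stage should not depend on any 5.0 feature.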
Starting in MongoDB 4.4, you may opt to use the $function operator. The JavaScript splice function does all the work of the multiple $slice operations in the previous example.
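{$addFields: {friends: {$function: {
body: function(candidates, npicks) {
var picks = [];
// Never try to pick more items than are available:
var n = Math.min(npicks, candidates.length);
for(var i = 0; i < n; i++) {
var idx = Math.floor(Math.random() * candidates.length);
// splice removes the chosen item, so it cannot be picked twice:
picks.push(candidates.splice(idx,1)[0]);
}
return picks;
},
args: [ "$friends", 4], // 4 is num to pick
lang: "js"
}}
}}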
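A different approach from the answers above, offered as a sketch rather than a drop-in replacement: because the friends array in the question is built by a $lookup sub-pipeline, a $sample stage can be placed inside that sub-pipeline, so the sampling happens on documents before the array is assembled. The function name and its parameters are hypothetical; the collection and field names are taken from the question, and userId is assumed to already be an ObjectId.

```javascript
// Hypothetical helper: builds the question's pipeline with the $slice
// stage replaced by a $sample inside the $lookup sub-pipeline.
function buildOnlineFriendSamplePipeline(userId, sampleSize) {
  return [
    { $match: { _id: userId } },
    {
      $lookup: {
        from: "users",
        let: { friends: "$friends" },
        pipeline: [
          { $match: { $expr: { $in: ["$_id", "$$friends"] } } },
          {
            $lookup: {
              from: "profiles",
              localField: "profile",
              foreignField: "_id",
              as: "profile",
            },
          },
          { $match: { "profile.online": true } },
          // Sample among the online friends only.
          { $sample: { size: sampleSize } },
          {
            $project: {
              name: 1,
              surname: 1,
              profile: { $arrayElemAt: ["$profile", 0] },
            },
          },
        ],
        as: "friends",
      },
    },
  ];
}

// Intended use (Mongoose model name as in the question):
//   const user = await User.aggregate(buildOnlineFriendSamplePipeline(id, 10));
console.log(JSON.stringify(buildOnlineFriendSamplePipeline("someId", 10), null, 1));
```

If fewer online friends exist than sampleSize, $sample simply returns the ones there are.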

Related

Mongodb Random Data from aggregate function with Bias

Does the MongoDB aggregate function support biases, that is, can it favor documents that match specified values? For example, I have an array of values for a variable:
genre = [science, math, english]
I want to get 5 random documents from the database where each document has at least one of the genres in the specified array. I want the selection to be biased rather than strict: if only 2 documents match the condition, the other 3 should be picked at random, completing the 5 random documents that I need.
Here's what I've gotten so far, but it only returns random data without favoring any values:
const book = await Book.aggregate([
{
$match: { genre: { $type: "string" } }
},
{
$sample: { size: 6 }
},
{
$set: {
genre: {
$cond: {
if: { $eq: [{ $type: "$genre" }, "string"] },
then: ["$genre"],
else: "$genre"
}
}
}
},
]);

How to remove duplicates based on a condition in Mongodb?

{
"_id" : ObjectId("5d3acf79ea99ef80dca9bcca"),
"memberId" : "123",
"generatedId" : "00000d2f-9922-457a-be23-731f5fefeb14",
"memberType" : "premium"
},
{
"_id" : ObjectId("5e01554cea99eff7f98d7eed"),
"memberId" : "123",
"generatedId" : "34jkd2092sdlk02kl23kl2309k2309kr",
"memberType" : "premium"
}
I have 1 million docs in this format. How can I remove duplicated docs based on "memberId"?
I need to remove the duplicated docs where the "generatedId" value does not contain "-". In this example the bottom doc should be deleted, since its "generatedId" value does not contain "-".
Can someone share an idea of how to do this?

Well, there is a possible strategy, but it depends a lot on your data.
Let's say you take your docs and group them by memberId to count the duplicates. Then, from the duplicates, separate out the entries whose generatedId does not contain a hyphen "-". Docs that are duplicates and also have no "-" in their generatedId can then be deleted.
const result = await Collection.aggregate([
{
$project: {
_id: 1, // keep the _id field where it is anyway
doc: "$$ROOT", // store the entire document in the "doc" field
},
},
{
$group: {
_id: "$doc.memberId", // group the documents by memberId
count: { $sum: 1 }, // count the number of documents in this group
generatedId: { $first: "$doc.generatedId" }, // for keeping these values to be passed to other stages
memberType: { $first: "$doc.memberType" }, // for keeping these values to be passed to other stages
},
},
{
$match: {
count: { $gt: 1 }, // only keep what's duplicated, because it'll have a count greater than 1
// match only values that contain no "-" at all
generatedId: { $regex: /^[^-]*$/ },
},
},
]);
Now the result contains the memberId groups that are duplicated and whose kept generatedId has no "-". Note that $first keeps only one generatedId per group, so the $match tests just that one value. You can then query these docs for deletion.
Warning:
Depending on your data, some duplicated memberId values may have no "-" in any of their generatedIds, so you might delete all of their docs.
Always take a backup before running operations that might behave unpredictably.
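To act on that warning, the pipeline can collect, per memberId, exactly which documents lack a hyphen and skip any group where that would be all of them. The following is only a sketch under assumptions: it relies on $regexMatch (MongoDB 4.2 or later), assumes generatedId is always a string, and uses a made-up helper name. With a million documents the $group stage may need allowDiskUse.

```javascript
// Hypothetical helper; memberId and generatedId are the question's fields.
function buildHyphenlessDuplicatesPipeline() {
  return [
    // Collect every document's _id and generatedId per memberId.
    {
      $group: {
        _id: "$memberId",
        docs: { $push: { _id: "$_id", generatedId: "$generatedId" } },
      },
    },
    // Keep only memberIds with at least two documents (index 1 exists).
    { $match: { "docs.1": { $exists: true } } },
    // Split out the documents whose generatedId contains no hyphen.
    {
      $project: {
        total: { $size: "$docs" },
        toDelete: {
          $filter: {
            input: "$docs",
            as: "d",
            cond: {
              $not: [{ $regexMatch: { input: "$$d.generatedId", regex: "-" } }],
            },
          },
        },
      },
    },
    // Skip groups where nothing, or everything, would be deleted.
    {
      $match: {
        $expr: {
          $and: [
            { $gt: [{ $size: "$toDelete" }, 0] },
            { $lt: [{ $size: "$toDelete" }, "$total"] },
          ],
        },
      },
    },
  ];
}

console.log(JSON.stringify(buildHyphenlessDuplicatesPipeline(), null, 1));
```

Each result document then carries a toDelete array whose _id values could be passed to deleteMany with an $in filter, once a backup exists.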

db.collection.aggregate([
{
// first match all records with having - in generatedId
"$match" : { "generatedId" : { "$regex": "[-]"} } },
// then group them
{
"$group": {
"_id": "$memberId",
}}
])

How to extract values of the Keys from an array of collections in Javascript

I want to get the values of the keys, as arrays, from a series of collections. Please find the code snippet and the expected output below. I would be grateful if you could help me with what should go into the aggregate phase to get the desired output. Please note there will be more than 5000 collections.
db.collection1.insertMany([{
item: "journal",
qty: 25,
tags: "blank"
},
{
item: "mat",
qty: 85,
tags: "gray"
},
{
item: "mousepad",
qty: 25,
tags: "gel"
}
])
db.collection2.insertMany([{
abc: "paplu",
qiity: 1,
thugs: "red"
},
{
abc: "mat",
qiity: 85,
thugs: "gray"
},
{
abc: "mousepad",
qiity: 25,
thugs: "gel"
}
])
var a = ["collection1", "collection2"];
for (var i = 0; i < a.length; i++) {
db[a[i]].aggregate([])};
Expected Output:
collection1
{
item : ["journal","mat","mousepad"],
qty : [25,85,25],
tags : ["blank","gray","gel"]
}
collection2
{
abc : ["paplu","mat","mousepad"],
qiity : [1,85,25],
thugs : ["red","gray","gel"]
}
Please note, I'm trying to achieve this using MongoDB/JavaScript
It would be great if someone can help me with this!

You can start with $objectToArray to turn the keys of the $$ROOT object into an array. Then you can run $unwind and $group by null with $addToSet to get a single document that contains the unique key names from the entire collection. In the last step, convert the array back into a single document using $map, $arrayToObject and $replaceRoot:
db.collection.aggregate([
{
$project: { kv: { $objectToArray: "$$ROOT" } }
},
{ $unwind: "$kv" },
{
$group: {
_id: null,
keys: { $addToSet: "$kv.k" }
}
},
{
$replaceRoot: {
newRoot: {
$arrayToObject: {
$map: { input: "$keys", in: [ "$$this", 1 ] }
}
}
}
}
])
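The pipeline above returns the key names only. To also collect the values per key, as in the question's expected output, one possible extension groups on the key name and pushes each value. This is a sketch with a made-up helper name; it drops the _id key, and the order of values inside each array is not guaranteed without an explicit $sort.

```javascript
// Hypothetical helper: one output document mapping each key to an
// array of the values seen for that key across the collection.
function buildValuesPerKeyPipeline() {
  return [
    // Turn every document into an array of {k, v} pairs.
    { $project: { kv: { $objectToArray: "$$ROOT" } } },
    { $unwind: "$kv" },
    // $$ROOT still contained _id, so filter that pair out.
    { $match: { "kv.k": { $ne: "_id" } } },
    // One group per key name, collecting its values.
    { $group: { _id: "$kv.k", values: { $push: "$kv.v" } } },
    // Fold all groups into a single array of {k, v} pairs...
    { $group: { _id: null, pairs: { $push: { k: "$_id", v: "$values" } } } },
    // ...and convert that array into one document.
    { $replaceRoot: { newRoot: { $arrayToObject: "$pairs" } } },
  ];
}

// Intended use with the question's loop over collection names:
//   a.forEach(function (name) {
//     printjson(db[name].aggregate(buildValuesPerKeyPipeline()).toArray());
//   });
console.log(JSON.stringify(buildValuesPerKeyPipeline(), null, 1));
```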

Is it possible to find random documents in collection, without same fields? (mongodb\node.js)

For example, I have a collection users with the following structure:
{
_id: 1,
name: "John",
from: "Amsterdam"
},
{
_id: 2,
name: "John",
from: "Boston"
},
{
_id: 3,
name: "Mia",
from: "Paris"
},
{
_id: 4,
name: "Kate",
from: "London"
},
{
_id: 5,
name: "Kate",
from: "Moscow"
}
How can I get 3 random documents in which names will not be repeated?
Using the function getThreeNumbers(1, 5), I get an array with 3 non-repeating numbers and search by _id:
var random_nums = getThreeNumbers(1, 5); // [2,3,1]
users.find({_id: {$in: random_nums}}, function (err, data) {...}); // [John, Mia, John]
But the result can contain two Johns or two Kates, which is unwanted behavior. How can I get three random documents ([John, Mia, Kate], not [John, Kate, Kate] or [John, Mia, John]) with 1 or at most 2 queries? Which Kate or John (the duplicated names) is returned should be random, but names should not be repeated.

There you go - see the comments in the code for further explanation of what the stages do:
users.aggregate(
[
{ // eliminate duplicates based on "name" field and keep track of first document of each group
$group: {
"_id": "$name",
"doc": { $first: "$$ROOT" }
}
},
{
// restore the original document structure
$replaceRoot: {
newRoot: "$doc"
}
},
{
// select 3 random documents from the result
$sample: {
size:3
}
}
])
As always with the aggregation framework, you can run the query with more or fewer stages in order to see the transformations step by step.
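One limitation of the pipeline above is that $first keeps whichever document of each name group happens to arrive first, so the choice between the two Johns is not random. A possible variation, sketched here under assumptions, tags each document with a random number and sorts on it before grouping. It relies on $rand (MongoDB 4.4.2 or later); the helper name and the temporary field _r are made up.

```javascript
// Hypothetical helper: random representative per name, then a random
// sample of those representatives.
function buildRandomDistinctNamePipeline(sampleSize) {
  return [
    // Attach a random sort key to every document.
    { $addFields: { _r: { $rand: {} } } },
    { $sort: { _r: 1 } },
    // After the sort, $first is a random member of each name group.
    { $group: { _id: "$name", doc: { $first: "$$ROOT" } } },
    { $replaceRoot: { newRoot: "$doc" } },
    // Remove the temporary sort key again.
    { $project: { _r: 0 } },
    { $sample: { size: sampleSize } },
  ];
}

// Intended use:
//   users.aggregate(buildRandomDistinctNamePipeline(3))
console.log(JSON.stringify(buildRandomDistinctNamePipeline(3)));
```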

I think what you are looking for is the $group stage, which will give you the distinct values of a field in the collection. The grouping key has to be named _id. It can be used as:
db.users.aggregate( [ { $group : { _id : "$name" } } ] );
MongoDB docs: Retrieve Distinct Values

Cannot get correct result when using MongoDB aggregation in meteor

I am using MongoDB aggregation in meteor.
The items in database look like this:
// item1
{
products: {
aaa: 100,
bbb: 200
}
}
// item2
{
products: {
aaa: 300,
bbb: 400
}
}
My pipeline looks like this
let pipeline = [{
$limit: 10
}, {
$group: {
_id: {
// …
},
total: {
$sum: "$products.aaa"
}
}
}];
And it works perfectly. But when I change my database structure to this
// item1
{
products: [
{code: "aaa", num: 100},
{code: "bbb", num: 200}
]
}
// item2
{
products: [
{code: "aaa", num: 300},
{code: "bbb", num: 400}
]
}
The result I get for total is always 0, so I think my pipeline is wrong. Please see the comment inside:
let pipeline = [{
$limit: 10
}, {
$group: {
_id: {
// …
},
total: {
$sum: "$products.0.num" // Neither this nor "$products[0].num" works
}
}
}];
So how can I write it correctly? Thanks

With MongoDB 3.2 (which won't be the server bundled with Meteor, but there is nothing stopping you from using a separate server instance, and that would actually be recommended) you can use $arrayElemAt with $map:
let pipeline = [
{ "$limit": 10 },
{ "$group": {
"_id": {
// …
},
"total": {
"$sum": { "$arrayElemAt": [
{ "$map": {
"input": "$products",
"as": "product",
"in": "$$product.num"
}},
0
]}
}
}}
];
With older versions, use "two" $group stages and the $first operator after processing with $unwind. And that's just for the "first" index value:
let pipeline = [
{ "$limit": 10 },
{ "$unwind": "$products" },
{ "$group": {
"_id": "$_id", // The document _id
"otherField": { "$first": "$eachOtherFieldForGroupingId" },
"productNum": { "$first": "$products.num" }
}},
{ "$group": {
"_id": {
// …
},
"total": {
"$sum": "$productNum"
}
}}
];
So in the latter case, after you $unwind you just want to use $first to get the "first" index from the array, and it would also be used to get every field you want to use as part of the grouping key from the original document. All elements would be copied for each array member after $unwind.
In the former case, $map just extracts the "num" values for each array member, then $arrayElemAt just retrieves the wanted index position.
Naturally the newer method for MongoDB 3.2 is better. If you wanted another array index then you would need to repeatedly get the $first element from the array and keep filtering it out from the array results until you reached the required index.
So whilst it's possible in earlier versions, it's a lot of work to get there.
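Both pipelines above select by array position. If the intent is instead to total the num of the entry whose code is "aaa", as the original "$products.aaa" did, the array can be filtered by code. This is a sketch under assumptions: it needs $filter and the array form of $sum (both MongoDB 3.2 or later), the helper name is made up, and _id: null is only a placeholder because the original grouping key is elided.

```javascript
// Hypothetical helper: sums products[].num for entries matching `code`.
function buildSumByCodePipeline(code) {
  return [
    { $limit: 10 },
    {
      $group: {
        _id: null, // placeholder; the question's grouping key is not shown
        total: {
          // Outer $sum accumulates across documents; inner $sum adds up
          // the array of matching num values within one document.
          $sum: {
            $sum: {
              $map: {
                input: {
                  $filter: {
                    input: "$products",
                    as: "p",
                    cond: { $eq: ["$$p.code", code] },
                  },
                },
                as: "p",
                in: "$$p.num",
              },
            },
          },
        },
      },
    },
  ];
}

console.log(JSON.stringify(buildSumByCodePipeline("aaa"), null, 1));
```

This does not depend on "aaa" being the first element of the array.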
