Mongodb Random Data from aggregate function with Bias - javascript

Does the MongoDB aggregate function support biases, i.e. favoring values that are specified? For example, I have an array of values for a variable:
genre = ["science", "math", "english"]
I want to get 5 random documents from the database where each document has one of the genres in the specified array. I want the results to be biased: if only 2 documents match the condition, I want the other 3 to be picked at random instead, so that I still get the 5 random documents I need.
Here's what I've got so far, but it only returns random documents without favoring any values:
const book = await Book.aggregate([
{
$match: { genre: { $type: "string" } }
},
{
$sample: { size: 6 }
},
{
$set: {
genre: {
$cond: {
if: { $eq: [{ $type: "$genre" }, "string"] },
then: ["$genre"],
else: "$genre"
}
}
}
},
]);
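One possible way to get this bias, sketched here as an assumption rather than a confirmed answer: flag each document by whether its genre is in the preferred list, give it a random key with $rand (MongoDB 4.4.2+), sort preferred-first and then by the random key, and $limit to the number you need. The helper names `biasedSamplePipeline` and `biasedSampleInMemory` are hypothetical; the second mirrors the pipeline's logic on a plain array so it can be tried without a database. It assumes `genre` is a single string, as the $match above requires.

```javascript
// Hypothetical helper: builds a pipeline that returns `n` documents, preferring
// those whose (string) `genre` is in `genres` and filling the rest at random.
// Assumes MongoDB 4.4.2+ for $rand and 4.2+ for the $unset stage.
function biasedSamplePipeline(genres, n) {
  return [
    { $match: { genre: { $type: "string" } } },
    {
      $addFields: {
        _preferred: { $in: ["$genre", genres] }, // true if genre is in the list
        _r: { $rand: {} }, // random sort key in [0, 1)
      },
    },
    // false sorts before true, so -1 puts preferred documents first;
    // within each group the random key decides the order.
    { $sort: { _preferred: -1, _r: 1 } },
    { $limit: n },
    { $unset: ["_preferred", "_r"] }, // drop the helper fields
  ];
}

// Hypothetical in-memory mirror of the same logic, for testing without a database.
function biasedSampleInMemory(docs, genres, n, rand = Math.random) {
  return docs
    .map((doc) => ({ doc, preferred: genres.includes(doc.genre), r: rand() }))
    .sort((a, b) => Number(b.preferred) - Number(a.preferred) || a.r - b.r)
    .slice(0, n)
    .map((entry) => entry.doc);
}

// Usage, assuming a Mongoose `Book` model as in the question:
// const books = await Book.aggregate(biasedSamplePipeline(["science", "math", "english"], 5));
```

Unlike $sample, this sorts every matching document, so on a large collection it may be noticeably slower; whether that matters depends on the data size.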

Related

Mongodb: Get sample of nested array using Aggregate

$slice allows me to get a slice of a nested array. I used it successfully like this:
const user = await User.aggregate([
{ $match: { _id: ObjectId(user_id) } },
{
$lookup: {
from: "users",
let: { friends: "$friends" },
pipeline: [
{ $match: { $expr: { $in: ["$_id", "$$friends"] } } },
{
$lookup: {
from: "profiles",
localField: "profile",
foreignField: "_id",
as: "profile",
},
},
{
$match: {
"profile.online": true,
},
},
{
$project: {
name: "$name",
surname: "$surname",
profile: { $arrayElemAt: ["$profile", 0] },
},
},
],
as: "friends",
},
},
{
$addFields: {
friends: {
$slice: ["$friends", skip, limit],
},
},
},
]);
Now, instead of taking a slice, I would like to take a random sample of the array field friends.
I could not find a way to do this. But, in the group stage I can use something like this:
const pipeline = [
{
$lookup: {
from: "profiles",
let: { profiles_id: "$profile" },
pipeline: [
{
$match: {
online: true,
$expr: { $eq: ["$_id", "$$profiles_id"] },
},
},
],
as: "profile",
},
},
{ $unwind: "$profile" },
{ $sample: { size: 10 } },
];
const users = await User.aggregate(pipeline);
Change the last $addFields stage to this:
Pros: It "works."
Cons: You are not guaranteed unique random entries in the list. Getting that is a lot more work. If you have a LOT more friends than the range, duplicates are less likely and you are probably OK.
,{$addFields: {friends: {$reduce: { // overwrite friends array...
// $range is the number of things you want to pick:
input: {$range:[0,4]},
initialValue: [],
in: {
$let: {
// qq will be a random # between 0 and size-1 thanks to mult
// and floor, so we do not have to do qq-1 to get to zero-based
// indexing on the $friends array
vars: {qq: {$floor:{$multiply:[{$rand: {}},{$size:"$friends"}]}} },
// $concat only works for strings, but $concatArrays can be used
// (creatively) on other types. Here $slice returns an array of
// 1 item which we easily pass to $concatArrays to build the
// overall result:
in: {$concatArrays: [ "$$value", {$slice:["$friends","$$qq",1]} ]}
}}
}}}}
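The $reduce above draws an independent random index on every turn, so it samples with replacement. A rough JavaScript equivalent (the name `pickWithReplacement` is hypothetical) makes it easy to see why duplicates can appear:

```javascript
// Hypothetical in-memory equivalent of the $reduce/$rand/$slice stage above:
// each iteration draws an independent index, so the same item may be picked twice.
function pickWithReplacement(friends, count) {
  return Array.from({ length: count }).reduce((value) => {
    // like {$floor: {$multiply: [{$rand: {}}, {$size: "$friends"}]}}
    const qq = Math.floor(Math.random() * friends.length);
    // like {$concatArrays: ["$$value", {$slice: ["$friends", "$$qq", 1]}]}
    return value.concat(friends.slice(qq, qq + 1));
  }, []);
}

// Returns 4 names; with only 3 candidates at least one must repeat.
console.log(pickWithReplacement(["ann", "bob", "cy"], 4));
```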
UPDATED
This version exploits keeping state in the $reduce chain and will not pick dupes. It does so by iteratively shrinking the input candidate list as each item is randomly chosen. The output is a little nested (i.e. friends is set not to the picked random sample but to an object containing picks and the remaining aa list), but this is easily reformatted after the fact. In MongoDB 5.0 we could finish it off with:
{$addFields: {friends: {$getField: {field: "picks", input: {$reduce: {
but many people are not yet on 5.0.
{$addFields: {friends: {$reduce: {
// $range is the number of things you want to pick:
input: {$range:[0,6]},
// This is classic use of $reduce to iterate over something AND
// preserve state. We start with picks as empty and aa being the
// original friends array:
initialValue: {aa: "$friends", picks: []},
in: {
$let: {
// idx will be a random # between 0 and size-1 thanks to mult
// and floor, so we do not have to do idx-1 to get to zero-based
// indexing on the $friends array. idx and sz will be eval'd
// each time reduce turns the crank through the input range:
vars: {idx: {$floor:{$multiply:[{$rand: {}},{$size:"$$value.aa"}]}},
// cannot set sz and then use it in same vars; oh well
sz: {$size:"$$value.aa"}
},
in: {
// Add to our picks list:
picks: {$concatArrays: [ "$$value.picks", {$slice:["$$value.aa","$$idx",1]} ]},
// And now shrink up the input candidate array.
// Sadly, we cannot do $slice:[array,pos,0] to yield an empty
// array and keep the $concat logic tight; thus we have to test
// for front and end special conditions.
// This whole bit is to extract the chosen item from the aa
// array by splicing together a new one MINUS the target.
// This will change the value of $sz (-1) as we crank thru
// the picks. This ensures we only pick UNPICKED items from
// $$value.aa!
aa: {$cond: [{$eq:["$$idx",0]}, // if
// idx 0: Take from idx 1 and count size - 1:
{$slice:["$$value.aa",1,{$subtract:["$$sz",1]}]}, // then
// idx last: Take from idx 0 and ALSO count size - 1:
{$cond: [ // else
{$eq:["$$idx",{$subtract:["$$sz",1]}]}, // if
{$slice:["$$value.aa",0,{$subtract:["$$sz",1]}]}, // then
// else not 0 or last item, e.g. idx = 3
{$concatArrays: [
// Start at 0, count idx; this will land
// us BEFORE the target item (because idx
// is n-1):
{$slice:["$$value.aa",0,"$$idx"]},
// Jump over the target (+1), and go n-2
// (1 for idx/n conversion, and 1 for the
// fact we jumped over):
{$slice:["$$value.aa",{$add:["$$idx",1]},{$subtract:["$$sz",2]}]}
]}
]}
]}
}
}}
}}
}}
]);
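The same "carry the state through $reduce" idea can be tried out in plain JavaScript. The sketch below is a hypothetical in-memory equivalent (`pickWithoutReplacement` is not a MongoDB API): the accumulator holds the remaining candidates `aa` and the `picks` so far, and the chosen item is cut out of `aa` on each turn. As an added assumption, it stops picking once `aa` is empty, a case the server-side version above does not guard against.

```javascript
// Hypothetical in-memory equivalent of the "UPDATED" $reduce: the accumulator
// carries the remaining candidates (aa) and the picks so far, so no item is
// picked twice.
function pickWithoutReplacement(friends, count) {
  const state = Array.from({ length: count }).reduce(
    (value) => {
      if (value.aa.length === 0) return value; // nothing left to pick
      const sz = value.aa.length;
      const idx = Math.floor(Math.random() * sz);
      return {
        // add the chosen item to picks
        picks: value.picks.concat(value.aa.slice(idx, idx + 1)),
        // rebuild aa without the chosen item; the three $cond branches collapse
        // into one expression here because slice(0, 0) simply yields []
        aa: value.aa.slice(0, idx).concat(value.aa.slice(idx + 1)),
      };
    },
    { aa: friends, picks: [] }
  );
  return state.picks; // what $getField: {field: "picks", ...} would extract
}
```

Because slice and concat return new arrays, the input `friends` array is left untouched.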
Starting in MongoDB v4.4 (Jan 2021), you may opt to use the $function operator. JavaScript's splice function does all the work of the multiple $slice operations in the previous example.
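{$addFields: {friends: {$function: {
body: function(candidates, npicks) {
var picks = [];
// stop early if there are fewer candidates than picks requested
for (var i = 0; i < npicks && candidates.length > 0; i++) {
// random index into the remaining candidates
var idx = Math.floor(Math.random() * candidates.length);
// splice removes the chosen item, so it cannot be picked again
picks.push(candidates.splice(idx, 1)[0]);
}
return picks;
},
args: [ "$friends", 4 ], // 4 is the number to pick
lang: "js"
}}}}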

How to remove duplicates based on a condition in Mongodb?

{
"_id" : ObjectId("5d3acf79ea99ef80dca9bcca"),
"memberId" : "123",
"generatedId" : "00000d2f-9922-457a-be23-731f5fefeb14",
"memberType" : "premium"
},
{
"_id" : ObjectId("5e01554cea99eff7f98d7eed"),
"memberId" : "123",
"generatedId" : "34jkd2092sdlk02kl23kl2309k2309kr",
"memberType" : "premium"
}
I have 1 million docs in this format. How can I remove duplicated docs based on "memberId"?
I need to remove the duplicated docs whose "generatedId" value does not contain "-". In this example, the bottom doc should be deleted since its "generatedId" value does not contain "-".
Can someone share an idea of how to do this?
Well, there can be a strategy, but still, it depends on your data a lot.
Take your docs and group them by memberId to count the duplicates. Then, from the duplicates, separate out all entries whose generatedId does not contain a hyphen "-". Once you have the docs that are duplicates and also have no - in their generatedId, you can delete them.
const result = await Collection.aggregate([
{
$group: {
_id: "$memberId", // group the documents by memberId
count: { $sum: 1 }, // count the number of documents in this group
// keep _id and generatedId of EVERY doc in the group ($first would only
// keep one of them, so later stages could miss the doc to delete)
docs: { $push: { _id: "$_id", generatedId: "$generatedId" } },
},
},
{
// only keep duplicated groups, i.e. those with a count greater than 1
$match: { count: { $gt: 1 } },
},
{ $unwind: "$docs" }, // one output document per grouped doc
{
$match: {
// match only the entries that have no - in their generatedId
// (no /g flag, and no stray "/ g" after the object)
"docs.generatedId": { $regex: /^[^-]*$/ },
},
},
]);
Now the result contains the docs that are memberId duplicates and do not have - in their generatedId. You can query them for deletion.
Warning:
Depending on your data, some duplicated memberIds may have no '-' in any of their generatedIds, in which case you might delete all of their docs.
Always take a backup before performing operations that might behave in unexpected ways.
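The grouping logic can be checked on a small array before running anything against a million documents. The sketch below is a hypothetical in-memory version (`idsToDelete` is not a MongoDB API). As an extra assumption not in the answer, it never returns every document of a group, so each memberId keeps at least one doc, which guards against the situation described in the warning.

```javascript
// Hypothetical helper: returns the _id values of duplicate documents (same
// memberId) whose generatedId contains no "-". As an added safeguard, it never
// returns every document of a group, so each memberId keeps at least one doc.
function idsToDelete(docs) {
  // group documents by memberId, like $group: { _id: "$memberId" }
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.memberId)) groups.set(doc.memberId, []);
    groups.get(doc.memberId).push(doc);
  }
  const result = [];
  for (const group of groups.values()) {
    if (group.length < 2) continue; // not duplicated, like count: { $gt: 1 }
    const noHyphen = group.filter((d) => !d.generatedId.includes("-"));
    // Safeguard: if no doc in the group has a hyphen, keep the first one.
    const victims = noHyphen.length === group.length ? noHyphen.slice(1) : noHyphen;
    result.push(...victims.map((d) => d._id));
  }
  return result;
}
```

The returned ids could then be passed to something like `deleteMany({ _id: { $in: ids } })` once you are satisfied they are the right documents.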
db.collection.aggregate([
{
// first match all records having - in generatedId
"$match" : { "generatedId" : { "$regex": "[-]"} } },
// then group them
{
"$group": {
"_id": "$memberId",
}}
])

How to find next N elements from a cursor with MongoDB, without _id and on a sorted cursor

Let's say I have three person documents in a MongoDB collection, inserted in random order.
{
"firstName": "Hulda",
"lastName": "Lamb",
},
{
"firstName": "Austin",
"lastName": "Todd",
},
{
"firstName": "John",
"lastName": "Doe",
}
My goal is to obtain, let's say, the next person after Austin when the list is in alphabetical order. So I would like to get the person with firstName = Hulda.
We can assume that I know Austin's _id.
My first attempt was to rely on the fact that _id is incremental, but it won't work because the persons can be added in any order in the database. Hulda's _id field has a value less than Austin's. I cannot do something like {_id: {$gt: <Austin's _id here>}};
And I also need to limit the number of returned elements, so N is a dynamic value.
Here is the code I have now, but as I mentioned, the ID trick is not working.
let cursor: any = this.db.collection(collectionName).find({_id: {$gt: startId}});
cursor = cursor.sort({firstName: 1});
cursor = cursor.limit(limit);
return cursor.toArray();
Some clarifications:
startId is a valid, existing _id of an object
limit is a variable holding a positive integer value
sorting and limiting work as expected; only the selection of the next elements is wrong, because {_id: {$gt: startId}} messes up the selection.
The context of every MongoDB Aggregation Framework operation is restricted to a single document, and (before $setWindowFields was added in MongoDB 5.0) there was no mechanism like SQL window functions. Your only way is to use $group to get an array that contains all your documents, then find Austin's index so you can apply $slice:
db.collection.aggregate([
{
$sort: { firstName: 1 }
},
{
$group: {
_id: null,
docs: { $push: "$$ROOT" }
}
},
{
$project: {
nextNPeople: {
$slice: [ "$docs", { $add: [ { $indexOfArray: [ "$docs.firstName", "Austin" ] }, 1 ] }, 1 ]
}
}
},
{ $unwind: "$nextNPeople" },
{
$replaceRoot: {
newRoot: "$nextNPeople"
}
}
])
Depending on your data size and MongoDB performance, the above solution may or may not be acceptable. It's up to you to decide whether to deploy such code in production, since the $group operation can be pretty heavy and the single grouped document is subject to the 16 MB BSON document size limit.
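The pipeline's logic is easier to follow on a plain array. The function below is a hypothetical in-memory sketch (`nextNPeople` is not part of any MongoDB API) that mirrors the $sort, $indexOfArray and $slice steps, with N passed as a parameter instead of the fixed 1 used above. It uses localeCompare as a stand-in for MongoDB's default string ordering; the two agree for simple ASCII names like these but are not identical in general.

```javascript
// Hypothetical in-memory version of the $sort / $group / $indexOfArray / $slice
// pipeline above, with N made a parameter.
function nextNPeople(people, firstName, n) {
  // $sort: { firstName: 1 } (sort a copy so the input order is kept)
  const docs = [...people].sort((a, b) => a.firstName.localeCompare(b.firstName));
  // $indexOfArray: ["$docs.firstName", firstName]
  const idx = docs.map((d) => d.firstName).indexOf(firstName);
  // $slice: ["$docs", idx + 1, n]
  return docs.slice(idx + 1, idx + 1 + n);
}
```

As with $indexOfArray, a name that is not found yields -1, so the slice starts at index 0 and simply returns the first N people.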

How to count number of subdocuments with condition

I have a MongoDB collection with documents like the ones below. I want to count, cumulatively over all documents, how many subdocuments of the events field are not null.
{
name: "name1",
events: {
created: {
timestamp: 1512477520951
},
edited: {
timestamp: 1512638551022
},
deleted: null
}
}
{
name: "name2",
events: {
created: {
timestamp: 1512649915779
},
edited: null,
deleted: null
}
}
So the query on these two documents should return 3, because there are 3 non-null events in the collection. I cannot change the format of the documents to make the events field an array.
You want $objectToArray from MongoDB 3.4.7 or greater in order to do this as an aggregation statement:
db.collection.aggregate([
{ "$group": {
"_id": null,
"total": {
"$sum": {
"$size": {
"$filter": {
"input": {
"$objectToArray": "$events"
},
"cond": { "$ne": [ "$$this.v", null ] }
}
}
}
}
}}
])
That part is needed to look at the "events" object and translate each of its "key/value" pairs into array entries. That way you can apply the $filter operation to remove the null "values" (the "v" property) and then use $size to count the matching list.
All of that is done under a $group pipeline stage using the $sum accumulator.
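To see what each operator contributes, here is a hypothetical in-memory mirror of that pipeline (`countNonNullEvents` is an illustrative name, not a MongoDB API): Object.entries plays the role of $objectToArray, the filter on the value plays $filter on "$$this.v", .length plays $size, and the outer reduce plays $sum.

```javascript
// Hypothetical in-memory version of $objectToArray + $filter + $size + $sum.
function countNonNullEvents(docs) {
  return docs.reduce(
    (total, doc) =>
      total +
      Object.entries(doc.events) // like $objectToArray: [[k, v], ...]
        .filter(([, v]) => v !== null) // like $filter with $ne: ["$$this.v", null]
        .length, // like $size
    0 // $sum starts from 0
  );
}
```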
Or, if you don't have a supporting version, you need mapReduce and JavaScript execution in order to do the same "object to array" operation:
db.collection.mapReduce(
function() {
emit(null,
Object.keys(this.events).filter(k => this.events[k] != null).length);
},
function(key,values) {
return Array.sum(values);
},
{ out: { inline: 1 } }
)
That uses the same basic process by obtaining the object keys as an array and rejecting those where the value is found to be null, then obtaining the length of the resulting array.
Because of the JavaScript evaluation, this is much slower than the aggregation framework counterpart. But it's really a question of what server version you have available to support what you need.

Is it possible to find random documents in collection, without same fields? (mongodb/node.js)

For example, I have a collection users with the following structure:
{
_id: 1,
name: "John",
from: "Amsterdam"
},
{
_id: 2,
name: "John",
from: "Boston"
},
{
_id: 3,
name: "Mia",
from: "Paris"
},
{
_id: 4,
name: "Kate",
from: "London"
},
{
_id: 5,
name: "Kate",
from: "Moscow"
}
How can I get 3 random documents in which names will not be repeated?
Using the function getThreeNumbers(1, 5), I get an array with 3 non-repeating numbers and search by _id:
var random_nums = getThreeNumbers(1, 5); // [2,3,1]
users.find({_id: {$in: random_nums}}, function (err, data) {...}); // [John, Mia, John]
But the result can contain two Johns or two Kates, which is unwanted behavior. How can I get three random documents ([John, Mia, Kate], not [John, Kate, Kate] or [John, Mia, John]) with 1 or at most 2 queries? Which Kate or John (the duplicated names) is returned should be random, but names should not be repeated.
There you go - see the comments in the code for further explanation of what the stages do:
users.aggregate(
[
{ // eliminate duplicates based on "name" field and keep track of first document of each group
$group: {
"_id": "$name",
"doc": { $first: "$$ROOT" }
}
},
{
// restore the original document structure
$replaceRoot: {
newRoot: "$doc"
}
},
{
// select 3 random documents from the result
$sample: {
size:3
}
}
])
As always with the aggregation framework, you can run the query with more or fewer stages to see the transformations step by step.
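Note that $first without a preceding sort keeps whichever document arrives first in each group, which is not necessarily a random one, while the question asks for the duplicate to be chosen at random. The hypothetical sketch below (`shuffle` and `sampleUniqueNames` are illustrative names) shows one way to get that behavior on a plain array: shuffle before the "keep first per name" step, then shuffle the grouped result again in place of $sample. Doing the same server-side, e.g. by sorting on a $rand field before $group, is an assumption you would need to verify on your version.

```javascript
// Fisher-Yates shuffle on a copy of the input.
function shuffle(items) {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Hypothetical in-memory version of $group ($first per name) + $sample,
// with a shuffle first so the kept duplicate is chosen at random.
function sampleUniqueNames(docs, size) {
  const firstByName = new Map();
  for (const doc of shuffle(docs)) {
    // like $group: { _id: "$name", doc: { $first: "$$ROOT" } }
    if (!firstByName.has(doc.name)) firstByName.set(doc.name, doc);
  }
  // like $replaceRoot + $sample: { size }
  return shuffle([...firstByName.values()]).slice(0, size);
}
```

With the five example users, this returns three documents with three different names; if fewer distinct names exist than `size`, it returns one document per name.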
I think what you are looking for is the $group stage, which will give you the distinct values of the collection. It can be used as:
db.users.aggregate( [ { $group : { _id : "$name" } } ] );
MongoDB docs: Retrieve Distinct Values
