Grouping and counting occurrences of values in a MongoDB collection?


Hope you are well. I am trying to build a graph from my collection, but I've run into an issue. How can I group the documents by location and then count how many documents share each location? In the example below I have 5 kids. I want to know all their locations and how many kids share each one (A = 2, B = 2, C = 1), so that I can plot location against the number of kids in that location. To summarize: which locations are there, and how many kids are in each?
{ "name": "Tom", "location": "A" },
{ "name": "Sarah", "location": "B" },
{ "name": "Jane", "location": "C" },
{ "name": "Hillary", "location": "A" },
{ "name": "Mat", "location": "B" }
Edit: here is my code:
router.get('/contact', function (req, res) {
  const locations = Kids.aggregate([
    {
      $group: {
        _id: {
          continent: "$locations",
        },
        count: {
          $sum: 1,
        },
      },
    }
  ])
  locations.forEach(function(item) {
    console.log(`${item._id.province} num of kids is: ${item.count}`);
  });
  res.render('contact');
});

db.kids.aggregate([ {$group:{_id:"$location" , count:{$sum:1} } } ])
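To make explicit what that pipeline computes, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code: countByLocation is a hypothetical helper that mimics $group with $sum: 1 on the sample data from the question and returns documents of the same { _id, count } shape. MongoDB does not guarantee the order of groups, so a $sort stage may be needed if order matters.

```javascript
// Sample documents from the question.
const kids = [
  { name: "Tom", location: "A" },
  { name: "Sarah", location: "B" },
  { name: "Jane", location: "C" },
  { name: "Hillary", location: "A" },
  { name: "Mat", location: "B" },
];

// Hypothetical helper: in-memory equivalent of
//   { $group: { _id: "$location", count: { $sum: 1 } } }
function countByLocation(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.location, (counts.get(doc.location) || 0) + 1);
  }
  // Same shape as the aggregation output: one { _id, count } per group.
  return Array.from(counts, ([_id, count]) => ({ _id, count }));
}

const locationCounts = countByLocation(kids);
console.log(locationCounts);
```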

The "logic" (what to do with the values) is not tied to the steps below (store the values in variables and do "something" with them).
HOW TO (OUTLINE)
Example: a simple countries collection grouped by continent, using Compass
(Compass is a great way to learn this topic.)
Our data (4 countries: Italy, England, Nigeria, Brazil, and their continents).
Step 1/3 - Explore your aggregation
Output: 3 groups (Europe (count 2), Africa (1), South America (1)) from the following $group stage:
{
  _id: {
    continent: "$continent"
  },
  count: {
    $sum: 1,
  }
}
Step 2/3 - Export code
From GUI to code: export the pipeline, then copy and paste it.
Step 3/3 - aggregate() method & cursor.forEach()
The related code (assuming you have connected to a MongoDB instance from Node; see the driver's Connection Guide):
const aggregate_cursor = db.collection("test_listing").aggregate([
  {
    $group: {
      _id: {
        continent: "$continent",
      },
      count: {
        $sum: 1,
      },
    },
  }
])
And loop through the cursor:
aggregate_cursor.forEach(function(item) {
  console.log(`${item._id.continent} num of countries is: ${item.count}`);
});
Output: one line per group, for example "Europe num of countries is: 2".
Related docs:
aggregation: https://docs.mongodb.com/manual/reference/operator/aggregation/group/
cursor.forEach: https://docs.mongodb.com/manual/reference/method/cursor.forEach/
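Since the goal in the question is a plot, the rows produced by the aggregation usually need to be split into labels and values. Here is a small sketch, assuming the rows have the { _id: { continent }, count } shape shown above. The helper toChartSeries and the sample rows are hypothetical names used only for illustration.

```javascript
// Hypothetical rows, shaped like the output of the $group stage above.
const rows = [
  { _id: { continent: "Europe" }, count: 2 },
  { _id: { continent: "Africa" }, count: 1 },
  { _id: { continent: "South America" }, count: 1 },
];

// Split aggregation rows into parallel arrays (axis labels and values),
// a shape that many charting libraries accept.
function toChartSeries(groups, keyName) {
  // Sort by label so the chart order does not depend on $group's output order.
  const sorted = [...groups].sort((a, b) =>
    String(a._id[keyName]).localeCompare(String(b._id[keyName]))
  );
  return {
    labels: sorted.map((g) => g._id[keyName]),
    values: sorted.map((g) => g.count),
  };
}

const series = toChartSeries(rows, "continent");
console.log(series);
```

In an Express route, an object like series could then be passed to the view as the second argument of res.render.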

Related

how to get array of fields in mongo aggregate

I am using the MongoDB aggregation framework. Suppose I have a collection structured like this:
[
  {
    _id: ObjectId(123),
    name: "john",
    sessionDuration: 29
  },
  {
    _id: ObjectId(456),
    name: "moore",
    sessionDuration: 45
  },
  {
    _id: ObjectId(789),
    name: "cary",
    sessionDuration: 25
  }
]
I want to build a pipeline that returns something like this:
{
  durationsArr: [29, 45, 25, '$sessionDuration_Field_From_Document']
}
I am doing this because I want the average duration across all the documents: first collect all the durations into an array, then add a final stage that applies $avg.
Any idea how I can get the array of sessionDuration values? Or is there a better approach to calculate the average sessionDuration for the collection? Please explain thoroughly, as I am new to MongoDB aggregation.

1. $group - Group all documents.
1.1. $avg - Calculate the average of sessionDuration for all documents.
db.collection.aggregate([
  {
    $group: {
      _id: null,
      avgSessionDuration: {
        $avg: "$sessionDuration"
      }
    }
  }
])
Demo: Mongo Playground
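If the intermediate array is still wanted, $push can collect it in the same $group stage alongside $avg. The sketch below writes that pipeline as a plain object and mirrors it with a hypothetical in-memory helper (groupAll), so the result for the three sample documents can be checked without a database.

```javascript
// Pipeline: one group for the whole collection, collecting both the
// array of durations and their average in a single stage.
const pipeline = [
  {
    $group: {
      _id: null,
      durationsArr: { $push: "$sessionDuration" },
      avgSessionDuration: { $avg: "$sessionDuration" },
    },
  },
];

// Sample documents from the question (ids omitted).
const sessions = [
  { name: "john", sessionDuration: 29 },
  { name: "moore", sessionDuration: 45 },
  { name: "cary", sessionDuration: 25 },
];

// Hypothetical in-memory mirror of the stage above, for illustration only.
function groupAll(docs) {
  const durationsArr = docs.map((d) => d.sessionDuration);
  const total = durationsArr.reduce((sum, n) => sum + n, 0);
  return {
    _id: null,
    durationsArr,
    // Guard against division by zero for an empty input.
    avgSessionDuration: durationsArr.length ? total / durationsArr.length : null,
  };
}

const summary = groupAll(sessions);
console.log(summary);
```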

How to find next N elements from a cursor with MongoDB, without _id and on a sorted cursor

Let's say I have three person documents in a MongoDB, inserted in a random order.
{
"firstName": "Hulda",
"lastName": "Lamb",
},
{
"firstName": "Austin",
"lastName": "Todd",
},
{
"firstName": "John",
"lastName": "Doe",
}
My goal is to obtain, let's say, the next person after Austin when the list is in alphabetical order. So I would like to get the person with firstName = Hulda.
We can assume that I know Austin's _id.
My first attempt was to rely on the fact that _id is incremental, but it won't work because the persons can be added in any order in the database. Hulda's _id field has a value less than Austin's. I cannot do something like {_id: {$gt: <Austin's _id here>}};
And I also need to limit the number of returned elements, so N is a dynamic value.
Here is the code I have now, but as I mentioned, the ID trick is not working.
let cursor: any = this.db.collection(collectionName).find({_id: {$gt: startId}});
cursor = cursor.sort({firstName: 1});
cursor = cursor.limit(limit);
return cursor.toArray();
Some clarifications:
startId is a valid, existing _id of an object
limit is a variable holding a positive integer value
sorting and limiting work as expected; only the selection of the next elements is wrong, so the {_id: {$gt: startId}} filter is what breaks the selection.

In MongoDB's Aggregation Framework, the context of most operations is restricted to a single document, and (before $setWindowFields was added in MongoDB 5.0) there is no mechanism like window functions in SQL. Without window functions, the only way is to use $group to build an array containing all your documents, find Austin's index in it, and then apply $slice:
db.collection.aggregate([
  {
    $sort: { firstName: 1 }
  },
  {
    $group: {
      _id: null,
      docs: { $push: "$$ROOT" }
    }
  },
  {
    $project: {
      nextNPeople: {
        $slice: [ "$docs", { $add: [ { $indexOfArray: [ "$docs.firstName", "Austin" ] }, 1 ] }, 1 ]
      }
    }
  },
  { $unwind: "$nextNPeople" },
  {
    $replaceRoot: {
      newRoot: "$nextNPeople"
    }
  }
])
Mongo Playground
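The last argument of $slice (1 here) is the number of people to return, so replace it with N. Depending on your data size and MongoDB performance, the above solution may or may not be acceptable. It is up to you to decide whether to deploy such code in production, since the $group stage pushes every document into a single array and can be pretty heavy.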
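An alternative that avoids loading the whole collection into one group is a range query on the sort key itself: look up Austin's firstName by _id, then ask for documents whose firstName is greater, sorted and limited to N. This is a different technique from the pipeline above, sketched here in memory. nextAfter is a hypothetical helper, and it assumes firstName values are unique (ties would need a secondary sort key such as _id).

```javascript
const people = [
  { firstName: "Hulda", lastName: "Lamb" },
  { firstName: "Austin", lastName: "Todd" },
  { firstName: "John", lastName: "Doe" },
];

// In-memory equivalent of:
//   find({ firstName: { $gt: anchorFirstName } })
//     .sort({ firstName: 1 })
//     .limit(n)
function nextAfter(docs, anchorFirstName, n) {
  return docs
    .filter((d) => d.firstName > anchorFirstName) // keep names after the anchor
    .sort((a, b) =>
      a.firstName < b.firstName ? -1 : a.firstName > b.firstName ? 1 : 0
    )
    .slice(0, n); // limit to the next n people
}

console.log(nextAfter(people, "Austin", 1));
```

Because the filter is on the sorted field, an index on firstName could serve both the range condition and the sort.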

Elasticsearch - find closest number when scoring results

I need a way to match the closest number in an Elasticsearch document.
I want to use Elasticsearch to filter on quantifiable attributes. I have been able to achieve hard limits using range queries, except that results outside those ranges are skipped. I would prefer to get the results closest to multiple filters.
const query = {
  query: {
    bool: {
      should: [
        {
          range: {
            sales: {
              gte: 5,
              lte: 15
            }
          }
        },
        {
          range: {
            year: {
              gte: 1979,
              lte: 1989
            }
          }
        }
      ]
    }
  }
}
const results = await client.search({
index: 'test',
body: query
})
Say I had some documents with year and sales fields. The snippet below is a small example of how it would be done in JavaScript: it runs through the entire list and calculates a score for each item, then sorts by that score. At no point are results filtered out; they are just ordered by relevance.
const data = [
{ "item": "one", "year": 1980, "sales": 20 },
{ "item": "two", "year": 1982, "sales": 12 },
{ "item": "three", "year": 1986, "sales": 6 },
{ "item": "four", "year": 1989, "sales": 4 },
{ "item": "five", "year": 1991, "sales": 6 }
]
const add = (a, b) => a + b
const findClosestMatch = (filters, data) => {
const scored = data.map(item => ({
...item,
// add the score to a copy of the data
_score: calculateDifferenceScore(filters, item)
}))
// mutate the scored array by sorting it
scored.sort((a, b) => a._score.total - b._score.total)
return scored
}
const calculateDifferenceScore = (filters, item) => {
const result = Object.keys(filters).reduce((acc, x) => ({
...acc,
// calculate the absolute difference between the filter and data point
[x]: Math.abs(filters[x] - item[x])
}), {})
// sum the total differences
result.total = Object.values(result).reduce(add)
return result
}
console.log(
findClosestMatch({ sales: 10, year: 1984 }, data)
)
I'm trying to achieve the same thing in Elasticsearch but am having no luck with a function_score query, e.g.:
const query = {
query: {
function_score: {
functions: [
{
linear: {
"year": {
origin: 1984,
},
"sales": {
origin: 10,
}
}
}
]
}
}
}
const results = await client.search({
index: 'test',
body: query
})
There is no text to search; I'm using it for filtering by numbers only. Am I doing something wrong, or is this not what Elasticsearch is made for, and are there any better alternatives?
With the query above, every document still has the default score, and I have not been able to get any filter to modify the score.
Thanks for any help. I'm new to Elasticsearch, so links to articles or areas of the documentation are appreciated!

You had the right idea, you're just missing a few fields in your query to make it work.
It should look like this:
{
  "query": {
    "function_score": {
      "functions": [
        {
          "linear": {
            "year": {
              "origin": 1984,
              "scale": 1,
              "decay": 0.999
            },
            "sales": {
              "origin": 10,
              "scale": 1,
              "decay": 0.999
            }
          }
        }
      ]
    }
  }
}
The scale field is mandatory because it tells Elasticsearch how quickly to decay the score; without it the query simply fails.
The decay field is optional (it defaults to 0.5), but with scale: 1 that default makes the linear score drop to 0 at a distance of 2 from the origin, so only documents very close to the origin get a meaningful score, which is not useful here.
Source: docs.
I also recommend limiting the result size to 1 if you only want the top-scoring document; otherwise you'll have to add a sort phase (either in Elasticsearch or in code).
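To see what scale and decay do to the score, the linear decay formula from the Elasticsearch function score documentation can be evaluated by hand. The helper below (linearDecay, a hypothetical name) is a plain JavaScript sketch of that formula for a single numeric field. It does not reproduce how Elasticsearch combines several functions or the query score.

```javascript
// Linear decay for one numeric field:
//   s     = scale / (1 - decay)
//   score = max((s - max(0, |value - origin| - offset)) / s, 0)
function linearDecay(value, origin, scale, decay, offset = 0) {
  const s = scale / (1 - decay);
  const distance = Math.max(0, Math.abs(value - origin) - offset);
  return Math.max((s - distance) / s, 0);
}

// With scale 1 and decay 0.999 the score falls by about 0.001 per unit
// of distance, so ordering by closeness is preserved over a wide range.
for (const year of [1984, 1986, 1991]) {
  console.log(year, linearDecay(year, 1984, 1, 0.999).toFixed(3));
}
```

By construction, a document exactly scale away from the origin scores decay, which is why a decay close to 1 keeps distant documents distinguishable instead of flattening them to 0.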
EDIT: (AVOID NULLS)
You can add a filter above the functions like so:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"bool": {
"filter": [
{
"bool": {
"must": [
{
"exists": {
"field": "year"
}
},
{
"exists": {
"field": "sales"
}
}
]
}
}
]
}
},
{
"match_all": {}
}
]
}
},
"functions": [
{
"linear": {
"year": {
"origin": 1999,
"scale": 1,
"decay": 0.999
},
"sales": {
"origin": 50,
"scale": 1,
"decay": 0.999
}
}
}
]
}
}
}
Notice I have a little hack going on with the match_all query: the filter query sets the score to 0, so by adding match_all I reset it back to 1 for all matched documents.
This can also be achieved in a more "proper" way by altering the functions, a path I chose not to take.

Is it possible to find random documents in collection, without same fields? (mongodb\node.js)

For example, I have a collection users with the following structure:
{
_id: 1,
name: "John",
from: "Amsterdam"
},
{
_id: 2,
name: "John",
from: "Boston"
},
{
_id: 3,
name: "Mia",
from: "Paris"
},
{
_id: 4,
name: "Kate",
from: "London"
},
{
_id: 5,
name: "Kate",
from: "Moscow"
}
How can I get 3 random documents in which names will not be repeated?
Using the function getThreeNumbers(1, 5), I get an array with 3 non-repeating numbers and search by _id:
var random_nums = getThreeNumbers(1, 5); // [2,3,1]
users.find({_id: {$in: random_nums}}, function (err, data) {...}); // [John, Mia, John]
But the result can contain two Johns or two Kates, which is unwanted behavior. How can I get three random documents ([John, Mia, Kate], not [John, Kate, Kate] or [John, Mia, John]) with 1 or at most 2 queries? Which Kate or John (the duplicated names) is returned should be random, but names should not be repeated.

There you go - see the comments in the code for further explanation of what the stages do:
users.aggregate(
  [
    { // eliminate duplicates based on "name" field and keep track of first document of each group
      $group: {
        "_id": "$name",
        "doc": { $first: "$$ROOT" }
      }
    },
    {
      // restore the original document structure
      $replaceRoot: {
        newRoot: "$doc"
      }
    },
    {
      // select 3 random documents from the result
      $sample: {
        size: 3
      }
    }
  ])
As always with the aggregation framework, you can run the query with more or fewer stages in order to see the transformations step by step.
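The same three steps can be mirrored in plain JavaScript to check the logic on the sample data. This is an in-memory sketch with hypothetical helper names, not a replacement for the pipeline. Like $first, it keeps the first document seen for each name, so which John or Kate survives is not random.

```javascript
const users = [
  { _id: 1, name: "John", from: "Amsterdam" },
  { _id: 2, name: "John", from: "Boston" },
  { _id: 3, name: "Mia", from: "Paris" },
  { _id: 4, name: "Kate", from: "London" },
  { _id: 5, name: "Kate", from: "Moscow" },
];

// Steps 1 and 2: keep the first document per name
// ($group with $first, then $replaceRoot).
function firstPerName(docs) {
  const byName = new Map();
  for (const doc of docs) {
    if (!byName.has(doc.name)) byName.set(doc.name, doc);
  }
  return [...byName.values()];
}

// Step 3: pick `size` documents at random ($sample), using a
// Fisher-Yates shuffle on a copy of the input.
function sample(docs, size) {
  const copy = [...docs];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, size);
}

const picked = sample(firstPerName(users), 3);
console.log(picked.map((u) => u.name));
```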

I think what you are looking for is the $group stage, which will give you the distinct values in the collection. It can be used as:
db.users.aggregate( [ { $group : { _id : "$name" } } ] );
MongoDB docs: Retrieve Distinct Values

Cannot get correct result when using MongoDB aggregation in meteor

I am using MongoDB aggregation in meteor.
The items in database look like this:
// item1
{
products: {
aaa: 100,
bbb: 200
}
}
// item2
{
products: {
aaa: 300,
bbb: 400
}
}
My pipeline looks like this
let pipeline = [{
$limit: 10
}, {
$group: {
_id: {
// …
},
total: {
$sum: "$products.aaa"
}
}
}];
And it works perfectly. But when I change my database structure to this
// item1
{
products: [
{code: "aaa", num: 100},
{code: "bbb", num: 200}
]
}
// item2
{
products: [
{code: "aaa", num: 300},
{code: "bbb", num: 400}
]
}
The result I get for total is always 0, so I think my pipeline is wrong. Please see the comment inside:
let pipeline = [{
$limit: 10
}, {
$group: {
_id: {
// …
},
total: {
$sum: "$products.0.num" // Neither this nor "$products[0].num" works
}
}
}];
So how can I write it correctly? Thanks

With MongoDB 3.2 (which won't be the server bundled with Meteor, but there is nothing stopping you from using a separate server instance, and that would actually be recommended) you can use $arrayElemAt with $map:
let pipeline = [
{ "$limit": 10 },
{ "$group": {
"_id": {
// …
},
"total": {
"$sum": { "$arrayElemAt": [
{ "$map": {
"input": "$products",
"as": "product",
"in": "$$product.num"
}},
0
]}
}
}}
];
With older versions, use "two" $group stages and the $first operator after processing with $unwind. And that's just for the "first" index value:
let pipeline = [
{ "$limit": 10 },
{ "$unwind": "$products" },
{ "$group": {
"_id": "$_id", // The document _id
"otherField": { "$first": "$eachOtherFieldForGroupingId" },
"productNum": { "$first": "$products.num" }
}},
{ "$group": {
"_id": {
// …
},
"total": {
"$sum": "$productNum"
}
}}
];
So in the latter case, after you $unwind you just use $first to get the "first" index from the array, and $first is also used to carry along every field from the original document that you want as part of the grouping key, since all fields are copied for each array member after $unwind.
In the former case, $map just extracts the "num" values for each array member, then $arrayElemAt retrieves the wanted index position.
Naturally the newer method for MongoDB 3.2 is better. If you wanted another array index with the older approach, you would need to repeatedly take the $first element from the array and keep filtering it out of the array results until you reached the required index.
So whilst it's possible in earlier versions, it's a lot of work to get there.
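What $map followed by $arrayElemAt computes can be checked in plain JavaScript on the two sample items. A minimal in-memory sketch (the helper name sumNumAt is hypothetical):

```javascript
const items = [
  { products: [{ code: "aaa", num: 100 }, { code: "bbb", num: 200 }] },
  { products: [{ code: "aaa", num: 300 }, { code: "bbb", num: 400 }] },
];

// For each document: map products to their "num" values ($map), take the
// value at `index` ($arrayElemAt), then add the values up ($sum).
function sumNumAt(docs, index) {
  return docs.reduce((total, doc) => {
    const nums = doc.products.map((p) => p.num);
    const value = nums[index];
    // $sum ignores non-numeric values, so a missing index contributes 0 here.
    return total + (typeof value === "number" ? value : 0);
  }, 0);
}

console.log(sumNumAt(items, 0)); // 400
console.log(sumNumAt(items, 1)); // 600
```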
