I have this simple collection of views:
Views:
[
{
title: "cartoons",
views: 1,
created_at: 2022-10-03 12:00:00.000Z
},
{
title: "songs",
views: 4,
created_at: 2022-10-04 12:00:00.000Z
},
{
title: "lectures",
views: 3,
created_at: 2022-10-10 12:00:00.000Z
},
{
title: "news",
views: 2,
created_at: 2022-10-05 12:00:00.000Z
},
{
title: "movies",
views: 6,
created_at: 2022-10-07 12:00:00.000Z
},
{
title: "tv series",
views: 6,
created_at: 2022-10-12 12:00:00.000Z
}
]
Here I need to see how many views I got on each day of the week over a period of, e.g., 2 years.
Expected Result:
{
"monday": 4,
"tuesday": 4,
"wednesday": 8,
"thursday": 0,
"friday": 6,
"saturday": 0,
"sunday": 0,
}
I am very new to MongoDB. Is it possible to perform such an operation with a query? If so, could I get some help with this?
What about this?
// select some random mongo database for testing
use("stack")
// at first clean collection
db.data.drop()
// populate with initial data
db.data.insertMany([
{
title: "cartoons",
views: 1,
created_at: ISODate("2022-10-03 12:00:00.000Z"),
},
{
title: "songs",
views: 4,
created_at: ISODate("2022-10-04 12:00:00.000Z"),
},
{
title: "lectures",
views: 3,
created_at: ISODate("2022-10-10 12:00:00.000Z"),
},
{
title: "news",
views: 2,
created_at: ISODate("2022-10-05 12:00:00.000Z"),
},
{
title: "movies",
views: 6,
created_at: ISODate("2022-10-07 12:00:00.000Z"),
},
{
title: "tv series",
views: 6,
created_at: ISODate("2022-10-12 12:00:00.000Z"),
}
])
// get results
p = [
// get day of week for each record based on created_at date
{
$project: {
weekDay: {
$arrayElemAt: [
// mongo returns day numbers from 1 to 7, Sunday being 1
["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"],
{ $add: [ {$dayOfWeek: "$created_at"}, -1 ] }
]
},
views: 1,
_id: 0,
}
},
// count sum of views numbers for each weekday
{
$group: { _id: "$weekDay", total_views: {$sum: "$views"} }
},
// reshape current results to make them easily convertible to one final object
{
$replaceRoot: {
newRoot: { k: "$_id", v: "$total_views" }
}
},
// step required to get just 1 document at the end
{
$group: {
_id: 0,
merged: { $push: "$$ROOT" }
}
},
// fill in missing week days with 0 values and follow sorting order that we want
{
$project: {
merged: {
$mergeObjects: [
{
"monday": 0,
"tuesday": 0,
"wednesday": 0,
"thursday": 0,
"friday": 0,
"saturday": 0,
"sunday": 0,
},
{$arrayToObject: "$merged"},
]
}
}
},
// return field value that we want directly
{
$replaceRoot: { newRoot: "$merged"}
}
]
// Run
db.data.aggregate(p)
And the result is
[
{
"monday": 4,
"tuesday": 4,
"wednesday": 8,
"thursday": 0,
"friday": 6,
"saturday": 0,
"sunday": 0
}
]
https://mongoplayground.net/p/QutCGjKiy6z
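To sanity-check the pipeline's output, the same totals can be recomputed in plain JavaScript. This is only a sketch for cross-checking, not a replacement for the query: `viewsPerWeekday` is a hypothetical helper, and the dates are written as ISO strings on the assumption that `created_at` is stored in UTC.

```javascript
// Plain-JavaScript cross-check of the weekday totals (not a MongoDB query).
// getUTCDay() is 0-based with Sunday = 0, whereas $dayOfWeek is 1-based with Sunday = 1.
const WEEK_DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];

function viewsPerWeekday(docs) {
  // Start every day at 0 so that days without documents still appear in the result.
  const totals = { monday: 0, tuesday: 0, wednesday: 0, thursday: 0, friday: 0, saturday: 0, sunday: 0 };
  for (const doc of docs) {
    const name = WEEK_DAYS[new Date(doc.created_at).getUTCDay()];
    totals[name] += doc.views;
  }
  return totals;
}

// Same documents as in the question.
const sampleViews = [
  { title: "cartoons", views: 1, created_at: "2022-10-03T12:00:00.000Z" },
  { title: "songs", views: 4, created_at: "2022-10-04T12:00:00.000Z" },
  { title: "lectures", views: 3, created_at: "2022-10-10T12:00:00.000Z" },
  { title: "news", views: 2, created_at: "2022-10-05T12:00:00.000Z" },
  { title: "movies", views: 6, created_at: "2022-10-07T12:00:00.000Z" },
  { title: "tv series", views: 6, created_at: "2022-10-12T12:00:00.000Z" },
];

console.log(viewsPerWeekday(sampleViews));
```

For the sample data this gives monday 4, tuesday 4, wednesday 8 and friday 6, with the remaining days at 0, which matches the expected result in the question.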
await db.collectionName.aggregate([{
$addFields: {
days: {
$dayOfWeek: {
$toDate: '$created_at'
}
}
}
}, {
$group: {
_id: {
days: '$days'
},
totalReview: {
$sum: '$views'
},
daysCount: {
$sum: 1
}
}
}, {
$project: {
_id: 0,
totalReview: 1,
day: {
$switch: {
branches: [
{
'case': {
$eq: [
'$_id.days',
1
]
},
then: 'sunday'
},
{
'case': {
$eq: [
'$_id.days',
2
]
},
then: 'monday'
},
{
'case': {
$eq: [
'$_id.days',
3
]
},
then: 'tuesday'
},
{
'case': {
$eq: [
'$_id.days',
4
]
},
then: 'wednesday'
},
{
'case': {
$eq: [
'$_id.days',
5
]
},
then: 'thursday'
},
{
'case': {
$eq: [
'$_id.days',
6
]
},
then: 'friday'
},
{
'case': {
$eq: [
'$_id.days',
7
]
},
then: 'saturday'
}
],
'default': 'day unknown'
}
}
}
}]);
You can do it like this:
$set and $isoDayOfWeek - to calculate the day of week based on created_at property
$group and $sum - to sum all views for each day of the week
db.collection.aggregate([
{
"$set": {
"dayOfWeek": {
"$isoDayOfWeek": "$created_at"
}
}
},
{
"$group": {
"_id": "$dayOfWeek",
"count": {
"$sum": "$views"
}
}
}
])
Note: $isoDayOfWeek follows ISO 8601, so in the response 1 is Monday and 7 is Sunday. (It is $dayOfWeek that returns 1 for Sunday and 7 for Saturday.)
Working example
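The pipeline above returns one document per weekday number and omits days with no data. A possible way to turn that into the object shape from the question is to post-process the rows on the client. The sketch below assumes ISO numbering (1 = Monday, 7 = Sunday); `toWeekdayObject` and the sample `rows` are illustrative only.

```javascript
// Hypothetical client-side helper: converts rows like { _id: <ISO day number>, count: <sum> }
// into an object keyed by weekday name, filling missing days with 0.
const ISO_DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"];

function toWeekdayObject(rows) {
  const result = {};
  for (const name of ISO_DAY_NAMES) {
    result[name] = 0;
  }
  for (const row of rows) {
    // ISO day numbers are 1-based, array indexes are 0-based.
    result[ISO_DAY_NAMES[row._id - 1]] = row.count;
  }
  return result;
}

// Rows of the kind the pipeline would produce for the question's sample data
// (the order of $group output is not guaranteed).
const rows = [
  { _id: 3, count: 8 },
  { _id: 1, count: 4 },
  { _id: 5, count: 6 },
  { _id: 2, count: 4 },
];

console.log(toWeekdayObject(rows));
```

Doing the zero-fill on the client keeps the pipeline short; if the final shape has to come straight from the server, the $mergeObjects approach in the earlier answer does the same job inside the aggregation.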
I'm trying to merge a few complicated documents via the mongodb.collection.aggregate() command.
Let's say that I want to merge x of the collection's documents (in the following example: x=2):
[
{
"_id": 1,
"Data": {
"children": {
"1": {
"name": "appear_only_in_first_doc",
"cost": 1,
"revenue": 4.5,
"grandchildren": {
"1t9dsqdqdvoj8pdppxjk": {
"cost": 0,
"revenue": 1.5
}
}
},
"2": {
"name": "appear_in_both_docs",
"cost": 2,
"revenue": 7,
"grandchildren": {
"jesrdt5qwef2222dgt": {
"cost": 1,
"revenue": 3
},
"klh352hk5367kf": {
"cost": 2,
"revenue": 7
}
}
}
}
}
},
{
"_id": 2,
"Data": {
"children": {
"2": {
"name": "appear_in_both_docs___but_diff_name",
"cost": 9,
"revenue": 7,
"grandchildren": {
"aaaaaaaaa": {
"cost": 3,
"revenue": 2
},
"jesrdt5qwef2222dgt": {
"cost": 6,
"revenue": 5
}
}
},
"3": {
"name": "appear_only_in_last_doc",
"cost": 4,
"revenue": 2,
"grandchildren": {
"cccccccccccc": {
"cost": 4,
"revenue": 2
}
}
}
}
}
}
]
Challenges:
The keys under the "children" and "grandchildren" keys are dynamic and unknown while writing the query.
If a child or grandchild appears only in one doc (e.g. "1", "3", "1t9dsqdqdvoj8pdppxjk", "klh352hk5367kf", "aaaaaaaaa" and "cccccccccccc") - it should also appear in the final result.
If a child or grandchild appears in more than one doc (e.g. "2" and "jesrdt5qwef2222dgt") - it should appear once in the final result. The fields "cost" and "revenue" should be summed, and the last "name" field should be taken.
I've seen the following solutions:
unionWith - Irrelevant, it unions 2 different collections.
merge - Irrelevant, it cannot sum values of fields that appear more than once (it takes the last instead).
mergeObjects - Irrelevant, it cannot sum values of fields that appear more than once (it takes the last instead).
The final result should look like this:
{
"Data": {
"children": {
"1": {
"name": "appear_only_in_first_doc",
"cost": 1,
"revenue": 4.5,
"grandchildren": {
"1t9dsqdqdvoj8pdppxjk": {
"cost": 0,
"revenue": 1.5
}
}
},
"2": {
"name": "appear_in_both_docs___but_diff_name",
"cost": 11,
"revenue": 14,
"grandchildren": {
"aaaaaaaaa": {
"cost": 3,
"revenue": 2
},
"jesrdt5qwef2222dgt": {
"cost": 7,
"revenue": 8
},
"klh352hk5367kf": {
"cost": 2,
"revenue": 7
}
}
},
"3": {
"name": "appear_only_in_last_doc",
"cost": 4,
"revenue": 2,
"grandchildren": {
"cccccccccccc": {
"cost": 4,
"revenue": 2
}
}
}
}
}
}
This is a rather lengthy process and there may well be an easier one; I am just sharing the approach:
$project to convert children to array format (k,v)
$unwind deconstruct children array
$group by children key and do sum of cost and revenue, and get last name using $last
$unwind deconstruct grandchildren array
$addFields to convert grandchildren to array format (k,v)
$unwind deconstruct grandchildren array
db.collection.aggregate([
{ $project: { "Data.children": { $objectToArray: "$Data.children" } } },
{ $unwind: "$Data.children" },
{
$group: {
_id: "$Data.children.k",
name: { $last: "$Data.children.v.name" },
cost: { $sum: "$Data.children.v.cost" },
revenue: { $sum: "$Data.children.v.revenue" },
grandchildren: { $push: "$Data.children.v.grandchildren" }
}
},
{ $unwind: "$grandchildren" },
{ $addFields: { grandchildren: { $objectToArray: "$grandchildren" } } },
{ $unwind: "$grandchildren" },
// $group by children key and grandchildren key, count the sum of cost and revenue of grandchildren
{
$group: {
_id: {
ck: "$_id",
gck: "$grandchildren.k"
},
cost: { $first: "$cost" },
revenue: { $first: "$revenue" },
name: { $first: "$name" },
grandchildren_cost: { $sum: "$grandchildren.v.cost" },
grandchildren_revenue: { $sum: "$grandchildren.v.revenue" }
}
},
// $group by children key and re-construct grandchildren array
{
$group: {
_id: "$_id.ck",
cost: { $first: "$cost" },
revenue: { $first: "$revenue" },
name: { $last: "$name" },
grandchildren: {
$push: {
k: "$_id.gck",
v: {
cost: "$grandchildren_cost",
revenue: "$grandchildren_revenue"
}
}
}
}
},
// $group by null and re-construct children array and convert grandchildren to object from (k,v) array using $arrayToObject
{
$group: {
_id: null,
children: {
$push: {
k: "$_id",
v: {
name: "$name",
cost: "$cost",
revenue: "$revenue",
grandchildren: { $arrayToObject: "$grandchildren" }
}
}
}
}
},
// $project to convert children to object using $arrayToObject
{
$project: {
_id: 0,
"Data.children": { $arrayToObject: "$children" }
}
}
])
Playground
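The merge rules the pipeline implements (sum "cost" and "revenue", keep the last "name", union the dynamic keys) can also be written as a few lines of plain JavaScript, which may help when checking the aggregation's output. This is a client-side sketch under the question's document shape; `mergeDocs` is a hypothetical helper and the input is trimmed to child "2" only.

```javascript
// Plain-JavaScript sketch of the merge rules; not part of the MongoDB API.
function mergeDocs(docs) {
  const children = {};
  for (const doc of docs) {
    for (const [childKey, child] of Object.entries(doc.Data.children)) {
      if (!children[childKey]) {
        children[childKey] = { name: child.name, cost: 0, revenue: 0, grandchildren: {} };
      }
      const target = children[childKey];
      target.name = child.name; // the last "name" seen wins
      target.cost += child.cost;
      target.revenue += child.revenue;
      for (const [grandKey, grand] of Object.entries(child.grandchildren)) {
        if (!target.grandchildren[grandKey]) {
          target.grandchildren[grandKey] = { cost: 0, revenue: 0 };
        }
        target.grandchildren[grandKey].cost += grand.cost;
        target.grandchildren[grandKey].revenue += grand.revenue;
      }
    }
  }
  return { Data: { children } };
}

// Trimmed version of the question's two documents (only child "2").
const docsToMerge = [
  { _id: 1, Data: { children: { "2": {
    name: "appear_in_both_docs", cost: 2, revenue: 7,
    grandchildren: {
      jesrdt5qwef2222dgt: { cost: 1, revenue: 3 },
      klh352hk5367kf: { cost: 2, revenue: 7 },
    },
  } } } },
  { _id: 2, Data: { children: { "2": {
    name: "appear_in_both_docs___but_diff_name", cost: 9, revenue: 7,
    grandchildren: {
      aaaaaaaaa: { cost: 3, revenue: 2 },
      jesrdt5qwef2222dgt: { cost: 6, revenue: 5 },
    },
  } } } },
];

console.log(JSON.stringify(mergeDocs(docsToMerge), null, 2));
```

For child "2" this yields cost 11, revenue 14 and a summed "jesrdt5qwef2222dgt" of cost 7 / revenue 8, the same values as in the question's expected result.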
I'm converting a MongoDB query to Elasticsearch on the Node.js platform. While developing, I'm having difficulty grouping and filtering data (getting nested objects like hits.hits._source) in an Elasticsearch query the way we do in a MongoDB query.
Example:-
UserModel.aggregate([
{
$match: {
uId: req.body.uId, timestamp: { $gte: req.body.date, $lte: new Date() }
},
},
{
$group: {
_id: "$eId",
location: {
$push: {
time: "$timestamp", lat: "$lat"
}
},
timestamp: {
$push: "$timestamp"
},
testId: { $first: "$testId" },
}
},
{
$project: {
eId: 1, location: 1, testId: 1, max: { $max: "$timestamp" }
}
},
{ $unwind: { path: "$location", preserveNullAndEmptyArrays: true } },
{
$redact: {
$cond: {
if: { $eq: ["$location.time", "$max"] },
then: "$$DESCEND",
else: "$$PRUNE"
}
}
},
{
$project: {
eId: 1, latitude: "$location.lat", testId: 1
}
},
]).exec(function (err, result) {
console.log(result)
});
What will be the equivalent query in Elasticsearch?
I'm looking for a solution that groups, unwinds and projects the required data (MongoDB concepts carried over to Elasticsearch) with a minimally nested response.
Thanks in Advance.
EDIT:-
Adding Elasticsearch Document:-
{
"timestamp": "2019-10-08T:02:50:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a407",
"location": [
2.000,
34.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4ef"
},
{
"timestamp": "2019-10-09T:02:50:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a408",
"location": [
2.100,
35.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4ef"
},
{
"timestamp": "2019-10-09T:03:50:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a407",
"location": [
4.100,
35.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4ef"
},
{
"timestamp": "2019-10-09T:03:40:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a407",
"location": [
2.100,
35.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4e1"
},
{
"timestamp": "2019-10-10T:03:40:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a407",
"location": [
3.100,
35.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4e1"
}
Match documents with status = 1 and group them by eId.
From those results, get the max timestamp value for each group.
Expected Result:-
[
{
"_id": "5d5d7ce0c89852e7bad4a407",
"max": "2019-10-10T:03:40:15.54Z", // max timestamp
"zId": [
"5d5d7ce0c89852e7bad4a4e1",
"5d5d7ce0c89852e7bad4a4ef"
]
},
{
"_id": "5d5d7ce0c89852e7bad4a408",
"max": "2019-10-09T:02:50:15.54Z",
"zId": [
"5d5d7ce0c89852e7bad4a4ef"
]
}, // ...etc
]
Thanks for the documents. Sadly, I do not know any way to retrieve only the documents having the max timestamp field value.
The following query will allow you to filter by status and group by eId then get the max timestamp value, but it will not return the documents having the max timestamp value.
{
"size": 0,
"query": {
"term": {
"status": 1
}
},
"aggregations": {
"eId_group": {
"terms": {
"field": "eId"
},
"aggregations": {
"max_timestamp": {
"max": {
"field": "timestamp"
}
}
}
}
}
}
This second query uses a top_hits aggregation to retrieve the documents grouped by eId. The returned documents are sorted by decreasing timestamp value, so the documents having the max timestamp come first, but you may also get documents with different timestamps.
{
"size": 0,
"query": {
"term": {
"status": 1
}
},
"aggregations": {
"eId_group": {
"terms": {
"field": "eId"
},
"aggregations": {
"max_timestamp": {
"max": {
"field": "timestamp"
}
},
"top_documents": {
"top_hits": {
"size": 20,
"sort": { "timestamp": "desc"}
}
}
}
}
}
}
I used the following mapping for the index
PUT /test_index
{
"mappings": {
"properties": {
"timestamp": {
"type": "date"
},
"eId": {
"type": "keyword"
},
"zId": {
"type": "keyword"
},
"status": {
"type": "keyword"
}
}
}
}
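Since the second query returns the hits sorted by decreasing timestamp, one possible workaround is to drop the older documents on the Node.js side. The sketch below assumes `response` is the parsed search response body and that the aggregation names match the query above (`eId_group`, `top_documents`); `latestPerGroup` and the mock data are hypothetical.

```javascript
// Hypothetical post-processing of the top_hits response: for each eId bucket,
// keep only the documents whose timestamp equals that of the first (newest) hit.
function latestPerGroup(response) {
  return response.aggregations.eId_group.buckets.map((bucket) => {
    const hits = bucket.top_documents.hits.hits; // sorted by timestamp, newest first
    const max = hits.length > 0 ? hits[0]._source.timestamp : null;
    return {
      _id: bucket.key,
      max,
      documents: hits
        .filter((hit) => hit._source.timestamp === max)
        .map((hit) => hit._source),
    };
  });
}

// Mock response with made-up values, shaped like a terms + top_hits result.
const mockResponse = {
  aggregations: {
    eId_group: {
      buckets: [
        {
          key: "5d5d7ce0c89852e7bad4a407",
          top_documents: { hits: { hits: [
            { _source: { eId: "5d5d7ce0c89852e7bad4a407", timestamp: "2019-10-10T03:40:15.54Z", zId: "5d5d7ce0c89852e7bad4a4e1" } },
            { _source: { eId: "5d5d7ce0c89852e7bad4a407", timestamp: "2019-10-09T03:50:15.54Z", zId: "5d5d7ce0c89852e7bad4a4ef" } },
          ] } },
        },
        {
          key: "5d5d7ce0c89852e7bad4a408",
          top_documents: { hits: { hits: [
            { _source: { eId: "5d5d7ce0c89852e7bad4a408", timestamp: "2019-10-09T02:50:15.54Z", zId: "5d5d7ce0c89852e7bad4a4ef" } },
          ] } },
        },
      ],
    },
  },
};

console.log(JSON.stringify(latestPerGroup(mockResponse), null, 2));
```

Note that top_hits only returns up to its "size" documents per bucket, so this filter sees at most that many hits; if only the single newest document per eId is needed, setting "size": 1 in the query would avoid the client-side step.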
Given the following data set of Event objects:
[
{
"_id": ObjectId("4fda05cb322b1c95b531ac26"),
"title": "BUTTON CLICKED",
"createdAt": ISODate("2017-01-12T01:00:00+01:00")
},
{
"_id": ObjectId("1235h1k235h1kl325h1v31gv"),
"title": "BUTTON CLICKED",
"createdAt": ISODate("2017-01-14T01:00:00+01:00")
},
{
"_id": ObjectId("c2n890904cn897qnxp23hjk1"),
"title": "PAGE VIEWED",
"createdAt": ISODate("2017-01-12T02:00:00+01:00")
}
]
How would I group them by date then by name?
The desired result would look like this:
[
{
_id: { year: 2017, month: 1, day: 11 },
events: [ {
title: "BUTTON PRESSED",
count: 3
}, {
title: "PAGE VIEWED",
count: 2
}
]
},
{
_id: { year: 2017, month: 1, day: 24 },
events: [ {
title: "BUTTON PRESSED",
count: 1
}
]
}
]
Any help on this issue would be greatly appreciated, so thank you!
You can try this query:
db.collectionName.aggregate([
{
$group: {
_id : {
year : { $year : "$createdAt" },
month : { $month : "$createdAt" },
day : { $dayOfMonth : "$createdAt" },
title: "$title"
},
count:{$sum:1}
}
},
{
$group:{
_id:{
year: "$_id.year",
month: "$_id.month",
day: "$_id.day"
},
data:{
$push: {
name:"$_id.title",
count:"$count"
}
}
}
}
])
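The two $group stages can be mirrored in plain JavaScript to see what output to expect for the sample events. This is a sketch for checking only: `groupByDayThenTitle` is a hypothetical helper, and it uses the UTC getters because MongoDB's date operators work in UTC unless a timezone is given, so "2017-01-12T01:00:00+01:00" falls on day 12.

```javascript
// Plain-JavaScript mirror of the two grouping stages:
// first count per (year, month, day, title), then collect the counts per day.
function groupByDayThenTitle(events) {
  const days = new Map();
  for (const event of events) {
    const d = new Date(event.createdAt);
    const id = { year: d.getUTCFullYear(), month: d.getUTCMonth() + 1, day: d.getUTCDate() };
    const key = `${id.year}-${id.month}-${id.day}`;
    if (!days.has(key)) {
      days.set(key, { _id: id, counts: new Map() });
    }
    const counts = days.get(key).counts;
    counts.set(event.title, (counts.get(event.title) || 0) + 1);
  }
  // Same field names ("data", "name", "count") as the pipeline's second $group.
  return [...days.values()].map(({ _id, counts }) => ({
    _id,
    data: [...counts].map(([name, count]) => ({ name, count })),
  }));
}

// The three events from the question, with dates as ISO strings.
const sampleEvents = [
  { title: "BUTTON CLICKED", createdAt: "2017-01-12T01:00:00+01:00" },
  { title: "BUTTON CLICKED", createdAt: "2017-01-14T01:00:00+01:00" },
  { title: "PAGE VIEWED", createdAt: "2017-01-12T02:00:00+01:00" },
];

console.log(JSON.stringify(groupByDayThenTitle(sampleEvents), null, 2));
```

For the sample this produces one group for 2017-01-12 (one "BUTTON CLICKED", one "PAGE VIEWED") and one for 2017-01-14 (one "BUTTON CLICKED"). The pipeline itself does not guarantee the order of the groups; add a $sort stage if that matters.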
I've been tasked with generating averages for day, week, month, and year for a rather large set of documents in MongoDB.
All of the jobs have a created field, and I need to base the average values off of the output array...
Here's what a document looks like:
{
__v: 0,
_id: ObjectId("535837911393fd0200d8e1eb"),
created: ISODate("2014-04-23T21:58:41.446Z"),
output: [
{
ref: {
img: false
},
type: "image/png",
methods: [
{
options: {
height: 200,
width: 200
},
method: "resize"
}
]
},
{
ref: {
img: false
},
type: "image/png",
methods: [
{
options: {
height: 400,
width: 400
},
method: "resize"
}
]
}
]
}
And here is what my current script looks like:
JobModel.aggregate([
{
$unwind: '$output'
},
{
$group: {
_id: { $dayOfYear: '$created' },
day: { $sum: 1 }
}
},
{
$group: {
_id: null,
avgDay: { $avg: '$day' }
}
},
{
$project: {
_id: 0,
average: {
day: '$avgDay'
}
}
}
],
function(err, data) {
if (err) {
console.log(err);
return;
}
res.send(data);
next();
});
I cannot seem to figure out the right order for this. Any suggestions?
I'm not really sure what you are after here. You say that you want "multiple" averages, but that raises the question: "multiple" over what basis? The average number of "output" entries over an individual day would be different from the average per month, or even from the daily average per month. So the scale changes with each selection, and there is not really a single query for "daily", "monthly" and "yearly".
It would seem that you really want "discrete" totals, which would be best approached by first finding the "size" of the output entries and then applying an average per scale:
JobModel.aggregate(
[
{ "$unwind": "$output" },
// Count the array entries on the record
{ "$group": {
"_id": "$_id",
"created": { "$first": "$created" },
"count": { "$sum": 1 }
}},
// Now get the average per day
{ "$group": {
"_id": { "$dayOfYear": "$created" },
"avg": { "$avg": "$count" }
}}
],
function(err,result) {
}
);
Or actually with MongoDB 2.6 and greater you can just use the $size operator on the array:
JobModel.aggregate(
[
// Now get the average per day
{ "$group": {
"_id": { "$dayOfYear": "$created" },
"avg": { "$avg": { "$size": "$output" } }
}}
],
function(err,result) {
}
);
So the logical thing is to run each of those within your required $match range, over your aggregation key of either "day", "month" or "year".
You could combine the daily averages per day with the daily average per month and then the daily average per year by collecting the results into arrays; otherwise you would just be throwing items away (which is an alternative if you "just" wanted the daily average for the year). As full results:
JobModel.aggregate(
[
// Now get the average per day
{ "$group": {
"_id": {
"year": { "$year": "$created" },
"month": { "$month": "$created" },
"day": { "$dayOfYear": "$created" }
},
"dayAvg": { "$avg": { "$size": "$output" } }
}},
// Group for month
{ "$group": {
"_id": {
"year": "$_id.year",
"month": "$_id.month"
},
"days": {
"$push": {
"day": "$_id.day",
"avg": "$dayAvg"
}
},
"monthAvg": { "$avg": "$dayAvg" }
}},
// Group for the year
{ "$group": {
"_id": "$_id.year",
"daily": { "$avg": "$monthAvg" },
"months": {
"$push": {
"month": "$_id.month",
"daily": "$monthAvg",
"days": "$days"
}
}
}}
],
function(err,result) {
}
);
Apply that however you want, but the main thing missing from your example is finding the "size" or "count" of the original "output" array per document, from which to obtain an average.
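To make the point about "size" concrete, here is a plain-JavaScript sketch of what the $size plus $avg grouping computes: the average length of "output" per document, keyed by day of year. `dayOfYearUTC`, `averageOutputsPerDay` and the sample jobs are hypothetical and only meant for checking results against a small data set.

```javascript
// Day of the year (1-366) in UTC, comparable to what $dayOfYear returns.
function dayOfYearUTC(date) {
  const endOfPreviousYear = Date.UTC(date.getUTCFullYear(), 0, 0); // Dec 31 of the year before
  const thisDay = Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
  return (thisDay - endOfPreviousYear) / 86400000;
}

// Average number of "output" entries per job, grouped by day of year.
function averageOutputsPerDay(jobs) {
  const perDay = new Map();
  for (const job of jobs) {
    const day = dayOfYearUTC(new Date(job.created));
    const entry = perDay.get(day) || { sum: 0, n: 0 };
    entry.sum += job.output.length; // the per-document "size"
    entry.n += 1;
    perDay.set(day, entry);
  }
  return [...perDay].map(([day, { sum, n }]) => ({ _id: day, avg: sum / n }));
}

// Made-up jobs: two on 2014-04-23 (2 and 4 outputs) and one on 2014-04-24 (1 output).
const sampleJobs = [
  { created: "2014-04-23T21:58:41.446Z", output: [{}, {}] },
  { created: "2014-04-23T08:00:00.000Z", output: [{}, {}, {}, {}] },
  { created: "2014-04-24T10:00:00.000Z", output: [{}] },
];

console.log(averageOutputsPerDay(sampleJobs));
```

As with the first two pipelines, keying on the day of year alone would merge the same calendar day from different years, which is one reason to run it within a $match range or to include the year in the key.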