So the data set looks like this:
{
"YearWeekISO": "2020-W53",
"FirstDose": 0,
"FirstDoseRefused": "",
"SecondDose": 0,
"DoseAdditional1": 0,
"DoseAdditional2": 0,
"UnknownDose": 0,
"NumberDosesReceived": 0,
"NumberDosesExported": 0,
"Region": "AT",
"Population": "8901064",
"ReportingCountry": "AT",
"TargetGroup": "ALL",
"Vaccine": "JANSS",
"Denominator": 7388778
}, {
"YearWeekISO": "2020-W53",
"FirstDose": 0,
"FirstDoseRefused": "",
"SecondDose": 0,
"DoseAdditional1": 0,
"DoseAdditional2": 0,
"UnknownDose": 8,
"NumberDosesReceived": 0,
"NumberDosesExported": 0,
"Region": "AT",
"Population": "8901064",
"ReportingCountry": "AT",
"TargetGroup": "ALL",
"Vaccine": "UNK",
"Denominator": 7388778
},
link to the data set
The query parameters will look like this:
GET /vaccine-summary?c=AT&dateFrom=2020-W10&dateTo=2020-W53&range=5
where
c: country code to get the report for
dateFrom: yyyy-Www, e.g. 2020-W10 (inclusive)
dateTo: yyyy-Www, e.g. 2020-W20 (exclusive)
range: number, e.g. 5, the period size (rangeSize) in weeks for which to calculate metrics
After applying the aggregation, you should have a transformed data set that looks like this:
{
"summary": [{
"weekStart": "2020-W10",
"weekEnd": "2020-W15",
"NumberDosesReceived": 1000
},
{
"weekStart": "2020-W15",
"weekEnd": "2020-W20",
"NumberDosesReceived": 2000
}, …
till end of range(dateTo)
]
}
Notice how weekStart increases in steps from 2020-W10 to 2020-W15, and weekEnd likewise.
NumberDosesReceived is the sum of the NumberDosesReceived field within that range.
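To make the expected transformation concrete, here is a minimal plain-JavaScript sketch of it, with no database involved. The function name `summarize` and the sample rows are made up for illustration. It relies on zero-padded "YYYY-Www" labels sorting chronologically as strings.

```javascript
// Plain-JavaScript reference for the expected aggregation (illustration only).
// Zero-padded "YYYY-Www" labels sort chronologically as strings,
// so string comparison is enough to place a row in its range.
function summarize(rows, boundaries) {
  const summary = [];
  for (let i = 0; i + 1 < boundaries.length; i++) {
    const weekStart = boundaries[i]; // inclusive
    const weekEnd = boundaries[i + 1]; // exclusive
    const total = rows
      .filter((r) => r.YearWeekISO >= weekStart && r.YearWeekISO < weekEnd)
      .reduce((sum, r) => sum + r.NumberDosesReceived, 0);
    summary.push({ weekStart, weekEnd, NumberDosesReceived: total });
  }
  return { summary };
}

// Hypothetical sample rows, not taken from the real data set.
const sampleRows = [
  { YearWeekISO: "2020-W10", NumberDosesReceived: 400 },
  { YearWeekISO: "2020-W14", NumberDosesReceived: 600 },
  { YearWeekISO: "2020-W15", NumberDosesReceived: 2000 },
  { YearWeekISO: "2020-W20", NumberDosesReceived: 999 }, // left out: dateTo is exclusive
];

const report = summarize(sampleRows, ["2020-W10", "2020-W15", "2020-W20"]);
console.log(JSON.stringify(report, null, 2));
```

With these sample rows the two ranges sum to 1000 and 2000, matching the shape of the expected output.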
I was able to come up with a working solution using the MongoDB aggregation stage $bucket. One problem is that if you want an aggregation of, say, week 1 to week 20 in chunks of 5, i.e. 1-5 (1 included, 5 excluded), 5-10, 10-15 and 15-20, you have to pass it an array like boundaries: [1,5,10,15,20] as part of the argument. So, for this question, I had to write a JS function that returns the array of boundaries between the start week and the end week for the given range. Written in TypeScript, the returned array for this question would look like: [2020-W01,2020-W05,2020-W10,2020-W15,2020-W20].
There are also edge cases to account for, since all the parameters are dynamic, such as a range that spans more than one year. The fact that MongoDB, to the best of my knowledge, has no date format like "2020-W10" makes it a bit more complex.
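// Shape of the value returned by customLoop.
interface returnData {
skip: number;
theRange: string[];
}

export function customLoop(
startWeek: number,
endWeek: number,
rangeNum: number,
year: number
): returnData {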
const boundaryRange: string[] = [];
let skip = 0;
for (let i = startWeek; i <= endWeek; i += rangeNum) {
const currentNum: string = i < 10 ? `0${i}` : `${i}`;
const result = `${year}-W${currentNum}`;
boundaryRange.push(result);
//if all the weeks in a year, Check where the last loop stops to determine skip
if (endWeek === 53 && i + rangeNum > 53) {
skip = i + rangeNum - 53 - 1;
}
}
return {
skip,
theRange: boundaryRange,
};
}
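For the edge case where the range spans more than one year, a rough sketch of boundary generation could look like the following. This is not the code from the repo: `isoWeeksInYear`, `label` and `weekBoundaries` are hypothetical helpers, and the sketch assumes dateFrom and dateTo are valid "YYYY-Www" strings. It uses the fact that 28 December always falls in the last ISO week of its year.

```javascript
// Number of ISO weeks (52 or 53) in an ISO year: 28 December always
// falls in the last ISO week, so its week number is the answer.
function isoWeeksInYear(year) {
  const dec28 = Date.UTC(year, 11, 28);
  const dayOfYear = Math.round((dec28 - Date.UTC(year, 0, 1)) / 86400000) + 1;
  const isoWeekday = new Date(dec28).getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  return Math.floor((dayOfYear - isoWeekday + 10) / 7);
}

// Zero-padded label so that string order equals chronological order.
const label = (year, week) => `${year}-W${String(week).padStart(2, "0")}`;

// Boundaries from dateFrom (inclusive) to dateTo (exclusive) in steps of `size` weeks.
// The last boundary is always dateTo, so the final chunk may be shorter.
function weekBoundaries(dateFrom, dateTo, size) {
  if (!(size > 0)) throw new Error("size must be a positive number");
  let [year, week] = dateFrom.split("-W").map(Number);
  const [endYear, endWeek] = dateTo.split("-W").map(Number);
  const boundaries = [];
  while (year < endYear || (year === endYear && week < endWeek)) {
    boundaries.push(label(year, week));
    week += size;
    // Roll over into the next ISO year(s) when the week number overflows.
    while (week > isoWeeksInYear(year)) {
      week -= isoWeeksInYear(year);
      year += 1;
    }
  }
  boundaries.push(label(endYear, endWeek));
  return boundaries;
}

console.log(weekBoundaries("2020-W10", "2020-W20", 5));
console.log(weekBoundaries("2020-W50", "2021-W05", 5));
```

Because the labels are zero-padded, the resulting array is in ascending string order, which is what $bucket needs for its boundaries.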
After this, I opened MongoDB Compass locally to construct and chain the aggregation stages that satisfy the task:
const result = await VaccinationModel.aggregate([
{
$match: {
ReportingCountry: c,
},
},
{
$bucket: {
groupBy: "$YearWeekISO",
boundaries: [...boundaryRange],
default: "others",
output: {
NumberDosesReceived: {
$sum: "$NumberDosesReceived",
},
},
},
},
{
$addFields: {
range: rangeNum,
},
},
{
$addFields: {
weekStart: "$_id",
weekEnd: {
$function: {
body: "function(id,range) {\n const arr = id.split('-')\n const year = arr[0];\n let week;\n let result=''\n if(arr[1]){\n week = arr[1].slice(1);\n result=`${year}-W${String(Number(week) + range).padStart(2, '0')}`\n\n }\n\n return result\n }",
args: ["$_id", "$range"],
lang: "js",
},
},
},
},
{
$unset: ["_id", "range"],
},
{
$match: {
weekEnd: {
$ne: "",
},
},
},
{
$sort: {
weekStart: 1,
},
},
]);
In that aggregation:
Match the country code.
Call $bucket with the array of boundaries, summing the NumberDosesReceived field of each chunk/range and naming the result NumberDosesReceived.
Since two more fields are needed in the result, weekStart and weekEnd, which aren't in the data set, take weekStart from the _id field produced by $bucket; to compute weekEnd, first add the range as a field.
If, for instance, the current bucket is 2020-W05, which is the value in _id, then weekEnd is 5 + range = 10, so I used the $function operator to compute it, passing _id and range as arguments.
Use $unset to remove the _id and range fields, as they aren't part of the returned data.
Use $match on the new weekEnd field to exclude empty values (these come from the "others" default bucket, whose _id has no week part).
Sort by weekStart.
here is the link to the repo: link
This should work:
const YearWeekISO = { $toDate: "$YearWeekISO" };
{
$project: {
fiveWeekperiod: {
$subtract: [
{ $week: YearWeekISO },
{ $mod: [{ $week: YearWeekISO }, 5] },
],
},
date: YearWeekISO,
NumberDosesReceived: 1,
},
},
{
$group: {
_id: {
year: { $year: "$date" },
fiveWeek: "$fiveWeekperiod",
},
weekStart: { $min: "$date" },
weekEnd: { $max: "$date" },
NumberDosesReceived: { $sum: "$NumberDosesReceived" },
},
},
{
$project: {
_id: 0,
weekStart: {
$dateToString: {
date: "$weekStart",
format: "%G-W%V",
},
},
weekEnd: {
$dateToString: {
date: {
$dateAdd: {
startDate: "$weekEnd",
unit: "week",
amount: 1,
},
},
format: "%G-W%V",
},
},
NumberDosesReceived: 1,
},
}
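The $subtract/$mod pair in the first $project rounds a week number down to a multiple of 5, so the periods are anchored at 0, 5, 10, ... rather than at dateFrom. The small sketch below shows that arithmetic in plain JavaScript, plus a variant anchored at an arbitrary first week; both helper names are hypothetical.

```javascript
// Start of the fixed-size period containing `week`, counted from multiples
// of `size` (the same arithmetic as week - (week mod size)).
function periodStartFromZero(week, size) {
  return week - (week % size);
}

// Variant anchored at an arbitrary first week (hypothetical helper),
// e.g. when dateFrom is not a multiple of the period size.
function periodStartFrom(firstWeek, week, size) {
  return firstWeek + Math.floor((week - firstWeek) / size) * size;
}

console.log(periodStartFromZero(13, 5)); // 10
console.log(periodStartFrom(12, 13, 5)); // 12
console.log(periodStartFrom(12, 17, 5)); // 17
```

With dateFrom=2020-W10 and a size of 5 the two variants agree, since 10 is already a multiple of 5.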
Here is an example document from my activities collection:
{
"_id" : ObjectId("5ec90b5258a37c002509b27d"),
"user_hash" : "asdsc4be9fe7xxx",
"type" : "Expense",
"name" : "Lorem",
"amount" : 10000,
"date_created" : 1590233938
}
I'd like to get the total amount of the activities with this aggregation:
db.activities.aggregate(
[
{
$group:
{
_id: "$id",
total: { $sum: "$amount" }
}
},
{
$match: { type: "Expense", "user_hash": "asdsc4be9fe7xxx" }
}
]
)
Expected result : {_id: null, total: xxxxx }
Actual result:
Is there any solution for this? Thank you in advance.
There are 2 problems with your query:
Your $group uses _id: "$id". A per-document key (such as $_id) makes the sum run on each individual document instead of the whole collection, and the sample document has no id field at all; specify _id: null to sum everything into one group.
You're performing the $match stage after the $group stage. It needs to come first, because after grouping the result will be something like:
{
"_id": null,
"total": 15
}
As you can see, this object doesn't have any of the fields of the original documents, so the $match finds 0 results. The order of stages matters because each stage operates on the output of the previous stage (there are some exceptions where MongoDB automatically reorders stages as an optimization, but those reorderings don't change the results).
So the query should be:
db.activities.aggregate(
[
{
$match: { type: "Expense", "user_hash": "asdsc4be9fe7xxx" }
},
{
$group:
{
_id: null,
total: { $sum: "$amount" }
}
},
]
)
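To see why the order matters without a database at hand, here is a small plain-JavaScript emulation of the two stages. The sample documents and the `matchStage`/`groupStage` helpers are made up for illustration; they only mimic what $match and $group do in this particular query.

```javascript
// Hypothetical sample documents shaped like the activities collection.
const activities = [
  { user_hash: "asdsc4be9fe7xxx", type: "Expense", amount: 10000 },
  { user_hash: "asdsc4be9fe7xxx", type: "Expense", amount: 5000 },
  { user_hash: "asdsc4be9fe7xxx", type: "Income", amount: 7000 },
  { user_hash: "someoneelse", type: "Expense", amount: 300 },
];

// Emulates { $match: { type: "Expense", user_hash: "asdsc4be9fe7xxx" } }.
const matchStage = (docs) =>
  docs.filter((d) => d.type === "Expense" && d.user_hash === "asdsc4be9fe7xxx");

// Emulates { $group: { _id: null, total: { $sum: "$amount" } } }.
// Like $group, it emits nothing when there are no input documents.
const groupStage = (docs) =>
  docs.length === 0
    ? []
    : [{ _id: null, total: docs.reduce((sum, d) => sum + (d.amount || 0), 0) }];

const matchThenGroup = groupStage(matchStage(activities));
const groupThenMatch = matchStage(groupStage(activities));

console.log(matchThenGroup); // [ { _id: null, total: 15000 } ]
console.log(groupThenMatch); // [] because the grouped document has no type or user_hash
```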
In my MongoDB backend, I have a view that, after multiple aggregation stages, outputs info that looks like this:
{
"_id" : 25k3ejfjyi32132f9z3,
"customer_id" : 15cgrd582950jj493g5,
"openBalance": 24,
// other data...
},
{
"_id" : 35g6ejfjfj32132f8s4,
"customer_id" : 23gtrd684563jj494f4,
"openBalance": 20,
// other data...
}
What I need to do, as a last step, is total up all of the "openBalance" amounts for all records, and output that number in a new field along with the other data. So, in other words, based on the above data, I want to return 44 in a field titled totalOpenBalance.
Is there a way I can handle this aggregation logic in a mongo view? I'm not sure how to do this, because I don't want to add a field to each record returned, but instead return a single value based on the total of the records. It would look something like this:
{
"_id" : 25k3ejfjyi32132f9z3,
"customer_id" : 15cgrd582950jj493g5,
"openBalance": 24,
// other data...
},
{
"_id" : 35g6ejfjfj32132f8s4,
"customer_id" : 23gtrd684563jj494f4,
"openBalance": 20,
// other data...
},
"totalOpenBalance": 44
If you add the following code to the end of your pipeline
$group: {
_id: null, // do not really group but throw all documents into the same bucket
documents: { $push: "$$ROOT" }, // push each encountered document into the group
totalOpenBalance: { $sum: "$openBalance" } // sum up all "openBalance" values
}
you will get something that you might be able to use:
{
"_id" : null,
"documents" : [
{
"_id" : 25k3ejfjyi32132f9z3,
"customer_id" : 15cgrd582950jj493g5,
"openBalance" : 24
},
{
"_id" : 35g6ejfjfj32132f8s4,
"customer_id" : 23gtrd684563jj494f4,
"openBalance" : 20
}
],
"totalOpenBalance" : 44
}
If you want to go completely crazy, which I would not really recommend, then read on. By adding the following stages
{
$group: {
_id: null, // do not really group but throw all documents into the same bucket
documents: { $push: "$$ROOT" }, // push each encountered document into the group
totalOpenBalance: { $sum: "$openBalance" } // sum up all "openBalance" values
}
}, {
$project: {
"_id": 0, // remove the "_id" field
"documents": { $concatArrays: [ "$documents", [ { "totalOpenBalance": "$totalOpenBalance" } ] ] } // append a magic subdocument to the existing documents
}
}, {
$unwind: "$documents" // just so we can flatten the resulting array into separate documents
}, {
$replaceRoot: {
newRoot: "$documents" // and move the content of our documents field to the root
}
}
you get exactly what you asked for:
{
"_id" : 25k3ejfjyi32132f9z3,
"customer_id" : 15cgrd582950jj493g5,
"openBalance" : 24
},
{
"_id" : 35g6ejfjfj32132f8s4,
"customer_id" : 23gtrd684563jj494f4,
"openBalance" : 20
},
{
"totalOpenBalance" : 44
}
This, however, is probably overkill...
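For anyone who wants to check what the two variants compute, here is a rough plain-JavaScript emulation. It is not MongoDB code: `groupAll` and `flattenWithTotal` are hypothetical helpers, and the IDs are written as strings only so that the snippet runs.

```javascript
// Sample documents from the question (IDs as strings so the snippet runs).
const docs = [
  { _id: "25k3ejfjyi32132f9z3", customer_id: "15cgrd582950jj493g5", openBalance: 24 },
  { _id: "35g6ejfjfj32132f8s4", customer_id: "23gtrd684563jj494f4", openBalance: 20 },
];

// Emulates $group with _id: null, documents: { $push: "$$ROOT" }
// and totalOpenBalance: { $sum: "$openBalance" }.
function groupAll(input) {
  return {
    _id: null,
    documents: input.slice(),
    totalOpenBalance: input.reduce((sum, d) => sum + d.openBalance, 0),
  };
}

// Emulates the $project with $concatArrays, then $unwind and $replaceRoot:
// the original documents followed by one extra document holding the total.
function flattenWithTotal(grouped) {
  return [...grouped.documents, { totalOpenBalance: grouped.totalOpenBalance }];
}

const grouped = groupAll(docs);
console.log(grouped.totalOpenBalance); // 44
console.log(flattenWithTotal(grouped));
```

The first shape keeps the total next to the documents in one result; the second returns it as a trailing document of a different shape, so the caller has to treat the last element specially.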
I am trying to implement a function that collects unread messages from an articles collection. Each article in the collection has a "discussions" entry with discussion comment subdocuments. An example of such a subdocument is:
{
"id": NumberLong(7534),
"user": DBRef("users", ObjectId("...")),
"dt_create": ISODate("2015-01-26T00:10:44Z"),
"content": "The discussion comment content"
}
The parent document has the following (partial) structure:
{
model: {
id: 17676,
title: "Article title",
author: DBRef("users", ObjectId(...)),
// a bunch of other fields here
},
statistics: {
// Statistics will be stored here (pageviews, etc)
},
discussions: [
// Array of discussion subdocuments, like the one above
]
}
Each user also has a last_viewed entry which is a document, an example is as follows:
{
"17676" : "2015-01-10T00:00:00.000Z",
"18038" : "2015-01-10T00:00:00.000Z",
"18242" : "2015-01-20T00:00:00.000Z",
"18325" : "2015-01-20T00:00:00.000Z"
}
This means that the user has looked at discussion comments for the last time on January 10th 2015 for articles with IDs 17676 and 18038, and on January 20th 2015 for articles with IDs 18242 and 18325.
So I want to collect discussion entries from the article documents: for the article with ID 17676, the entries created after 2015-01-10, and for the article with ID 18242, the entries created after 2015-01-20.
UPDATED
Based on Neil Lunn's reply, the function I have created so far is:
function getUnreadDiscussions(userid) {
var user = db.users.findOne({ 'model.id': userid });
var last_viewed = [];
for(var i in user.last_viewed) {
last_viewed.push({
'id': parseInt(i),
'dt': user.last_viewed[i]
});
}
var result = db.articles.aggregate([
// For now, collect just articles the user has written
{ $match: { 'model.author': DBRef('users', user._id) } },
{ $unwind: '$discussions' },
{ $project: {
'model': '$model',
'discussions': '$discussions',
'last_viewed': {
'$let': {
'vars': { 'last_viewed': last_viewed },
'in': {
'$setDifference': [
{ '$map': {
'input': '$$last_viewed',
'as': 'last_viewed',
'in': {
'$cond': [
{ '$eq': [ '$$last_viewed.id', '$model.id' ] },
'$$last_viewed.dt',
false
]
}
} },
[ false ]
]
}
}
}
}
},
// To get a scalar instead of a 1-element array:
{ $unwind: '$last_viewed' },
// Match only those that were created after last_viewed
{ $match: { 'discussions.dt_create': { $gt: '$last_viewed' } } },
{ $project: {
'model.id': 1,
'model.title': 1,
'discussions': 1,
'last_viewed': 1
} }
]);
return result.toArray();
}
The whole $let thing, and the $unwind after that, transforms the data into the following partial projection (with the last $match commented out):
{
"_id" : ObjectId("54d9af1dca71d8054c8d0ee3"),
"model" : {
"id" : NumberLong(18325),
"title" : "Article title"
},
"discussions" : {
"id" : NumberLong(7543),
"user" : DBRef("users", ObjectId("54d9ae24ca71d8054c8b4567")),
"dt_create" : ISODate("2015-01-26T00:10:44Z"),
"content" : "Some comment here"
},
"last_viewed" : ISODate("2015-01-20T00:00:00Z")
},
{
"_id" : ObjectId("54d9af1dca71d8054c8d0ee3"),
"model" : {
"id" : NumberLong(18325),
"title" : "Article title"
},
"discussions" : {
"id" : NumberLong(7554),
"user" : DBRef("users", ObjectId("54d9ae24ca71d8054c8b4567")),
"dt_create" : ISODate("2015-01-26T02:03:22Z"),
"content" : "Another comment here"
},
"last_viewed" : ISODate("2015-01-20T00:00:00Z")
}
So far so good here. But the problem now is that the $match to select only the discussions created after the last_viewed date is not working. I am getting an empty array response. However, if I hard-code the date and put in $match: { 'discussions.dt_create': { $gt: ISODate("2015-01-20 00:00:00") } }, it works. But I want it to take it from last_viewed.
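I found another SO thread where this issue was resolved by using the $cmp operator. The $match above fails because query operators such as $gt compare a field against a literal value, so '$last_viewed' is treated as a plain string rather than as a reference to the field.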
The final part of the aggregation would be:
[
{ /* $match, $unwind, $project, $unwind as before */ },
{ $project: {
'model': 1,
'discussions': 1,
'last_viewed': 1,
'compare': {
$cmp: [ '$discussions.dt_create', '$last_viewed' ]
}
} },
{ $match: { 'compare': { $gt: 0 } } }
]
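As a quick sanity check of the compare-then-match idea, here is a plain-JavaScript emulation on made-up rows shaped like the unwound documents above. The `cmp` helper and the second row are hypothetical. The last constant shows a single-stage alternative using $expr, which is available from MongoDB 3.6; I have not run it against this data.

```javascript
// Emulates $cmp: returns -1, 0 or 1.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Hypothetical rows shaped like the unwound documents shown earlier.
const rows = [
  {
    discussions: { id: 7543, dt_create: new Date("2015-01-26T00:10:44Z") },
    last_viewed: new Date("2015-01-20T00:00:00Z"),
  },
  {
    discussions: { id: 7001, dt_create: new Date("2015-01-05T09:00:00Z") },
    last_viewed: new Date("2015-01-20T00:00:00Z"),
  },
];

// The $project adds `compare`; the $match keeps rows where compare > 0.
const unread = rows
  .map((r) => ({ ...r, compare: cmp(r.discussions.dt_create, r.last_viewed) }))
  .filter((r) => r.compare > 0);

console.log(unread.map((r) => r.discussions.id)); // [ 7543 ]

// Alternative for MongoDB 3.6+: $expr lets $match compare two fields directly.
const exprStage = {
  $match: { $expr: { $gt: ["$discussions.dt_create", "$last_viewed"] } },
};
console.log(JSON.stringify(exprStage));
```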
The aggregation framework is great, but it takes quite a different approach in problem-solving. Hope this helps anyone!
I'll keep the question unanswered in case anyone else has a better answer/method. If this answer has been upvoted enough times, I'll accept this one.
I have some data that looks like this (not real data):
{
_id:'cust04',
name:'Diarmuid Rellis',
address:'Elysium, Passage East',
county:'Waterford',
phone:'051-345786',
email:'dreil#drarch.com',
quotations:[
{
_id:'quot03',
supplier_ref:'A2006',
date_received: new Date('2013-05-12T00:00:00'),
date_returned: new Date('2013-05-15T00:00:00'),
supplier_price:35000.00,
customer_price:35000.00,
orders:[
{
_id:'ord03',
order_date: new Date('2013-05-20T00:00:00'),
del_date: new Date('2013-08-12T00:00:00'),
del_address:'Elysium, Passage East, Co. Waterford',
status:'BALPAID'
}
]
},
{
_id:'quot04',
supplier_ref:'A2007',
date_received: new Date('2013-08-10T00:00:00'),
date_returned: new Date('2013-08-12T00:00:00'),
supplier_price:29600.00,
customer_price:29600.00,
orders:[
{
_id:'ord04',
order_date: new Date('2014-03-20T00:00:00'),
del_date: new Date('2014-05-12T00:00:00'),
del_address:'Elysium, Passage East, Co. Waterford',
status:'INPROD'
}
]
}
]
}
I am trying to unwind the quotations and orders arrays, and get a projection of all orders in production which include the customer name, supplier_ref and order date for each.
Here is my query:
db.customers.aggregate([
{ $unwind: "$quotations" },
{ $unwind: "$quotations.orders" },
{ $match: { 'quotations.orders.status': 'INPROD' } },
{
$project: {
name: 1,
supplier_ref: "$quotations.supplier_ref",
order_id: "$quotations.orders._id",
order_date: "$quotations.orders.order_date"
}
},
{
$group: {
_id: "$order_id"
}
}
], function (err, results) {
console.log(results);
})
The query runs successfully, but just gives the order ids, not any of the other fields required. What am I missing?
EDIT
I am hoping for a result like:
"result": [
{
"_id" : "orderid01",
"name" : "Joe Bloggs",
"supplier_ref" : "A1234",
"date_ordered" : "2012-04-14"
},
{
"_id" : "orderid02",
"name" : "Joe Bloggs",
"supplier_ref" : "A1235",
"date_ordered" : "2012-04-16"
}
]
When I add an extra field to my 'group' function, like so:
$group: {
_id: "$order_id",
supplier_ref: "$supplier_ref"
}
I get the error: "the group aggregate field 'supplier_ref' must be defined as an expression inside an object". Do I have to associate it with the result object in some way?
Removing the group function altogether produced the results I wanted.
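For reference, the error comes from the fact that every field in $group other than _id has to be an accumulator expression, such as { $first: "$supplier_ref" }. If a $group stage were still wanted, a sketch could look like the stage below. The `groupFirst` function is only a hypothetical plain-JavaScript emulation so the idea can be tried without a database, and the sample row is what the $project above would produce for the INPROD order.

```javascript
// A $group stage where each extra field is wrapped in an accumulator.
const groupStage = {
  $group: {
    _id: "$order_id",
    name: { $first: "$name" },
    supplier_ref: { $first: "$supplier_ref" },
    order_date: { $first: "$order_date" },
  },
};

// Minimal emulation of $group with $first accumulators (illustration only).
function groupFirst(docs, stage) {
  const spec = stage.$group;
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[spec._id.slice(1)]; // "$order_id" -> "order_id"
    if (groups.has(key)) continue; // $first keeps values from the first document only
    const out = { _id: key };
    for (const [field, acc] of Object.entries(spec)) {
      if (field !== "_id") out[field] = doc[acc.$first.slice(1)];
    }
    groups.set(key, out);
  }
  return [...groups.values()];
}

// Row as produced by the $project stage for the order in production.
const projected = [
  {
    name: "Diarmuid Rellis",
    supplier_ref: "A2007",
    order_id: "ord04",
    order_date: new Date("2014-03-20T00:00:00Z"),
  },
];

const grouped = groupFirst(projected, groupStage);
console.log(grouped);
```

Since each order appears exactly once after the two $unwind stages, the $group adds nothing here, which is consistent with dropping it.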