Group by Date with Local Time Zone in MongoDB - javascript

I am new to mongodb. Below is my query.
Model.aggregate()
.match({ 'activationId': activationId, "t": { "$gte": new Date(fromTime), "$lt": new Date(toTime) } })
.group({ '_id': { 'date': { $dateToString: { format: "%Y-%m-%d %H", date: "$datefield" } } }, uniqueCount: { $addToSet: "$mac" } })
.project({ "date": 1, "month": 1, "hour": 1, uniqueMacCount: { $size: "$uniqueCount" } })
.exec()
.then(function (docs) {
return docs;
});
The issue is that MongoDB stores dates in UTC. I need this data to display an area chart.
I want to group by date in the local time zone. Is there any way to add a time offset to the date when grouping?

General Problem of Dealing with "local dates"
So there is a short answer to this and a long answer as well. The basic case is that instead of using any of the "date aggregation operators" you need to actually "do the math" on the date objects. The primary thing here is to adjust the values by the offset from UTC for the given local timezone and then "round" to the required interval.
The "much longer answer" and also the main problem to consider involves that dates are often subject to "Daylight Savings Time" changes in the offset from UTC at different times of the year. So this means that when converting to "local time" for such aggregation purposes, you really should consider where the boundaries for such changes exist.
There is also another consideration, being that no matter what you do to "aggregate" at a given interval, the output values "should" at least initially come out as UTC. This is good practice since display to "locale" really is a "client function", and as later described, the client interfaces will commonly have a way of displaying in the present locale which will be based on the premise that it was in fact fed data as UTC.
Determining Locale Offset and Daylight Savings
This is generally the main problem that needs to be solved. The general math for "rounding" a date to an interval is the simple part, but there is no real math you can apply to knowing when such boundaries apply, and the rules change in every locale and often every year.
So this is where a "library" comes in, and the best option here in the author's opinion for a JavaScript platform is moment-timezone, which is basically a "superset" of moment.js including all the important "timezone" features we want to use.
Moment Timezone basically defines such a structure for each locale timezone as:
{
name : 'America/Los_Angeles', // the unique identifier
abbrs : ['PDT', 'PST'], // the abbreviations
untils : [1414918800000, 1425808800000], // the timestamps in milliseconds
offsets : [420, 480] // the offsets in minutes
}
Where of course the objects are much larger with respect to the untils and offsets properties actually recorded. But that is the data you need to access in order to see if there is actually a change in the offset for a zone given daylight savings changes.
This block of the later code listing is what we basically use to determine given a start and end value for a range, which daylight savings boundaries are crossed, if any:
const zone = moment.tz.zone(locale);
if ( zone.hasOwnProperty('untils') ) {
let between = zone.untils.filter( u =>
u >= start.valueOf() && u < end.valueOf()
);
if ( between.length > 0 )
branches = between
.map( d => moment.tz(d, locale) )
.reduce((acc,curr,i,arr) =>
acc.concat(
( i === 0 )
? [{ start, end: curr }] : [{ start: acc[i-1].end, end: curr }],
( i === arr.length-1 ) ? [{ start: curr, end }] : []
)
,[]);
}
Looking at the whole of 2017 for the Australia/Sydney locale the output of this would be:
[
{
"start": "2016-12-31T13:00:00.000Z", // Interval is +11 hours here
"end": "2017-04-01T16:00:00.000Z"
},
{
"start": "2017-04-01T16:00:00.000Z", // Changes to +10 hours here
"end": "2017-09-30T16:00:00.000Z"
},
{
"start": "2017-09-30T16:00:00.000Z", // Changes back to +11 hours here
"end": "2017-12-31T13:00:00.000Z"
}
]
Which basically reveals that between the first sequence of dates the offset would be +11 hours then changes to +10 hours between the dates in the second sequence and then switches back to +11 hours for the interval covering to the end of the year and the specified range.
This logic then needs to be translated into a structure that will be understood by MongoDB as part of an aggregation pipeline.
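The splitting step described above can also be written without the reduce, as a minimal sketch in plain JavaScript. The function name splitAtBoundaries and the hard-coded boundary timestamps are illustrative assumptions (the timestamps are copied from the Sydney 2017 output shown above); in the real listing the boundaries come from zone.untils. Note that this version uses a strict > on the start, so a boundary exactly at the start does not produce an empty first range.

```javascript
// Sketch: split [start, end) at every offset change that falls inside it.
// `untils` stands in for the zone.untils array (epoch milliseconds, ascending).
function splitAtBoundaries(start, end, untils) {
  const cuts = untils.filter(u => u > start && u < end);
  const points = [start, ...cuts, end];
  // Pair each point with the next one: [p0,p1], [p1,p2], ...
  return points.slice(0, -1).map((p, i) => ({ start: p, end: points[i + 1] }));
}

// Boundaries taken from the Australia/Sydney example above
const sydneyRanges = splitAtBoundaries(
  Date.parse('2016-12-31T13:00:00Z'),
  Date.parse('2017-12-31T13:00:00Z'),
  [Date.parse('2017-04-01T16:00:00Z'), Date.parse('2017-09-30T16:00:00Z')]
);
console.log(sydneyRanges.map(r => new Date(r.start).toISOString()));
```

Each returned range would still need its own offset attached, which is what the then value in each $switch branch carries.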
Applying the Math
The mathematical principle here for aggregating to any "rounded date interval" essentially relies on using the milliseconds value of the represented date which is "rounded" down to the nearest number representing the "interval" required.
You essentially do this by finding the "modulo" or "remainder" of the current value applied to the required interval. Then you "subtract" that remainder from the current value which returns a value at the nearest interval.
For example, given the current date:
var d = new Date("2017-07-14T01:28:34.931Z"); // valueOf() is 1499995714931 millis
// 1000 milliseconds * 60 seconds * 60 minutes = 1 hour or 3600000 millis
var v = d.valueOf() - ( d.valueOf() % ( 1000 * 60 * 60 ) );
// v equals 1499994000000 millis or as a date
new Date(1499994000000);
ISODate("2017-07-14T01:00:00Z")
// which removed the 28 minutes and change to nearest 1 hour interval
This is the general math we also need to apply in the aggregation pipeline using the $subtract and $mod operations, which are the aggregation expressions used for the same math operations shown above.
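To see how the offset fits into the same rounding, here is a small sketch in plain JavaScript. The helper name floorToLocalInterval is hypothetical, and the fixed +10 hour offset assumes Australia/Sydney in July (outside daylight savings); a real implementation would look the offset up per date as described earlier.

```javascript
// Round a UTC timestamp down to the start of its *local* interval.
// offsetMs is the zone's offset from UTC in milliseconds (assumed fixed here).
function floorToLocalInterval(utcMs, offsetMs, intervalMs) {
  const local = utcMs + offsetMs;               // shift into "local" milliseconds
  const rounded = local - (local % intervalMs); // drop the remainder
  return rounded - offsetMs;                    // shift back to a real UTC instant
}

const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;
const startOfLocalDay = floorToLocalInterval(
  Date.parse('2017-07-14T01:28:34.931Z'), 10 * HOUR, DAY
);
console.log(new Date(startOfLocalDay).toISOString()); // 2017-07-13T14:00:00.000Z
```

The result is midnight on 14 July in Sydney expressed as a UTC instant, which is the same "shift, round, shift back" sequence the pipeline performs with $subtract, $mod and the final $addFields stage.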
The general structure of the aggregation pipeline is then:
let pipeline = [
{ "$match": {
"createdAt": { "$gte": start.toDate(), "$lt": end.toDate() }
}},
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [
{ "$subtract": [ "$createdAt", new Date(0) ] },
switchOffset(start,end,"$createdAt",false)
]},
{ "$mod": [
{ "$subtract": [
{ "$subtract": [ "$createdAt", new Date(0) ] },
switchOffset(start,end,"$createdAt",false)
]},
interval
]}
]},
new Date(0)
]
},
"amount": { "$sum": "$amount" }
}},
{ "$addFields": {
"_id": {
"$add": [
"$_id", switchOffset(start,end,"$_id",true)
]
}
}},
{ "$sort": { "_id": 1 } }
];
The main part you need to understand here is the conversion from a Date object as stored in MongoDB to a numeric value representing the internal timestamp. We need the "numeric" form, and the trick is to subtract one BSON Date from another, which yields the numeric difference between them in milliseconds. This is exactly what this statement does:
{ "$subtract": [ "$createdAt", new Date(0) ] }
Now we have a numeric value to deal with, we can apply the modulo and subtract that from the numeric representation of the date in order to "round" it. So the "straight" representation of this is like:
{ "$subtract": [
{ "$subtract": [ "$createdAt", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$createdAt", new Date(0) ] },
( 1000 * 60 * 60 * 24 ) // 24 hours
]}
]}
Which mirrors the same JavaScript math approach as shown earlier but applied to the actual document values in the aggregation pipeline. You will also note the other "trick" there where we apply an $add operation with another representation of a BSON date as of epoch ( or 0 milliseconds ) where the "addition" of a BSON Date to a "numeric" value, returns a "BSON Date" representing the milliseconds it was given as input.
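As a compact illustration of both tricks together, the following sketch builds the rounding expression for any field and interval. The function name floorDateExpr is an assumption made for this example; it only assembles a plain object from the $subtract, $mod and $add operators described above and ignores the timezone offset.

```javascript
// Build an aggregation expression that floors a date field to an interval.
// `field` is a field path such as "$createdAt"; intervalMs is in milliseconds.
function floorDateExpr(field, intervalMs) {
  const epoch = new Date(0);
  const asNumber = { $subtract: [field, epoch] };   // Date - Date => number
  return {
    $add: [                                         // number + Date => Date
      { $subtract: [asNumber, { $mod: [asNumber, intervalMs] }] },
      epoch
    ]
  };
}

const dailyExpr = floorDateExpr('$createdAt', 24 * 60 * 60 * 1000);
console.log(JSON.stringify(dailyExpr, null, 2));
```

The returned object could be used as the _id of a $group stage when UTC boundaries are acceptable.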
Of course the other consideration in the listed code is the actual "offset" from UTC, which adjusts the numeric values in order to ensure the "rounding" takes place for the present timezone. This is implemented in a function based on the earlier description of finding where the different offsets occur, and returns a format usable in an aggregation pipeline expression by comparing the input dates and returning the correct offset.
The full expansion of all the details, including the handling of those different "Daylight Savings" time offsets, would then look like:
[
{
"$match": {
"createdAt": {
"$gte": "2016-12-31T13:00:00.000Z",
"$lt": "2017-12-31T13:00:00.000Z"
}
}
},
{
"$group": {
"_id": {
"$add": [
{
"$subtract": [
{
"$subtract": [
{
"$subtract": [
"$createdAt",
"1970-01-01T00:00:00.000Z"
]
},
{
"$switch": {
"branches": [
{
"case": {
"$and": [
{
"$gte": [
"$createdAt",
"2016-12-31T13:00:00.000Z"
]
},
{
"$lt": [
"$createdAt",
"2017-04-01T16:00:00.000Z"
]
}
]
},
"then": -39600000
},
{
"case": {
"$and": [
{
"$gte": [
"$createdAt",
"2017-04-01T16:00:00.000Z"
]
},
{
"$lt": [
"$createdAt",
"2017-09-30T16:00:00.000Z"
]
}
]
},
"then": -36000000
},
{
"case": {
"$and": [
{
"$gte": [
"$createdAt",
"2017-09-30T16:00:00.000Z"
]
},
{
"$lt": [
"$createdAt",
"2017-12-31T13:00:00.000Z"
]
}
]
},
"then": -39600000
}
]
}
}
]
},
{
"$mod": [
{
"$subtract": [
{
"$subtract": [
"$createdAt",
"1970-01-01T00:00:00.000Z"
]
},
{
"$switch": {
"branches": [
{
"case": {
"$and": [
{
"$gte": [
"$createdAt",
"2016-12-31T13:00:00.000Z"
]
},
{
"$lt": [
"$createdAt",
"2017-04-01T16:00:00.000Z"
]
}
]
},
"then": -39600000
},
{
"case": {
"$and": [
{
"$gte": [
"$createdAt",
"2017-04-01T16:00:00.000Z"
]
},
{
"$lt": [
"$createdAt",
"2017-09-30T16:00:00.000Z"
]
}
]
},
"then": -36000000
},
{
"case": {
"$and": [
{
"$gte": [
"$createdAt",
"2017-09-30T16:00:00.000Z"
]
},
{
"$lt": [
"$createdAt",
"2017-12-31T13:00:00.000Z"
]
}
]
},
"then": -39600000
}
]
}
}
]
},
86400000
]
}
]
},
"1970-01-01T00:00:00.000Z"
]
},
"amount": {
"$sum": "$amount"
}
}
},
{
"$addFields": {
"_id": {
"$add": [
"$_id",
{
"$switch": {
"branches": [
{
"case": {
"$and": [
{
"$gte": [
"$_id",
"2017-01-01T00:00:00.000Z"
]
},
{
"$lt": [
"$_id",
"2017-04-02T03:00:00.000Z"
]
}
]
},
"then": -39600000
},
{
"case": {
"$and": [
{
"$gte": [
"$_id",
"2017-04-02T02:00:00.000Z"
]
},
{
"$lt": [
"$_id",
"2017-10-01T02:00:00.000Z"
]
}
]
},
"then": -36000000
},
{
"case": {
"$and": [
{
"$gte": [
"$_id",
"2017-10-01T03:00:00.000Z"
]
},
{
"$lt": [
"$_id",
"2018-01-01T00:00:00.000Z"
]
}
]
},
"then": -39600000
}
]
}
}
]
}
}
},
{
"$sort": {
"_id": 1
}
}
]
That expansion is using the $switch statement in order to apply the date ranges as conditions to when to return the given offset values. This is the most convenient form since the "branches" argument does correspond directly to an "array", which is the most convenient output of the "ranges" determined by examination of the untils representing the offset "cut-points" for the given timezone on the supplied date range of the query.
It is possible to apply the same logic in earlier versions of MongoDB using a "nested" implementation of $cond instead, but it is a little messier to implement, so we are just using the most convenient method in implementation here.
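For reference, a nested $cond version could be generated from the same branches array along these lines. This is only a sketch: branchesToCond is a hypothetical helper, the two simplified branches below are made up for the demonstration, and the final fallback value is an assumption, since nested $cond always needs an innermost "else" whereas $switch without a default raises an error when nothing matches.

```javascript
// Turn $switch-style branches into nested $cond expressions.
// `fallback` is returned when no case matches (the innermost "else").
function branchesToCond(branches, fallback) {
  return branches.reduceRight(
    (otherwise, branch) => ({ $cond: [branch.case, branch.then, otherwise] }),
    fallback
  );
}

const nestedOffset = branchesToCond(
  [
    { case: { $lt: ['$createdAt', new Date('2017-04-01T16:00:00Z')] }, then: -39600000 },
    { case: { $lt: ['$createdAt', new Date('2017-09-30T16:00:00Z')] }, then: -36000000 }
  ],
  -39600000
);
console.log(JSON.stringify(nestedOffset, null, 2));
```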
Once all of those conditions are applied, the dates "aggregated" are actually those representing the "local" time as defined by the supplied locale. This actually brings us to what the final aggregation stage is, and the reason why it is there as well as the later handling as demonstrated in the listing.
End Results
I did mention earlier that the general recommendation is that the "output" should still return the date values in UTC format of at least some description, and therefore that is exactly what the pipeline here is doing by first converting "from" UTC to local by applying the offset when "rounding", but then the final numbers "after the grouping" are re-adjusted back by the same offset that applies to the "rounded" date values.
The listing gives three different output possibilities:
// ISO Format string from JSON stringify default
[
{
"_id": "2016-12-31T13:00:00.000Z",
"amount": 2
},
{
"_id": "2017-01-01T13:00:00.000Z",
"amount": 1
},
{
"_id": "2017-01-02T13:00:00.000Z",
"amount": 2
}
]
// Timestamp value - milliseconds from epoch UTC - least space!
[
{
"_id": 1483189200000,
"amount": 2
},
{
"_id": 1483275600000,
"amount": 1
},
{
"_id": 1483362000000,
"amount": 2
}
]
// Force locale format to string via moment .format()
[
{
"_id": "2017-01-01T00:00:00+11:00",
"amount": 2
},
{
"_id": "2017-01-02T00:00:00+11:00",
"amount": 1
},
{
"_id": "2017-01-03T00:00:00+11:00",
"amount": 2
}
]
The one thing of note here is that for a "client" such as Angular, every single one of those formats would be accepted by its own DatePipe which can actually do the "locale format" for you. But it depends on where the data is supplied to. "Good" libraries will be aware of using a UTC date in the present locale. Where that is not the case, then you might need to "stringify" yourself.
But it is a simple thing, and you get the most support for this by using a library which essentially bases its manipulation of output from a "given UTC value".
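Where no such library is at hand, the built-in Intl API can do the client-side formatting from a UTC timestamp. The sketch below assumes a runtime with full time zone data (recent Node.js versions and browsers); the helper name formatInZone and the chosen output layout are illustrative only.

```javascript
// Format a UTC timestamp for display in a named time zone on the client.
function formatInZone(utcMs, timeZone) {
  const parts = new Intl.DateTimeFormat('en-AU', {
    timeZone,
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit', hourCycle: 'h23'
  }).formatToParts(new Date(utcMs));
  // Pick individual fields so the layout does not depend on the locale
  const get = type => parts.find(p => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')} ${get('hour')}:${get('minute')}`;
}

// First timestamp from the "milliseconds from epoch" output above
console.log(formatInZone(1483189200000, 'Australia/Sydney'));
```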
The main thing here is to "understand what you are doing" when you ask such a thing as aggregating to a local time zone. Such a process should consider:
1. The data can be and often is viewed from the perspective of people within different timezones.
2. The data is generally provided by people in different timezones. Combined with point 1, this is why we store in UTC.
3. Timezones are often subject to a changing "offset" from "Daylight Savings Time" in many of the world timezones, and you should account for that when analyzing and processing the data.
4. Regardless of aggregation intervals, output "should" in fact remain in UTC, albeit adjusted to aggregate on interval according to the locale provided. This leaves presentation to be delegated to a "client" function, just as it should.
As long as you keep those things in mind and apply just like the listing here demonstrates, then you are doing all the right things for dealing with aggregation of dates and even general storage with respect to a given locale.
So you "should" be doing this, and what you "should not" be doing is giving up and simply storing the "locale date" as a string. As described, that would be a very incorrect approach and causes nothing but further problems for your application.
NOTE: The one topic I do not touch on here at all is aggregating to a "month" ( or indeed "year" ) interval. "Months" are the mathematical anomaly in the whole process since the number of days always varies and thus requires a whole other set of logic in order to apply. Describing that alone is at least as long as this post, and therefore would be another subject. For general minutes, hours, and days which is the common case, the math here is "good enough" for those cases.
Full Listing
This serves as a "demonstration" to tinker with. It employs the required function to extract the offset dates and values to be included and runs an aggregation pipeline over the supplied data.
You can change anything in here, but will probably start with the locale and interval parameters, and then maybe add different data and different start and end dates for the query. But the rest of the code need not be changed to simply make changes to any of those values, and can therefore demonstrate using different intervals ( such as 1 hour as asked in the question ) and different locales.
For instance, once supplying valid data which would actually require aggregation at a "1 hour interval" then the line in the listing would be changed as:
const interval = moment.duration(1,'hour').asMilliseconds();
In order to define a milliseconds value for the aggregation interval as required by the aggregation operations being performed on the dates.
const moment = require('moment-timezone'),
mongoose = require('mongoose'),
Schema = mongoose.Schema;
mongoose.Promise = global.Promise;
mongoose.set('debug',true);
const uri = 'mongodb://localhost/test',
options = { useMongoClient: true };
const locale = 'Australia/Sydney';
const interval = moment.duration(1,'day').asMilliseconds();
const reportSchema = new Schema({
createdAt: Date,
amount: Number
});
const Report = mongoose.model('Report', reportSchema);
function log(data) {
console.log(JSON.stringify(data,undefined,2))
}
function switchOffset(start,end,field,reverseOffset) {
let branches = [{ start, end }]
const zone = moment.tz.zone(locale);
if ( zone.hasOwnProperty('untils') ) {
let between = zone.untils.filter( u =>
u >= start.valueOf() && u < end.valueOf()
);
if ( between.length > 0 )
branches = between
.map( d => moment.tz(d, locale) )
.reduce((acc,curr,i,arr) =>
acc.concat(
( i === 0 )
? [{ start, end: curr }] : [{ start: acc[i-1].end, end: curr }],
( i === arr.length-1 ) ? [{ start: curr, end }] : []
)
,[]);
}
log(branches);
branches = branches.map( d => ({
case: {
$and: [
{ $gte: [
field,
new Date(
d.start.valueOf()
+ ((reverseOffset)
? moment.duration(d.start.utcOffset(),'minutes').asMilliseconds()
: 0)
)
]},
{ $lt: [
field,
new Date(
d.end.valueOf()
+ ((reverseOffset)
? moment.duration(d.start.utcOffset(),'minutes').asMilliseconds()
: 0)
)
]}
]
},
then: -1 * moment.duration(d.start.utcOffset(),'minutes').asMilliseconds()
}));
return ({ $switch: { branches } });
}
(async function() {
try {
const conn = await mongoose.connect(uri,options);
// Data cleanup
await Promise.all(
Object.keys(conn.models).map( m => conn.models[m].remove({}))
);
let inserted = await Report.insertMany([
{ createdAt: moment.tz("2017-01-01",locale), amount: 1 },
{ createdAt: moment.tz("2017-01-01",locale), amount: 1 },
{ createdAt: moment.tz("2017-01-02",locale), amount: 1 },
{ createdAt: moment.tz("2017-01-03",locale), amount: 1 },
{ createdAt: moment.tz("2017-01-03",locale), amount: 1 },
]);
log(inserted);
const start = moment.tz("2017-01-01", locale),
end = moment.tz("2018-01-01", locale)
let pipeline = [
{ "$match": {
"createdAt": { "$gte": start.toDate(), "$lt": end.toDate() }
}},
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [
{ "$subtract": [ "$createdAt", new Date(0) ] },
switchOffset(start,end,"$createdAt",false)
]},
{ "$mod": [
{ "$subtract": [
{ "$subtract": [ "$createdAt", new Date(0) ] },
switchOffset(start,end,"$createdAt",false)
]},
interval
]}
]},
new Date(0)
]
},
"amount": { "$sum": "$amount" }
}},
{ "$addFields": {
"_id": {
"$add": [
"$_id", switchOffset(start,end,"$_id",true)
]
}
}},
{ "$sort": { "_id": 1 } }
];
log(pipeline);
let results = await Report.aggregate(pipeline);
// log raw Date objects, will stringify as UTC in JSON
log(results);
// I like to output timestamp values and let the client format
results = results.map( d =>
Object.assign(d, { _id: d._id.valueOf() })
);
log(results);
// Or use moment to format the output for locale as a string
results = results.map( d =>
Object.assign(d, { _id: moment.tz(d._id, locale).format() } )
);
log(results);
} catch(e) {
console.error(e);
} finally {
mongoose.disconnect();
}
})()

November 2017 saw the release of MongoDB v3.6, which included timezone-aware date aggregation operators. I would encourage anyone reading this to put them to use rather than rely on client-side date manipulation, as demonstrated in Neil's answer, particularly because it is way easier to read and understand.
Depending on the requirements, different operators might come in handy, but I've found $dateToParts to be the most universal/generic. Here's a basic demonstration using OP's example:
project({
dateParts: {
// This will split the date stored in `dateField` into parts
$dateToParts: {
date: "$dateField",
// This can be an Olson timezone, such as Europe/London, or
// a fixed offset, such as +0530 for India.
timezone: "+05:30"
}
}
})
.group({
_id: {
// Here we group by hour! Using these date parts grouping
// by hour/day/month/etc. is trivial - start with the year
// and add every unit greater than or equal to the target
// unit.
year: "$dateParts.year",
month: "$dateParts.month",
day: "$dateParts.day",
hour: "$dateParts.hour"
},
uniqueCount: {
$addToSet: "$mac"
}
})
.project({
_id: 0,
year: "$_id.year",
month: "$_id.month",
day: "$_id.day",
hour: "$_id.hour",
uniqueMacCount: { $size: "$uniqueCount" }
});
Alternatively, one might wish to assemble the date parts back to a date object. This is also very simple with the inverse $dateFromParts operator:
project({
_id: 0,
date: {
$dateFromParts: {
year: "$_id.year",
month: "$_id.month",
day: "$_id.day",
hour: "$_id.hour",
timezone: "+05:30"
}
},
uniqueMacCount: { $size: "$uniqueCount" }
})
The great thing here is that all the underlying dates remain in UTC and any returned dates are also in UTC.
Unfortunately, it seems that grouping by more unusual arbitrary ranges, such as half-day, might be harder. I haven't given it much thought however.
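One possible approach for buckets such as half-days, offered only as a sketch: floor the local hour from $dateToParts to the start of its bucket with $subtract and $mod. The names hourBucketExpr, hourBucket and halfDayGroupId are made up for this example, and the idea only works cleanly when the bucket size divides 24 evenly.

```javascript
// Expression that floors the local hour to the start of its bucket,
// e.g. bucketHours = 12 maps hours 0-11 to 0 and 12-23 to 12.
// Assumes a previous stage produced `dateParts` via $dateToParts.
function hourBucketExpr(bucketHours) {
  return {
    $subtract: [
      '$dateParts.hour',
      { $mod: ['$dateParts.hour', bucketHours] }
    ]
  };
}

// Plain JavaScript mirror of the same arithmetic, handy for checking.
const hourBucket = (hour, bucketHours) => hour - (hour % bucketHours);

// Candidate _id for the $group stage: one group per local half-day
const halfDayGroupId = {
  year: '$dateParts.year',
  month: '$dateParts.month',
  day: '$dateParts.day',
  halfDay: hourBucketExpr(12)
};
console.log(hourBucket(13, 12), JSON.stringify(halfDayGroupId));
```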

Maybe this will help someone coming to this question.
The $dateToString operator accepts a "timezone" property.
For example:
$dateToString: { format: "%Y-%m-%d %H", date: "$datefield", timezone: "Europe/London" }
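Applied to the query from the question, the $group stage might then look like the sketch below. The wrapper function groupByLocalHour is hypothetical, the field names are the ones used in the question, and the timezone option needs MongoDB 3.6 or later.

```javascript
// Build the $group stage from the question with a time zone applied.
// `timezone` may be an Olson name or a UTC offset such as "+05:30".
function groupByLocalHour(timezone) {
  return {
    $group: {
      _id: {
        date: {
          $dateToString: { format: '%Y-%m-%d %H', date: '$datefield', timezone }
        }
      },
      uniqueCount: { $addToSet: '$mac' }
    }
  };
}

const londonStage = groupByLocalHour('Europe/London');
console.log(JSON.stringify(londonStage, null, 2));
```

The resulting _id.date is a string in local time, so it sorts correctly only because the format runs from the largest unit to the smallest.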

Related

Querying inside a subdocument array in mongodb

Am really new to MongoDB or NoSQL database.
I have this userSchema schema
const postSchema = {
title: String,
posted_on: Date
}
const userSchema = {
name: String,
posts: [postSchema]
}
I want to retrieve the posts by a user in given range(/api/users/:userId/posts?from=date&to=date&limit=limit) using mongodb query. In a relational database, we generally create two different sets of tables and query the second table(posts) using some condition and get the required result.
How can we achieve the same in mongodb? I have tried using $elemMatch by referring this but it doesn't seem to work.
There are two ways to do this with the aggregation framework, which can do much more than find can.
With find we mostly select documents from a collection, or project to keep some fields of a selected document. Here you need only some members of an array, so aggregation is used.
Local way (solution at document level, no unwind etc.)
Query
filter the array and keep only posted_on > 1 and < 5
(I used numbers for simplicity; with dates it works the same way)
take the first 2 elements of the array (limit 2)
db.collection.aggregate([
{
"$match": {
"name": {
"$eq": "n1"
}
}
},
{
"$set": {
"posts": {
"$slice": [
{
"$filter": {
"input": "$posts",
"cond": {
"$and": [
{
"$gt": [
"$$this.posted_on",
1
]
},
{
"$lt": [
"$$this.posted_on",
5
]
}
]
}
}
},
2
]
}
}
}
])
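With real dates from the request, the same pipeline could be assembled as in the following sketch. The function name postsInRangePipeline and its parameters are assumptions for illustration; from, to and limit stand for the query-string values in the question, and an inclusive start with an exclusive end is used here.

```javascript
// Build the $filter/$slice pipeline for one user's posts within a date range.
function postsInRangePipeline(name, from, to, limit) {
  return [
    { $match: { name } },
    { $set: {
      posts: {
        $slice: [
          { $filter: {
            input: '$posts',
            cond: { $and: [
              { $gte: ['$$this.posted_on', new Date(from)] },
              { $lt: ['$$this.posted_on', new Date(to)] }
            ]}
          }},
          Number(limit) // query-string values arrive as strings
        ]
      }
    }}
  ];
}

const postsPipeline = postsInRangePipeline('n1', '2021-01-01', '2021-02-01', '2');
console.log(JSON.stringify(postsPipeline, null, 2));
```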
Unwind solution (solution at collection level)
(it's a bit smaller, and keeping things local is generally better, but in your case it doesn't matter)
Query
match user
unwind the array, and make each member the ROOT
match the dates > 1 and < 5
limit 2
db.collection.aggregate([
{
"$match": {
"name": {
"$eq": "n1"
}
}
},
{
"$unwind": {
"path": "$posts"
}
},
{
"$replaceRoot": {
"newRoot": "$posts"
}
},
{
"$match": {
"$and": [
{
"posted_on": {
"$gt": 1
}
},
{
"posted_on": {
"$lt": 5
}
}
]
}
},
{
"$limit": 2
}
])

Elasticsearch - find closest number when scoring results

I need a way to match the closest number of an elasticsearch document.
I want to use Elasticsearch to filter quantifiable attributes and have been able to achieve hard limits using range queries, except that results outside of that range are skipped. I would prefer the closest results across multiple filters to match.
const query = {
query: {
bool: {
should: [
{
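range: {
// field name was missing; "sales" is assumed from the example data below
sales: { gte: 5, lte: 15 }
}
},
{
range: {
// field name was missing; "year" is assumed from the example data below
year: { gte: 1979, lte: 1989 }
}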
}
]
}
}
}
const results = await client.search({
index: 'test',
body: query
})
Say I had some documents that had year and sales. In the snippet is a little example of how it would be done in javascript. It runs through the entire list and calculates a score, then based on that score it sorts them, at no point are results filtered out, they are just organized by relevance.
const data = [
{ "item": "one", "year": 1980, "sales": 20 },
{ "item": "two", "year": 1982, "sales": 12 },
{ "item": "three", "year": 1986, "sales": 6 },
{ "item": "four", "year": 1989, "sales": 4 },
{ "item": "five", "year": 1991, "sales": 6 }
]
const add = (a, b) => a + b
const findClosestMatch = (filters, data) => {
const scored = data.map(item => ({
...item,
// add the score to a copy of the data
_score: calculateDifferenceScore(filters, item)
}))
// mutate the scored array by sorting it
scored.sort((a, b) => a._score.total - b._score.total)
return scored
}
const calculateDifferenceScore = (filters, item) => {
const result = Object.keys(filters).reduce((acc, x) => ({
...acc,
// calculate the absolute difference between the filter and data point
[x]: Math.abs(filters[x] - item[x])
}), {})
// sum the total diffences
result.total = Object.values(result).reduce(add)
return result
}
console.log(
findClosestMatch({ sales: 10, year: 1984 }, data)
)
I'm trying to achieve the same thing in elasticsearch but having no luck when using a function_score query. eg
const query = {
query: {
function_score: {
functions: [
{
linear: {
"year": {
origin: 1984,
},
"sales": {
origin: 10,
}
}
}
]
}
}
}
const results = await client.search({
index: 'test',
body: query
})
There is no text to search, I'm using it for filtering by numbers only, am I doing something wrong or is this not what elastic search is made for and are there any better alternatives?
Using the above every document still has a default score, and I have not been able to get any filter to apply any modifiers to the score.
Thanks for any help. I'm new to Elasticsearch, so links to articles or areas of the documentation are appreciated!
You had the right idea, you're just missing a few fields in your query to make it work.
It should look like this:
{
"query": {
function_score: {
functions: [
{
linear: {
"year": {
origin: 1984,
scale: 1,
decay: 0.999
},
"sales": {
origin: 10,
scale: 1,
decay: 0.999
}
}
},
]
}
}
}
The scale field is mandatory as it tells Elasticsearch how quickly to decay the score; without it the query just fails.
The decay field is optional, but it defaults to 0.5, so with a scale of 1 a linear score drops to 0 just two units away from the origin. Only documents very close to the origin would get a non-zero score, which is not useful for us.
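To get a feel for how scale and decay interact, here is the linear decay formula from the Elasticsearch function_score documentation written out in plain JavaScript. The function name linearDecay is made up for this sketch, and it only mirrors the per-field score; it does not model how multiple functions or the query score are combined.

```javascript
// Linear decay as given in the Elasticsearch function_score docs:
//   score = max((s - max(0, |value - origin| - offset)) / s, 0)
//   where s = scale / (1 - decay)
function linearDecay(value, origin, scale, decay = 0.5, offset = 0) {
  const s = scale / (1 - decay);
  const distance = Math.max(0, Math.abs(value - origin) - offset);
  return Math.max((s - distance) / s, 0);
}

// With decay 0.999 the score falls off very slowly around the origin
console.log(linearDecay(1989, 1984, 1, 0.999));
// With the default decay the same document scores 0
console.log(linearDecay(1989, 1984, 1));
```

A decay close to 1 therefore keeps distant documents in play with a slightly lower score, which matches the "closest match wins, nothing filtered out" behaviour asked for.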
source docs.
I also recommend you limit the result size to 1 if you want only the top-scoring document; otherwise you'll have to add a sort phase (either in Elasticsearch or in code).
EDIT: (AVOID NULLS)
You can add a filter above the functions like so:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"bool": {
"filter": [
{
"bool": {
"must": [
{
"exists": {
"field": "year"
}
},
{
"exists": {
"field": "sales"
}
}
]
}
}
]
}
},
{
"match_all": {}
}
]
}
},
"functions": [
{
"linear": {
"year": {
"origin": 1999,
"scale": 1,
"decay": 0.999
},
"sales": {
"origin": 50,
"scale": 1,
"decay": 0.999
}
}
}
]
}
}
}
Notice I have a little hack going on using the match_all query. This is because the filter query sets the score to 0, so by adding match_all I reset it back to 1 for all matched documents.
This can also be achieved in a more "proper" way by altering the functions, a path I chose not to take.

How to calculate difference of lowest date and highest date among array of sub documents?

I have the following sub-documents:
experiences: [
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2010-10-13T00:00:00.000Z"),
"to" : ISODate("2012-10-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94b"),
"currentlyWorking" : false
},
...
...
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2014-10-14T00:00:00.000Z"),
"to" : ISODate("2015-12-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94c"),
"currentlyWorking" : false
},
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2017-10-13T00:00:00.000Z"),
"to" : null,
"_id" : ObjectId("59f8064e68d1f61441bec94d"),
"currentlyWorking" : true
},
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2008-10-14T00:00:00.000Z"),
"to" : ISODate("2009-12-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94c"),
"currentlyWorking" : false
},
]
As you can see, the entries are not necessarily in date order. The above data is per user. What I want is the total experience for each user, expressed in years. When the to field is null and currentlyWorking is true, it means the user is currently working at that company.
Aggregation
Using the aggregation framework you could apply $indexOfArray where you have it available:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$cond": [
{ "$eq": [{ "$indexOfArray": ["$experiences.to", null] }, -1] },
{ "$max": "$experiences.to" },
new Date()
]},
{ "$min": "$experiences.from" }
]
}
}}
])
Failing that as long as the "latest" is always the last in the array, using $arrayElemAt:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$cond": [
{ "$eq": [{ "$arrayElemAt": ["$experiences.to", -1] }, null] },
new Date(),
{ "$max": "$experiences.to" }
]},
{ "$min": "$experiences.from" }
]
}
}}
])
That's pretty much the most efficient way to do this, as a single pipeline stage applying the $min and $max operators. For $indexOfArray you would need MongoDB 3.4 at least, and for simply using $arrayElemAt you can have MongoDB 3.2, which is the minimal version you should be running in production environments anyway.
One pass, means it gets done fast with little overhead.
In brief, $min and $max allow you to extract the appropriate values directly from the array elements, being the "smallest" value of "from" and the "largest" value of "to" within the array. Where available, the $indexOfArray operator can return the matched index from a provided array ( in this case from "to" values ) where a specified value ( as null here ) exists. If it's there the index of that value is returned, and where it is not the value of -1 is returned indicating that it is not found.
We use $cond which is a "ternary" or if..then..else operator to determine that when the null is not found then you want the $max value from "to". Of course when it is found this is the else where the value of the current Date which is fed into the aggregation pipeline as an external parameter on execution is returned instead.
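The same decision logic can be mirrored in plain JavaScript, which may help when checking the pipeline results by hand. This is a sketch only: experienceSpanMs is a hypothetical helper operating on an already fetched experiences array shaped like the one in the question.

```javascript
// Plain JavaScript mirror of the $cond / $min / $max logic above.
// Returns the span in milliseconds from the earliest "from" to the latest "to",
// using `now` when any entry is still open-ended (to === null).
function experienceSpanMs(experiences, now = new Date()) {
  const starts = experiences.map(e => e.from.getTime());
  const hasOpenEnded = experiences.some(e => e.to === null);
  const end = hasOpenEnded
    ? now.getTime()
    : Math.max(...experiences.map(e => e.to.getTime()));
  return end - Math.min(...starts);
}

const sampleExperiences = [
  { from: new Date('2010-10-13T00:00:00Z'), to: new Date('2012-10-13T00:00:00Z') },
  { from: new Date('2008-10-14T00:00:00Z'), to: new Date('2009-12-13T00:00:00Z') }
];
console.log(experienceSpanMs(sampleExperiences));
```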
The alternate case for MongoDB 3.2 is that you instead "presume" the last element of your array is the most recent employment history item. It would generally be best practice to order these items so the most recent is either the "last" ( as seems to be indicated in your question ) or the "first" entry of the array. It would be logical to keep these entries in such order as opposed to relying on sorting the list at runtime.
So when using a "known" position such as "last", we can use the $arrayElemAt operator to return the value from the array at the specified position. Here it is -1 for the "last" element. The "first" element would be 0, and could arguably be applied to getting the "smallest" value of "from" as well, since you should have your array in order. Again $cond is used to transpose the values depending on whether null is returned. As an alternate to $max you can even use $ifNull to swap the values instead:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
{ "$min": "$experiences.from" }
]
}
}}
])
That operator essentially switches out the values returned if the response of the first condition is null. So since we are grabbing the value from the "last" element already, we can "presume" that this does mean the "largest" value of "to".
The $subtract is what actually returns the "difference", since when you "subtract" one date from another the difference is returned as the milliseconds value between the two. This is how BSON Dates are actually internally stored, and it's the common internal date storage of date formats being the "milliseconds since epoch".
If you want the interval in a specific duration such as "years", then it's a simple matter of applying the "date math" to change from the milliseconds difference between the date values. So adjust by dividing out from the interval ( also showing $arrayElemAt on the "from" just for completeness ):
Model.aggregate([
{ "$addFields": {
"difference": {
"$floor": {
"$divide": [
{ "$subtract": [
{ "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
{ "$arrayElemAt": ["$experiences.from", 0] }
]},
1000 * 60 * 60 * 24 * 365
]
}
}
}}
])
That uses $divide as a math operator, with the divisor built from 1000 milliseconds, 60 for each of seconds and minutes, 24 hours and 365 days. The $floor "rounds down" the number to remove the decimal places. You can do whatever you want there, but it "should" be done "inline" and not in separate stages, which simply add to processing overhead.
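The same arithmetic can be checked outside the database. This is a minimal sketch in plain JavaScript, assuming two made-up dates; the helper name msToWholeYears is hypothetical and not part of any driver API:

```javascript
// Milliseconds in a 365-day "year": the same divisor used in the pipeline above.
const MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365;

// Hypothetical helper: convert a millisecond difference to whole years,
// mirroring $floor applied to the result of $divide.
function msToWholeYears(ms) {
  return Math.floor(ms / MS_PER_YEAR);
}

// Subtracting one Date from another yields milliseconds, just like $subtract.
const from = new Date("2015-01-01T00:00:00Z"); // assumed sample value
const to = new Date("2017-06-01T00:00:00Z");   // assumed sample value
const differenceMs = to - from;

console.log(differenceMs, msToWholeYears(differenceMs));
```

Here the two dates are 882 days apart, so the whole-year result is 2.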
Of course, the presumption of 365 days is an "approximation" at best. If you want something more complete, then you can instead apply the date aggregation operators to the values to get a more accurate reading. So here, also applying $let to declare "variables" for later manipulation:
Model.aggregate([
{ "$addFields": {
"difference": {
"$let": {
"vars": {
"to": { "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
"from": { "$arrayElemAt": ["$experiences.from", 0] }
},
"in": {
"years": {
"$subtract": [
{ "$subtract": [
{ "$year": "$$to" },
{ "$year": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 1
}}
]
},
"months": {
"$add": [
{ "$subtract": [
{ "$month": "$$to" },
{ "$month": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 12
}}
]
},
"days": {
"$add": [
{ "$subtract": [
{ "$dayOfYear": "$$to" },
{ "$dayOfYear": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 365
}}
]
}
}
}
}
}}
])
Again that's a slight approximation on the days of the year. MongoDB 3.6 actually would allow you to test the "leap year" by implementing $dateFromParts to determine if 29th February was valid in the current year or not by assembling from the "pieces" we have available.
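The idea behind that leap-year test can be sketched in plain JavaScript: build 29th February for the year and see whether the date survives or rolls over into March. This is a client-side analogue rather than the $dateFromParts expression itself, and the names isLeapYear and daysInYear are hypothetical:

```javascript
// Hypothetical helper: a year is a leap year if 29 February exists in it.
// Date.UTC rolls an invalid 29 February over to 1 March, so we check the month.
function isLeapYear(year) {
  const candidate = new Date(Date.UTC(year, 1, 29)); // month index 1 = February
  return candidate.getUTCMonth() === 1;
}

// Could be used in place of the fixed 365 in the "days" calculation.
function daysInYear(year) {
  return isLeapYear(year) ? 366 : 365;
}

console.log(daysInYear(2016), daysInYear(2017));
```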
Work with returned data
Of course all the above is using the aggregation framework to determine the intervals from the array for each person. This would be the advised course if you were intending to "reduce" the data returned by essentially not returning the array items at all, or if you wanted these numbers for further aggregation in reporting to a larger "sum" or "average" statistic from the data.
If on the other hand you actually do want all the data returned for the person including the complete "experiences" array, then it's probably the best course of action to simply apply the calculations "after" all the data is returned from the server as you process each item returned.
The simple application of this would be to "merge" a new field into the results, just like $addFields does but on the "client" side instead:
Model.find().lean().cursor().map( doc =>
Object.assign(doc, {
"difference":
((doc.experiences.map(e => e.to).indexOf(null) === -1)
? Math.max.apply(null, doc.experiences.map(e => e.to))
: new Date() )
- Math.min.apply(null, doc.experiences.map(e => e.from))
})
).toArray((err, result) => {
// do something with result
})
That's just applying the same logic represented in the first aggregation example to "client" side processing of the result cursor. Since you are using mongoose, the .cursor() method actually returns a Cursor object from the underlying driver, which mongoose normally hides away for "convenience". Here we want it because it gives us access to some handy methods.
Cursor.map() is one such handy method, which allows us to apply a "transform" to the content returned from the server. Here we use Object.assign() to "merge" a new property into the returned document. We could alternately use Array.map() on the "array" returned by mongoose by "default", but processing inline looks a little cleaner, as well as being a bit more efficient.
In fact Array.map() is the main tool here in manipulation since where we applied statements like "$experiences.to" in the aggregation statement, we apply on the "client" using doc.experiences.map(e => e.to), which does the same thing "transforming" the array of objects into an "array of values" for the specified field instead.
This allows the same checking using Array.indexOf() against the array of values, and also the Math.min() and Math.max() are used in the same way, implementing apply() to use those "mapped" array values as the argument values to the functions.
Finally of course since we still have a Cursor being returned, we convert this back into the more typical form in which you would work with mongoose results, an "array", using Cursor.toArray(), which is exactly what mongoose does "under the hood" for you on its default requests.
Query.lean() is a mongoose modifier which basically says to return and expect "plain JavaScript Objects" as opposed to "mongoose documents" matched to the schema with applied methods, which are the default return. We want that because we are "manipulating" the result. Again the alternative is to do the manipulation "after" the default array is returned, and convert via .toObject(), which is present on all mongoose documents, in the event that "serializing virtual properties" is important to you.
So this is essentially a "mirror" of that first aggregation approach, yet applied to "client side" logic instead. As stated, it generally makes more sense to do it this way when you actually want ALL of the properties in the document in results anyway. The simple reason is that it makes no real sense to add "additional" data to the results "before" you return those from the server. So instead, simply apply the transform "after" the database returns them.
Also much like above, the same client transformation approaches can be applied as was demonstrated in ALL the aggregation examples. You can even employ external libraries for date manipulation which give you "helpers" for some of the "raw math" approaches here.
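As an illustration of carrying another of the aggregation variants over to the client, here is a minimal sketch of the $ifNull / $arrayElemAt logic in plain JavaScript. The helper name addDifference and the sample document are assumptions for illustration; it also assumes the array is ordered oldest first and that an open-ended entry has "to" set to null:

```javascript
// Hypothetical helper mirroring the $ifNull / $arrayElemAt pipeline on the client.
function addDifference(doc, now = new Date()) {
  const experiences = doc.experiences;
  const last = experiences[experiences.length - 1];
  const to = last.to == null ? now : last.to; // like $ifNull on the last "to"
  const from = experiences[0].from;           // like $arrayElemAt with index 0
  // Date minus Date gives milliseconds, the same as $subtract.
  return Object.assign({}, doc, { difference: to - from });
}

// Assumed sample document, not taken from the question.
const doc = {
  name: "example",
  experiences: [
    { from: new Date("2014-01-01T00:00:00Z"), to: new Date("2015-01-01T00:00:00Z") },
    { from: new Date("2015-02-01T00:00:00Z"), to: null }
  ]
};

const result = addDifference(doc, new Date("2016-01-01T00:00:00Z"));
console.log(result.difference);
```

Passing "now" as a parameter keeps the function deterministic, much as the current Date is fed into the pipeline as an external value.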
You can achieve this with the aggregation framework like this:
db.collection.aggregate([
{
$unwind:"$experiences"
},
{
$sort:{
"experiences.from":1
}
},
{
$group:{
_id:null,
"from":{
$first:"$experiences.from"
},
"to":{
$last:{
$ifNull:[
"$experiences.to",
new Date()
]
}
}
}
},
{
$project:{
"diff":{
$subtract:[
"$to",
"$from"
]
}
}
}
])
This returns:
{ "_id" : null, "diff" : NumberLong("65357827142") }
Which is the difference in ms between the two dates, see $subtract for details
You can get the year by adding this additional stage to the end of the pipeline:
{
$project:{
"year":{
$floor:{
$divide:[
"$diff",
1000*60*60*24*365
]
}
}
}
}
This would then return:
{ "_id" : null, "year" : 2 }

Integrate between two collections

I have two collections:
'DBVisit_DB':
"_id" : ObjectId("582bc54958f2245b05b455c6"),
"visitEnd" : NumberLong(1479252157766),
"visitStart" : NumberLong(1479249815749),
"fuseLocation" : {.... }
"userId" : "A926D9E4853196A98D1E4AC6006DAF00#1927cc81cfcf7a467e9d4f4ac7a1534b",
"modificationTimeInMillis" : NumberLong(1479263563107),
"objectId" : "C4B4CE9B-3AF1-42BC-891C-C8ABB0F8DC40",
"creationTime" : NumberLong(1479252167996),
"lastUserInteractionTime" : NumberLong(1479252167996)
}
'device_data':
"_id" : { "$binary" : "AN6GmE7Thi+Sd/dpLRjIilgsV/4AAAg=", "$type" : "00" },
"auditVersion" : "1.0",
"currentTime" : NumberLong(1479301118381),
"data" : {
"networkOperatorName" : "Cellcom",...
},
"timezone" : "Asia/Jerusalem",
"collectionAlias" : "DEVICE_DATA",
"shortDate" : 17121,
"userId" : "00DE86984ED3862F9277F7692D18C88A#1927cc81cfcf7a467e9d4f4ac7a1534b"
In DBVisit_DB I need to show all visits, only for Cellcom users, which took more than 1 hour (visitEnd - visitStart > 1 hour), by matching the userId value in both collections.
this is what I did so far:
//create an array that contains all the rows that "Cellcom" is their networkOperatorName
var users = db.device_data.find({ "data.networkOperatorName": "Cellcom" },{ userId: 1, _id: 0}).toArray();
//create an array that contains all the rows that the visit time is more then one hour
var time = db.DBVisit_DB.find( { $where: function() {
timePassed = new Date(this.visitEnd - this.visitStart).getHours();
return timePassed > 1}},
{ userId: 1, _id: 0, "visitEnd" : 1, "visitStart":1} ).toArray();
//merge between the two arrays
var result = [];
var i, j;
for (i = 0; i < time; i++) {
for (j = 0; j < users; j++) {
if (time[i].userId == users[j].userId) {
result.push(time[i]);
}
}
}
for (var i = 0; i < result.length; i++) {
print(result[i].userId);
}
But it doesn't show anything, although I know for sure that there are ids that can be found in both of the arrays I created.
*for verification: I'm not 100% sure that I calculated the visit time correctly.
btw I'm new to both javaScript and mongodb
********update********
in the "device_data" there are different rows but with the same "userId" field.
in the "device_data" I have also the "data.networkOperatorName" field which contains different types of cellular companies.
I've been asked to show all "Cellcom" users that based on the 'DBVisit_DB' collection been connected more then an hour means,
based on the field "visitEnd" and "visitStart" I need to know if ("visitEnd" - "visitStart" > 1)
{ "userId" : "457A7A0097F83074DA5E05F7E05BEA1D#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "E0F5C56AC227972CFAFC9124E039F0DE#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "309FA12926EC3EB49EB9AE40B6078109#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "B10420C71798F1E8768ACCF3B5E378D0#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "EE5C11AD6BFBC9644AF3C742097C531C#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "20EA1468672EFA6793A02149623DA2C4#1927cc81cfcf7a467e9d4f4ac7a1534b" }
each array contains this format, after my queries, I need to merge them into one. that I'll have the intersection between them.
thanks a lot for all the help!
With the aggregation framework, you can achieve the desired result using the $lookup operator, which performs a "left-join" against another collection in the same database, together with the $redact pipeline operator, which can evaluate arithmetic expressions on the timestamps, here converting the difference to minutes so that it can be tested.
To show a simple example how useful the above aggregate operators are, you can run the following pipeline on the DBVisit_DB collection to see the actual time difference in minutes:
db.getCollection('DBVisit_DB').aggregate([
{
"$project": {
"visitStart": { "$add": [ "$visitStart", new Date(0) ] },
"visitEnd": { "$add": [ "$visitEnd", new Date(0) ] },
"timeDiffInMinutes": {
"$divide": [
{ "$subtract": ["$visitEnd", "$visitStart"] },
1000 * 60
]
},
"isMoreThanHour": {
"$gt": [
{
"$divide": [
{ "$subtract": ["$visitEnd", "$visitStart"] },
1000 * 60
]
}, 60
]
}
}
}
])
Sample Output
{
"_id" : ObjectId("582bc54958f2245b05b455c6"),
"visitEnd" : ISODate("2016-11-15T23:22:37.766Z"),
"visitStart" : ISODate("2016-11-15T22:43:35.749Z"),
"timeDiffInMinutes" : 39.0336166666667,
"isMoreThanHour" : false
}
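The figures in that sample output can be reproduced by hand. A minimal sketch in plain JavaScript, using the visitStart and visitEnd values from the document posted in the question:

```javascript
// Values taken from the sample DBVisit_DB document in the question.
const visitStart = 1479249815749;
const visitEnd = 1479252157766;

// Same arithmetic as $subtract followed by $divide by 1000 * 60.
const timeDiffInMinutes = (visitEnd - visitStart) / (1000 * 60);

// Same comparison as $gt against 60 minutes.
const isMoreThanHour = timeDiffInMinutes > 60;

console.log(timeDiffInMinutes, isMoreThanHour);
```

The visit lasted roughly 39 minutes, which is why this particular document would not pass a "more than an hour" filter.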
Now that you understand how the above operators work, you can apply them in the following example. The pipeline uses the device_data collection as the main collection: it first filters the documents on the specified field using $match and then joins to the DBVisit_DB collection using $lookup. $redact evaluates the logical condition ( visits which are more than an hour long ) within $cond and uses the special system variable $$KEEP to "keep" the document where the condition is true, or $$PRUNE to "discard" the document where the condition is false.
The arithmetic operators $divide and $subtract allow you to calculate the difference between the two timestamp fields as minutes, and the $gt logical operator then evaluates the condition:
db.device_data.aggregate([
/* Filter input documents */
{ "$match": { "data.networkOperatorName": "Cellcom" } },
/* Do a left-join to DBVisit_DB collection */
{
"$lookup": {
"from": "DBVisit_DB",
"localField": "userId",
"foreignField": "userId",
"as": "userVisits"
}
},
/* Flatten resulting array */
{ "$unwind": "$userVisits" },
/* Redact documents */
{
"$redact": {
"$cond": [
{
"$gt": [
{
"$divide": [
{ "$subtract": [
"$userVisits.visitEnd",
"$userVisits.visitStart"
] },
1000 * 60
]
},
60
]
},
"$$KEEP",
"$$PRUNE"
]
}
}
])
There are a couple of things incorrect in your JavaScript.
Replace time and users with time.length and users.length in the for loop conditions.
Your timePassed calculation should compare the raw millisecond difference, since getHours() returns the hour of day of a Date built from that difference rather than the elapsed hours:
timePassed = this.visitEnd - this.visitStart
return timePassed > 3600000
You also have a couple of data-related issues.
There is no matching userId, and the difference between visitEnd and visitStart is less than an hour, for the documents you posted in the question.
For a MongoDB-based query you should check out the other answer.
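Putting those two fixes together, the client-side merge could look something like the sketch below. The helper name longVisitsForUsers and the sample arrays are assumptions for illustration; the arrays are only shaped like the results of the two find() calls in the question:

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000; // 3600000

// Hypothetical helper: keep visits longer than an hour whose userId
// belongs to one of the given users.
function longVisitsForUsers(visits, users) {
  // A Set gives fast lookups and collapses repeated userId values.
  const userIds = new Set(users.map(u => u.userId));
  return visits.filter(v =>
    v.visitEnd - v.visitStart > ONE_HOUR_MS && userIds.has(v.userId)
  );
}

// Assumed sample data, not taken from the real collections.
const users = [{ userId: "A" }, { userId: "B" }, { userId: "A" }];
const visits = [
  { userId: "A", visitStart: 0, visitEnd: 2 * ONE_HOUR_MS }, // kept
  { userId: "A", visitStart: 0, visitEnd: 30 * 60 * 1000 },  // too short
  { userId: "C", visitStart: 0, visitEnd: 3 * ONE_HOUR_MS }  // userId not in users
];

const result = longVisitsForUsers(visits, users);
result.forEach(v => console.log(v.userId));
```

Using a Set rather than nested loops also avoids pushing the same visit more than once when device_data holds several rows with the same userId.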

Convert string to ISODate in MongoDB

I am new to MongoDB and I am stuck on the String to Date conversion. In the db the date item is stored in String type as "date":"2015-06-16T17:50:30.081Z"
I want to group the docs by date and calculate the sum for each day, so I have to extract the year, month and day from the date string and wipe off the hours, minutes and seconds. I have tried multiple ways, but they either return a date of 1970-01-01 or the current date.
Moreover, I want to convert the following mongo query into python code, which get me the same problem, I am not able to call a javascript function in python, and the datetime can not parse the mongo syntax $date either.
I have tried:
new Date("2015-06-16T17:50:30.081Z")
new Date(Date.parse("2015-06-16T17:50:30.081Z"))
etc...
I am perfectly fine if the string is given in JavaScript or in Python, since I know more than one way to parse it. However, I have no idea how to do it in a MongoDB query.
db.collection.aggregate([
{
//somthing
},
{
'$group':{
'_id':{
'month': (new Date(Date.parse('$approTime'))).getMonth(),
'day': (new Date(Date.parse('$approTime'))).getDate(),
'year': (new Date(Date.parse('$approTime'))).getFullYear(),
'countries':'$countries'
},
'count': {'$sum':1}
}
}
])
If you can be assured of the format of the input date string AND you are just trying to get a count of unique YYYYMMDD, then just $project the substring and group on it:
var data = [
{ "name": "buzz", "d1": "2015-06-16T17:50:30.081Z"},
{ "name": "matt", "d1": "2018-06-16T17:50:30.081Z"},
{ "name": "bob", "d1": "2018-06-16T17:50:30.081Z"},
{ "name": "corn", "d1": "2019-06-16T17:50:30.081Z"},
];
db.foo.drop();
db.foo.insert(data);
db.foo.aggregate([
{ "$project": {
"xd": { "$substr": [ "$d1", 0, 10 ]}
}
},
{ "$group": {
"_id": "$xd",
"n": {$sum: 1}
}
}
]);
{ "_id" : "2019-06-16", "n" : 1 }
{ "_id" : "2018-06-16", "n" : 2 }
{ "_id" : "2015-06-16", "n" : 1 }
Starting in Mongo 4.0, you can use "$toDate" to convert a string to a date:
// { name: "buzz", d1: "2015-06-16T17:50:30.081Z" }
// { name: "matt", d1: "2018-06-16T17:50:30.081Z" }
// { name: "bob", d1: "2018-06-16T17:50:30.081Z" }
// { name: "corn", d1: "2019-06-16T17:50:30.081Z" }
db.collection.aggregate([
{ $group: {
_id: { $dateToString: { date: { $toDate: "$d1" }, format: "%Y-%m-%d" } },
n: { $sum: 1 }
}}
])
// { _id: "2015-06-16", n: 1 }
// { _id: "2018-06-16", n: 2 }
// { _id: "2019-06-16", n: 1 }
Within the group stage, this:
first converts strings (such as "2015-06-16T17:50:30.081Z") to date objects (ISODate("2015-06-16T17:50:30.081Z")) using the "$toDate" operator.
then converts the converted date (such as ISODate("2015-06-16T17:50:30.081Z")) back to string ("2015-06-16") but this time with this format "%Y-%m-%d", using the $dateToString operator.
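For comparison, the two conversion steps and the count can be sketched in plain JavaScript. This is only a client-side analogue of the $group stage, using the sample documents from this answer; the helper names toDayString and countByDay are hypothetical:

```javascript
// Hypothetical helper: parse an ISO 8601 string and format it as YYYY-MM-DD in UTC,
// roughly what $toDate followed by $dateToString with "%Y-%m-%d" produces.
function toDayString(isoString) {
  return new Date(isoString).toISOString().slice(0, 10);
}

// Count documents per day, like the $group stage with { $sum: 1 }.
function countByDay(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const day = toDayString(doc.d1);
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  return counts;
}

// Sample documents from the answer above.
const docs = [
  { name: "buzz", d1: "2015-06-16T17:50:30.081Z" },
  { name: "matt", d1: "2018-06-16T17:50:30.081Z" },
  { name: "bob", d1: "2018-06-16T17:50:30.081Z" },
  { name: "corn", d1: "2019-06-16T17:50:30.081Z" }
];

console.log(countByDay(docs));
```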
