I'm writing a migration script to update some fields of a collection, say collection2.
In our collections, we store Japanese dates with the following format in each document:
"date" : { "era" : 4, "year" : 25, "month" : 11, "day" : 25 }// i.e `25-11-2014`
Now I'm looking for an easy way to get all the documents in the collection with date > 1-10-2014, i.e.
date > { "era" : 4, "year" : 25, "month" : 10, "day" : 1 }
The code works well, but I feel it could be optimised and I don't know how. It does the following:
iterate over collection1 using forEach and extract each document's date
check that collection1.date > 1-10-2014
copy some document fields from collection2 and update them
db.col1.find({ name: "x123" }).forEach(function(doc){
if(hasValidDate(doc.date1)){
db.col2.find({col1_id:doc._id}).forEach(function(doc2){
var copyobj = {doc2.x1, doc2.x2, ...};
db.col2.update({col1_id:doc._id}, copyobj);
});
}
});
function hasValidDate(date){
return (date.era == 4 && date.year >= 26 &&
(date.month >= 10 && date.day >= 1))?true:false;
}
You could try including the actual date filtering within your find() query:
db.col1.find(
{
"name": "x123",
"date.era": 4,
"date.year": { "$gte": 26 },
"date.month": { "$gte": 10 },
"date.day": { "$gte": 1 },
}
).forEach(function(doc){
var copyobj = { "$set": {"x1": "1", "x2": "3", ...} };
db.col2.update({col1_id: doc._id}, copyobj);
});
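Note that the filter above tests each field independently, so a later date such as era 4, year 27, month 1 would be rejected because its month is below 10. A minimal sketch of a field-by-field comparison follows, assuming dates are ordered by era, then year, month and day. The helper names `compareJpDate` and `afterDateQuery` are hypothetical, and the cutoff is taken from `hasValidDate` in the question.

```javascript
// Hypothetical helpers, not part of the original post.
// Compare two { era, year, month, day } objects, most significant field first.
// Returns a negative number, 0, or a positive number.
function compareJpDate(a, b) {
  const keys = ["era", "year", "month", "day"];
  for (const key of keys) {
    if (a[key] !== b[key]) return a[key] - b[key];
  }
  return 0;
}

// Build a find() filter that selects documents whose `field` is strictly later than `min`.
function afterDateQuery(field, min) {
  const f = (k) => field + "." + k;
  return {
    $or: [
      { [f("era")]: { $gt: min.era } },
      { [f("era")]: min.era, [f("year")]: { $gt: min.year } },
      { [f("era")]: min.era, [f("year")]: min.year, [f("month")]: { $gt: min.month } },
      { [f("era")]: min.era, [f("year")]: min.year, [f("month")]: min.month, [f("day")]: { $gt: min.day } },
    ],
  };
}

// Assumed cutoff, mirroring hasValidDate (era 4, year 26, month 10, day 1).
const cutoff = { era: 4, year: 26, month: 10, day: 1 };
const query = afterDateQuery("date", cutoff);
```

The resulting object could be merged with the name condition, for example `db.col1.find(Object.assign({ name: "x123" }, query))`.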
I have a collection with the following structure:
{
"_id" : "Pd2fl7xcT3iWEmpAafv4DA",
"slot" : 1,
"stat" : [
{
"unitStat" : "5",
"value" : 13
},
{
"unitStat" : "18",
"value" : 1.96
},
{
"unitStat" : "28",
"value" : 1373
},
{
"unitStat" : "41",
"roll" : 2,
"value" : 69
}
]
}
I want to get 5 sorted objects (by any unitStat type) for every slot.
At the moment I can do this with 6 calls to the db, but that isn't a good idea.
I tried to use aggregation, but I could only get it to work for one slot:
db.collection.aggregate(
{
$match: {
slot: 1,
secondaryStat: {
$elemMatch: {
unitStat:'5'
}
}
}
},
{
$unwind: '$secondaryStat'
},
{
$match: {
'secondaryStat.unitStat' : '5'
}
},
{
$sort: {
'secondaryStat.value': -1
}
},
{
$limit: 5
}
)
Can I find, for example, the top 5 sorted objects from 6 different slots?
The following query can get us the expected output:
db.collection.aggregate([
{
$unwind:"$stat"
},
{
$match:{
"stat.unitStat":"5"
}
},
{
$sort:{
"slot":1,
"stat.value":1
}
},
{
$group:{
"_id":"$slot",
"slot":{
$first:"$slot"
},
"stat":{
$push:"$stat"
}
}
},
{
$project:{
"_id":0,
"slot":1,
"stat":{
$slice:["$stat",0,5]
}
}
}
]).pretty()
Aggregation stage details:
Stage I: Unwind the stat array.
Stage II: Filter unitStat for any specified value, "5" in this case.
Stage III: Sort the data in ascending order on the basis of slot and stat.value.
Stage IV: Group the data back on the basis of slot and push all filtered stat entries into an array named 'stat'.
Stage V: Slice the stat array to the specified length, 5 in this case.
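The stages above can be imitated on an in-memory array to see what each one contributes. This is only a sketch in plain JavaScript, not the server-side implementation; `topPerSlot` and the sample documents are hypothetical.

```javascript
// In-memory imitation of the pipeline stages; `topPerSlot` is a hypothetical helper.
function topPerSlot(docs, unitStat, limit) {
  const groups = new Map();
  for (const doc of docs) {
    for (const stat of doc.stat) {                 // Stage I: $unwind
      if (stat.unitStat !== unitStat) continue;    // Stage II: $match
      if (!groups.has(doc.slot)) groups.set(doc.slot, []);
      groups.get(doc.slot).push(stat);             // Stage IV: $group with $push
    }
  }
  return [...groups.entries()]
    .sort((a, b) => a[0] - b[0])                   // Stage III: order by slot
    .map(([slot, stats]) => ({
      slot,
      // Stage III (value order, ascending) followed by Stage V ($slice)
      stat: stats.sort((a, b) => a.value - b.value).slice(0, limit),
    }));
}

// Made-up sample data with the same shape as the documents in the question.
const sample = [
  { slot: 1, stat: [{ unitStat: "5", value: 13 }, { unitStat: "18", value: 1.96 }] },
  { slot: 1, stat: [{ unitStat: "5", value: 7 }] },
  { slot: 2, stat: [{ unitStat: "5", value: 21 }] },
];
const result = topPerSlot(sample, "5", 5);
```

To get the highest values first, as the question's `-1` sort does, the comparison would become `b.value - a.value`, which corresponds to `"stat.value": -1` in the pipeline.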
I have two collections:
'DBVisit_DB':
{
"_id" : ObjectId("582bc54958f2245b05b455c6"),
"visitEnd" : NumberLong(1479252157766),
"visitStart" : NumberLong(1479249815749),
"fuseLocation" : {.... },
"userId" : "A926D9E4853196A98D1E4AC6006DAF00#1927cc81cfcf7a467e9d4f4ac7a1534b",
"modificationTimeInMillis" : NumberLong(1479263563107),
"objectId" : "C4B4CE9B-3AF1-42BC-891C-C8ABB0F8DC40",
"creationTime" : NumberLong(1479252167996),
"lastUserInteractionTime" : NumberLong(1479252167996)
}
'device_data':
{
"_id" : { "$binary" : "AN6GmE7Thi+Sd/dpLRjIilgsV/4AAAg=", "$type" : "00" },
"auditVersion" : "1.0",
"currentTime" : NumberLong(1479301118381),
"data" : {
"networkOperatorName" : "Cellcom",...
},
"timezone" : "Asia/Jerusalem",
"collectionAlias" : "DEVICE_DATA",
"shortDate" : 17121,
"userId" : "00DE86984ED3862F9277F7692D18C88A#1927cc81cfcf7a467e9d4f4ac7a1534b"
}
In DBVisit_DB I need to show all visits that took more than 1 hour (visitEnd - visitStart > 1 hour), but only for Cellcom users, matching on the userId value in both collections.
This is what I did so far:
//create an array that contains all the rows that "Cellcom" is their networkOperatorName
var users = db.device_data.find({ "data.networkOperatorName": "Cellcom" },{ userId: 1, _id: 0}).toArray();
//create an array that contains all the rows that the visit time is more then one hour
var time = db.DBVisit_DB.find( { $where: function() {
timePassed = new Date(this.visitEnd - this.visitStart).getHours();
return timePassed > 1}},
{ userId: 1, _id: 0, "visitEnd" : 1, "visitStart":1} ).toArray();
//merge between the two arrays
var result = [];
var i, j;
for (i = 0; i < time; i++) {
for (j = 0; j < users; j++) {
if (time[i].userId == users[j].userId) {
result.push(time[i]);
}
}
}
for (var i = 0; i < result.length; i++) {
print(result[i].userId);
}
But it doesn't show anything, although I know for sure that there are ids that can be found in both of the arrays I created.
*For verification: I'm not 100% sure that I calculated the visit time correctly.
Btw, I'm new to both JavaScript and MongoDB.
********update********
In "device_data" there are different rows with the same "userId" field.
In "device_data" I also have the "data.networkOperatorName" field, which contains different cellular companies.
I've been asked to show all "Cellcom" users that, based on the 'DBVisit_DB' collection, have been connected for more than an hour. That is,
based on the fields "visitEnd" and "visitStart", I need to know if ("visitEnd" - "visitStart" > 1 hour).
{ "userId" : "457A7A0097F83074DA5E05F7E05BEA1D#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "E0F5C56AC227972CFAFC9124E039F0DE#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "309FA12926EC3EB49EB9AE40B6078109#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "B10420C71798F1E8768ACCF3B5E378D0#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "EE5C11AD6BFBC9644AF3C742097C531C#1927cc81cfcf7a467e9d4f4ac7a1534b" }
{ "userId" : "20EA1468672EFA6793A02149623DA2C4#1927cc81cfcf7a467e9d4f4ac7a1534b" }
Each array has this format after my queries. I need to merge them into one so that I get the intersection between them.
Thanks a lot for all the help!
With the aggregation framework, you can achieve the desired result using the $lookup operator, which lets you do a "left-join" on collections in the same database, together with the $redact pipeline operator, which can evaluate arithmetic operators that convert the timestamp difference to minutes so you can query on it.
To show a simple example how useful the above aggregate operators are, you can run the following pipeline on the DBVisit_DB collection to see the actual time difference in minutes:
db.getCollection('DBVisit_DB').aggregate([
{
"$project": {
"visitStart": { "$add": [ "$visitStart", new Date(0) ] },
"visitEnd": { "$add": [ "$visitEnd", new Date(0) ] },
"timeDiffInMinutes": {
"$divide": [
{ "$subtract": ["$visitEnd", "$visitStart"] },
1000 * 60
]
},
"isMoreThanHour": {
"$gt": [
{
"$divide": [
{ "$subtract": ["$visitEnd", "$visitStart"] },
1000 * 60
]
}, 60
]
}
}
}
])
Sample Output
{
"_id" : ObjectId("582bc54958f2245b05b455c6"),
"visitEnd" : ISODate("2016-11-15T23:22:37.766Z"),
"visitStart" : ISODate("2016-11-15T22:43:35.749Z"),
"timeDiffInMinutes" : 39.0336166666667,
"isMoreThanHour" : false
}
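The arithmetic behind timeDiffInMinutes and isMoreThanHour can be checked in plain JavaScript. This is a small sketch using the visitStart and visitEnd values from the sample document in the question; the constant name `MS_PER_MINUTE` is only illustrative.

```javascript
// Millisecond timestamps taken from the sample DBVisit_DB document.
const visitStart = 1479249815749;
const visitEnd = 1479252157766;

// Same arithmetic as the $subtract / $divide / $gt expressions in the pipeline.
const MS_PER_MINUTE = 1000 * 60;
const timeDiffInMinutes = (visitEnd - visitStart) / MS_PER_MINUTE;
const isMoreThanHour = timeDiffInMinutes > 60;

console.log(timeDiffInMinutes.toFixed(4), isMoreThanHour);
```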
Now that you understand how these operators work, you can apply them in the following example. The pipeline uses the device_data collection as the main collection, first filters the documents on the specified field using $match, and then joins to the DBVisit_DB collection using $lookup. $redact evaluates the logical condition (visits longer than an hour) within $cond and uses the special system variable $$KEEP to "keep" a document where the condition is true or $$PRUNE to "discard" a document where it is false.
The arithmetic operators $divide and $subtract allow you to calculate the difference between the two timestamp fields in minutes, and the $gt comparison operator then evaluates the condition:
db.device_data.aggregate([
/* Filter input documents */
{ "$match": { "data.networkOperatorName": "Cellcom" } },
/* Do a left-join to DBVisit_DB collection */
{
"$lookup": {
"from": "DBVisit_DB",
"localField": "userId",
"foreignField": "userId",
"as": "userVisits"
}
},
/* Flatten resulting array */
{ "$unwind": "$userVisits" },
/* Redact documents */
{
"$redact": {
"$cond": [
{
"$gt": [
{
"$divide": [
{ "$subtract": [
"$userVisits.visitEnd",
"$userVisits.visitStart"
] },
1000 * 60
]
},
60
]
},
"$$KEEP",
"$$PRUNE"
]
}
}
])
There are a couple of things incorrect in your JavaScript.
Replace the time and users conditions with time.length and users.length in the for loops.
Your timePassed calculation should be
timePassed = this.visitEnd - this.visitStart
return timePassed > 3600000
You also have a couple of data-related issues.
There is no matching userId, and the difference between visitEnd and visitStart is less than an hour, for the documents you posted in the question.
For a MongoDB-based query, check out the other answer.
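Put together, the two fixes can be sketched as follows. The duration is compared as raw milliseconds because `new Date(diff).getHours()` returns an hour of the day in the local timezone, not an elapsed time. The helper name `cellcomVisitsOverAnHour` and the sample arrays are hypothetical stand-ins for the results of the two `find(...).toArray()` calls.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000; // 3600000, as in the corrected check above

// Hypothetical helper: `users` and `visits` stand in for the arrays
// returned by the two find(...).toArray() calls in the question.
function cellcomVisitsOverAnHour(users, visits) {
  // A Set gives constant-time lookups and ignores repeated userIds.
  const cellcomIds = new Set(users.map((u) => u.userId));
  return visits.filter(
    (v) => v.visitEnd - v.visitStart > ONE_HOUR_MS && cellcomIds.has(v.userId)
  );
}

// Made-up sample data for illustration only.
const users = [{ userId: "A#1" }, { userId: "A#1" }, { userId: "B#1" }];
const visits = [
  { userId: "A#1", visitStart: 0, visitEnd: 2 * ONE_HOUR_MS },
  { userId: "B#1", visitStart: 0, visitEnd: 39 * 60 * 1000 },
  { userId: "C#1", visitStart: 0, visitEnd: 3 * ONE_HOUR_MS },
];
const matched = cellcomVisitsOverAnHour(users, visits);
matched.forEach((v) => console.log(v.userId));
```

Using a Set instead of nested loops also avoids pushing the same visit several times when device_data holds more than one row per userId.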
I need to get a list of week ranges for all records in my MongoDB. When I click on a week range, it should display only the records for that week range. Clicking on the week range sends the ID of the week (let's say 42, i.e. the 42nd week of 2015), and the query should return those results.
Question: How can I query for a set of records given a week number and year? This should work, right?
SCHEMA:
var orderSchema = mongoose.Schema({
date: Date, //ISO date
request: {
headers : {
...
First: Get all week IDs for all Objects:
var query = Order.aggregate(
[
{
$project:
{
week:
{
$week: '$date'
}
}
},
{
$group:
{
_id: null,
distinctDate:
{
$addToSet:
{
week: '$week'
}
}
}
}
]
);
Result:
distinctDate: Array[35]
0: Object
week: 40
1: Object
week: 37
...
Convert to week ranges using MomentJS and display:
data.forEach(function(v, k) {
$scope.weekRanges.push(getWeekRange(v.week));
});
function getWeekRange(weekNum) {
var monday = moment().day("Monday").isoWeek(weekNum).format('MM-DD-YYYY');
var sunday = moment().day("Sunday").isoWeek(weekNum).format('MM-DD-YYYY');
...
Output:
Week
10-12-2015 to 10-18-2015 //week ID 42
10-05-2015 to 10-11-2015 //week ID 41
09-28-2015 to 10-04-2015 ...
...
Second: Click on week range and get Objects Per Week ID:
var year = 2015;
var weekID = weekParamID; //42
if (!Order) {
Order = mongoose.model('Order', orderSchema);
}
var query = Order.aggregate(
{
$project:
{
cust_ID : '$request.headers.custID',
cost : '$response.body.pricing.cost',
year :
{
$year: '$date'
},
month :
{
$month: '$date'
},
week:
{
$week: '$date'
},
day:
{
$dayOfMonth: '$date'
}
}
},
{
$match:
{
year : year, //2015
week : weekID //42
}
}
);
And if I click on Week Range 10-12-2015 to 10-18-2015 (week ID 42), I get results with dates outside of the range (10-19-2015):
10-19-2015 Order info
10-18-2015 Order info
10-19-2015 Order info
Using MongoDB command line:
db.mycollection.aggregate({ $project: { week: { $week: '$date' }, day: { $dayOfMonth: '$date' } } }, { $match: { week: 42 } })
Results:
{ "_id" : "1bd482f6759b", "week" : 42, "day" : 19 } //shouldn't exceed week range
{ "_id" : "b3d38759", "week" : 42, "day" : 19 }
EDIT: Update
So there is a discrepancy between MongoDB weeks (which start on Sunday) and Moment.js ISO weeks (which start on Monday).
This SO post suggests subtracting the dates from the query so the Mongo date starts on Monday:
{
$project:
{
week: { $week: [ "$datetime" ] },
dayOfWeek:{$dayOfWeek:["$datetime"]}}
},
{
$project:
{
week:{$cond:[{$eq:["$dayOfWeek",1]},{$subtract:["$week",1]},'$week']}
}
}
I implemented this with my query, but now it's not returning two fields that I need:
cust_ID : '$request.headers.custID',
cost : '$response.body.pricing.cost'
Query:
db.mycollection.aggregate(
{
$project:
{
cust_ID : '$request.headers.custID',
cost : '$response.body.pricing.cost',
week:
{
$week: ['$date']
},
dayOfWeek:
{
$dayOfWeek: ['$date']
}
}
},
{
$project:
{
week: {
$cond: [
{
$eq: [
"$dayOfWeek", 1
]
},
{
$subtract: [
"$week", 1
]
}, '$week'
]
}
}
},
{
$match:
{
week : 42
}
}
);
Results:
{ "_id" : "387e2", "week" : 42 }
{ "_id" : "ef269f6341", "week" : 42 }
{ "_id" : "17482f6759b", "week" : 42 }
{ "_id" : "7123d38759", "week" : 42 }
{ "_id" : "ff89b1fb", "week" : 42 }
It's not returning the fields I specified in the first $project.
The MongoDB $week operator considers weeks to begin on Sunday, see docs:
Weeks begin on Sundays, and week 1 begins with the first Sunday of the
year... This behavior is the same as the “%U” operator to the strftime
standard library function.
Moment.js's isoWeek() uses the ISO week, which considers weeks to begin on Monday. It also differs in that it considers week 1 to be the first week with a Thursday in it.
This discrepancy could explain the behaviour you are seeing.
E.g. if I save this doc in MongoDB, which is a Monday:
db.test.save({ "date" : new ISODate("2015-10-19T10:10:10Z") })
then run your aggregation query above, I get week 42.
But then if I run the following:
console.log(moment().day("Monday").isoWeek(42))
I get the date below, which is not the one I originally saved in MongoDB, even though it is the Monday of the week number MongoDB reported.
Mon Oct 12 2015
How to fix it I guess depends on which definition of week you need.
If you are happy with the MongoDB $week definition, it's probably easy to find/write an alternative implementation to convert the week number to the corresponding date. Here is one library that adds strftime support to Moment.js:
https://github.com/benjaminoakes/moment-strftime
If you want to use the ISO format, it's more complicated. As per your edit above you'll need to account for the week start difference. But you'll also need to account for the week number at start of year difference. This difference means that the strftime week number can have a week 0 while ISO always starts on week 1. For 2015 it looks like you need to add 1 week on to the strftime week to get the ISO week, as well as accounting for the week start day, but that won't be reliable in general.
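To make the two definitions concrete, here is a minimal sketch in plain JavaScript that computes both week numbers for a date, working in UTC. The function names `sundayWeek` and `isoWeek` are hypothetical; `sundayWeek` follows the strftime "%U" rule quoted above and `isoWeek` follows the ISO 8601 rule (week 1 contains the year's first Thursday).

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Week number as in strftime "%U": weeks start on Sunday, and days before
// the first Sunday of the year fall in week 0.
function sundayWeek(date) {
  const yearStart = Date.UTC(date.getUTCFullYear(), 0, 1);
  const today = Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
  const dayOfYear = Math.floor((today - yearStart) / MS_PER_DAY); // 0-based
  return Math.floor((dayOfYear + 7 - date.getUTCDay()) / 7);
}

// ISO 8601 week number: weeks start on Monday, and week 1 is the week
// containing the year's first Thursday.
function isoWeek(date) {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayNum = d.getUTCDay() || 7;            // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - dayNum);    // move to the Thursday of this week
  const isoYearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  return Math.ceil(((d.getTime() - isoYearStart) / MS_PER_DAY + 1) / 7);
}

// The Monday saved in the example above.
const monday = new Date("2015-10-19T10:10:10Z");
console.log(sundayWeek(monday), isoWeek(monday));
```

For the saved Monday the two functions disagree by one week, and for dates near the start of a year the gap can be different again, which is why a fixed offset is not a reliable conversion.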
Starting from MongoDB version 3.4 you can use the $isoWeek aggregation operator.
Returns the week number in ISO 8601 format, ranging from 1 to 53. Week numbers start at 1 with the week (Monday through Sunday) that contains the year’s first Thursday.
You can find more information on this in the MongoDB docs.
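As a rough sketch of how this could be applied to the question's query, the stages below are built as plain objects. It assumes MongoDB 3.4 or later and pairs $isoWeek with $isoWeekYear so that weeks spanning a year boundary match the right year. The helper name `buildIsoWeekPipeline` is hypothetical, and the field paths are the ones used in the question. Every field needed later is listed in the single $project, because $project passes on only the fields it names (plus _id).

```javascript
// Hypothetical helper: returns aggregation stages as plain objects.
// Field paths ($request.headers.custID, $response.body.pricing.cost, $date)
// are copied from the question.
function buildIsoWeekPipeline(isoYear, isoWeek) {
  return [
    {
      $project: {
        cust_ID: "$request.headers.custID",
        cost: "$response.body.pricing.cost",
        date: 1,
        isoYear: { $isoWeekYear: "$date" },
        isoWeek: { $isoWeek: "$date" },
      },
    },
    { $match: { isoYear: isoYear, isoWeek: isoWeek } },
  ];
}

const pipeline = buildIsoWeekPipeline(2015, 42);
console.log(JSON.stringify(pipeline, null, 2));
```

The array could then be passed to `Order.aggregate(pipeline)` or `db.mycollection.aggregate(pipeline)`.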
I am new to MongoDB and I am stuck on String to Date conversion. In the db, the date item is stored as a String: "date":"2015-06-16T17:50:30.081Z"
I want to group the docs by date and calculate the sum for each day, so I have to extract the year, month and day from the date string and strip the hours, minutes and seconds. I have tried multiple ways, but they return either a date of 1970-01-01 or the current date.
Moreover, I want to convert the following mongo query into Python code, which gives me the same problem: I am not able to call a JavaScript function in Python, and datetime cannot parse the mongo $date syntax either.
I have tried:
new Date("2015-06-16T17:50:30.081Z")
new Date(Date.parse("2015-06-16T17:50:30.081Z"))
etc...
I am perfectly fine if the string is given in JavaScript or in Python; I know more than one way to parse it. However, I have no idea how to do it in a MongoDB query.
db.collection.aggregate([
{
// something
},
{
'$group':{
'_id':{
'month': (new Date(Date.parse('$approTime'))).getMonth(),
'day': (new Date(Date.parse('$approTime'))).getDate(),
'year': (new Date(Date.parse('$approTime'))).getFullYear(),
'countries':'$countries'
},
'count': {'$sum':1}
}
}
])
If you can be assured of the format of the input date string AND you are just trying to get a count of unique YYYYMMDD, then just $project the substring and group on it:
var data = [
{ "name": "buzz", "d1": "2015-06-16T17:50:30.081Z"},
{ "name": "matt", "d1": "2018-06-16T17:50:30.081Z"},
{ "name": "bob", "d1": "2018-06-16T17:50:30.081Z"},
{ "name": "corn", "d1": "2019-06-16T17:50:30.081Z"},
];
db.foo.drop();
db.foo.insert(data);
db.foo.aggregate([
{ "$project": {
"xd": { "$substr": [ "$d1", 0, 10 ]}
}
},
{ "$group": {
"_id": "$xd",
"n": {$sum: 1}
}
}
]);
{ "_id" : "2019-06-16", "n" : 1 }
{ "_id" : "2018-06-16", "n" : 2 }
{ "_id" : "2015-06-16", "n" : 1 }
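The same substring idea can be tried outside the database. The sketch below counts documents per day in plain JavaScript, assuming every string starts with a `YYYY-MM-DD` prefix as in the sample data; `countByDay` is a hypothetical helper.

```javascript
// Hypothetical helper: counts documents per day by taking the first
// 10 characters ("YYYY-MM-DD") of an ISO 8601 date string.
function countByDay(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const day = doc[field].substring(0, 10);
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  return counts;
}

// Same sample data as in the answer above.
const data = [
  { name: "buzz", d1: "2015-06-16T17:50:30.081Z" },
  { name: "matt", d1: "2018-06-16T17:50:30.081Z" },
  { name: "bob", d1: "2018-06-16T17:50:30.081Z" },
  { name: "corn", d1: "2019-06-16T17:50:30.081Z" },
];
const counts = countByDay(data, "d1");
counts.forEach((n, day) => console.log(day, n));
```

Like the $substr approach, this does no validation, so it is only safe when the string format is guaranteed.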
Starting in Mongo 4.0, you can use "$toDate" to convert a string to a date:
// { name: "buzz", d1: "2015-06-16T17:50:30.081Z" }
// { name: "matt", d1: "2018-06-16T17:50:30.081Z" }
// { name: "bob", d1: "2018-06-16T17:50:30.081Z" }
// { name: "corn", d1: "2019-06-16T17:50:30.081Z" }
db.collection.aggregate([
{ $group: {
_id: { $dateToString: { date: { $toDate: "$d1" }, format: "%Y-%m-%d" } },
n: { $sum: 1 }
}}
])
// { _id: "2015-06-16", n: 1 }
// { _id: "2018-06-16", n: 2 }
// { _id: "2019-06-16", n: 1 }
Within the group stage, this:
first converts strings (such as "2015-06-16T17:50:30.081Z") to date objects (ISODate("2015-06-16T17:50:30.081Z")) using the "$toDate" operator.
then converts the resulting date (such as ISODate("2015-06-16T17:50:30.081Z")) back to a string ("2015-06-16"), this time in the format "%Y-%m-%d", using the $dateToString operator.
I'd like to apply some simple string manipulation when doing $project. Is it possible to apply something like the following function in $project?
var themeIdFromZipUrl = function(zipUrl){
return zipUrl.match(/.*\/(T\d+)\/.*/)[1]
};
I'm using the following query:
db.clientRequest.aggregate(
{
$match: {
"l": {$regex: ".*zip"},
"t": { "$gte": new Date('1/SEP/2013'),
"$lte": new Date('7/OCT/2013')
}
}
},
{
$project: {"theme_url" : "$l", "_id": 0, "time": "$t"}
},
{
$group: { _id: {
theme_url: "$theme_url",
day: {
"day": {$dayOfMonth : "$time"},
"month": {$month: "$time"},
"year": {$year: "$time"}
},
},
count: {$sum:1}
}
}
)
This returns the following:
{
"_id" : {
"theme_url" : "content/theme/T70/zip",
"day" : {
"day" : 13,
"month" : 9,
"year" : 2013
}
},
"count" : 2
}
Can I apply the function above to the theme_url field and turn it into theme_id? I had a quick look at Map-Reduce, but I'm not sure whether it's too complicated for such a simple case.
Thanks,
Amit.
There's no way to do this using the Aggregation Framework currently.
You could do it with MapReduce but that would probably slow down the entire thing (if the amount of data is large).
If this is the last step of the aggregation you can also do it on the clientside after the aggregation completes. e.g. in the Mongo shell:
var aggregationResults = col.aggregate([ /* aggregation pipeline here */]);
aggregationResults.results.forEach(function(x) {
x._id.theme_id = themeIdFromZipUrl(x._id.theme_url);
});
If you're using a driver for another language you'll have to do this in whatever language you're using, of course.
Generally speaking, if your data contains a theme_url and the theme_id is encoded in the URL, it might make sense to store it in its own field. Mongo is not a very good tool for text manipulation.
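A minimal sketch of the client-side step in plain JavaScript follows. It reuses the pattern from `themeIdFromZipUrl` in the question and adds a null guard for URLs without a theme segment, which is an assumption not present in the original function. `aggregationRows` is a hypothetical stand-in for the documents returned by the aggregation.

```javascript
// Same pattern as themeIdFromZipUrl in the question, with a guard for
// URLs that do not contain a theme segment (an added assumption).
function themeIdFromZipUrl(zipUrl) {
  const match = zipUrl.match(/.*\/(T\d+)\/.*/);
  return match ? match[1] : null;
}

// Hypothetical stand-in for the array of documents returned by the aggregation.
const aggregationRows = [
  { _id: { theme_url: "content/theme/T70/zip", day: { day: 13, month: 9, year: 2013 } }, count: 2 },
  { _id: { theme_url: "content/other/zip", day: { day: 14, month: 9, year: 2013 } }, count: 1 },
];

// Add theme_id to each group key without mutating the original rows.
const withThemeIds = aggregationRows.map((row) => ({
  ...row,
  _id: { ...row._id, theme_id: themeIdFromZipUrl(row._id.theme_url) },
}));

withThemeIds.forEach((row) => console.log(row._id.theme_id, row.count));
```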