MongoDB Group By _id's Timestamp

I'm looking to group a bunch of documents by creation date.
Using the MongoDB Aggregation Framework, is it possible to group documents by the _id's timestamp?
Something like this:
db.sessions.aggregate(
    { $group: {
        _id: { $dayOfYear: "$_id.getTimestamp()" },
        count: { $sum: 1 }
    } }
)
Thanks

The function you are referring to is a JavaScript method implemented as a shell helper on the ObjectId wrapper. Drivers for other languages provide a similar method. Its basic implementation, as seen from the mongo shell, is:
function () {
    return new Date(parseInt(this.valueOf().slice(0, 8), 16) * 1000);
}
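The same idea can be tried outside the shell. The following is a minimal sketch, assuming the ObjectId is available as its 24-character hex string; the function name objectIdToDate is made up for illustration and is not part of any driver.

```javascript
// Extract the creation time from a 24-character hex ObjectId string.
// The first 8 hex characters encode seconds since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new TypeError("expected a 24-character hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Example id borrowed from another question on this page
console.log(objectIdToDate("5ff4c037bc0a716381231277").toISOString());
```

Only the leading four bytes matter here; the remaining bytes of the id are ignored.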
However, in the shell context this is JavaScript, and you cannot call JavaScript methods inside the aggregation framework; only the implemented operators are allowed. At present there is no "helper" among the aggregation operators to extract a "timestamp" from an ObjectId.
Likewise, the operations that the function above relies on are missing from the aggregation framework at present: you cannot "cast" an ObjectId to a string, let alone cast strings to integers or convert from another base.
For the aggregation framework, the best design approach is to store the required date value in your documents and keep it updated.
If you would rather not do this and must extract a date from the ObjectId value, you need JavaScript evaluation with mapReduce, or else move the work to client-side code:
db.collection.mapReduce(
    function () {
        // Round the ObjectId's timestamp down to the start of its UTC day
        var msPerDay = 1000 * 60 * 60 * 24;
        var ts = this._id.getTimestamp().valueOf();
        emit(new Date(ts - (ts % msPerDay)), 1);
    },
    function (key, values) {
        // Example reduce operation: count the documents for each day
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
)
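A note for readers on newer servers: MongoDB 4.0 added type-conversion operators, including $toDate, which accepts an ObjectId and returns its embedded timestamp. Assuming a 4.0 or later server, the grouping from the question can be sketched without any JavaScript evaluation. The helper name buildGroupByIdDayPipeline is made up for illustration, and the pipeline has not been run against the asker's data.

```javascript
// Build a pipeline that counts documents per day of year,
// deriving the date from _id via $toDate (assumes MongoDB 4.0+).
function buildGroupByIdDayPipeline() {
  return [
    {
      $group: {
        _id: { $dayOfYear: { $toDate: "$_id" } },
        count: { $sum: 1 }
      }
    }
  ];
}

// In the mongo shell:
// db.sessions.aggregate(buildGroupByIdDayPipeline())
console.log(JSON.stringify(buildGroupByIdDayPipeline(), null, 2));
```

On older servers the mapReduce approach above remains the fallback.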

Related

Updating a collection from a different database

I'm using Mongo 4.1 and would like to update a collection named "locations_copy" by adding a new object field named "time" with two subfields: "utcTime", populated from that document's "time" field, and "tz", populated from "subject.contactInfo[0].addresses[0].timeZoneID" in the matching document of the "subjects" collection in the "Subjects" database (a different database from the one holding the first collection). The matching document is the one whose "_id" equals the "subjectID" field in locations_copy.
I have tried to accomplish this with the following code:
const get_time_zone_id = function(doc) {return doc.contactInfo[0].addresses[0].timeZoneID}
const get_location_doc = function(subjectID) { return db.getSiblingDB('Subjects').subjects.find({"_id": subjectID, "contactInfo": {"$exists": true}, "$where" : function() {
return (this.contactInfo.length > 0 && this.contactInfo[0].addresses && this.contactInfo[0].addresses.length > 0 && this.contactInfo[0].addresses[0].timeZoneID)
}}, {"contactInfo" : {"$slice": 1}, "contactInfo.addresses": {"$slice": 1},"contactInfo.addresses.timeZoneID" : 1}).map(get_time_zone_id)}
db.locations_copy.aggregate( [
{ $match: {"subjectID": {"$exists": true}}},
{ $addFields: {
time: { utc: "$timeUTC",
tz: { "$arrayElemAt": [get_location_doc(ObjectId("$subjectID")), 0 ] }}
}
}
] ).forEach(function(x){db.locations_copy.save(x)})
Everything works except for one thing: when I pass ObjectId("$subjectID") as a parameter to "get_location_doc", "$subjectID" is treated as a literal string rather than as the value of the underlying field in each document. I have also tried passing simply subjectID (without quotes), in which case it was undefined, and "$$subjectID", which again gave me a literal string. I understand this is due to client-side versus server-side evaluation at run time.
I have tried to use the "$function" operator, but apparently it's only available from version 4.4 (I'm using 4.1).
I should note that if I replace "$subjectID" with a hard-coded string ID (for example "5ff4c037bc0a716381231277"), everything works as you'd expect.
Can anyone please help me accomplish what I intend? Since this script is only meant to be executed once, performance is not much of an issue.
Thank you!

db.getSiblingDB().collection.find() is a client-side operation. You cannot use it to join collections as part of a query. For that, see https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/.
The second thing you are doing is retrieving nested fields out of a document. You can do this with $set and dot notation. See specifically the example at https://docs.mongodb.com/manual/reference/operator/aggregation/set/#adding-fields-to-an-embedded-document.
You will need to construct a single aggregation pipeline that does everything your current mix of aggregation and JavaScript does, using only the operators documented in https://docs.mongodb.com/manual/reference/operator/aggregation/ and the stages documented in https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/.
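One caveat worth keeping in mind is that $lookup joins against a collection in the same database. Since the asker describes a one-off script where performance is not a concern, a plain client-side loop is another option. The following is only a sketch under that assumption: the helper names getTimeZoneId and buildTimeField are made up for illustration, the field names (timeUTC, utc, tz) follow the asker's code, and the shell part is shown as comments because it has not been run against the asker's data.

```javascript
// Safely read contactInfo[0].addresses[0].timeZoneID;
// returns null when any level is missing.
function getTimeZoneId(subject) {
  const contact = subject && Array.isArray(subject.contactInfo) ? subject.contactInfo[0] : null;
  const address = contact && Array.isArray(contact.addresses) ? contact.addresses[0] : null;
  return address && address.timeZoneID ? address.timeZoneID : null;
}

// Build the new embedded "time" value for one location document.
function buildTimeField(location, subject) {
  return { utc: location.timeUTC, tz: getTimeZoneId(subject) };
}

// mongo shell usage (sketch):
// var subjects = db.getSiblingDB("Subjects").subjects;
// db.locations_copy.find({ subjectID: { $exists: true } }).forEach(function (loc) {
//   var subject = subjects.findOne({ _id: loc.subjectID });
//   db.locations_copy.updateOne({ _id: loc._id }, { $set: { time: buildTimeField(loc, subject) } });
// });
```

Because the loop runs in the shell, loc.subjectID is an ordinary JavaScript value on each iteration, which sidesteps the "$subjectID" literal-string problem described in the question.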

Find latest date of document based on 2 separate fields: month and year

I have documents in mongoDB that look like this:
{username:String, paymentYear:Int, paymentMonth:Int}
I would like to find the latest document of a username, that means, the closest date to our Date.now(). What is the best way of accomplishing it? Is there any query of mongo I can use or should I write my own code?
Thanks

This is achievable using MongoDB's $dateFromParts operator in an aggregation pipeline.
db.test.aggregate([
    {
        "$addFields": {
            "tempDate": {
                "$dateFromParts": {
                    year: "$paymentYear",
                    month: "$paymentMonth"
                }
            }
        }
    },
    {
        "$sort": { "tempDate": -1 } // Change `-1` to `1` for ascending order
    },
    {
        "$limit": 1 // Number of documents to return, based on the sort order
    }
])
You can implement this in a $project stage instead of $addFields if that suits your needs better.
Note: This will work only for MongoDB version 3.6 and above.
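Because year and month are already separate numeric fields, the same ordering can also be obtained by comparing the pair directly, without building a date. The following is a minimal sketch of that comparison in plain JavaScript; the function name latestPayment and the username "alice" are made up for illustration.

```javascript
// Pick the document with the greatest (paymentYear, paymentMonth) pair.
// Returns null for an empty array.
function latestPayment(docs) {
  return docs.reduce(function (best, doc) {
    if (best === null) return doc;
    const newer = doc.paymentYear > best.paymentYear ||
      (doc.paymentYear === best.paymentYear && doc.paymentMonth > best.paymentMonth);
    return newer ? doc : best;
  }, null);
}

// The same ordering expressed as a query in the mongo shell:
// db.test.find({ username: "alice" })
//        .sort({ paymentYear: -1, paymentMonth: -1 })
//        .limit(1)
```

Sorting on the raw fields also filters by username, which the question asks for, and does not depend on $dateFromParts being available.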

How do I query MongoDB for 2 ranges and text search?

I have event objects in MongoDB that look like this:
{
"start": 2010-09-04T16:54:11.216Z,
"title":"Short descriptive title",
"description":"Full description",
"score":56
}
And I need to get a query across three parameters:
Time window (event start is between two dates)
Score threshold (score is > x)
Full-text search of title and description
What's the right way to approach this efficiently? I think the first two are done with an aggregation but I'm not sure how text search would factor in.

Assuming your start field is of type date (which it should be) and not a string, here are the basic components that you'd want to play with. That said, if the dates are stored as ISO 8601 strings in one consistent format, a string-based comparison works just as well, because such strings sort chronologically.
// create your text index
db.collection.createIndex({
    description: "text",
    title: "text"
})
// optionally create an index on your other fields
db.collection.createIndex({
    start: 1,
    score: 1
})
x = 50
lowerDate = ISODate("2010-09-04T16:54:11.216Z") // or just the string part for string fields
upperDate = ISODate("2010-09-05T16:54:11.216Z") // example upper bound, one day later
// simple find to retrieve your result set
db.collection.find({
    start: {
        $gte: lowerDate, // depending on your exact scenario, you may need $gt
        $lte: upperDate  // depending on your exact scenario, you may need $lt
    },
    score: { $gt: x }, // depending on your exact scenario, you may need $gte
    $text: { // here comes the text search
        $search: "descriptive"
    }
})
There is an important topic with respect to performance/indexing that needs to be understood, though, which is very well documented here: Combine full text with other index
This is why I initially wrote "components of what you'd want to play with". So depending on the rest of your application you may want to create different indexes.
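If the three parameters come from application code, the filter can be assembled in one place. A minimal sketch, assuming the field names from the question (start, score) and an existing text index; the function name buildEventFilter is made up for illustration.

```javascript
// Assemble a find() filter combining a date window, a score threshold,
// and a full-text search term.
function buildEventFilter(lowerDate, upperDate, minScore, searchText) {
  return {
    start: { $gte: lowerDate, $lte: upperDate },
    score: { $gt: minScore },
    $text: { $search: searchText }
  };
}

const filter = buildEventFilter(
  new Date("2010-09-04T00:00:00Z"),
  new Date("2010-09-05T00:00:00Z"),
  50,
  "descriptive"
);
// In the mongo shell: db.collection.find(filter)
console.log(JSON.stringify(filter));
```

All three conditions are combined with an implicit AND, so a document must satisfy the window, the threshold, and the text match.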

CouchDB query issues

I will start off by saying while I am not new to CouchDB, I am new to querying the views using JavaScript and the web.
I have looked at multiple other questions on here, including CouchDB - Queries with params, couchDB queries, Couchdb query with AND operator, CouchDB Querying Dates, and Basic CouchDB Queries, just to list a few.
While all have good information in them, I haven't found one that has my particular problem in it.
I have a view set up like so:
function (docu) {
    if (docu.status && docu.doc && docu.orgId.toString() && !docu.deleted) {
        switch (docu.status) {
            case "BASE":
                emit(docu.name, docu);
                break;
            case "AIR":
                emit(docu.eta, docu);
                break;
            case "CHECK":
                emit(docu.checkTime, docu);
                break;
        }
    }
}
with all documents having a status, doc, orgId, deleted, name, eta, and checkTime. (I changed doc to docu because of my custom doc key.)
I am trying to query and emit based on a set of keys, status, doc, orgId, where orgId is an integer.
My jQuery to do this looks like so:
$.couch.db("myDB").view("designDoc/viewName", {
    keys: ["status", "doc", orgId],
    success: function (data) {
        console.log(data);
    },
    error: function (status) {
        console.log(status);
    }
});
I receive
{"total_rows":59,"offset":59,"rows":[
]}
Sometimes the offset is 0, sometimes it is 59. I feel I must be doing something wrong for this not to be working correctly.
So for my questions:
1. I did not mention this, but I had to set docu.orgId.toString() because I guess it parses the URL as a string. Is there a way to use this number as a numeric value?
2. How do I correctly view multiple documents based on multiple keys, i.e. if(key1 && key2) emit(doc.name, doc)
3. Am I doing something obviously wrong that I lack the knowledge to notice?
Thank you all.

You're so very close. To answer your questions:
1. When you're using docu.orgId.toString() in that if-statement you're basically saying: this value must be truthy. If you didn't convert to string, any number other than 0 would be true. Since you are converting to a string, any value other than an empty string will be true. Also, since you do not use orgId as the first argument in an emit call, at least not in the example above, you cannot query by it at all.
2. I'll get to this.
3. A little.
The thing to remember is emit creates a key-value table (that's really all a view is) that you can use to query. Let's say we have the following documents
{type:'student', dept:'psych', name:'josh'},
{type:'student', dept:'compsci', name:'anish'},
{type:'professor', dept:'compsci', name:'kender'},
{type:'professor', dept:'psych', name:'josh'},
{type:'mascot', name:'owly'}
Now let's say we know that for this one view, we want to query 1) everything but mascots, 2) we want to query by type, dept, and name, all of the available fields in this example. We would write a map function like this:
function (doc) {
    if (doc.type === 'mascot') { return; } // don't do anything
    // allow for queries by type
    emit(doc.type, null); // the use of null is explained below
    // allow queries by dept
    emit(doc.dept, null);
    // allow for queries by name
    emit(doc.name, null);
}
Then, we would query like this:
// look for all joshs
$.couch.db("myDB").view("designDoc/viewName", {
    keys: ["josh"],
    // ...
});
// look for everyone in the psych department
$.couch.db("myDB").view("designDoc/viewName", {
    keys: ["psych"],
    // ...
});
// look for everyone that's a professor and everyone named josh
$.couch.db("myDB").view("designDoc/viewName", {
    keys: ["professor", "josh"],
    // ...
});
Notice the last query isn't "and" in the sense of a logical conjunction; it's a union. If you wanted to restrict the results to documents that are both professors and named josh, there are a few options. The most basic would be to concatenate the key when you emit, like
emit('type-' + doc.type + '_name-' + doc.name, null);
You would then query like this: key : "type-professor_name-josh"
It doesn't feel very proper to rely on strings like this, at least it didn't to me when I first started doing it, but it is a quite common method for querying key-value stores. The characters - and _ have no special meaning in this example, I simply use them as delimiters.
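To make the difference between the union and the concatenated key concrete, here is a small in-memory imitation of a view. It is only a sketch, not how CouchDB is implemented: the helpers buildView and queryKeys and the _id values are made up, and emit is passed in as a second argument, whereas in CouchDB it is a global inside the map function.

```javascript
// Run a map function over documents and collect every emitted
// (key, value) pair together with the id of the source document.
function buildView(docs, mapFn) {
  const rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows;
}

// Imitate a `keys` query: keep rows whose key equals any requested key.
function queryKeys(rows, keys) {
  return rows.filter(function (row) { return keys.indexOf(row.key) !== -1; });
}

const docs = [
  { _id: "1", type: "student", dept: "psych", name: "josh" },
  { _id: "2", type: "student", dept: "compsci", name: "anish" },
  { _id: "3", type: "professor", dept: "compsci", name: "kender" },
  { _id: "4", type: "professor", dept: "psych", name: "josh" },
  { _id: "5", type: "mascot", name: "owly" }
];

const rows = buildView(docs, function (doc, emit) {
  if (doc.type === "mascot") { return; }
  emit(doc.type, null);
  emit(doc.dept, null);
  emit(doc.name, null);
  emit("type-" + doc.type + "_name-" + doc.name, null);
});

// Union: every professor plus every josh (document "4" matches twice)
const union = queryKeys(rows, ["professor", "josh"]);
// Conjunction through the concatenated key: only the professor named josh
const both = queryKeys(rows, ["type-professor_name-josh"]);
console.log(union.map(function (r) { return r.id; }), both.map(function (r) { return r.id; }));
```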
Another option would be what you mentioned in your comment, to emit an array like
emit([ doc.type, doc.name ], null);
Then you would query like
key: ["professor", "josh"]
This is perfectly fine, but generally, the use case for emitting arrays as keys, is for aggregating returned rows. For example, you could emit([year, month, day]) and if you had a simple reduce function that basically passed the records through:
function (keys, values, rereduce) {
    if (rereduce) {
        return [].concat.apply([], values);
    } else {
        return values;
    }
}
You could then query with the URL parameter group_level set to 1 to group by year, or 2 to group by year and month, all on the exact same view using arrays as keys. Compared to SQL or Mongo it's mad complicated and convoluted, but hey, it's there.
The use of null in the view is really about saving resources: the documents are not copied into the index. When you query a view, each row contains an id that you can use in a second ajax call to fetch the full documents from, for example, _all_docs.
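The effect of group_level on array keys can be imitated in memory. This is only an illustration of the grouping idea, using a counting reduce instead of the pass-through reduce above; the helper name groupByLevel and the sample rows are made up.

```javascript
// Group rows by the first `level` elements of their array key and count
// the rows in each group, similar in spirit to group_level with a count.
function groupByLevel(rows, level) {
  const counts = new Map();
  rows.forEach(function (row) {
    const prefix = JSON.stringify(row.key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  });
  return Array.from(counts, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

const rows = [
  { key: [2014, 7, 14], value: null },
  { key: [2014, 7, 15], value: null },
  { key: [2014, 8, 1], value: null },
  { key: [2015, 1, 2], value: null }
];

const byYear = groupByLevel(rows, 1);  // one group per year
const byMonth = groupByLevel(rows, 2); // one group per year and month
console.log(JSON.stringify(byYear), JSON.stringify(byMonth));
```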
I hope that makes sense. If you need any clarification you can use the comments and I'll try my best.

I want to retrieve values inserted on particular date using _id of mongodb

I want to retrieve values inserted on a particular date. Is this possible using the MongoDB "_id" field, since it contains an embedded date and time? I want to retrieve the values in the MongoDB shell, not by using any application.

While it is true that the ObjectId is based on a "timestamp" in part, generally this is a "client" library operation to "extract" this date from the ObjectId value.
You can do this with the JavaScript evaluation of $where, but it will need to "scan" the entire collection, so is not very efficient:
db.collection.find(function () {
    return (
        ( this._id.getTimestamp().valueOf() -
          this._id.getTimestamp().valueOf() % ( 1000 * 60 * 60 * 24 ) )
        == new Date("2014-07-14").valueOf() );
})
That will basically compare to see if the ObjectId was created on the same day as the date provided. Other date math or methods apply to other intervals.
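Because the timestamp occupies the leading bytes of an ObjectId, a day can also be expressed as a range on _id itself, which lets the query use the default _id index instead of scanning. The following is a minimal sketch, assuming the ids are standard generated ObjectIds; the helper name objectIdHexFromDate is made up for illustration, and the shell query is shown as a comment.

```javascript
// Build the smallest ObjectId hex string whose timestamp equals `date`:
// 8 hex characters of epoch seconds followed by 16 zeros.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0000000000000000";
}

const dayStart = new Date("2014-07-14"); // midnight UTC
const dayEnd = new Date(dayStart.getTime() + 24 * 60 * 60 * 1000);

const lowerHex = objectIdHexFromDate(dayStart);
const upperHex = objectIdHexFromDate(dayEnd);

// In the mongo shell:
// db.collection.find({ _id: { $gte: ObjectId(lowerHex), $lt: ObjectId(upperHex) } })
console.log(lowerHex, upperHex);
```

The half-open range ($gte start, $lt next day) matches the same UTC day as the $where comparison above.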
