I'd like to apply some simple string manipulation during $project. Is it possible to apply something like the following function in $project?
var themeIdFromZipUrl = function(zipUrl){
return zipUrl.match(/.*\/(T\d+)\/.*/)[1]
};
I'm using the following query:
db.clientRequest.aggregate(
{
$match: {
"l": {$regex: ".*zip"},
"t": { "$gte": new Date('1/SEP/2013'),
"$lte": new Date('7/OCT/2013')
}
}
},
{
$project: {"theme_url" : "$l", "_id": 0, "time": "$t"}
},
{
$group: { _id: {
theme_url: "$theme_url",
day: {
"day": {$dayOfMonth : "$time"},
"month": {$month: "$time"},
"year": {$year: "$time"}
},
},
count: {$sum:1}
}
}
)
This returns following:
{
"_id" : {
"theme_url" : "content/theme/T70/zip",
"day" : {
"day" : 13,
"month" : 9,
"year" : 2013
}
},
"count" : 2
}
Can I apply the function above to the theme_url field and turn it into theme_id? I had a quick look at Map-Reduce, but it seems too complicated for such a simple case.
Thanks,
Amit.
There's no way to do this using the Aggregation Framework currently.
You could do it with MapReduce but that would probably slow down the entire thing (if the amount of data is large).
If this is the last step of the aggregation, you can also do it on the client side after the aggregation completes, e.g. in the Mongo shell:
var aggregationResults = col.aggregate([ /* aggregation pipeline here */]);
// MongoDB 2.4 shell: the documents are in the "result" array.
// On 2.6+ aggregate() returns a cursor, so call forEach on it directly.
aggregationResults.result.forEach(function(x) {
x._id.theme_id = themeIdFromZipUrl(x._id.theme_url);
});
If you're using a driver for another language you'll have to do this in whatever language you're using, of course.
Generally speaking, if your data contains a theme_url and the theme_id is encoded in the URL, it might make sense to store it in its own field. Mongo is not a very good tool for text manipulation.
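The client-side step described above can be sketched in plain JavaScript. This is only an illustration: the sample rows are hypothetical and merely shaped like the aggregation output in the question, and the helper returns null for non-matching URLs instead of throwing, which is an assumption about the desired behaviour.

```javascript
// Extract the theme id (e.g. "T70") from a URL such as "content/theme/T70/zip".
// Returns null instead of throwing when the URL does not contain a theme id.
function themeIdFromZipUrl(zipUrl) {
  const match = /\/(T\d+)\//.exec(zipUrl);
  return match ? match[1] : null;
}

// Hypothetical rows shaped like the aggregation output shown in the question.
const sampleResults = [
  { _id: { theme_url: "content/theme/T70/zip", day: { day: 13, month: 9, year: 2013 } }, count: 2 },
  { _id: { theme_url: "content/other/zip", day: { day: 14, month: 9, year: 2013 } }, count: 1 }
];

// Replace theme_url with theme_id on each row after the aggregation has run.
const withThemeIds = sampleResults.map(function (row) {
  return {
    theme_id: themeIdFromZipUrl(row._id.theme_url),
    day: row._id.day,
    count: row.count
  };
});

console.log(withThemeIds);
```

If several URLs can map to the same theme id, the counts would still need to be merged per theme_id and day after this step.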
I have a document structure something along the lines of the following:
{
"_id" : "777",
"someKey" : "someValue",
"someArray" : [
{
"name" : "name1",
"someNestedArray" : [
{
"name" : "value"
},
{
"name" : "delete me"
}
]
}
]
}
I want to delete the nested array element with the value "delete me".
I know I can find documents which match this description using nested $elemMatch expressions. What is the query syntax for removing the element in question?
To delete the item in question you're actually going to use an update. More specifically, you're going to do an update with the $pull operator, which removes matching items from an array.
db.temp.update(
{ _id : "777" },
{$pull : {"someArray.0.someNestedArray" : {"name":"delete me"}}}
)
There's a little bit of "magic" happening here. Using .0 indicates that we know that we are modifying the 0th item of someArray. Using {"name":"delete me"} indicates that we know the exact data that we plan to remove.
This process works just fine if you load the data into a client and then perform the update. This process works less well if you want to do "generic" queries that perform these operations.
I think it's easiest to simply recognize that updating arrays of sub-documents generally requires that you have the original in memory at some point.
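As a rough sketch of the "load it into a client first" approach, the helper below inspects a document already in memory, finds the index of the outer element, and builds the matching $pull update. The function name buildNestedPull is hypothetical, and the field names are taken from the sample document in the question.

```javascript
// Build a $pull update for the first element of someArray whose
// someNestedArray contains an entry with the given name.
function buildNestedPull(doc, nestedName) {
  const index = doc.someArray.findIndex(function (outer) {
    return outer.someNestedArray.some(function (inner) {
      return inner.name === nestedName;
    });
  });
  if (index === -1) {
    return null; // nothing to remove
  }
  const update = { $pull: {} };
  update.$pull["someArray." + index + ".someNestedArray"] = { name: nestedName };
  return update;
}

// The sample document from the question, as it would look once loaded by a client.
const loadedDoc = {
  _id: "777",
  someKey: "someValue",
  someArray: [
    { name: "name1", someNestedArray: [{ name: "value" }, { name: "delete me" }] }
  ]
};

const pullUpdate = buildNestedPull(loadedDoc, "delete me");
console.log(JSON.stringify(pullUpdate));
```

In the shell the result could then be passed along the lines of db.temp.update({ _id: loadedDoc._id }, pullUpdate).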
In response to the first comment below, you can probably help your situation by changing the data structure a little:
"someObjects" : {
"name1": {
"someNestedArray" : [
{
"name" : "value"
},
{
"name" : "delete me"
}
]
}
}
Now you can do {$pull : { "someObjects.name1.someNestedArray" : ...
Here's the problem with your structure. MongoDB does not have very good support for manipulating "sub-arrays". Your structure has an array of objects and those objects contain arrays of more objects.
If you have the following structure, you are going to have a difficult time using things like $pull:
array [
{ subarray : array [] },
{ subarray : array [] },
]
If your structure looks like that and you want to update subarray you have two options:
Change your structure so that you can leverage $pull.
Don't use $pull. Load the entire object into a client and use findAndModify.
MongoDB 3.6 added the all-positional $[] operator, which facilitates updates to arrays that contain embedded documents. So the problem can be solved by:
db.test.update(
{ _id : "777" },
{$pull : {"someArray.$[].someNestedArray" : {"name":"delete me"}}}
)
As @Melkor commented (this should probably be an answer in itself), if you do not know the index, use:
{
_id: TheMainID,
"theArray._id": TheArrayID
},
{
$pull: {
"theArray.$.theNestedArray": {
_id: theNestedArrayID
}
}
}
From MongoDB 3.6 on you can use arrayFilters to do this:
db.test.update(
{ _id: "777" },
{ $pull: { "someArray.$[elem].someNestedArray": { name: "delete me" } } },
{ arrayFilters: [{ "elem.name": "name1"}] }
)
See also https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/index.html#update-all-documents-that-match-arrayfilters-in-an-array
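To make the effect of the filtered positional operator easier to see, here is an in-memory emulation in plain JavaScript. It is not how the server implements it; the helper name pullFromMatching is hypothetical, and it only mirrors the specific update above (outer elements selected by name, nested entries removed by name).

```javascript
// In-memory emulation of:
//   { $pull: { "someArray.$[elem].someNestedArray": { name: nestedName } } }
//   with arrayFilters: [{ "elem.name": outerName }]
// Returns a new document and leaves the input untouched.
function pullFromMatching(doc, outerName, nestedName) {
  return Object.assign({}, doc, {
    someArray: doc.someArray.map(function (elem) {
      if (elem.name !== outerName) {
        return elem; // the array filter did not match this element
      }
      return Object.assign({}, elem, {
        someNestedArray: elem.someNestedArray.filter(function (nested) {
          return nested.name !== nestedName;
        })
      });
    })
  });
}

// Sample data: the second outer element is hypothetical, added to show
// that elements not matched by the filter are left alone.
const before = {
  _id: "777",
  someArray: [
    { name: "name1", someNestedArray: [{ name: "value" }, { name: "delete me" }] },
    { name: "name2", someNestedArray: [{ name: "delete me" }] }
  ]
};

const after = pullFromMatching(before, "name1", "delete me");
console.log(JSON.stringify(after, null, 2));
```

With the all-positional $[] form, the equivalent emulation would simply skip the name check and filter every outer element.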
Another example, with usage, could look like this:
{
"company": {
"location": {
"postalCode": "12345",
"Address": "Address1",
"city": "Frankfurt",
"state": "Hessen",
"country": "Germany"
},
"establishmentDate": "2019-04-29T14:12:37.206Z",
"companyId": "1",
"ceo": "XYZ"
},
"items": [{
"name": "itemA",
"unit": "kg",
"price": "10"
},
{
"name": "itemB",
"unit": "ltr",
"price": "20"
}
]
}
1. DELETE: MongoDB query to delete itemB:
db.getCollection('test').update(
{"company.companyId":"1","company.location.city":"Frankfurt"},
{$pull : {"items" : {"name":"itemB"}}}
)
2. FIND: Find query for itemB:
db.getCollection('test').find(
{"company.companyId":"1","company.location.city":"Frankfurt","items.name":"itemB"},
{ "items.$": 1 }
)
3. UPDATE: Update query for itemB:
db.getCollection('test').update(
{"company.companyId":"1","company.location.city":"Frankfurt","items.name":"itemB"},
// "items.$" targets only the element matched by "items.name"; "items.$[]" would update every item
{ $set: { "items.$.price" : 90 }},
{ multi: true });
I have the following sub-documents:
experiences: [
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2010-10-13T00:00:00.000Z"),
"to" : ISODate("2012-10-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94b"),
"currentlyWorking" : false
},
...
...
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2014-10-14T00:00:00.000Z"),
"to" : ISODate("2015-12-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94c"),
"currentlyWorking" : false
},
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2017-10-13T00:00:00.000Z"),
"to" : null,
"_id" : ObjectId("59f8064e68d1f61441bec94d"),
"currentlyWorking" : true
},
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2008-10-14T00:00:00.000Z"),
"to" : ISODate("2009-12-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94c"),
"currentlyWorking" : false
},
]
As you can see, the entries are not necessarily in date order. The data above is for a single user. What I want is the total experience for each user, in years. When the "to" field is null and currentlyWorking is true, it means the user is currently working at that company.
Aggregation
Using the aggregation framework you could apply $indexOfArray where you have it available:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$cond": [
{ "$eq": [{ "$indexOfArray": ["$experiences.to", null] }, -1] },
{ "$max": "$experiences.to" },
new Date()
]},
{ "$min": "$experiences.from" }
]
}
}}
])
Failing that as long as the "latest" is always the last in the array, using $arrayElemAt:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$cond": [
{ "$eq": [{ "$arrayElemAt": ["$experiences.to", -1] }, null] },
new Date(),
{ "$max": "$experiences.to" }
]},
{ "$min": "$experiences.from" }
]
}
}}
])
These are pretty much the most efficient ways to do this, since a single pipeline stage applies the $min and $max operators. $indexOfArray needs at least MongoDB 3.4, while $arrayElemAt alone works on MongoDB 3.2, which is the minimum version you should be running in production environments anyway.
One pass means it gets done fast with little overhead.
In brief, $min and $max extract the appropriate values directly from the array elements: the "smallest" value of "from" and the largest value of "to" within the array. Where available, the $indexOfArray operator returns the index at which a specified value ( null here ) occurs in a provided array ( the "to" values in this case ). If the value is present its index is returned, and if not, -1 is returned to indicate that it was not found.
We use $cond, which is a "ternary" or if..then..else operator, so that when null is not found you get the $max value from "to". When it is found, the else branch returns the current Date instead, which is fed into the aggregation pipeline as an external parameter on execution.
The alternate case for MongoDB 3.2 is that you instead "presume" the last element of your array is the most recent employment history item. In general it would be best practice to order these items so the most recent is either the "last" ( as seems to be indicated in your question ) or the "first" entry of the array. It would be logical to keep these entries in such order as opposed to relying on sorting the list at runtime.
So when using a "known" position such as "last", we can use the $arrayElemAt operator to return the value from the array at the specified position. Here it is -1 for the "last" element. The "first" element would be 0, and could arguably be applied to getting the "smallest" value of "from" as well, since you should have your array in order. Again $cond is used to swap the values depending on whether null is returned. As an alternative to $max you can even use $ifNull to swap the values instead:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
{ "$min": "$experiences.from" }
]
}
}}
])
That operator essentially switches out the values returned if the response of the first condition is null. So since we are grabbing the value from the "last" element already, we can "presume" that this does mean the "largest" value of "to".
The $subtract is what actually returns the "difference", since subtracting one date from another returns the number of milliseconds between the two. This matches how BSON Dates are stored internally, as "milliseconds since epoch".
If you want the interval in a specific duration such as "years", then it's a simple matter of applying the "date math" to change from the milliseconds difference between the date values. So adjust by dividing out from the interval ( also showing $arrayElemAt on the "from" just for completeness ):
Model.aggregate([
{ "$addFields": {
"difference": {
"$floor": {
"$divide": [
{ "$subtract": [
{ "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
{ "$arrayElemAt": ["$experiences.from", 0] }
]},
1000 * 60 * 60 * 24 * 365
]
}
}
}}
])
That uses $divide as a math operator, with the divisor built from 1000 milliseconds, 60 seconds, 60 minutes, 24 hours and 365 days. The $floor "rounds down" the number to drop the decimal places. You can do whatever you want there, but it "should" be done "inline" and not in separate stages, which simply add processing overhead.
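The same "date math" can be checked in plain JavaScript, where subtracting two Date objects also yields milliseconds. This is just a sketch of the arithmetic; the helper name wholeYearsBetween is hypothetical and the two dates are taken from the first entry in the question.

```javascript
// Divisor used in the pipeline: milliseconds in a 365-day year.
const MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365; // 31536000000

// Whole 365-day years between two dates, mirroring $floor + $divide + $subtract.
function wholeYearsBetween(from, to) {
  return Math.floor((to - from) / MS_PER_YEAR);
}

const from = new Date("2010-10-13T00:00:00.000Z");
const to = new Date("2012-10-13T00:00:00.000Z");

console.log(to - from);                   // 63158400000 (731 days, 2012 is a leap year)
console.log(wholeYearsBetween(from, to)); // 2
```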
Of course, the presumption of 365 days is an "approximation" at best. If you want something more complete, then you can instead apply the date aggregation operators to the values to get a more accurate reading. So here, also applying $let to declare as "variables" for later manipulation:
Model.aggregate([
{ "$addFields": {
"difference": {
"$let": {
"vars": {
"to": { "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
"from": { "$arrayElemAt": ["$experiences.from", 0] }
},
"in": {
"years": {
"$subtract": [
{ "$subtract": [
{ "$year": "$$to" },
{ "$year": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 1
}}
]
},
"months": {
"$add": [
{ "$subtract": [
{ "$month": "$$to" },
{ "$month": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 12
}}
]
},
"days": {
"$add": [
{ "$subtract": [
{ "$dayOfYear": "$$to" },
{ "$dayOfYear": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 365
}}
]
}
}
}
}
}}
])
Again that's a slight approximation on the days of the year. MongoDB 3.6 actually would allow you to test the "leap year" by implementing $dateFromParts to determine if 29th February was valid in the current year or not by assembling from the "pieces" we have available.
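For comparison, a calendar-aware count of full years is easy to sketch on the client in plain JavaScript. This is not the $dateFromParts approach mentioned above, only the same idea (test whether 29th February exists, count anniversaries instead of dividing by 365) expressed with the built-in Date object; both helper names are hypothetical.

```javascript
// A year is a leap year if 29th February is a valid date in it:
// otherwise the Date rolls over into March.
function isLeapYear(year) {
  return new Date(Date.UTC(year, 1, 29)).getUTCMonth() === 1;
}

// Number of anniversaries of "from" that have passed by "to" (UTC).
// Note: a 29th February start date rolls to 1st March in non-leap years.
function fullYearsBetween(from, to) {
  let years = to.getUTCFullYear() - from.getUTCFullYear();
  const anniversary = new Date(from.getTime());
  anniversary.setUTCFullYear(to.getUTCFullYear());
  if (to < anniversary) {
    years -= 1; // this year's anniversary has not been reached yet
  }
  return years;
}

const start = new Date("2012-10-13T00:00:00.000Z");
const dayBeforeAnniversary = new Date("2016-10-12T00:00:00.000Z");

// 1460 days is exactly 4 * 365, so the 365-day approximation reports 4 here.
console.log(Math.floor((dayBeforeAnniversary - start) / (1000 * 60 * 60 * 24 * 365)));
// Counting anniversaries reports 3, because the fourth one is a day away.
console.log(fullYearsBetween(start, dayBeforeAnniversary));
```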
Work with returned data
Of course all the above is using the aggregation framework to determine the intervals from the array for each person. This would be the advised course if you were intending to "reduce" the data returned by essentially not returning the array items at all, or if you wanted these numbers for further aggregation in reporting to a larger "sum" or "average" statistic from the data.
If on the other hand you actually do want all the data returned for the person including the complete "experiences" array, then it's probably the best course of action to simply apply the calculations "after" all the data is returned from the server as you process each item returned.
The simple application of this would be to "merge" a new field into the results, just like $addFields does but on the "client" side instead:
Model.find().lean().cursor().map( doc =>
Object.assign(doc, {
"difference":
((doc.experiences.map(e => e.to).indexOf(null) === -1)
? Math.max.apply(null, doc.experiences.map(e => e.to))
: new Date() )
- Math.min.apply(null, doc.experiences.map(e => e.from))
})
).toArray((err, result) => {
// do something with result
})
That's just applying the same logic represented in the first aggregation example to "client" side processing of the result cursor. Since you are using mongoose, the .cursor() method actually returns a Cursor object from the underlying driver, which mongoose normally hides away for "convenience". Here we want it because it gives us access to some handy methods.
The Cursor.map() is one such handy method, which allows us to apply a "transform" to the content returned from the server. Here we use Object.assign() to "merge" a new property into the returned document. We could alternately use Array.map() on the "array" returned by mongoose by "default", but processing inline looks a little cleaner, as well as being a bit more efficient.
In fact Array.map() is the main tool for manipulation here: where we used expressions like "$experiences.to" in the aggregation statement, on the "client" we use doc.experiences.map(e => e.to), which does the same thing, "transforming" the array of objects into an "array of values" for the specified field.
This allows the same check using Array.indexOf() against the array of values. Math.min() and Math.max() are used in the same way, with apply() passing those "mapped" array values as the arguments to the functions.
Finally, since we still have a Cursor being returned, we convert it back into the more typical form you would work with for mongoose results, an "array", using Cursor.toArray(), which is exactly what mongoose does "under the hood" for you on its default requests.
The Query.lean() is a mongoose modifier which basically says to return and expect "plain JavaScript Objects" as opposed to "mongoose documents" matched to the schema with applied methods, which are the default return. We want that because we are "manipulating" the result. Again the alternative is to do the manipulation "after" the default array is returned, and convert via .toObject(), which is present on all mongoose documents, in the event that "serializing virtual properties" is important to you.
So this is essentially a "mirror" of that first aggregation approach, yet applied as "client side" logic instead. As stated, it generally makes more sense to do it this way when you actually want ALL of the properties of the document in the results anyway. The simple reason is that it makes no real sense to add "additional" data to the results "before" you return them from the server. So instead, simply apply the transform "after" the database returns them.
Also much like above, the same client transformation approach can be applied to ALL of the aggregation examples demonstrated. You can even employ external libraries for date manipulation, which give you "helpers" for some of the "raw math" approaches here.
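The core of that client-side calculation can be tried without mongoose or a database. The sketch below is a standalone version under a few assumptions: the helper name experienceSpanMs is hypothetical, the sample array is an abbreviated copy of the entries in the question, and "now" is passed in as a fixed date so the result is repeatable.

```javascript
// Span from the earliest "from" to the latest "to", in milliseconds.
// An open-ended entry (to === null) is treated as ending "now".
function experienceSpanMs(experiences, now) {
  const starts = experiences.map(function (e) { return e.from.getTime(); });
  const ends = experiences.map(function (e) {
    return e.to === null ? now.getTime() : e.to.getTime();
  });
  return Math.max.apply(null, ends) - Math.min.apply(null, starts);
}

// Abbreviated sample based on the question, deliberately not in date order.
const experiences = [
  { from: new Date("2010-10-13T00:00:00.000Z"), to: new Date("2012-10-13T00:00:00.000Z") },
  { from: new Date("2017-10-13T00:00:00.000Z"), to: null },
  { from: new Date("2008-10-14T00:00:00.000Z"), to: new Date("2009-12-13T00:00:00.000Z") }
];

// Fixed "current" date (an assumption, chosen only to make the output stable).
const now = new Date("2018-10-14T00:00:00.000Z");

const spanMs = experienceSpanMs(experiences, now);
const spanYears = Math.floor(spanMs / (1000 * 60 * 60 * 24 * 365));
console.log(spanMs, spanYears);
```

Like the aggregation examples, this measures the span between the first start and the last end, so gaps between jobs are counted as experience.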
You can achieve this with the aggregation framework like this:
db.collection.aggregate([
{
$unwind:"$experiences"
},
{
$sort:{
"experiences.from":1
}
},
{
$group:{
_id:null,
"from":{
$first:"$experiences.from"
},
"to":{
$last:{
$ifNull:[
"$experiences.to",
new Date()
]
}
}
}
},
{
$project:{
"diff":{
$subtract:[
"$to",
"$from"
]
}
}
}
])
This returns:
{ "_id" : null, "diff" : NumberLong("65357827142") }
This is the difference in milliseconds between the two dates; see the $subtract documentation for details.
You can get the year by adding this additional stage to the end of the pipeline:
{
$project:{
"year":{
$floor:{
$divide:[
"$diff",
1000*60*60*24*365
]
}
}
}
}
This would then return:
{ "_id" : null, "year" : 2 }
I have a REST API that returns the following object:
The object
[
{
"attributes": {
"CodAP" : 1,
"Period": 1991,
"People": 6000,
"Child" : 3000
}
},
{
"attributes": {
"CodAP" : 1,
"Period": 2000,
"People": 5000,
"Child" : 1000
}
}
]
Explanation
I need to add these values in the following sequence in the table:
Versions
I am using the version of lodash : 4.17.4
Also am using vue.js at the version: 2.3.3
Note: I asked this question before but did not express myself well. I think it is now clearer what I need. My previous post: Grouping an object with two attributes
Maybe you mean this?
_.groupBy(obj, (v) => v.attributes.CodAP)
https://jsfiddle.net/wk7zL0gr/
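For readers without lodash at hand, the grouping suggested above behaves roughly like the plain JavaScript reduce below. This is a sketch, not lodash's implementation; the helper name groupByCodAP is hypothetical and the third row (CodAP 2) is made up to show a second group.

```javascript
// Group rows into an object keyed by attributes.CodAP,
// similar in effect to _.groupBy(rows, v => v.attributes.CodAP).
function groupByCodAP(rows) {
  return rows.reduce(function (groups, row) {
    const key = row.attributes.CodAP;
    if (!groups[key]) {
      groups[key] = [];
    }
    groups[key].push(row);
    return groups;
  }, {});
}

const rows = [
  { attributes: { CodAP: 1, Period: 1991, People: 6000, Child: 3000 } },
  { attributes: { CodAP: 1, Period: 2000, People: 5000, Child: 1000 } },
  { attributes: { CodAP: 2, Period: 1991, People: 100, Child: 10 } } // hypothetical extra row
];

const grouped = groupByCodAP(rows);
console.log(Object.keys(grouped));   // [ '1', '2' ]
console.log(grouped[1].map(function (r) { return r.attributes.Period; })); // [ 1991, 2000 ]
```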
I'm using MongoDB 2.6.6
I have these documents in a MongoDB collection and here is an example:
{ ..., "field3" : { "one" : [ ISODate("2014-03-18T05:47:33Z"),ISODate("2014-06-02T20:00:25Z") ] }, ...}
{ ..., "field3" : { "two" : [ ISODate("2014-03-18T05:47:33Z"),ISODate("2014-06-02T20:00:25Z") ] }, ...}
{ ..., "field3" : { "three" : [ ISODate("2014-03-18T05:47:39Z"),ISODate("2014-03-19T20:18:38Z") ] }, ... }
I would like to merge these documents into one field. For example, I would like the new result to be as follows:
{ "field3" : { "all" : [ ISODate("2014-03-18T05:47:39Z"),ISODate("2014-03-19T20:18:38Z"),...... ] } }
I'm just not sure how to get that result!
Doesn't really leave much to go on here but you can arguably get the kind of merged result with mapReduce:
db.collection.mapReduce(
function() {
var field = this.field3;
Object.keys(field).forEach(function(key) {
field[key].forEach(function(date) {
emit( "field3", { "all": [date] } )
});
});
},
function (key,values) {
var result = { "all": [] };
values.forEach(function(value) {
value.all.forEach(function(date) {
result.all.push( date );
});
});
result.all.sort(function(a,b) { return a.valueOf()-b.valueOf() });
return result;
},
{ "out": { "inline": 1 } }
)
Being mapReduce, the output is not in exactly the same format, given its own restrictions on how things are done:
{
"results" : [
{
"_id" : "field3",
"value" : {
"all" : [
ISODate("2014-03-18T05:47:33Z"),
ISODate("2014-03-18T05:47:33Z"),
ISODate("2014-03-18T05:47:39Z"),
ISODate("2014-03-19T20:18:38Z"),
ISODate("2014-06-02T20:00:25Z"),
ISODate("2014-06-02T20:00:25Z")
]
}
}
],
"timeMillis" : 86,
"counts" : {
"input" : 3,
"emit" : 6,
"reduce" : 1,
"output" : 1
},
"ok" : 1
}
Since the aggregation here into a single document is fairly arbitrary you could pretty much argue that you simply take the same kind of approach in client code.
At any rate this is only going to be useful over a relatively small set of data, with much the same restrictions applying to client processing. The client can handle more than the 16MB BSON limit that applies in MongoDB, but is certainly limited by the memory it consumes.
So I presume you would need to add a "query" argument but it's not really clear from your question. Either using mapReduce or your client code, you are basically going to need to follow this sort of process to "mash" the arrays together.
I would personally go with the client code here.
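A client-side version of the same "mash the arrays together" step might look like the following. It is a sketch in plain JavaScript: the helper name mergeField3 is hypothetical, and new Date(...) stands in for the shell's ISODate(...) so it runs outside the mongo shell.

```javascript
// Collect every date under any key of field3 into one sorted "all" array.
function mergeField3(docs) {
  const all = [];
  docs.forEach(function (doc) {
    Object.keys(doc.field3).forEach(function (key) {
      doc.field3[key].forEach(function (date) {
        all.push(date);
      });
    });
  });
  all.sort(function (a, b) { return a - b; }); // oldest first
  return { field3: { all: all } };
}

// The three sample documents from the question, reduced to field3.
const docs = [
  { field3: { one: [new Date("2014-03-18T05:47:33Z"), new Date("2014-06-02T20:00:25Z")] } },
  { field3: { two: [new Date("2014-03-18T05:47:33Z"), new Date("2014-06-02T20:00:25Z")] } },
  { field3: { three: [new Date("2014-03-18T05:47:39Z"), new Date("2014-03-19T20:18:38Z")] } }
];

const merged = mergeField3(docs);
console.log(merged.field3.all.map(function (d) { return d.toISOString(); }));
```

Duplicates are kept, as in the mapReduce output above; removing them would be an extra step.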
I have multiple documents with this schema, each document is per product per day:
{
_id:{},
app_id:'DHJFK67JDSJjdasj909',
date:'2014-08-07',
event_count:32423,
event_count_per_type: {
0:322,
10:4234,
20:653,
30:7562
}
}
I would like to get the sum of each event_type for a particular date_range.
This is the output I am looking for, where each event type has been summed across all the documents. The keys for event_count_per_type can be anything, so I need something that can loop through each of them, as opposed to having to be explicit about their names.
{
app_id:'DHJFK67JDSJjdasj909',
event_count:324236456,
event_count_per_type: {
0:34234222,
10:242354,
20:456476,
30:56756
}
}
I have been trying several queries so far, this is the best I have got so far but the sub document values are not summed:
db.events.aggregate(
{
$match: {app_id:'DHJFK67JDSJjdasj909'}
},
{
$group: {
_id: {
app_id:'$app_id',
},
event_count: {$sum:'$event_count'},
event_count_per_type: {$sum:'$event_count_per_type'}
}
},
{
$project: {
_id:0,
app_id:'$_id.app_id',
event_count:1,
event_count_per_type:1
}
}
)
The output I am seeing is a value of 0 for the event_count_per_type key, instead of an object. I could modify the schema so the keys are on the top level of the document but that will still mean that I need to have an entry in the group statement for each key, which as I do not know what the key names will be I cannot do.
Any help would be appreciated, I am willing to change my schema if need be and also to try mapReduce (although from the documentation it seems like the performance is bad.)
As stated, processing documents like this is not possible with the aggregation framework unless you are actually going to supply all of the keys, such as:
db.events.aggregate([
{ "$group": {
"_id": "$app_id",
"event_count": { "$sum": "$event_count" },
"0": { "$sum": "$event_count_per_type.0" },
"10": { "$sum": "$event_count_per_type.10" },
"20": { "$sum": "$event_count_per_type.20" },
"30": { "$sum": "$event_count_per_type.30" }
}}
])
But you do of course have to explicitly specify every key you wish to work on. This is true of both the aggregation framework and general query operations in MongoDB, as to access elements notated in this "sub-document" form you need to specify the "exact path" to the element in order to do anything with it.
The aggregation framework and general queries have no concept of "traversal", which means they cannot process "each key" of a document. Doing that requires a language construct which is not provided in these interfaces.
Generally speaking though, using a "key name" as a data point where its name actually represents a "value" is a bit of an "anti-pattern". A better way to model this would be to use an array and represent your "type" as a value by itself:
{
"app_id": "DHJFK67JDSJjdasj909",
"date": ISODate("2014-08-07T00:00:00.000Z"),
"event_count": 32423,
"events": [
{ "type": 0, "value": 322 },
{ "type": 10, "value": 4234 },
{ "type": 20, "value": 653 },
{ "type": 30, "value": 7562 }
]
}
Also noting that the "date" is now a proper date object rather than a string, which is also something that is good practice to do. This sort of data though is easy to process with the aggregation framework:
db.events.aggregate([
{ "$unwind": "$events" },
{ "$group": {
"_id": {
"app_id": "$app_id",
"type": "$events.type"
},
"event_count": { "$sum": "$event_count" },
"value": { "$sum": "$events.value" }
}},
{ "$group": {
"_id": "$_id.app_id",
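// each per-type group already holds the app total (assuming every document lists every type),
// so take one copy here; summing again would multiply it by the number of types
"event_count": { "$first": "$event_count" },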
"events": { "$push": { "type": "$_id.type", "value": "$value" } }
}}
])
That shows a two stage grouping that first gets the totals per "type" without specifying each "key" since you no longer have to, then returns as a single document per "app_id" with the results in an array as they were originally stored. This data form is generally much more flexible for looking at certain "types" or even the "values" within a certain range.
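If you do decide to restructure, converting an existing document into the array form is a small transformation. The sketch below shows one way in plain JavaScript; the helper name toEventsArray is hypothetical, and it assumes the stored date string is a plain YYYY-MM-DD value meant as midnight UTC.

```javascript
// Convert the keyed sub-document form into the array form suggested above.
function toEventsArray(doc) {
  return {
    app_id: doc.app_id,
    // Assumption: "date" is a YYYY-MM-DD string to be read as midnight UTC.
    date: new Date(doc.date + "T00:00:00.000Z"),
    event_count: doc.event_count,
    events: Object.keys(doc.event_count_per_type).map(function (key) {
      return { type: Number(key), value: doc.event_count_per_type[key] };
    })
  };
}

// The sample document from the question (without _id).
const original = {
  app_id: "DHJFK67JDSJjdasj909",
  date: "2014-08-07",
  event_count: 32423,
  event_count_per_type: { 0: 322, 10: 4234, 20: 653, 30: 7562 }
};

const converted = toEventsArray(original);
console.log(JSON.stringify(converted, null, 2));
```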
Where you cannot change the structure then your only option is mapReduce. This allows you to "code" the traversal of the keys, but since this requires JavaScript interpretation and execution it is not as fast as the aggregation framework:
db.events.mapReduce(
function() {
emit(
this.app_id,
{
"event_count": this.event_count,
"event_count_per_type": this.event_count_per_type
}
);
},
function(key,values) {
var reduced = { "event_count": 0, "event_count_per_type": {} };
values.forEach(function(value) {
for ( var k in value.event_count_per_type ) {
if ( !reduced.event_count_per_type.hasOwnProperty(k) )
reduced.event_count_per_type[k] = 0;
// add this value's count for key k to the running total
reduced.event_count_per_type[k] += value.event_count_per_type[k];
}
reduced.event_count += value.event_count;
});
// the reducer must return the combined value
return reduced;
},
{
"out": { "inline": 1 }
}
)
That will essentially traverse and combine the "keys" and sum up the values for each one found.
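The summing logic can be exercised on its own in plain JavaScript before running it on the server. The sketch below uses a hypothetical function name, sumEventCounts, and made-up sample values; it also checks that reducing already-reduced output gives the same totals, which matters because mapReduce may call the reducer more than once for the same key.

```javascript
// Sum event_count and every key of event_count_per_type across the values.
function sumEventCounts(values) {
  const reduced = { event_count: 0, event_count_per_type: {} };
  values.forEach(function (value) {
    Object.keys(value.event_count_per_type).forEach(function (k) {
      reduced.event_count_per_type[k] =
        (reduced.event_count_per_type[k] || 0) + value.event_count_per_type[k];
    });
    reduced.event_count += value.event_count;
  });
  return reduced;
}

// Hypothetical emitted values for one app_id; the second has an extra key.
const dayOne = { event_count: 100, event_count_per_type: { 0: 10, 10: 90 } };
const dayTwo = { event_count: 50, event_count_per_type: { 0: 5, 10: 40, 40: 5 } };

const total = sumEventCounts([dayOne, dayTwo]);
console.log(JSON.stringify(total));

// Reducing a partial result together with a new value gives the same totals.
const rereduced = sumEventCounts([sumEventCounts([dayOne]), dayTwo]);
console.log(JSON.stringify(rereduced));
```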
So your options are either:
1. Change the structure and work with standard queries and aggregation.
2. Stay with the structure and require JavaScript processing and mapReduce.
It depends on your actual needs, but in most cases restructuring yields benefits.