I have a document that holds lists containing nested objects. Simplified, the document looks like this:
{
"username": "user",
"listOne": [
{
"name": "foo",
"qnty": 5
},
{
"name": "bar",
"qnty": 3
}
],
"listTwo": [
{
"id": 1,
"qnty": 13
},
{
"id": 2,
"qnty": 9
}
]
}
And I need to update the quantity in the lists based on an identifier. For list one it was easy. I was doing something like this:
db.collection.findOneAndUpdate(
{
"username": "user",
"listOne.name": name
},
{
$inc: {
"listOne.$.qnty": qntyChange,
}
}
)
Then I would catch the case where the find failed because there was no object in the list with that name (and nothing was updated), and issue a new operation with $push. Since this is the rarer case, it didn't bother me to do two queries against the database collection.
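For reference, the fallback $push looked roughly like this sketch (reusing the name and qntyChange variables from above):
// second query, issued only when the $inc above matched no element
db.collection.updateOne(
  { "username": "user" },
  { $push: { "listOne": { "name": name, "qnty": qntyChange } } }
)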
But now I also had to add list two to the document. And since the identifiers are not the same, I would have to query the lists individually, meaning four searches against the database collection in the worst-case scenario if using the same strategy as before.
So, to avoid this, I wrote an update using an aggregation pipeline. What it does is:
1) Check whether there is an object in list one with the queried identifier.
2) If true, map through the entire array and:
2.1) return the same object if the identifier is different;
2.2) return the object with the quantity changed when the identifier matches.
3) If false, push a new object with this identifier onto the list.
4) Repeat for list two.
This is the pipeline for list one:
db.coll1.updateOne(
{
"username": "user"
},
[{
"$set": {
"listOne": {
"$cond": {
"if": {
"$in": [
name,
"$listOne.name"
]
},
"then": {
"$map": {
"input": "$listOne",
"as": "one",
"in": {
"$cond": {
"if": {
"$eq": [
"$$one.name",
name
]
},
"then": {
"$mergeObjects": [
"$$one",
{
"qnty": {
"$add": [
"$$one.qnty",
qntyChange
]
}
}
]
},
"else": "$$one"
}
}
}
},
"else": {
"$concatArrays": [
"$listOne",
[
{
"name": name,
"qnty": qntyChange
}
]
]
}
}
}
}
}]
);
The entire pipeline can be found on this Mongo Playground.
So my question is about how efficient this is. As I am paying for server time, I would like an efficient solution to this problem. Querying the collection four times, or even just twice on every call, seems like a bad idea, as the collection will have thousands of entries. The two lists, on the other hand, are not that big and should not exceed a thousand elements each. But as written, it looks like the pipeline will iterate over each list about two times.
And besides, what worries me the most is this: when I use $map to change the list and return the same object in cases where the identifier does not match, does MongoDB rewrite those elements too? Not only would that increase my time on the server by rewriting the entire list with the same objects, it would also count towards the byte size of my write operation, which MongoDB also charges for.
So if anyone has a better solution to this, I'm all ears.
According to this SO answer,
What you actually do inside of the document (push around an array, add a field) should not have any significant impact on the total cost of the operation
So, in your case, your array operations should not be causing a heavy impact on the total cost.
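As for reducing the number of round trips themselves: both $set expressions can live in the same update pipeline, so both lists are handled in a single query. Here is a minimal sketch reusing the question's $cond/$map/$concatArrays pattern; the bumpList helper and the id and qntyChangeTwo variables are illustrative, not from the original:
// builds the "update or append" expression for one list (mongosh syntax)
const bumpList = (listField, idField, idValue, delta) => ({
  "$cond": {
    "if": { "$in": [idValue, `$${listField}.${idField}`] },
    "then": {
      "$map": {
        "input": `$${listField}`,
        "as": "el",
        "in": {
          "$cond": {
            "if": { "$eq": [`$$el.${idField}`, idValue] },
            "then": { "$mergeObjects": ["$$el", { "qnty": { "$add": ["$$el.qnty", delta] } }] },
            "else": "$$el"
          }
        }
      }
    },
    // no match: append a new element instead
    "else": { "$concatArrays": [`$${listField}`, [{ [idField]: idValue, "qnty": delta }]] }
  }
});

db.coll1.updateOne(
  { "username": "user" },
  [{ "$set": {
    "listOne": bumpList("listOne", "name", name, qntyChange),
    "listTwo": bumpList("listTwo", "id", id, qntyChangeTwo)
  } }]
);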
I'm creating a query, but I've reached a point where I have no idea how to do it. I have one array that has, for example, two items:
//filter array
const filterArray=r.expr(['parking', 'pool'])
and I also have one table with the following records:
[
{
"properties": {
"facilities": [
"parking"
],
"name": "Suba"
}
},
{
"properties": {
"facilities": [
"parking",
"pool",
"pet friendly"
],
"name": "Kennedy"
}
},
{
"properties": {
"facilities": [
"parking",
"pool"
],
"name": "Soacha"
}
},
{
"properties": {
"facilities": [
"parking",
"pet friendly",
"GYM"
],
"name": "Sta Librada"
}
}
]
I need to filter the records with this array, but a record should only match if it has all the items of the filter array. It is not a problem if the record has more items than the filter array; as long as it contains all of the filter items, I want that record. In this case, I need all records that have both the facilities "pool" and "parking".
Current query (it also returns records that match only one or two items of the filter array):
r.db('aucroom').table('hosts')
.filter(host=>
host('properties')('facilities').contains(val=>{
return filterArray.contains(val2=>val2.eq(val))
})
)
.orderBy('properties')
.pluck(['properties'])
The results I desire are only the records that contain both facilities, here "Kennedy" and "Soacha".
If you want a strict match of two arrays (same number of elements, same order), then use .eq()
array1.eq(array2)
If you want the first array to contain all elements of the second array, then use .setIntersection(), just note array2 should contain distinct elements (a set):
array1.setIntersection(array2).eq(array2)
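Applied to the query from the question, that pattern would look something like this sketch (it assumes filterArray holds distinct elements, as noted above):
r.db('aucroom').table('hosts')
  .filter(host =>
    // keep only hosts whose facilities contain every element of filterArray
    host('properties')('facilities')
      .setIntersection(filterArray)
      .eq(filterArray)
  )
  .orderBy('properties')
  .pluck(['properties'])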
I have the following sub-documents:
experiences: [
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2010-10-13T00:00:00.000Z"),
"to" : ISODate("2012-10-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94b"),
"currentlyWorking" : false
},
...
...
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2014-10-14T00:00:00.000Z"),
"to" : ISODate("2015-12-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94c"),
"currentlyWorking" : false
},
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2017-10-13T00:00:00.000Z"),
"to" : null,
"_id" : ObjectId("59f8064e68d1f61441bec94d"),
"currentlyWorking" : true
},
{
"workExperienceId" : ObjectId("59f8064e68d1f61441bec94a"),
"workType" : "Full Time",
"functionalArea" : "Law",
"company" : "Company A",
"title" : "new",
"from" : ISODate("2008-10-14T00:00:00.000Z"),
"to" : ISODate("2009-12-13T00:00:00.000Z"),
"_id" : ObjectId("59f8064e68d1f61441bec94c"),
"currentlyWorking" : false
}
]
As you can see, the entries may not be in date order. The above data exists for each user. What I want is to get the total experience for each user, expressed in years. When the to field is null and currentlyWorking is true, it means that I am currently working at that company.
Aggregation
Using the aggregation framework you could apply $indexOfArray where you have it available:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$cond": [
{ "$eq": [{ "$indexOfArray": ["$experiences.to", null] }, -1] },
{ "$max": "$experiences.to" },
new Date()
]},
{ "$min": "$experiences.from" }
]
}
}}
])
Failing that as long as the "latest" is always the last in the array, using $arrayElemAt:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$cond": [
{ "$eq": [{ "$arrayElemAt": ["$experiences.to", -1] }, null] },
new Date(),
{ "$max": "$experiences.to" }
]},
{ "$min": "$experiences.from" }
]
}
}}
])
Those are pretty much the most efficient ways to do this, as a single pipeline stage applying the $min and $max operators. For $indexOfArray you need at least MongoDB 3.4, and for simply using $arrayElemAt you can have MongoDB 3.2, which is the minimal version you should be running in production environments anyway.
One pass means it gets done fast with little overhead.
In brief, $min and $max allow you to extract the appropriate values directly from the array elements, being the "smallest" value of "from" and the "largest" value of "to" within the array. Where available, the $indexOfArray operator can return the matched index from a provided array (in this case the "to" values) where a specified value (null here) exists. If the value is there, its index is returned; where it is not, -1 is returned, indicating it was not found.
We use $cond, which is a "ternary" or if..then..else operator, so that when the null is not found you get the $max value from "to". When it is found, the else branch instead returns the value of the current Date, which is fed into the aggregation pipeline as an external parameter on execution.
The alternate case for MongoDB 3.2 is that you instead "presume" the last element of your array is the most recent employment history item. It would generally be best practice to order these items so the most recent is either the "last" (as seems to be indicated in your question) or the "first" entry of the array. It is logical to keep these entries in such an order, as opposed to relying on sorting the list at runtime.
So when using a "known" position such as "last", we can use the $arrayElemAt operator to return the value from the array at the specified position. Here it is -1 for the "last" element. The "first" element would be 0, and could arguably be applied to getting the "smallest" value of "from" as well, since you should have your array in order. Again $cond is used to transpose the values depending on whether null is returned. As an alternative to $max you can even use $ifNull to swap the values instead:
Model.aggregate([
{ "$addFields": {
"difference": {
"$subtract": [
{ "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
{ "$min": "$experiences.from" }
]
}
}}
])
That operator essentially switches out the values returned if the response of the first condition is null. So since we are grabbing the value from the "last" element already, we can "presume" that this does mean the "largest" value of "to".
The $subtract is what actually returns the "difference", since when you "subtract" one date from another, the difference is returned as the milliseconds value between the two. This is how BSON Dates are internally stored, using the common "milliseconds since epoch" representation.
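For instance, taking two of the dates from the question, the same semantics can be seen with plain JavaScript Dates:
// subtracting Dates yields the difference in milliseconds
new Date("2012-10-13") - new Date("2010-10-13")   // 63158400000 ms (731 days)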
If you want the interval in a specific duration such as "years", then it's a simple matter of applying the "date math" to convert the milliseconds difference between the date values. So adjust by dividing out the interval (also showing $arrayElemAt on the "from", just for completeness):
Model.aggregate([
{ "$addFields": {
"difference": {
"$floor": {
"$divide": [
{ "$subtract": [
{ "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
{ "$arrayElemAt": ["$experiences.from", 0] }
]},
1000 * 60 * 60 * 24 * 365
]
}
}
}}
])
That uses $divide as a math operator, with the divisor built from 1000 milliseconds, 60 each for seconds and minutes, 24 hours, and 365 days. The $floor "rounds down" the number from decimal places. You can do whatever you want there, but it "should" be used "inline" and not in separate stages, which would simply add processing overhead.
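Spelled out, that divisor is just the approximate number of milliseconds in a year:
// 1000 ms * 60 s * 60 min * 24 h * 365 days
var MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365;   // 31536000000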
Of course, the presumption of 365 days is an "approximation" at best. If you want something more complete, then you can instead apply the date aggregation operators to the values to get a more accurate reading. So here, also applying $let to declare "variables" for later manipulation:
Model.aggregate([
{ "$addFields": {
"difference": {
"$let": {
"vars": {
"to": { "$ifNull": [{ "$arrayElemAt": ["$experiences.to", -1] }, new Date()] },
"from": { "$arrayElemAt": ["$experiences.from", 0] }
},
"in": {
"years": {
"$subtract": [
{ "$subtract": [
{ "$year": "$$to" },
{ "$year": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 1
}}
]
},
"months": {
"$add": [
{ "$subtract": [
{ "$month": "$$to" },
{ "$month": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 12
}}
]
},
"days": {
"$add": [
{ "$subtract": [
{ "$dayOfYear": "$$to" },
{ "$dayOfYear": "$$from" }
]},
{ "$cond": {
"if": { "$gt": [{ "$month": "$$to" },{ "$month": "$$from" }] },
"then": 0,
"else": 365
}}
]
}
}
}
}
}}
])
Again, that's a slight approximation on the days of the year. MongoDB 3.6 would actually allow you to test for the "leap year" by using $dateFromParts to determine whether 29th February is valid in the year in question, assembling the date from the "pieces" we have available.
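One possible sketch of that test (my illustration, not part of the original answer): $dateFromParts carries an invalid calendar date such as 29 February in a non-leap year over into March, so checking whether the assembled date stays in February tells you whether "$$to" falls in a leap year:
// true when 29 Feb exists in the year of "$$to" (a leap year);
// in a non-leap year the assembled date carries over to 1 March
{ "$eq": [
  { "$month": { "$dateFromParts": {
      "year": { "$year": "$$to" }, "month": 2, "day": 29
  } } },
  2
] }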
Work with returned data
Of course all the above is using the aggregation framework to determine the intervals from the array for each person. This would be the advised course if you were intending to "reduce" the data returned by essentially not returning the array items at all, or if you wanted these numbers for further aggregation in reporting to a larger "sum" or "average" statistic from the data.
If on the other hand you actually do want all the data returned for the person including the complete "experiences" array, then it's probably the best course of action to simply apply the calculations "after" all the data is returned from the server as you process each item returned.
The simple application of this would be to "merge" a new field into the results, just like $addFields does but on the "client" side instead:
Model.find().lean().cursor().map( doc =>
Object.assign(doc, {
"difference":
((doc.experiences.map(e => e.to).indexOf(null) === -1)
? Math.max.apply(null, doc.experiences.map(e => e.to))
: new Date() )
- Math.min.apply(null, doc.experiences.map(e => e.from))
})
).toArray((err, result) => {
// do something with result
})
That's just applying the same logic represented in the first aggregation example to "client side" processing of the result cursor. Since you are using mongoose, the .cursor() method actually returns a Cursor object from the underlying driver, which mongoose normally hides away for "convenience". Here we want it because it gives us access to some handy methods.
The Cursor.map() is one such handy method, which allows us to apply a "transform" to the content returned from the server. Here we use Object.assign() to "merge" a new property into the returned document. We could alternately use Array.map() on the "array" returned by mongoose by "default", but processing inline looks a little cleaner, as well as being a bit more efficient.
In fact Array.map() is the main tool here in manipulation, since where we applied statements like "$experiences.to" in the aggregation, on the "client" we apply doc.experiences.map(e => e.to), which does the same thing, "transforming" the array of objects into an "array of values" for the specified field.
This allows the same checking using Array.indexOf() against the array of values, and likewise Math.min() and Math.max() are used in the same way, with apply() passing those "mapped" array values as the arguments to the functions.
Finally, of course, since we still have a Cursor being returned, we convert it back into the more typical "array" form you would work with for mongoose results, using Cursor.toArray(), which is exactly what mongoose does "under the hood" for you on its default requests.
The Query.lean() is a mongoose modifier which basically says to return and expect "plain JavaScript objects", as opposed to the "mongoose documents" matched to the schema with applied methods that are otherwise the default return. We want that because we are "manipulating" the result. The alternative is to do the manipulation "after" the default array is returned, converting via .toObject(), which is present on all mongoose documents, in the event that "serializing virtual properties" is important to you.
So this is essentially a "mirror" of that first aggregation approach, but applied to "client side" logic instead. As stated, it generally makes more sense to do it this way when you actually want ALL of the properties in the documents in the results anyway. The simple reason is that it makes no real sense to add "additional" data to the results "before" you return them from the server. So instead, simply apply the transform "after" the database returns them.
Much like above, the same client-side transformation approach can be applied to what was demonstrated in ALL the aggregation examples. You can even employ external libraries for date manipulation, which give you "helpers" for some of the "raw math" approaches here.
You can achieve this with the aggregation framework, like this:
db.collection.aggregate([
{
$unwind:"$experiences"
},
{
$sort:{
"experiences.from":1
}
},
{
$group:{
_id:null,
"from":{
$first:"$experiences.from"
},
"to":{
$last:{
$ifNull:[
"$experiences.to",
new Date()
]
}
}
}
},
{
$project:{
"diff":{
$subtract:[
"$to",
"$from"
]
}
}
}
])
This returns:
{ "_id" : null, "diff" : NumberLong("65357827142") }
Which is the difference in ms between the two dates, see $subtract for details
You can get the year by adding this additional stage to the end of the pipeline:
{
$project:{
"year":{
$floor:{
$divide:[
"$diff",
1000*60*60*24*365
]
}
}
}
}
This would then return:
{ "_id" : null, "year" : 2 }
How to refer to each property of an object in an array of objects in MongoDB MapReduce JavaScript query?
Here is my data:
{
"_id": ObjectId("544ae3de7a6025f0470041a7"),
"name": "Bundle 4",
"product_groups": [
{
"name": "camera group",
"products": [
{
"$ref": "products",
"$id": ObjectId("531a2fcd26718dbd3200002a"),
"$db": "thisDB"
},
{
"$ref": "products",
"$id": ObjectId("538baf7c26718d0a55000043"),
"$db": "thisDB"
},
{
"$ref": "products",
"$id": ObjectId("538baf7c26718d0a55000045"),
"$db": "thisDB"
}
]
},
{
"name": "lens group",
"products": [
{
"$ref": "products",
"$id": ObjectId("531e3ce926718d0d45000112"),
"$db": "thisDB"
},
{
"$ref": "products",
"$id": ObjectId("531e3ce926718d0d45000113"),
"$db": "thisDB"
}
]
}
]
}
Here is my map function: (for simplicity I took out the reduce option since it doesn't matter if the map doesn't work right)
var map = function() { emit(this.product_groups, this.product_groups.products); };
db.instant_rebates.mapReduce(
map,
{
out: "map_reduce_example",
query: {"_id": ObjectId("544ae3de7a6025f0470041a7")}
}
);
However the problem is that the "value" field in the result always comes up as "undefined". Why? Why doesn't this.product_groups.products return the products array? How do I fix this?
Also, I want it to do is to emit TWICE, once for each of the two product_groups. But so far it only emits ONCE. How do I fix that?
Under mapReduce operations the documents are presented as JavaScript objects so you need to treat them as such and traverse them. That means processing each member of the array:
var map = function() {
this.product_groups.forEach(function(group) {
emit( group.name, { products: group.products } );
});
};
var reduce = function(){};
db.instant_rebates.mapReduce(
map,
reduce,
{
out: "map_reduce_example",
query: {"_id": ObjectId("544ae3de7a6025f0470041a7")}
}
);
The "emit" function requires both a "key" and a "value" to be presented as arguments. The "value" must be singular, therefore to emit an "array" of data you need to wrap it under the property of an object. The "key" must also be a singular value, as it is intended to be used as the "grouping key" in the reduce operation; the "name" field should be sufficient, at least for this example.
Naturally, since there is a top-level array in the document, you process "each element", as is done with the function, and each result is "emitted", so there are "two" results emitted from this one document.
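With the sample document above, the output would therefore contain two entries, roughly of this shape (values abbreviated, shown purely as an illustration):
{ "_id": "camera group", "value": { "products": [ /* three product DBRefs */ ] } }
{ "_id": "lens group", "value": { "products": [ /* two product DBRefs */ ] } }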
You also need to at least define a "reduce" function even if it never gets called because all of the emitted keys are different, as is the case here.
So, it's JavaScript. Treat a list structure as a list.
Please note: this is all your question is about. If you want to ask further questions on mapReduce then please ask them as new questions; don't add more to this one. I don't want to talk about your field naming, or go into detail of how this seems to be working towards "how do I pull in data from the other collection", which is something you cannot do.
After reading many Stack Overflow questions, blogs, and documentation I still cannot figure out why this particular iteration over an array is not working.
I am using jQuery and JavaScript (obviously) to pull a GeoJSON file and then going over the properties of the resulting object to pull out the desired key/value pairs. As I find those pairs I want to insert them into another object. The object is created as I expected; however, when I attempt to go over the newly created object nothing happens, and if I try to find its length it returns a length of 0.
This is where I pull the records:
_recordsFromGeoJSON: function(inputText) {
var retRecords = {};
$.getJSON(this.GeoJSONUrl, function(data) {
var geoJSONdata = data;
$.each(geoJSONdata.features, function(fkey, fvalue) {
$.each(fvalue.properties, function(pkey, pvalue) {
var re = new RegExp(inputText, "i");
var retest = re.test(pvalue);
if (retest) {
retRecords[pvalue] = fvalue.geometry.coordinates;
return;
}
});
});
});
return retRecords;
},
This is the code for the iteration over the new object:
for(var key in this._retRecords) {
//this function will never run
var always = foo(bar);
}
Some sample GeoJSON:
{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "id": 0, "properties": { "NAME": "14 PARK PLACE PH 4", "AREAID": 3.0, "STR12M": 0.0, "CLS12M": 6.0, "STR4M": 0.0, "CLS4M": 0.0, "TOTAL": 164.0, "OCC": 112.0, "NFU": 0.0, "UNC": 3.0, "DVL": 49.0, "UDVL": 0.0 }, "geometry": { "type": "Point", "coordinates": [ -93.27512816536759, 37.044305883435001 ] } }
,
{ "type": "Feature", "id": 1, "properties": { "NAME": "ALPHA MEADOWS NORTH", "AREAID": 8.0, "STR12M": 0.0, "CLS12M": 0.0, "STR4M": 0.0, "CLS4M": 0.0, "TOTAL": 12.0, "OCC": 0.0, "NFU": 0.0, "UNC": 0.0, "DVL": 0.0, "UDVL": 0.0 }, "geometry": { "type": "Point", "coordinates": [ -92.839131163095786, 37.119205483765143 ] } }
]
}
When I console.log(this._retRecords); Chrome shows the object with all the properties I expected from the dataset:
Object
14 PARK PLACE PH 4: Array[2]
0: -93.27512816536759
1: 37.044305883435
length: 2
__proto__: Array[0]
ALPHA MEADOWS NORTH: Array[2]
0: -92.839131163095786
1: 37.119205483765143
length: 2
__proto__: Array[0]
Both methods given in this question report a length of 0.
I am quite certain I am missing something fundamental but I cannot find what it is. Any help, criticism, alternative methods would be great!
It appears that you don't understand that your getJSON() function starts immediately (i.e. sends the request) and then returns immediately, long before getJSON has completed its work. Its work will be done sometime later, when the completion function is called. Thus retRecords is not yet populated when the _recordsFromGeoJSON() function returns.
This is asynchronous programming. The completion function for getJSON will be called sometime LATER, long after _recordsFromGeoJSON() returns. Thus, you cannot treat it like synchronous, serial programming.
Instead, retRecords is only known in the completion function, or in any function you pass the data to and call from that completion function. This is how asynchronous programming works in JavaScript. You must initiate all further processing of the getJSON() result from the completion function. And you can't return the result from _recordsFromGeoJSON(), because the result is not yet known when that function returns. This is a different way of coding and it's a bit of a pain, but it is how you have to deal with asynchronous operations in JavaScript.
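For instance, here is a minimal sketch of _recordsFromGeoJSON reworked to accept a callback (the done parameter is my illustration), so that the iteration only runs once the response has actually arrived:

_recordsFromGeoJSON: function(inputText, done) {
  var retRecords = {};
  $.getJSON(this.GeoJSONUrl, function(data) {
    // this callback runs LATER, when the HTTP response arrives
    $.each(data.features, function(fkey, fvalue) {
      $.each(fvalue.properties, function(pkey, pvalue) {
        if (new RegExp(inputText, "i").test(pvalue)) {
          retRecords[pvalue] = fvalue.geometry.coordinates;
        }
      });
    });
    done(retRecords);  // hand the populated object to the caller
  });
},

// usage: iterate inside the callback, not after the call returns
// this._recordsFromGeoJSON(inputText, function(records) {
//   for (var key in records) { /* now runs with populated data */ }
// });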