MongoDB Finding duplicates sharing multiple fields using MapReduce - javascript

I am trying to find duplicates in a Mongo version 2.4 database that is being used for production and therefore cannot be updated. Since aggregate does not exist in 2.4, I cannot use the aggregate pipeline to find duplicates, therefore I am trying to find a solution using MapReduce.
I have tried the following set of map, reduce, and finalize functions, through MongoVUE's Map Reduce interface, and they returned nothing after running for less than a second on a 3,000,000 record collection that definitely has duplicates on the indicated fields. Obviously something went wrong, but MongoVUE did not show any error messages or helpful indications.
function Map() {
emit(
{name: this.name, LocationId: this.LocationId,
version: this.version},
{count:1, ScrapeDate: this.ScrapeDate}
);
}
function Reduce(key, values) {
var reduced = {count:0, ScrapeDate:'2000-01-01'};
values.forEach(function(val) {
reduced.count += val.count;
if (reduced.ScrapeDate.localeCompare(val.ScrapeDate) < 0)
reduced.ScrapeDate=val.ScrapeDate;
});
return reduced;
return values[0];
}
function Finalize(key, reduced) {
if (reduced.count > 1)
return reduced;
}
I just need to find any instance of multiple records that share the same name, LocationId, and version, and ideally display the most recent ScrapeDate of such a record.

Your map-reduce code ran without any issues for me, though only on a very small dataset. I think the return values[0]; line in the reduce function is a copy-paste error, since it can never be reached. You could try running the same functions through the mongo shell.
Since aggregate does not exist in 2.4, I cannot use the aggregate pipeline to find duplicates, therefore I am trying to find a solution
using MapReduce.
That is not correct: db.collection.aggregate(pipeline, options) was introduced in version 2.2.
Here is how it could be done with the aggregation framework. It is not the preferred approach here, though, because your dataset is very large and, in v2.4, the $sort operator has a memory limit of 10% of RAM.
db.collection.aggregate(
[
// sort the records, based on the 'ScrapeDate' field, in descending order.
{$sort:{"ScrapeDate":-1}},
// group by the key fields, and take the 'ScrapeDate' of the first document,
// Since it is in sorted order, the first document would contain the
// highest field value.
{$group:{"_id":{"name":"$name","LocationId":"$LocationId","version":"$version"}
,"ScrapeDate":{$first:"$ScrapeDate"}
,"count":{$sum:1}}
},
// output only the group, having documents greater than 1.
{$match:{"count":{$gt:1}}}
]
);
As for your map-reduce functions, they ran without issues on my test data:
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2000-01-01"});
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2001-01-01"});
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2002-01-01"});
db.collection.insert({"name":"d","LocationId":1,"version":1,"ScrapeDate":"2002-01-01"});
Running the map-reduce:
db.collection.mapReduce(Map,Reduce,{out:{"inline":1},finalize:Finalize});
Output:
{
"results" : [
{
"_id" : {
"name" : "c",
"LocationId" : 1,
"version" : 1
},
"value" : {
"count" : 3,
"ScrapeDate" : "2002-01-01"
}
},
{
"_id" : {
"name" : "d",
"LocationId" : 1,
"version" : 1
},
"value" : null
}
],
"timeMillis" : 0,
"counts" : {
"input" : 4,
"emit" : 4,
"reduce" : 1,
"output" : 2
},
"ok" : 1,
}
Notice that the output contains value:null for a record which doesn't have any duplicates.
This is due to your finalize function:
function Finalize(key, reduced) {
if (reduced.count > 1)
return reduced; // returned null by default for keys with single value,
// i.e count=1
}
The finalize function does not filter out keys, so you cannot get only the duplicated keys: every key appears in the map-reduce output. All the finalize function can do is leave the value empty for keys you are not interested in, which is what yours already does.
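Since finalize cannot drop keys, one option is to filter the null values out of the inline results on the client. The following is a minimal sketch in plain JavaScript that emulates the map, reduce and finalize steps on an in-memory array (no MongoDB involved). The helper runMapReduce is hypothetical and written only for illustration; it is not the server's implementation, and here emit is passed to map as an argument instead of being a global.

```javascript
// Hypothetical in-memory emulation of mapReduce with inline output.
function runMapReduce(docs, map, reduce, finalize) {
  var groups = {};
  docs.forEach(function (doc) {
    // map runs with `this` bound to the document and receives emit().
    map.call(doc, function emit(key, value) {
      var k = JSON.stringify(key);
      (groups[k] = groups[k] || { key: key, values: [] }).values.push(value);
    });
  });
  return Object.keys(groups).map(function (k) {
    var g = groups[k];
    // Like MongoDB, skip reduce for keys with a single emitted value.
    var reduced = g.values.length > 1 ? reduce(g.key, g.values) : g.values[0];
    var value = finalize(g.key, reduced);
    return { _id: g.key, value: value === undefined ? null : value };
  });
}

function map(emit) {
  emit({ name: this.name, LocationId: this.LocationId, version: this.version },
       { count: 1, ScrapeDate: this.ScrapeDate });
}

function reduce(key, values) {
  var reduced = { count: 0, ScrapeDate: '2000-01-01' };
  values.forEach(function (val) {
    reduced.count += val.count;
    if (reduced.ScrapeDate.localeCompare(val.ScrapeDate) < 0) {
      reduced.ScrapeDate = val.ScrapeDate;
    }
  });
  return reduced;
}

function finalize(key, reduced) {
  if (reduced.count > 1) return reduced; // undefined (null) for unique keys
}

var docs = [
  { name: 'c', LocationId: 1, version: 1, ScrapeDate: '2000-01-01' },
  { name: 'c', LocationId: 1, version: 1, ScrapeDate: '2001-01-01' },
  { name: 'c', LocationId: 1, version: 1, ScrapeDate: '2002-01-01' },
  { name: 'd', LocationId: 1, version: 1, ScrapeDate: '2002-01-01' }
];

var results = runMapReduce(docs, map, reduce, finalize);
// Keep only the keys whose finalize step returned a value, i.e. the duplicates.
var duplicates = results.filter(function (r) { return r.value !== null; });
console.log(JSON.stringify(duplicates));
```

With real inline output, the same filter would be applied to the "results" array returned by mapReduce.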

Related

Query firebase to return if value more than number

I want to get data from Firebase.
This is more or less my data structure:
"Reports" : {
"N06Jrz5hx6Q9bcVDBBUrF3GKSTp2" : 2,
"eLLfNlWLkTcImTRqrYnU0nWuu9P2" : 2
},
"Users":{
"N06Jrz5hx6Q9bcVDBBUrF3GKSTp2" : {
"completedWorks" : {
...
},
"reports" : {
"-LHs0yxUXn-TQC7z_MJM" : {
"category" : "Niewyraźne zdjęcie",
"creatorID" : "z8DxcXyehgMhRyMqmf6q8LpCYfs1",
"reportedID" : "N06Jrz5hx6Q9bcVDBBUrF3GKSTp2",
"resolved" : false,
"text" : "heh",
"workID" : "-LHs-aZJkAhEf1RHVasg"
},
"-LHs1hzlL4roUJfMlvyA" : {
"category" : "Zdjęcie nie przedstawia zadania",
"creatorID" : "z8DxcXyehgMhRyMqmf6q8LpCYfs1",
"reportedID" : "N06Jrz5hx6Q9bcVDBBUrF3GKSTp2",
"resolved" : false,
"text" : "",
"workID" : "-LHs-aZJkAhEf1RHVasg"
}
},
"userType" : "company",
"verified" : true
},
}
So as you can see the number of reports is listed in the Reports part. How can I make Firebase return only the ids of the users where the report number is over or equal 3?
Something like this (this will not work, but I hope kind of shows what I was thinking about):
firebase.database().ref('Reports').orderBy(whatHere?).moreThen(2).on('value', snap => {
Is this even doable like this? If yes how could I do it? I want to grab the IDs of the users where reports are >= 3
You're looking for orderByValue():
firebase.database().ref('Reports').orderByValue().startAt(3).on('value', snapshot => {
snapshot.forEach(reportSnapshot => {
console.log(reportSnapshot.key);
})
})
Also check out the Firebase documentation on ordering data.
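To make the semantics concrete, here is a small sketch in plain JavaScript (no Firebase SDK involved) of what orderByValue().startAt(3) selects: children are ordered by their value and only those with a value of at least 3 are kept. The sample data and the helper name keysWithValueAtLeast are made up for illustration.

```javascript
// Hypothetical stand-in for the 'Reports' node: user id -> report count.
var reports = { userA: 2, userB: 5, userC: 3, userD: 1 };

// Emulates orderByValue().startAt(min): keep children whose value is >= min,
// ordered by ascending value.
function keysWithValueAtLeast(node, min) {
  return Object.keys(node)
    .filter(function (key) { return node[key] >= min; })
    .sort(function (a, b) { return node[a] - node[b]; });
}

var reportedUsers = keysWithValueAtLeast(reports, 3);
console.log(reportedUsers); // [ 'userC', 'userB' ]
```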
There are two options for doing that, though neither does exactly what you want, so you will still need some JavaScript for further processing. One is to use limitToLast() after an order-by method, which returns the last items of the ordered result:
firebase.database().ref('Reports').orderByValue().limitToLast(2).on('value', snap => {
Or use startAt() and endAt() to fetch only the results that fall between two values:
firebase.database().ref('Reports').orderByValue()
.startAt(reportIdStart)
.endAt(reportIdLast)
.limitToLast(15)
According to the Firebase documentation:
Using startAt(), endAt(), and equalTo() allows you to choose arbitrary
starting and ending points for your queries
To filter data, you can combine any of the limit or range methods with an order-by method when constructing a query.
Unlike the order-by methods, you can combine multiple limit or range
functions. For example, you can combine the startAt() and endAt()
methods to limit the results to a specified range of values.
For more information, go through the documentation on filtering data.

mongodb pull an array from several object in an array of one document [duplicate]

I have a Mongo document which holds an array of elements.
I'd like to reset the .handled attribute of all objects in the array where .profile = XX.
The document is in the following form:
{
"_id": ObjectId("4d2d8deff4e6c1d71fc29a07"),
"user_id": "714638ba-2e08-2168-2b99-00002f3d43c0",
"events": [{
"handled": 1,
"profile": 10,
"data": "....."
}, {
"handled": 1,
"profile": 10,
"data": "....."
}, {
"handled": 1,
"profile": 20,
"data": "....."
}
...
]
}
so, I tried the following:
.update({"events.profile":10},{$set:{"events.$.handled":0}},false,true)
However it updates only the first matched array element in each document. (That's the defined behaviour for $ - the positional operator.)
How can I update all matched array elements?
With the release of MongoDB 3.6 ( and available in the development branch from MongoDB 3.5.12 ) you can now update multiple array elements in a single request.
This uses the filtered positional $[<identifier>] update operator syntax introduced in this version:
db.collection.update(
{ "events.profile":10 },
{ "$set": { "events.$[elem].handled": 0 } },
{ "arrayFilters": [{ "elem.profile": 10 }], "multi": true }
)
The "arrayFilters" as passed to the options for .update() or even
.updateOne(), .updateMany(), .findOneAndUpdate() or .bulkWrite() method specifies the conditions to match on the identifier given in the update statement. Any elements that match the condition given will be updated.
Note that the "multi" option in the question was used in the expectation that it would "update multiple elements", but this was not and still is not the case. Its usage here applies to "multiple documents", as has always been the case, and is now otherwise specified as the mandatory setting of .updateMany() in modern API versions.
NOTE Somewhat ironically, since this is specified in the "options" argument for .update() and like methods, the syntax is generally compatible with all recent release driver versions.
However, this is not true of the mongo shell: because of the way the method is implemented there (ironically, for backward compatibility), the arrayFilters argument is not recognized and is removed by an internal method that parses the options in order to deliver "backward compatibility" with prior MongoDB server versions and a "legacy" .update() API call syntax.
So if you want to use the command in the mongo shell or other "shell based" products (notably Robo 3T), you need the latest version from either the development branch or a production release of 3.6 or greater.
See also positional all $[] which also updates "multiple array elements" but without applying to specified conditions and applies to all elements in the array where that is the desired action.
Also see Updating a Nested Array with MongoDB for how these new positional operators apply to "nested" array structures, where "arrays are within other arrays".
IMPORTANT - Installations upgraded from previous versions may not have the new MongoDB features enabled, which can also cause statements to fail. You should ensure your upgrade procedure is complete, including details such as index upgrades, and then run
db.adminCommand( { setFeatureCompatibilityVersion: "3.6" } )
or a higher version, as applicable to your installed version, i.e. "4.0" for version 4 and onwards at present. This enables such features as the new positional update operators and others. You can also check with:
db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )
to return the current setting.
UPDATE:
As of Mongo version 3.6, this answer is no longer valid as the mentioned issue was fixed and there are ways to achieve this. Please check other answers.
At this moment it is not possible to use the positional operator to update all items in an array. See JIRA http://jira.mongodb.org/browse/SERVER-1243
As a work around you can:
Update each item individually (events.0.handled, events.1.handled, ...), or...
Read the document, do the edits manually and save it, replacing the older one (check "Update if Current" if you want to ensure atomic updates).
What worked for me was this:
db.collection.find({ _id: ObjectId('4d2d8deff4e6c1d71fc29a07') })
.forEach(function (doc) {
doc.events.forEach(function (event) {
if (event.profile === 10) {
event.handled=0;
}
});
db.collection.save(doc);
});
I think it's clearer for mongo newbies and anyone familiar with jQuery & friends.
This can also be accomplished with a while loop which checks to see if any documents remain that still have subdocuments that have not been updated. This method preserves the atomicity of your updates (which many of the other solutions here do not).
var query = {
events: {
$elemMatch: {
profile: 10,
handled: { $ne: 0 }
}
}
};
while (db.yourCollection.find(query).count() > 0) {
db.yourCollection.update(
query,
{ $set: { "events.$.handled": 0 } },
{ multi: true }
);
}
The number of times the loop is executed will equal the maximum number of times subdocuments with profile equal to 10 and handled not equal to 0 occur in any of the documents in your collection. So if you have 100 documents in your collection and one of them has three subdocuments that match query and all the other documents have fewer matching subdocuments, the loop will execute three times.
This method avoids the danger of clobbering other data that may be updated by another process while this script executes. It also minimizes the amount of data being transferred between client and server.
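The iteration count described above can be checked with a small in-memory sketch in plain JavaScript. It only emulates the behaviour of the positional $ operator (first matching element per document, per pass); positionalUpdatePass and the sample documents are made up for illustration and no MongoDB server is involved.

```javascript
// Emulates one multi-document update using the positional `$` operator:
// in every matching document, only the FIRST matching element is changed.
function positionalUpdatePass(docs) {
  var modified = 0;
  docs.forEach(function (doc) {
    for (var i = 0; i < doc.events.length; i++) {
      var e = doc.events[i];
      if (e.profile === 10 && e.handled !== 0) {
        e.handled = 0;
        modified++;
        break; // `$` only addresses the first match in this document
      }
    }
  });
  return modified;
}

var sampleDocs = [
  { events: [{ profile: 10, handled: 1 }, { profile: 10, handled: 1 }, { profile: 20, handled: 1 }] },
  { events: [{ profile: 10, handled: 1 }, { profile: 10, handled: 1 }, { profile: 10, handled: 1 }] },
  { events: [{ profile: 20, handled: 1 }] }
];

// Repeat until a pass modifies nothing, counting the passes that did work.
var passes = 0;
while (positionalUpdatePass(sampleDocs) > 0) passes++;
console.log(passes); // 3, the largest number of matches in any single document
```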
This does in fact relate to the long-standing issue at http://jira.mongodb.org/browse/SERVER-1243, where there are a number of challenges to a clear syntax that supports "all cases" where multiple array matches are found. There are methods already in place that "aid" in solutions to this problem, such as Bulk Operations, which were implemented after this original post.
It is still not possible to update more than a single matched array element in a single update statement, so even with a "multi" update all you will ever be able to update is just one matched element in the array for each document in that single statement.
The best possible solution at present is to find and loop all matched documents and process Bulk updates which will at least allow many operations to be sent in a single request with a singular response. You can optionally use .aggregate() to reduce the array content returned in the search result to just those that match the conditions for the update selection:
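// Initialize the bulk operation and the counter used in the loop below.
var bulk = db.collection.initializeOrderedBulkOp();
var count = 0;
db.collection.aggregate([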
{ "$match": { "events.handled": 1 } },
{ "$project": {
"events": {
"$setDifference": [
{ "$map": {
"input": "$events",
"as": "event",
"in": {
"$cond": [
{ "$eq": [ "$$event.handled", 1 ] },
"$$event",
false
]
}
}},
[false]
]
}
}}
]).forEach(function(doc) {
doc.events.forEach(function(event) {
bulk.find({ "_id": doc._id, "events.handled": 1 }).updateOne({
"$set": { "events.$.handled": 0 }
});
count++;
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
});
});
if ( count % 1000 != 0 )
bulk.execute();
The .aggregate() portion there will work when there is a "unique" identifier for the array or all content for each element forms a "unique" element itself. This is due to the "set" operator in $setDifference used to filter any false values returned from the $map operation used to process the array for matches.
If your array content does not have unique elements you can try an alternate approach with $redact:
db.collection.aggregate([
{ "$match": { "events.handled": 1 } },
{ "$redact": {
"$cond": {
"if": {
"$eq": [ { "$ifNull": [ "$handled", 1 ] }, 1 ]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
Its limitation is that if "handled" was in fact a field meant to be present at other document levels, then you are likely going to get unexpected results; it is fine where that field appears in only one document position and is an equality match.
Future releases (post MongoDB 3.1), as of writing, will have a $filter operation that is simpler:
db.collection.aggregate([
{ "$match": { "events.handled": 1 } },
{ "$project": {
"events": {
"$filter": {
"input": "$events",
"as": "event",
"cond": { "$eq": [ "$$event.handled", 1 ] }
}
}
}}
])
And all releases that support .aggregate() can use the following approach with $unwind, but the usage of that operator makes it the least efficient approach due to the array expansion in the pipeline:
db.collection.aggregate([
{ "$match": { "events.handled": 1 } },
{ "$unwind": "$events" },
{ "$match": { "events.handled": 1 } },
{ "$group": {
"_id": "$_id",
"events": { "$push": "$events" }
}}
])
In all cases where the MongoDB version supports a "cursor" from aggregate output, then this is just a matter of choosing an approach and iterating the results with the same block of code shown to process the Bulk update statements. Bulk Operations and "cursors" from aggregate output are introduced in the same version ( MongoDB 2.6 ) and therefore usually work hand in hand for processing.
In even earlier versions it is probably best to just use .find() to return the cursor, and limit the execution of statements to the number of times the array element is matched for the .update() iterations:
db.collection.find({ "events.handled": 1 }).forEach(function(doc){
doc.events.filter(function(event){ return event.handled == 1 }).forEach(function(event){
// The array field must appear in the query for the positional $ to resolve.
db.collection.update({ "_id": doc._id, "events.handled": 1 },{ "$set": { "events.$.handled": 0 }});
});
});
If you are absolutely determined to do "multi" updates, or deem that to be ultimately more efficient than processing multiple updates for each matched document, then you can always determine the maximum number of possible array matches and just execute a "multi" update that many times, until there are no more documents to update.
A valid approach for MongoDB 2.4 and 2.2 versions could also use .aggregate() to find this value:
var result = db.collection.aggregate([
{ "$match": { "events.handled": 1 } },
{ "$unwind": "$events" },
{ "$match": { "events.handled": 1 } },
{ "$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"count": { "$max": "$count" }
}}
]);
var max = result.result[0].count;
while ( max-- ) {
db.collection.update({ "events.handled": 1},{ "$set": { "events.$.handled": 0 }},{ "multi": true })
}
Whatever the case, there are certain things you do not want to do within the update:
Do not "one shot" update the array: Where if you think it might be more efficient to update the whole array content in code and then just $set the whole array in each document. This might seem faster to process, but there is no guarantee that the array content has not changed since it was read and the update is performed. Though $set is still an atomic operator, it will only update the array with what it "thinks" is the correct data, and thus is likely to overwrite any changes occurring between read and write.
Do not calculate index values to update: Similar to the "one shot" approach, here you work out that position 0 and position 2 (and so on) are the elements to update and code these in with an eventual statement like:
{ "$set": {
"events.0.handled": 0,
"events.2.handled": 0
}}
Again the problem here is the "presumption" that the index values found when the document was read are the same index values in the array at the time of the update. If new items are added to the array in a way that changes the order, then those positions are no longer valid and the wrong items are in fact updated.
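The hazard is easy to reproduce with plain JavaScript objects. The following sketch (no MongoDB involved; the id fields and the inserted element are made up for illustration) records positions at read time, lets another writer change the array, and then applies the stale positions:

```javascript
var doc = { events: [
  { id: 'a', profile: 10, handled: 1 },
  { id: 'b', profile: 20, handled: 1 },
  { id: 'c', profile: 10, handled: 1 }
] };

// Step 1: a client reads the document and records the positions to update.
var positions = [];
doc.events.forEach(function (e, i) {
  if (e.profile === 10) positions.push(i); // [0, 2]
});

// Step 2: meanwhile, another writer inserts an element at the front.
doc.events.unshift({ id: 'z', profile: 20, handled: 1 });

// Step 3: the client applies its stale positions, which is what
// "events.0.handled" and "events.2.handled" would do on the server.
positions.forEach(function (i) { doc.events[i].handled = 0; });

var touched = doc.events
  .filter(function (e) { return e.handled === 0; })
  .map(function (e) { return e.id; });
console.log(touched); // [ 'z', 'b' ]: both have profile 20, so the wrong items changed
```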
So until a reasonable syntax is determined for allowing multiple matched array elements to be processed in a single update statement, the basic approach is to either update each matched array element in an individual statement (ideally in Bulk), or work out the maximum number of array elements to update, or keep updating until no more modified results are returned. At any rate, you should "always" be processing positional $ updates on the matched array element, even if that is only updating one element per statement.
Bulk Operations are in fact the "generalized" solution to processing any operations that work out to be "multiple operations". Since there are more applications for this than merely updating multiple array elements with the same value, it has of course been implemented already, and it is presently the best approach to solve this problem.
First: your code did not work because you were using the positional operator $, which acts as a placeholder for only the first array element that matches the query, so at most one element per document is updated.
What you need is the filtered positional operator $[<identifier>]. It updates all elements that match an array filter condition.
Solution:
db.collection.update({"events.profile":10}, { $set: { "events.$[elem].handled" : 0 } },
{
multi: true,
arrayFilters: [ { "elem.profile": 10 } ]
})
Visit mongodb doc here
What the code does:
{"events.profile":10} filters your collection and returns the documents matching the filter.
The $set update operator modifies the matching fields of the documents it acts on.
{multi:true} makes .update() modify all documents matching the filter, hence behaving like updateMany().
{ "events.$[elem].handled" : 0 } and arrayFilters: [ { "elem.profile": 10 } ]
This technique involves the use of the filtered positional operator with arrayFilters. The filtered positional operator $[elem] acts as a placeholder for all elements in the array field that match the conditions specified in the array filter.
Array filters
You can update all elements of an array in MongoDB:
db.collectioname.updateOne(
{ "key": /vikas/i },
{ $set: {
"arr.$[].status" : "completed"
} }
)
This sets every "status" value in the "arr" array to "completed".
If only one sub-document in the array should be updated:
db.collectioname.updateOne(
 { key:"someunique", "arr.key": "myuniq" },
 { $set: {
   "arr.$.status" : "completed",
   "arr.$.msgs":  {
        "result" : ""
        }
   
 } }
)
But if there is more than one, and you do not want to update all the sub-documents in the array, then you need to loop through the elements and apply the change inside an if block:
db.collectioname.find(findCriteria) // findCriteria: your own query document
.forEach(function (doc) {
doc.arr.forEach(function (singlearr) {
if (shouldUpdate(singlearr)) { // shouldUpdate: placeholder for your own condition
singlearr.handled = 0;
}
});
db.collectioname.save(doc);
});
I'm amazed this still hasn't been addressed in mongo. Overall mongo doesn't seem to be great when dealing with sub-arrays. You can't count sub-arrays simply for example.
I used Javier's first solution. Read the array into events, then loop through it and build the $set expression:
var set = {}, i, l;
for(i=0,l=events.length;i<l;i++) {
if(events[i].profile == 10) {
set['events.' + i + '.handled'] = 0;
}
}
.update(objId, {$set:set});
This can be abstracted into a function using a callback for the conditional test
The thread is very old, but I came looking for an answer here, so I am providing a newer solution.
With MongoDB version 3.6+, it is now possible to use the positional operator to update all items in an array. See the official documentation here.
The following query would work for the question asked here. I have also verified it with the Java MongoDB driver, and it works successfully.
.update( // or updateMany directly, removing the flag for 'multi'
{"events.profile":10},
{$set:{"events.$[].handled":0}}, // notice the empty brackets after the '$' operator
false,
true
)
Hope this helps someone like me.
I've been looking for a solution to this using the newest driver for C# 3.6 and here's the fix I eventually settled on. The key here is using "$[]" which according to MongoDB is new as of version 3.6. See https://docs.mongodb.com/manual/reference/operator/update/positional-all/#up.S[] for more information.
Here's the code:
{
var filter = Builders<Scene>.Filter.Where(i => i.ID != null);
var update = Builders<Scene>.Update.Unset("area.$[].discoveredBy");
var result = collection.UpdateMany(filter, update, new UpdateOptions { IsUpsert = true});
}
For more context see my original post here:
Remove array element from ALL documents using MongoDB C# driver
The $[] operator selects all elements of the array, so you can update all array items with '$[]':
.update({"events.profile":10},{$set:{"events.$[].handled":0}},false,true)
Reference
Please be aware that the answers in this thread that suggest using $[] are WRONG.
db.collection.update(
{"events.profile":10},
{$set:{"events.$[].handled":0}},
{multi:true}
)
The above code will update "handled" to 0 for all elements in "events" array, regardless of its "profile" value. The query {"events.profile":10} is only to filter the whole document, not the documents in the array. In this situation it is a must to use $[elem] with arrayFilters to specify the condition of array items so Neil Lunn's answer is correct.
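The difference between the two operators can be illustrated with a plain JavaScript sketch over an in-memory array. The functions setAll and setFiltered are hypothetical stand-ins that only mimic the effect of "events.$[].handled" and of "events.$[elem].handled" with arrayFilters; they are not driver or server code.

```javascript
// Mimics { $set: { "events.$[].handled": 0 } }: every element is updated.
function setAll(events) {
  return events.map(function (e) {
    return Object.assign({}, e, { handled: 0 });
  });
}

// Mimics "events.$[elem].handled" with arrayFilters: [{ "elem.profile": profile }]:
// only elements satisfying the filter are updated.
function setFiltered(events, profile) {
  return events.map(function (e) {
    return e.profile === profile ? Object.assign({}, e, { handled: 0 }) : e;
  });
}

var events = [
  { handled: 1, profile: 10 },
  { handled: 1, profile: 10 },
  { handled: 1, profile: 20 }
];

function handledOf(list) {
  return list.map(function (e) { return e.handled; });
}

var afterAll = handledOf(setAll(events));
var afterFiltered = handledOf(setFiltered(events, 10));
console.log(afterAll);      // [ 0, 0, 0 ]: the profile 20 element is changed too
console.log(afterFiltered); // [ 0, 0, 1 ]: only profile 10 elements are changed
```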
Actually, the save command is only available on an instance of the Document class, which carries a lot of methods and attributes. So you can use the lean() function to reduce the workload.
Refer here. https://hashnode.com/post/why-are-mongoose-mongodb-odm-lean-queries-faster-than-normal-queries-cillvawhq0062kj53asxoyn7j
Another problem with the save function is that several saves running at the same time can produce conflicting data.
Model.update keeps the data consistent.
So, to update multiple items in an array of a document, use your familiar programming language and try something like this (I use mongoose here):
User.findOne({'_id': '4d2d8deff4e6c1d71fc29a07'}).lean().exec()
.then(usr =>{
if(!usr) return
usr.events.forEach( e => {
if(e && e.profile==10 ) e.handled = 0
})
User.findOneAndUpdate(
{'_id': '4d2d8deff4e6c1d71fc29a07'},
{$set: {events: usr.events}},
{new: true}
).lean().exec().then(updatedUsr => console.log(updatedUsr))
})
To update an array field in multiple documents in MongoDB, use $pull or $push with an updateMany query:
Notification.updateMany(
{ "_id": { $in: req.body.notificationIds } },
{
$pull: { "receiversId": req.body.userId }
}, function (err) {
if (err) {
res.status(500).json({ "msg": err });
} else {
res.status(200).json({
"msg": "Notification Deleted Successfully."
});
}
});
If you want to update an array inside an array:
await Booking.updateOne(
{
userId: req.currentUser?.id,
cart: {
$elemMatch: {
id: cartId,
date: date,
//timeSlots: {
//$elemMatch: {
//id: timeSlotId,
//},
//},
},
},
},
{
$set: {
version: booking.version + 1,
'cart.$[i].timeSlots.$[j].spots': spots,
},
},
{
arrayFilters: [
{
'i.id': cartId,
},
{
'j.id': timeSlotId,
},
],
new: true,
}
);
I tried the following and it's working fine.
.update({'events.profile': 10}, { '$set': {'events.$.handled': 0 }},{ safe: true, multi:true }, callback);
// callback: your callback function, in the case of Node.js

What is the addressing issue with my complex document update() in MongoDB?

I have been unable to reach into my MongoDB collection and change a value in a complex document. I have tried more variations than the one example shown below, all sorts of variations, but they fail.
I want to change the Value of the Key "air" from "rain" to "clear". In real life, I will not know that the current Value of the Key "air" is "rain".
Note, I am not using the MongoDB _id Object and would like to accomplish this without using it.
3 documents in the weatherSys collection:
{
"_id" : ObjectId("58a638fb1831c61917f921c5"),
"SanFrancisco" : [
{ "sky" : "grey" },
{ "air" : "rain" },
{ "ground" : "wet" }
]
}
{
"_id" : ObjectId("58a638fb1831c61917f921c6"),
"LosAngeles" : [
{ "sky" : "grey" },
{ "air" : "rain" },
{ "ground" : "wet" }
]
}
{
"_id" : ObjectId("58a638fb1831c61917f921c7"),
"SanDiego" : [
{ "sky" : "grey" },
{ "air" : "rain" },
{ "ground" : "wet" }
]
}
var docKey = "LosAngeles";
var subKey = "air";
var newValue = "clear";
var query = {};
//var queryKey = docKey + ".$";
query[query] = subKey; // query = { }
var set = {};
var setKey = docKey + ".0." + subKey;
set[setKey] = newValue; // set = { "weather.0.air" : "clear" }
db.collection('weatherSys').update(query, { $set: set }, function(err, result) {
if (err) throw err;
});
UPDATE-1:
Ok, so I was hoping I could find a layout a bit simpler than you had suggested but I failed. Everything I tried was not addressable at the "air" Key level. So I copy and pasted your exact JSON into my collection and ran it. I'm using MongoChef to manipulate and test the collection.
Here is my new layout drived from pasting your JSON in 3 times to create 3 documents:
When I then attempted to update the "San Francisco" document's "air" key I got an unexpected result. Rather than updating "air":"dry" it created a new "air" key in the "San Francisco" Object:
So I thought ok, lets try the update again and see what happens:
As you can see it updated the "air" key that it had previously created. I could fight this out and try to make it work "my" way but I just want it to work so I reconfigure my collection layout again, along the lines of what is "working":
And run the update again:
Then I verify it by running the update again:
It works, I am updating properly in a multi-document environment. So this is my current working collection layout:
I have a couple of questions about this-
I am using the top level Key "weather" in every document. It adds nothing to the information within the document. Is there a layout design change that would not necessitate that Key and the overhead it brings along?
Lets say I have to use the "weather" key. Its value is an array, but that array only has one element, the Object which contains the Keys: city, sky, air, and ground. Does addressing necessitate the use of an array with only one element? Or could I get rid of it. Instead of "weather":[{}] could the design be "weather":{} or would I get into non addressability issues again?
It appears I can now update() any of the Values for the Keys: air, sky, and ground, but what is the find() structure to say READ the Value of the Key "ground" in one of the documents?
----> OK, I think I've got this question #3-
db.weatherSys.find({ "weather.city" : "San Francisco" }, { "weather.ground": 1 })
In the original collection layout that you had suggested, could you explain to me why it did not update as you and I had expected but instead created a new the "city" object?
A lot here. I appreciate your sticking with it.
You can't use the positional operator to query the array by its key.
You can access the weather array by index, but that means you have to know the array index.
For example, if you want to update the value of the air element in the weather array:
db.collection('weatherSys').update( {}, { $set: { "weather.1.air" : "clear"} } );
Update:
Unfortunately, I can't see any way to update the values without knowing the array index for the key.
You don't need a query object, as your keys are unique.
db.collection('weatherSys').update( {}, { $set: { "SanFrancisco.1.air" : "clear"} } );
Or, as another variant, if you want to make sure the key exists:
db.collection('weatherSys').update( { "SanFrancisco": { $exists: true } }, { $set: { "SanFrancisco.1.air" : "clear"} } );
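If the index is not known in advance, one workaround is to read the document first and derive the dotted path from it. The sketch below is plain JavaScript operating on an in-memory copy; buildSetForSubKey is a hypothetical helper, not a driver API. Because it relies on an index found at read time, the resulting update is only safe if the array does not change between the read and the write.

```javascript
// Builds a $set document such as { "SanFrancisco.1.air": "clear" } from a
// document that has already been read (hypothetical helper for illustration).
function buildSetForSubKey(doc, docKey, subKey, newValue) {
  var index = doc[docKey].findIndex(function (element) {
    return Object.prototype.hasOwnProperty.call(element, subKey);
  });
  if (index === -1) return null; // no array element carries that key
  var set = {};
  set[docKey + '.' + index + '.' + subKey] = newValue;
  return { $set: set };
}

var weatherDoc = {
  SanFrancisco: [{ sky: 'grey' }, { air: 'rain' }, { ground: 'wet' }]
};

var update = buildSetForSubKey(weatherDoc, 'SanFrancisco', 'air', 'clear');
console.log(JSON.stringify(update)); // {"$set":{"SanFrancisco.1.air":"clear"}}
```

The returned object could then be passed as the second argument of update(), together with a query such as the $exists one above.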
I am not sure whether this is an option for you, but if it is, change your structure to the one below.
{
"_id" : ObjectId("58a638fb1831c61917f921c5"),
"weather" : [
{
"city": "LosAngeles",
"sky" : "grey" ,
"air" : "rain" ,
"ground" : "wet"
}
]
}
You can now use the positional $ operator for the update.
db.collection('weatherSys').update( {"weather.city":"LosAngeles"}, { $set: { "weather.$.air" : "clear"} } );
I am using the top level Key "weather" in every document. It adds
nothing to the information within the document. Is there a layout
design change that would not necessitate that Key and the overhead it
brings along?
The only layout that I can think of is promoting all the embedded properties to the top level. Sorry, not sure why I didn't think of this the first time around. Sometimes you just need a right question to get the right answer.
{
"_id" : ObjectId("58a638fb1831c61917f921c5"),
"city": "LosAngeles",
"sky" : "grey",
"air" : "rain",
"ground" : "wet"
}
All the updates will be simply top level updates.
db.collection('weatherSys').update( {"city":"LosAngeles"}, { $set: { "air" : "clear"} } );
Lets say I have to use the "weather" key. Its value is an array, but
that array only has one element, the Object which contains the Keys:
city, sky, air, and ground. Does addressing necessitate the use of an
array with only one element? Or could I get rid of it. Instead of
"weather":[{}] could the design be "weather":{} or would I get into
non addressability issues again?
N/A if you are okay with first suggestion.
It appears I can now update() any of the Values for the Keys: air,
sky, and ground, but what is the find() structure to say READ the
Value of the Key "ground" in one of the documents?
db.weatherSys.find({ "city" : "San Francisco" }, { "ground": 1 })
In the original collection layout that you had suggested, could you
explain to me why it did not update as you and I had expected but
instead created a new the "city" object?
That is a copy paste error. I meant to suggest the working layout you have right now. Updated my previous layout.

MongoDB aggregation: How to extract the field in the results

Hi all!
I'm new to MongoDB aggregation. After aggregating, I finally get this result:
"result" : [
{
"_id" : "531d84734031c76f06b853f0"
},
{
"_id" : "5316739f4031c76f06b85399"
},
{
"_id" : "53171a7f4031c76f06b853e5"
},
{
"_id" : "531687024031c76f06b853db"
},
{
"_id" : "5321135cf5fcb31a051e911a"
},
{
"_id" : "5315b2564031c76f06b8538f"
}
],
"ok" : 1
The data is just what I'm looking for, but I just want to make it one step further, I hope my data will be displayed like this:
"result" : [
"531d84734031c76f06b853f0",
"5316739f4031c76f06b85399",
"53171a7f4031c76f06b853e5",
"531687024031c76f06b853db",
"5321135cf5fcb31a051e911a",
"5315b2564031c76f06b8538f"
],
"ok" : 1
Yes, I just want to get all the unique ids in a plain string array. Is there anything I could do? Any help would be appreciated!
All MongoDB queries produce "key/value" pairs in the result document. All MongoDB content is basically a BSON document in this form, which is just "translated" back to native code form by the driver to the language it is implemented in.
So the aggregation framework alone is never going to produce a bare array of just the values as you want. But you can always transform the array of results, since after all it is only an array:
var result = db.collection.aggregate(pipeline);
var response = result.result.map(function(x) { return x._id } );
Also note that from MongoDB 2.6 onwards, the default (and preferred) behavior in the shell is for the aggregation result to be returned as a cursor. Since a cursor is iterated as a list rather than wrapped in a single document, you would process it differently:
var response = db.collection.aggregate(pipeline).map(function(x) {
return x._id;
})
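As a rough sketch of the transformation described above, the snippet below runs without a database: the sample data is copied from the question, and `extractIds` and `cursorLike` are hypothetical names introduced only for illustration. It assumes the pre-2.6 output is a document with a `result` array and that the newer cursor exposes `toArray()`, as the shell cursor does.

```javascript
// Hypothetical helper (not part of MongoDB): accepts either the pre-2.6
// result document ({ result: [...], ok: 1 }) or a cursor-like object
// exposing toArray(), and returns a plain array of _id values.
function extractIds(output) {
  var docs = Array.isArray(output.result) ? output.result : output.toArray();
  return docs.map(function (doc) { return doc._id; });
}

// Sample data copied from the question, standing in for a real aggregate() call.
var legacyOutput = {
  result: [
    { _id: "531d84734031c76f06b853f0" },
    { _id: "5316739f4031c76f06b85399" }
  ],
  ok: 1
};

// Minimal stand-in for a cursor; this object is an assumption for the demo.
var cursorLike = {
  toArray: function () { return legacyOutput.result.slice(); }
};

var fromDocument = extractIds(legacyOutput);
var fromCursor = extractIds(cursorLike);
console.log(fromDocument);
console.log(fromCursor);
```

Either way the work happens on the client: the server still returns documents, and the array of bare strings is built afterwards.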

how to push a dictionary to a nested array with mongodb?

I have data that looks like this in my database:
> db.whocs_up.find()
{ "_id" : ObjectId("52ce212cb17120063b9e3869"), "project" : "asnclkdacd", "users" : [ ] }
and I tried to add to the 'users' array like this:
> db.whocs_up.update({'users.user': 'usex', 'project' : 'asnclkdacd' },{ '$addToSet': { 'users': {'user':'userx', 'lastactivity' :2387843543}}},true)
but I get the following error:
Cannot apply $addToSet modifier to non-array
The same thing happens with the $push operator. What am I doing wrong?
I'm on 2.4.8.
I tried to follow this example from here:
MongoDB - Update objects in a document's array (nested updating)
db.bar.update( {user_id : 123456, "items.item_name" : {$ne : "my_item_two" }} ,
{$addToSet : {"items" : {'item_name' : "my_item_two" , 'price' : 1 }} } ,
false ,
true)
The python tag is because I was working with Python when I ran into this, but it does not work in the mongo shell either, as you can see.
EDIT ============================== GOT IT TO WORK
Apparently, if I change the update from
db.whocs_up.update({'users.user': 'usex', 'project' : 'asnclkdacd' },{ '$addToSet': { 'users': {'user':'userx', 'lastactivity' :2387843543}}},true)
to this:
db.whocs_up.update({'project' : 'asnclkdacd' },{ '$addToSet': { 'users': {'user':'userx', 'lastactivity' :2387843543}}},true)
it works. Can anyone explain why the two do not achieve the same thing? In my understanding they should have matched the same document and hence done the same thing.
What does the addition of 'users.user': 'userx' change in the update? Does it refer to some inner document in the array rather than the document as a whole?
This is a known bug in MongoDB (SERVER-3946). Currently, an update with $push/$addToSet with a query on the same field does not work as expected.
In the general case, there are a couple of workarounds:
Restructure your update operation to not have to query on a field that is also to be updated using $push/$addToSet (as you have done above).
Use the $all operator in the query, supplying a single-value array containing the lookup value. e.g. instead of this:
db.foo.update({ x : "a" }, { $addToSet : { x : "b" } }, true)
do this:
db.foo.update({ x : { $all : ["a"] } }, { $addToSet : { x : "b" } } , true)
In your specific case, I think you need to re-evaluate the operation you're trying to do. The update operation you have above will add a new array entry for each unique (user, lastactivity) pair, which is probably not what you want. I assume you want a unique entry for each user.
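To make that concrete, here is a minimal in-memory sketch of how set-style insertion behaves with subdocuments. It is not MongoDB code: `addToSet` is a hypothetical helper, and `JSON.stringify` is used as an approximation of whole-element equality. The second timestamp is made up for the demo.

```javascript
// Rough stand-in for $addToSet on an array of subdocuments: an element is
// added unless an identical one (same fields, same values, same order)
// is already present. JSON.stringify serves as a simple equality check.
function addToSet(array, element) {
  var key = JSON.stringify(element);
  var exists = array.some(function (item) {
    return JSON.stringify(item) === key;
  });
  if (!exists) {
    array.push(element);
  }
  return array;
}

var users = [];
addToSet(users, { user: "userx", lastactivity: 2387843543 });
addToSet(users, { user: "userx", lastactivity: 2387843543 }); // identical: ignored
addToSet(users, { user: "userx", lastactivity: 2387849999 }); // new timestamp: appended

console.log(users.length); // 2
```

Because the whole subdocument is compared, the same user ends up in the array once per distinct lastactivity value.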
Consider changing your schema so that you have one document per user:
{
_id : "userx",
project : "myproj",
lastactivity : 123,
...
}
The update operation then becomes something like:
db.users.update({ _id : "userx" }, { $set : { lastactivity : 456 } })
All users in a given project may still be looked up efficiently by adding a secondary index on project.
This schema also avoids the unbounded document growth of the above schema, which is better for performance.
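The one-document-per-user idea can be sketched in memory as follows. This is only an illustration under stated assumptions: `usersById` and `upsertUser` are hypothetical names, the upsert behavior is an assumption added for the demo (the update shown above does not pass an upsert flag), and "usery" is an invented second user.

```javascript
// In-memory stand-in for the users collection, keyed by _id (the user name).
var usersById = {};

// Mimics a $set update with upsert: creates the document if it is missing,
// otherwise overwrites only the given fields.
function upsertUser(id, fields) {
  var doc = usersById[id] || (usersById[id] = { _id: id });
  Object.keys(fields).forEach(function (k) {
    doc[k] = fields[k];
  });
  return doc;
}

upsertUser("userx", { project: "myproj", lastactivity: 123 });
upsertUser("userx", { lastactivity: 456 }); // same user: updated in place
upsertUser("usery", { project: "myproj", lastactivity: 789 });

// Equivalent in spirit to find({ project: "myproj" }).
var inProject = Object.keys(usersById)
  .map(function (id) { return usersById[id]; })
  .filter(function (doc) { return doc.project === "myproj"; });

console.log(inProject.length); // 2
console.log(usersById.userx.lastactivity); // 456
```

Each user occupies exactly one document regardless of how often lastactivity changes, so nothing grows without bound.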
