mongodb - $lookup pipeline using COLLSCAN instead of index - javascript

I'm trying to use indexing on my $lookup pipeline but it doesn't seem to be working as intended.
Here's my query:
db.map_levels.explain().aggregate([
{
$lookup:
{
from:
"map_level_revisions",
pipeline:
[
{
$match:
{
$expr:
{
$eq:
[
"$account_id",
ObjectId("5b66ca21d6b54f479bef62a4")
]
}
}
}
],
as:
"revisions"
}
},
])
and here's the explanation:
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test-creator.map_levels",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [ ]
}
}
},
{
"$lookup" : {
"from" : "map_level_revisions",
"as" : "revisions",
"let" : {
},
"pipeline" : [
{
"$match" : {
"$expr" : {
"$eq" : [
"$account_id",
ObjectId("5b66ca21d6b54f479bef62a4")
]
}
}
}
]
}
}
],
"ok" : 1
}
How do I make it use an index instead?
Just a note, the query does return the document.

The collection scan in your explain output is referring to the map_levels collection, as noted in the queryPlanner.namespace value. The $lookup stage merges data from another collection into the current pipeline. Since you haven't specified any query stages before the $lookup, the map_levels collection will be iterated using a collection scan. If an entire collection is being loaded without any filtering or sort criteria, a collection scan has less overhead than iterating an index and fetching the documents.
You can avoid the current collection scan by adding a $match stage before your $lookup (assuming you don't want to process the full map_levels collection).
How can I check the index used by $lookup?
Unfortunately query explain output does not (as at MongoDB 4.0) indicate index usage for $lookup stages. A workaround for this would be running explain using your lookup's pipeline as a top level aggregation query.
There's a relevant issue to watch/upvote in the MongoDB Issue tracker: SERVER-22622: Improve $lookup explain to indicate query plan on the "from" collection.

Related

Insert value inside array within Mongo DB documents using bulk write [duplicate]

I want to show products by ids (56e641d4864e5b780bb992c6 and 56e65504a323ee0812e511f2) and show price after subtracted by discount if available.
I can count the final price using aggregate, but this return all document in a collection, how to make it return only the matches ids
"_id" : ObjectId("56e641d4864e5b780bb992c6"),
"title" : "Keyboard",
"discount" : NumberInt(10),
"price" : NumberInt(1000)
"_id" : ObjectId("56e65504a323ee0812e511f2"),
"title" : "Mouse",
"discount" : NumberInt(0),
"price" : NumberInt(1000)
"_id" : ObjectId("56d90714a48d2eb40cc601a5"),
"title" : "Speaker",
"discount" : NumberInt(10),
"price" : NumberInt(1000)
this is my query
productModel.aggregate([
{
$project: {
title : 1,
price: {
$cond: {
if: {$gt: ["$discount", 0]}, then: {$subtract: ["$price", {$divide: [{$multiply: ["$price", "$discount"]}, 100]}]}, else: "$price"
}
}
}
}
], function(err, docs){
if (err){
console.log(err)
}else{
console.log(docs)
}
})
and if i add this $in query, it returns empty array
productModel.aggregate([
{
$match: {_id: {$in: ids}}
},
{
$project: {
title : 1,
price: {
$cond: {
if: {$gt: ["$discount", 0]}, then: {$subtract: ["$price", {$divide: [{$multiply: ["$price", "$discount"]}, 100]}]}, else: "$price"
}
}
}
}
], function(err, docs){
if (err){
console.log(err)
}else{
console.log(docs)
}
})
Your ids variable will be constructed of "strings", and not ObjectId values.
Mongoose "autocasts" string values for ObjectId into their correct type in regular queries, but this does not happen in the aggregation pipeline, as in described in issue #1399.
Instead you must do the correct casting to type manually:
ids = ids.map(function(el) { return mongoose.Types.ObjectId(el) })
Then you can use them in your pipeline stage:
{ "$match": { "_id": { "$in": ids } } }
The reason is because aggregation pipelines "typically" alter the document structure, and therefore mongoose makes no presumption that the "schema" applies to the document in any given pipeline stage.
It is arguable that the "first" pipeline stage when it is a $match stage should do this, since indeed the document is not altered. But right now this is not how it happens.
Any values that may possibly be "strings" or at least not the correct BSON type need to be manually cast in order to match.
In the mongoose , it work fine with find({_id:'606c1ceb362b366a841171dc'})
But while using the aggregate function we have to use the mongoose object to convert the _id as object eg.
$match: { "_id": mongoose.Types.ObjectId("606c1ceb362b366a841171dc") }
This will work fine.
You can simply convert your id to
let id = mongoose.Types.ObjectId(req.query.id);
and then match
{ $match: { _id: id } },
instead of:
$match: { _id: "6230415bf48824667a417d56" }
use:
$match: { _id: ObjectId("6230415bf48824667a417d56") }
Use this
$match: { $in : [ {_id: mongoose.Types.ObjectId("56e641d4864e5b780bb992c6 ")}, {_id: mongoose.Types.ObjectId("56e65504a323ee0812e511f2")}] }
Because Mongoose autocasts string values for ObjectId into their correct type in regular queries, but this does not happen in the aggregation pipeline. So we need to define ObjectId cast in pipeline queries.

Meteor: Return only single object in nested array within collection

I'm attempting to filter returned data sets with Meteor's find().fetch() to contain just a single object, it doesn't appear very useful if I query for a single subdocument but instead I receive several, some not even containing any of the matched terms.
I have a simple mixed data collection that looks like this:
{
"_id" : ObjectId("570d20de3ae6b49a54ee01e7"),
"name" : "Entertainment",
"items" : [
{
"_id" : ObjectId("57a38b5f2bd9ac8225caff06"),
"slug" : "this-is-a-long-slug",
"title" : "This is a title"
},
{
"_id" : ObjectId("57a38b835ac9e2efc0fa09c6"),
"slug" : "mc",
"title" : "Technology"
}
]
}
{
"_id" : ObjectId("570d20de3ae6b49a54ee01e8"),
"name" : "Sitewide",
"items" : [
{
"_id" : ObjectId("57a38bc75ac9e2efc0fa09c9"),
"slug" : "example",
"name" : "Single Example"
}
]
}
I can easily query for a specific object in the nested items array with the MongoDB shell as this:
db.categories.find( { "items.slug": "mc" }, { "items.$": 1 } );
This returns good data, it contains just the single object I want to work with:
{
"_id" : ObjectId("570d20de3ae6b49a54ee01e7"),
"items" : [
{
"_id" : ObjectId("57a38b985ac9e2efc0fa09c8")
"slug" : "mc",
"name" : "Single Example"
}
]
}
However, if a similar query within Meteor is directly attempted:
/* server/publications.js */
Meteor.publish('categories.all', function () {
return Categories.find({}, { sort: { position: 1 } });
});
/* imports/ui/page.js */
Template.page.onCreated(function () {
this.subscribe('categories.all');
});
Template.page.helpers({
items: function () {
var item = Categories.find(
{ "items.slug": "mc" },
{ "items.$": 1 } )
.fetch();
console.log('item: %o', item);
}
});
The outcome isn't ideal as it returns the entire matched block, as well as every object in the nested items array:
{
"_id" : ObjectId("570d20de3ae6b49a54ee01e7"),
"name" : "Entertainment",
"boards" : [
{
"_id" : ObjectId("57a38b5f2bd9ac8225caff06")
"slug" : "this-is-a-long-slug",
"name" : "This is a title"
},
{
"_id" : ObjectId("57a38b835ac9e2efc0fa09c6")
"slug" : "mc",
"name" : "Technology"
}
]
}
I can then of course filter the returned cursor even further with a for loop to get just the needed object, but this seems unscalable and terribly inefficient while dealing with larger data sets.
I can't grasp why Meteor's find returns a completely different set of data than MongoDB's shell find, the only reasonable explanation is both function signatures are different.
Should I break up my nested collections into smaller collections and take a more relational database approach (i.e. store references to ObjectIDs) and query data from collection-to-collection, or is there a more powerful means available to efficiently filter large data sets into single objects that contain just the matched objects as demonstrated above?
The client side implementation of Mongo used by Meteor is called minimongo. It currently only implements a subset of available Mongo functionality. Minimongo does not currently support $ based projections. From the Field Specifiers section of the Meteor API:
Field operators such as $ and $elemMatch are not available on the client side yet.
This is one of the reasons why you're getting different results between the client and the Mongo shell. The closest you can get with your original query is the result you'll get by changing "items.$" to "items":
Categories.find(
{ "items.slug": "mc" },
{ "items": 1 }
).fetch();
This query still isn't quite right though. Minimongo expects your second find parameter to be one of the allowed option parameters outlined in the docs. To filter fields for example, you have to do something like:
Categories.find(
{ "items.slug": "mc" },
{
fields: {
"items": 1
}
}
).fetch();
On the client side (with Minimongo) you'll then need to filter the result further yourself.
There is another way of doing this though. If you run your Mongo query on the server, you won't be using Minimongo, which means projections are supported. As a quick example, try the following:
/server/main.js
const filteredCategories = Categories.find(
{ "items.slug": "mc" },
{
fields: {
"items.$": 1
}
}
).fetch();
console.log(filteredCategories);
The projection will work, and the logged results will match the results you see when using the Mongo console directly. Instead of running your Categories.find on the client side, you could instead create a Meteor Method that calls your Categories.find on the server, and returns the results back to the client.

MongoDB Finding duplicates sharing multiple fields using MapReduce

I am trying to find duplicates in a Mongo version 2.4 database that is being used for production and therefore cannot be updated. Since aggregate does not exist in 2.4, I cannot use the aggregate pipeline to find duplicates, therefore I am trying to find a solution using MapReduce.
I have tried the following set of map, reduce, and finalize functions, through MongoVUE's Map Reduce interface, and they returned nothing after running for less than a second on a 3,000,000 record collection that definitely has duplicates on the indicated fields. Obviously something went wrong, but MongoVUE did not show any error messages or helpful indications.
function Map() {
emit(
{name: this.name, LocationId: this.LocationId,
version: this.version},
{count:1, ScrapeDate: this.ScrapeDate}
);
}
function Reduce(key, values) {
var reduced = {count:0, ScrapeDate:''2000-01-01''};
values.forEach(function(val) {
reduced.count += val.count;
if (reduced.ScrapeDate.localeCompare(val.ScrapeDate) < 0)
reduced.ScrapeDate=val.ScrapeDate;
});
return reduced;
return values[0];
}
function Finalize(key, reduced) {
if (reduced.count > 1)
return reduced;
}
I just need to find any instance of multiple records that share the same name, LocationId, and version, and ideally display the most recent ScrapeDate of such a record.
Your map-reduce code worked without any issues, though for a very small dataset. I think return values[0]; in the reduce function would be a copy paste error. You could try the same through the mongo shell.
Since aggregate does not exist in 2.4, I cannot use the aggregate pipeline to find duplicates, therefore I am trying to find a solution
using MapReduce.
You got it wrong here, db.collection.aggregate(pipeline, options) was introduced in the version 2.2.
Here is how it could be done with the aggregation framework, but it would not be preferred since your dataset is very huge, and the $sort operator has memory limit of 10% of RAM, in v2.4.
db.collection.aggregate(
[
// sort the records, based on the 'ScrapeDate' field, in descending order.
{$sort:{"ScrapeDate":-1}},
// group by the key fields, and take the 'ScrapeDate' of the first document,
// Since it is in sorted order, the first document would contain the
// highest field value.
{$group:{"_id":{"name":"$name","LocationId":"$LocationId","version":"$version"}
,"ScrapeDate":{$first:"$ScrapeDate"}
,"count":{$sum:1}}
},
// output only the group, having documents greater than 1.
{$match:{"count":{$gt:1}}}
]
);
Coming to your Map-reduce functions, it ran without issues on my test data.
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2000-01-01"});
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2001-01-01"});
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2002-01-01"});
db.collection.insert({"name":"d","LocationId":1,"version":1,"ScrapeDate":"2002-01-01"});
running the map-reduce,
db.collection.mapReduce(Map,Reduce,{out:{"inline":1},finalize:Finalize});
o/p:
{
"results" : [
{
"_id" : {
"name" : "c",
"LocationId" : 1,
"version" : 1
},
"value" : {
"count" : 3,
"ScrapeDate" : "2002-01-01"
}
},
{
"_id" : {
"name" : "d",
"LocationId" : 1,
"version" : 1
},
"value" : null
}
],
"timeMillis" : 0,
"counts" : {
"input" : 4,
"emit" : 4,
"reduce" : 1,
"output" : 2
},
"ok" : 1,
}
Notice that the output contains value:null for a record which doesn't have any duplicates.
This is due to your finalize function:
function Finalize(key, reduced) {
if (reduced.count > 1)
return reduced; // returned null by default for keys with single value,
// i.e count=1
}
The finalize function do not filter out keys. So you can't get only the keys that are duplicates. You will get all the keys, in the map-reduce output. In your finalize functions, you can just not show their values, which is what you are doing.

MongoDB aggregation: How to extract the field in the results

all!
I'm new to MongoDB aggregation, after aggregating, I finally get the result:
"result" : [
{
"_id" : "531d84734031c76f06b853f0"
},
{
"_id" : "5316739f4031c76f06b85399"
},
{
"_id" : "53171a7f4031c76f06b853e5"
},
{
"_id" : "531687024031c76f06b853db"
},
{
"_id" : "5321135cf5fcb31a051e911a"
},
{
"_id" : "5315b2564031c76f06b8538f"
}
],
"ok" : 1
The data is just what I'm looking for, but I just want to make it one step further, I hope my data will be displayed like this:
"result" : [
"531d84734031c76f06b853f0",
"5316739f4031c76f06b85399",
"53171a7f4031c76f06b853e5",
"531687024031c76f06b853db",
"5321135cf5fcb31a051e911a",
"5315b2564031c76f06b8538f"
],
"ok" : 1
Yes, I just want to get all the unique id in a plain string array, is there anything I could do? Any help would be appreciated!
All MongoDB queries produce "key/value" pairs in the result document. All MongoDB content is basically a BSON document in this form, which is just "translated" back to native code form by the driver to the language it is implemented in.
So the aggregation framework alone is never going to produce a bare array of just the values as you want. But you can always just transform the array of results, as after all it is only an array
var result = db.collection.aggregate(pipeline);
var response = result.result.map(function(x) { return x._id } );
Also note that the default behavior in the shell and a preferred option is that the aggregation result is actually returned as a cursor from MongoDB 2.6 and onwards. Since this is in list form rather than as a distinct document you would process differently:
var response = db.collection.aggregate(pipeline).map(function(x) {
return x._id;
})

how to push a dictionary to a nested array with mongodb?

i have data that looks like this in my database
> db.whocs_up.find()
{ "_id" : ObjectId("52ce212cb17120063b9e3869"), "project" : "asnclkdacd", "users" : [ ] }
and i tried to add to the 'users' array like thus:
> db.whocs_up.update({'users.user': 'usex', 'project' : 'asnclkdacd' },{ '$addToSet': { 'users': {'user':'userx', 'lastactivity' :2387843543}}},true)
but i get the following error:
Cannot apply $addToSet modifier to non-array
same thing happens with push operator, what im i doing wrong?
im on 2.4.8
i tried to follow this example from here:
MongoDB - Update objects in a document's array (nested updating)
db.bar.update( {user_id : 123456, "items.item_name" : {$ne : "my_item_two" }} ,
{$addToSet : {"items" : {'item_name' : "my_item_two" , 'price' : 1 }} } ,
false ,
true)
the python tag is because i was working with python when i ran into this, but it does nto work on the mongo shell as you can see
EDIT ============================== GOT IT TO WORK
apparently if i modify the update from
db.whocs_up.update({'users.user': 'usex', 'project' : 'asnclkdacd' },{ '$addToSet': { 'users': {'user':'userx', 'lastactivity' :2387843543}}},true)
to this:
db.whocs_up.update({'project' : 'asnclkdacd' },{ '$addToSet': { 'users': {'user':'userx', 'lastactivity' :2387843543}}},true)
it works, but can anyone explain why the two do not achieve the same thing, in my understanding they should have referenced the same document and hence done the same thing,
What does the addition of 'users.user': 'userx' change in the update? does it refer to some inner document in the array rather than the document as a whole?
This is a known bug in MongoDB (SERVER-3946). Currently, an update with $push/$addToSet with a query on the same field does not work as expected.
In the general case, there are a couple of workarounds:
Restructure your update operation to not have to query on a field that is also to be updated using $push/$addToSet (as you have done above).
Use the $all operator in the query, supplying a single-value array containing the lookup value. e.g. instead of this:
db.foo.update({ x : "a" }, { $addToSet : { x : "b" } }, true)
do this:
db.foo.update({ x : { $all : ["a"] } }, { $addToSet : { x : "b" } } , true)
In your specific case, I think you need to re-evaluate the operation you're trying to do. The update operation you have above will add a new array entry for each unique (user, lastactivity) pair, which is probably not what you want. I assume you want a unique entry for each user.
Consider changing your schema so that you have one document per user:
{
_id : "userx",
project : "myproj",
lastactivity : 123,
...
}
The update operation then becomes something like:
db.users.update({ _id : "userx" }, { $set : { lastactivity : 456 } })
All users in a given project may still be looked up efficiently by adding a secondary index on project.
This schema also avoids the unbounded document growth of the above schema, which is better for performance.

Categories