My data looks like this
{
"_id": "62f77d806f24c09f0acae163",
"name": "Test product",
"attributes": [
{
"attribute_name": "Shape",
"attribute_value": "Square"
},
{
"attribute_name": "Color",
"attribute_value": "Red"
}
]
}
I am using the aggregate method to filter results. I want to find products where one attribute has "attribute_name" equal to "Shape" and "attribute_value" equal to "Square" AND another attribute has "attribute_name" equal to "Color" and "attribute_value" equal to "Red".
Basically I am building a filter feature in my application, and I want to get the products based on the data passed to the API.
I have tried this:
let lookup = {
$match: {
$and: [
{
'attributes.attribute_label': 'Shape',
'attributes.attribute_value': {
$in: ['Square']
},
},
{
'attributes.attribute_label': 'Color',
'attributes.attribute_value': {
$in: ['Red']
},
}
],
}
};
let products = await productsModel.aggregate(lookup);
At first it seemed to work, but then I noticed it doesn't work properly: it matches on
'attributes.attribute_value': {
$in: ['Red']
},
so if "Red" appears as the "attribute_value" of any attribute, not only the one labelled "Color", the product is still returned.
Any help is appreciated.
I want to be able to get results based on the values for each attribute name.
For example, the data passed might be this:
Shape=Square,Color=Red,Green
I want to get the products which match this, where the object with attribute_label of Color contains the attribute_value of Red or Green.
Is it a typo? Your data uses "attributes.attribute_name", but your query uses "attributes.attribute_label".
This should work with attributes.attribute_name (not label!)
[
{
'$match': {
'$and': [
{
'attributes.attribute_name': 'Shape'
}, {
'attributes.attribute_value': {
'$in': [
'Square'
]
}
}, {
'attributes.attribute_name': 'Color'
}, {
'attributes.attribute_value': {
'$in': [
'Red'
]
}
}
]
}
}
]
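Note that conditions written this way are still evaluated independently over the whole array, so a name and a value can be matched in different array elements. One way to tie each name to its own values is $elemMatch, which requires a single array element to satisfy all of its conditions. Below is a minimal sketch, assuming the filter arrives as a string in the "Shape=Square,Color=Red,Green" form from the question; buildAttributeMatch is a hypothetical helper name, not part of MongoDB or Mongoose.

```javascript
// Hypothetical helper: turns "Shape=Square,Color=Red,Green" into a $match stage.
// Assumption: a comma-separated token without "=" is an extra value for the
// attribute named just before it.
function buildAttributeMatch(filterString) {
  const groups = [];
  for (const token of filterString.split(',')) {
    const eq = token.indexOf('=');
    if (eq !== -1) {
      groups.push({
        name: token.slice(0, eq).trim(),
        values: [token.slice(eq + 1).trim()],
      });
    } else if (groups.length > 0 && token.trim() !== '') {
      groups[groups.length - 1].values.push(token.trim());
    }
  }
  // $and must not be empty, so fall back to a match-all stage.
  if (groups.length === 0) return { $match: {} };
  return {
    $match: {
      $and: groups.map(({ name, values }) => ({
        attributes: {
          // Both conditions must hold for the SAME array element.
          $elemMatch: { attribute_name: name, attribute_value: { $in: values } },
        },
      })),
    },
  };
}

const stage = buildAttributeMatch('Shape=Square,Color=Red,Green');
console.log(JSON.stringify(stage, null, 2));
```

With the model from the question, the stage could then be passed as productsModel.aggregate([stage]).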
I have an existing ejs query as below:
let queryBody = ejs.Request()
.size(0)
.query(
ejs.BoolQuery()
.must(
ejs.RangeQuery('hour_time_stamp').gte(this.lastDeviceDate).lte(this.lastDeviceDate)
)
)
.agg(ejs.TermsAggregation('market_agg').field('market').order('sum', 'asc').size(50000)
.agg(ejs.SumAggregation('sum').field('num_devices'))
)
Currently field('market') only returns buckets for documents where a value for market is present. There are also documents in the database with no value for market, and I need to include those. How do I do that?
EDIT:
In ES the value of market is either null or the field is missing entirely. I wrote an ES query that gets all those documents, but I am not able to express the same thing as an ejs query. Any idea how this can be done?
{
"query": {
"bool": {
"should": [
{
"exists": {
"field": "market"
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "market"
}
}
]
}
}
]
}
}
}
From your description, you need a way to group the documents with an empty market field too.
For that you can use the "missing" parameter of the terms aggregation. It defines how documents that have no value for the field (as in your case) are grouped: they are treated as if they had the given value. Your query in JSON form would then look like this:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "hour_time_stamp": {
              "gte": lastDeviceDate,
              "lte": lastDeviceDate
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "market_agg": {
      "terms": {
        "field": "market",
        "missing": "empty_markets",
        "order": { "sum": "asc" },
        "size": 50000
      },
      "aggs": {
        "sum": {
          "sum": { "field": "num_devices" }
        }
      }
    }
  }
}
Or, in your code, it could be done by adding the missing parameter like this:
let queryBody = ejs.Request()
.size(0)
.query(
ejs.BoolQuery()
.must(
ejs.RangeQuery('hour_time_stamp').gte(this.lastDeviceDate).lte(this.lastDeviceDate)
)
)
.agg(ejs.TermsAggregation('market_agg').field('market').missing('empty_markets').order('sum', 'asc').size(50000)
.agg(ejs.SumAggregation('sum').field('num_devices'))
)
I'm getting an error on this Elasticsearch terms query. The error message is
"[parsing_exception] [terms] unknown token [START_ARRAY] after [activeIds], with { line=1 & col=63 }"
activeIds is an array of unique ids. It looks something like this:
const activeIds = [ '157621a1-d892-4f4b-80ca-14feddb837a0',
'd04c5c93-a22c-48c3-a3b0-c79a61bdd923',
'296d40d9-f316-4560-bbc9-001d6f46858b',
'2f8c6c37-588d-4d24-9e69-34b6dd7366c2',
'ba0508dd-0e76-4be8-8b6e-9e938ab4abed',
'ab076ed9-1dd5-4987-8842-15f1b995bc0d',
'ea6b0cff-a64f-4ce3-844e-b36d9f161e6f' ]
let items = await es.search({
"index": table,
"body": {
"from": 0, "size": 25,
"query": {
"terms" : {
"growerId" : {
activeIds
}
},
"bool": {
"must_not": [
{ "match":
{
"active": false
}
},
],
"must": [
{ "query_string" :
{
"query": searchQuery,
"fields": ["item_name"]
}
}
],
}
}
}
})
Appreciate the help!
Edit: Answering this question- "What's the expected result? Can you elaborate and share some sample data? – Nishant Saini 15 hours ago"
I'll try to elaborate a bit.
1) Overall I'm trying to retrieve items that belong to active users. There are 2 tables: user and items. So I'm initially running an ES that returns all the users that contain { active: true } from the user table
2) Running that ES returns an array of ids which I'm calling activeIds. The array looks like what I've already displayed in my example. So this works so far (let me know if you want to see the code for that, but if I'm getting an expected result then I don't think we need that now)
3) Now I want to search through the items table, and retrieve only the items that contain one of the active ids. So an item should look like:
4) expected result is retrieve an array of objects that match the growerId with one of the activeIds. So if I do a search query for "flowers", a single expected result should look like:
[ { _index: 'items-dev',
_type: 'items-dev_type',
_id: 'itemId=fc68dadf-21c8-43c2-98d2-cf574f71f06d',
_score: 11.397207,
_source:
{ itemId: 'fc68dadf-21c8-43c2-98d2-cf574f71f06d',
'#SequenceNumber': '522268700000000025760905838',
item_name: 'Flowers',
grower_name: 'Uhs',
image: '630b5d6e-566f-4d55-9d31-6421eb2cff87.jpg',
dev: true,
growerId: 'd04c5c93-a22c-48c3-a3b0-c79a61bdd923',
sold_out: true,
'#timestamp': '2018-12-20T16:09:38.742599',
quantity_type: 'Pounds',
active: true,
pending_inventory: 4,
initial_quantity: 5,
price: 10,
item_description: 'Field of flowers' } },
So here the growerId matches activeIds[1]
But if I do a search for "invisible", which was created by an inactive user, I get:
[ { _index: 'items-dev',
_type: 'items-dev_type',
_id: 'itemId=15200473-93e1-477c-a1a7-0b67831f5351',
_score: 1,
_source:
{ itemId: '15200473-93e1-477c-a1a7-0b67831f5351',
'#SequenceNumber': '518241400000000004028805117',
item_name: 'Invisible too',
grower_name: 'Field of Greens',
image: '7f37d364-e768-451d-997f-8bb759343300.jpg',
dev: true,
growerId: 'f25040f4-3b8c-4306-9eb5-8b6c9ac58634',
sold_out: false,
'#timestamp': '2018-12-19T20:47:16.128934',
quantity_type: 'Pounds',
pending_inventory: 5,
initial_quantity: 5,
price: 122,
item_description: 'Add' } },
Now that growerId does not match any of the ids in activeIds.
5) Using the code you helped with, it's returning 0 items.
Let me know if you need more detail. I've been working on this for a bit too long :\
The terms query accepts an array of terms directly as the field's value, so it should be defined as below:
"terms": {
"growerId": activeIds
}
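You might run into other errors after making the above correction, because "terms" and "bool" cannot sit side by side directly under "query". Below is the full query, with the terms clause moved inside the bool query, which might help you: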
{
"from": 0,
"size": 25,
"query": {
"bool": {
"must_not": [
{
"match": {
"active": false
}
}
],
"must": [
{
"query_string": {
"query": searchQuery,
"fields": [
"item_name"
]
}
},
{
"terms": {
"growerId": activeIds
}
}
]
}
}
}
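For reference, the original parsing error comes from the JavaScript shorthand { activeIds }, which builds an object with a key named activeIds rather than passing the array itself. The sketch below assembles the corrected body as a plain object; buildItemsQuery is a hypothetical helper name, and the two ids are taken from the question's sample.

```javascript
// Hypothetical helper that assembles the search body shown above.
function buildItemsQuery(activeIds, searchQuery) {
  return {
    from: 0,
    size: 25,
    query: {
      bool: {
        must_not: [{ match: { active: false } }],
        must: [
          { query_string: { query: searchQuery, fields: ['item_name'] } },
          // The array is the value of the field key, with no wrapper object.
          { terms: { growerId: activeIds } },
        ],
      },
    },
  };
}

const activeIds = [
  '157621a1-d892-4f4b-80ca-14feddb837a0',
  'd04c5c93-a22c-48c3-a3b0-c79a61bdd923',
];

// Shorthand property syntax creates { activeIds: [...] }, which the parser rejected.
console.log(Object.keys({ activeIds })); // [ 'activeIds' ]

const body = buildItemsQuery(activeIds, 'flowers');
console.log(JSON.stringify(body.query.bool.must[1]));
```

The result could be passed as the "body" of es.search, as in the question. If the query still returns 0 items, one possible cause (an assumption, not something the question confirms) is that growerId is mapped as analyzed text, in which case the terms clause would need an exact-value (keyword) field.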
I have simplified my specific problem so it's easier to understand, but the data I want to aggregate are user events on a video player page, and it looks like this:
{_id:"5963796a46d12ed9891f8c80",eventName:"Click Freature 1",creation:1499691279492},
{_id:"59637a5a46d12ed9891f8e0d",eventName:"Video Play",creation:1499691608106},
{_id:"59637a9546d12ed9891f8e90",eventName:"Click Freature 1",creation:1499691664633},
{_id:"59637c0f46d12ed9891f9146",eventName:"Video Pause",creation:1499692055335}
So the events are consistent and in chronological order. Let's say I want to count the number of times the user clicked feature 1, but only while the video is playing.
I believe I would need some control variable like "isVideoPlaying" that is set to true when a "Video Play" event comes up and set to false on a "Video Pause", and then add the "Click Feature 1" events to the count only when it's set to true.
Is there any way to do something like that?
Is there a way to get and set a control variable during the aggregation process?
No, there is no way to keep track of previous/next documents while the aggregation pipeline is executed.
The idea is to collect the creation times of each event type into its own array.
You have two options.
Breakdown
Video Play : [1,5,7]
Video Pause : [3,6,10]
Features : [2,4,8,9]
Video play-pause pairs : [1,3],[5,6],[7,10]
Features during play : 2,8,9
Video pause-play pairs : [3,5],[6,7],[10,-]
Features during pause : 4
Expected Output
{count:3}
First Option: (You do all the work in the aggregation pipeline)
Use extra stages to transform the documents into the events-array structure.
Consider below documents
db.collection.insertMany([
{eventName:"Video Play",creation:1},
{eventName:"Click Features 1",creation:2},
{eventName:"Video Pause",creation:3},
{eventName:"Click Features 1",creation:4},
{eventName:"Video Play",creation:5},
{eventName:"Video Pause",creation:6},
{eventName:"Video Play",creation:7},
{eventName:"Click Features 1",creation:8},
{eventName:"Click Features 1",creation:9},
{eventName:"Video Pause",creation:10}
]);
You can use the aggregation below.
It uses two $group stages to collect each event's creation times into an array, followed by a $project stage that uses $let to bind each event's creations array to a variable.
For an explanation of the logic inside $let, see the second option.
db.collection.aggregate([
{
"$sort": {
"eventName": 1,
"creation": 1
}
},
{
"$group": {
"_id": "$eventName",
"creations": {
"$push": "$creation"
}
}
},
{
"$group": {
"_id": "null",
"events": {
"$push": {
"eventName": "$_id",
"creations": "$creations"
}
}
}
},
{
"$project": {
"count": {
"$let": {
"vars": {
"video_play_events": {
"$arrayElemAt": [
"$events.creations",
{
"$indexOfArray": [
"$events.eventName",
"Video Play"
]
}
]
},
"click_features_event": {
"$arrayElemAt": [
"$events.creations",
{
"$indexOfArray": [
"$events.eventName",
"Click Features 1"
]
}
]
},
"video_pause_events": {
"$arrayElemAt": [
"$events.creations",
{
"$indexOfArray": [
"$events.eventName",
"Video Pause"
]
}
]
}
},
"in": {*}
}
}
}
}
])
*At this point you have a creations array for each event. Insert the $sum expression from the aggregation in the second option here, replacing $video_play_events with $$video_play_events and so on, to access the variables defined in $let.
Second Option: ( You save events in its own array )
db.collection.insert([
{
"video_play_events": [
1,
5,
7
],
"click_features_event": [
2,
4,
8,
9
],
"video_pause_events": [
3,
6,
10
]
}
])
You can manage the array growth by adding an extra field "count" to limit the number of events you store in one document.
You can have multiple documents for a chosen time slice.
This simplifies the aggregation to the one below.
It iterates over video_play_events and, for each play and pause pair (pl and pu), filters the click features that fall between them.
$size counts the feature elements between each play and pause pair, and $map + $sum adds up the counts over all play-pause pairs.
db.collection.aggregate([
{
"$project": {
"count": {
"$sum": {
"$map": {
"input": {
"$range": [
0,
{
"$size": "$video_play_events"
}
]
},
"as": "z",
"in": {
"$let": {
"vars": {
"pl": {
"$arrayElemAt": [
"$video_play_events",
"$$z"
]
},
"pu": {
"$arrayElemAt": [
"$video_pause_events",
"$$z"
]
}
},
"in": {
"$size": {
"$filter": {
"input": "$click_features_event",
"as": "fe",
"cond": {
"$and": [
{
"$gt": [
"$$fe",
"$$pl"
]
},
{
"$lt": [
"$$fe",
"$$pu"
]
}
]
}
}
}
}
}
}
}
}
}
}
}
])
Notes:
In both cases you run the risk of hitting the 16 MB document limit, depending on the number of documents you are trying to aggregate.
You can use the async module to run parallel queries with appropriate filters to limit the data being aggregated, followed by client-side logic to add up the partial counts.
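If the events are fetched to the client anyway, the control-variable idea from the question can be written directly in code. This is a minimal sketch in plain JavaScript, assuming the documents have the eventName and creation fields shown in the sample data; countClicksWhilePlaying is a hypothetical helper name.

```javascript
// Counts click events that occur while the video is playing.
// Hypothetical helper; event names default to those in the sample documents.
function countClicksWhilePlaying(events, clickName = 'Click Features 1') {
  let isVideoPlaying = false; // the control variable from the question
  let count = 0;
  // Copy and sort so the caller's array is untouched and order is chronological.
  const ordered = [...events].sort((a, b) => a.creation - b.creation);
  for (const e of ordered) {
    if (e.eventName === 'Video Play') {
      isVideoPlaying = true;
    } else if (e.eventName === 'Video Pause') {
      isVideoPlaying = false;
    } else if (e.eventName === clickName && isVideoPlaying) {
      count += 1;
    }
  }
  return count;
}

const sampleEvents = [
  { eventName: 'Video Play', creation: 1 },
  { eventName: 'Click Features 1', creation: 2 },
  { eventName: 'Video Pause', creation: 3 },
  { eventName: 'Click Features 1', creation: 4 },
  { eventName: 'Video Play', creation: 5 },
  { eventName: 'Video Pause', creation: 6 },
  { eventName: 'Video Play', creation: 7 },
  { eventName: 'Click Features 1', creation: 8 },
  { eventName: 'Click Features 1', creation: 9 },
  { eventName: 'Video Pause', creation: 10 },
];

console.log(countClicksWhilePlaying(sampleEvents)); // 3
```

The events could come from a plain find() on the collection; the trade-off is that all matching events are transferred to the client instead of being reduced on the server.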
I have a projection stage as follows,
{
'name': {$ifNull: [ '$invName', {} ]},
'info.type': {$ifNull: [ '$invType', {} ]},
'info.qty': {$ifNull: [ '$invQty', {} ]},
'info.detailed.desc': {$ifNull: [ '$invDesc', {} ]}
}
I am projecting an empty object ({}) when a field is not present because, if a sort is performed on a field that doesn't exist, that document comes first in the sort order (Sort Documents Without Existing Field to End of Results). The next stage is a sort, and I wanted documents with non-existing fields to come last. This is working as expected.
Now I want to remove the fields that have an empty object as their value (if info.detailed.desc is empty, info.detailed should not be in the output). I could do this at the Node level using lodash like this (https://stackoverflow.com/a/38278831/6048928), but I am trying to do it at the MongoDB level. Is it possible? I tried $redact, but it filters out the entire document. Is it possible to PRUNE or DESCEND fields of a document based on value?
Removing properties completely from documents is not a trivial thing. The basics are that the server itself has not had any way of doing this prior to MongoDB 3.4 and the introduction of $replaceRoot, which essentially allows an expression to be returned as the document context.
Even with that addition it's somewhat impractical to do so without further features of $objectToArray and $arrayToObject as introduced in MongoDB 3.4.4. But to run through the cases.
Working with a quick sample
{ "_id" : ObjectId("59adff0aad465e105d91374c"), "a" : 1 }
{ "_id" : ObjectId("59adff0aad465e105d91374d"), "a" : {} }
Conditionally return root object
db.junk.aggregate([
{ "$replaceRoot": {
"newRoot": {
"$cond": {
"if": { "$ne": [ "$a", {} ] },
"then": "$$ROOT",
"else": { "_id": "$_id" }
}
}
}}
])
That's a pretty simple principle and can in fact be applied to any nested property to remove its sub-keys, but it would require various levels of nested $cond or even $switch to apply the possible conditions. The $replaceRoot of course is needed for "top level" removal since it's the only way to conditionally express top level keys to return.
So whilst you can in theory use $cond or $switch to decide what to return, it's generally cumbersome and you would want something more flexible.
Filter the Empty Objects
db.junk.aggregate([
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$ne": [ "$$this.v", {} ] }
}
}
}
}}
])
This is where $objectToArray and $arrayToObject come into use. Instead of writing out the conditions for every possible key we just convert the object contents into an "array" and apply $filter on the array entries to decide what to keep.
The $objectToArray translates any object into an array of documents representing each property as "k" for the name of the key and "v" for the value from that property. Since these are now accessible as "values", you can use methods like $filter to inspect each array entry and discard the unwanted ones.
Finally $arrayToObject takes the "filtered" content and translates those "k" and "v" values back into property names and values as a resulting object. In this way, the "filter" conditions removes any properties from the result object that did not meet the criteria.
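The same three steps can be mirrored in plain JavaScript, which may make the shape of the intermediate "k"/"v" array easier to see. This is only an analogy run on the client, not what the server executes; isEmptyObject and dropEmptyObjectValues are hypothetical helper names, and the numeric _id values stand in for the ObjectIds of the sample.

```javascript
// True only for a plain object literal with no own keys, i.e. {}.
// The prototype check keeps values such as Date or ObjectId instances.
function isEmptyObject(v) {
  return (
    v !== null &&
    typeof v === 'object' &&
    Object.getPrototypeOf(v) === Object.prototype &&
    Object.keys(v).length === 0
  );
}

function dropEmptyObjectValues(doc) {
  // Like $objectToArray: one { k, v } entry per top-level property.
  const entries = Object.entries(doc).map(([k, v]) => ({ k, v }));
  // Like $filter with { $ne: [ "$$this.v", {} ] }.
  const kept = entries.filter((e) => !isEmptyObject(e.v));
  // Like $arrayToObject: rebuild an object from the surviving entries.
  return Object.fromEntries(kept.map((e) => [e.k, e.v]));
}

console.log(dropEmptyObjectValues({ _id: 1, a: 1 }));
console.log(dropEmptyObjectValues({ _id: 2, a: {} })); // { _id: 2 }
```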
A Return to $cond
db.junk.aggregate([
{ "$project": {
"a": { "$cond": [{ "$eq": [ "$a", {} ] }, "$$REMOVE", "$a" ] }
}}
])
MongoDB 3.6 introduces a new player with the $$REMOVE constant. This is a new feature that can be applied with $cond in order to decide whether or not to show the property at all. So that is another approach when of course the release is available.
In all those above cases the "a" property is not returned when the value is the empty object that we wanted to test for removal.
{ "_id" : ObjectId("59adff0aad465e105d91374c"), "a" : 1 }
{ "_id" : ObjectId("59adff0aad465e105d91374d") }
More Complex Structures
Your specific ask here is for data containing nested properties. So continuing on from the outlined approaches we can work with demonstrating how that is done.
First some sample data:
{ "_id" : ObjectId("59ae03bdad465e105d913750"), "a" : 1, "info" : { "type" : 1, "qty" : 2, "detailed" : { "desc" : "this thing" } } }
{ "_id" : ObjectId("59ae03bdad465e105d913751"), "a" : 2, "info" : { "type" : 2, "qty" : 3, "detailed" : { "desc" : { } } } }
{ "_id" : ObjectId("59ae03bdad465e105d913752"), "a" : 3, "info" : { "type" : 3, "qty" : { }, "detailed" : { "desc" : { } } } }
{ "_id" : ObjectId("59ae03bdad465e105d913753"), "a" : 4, "info" : { "type" : { }, "qty" : { }, "detailed" : { "desc" : { } } } }
Applying the filter method
db.junk.aggregate([
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$filter": {
"input": {
"$concatArrays": [
{ "$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$ne": [ "$$this.k", "info" ] }
}},
[
{
"k": "info",
"v": {
"$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$info" },
"cond": {
"$not": {
"$or": [
{ "$eq": [ "$$this.v", {} ] },
{ "$eq": [ "$$this.v.desc", {} ] }
]
}
}
}
}
}
}
]
]
},
"cond": { "$ne": [ "$$this.v", {} ] }
}
}
}
}}
])
This needs more complex handling because of the nested levels. In the main case here you need to look at the "info" key independently and remove any sub-properties that do not qualify first. Since you need to return "something", we basically then need to remove the "info" key itself when all of its inner properties are removed. This is the reason for the nested filter operations on each set of results.
Applying $cond with $$REMOVE
Where available this would at first seem a more logical choice, so it helps to look at this from the most simplified form first:
db.junk.aggregate([
{ "$addFields": {
"info.type": {
"$cond": [
{ "$eq": [ "$info.type", {} ] },
"$$REMOVE",
"$info.type"
]
},
"info.qty": {
"$cond": [
{ "$eq": [ "$info.qty", {} ] },
"$$REMOVE",
"$info.qty"
]
},
"info.detailed.desc": {
"$cond": [
{ "$eq": [ "$info.detailed.desc", {} ] },
"$$REMOVE",
"$info.detailed.desc"
]
}
}}
])
But then you need to look at the output this actually produces:
/* 1 */
{
"_id" : ObjectId("59ae03bdad465e105d913750"),
"a" : 1.0,
"info" : {
"type" : 1.0,
"qty" : 2.0,
"detailed" : {
"desc" : "this thing"
}
}
}
/* 2 */
{
"_id" : ObjectId("59ae03bdad465e105d913751"),
"a" : 2.0,
"info" : {
"type" : 2.0,
"qty" : 3.0,
"detailed" : {}
}
}
/* 3 */
{
"_id" : ObjectId("59ae03bdad465e105d913752"),
"a" : 3.0,
"info" : {
"type" : 3.0,
"detailed" : {}
}
}
/* 4 */
{
"_id" : ObjectId("59ae03bdad465e105d913753"),
"a" : 4.0,
"info" : {
"detailed" : {}
}
}
Whilst the other keys are removed, "info.detailed" still stays around because there is nothing that actually tests at this level. In fact you simply cannot express this in simple terms, so the only way to work around this is to evaluate the object as an expression and then apply additional filtering and conditions on each level of output to see where the empty objects still reside, and remove them:
db.junk.aggregate([
{ "$addFields": {
"info": {
"$let": {
"vars": {
"info": {
"$arrayToObject": {
"$filter": {
"input": {
"$objectToArray": {
"type": { "$cond": [ { "$eq": [ "$info.type", {} ] },"$$REMOVE", "$info.type" ] },
"qty": { "$cond": [ { "$eq": [ "$info.qty", {} ] },"$$REMOVE", "$info.qty" ] },
"detailed": {
"desc": { "$cond": [ { "$eq": [ "$info.detailed.desc", {} ] },"$$REMOVE", "$info.detailed.desc" ] }
}
}
},
"cond": { "$ne": [ "$$this.v", {} ] }
}
}
}
},
"in": { "$cond": [ { "$eq": [ "$$info", {} ] }, "$$REMOVE", "$$info" ] }
}
}
}}
])
That approach as with the plain $filter method actually removes "all" empty objects from the results:
/* 1 */
{
"_id" : ObjectId("59ae03bdad465e105d913750"),
"a" : 1.0,
"info" : {
"type" : 1.0,
"qty" : 2.0,
"detailed" : {
"desc" : "this thing"
}
}
}
/* 2 */
{
"_id" : ObjectId("59ae03bdad465e105d913751"),
"a" : 2.0,
"info" : {
"type" : 2.0,
"qty" : 3.0
}
}
/* 3 */
{
"_id" : ObjectId("59ae03bdad465e105d913752"),
"a" : 3.0,
"info" : {
"type" : 3.0
}
}
/* 4 */
{
"_id" : ObjectId("59ae03bdad465e105d913753"),
"a" : 4.0
}
Doing it all in Code
So everything here really depends on latest features or indeed "coming features" to be available in the MongoDB version you are using. Where these are not available the alternate approach is to simply remove the empty objects from the results returned by the cursor.
It's often the most sane thing to do, and really is all you require unless the aggregation pipeline needs to continue past the point where the fields are being removed. Even then, you probably should be logically working around that and leave the final results to cursor processing.
As JavaScript for the shell you can use the following approach, and the principles essentially stay the same no matter which actual language implementation:
db.junk.find().map( d => {
let info = Object.keys(d.info)
.map( k => ({ k, v: d.info[k] }))
.filter(e => !(
typeof e.v === 'object' &&
( Object.keys(e.v).length === 0 || Object.keys(e.v.desc).length === 0 )
))
.reduce((acc,curr) => Object.assign(acc,{ [curr.k]: curr.v }),{});
delete d.info;
return Object.assign(d,(Object.keys(info).length !== 0) ? { info } : {})
})
This is pretty much the native-language way of stating the same thing as the examples above: where one of the expected properties contains an empty object, remove that property from the output completely.
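When the nesting depth is not known in advance, the same idea can be written recursively. The following is a sketch in plain JavaScript rather than shell-specific code; pruneEmptyObjects and isPlainObject are hypothetical helper names, and the numeric _id values stand in for the ObjectIds of the sample data.

```javascript
// True for plain object literals only, so Date, ObjectId, arrays etc. are left alone.
function isPlainObject(v) {
  return (
    v !== null &&
    typeof v === 'object' &&
    Object.getPrototypeOf(v) === Object.prototype
  );
}

// Returns a copy of `value` with every property removed whose value is {}
// or becomes {} once its own empty children have been removed.
function pruneEmptyObjects(value) {
  if (!isPlainObject(value)) return value;
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    const pruned = pruneEmptyObjects(child);
    if (isPlainObject(pruned) && Object.keys(pruned).length === 0) continue;
    out[key] = pruned;
  }
  return out;
}

const docs = [
  { _id: 1, a: 1, info: { type: 1, qty: 2, detailed: { desc: 'this thing' } } },
  { _id: 2, a: 2, info: { type: 2, qty: 3, detailed: { desc: {} } } },
  { _id: 3, a: 3, info: { type: 3, qty: {}, detailed: { desc: {} } } },
  { _id: 4, a: 4, info: { type: {}, qty: {}, detailed: { desc: {} } } },
];

console.log(JSON.stringify(docs.map(pruneEmptyObjects), null, 2));
```

In the shell this could be applied as db.junk.find().map(pruneEmptyObjects), in the same way as the function above.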
I have removed the brands object in the output JSON using $project at end of the aggregation pipeline
db.Product.aggregate([
{
$lookup: {
from: "wishlists",
let: { product: "$_id" },
pipeline: [
{
$match: {
$and: [
{ $expr: { $eq: ["$$product", "$product"] } },
{ user: userId }
]
}
}
],
as: "isLiked"
}
},
{
$lookup: {
from: "brands",
localField: "brand",
foreignField: "_id",
as: "brands"
}
},
{
$addFields: {
isLiked: { $arrayElemAt: ["$isLiked.isLiked", 0] }
}
},
{
$unwind: "$brands"
},
{
$addFields: {
"brand.name": "$brands.name" ,
"brand._id": "$brands._id"
}
},
{
$match:{ isActive: true }
},
{
$project: { "brands" : 0 }
}
]);
$group: {
_id: '$_id',
tasks: {
$addToSet: {
$cond: {
if: {
$eq: [
{
$ifNull: ['$tasks.id', ''],
},
'',
],
},
then: '$$REMOVE',
else: {
id: '$tasks.id',
description: '$tasks.description',
assignee: {
$cond: {
if: {
$eq: [
{
$ifNull: ['$tasks.assignee._id', ''],
},
'',
],
},
then: undefined,
else: {
id: '$tasks.assignee._id',
name: '$tasks.assignee.name',
thumbnail: '$tasks.assignee.thumbnail',
status: '$tasks.assignee.status',
},
},
},
},
},
},
},
}