I'm trying to query my dataset for two purposes:
Match a term (resellable = true)
Order the results by their price, lowest to highest
Data set/doc is:
"data" : {
"resellable" : true,
"startingPrice" : 0,
"id" : "4emEe_r_x5DRCc5",
"buyNowPrice" : 0.006493, //Changes per object
"sub_title" : "test 1",
"title" : "test 1",
"category" : "Education",
}
I have three of these documents, with buyNowPrice values of 0.006, 0.7, and 1.05.
Query with agg is:
{
"query": {
"bool": {
"must": [
{
"term": {
"data.resellable": true
}
}
]
}
},
"from": 0,
"size": 5,
"aggs": {
"lowestPrice": {
"terms": {
"field": "data.buyNowPrice",
"order": {
"lowest_price": "desc"
}
},
"aggs": {
"lowest_price": {
"min": {
"field": "data.buyNowPrice"
}
},
"lowest_price_top_hits": {
"top_hits": {
"size": 5,
"sort": [
{
"data.buyNowPrice": {
"order": "desc"
}
}
]
}
}
}
}
}
}
The query works fine, and the results are the 3 objects that have resellable = true.
The issue is that the agg is not ordering the results by the lowest buyNowPrice.
In every result, the order of buyNowPrice is 1.06, 0.006, 0.7, which is not sorted properly.
Switching to desc has no effect, so I suspect the agg is not running at all?
EDIT:
Using the suggestion below my query now looks like:
{
"query": {
"bool": {
"must": [
{
"term": {
"data.resellable": true
}
}
]
}
},
"from": 0,
"size": 5,
"aggs": {
"lowestPrice": {
"terms": {
"field": "data.buyNowPrice",
"order": {
"lowest_price": "asc"
}
},
"aggs": {
"lowest_price": {
"min": {
"field": "data.buyNowPrice"
}
},
"lowest_price_top_hits": {
"top_hits": {
"size": 5
}
}
}
}
}
}
With the results of the query being:
total: { value: 3, relation: 'eq' },
max_score: 0.2876821,
hits: [
{
_index: 'education',
_type: 'listing',
_id: '4emEe_r_x5DRCc5', <--- buyNowPrice of 0.006
_score: 0.2876821,
_source: [Object]
},
{
_index: 'education',
_type: 'listing',
_id: '4ee_r_x5DRCc5', <--- buyNowPrice of 1.006
_score: 0.18232156,
_source: [Object]
},
{
_index: 'education',
_type: 'listing',
_id: '4444_r_x5DRCc5', <--- buyNowPrice of 0.7
_score: 0.18232156,
_source: [Object]
}
]
}
EDIT 2:
If I remove the query for resellable = true, the aggregation sorts properly and returns the items in the proper order. With the resellable query included, it does not.
I'm assuming this has to do with the _score property overriding the sorting from the agg? How would this be fixed?
You can use a bucket sort aggregation, a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation. Zero or more sort fields may be specified together with the corresponding sort order.
Here is a working example (using the same index data as given in the question), with the search query and the search result.
Search Query:
{
"query": {
"bool": {
"must": [
{
"term": {
"data.resellable": true
}
}
]
}
},
"from": 0,
"size": 5,
"aggs": {
"source": {
"terms": {
"field": "data.buyNowPrice"
},
"aggs": {
"latest": {
"top_hits": {
"_source": {
"includes": [
"data.buyNowPrice",
"data.id"
]
}
}
},
"highest_price": {
"max": {
"field": "data.buyNowPrice"
}
},
"bucket_sort_order": {
"bucket_sort": {
"sort": {
"highest_price": {
"order": "desc"
}
}
}
}
}
}
}
}
Search Result:
"buckets": [
{
"key": 1.0499999523162842,
"doc_count": 1,
"highest_price": {
"value": 1.0499999523162842
},
"latest": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.08701137,
"hits": [
{
"_index": "stof_64364468",
"_type": "_doc",
"_id": "3",
"_score": 0.08701137,
"_source": {
"data": {
"id": "4emEe_r_x5DRCc5",
"buyNowPrice": 1.05 <-- note this
}
}
}
]
}
}
},
{
"key": 0.699999988079071,
"doc_count": 1,
"highest_price": {
"value": 0.699999988079071
},
"latest": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.08701137,
"hits": [
{
"_index": "stof_64364468",
"_type": "_doc",
"_id": "2",
"_score": 0.08701137,
"_source": {
"data": {
"id": "4emEe_r_x5DRCc5",
"buyNowPrice": 0.7 <-- note this
}
}
}
]
}
}
},
{
"key": 0.006000000052154064,
"doc_count": 1,
"highest_price": {
"value": 0.006000000052154064
},
"latest": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.08701137,
"hits": [
{
"_index": "stof_64364468",
"_type": "_doc",
"_id": "1",
"_score": 0.08701137,
"_source": {
"data": {
"id": "4emEe_r_x5DRCc5",
"buyNowPrice": 0.006 <-- note this
}
}
}
]
}
}
}
]
Update 1:
If you modify your search query as follows:
{
"query": {
"bool": {
"must": [
{
"term": {
"data.resellable": true
}
}
]
}
},
"aggs": {
"lowestPrice": {
"terms": {
"field": "data.buyNowPrice",
"order": {
"lowest_price": "asc" <-- change the order here
}
},
"aggs": {
"lowest_price": {
"min": {
"field": "data.buyNowPrice"
}
},
"lowest_price_top_hits": {
"top_hits": {
"size": 5
}
}
}
}
}
}
Running the above search query will also give you the required results.
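A note on the top-level hits shown in the question: aggregations only order their own buckets, while the hits are ranked by _score unless the request says otherwise. Here is a minimal sketch, assuming the same field names as the question, that adds a top-level sort so the hits themselves come back ordered by price. buildQuery is a hypothetical helper name, not part of any client library.

```javascript
// Build a search body whose hits are sorted by price instead of _score.
// buildQuery is a hypothetical helper; the field names come from the question.
function buildQuery(order) {
  return {
    query: {
      bool: {
        must: [{ term: { "data.resellable": true } }],
      },
    },
    from: 0,
    size: 5,
    // A top-level sort replaces the default ordering by _score.
    sort: [{ "data.buyNowPrice": { order: order } }],
  };
}

const body = buildQuery("asc");
console.log(JSON.stringify(body, null, 2));
```

The resulting object can be sent as the body of a _search request with whichever client you already use; pass "desc" to get the highest price first.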
I'm converting a MongoDB query to Elasticsearch on the Node.js platform. While developing, I'm having difficulty grouping and filtering data (getting nested objects like hits.hits._source) in an Elasticsearch query the way we do in a MongoDB query.
Example:-
UserModel.aggregate([
{
$match: {
uId: req.body.uId, timestamp: { $gte: req.body.date, $lte: new Date() }
},
},
{
$group: {
_id: "$eId",
location: {
$push: {
time: "$timestamp", lat: "$lat"
}
},
timestamp: {
$push: "$timestamp"
},
testId: { $first: "$testId" },
}
},
{
$project: {
eId: 1, location: 1, testId: 1, max: { $max: "$timestamp" }
}
},
{ $unwind: { path: "$location", preserveNullAndEmptyArrays: true } },
{
$redact: {
$cond: {
if: { $eq: ["$location.time", "$max"] },
then: "$$DESCEND",
else: "$$PRUNE"
}
}
},
{
$project: {
eId: 1, latitude: "$location.lat", testId: 1
}
},
]).exec(function (err, result) {
console.log(result)
});
What would be the equivalent query in Elasticsearch?
I'm looking for a solution that groups, unwinds, and projects the required data (MongoDB concepts applied to Elasticsearch) with a minimally nested response.
Thanks in advance.
EDIT:-
Adding Elasticsearch Document:-
{
"timestamp": "2019-10-08T:02:50:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a407",
"location": [
2.000,
34.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4ef"
},
{
"timestamp": "2019-10-09T:02:50:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a408",
"location": [
2.100,
35.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4ef"
},
{
"timestamp": "2019-10-09T:03:50:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a407",
"location": [
4.100,
35.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4ef"
},
{
"timestamp": "2019-10-09T:03:40:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a407",
"location": [
2.100,
35.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4e1"
},
{
"timestamp": "2019-10-10T:03:40:15.54Z",
"status" : 1,
"eId": "5d5d7ce0c89852e7bad4a407",
"location": [
3.100,
35.5664111801
],
"zId": "5d5d7ce0c89852e7bad4a4e1"
}
Match with status = 1 and group by eId.
From those results, group by timestamp and get the max timestamp value.
Expected Result:-
[
{
"_id": "5d5d7ce0c89852e7bad4a407",
"max": "2019-10-10T:03:40:15.54Z", // max timestamp
"zId": [
"5d5d7ce0c89852e7bad4a4e1",
"5d5d7ce0c89852e7bad4a4ef"
]
},
{
"_id": "5d5d7ce0c89852e7bad4a408",
"max": "2019-10-09T:02:50:15.54Z",
"zId": [
"5d5d7ce0c89852e7bad4a4ef"
]
}, // ...etc
]
Thanks for the documents. Sadly, I do not know of any way to retrieve only the documents that have the max timestamp value.
The following query lets you filter by status, group by eId, and get the max timestamp value, but it will not return the documents that have the max timestamp value.
{
"size": 0,
"query": {
"term": {
"status": 1
}
},
"aggregations": {
"eId_group": {
"terms": {
"field": "eId"
},
"aggregations": {
"max_timestamp": {
"max": {
"field": "timestamp"
}
}
}
}
}
}
This second query uses a top_hits aggregation to retrieve the documents grouped by eId. The returned documents are sorted by decreasing timestamp, so the documents with the max timestamp come first, but you may also get documents with other timestamps.
{
"size": 0,
"query": {
"term": {
"status": 1
}
},
"aggregations": {
"eId_group": {
"terms": {
"field": "eId"
},
"aggregations": {
"max_timestamp": {
"max": {
"field": "timestamp"
}
},
"top_documents": {
"top_hits": {
"size": 20,
"sort": { "timestamp": "desc"}
}
}
}
}
}
}
I used the following mapping for the index:
PUT /test_index
{
"mappings": {
"properties": {
"timestamp": {
"type": "date"
},
"eId": {
"type": "keyword"
},
"zId": {
"type": "keyword"
},
"status": {
"type": "keyword"
}
}
}
}
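Since the response of the top_hits query is still nested, the remaining reshaping can be done on the client. Here is a minimal sketch in Node.js, assuming the aggregation names used above (eId_group, max_timestamp, top_documents). reshapeBuckets is a hypothetical helper and the sample response is hand-written for illustration, not real output.

```javascript
// Reshape the response of the top_hits query into [{ _id, max, zId }].
// The aggregation names (eId_group, max_timestamp, top_documents) match the
// query above; reshapeBuckets itself is a hypothetical helper.
function reshapeBuckets(response) {
  return response.aggregations.eId_group.buckets.map((bucket) => {
    const sources = bucket.top_documents.hits.hits.map((hit) => hit._source);
    // Keep each zId once, sorted so the output is stable.
    const zIds = [...new Set(sources.map((doc) => doc.zId))].sort();
    return {
      _id: bucket.key,
      // For a date field, the max aggregation also returns a formatted string.
      max: bucket.max_timestamp.value_as_string,
      zId: zIds,
    };
  });
}

// Hand-written sample shaped like an Elasticsearch response (illustrative values).
const sampleResponse = {
  aggregations: {
    eId_group: {
      buckets: [
        {
          key: "5d5d7ce0c89852e7bad4a407",
          max_timestamp: { value_as_string: "2019-10-10T03:40:15.540Z" },
          top_documents: {
            hits: {
              hits: [
                { _source: { zId: "5d5d7ce0c89852e7bad4a4e1" } },
                { _source: { zId: "5d5d7ce0c89852e7bad4a4e1" } },
                { _source: { zId: "5d5d7ce0c89852e7bad4a4ef" } },
              ],
            },
          },
        },
      ],
    },
  },
};

console.log(JSON.stringify(reshapeBuckets(sampleResponse), null, 2));
```

This keeps the Elasticsearch query simple and moves the "project" step into application code, at the cost of transferring up to size documents per bucket.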
I have a json object like this:
JsonQuery = '{
"from": 0,
"size": 200,
"sort": [{
"Modified": {
"order": "desc"
}
}],
"query": {
"bool": {
"must": [{
"term": {
"CollectionId": {
"value": "abcd"
}
}
}, {
"terms": {
"Container": ["en-us"]
}
}],
"must_not": [{
"wildcard": {
"_type": {
"value": "##"
}
}
}, {
"bool": {
"filter": {
"exists": {
"field": "DynamicProperties.MainSpec"
}
},
"filter": {
"exists": {
"field": "DynamicProperties.ExtendedSpec"
}
}
}
}]
}
}
}';
I am creating a javascript object by doing
var obj = JSON.parse(JsonQuery);
When I type obj in the Chrome console and hit Enter, it displays the object properly, but when I try to access a property of the object, I keep getting undefined.
For example, obj.size.
You just have a gremlin (a stray invisible character) on your first line (JsonQuery = '). Try removing it and running the code again.
jsonQuery = '{"from":0,"size":200,"sort":[{"Modified":{"order":"desc"}}],"query":{"bool":{"must":[{"term":{"CollectionId":{"value":"abcd"}}},{"terms":{"Container":["en-us"]}}],"must_not":[{"wildcard":{"_type":{"value":"##"}}},{"bool":{"filter":{"exists":{"field":"DynamicProperties.MainSpec"}},"filter":{"exists":{"field":"DynamicProperties.ExtendedSpec"}}}}]}}}';
var obj = JSON.parse(jsonQuery);
console.log(obj.size);
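If you want to check pasted text for such characters yourself, here is a minimal sketch. It assumes the gremlin is one of the usual suspects (a zero-width character, a byte order mark, a non-breaking space, or a control character); findGremlins is a hypothetical helper name.

```javascript
// List characters that are invisible in most editors: control characters
// (except tab, newline, carriage return), non-breaking space, zero-width
// characters, and the byte order mark. findGremlins is a hypothetical helper.
function findGremlins(text) {
  const pattern = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F\u00A0\u200B-\u200D\u2060\uFEFF]/g;
  const found = [];
  for (const match of text.matchAll(pattern)) {
    const hex = match[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
    found.push({ index: match.index, codePoint: "U+" + hex });
  }
  return found;
}

// Example: a zero-width space hidden before the opening brace.
const suspicious = "\u200B" + '{"from":0,"size":200}';
console.log(findGremlins(suspicious)); // one entry: index 0, U+200B
console.log(findGremlins('{"from":0,"size":200}').length); // 0
```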
Here is my sample Data:
{
"_id": {
"$oid": "5654a8f0d487dd1434571a6e"
},
"ValidationDate": {
"$date": "2015-11-24T13:06:19.363Z"
},
"DataRaw": " WL 00100100012015-08-28 02:44:17+0000+ 16.81 8.879 1084.00",
"ReadingsAreValid": true,
"locationID": " WL 001",
"Readings": {
"pH": {
"value": 8.879
},
"SensoreDate": {
"value": {
"$date": "2015-08-28T02:44:17.000Z"
}
},
"temperature": {
"value": 16.81
},
"Conductivity": {
"value": 1084
}
},
"HMAC":"ecb98d73fcb34ce2c5bbcc9c1265c8ca939f639d791a1de0f6275e2d0d71a801"
}
I am trying to group average values by two-hour intervals and have the following aggregation query.
Query = [{"$unwind":"$Readings"},
{'$group' : { "_id": {
"year": { "$year": "$Readings.SensoreDate.value" },
"dayOfYear": { "$dayOfYear": "$Readings.SensoreDate.value" },
"interval": {
"$subtract": [
{ "$hour": "$Readings.SensoreDate.value"},
{ "$mod": [{ "$hour": "$Readings.SensoreDate.value"},2]}
]
}
}},
'AverageTemp' : { '$avg' : '$Readings.temperature.value'}, "AveragePH": {"$avg" : "$Readings.pH.value"}, "AverageConduc": {"$avg" : "$Readings.Conductivity.value"}}
, {"$limit":10}]
This gives me the error:
A pipeline stage specification object must contain exactly one field.
I have done a lot of research but can't get the desired results.
After some formatting, your present aggregation pipeline looks like:
Query = [
{ "$unwind": "$Readings" },
{
'$group' : {
"_id": {
"year": { "$year": "$Readings.SensoreDate.value" },
"dayOfYear": { "$dayOfYear": "$Readings.SensoreDate.value" },
"interval": {
"$subtract": [
{ "$hour": "$Readings.SensoreDate.value"},
{
"$mod": [
{ "$hour": "$Readings.SensoreDate.value" },
2
]
}
]
}
}
},
'AverageTemp' : { '$avg' : '$Readings.temperature.value' },
"AveragePH": { "$avg" : "$Readings.pH.value" },
"AverageConduc": { "$avg" : "$Readings.Conductivity.value" }
},
{ "$limit": 10 }
]
which MongoDB rejects with
A pipeline stage specification object must contain exactly one field.
because the following fields are misplaced: they sit next to $group in the stage object instead of inside it, so that stage has four fields rather than one:
'AverageTemp' : { '$avg' : '$Readings.temperature.value' },
"AveragePH": { "$avg" : "$Readings.pH.value" },
"AverageConduc": { "$avg" : "$Readings.Conductivity.value" }
A correct pipeline should have these fields within the $group pipeline stage, so a working pipeline follows:
Query = [
{ "$unwind": "$Readings" },
{
"$group" : {
"_id": {
"year": { "$year": "$Readings.SensoreDate.value" },
"dayOfYear": { "$dayOfYear": "$Readings.SensoreDate.value" },
"interval": {
"$subtract": [
{ "$hour": "$Readings.SensoreDate.value"},
{
"$mod": [
{ "$hour": "$Readings.SensoreDate.value" },
2
]
}
]
}
},
"AverageTemp" : { "$avg" : "$Readings.temperature.value" },
"AveragePH": { "$avg" : "$Readings.pH.value" },
"AverageConduc": { "$avg" : "$Readings.Conductivity.value" }
}
},
{ "$limit": 10 }
]
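The rule behind the error message can be checked before the pipeline is ever sent to the server. Here is a minimal sketch; findInvalidStages is a hypothetical helper (not part of any MongoDB driver), and the _id expression is simplified to "$locationID" to keep the example short.

```javascript
// Return the indexes of pipeline stages that do not have exactly one field.
// findInvalidStages is a hypothetical helper, not part of any MongoDB driver.
function findInvalidStages(pipeline) {
  const invalid = [];
  pipeline.forEach((stage, index) => {
    if (Object.keys(stage).length !== 1) {
      invalid.push(index);
    }
  });
  return invalid;
}

// Shape of the broken pipeline: the average sits beside $group, not inside it.
const broken = [
  { $unwind: "$Readings" },
  {
    $group: { _id: "$locationID" },
    AverageTemp: { $avg: "$Readings.temperature.value" },
  },
  { $limit: 10 },
];

// Shape of the corrected pipeline: the accumulator is inside $group.
const fixed = [
  { $unwind: "$Readings" },
  {
    $group: {
      _id: "$locationID",
      AverageTemp: { $avg: "$Readings.temperature.value" },
    },
  },
  { $limit: 10 },
];

console.log(findInvalidStages(broken)); // [ 1 ]
console.log(findInvalidStages(fixed)); // []
```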
How can I add a filter after a $lookup, or is there any other method to do this?
My data collection test is:
{ "_id" : ObjectId("570557d4094a4514fc1291d6"), "id" : 100, "value" : "0", "contain" : [ ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d7"), "id" : 110, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d8"), "id" : 120, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d9"), "id" : 121, "value" : "2", "contain" : [ 100, 120 ] }
I select id 100 and aggregate the childs:
db.test.aggregate([ {
$match : {
id: 100
}
}, {
$lookup : {
from : "test",
localField : "id",
foreignField : "contain",
as : "childs"
}
}]);
I get back:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d9"),
"id":121,
"value":"2",
"contain":[ 100, 120 ]
}
]
}
But I only want the childs that match "value": "1".
In the end I expect this result:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
}
]
}
The question here is actually about something different and does not need $lookup at all. But for anyone arriving here purely from the title of "filtering after $lookup", these are the techniques for you:
MongoDB 3.6 - Sub-pipeline
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"let": { "id": "$id" },
"pipeline": [
{ "$match": {
"value": "1",
"$expr": { "$in": [ "$$id", "$contain" ] }
}}
],
"as": "childs"
}}
])
Earlier - $lookup + $unwind + $match coalescence
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$unwind": "$childs" },
{ "$match": { "childs.value": "1" } },
{ "$group": {
"_id": "$_id",
"id": { "$first": "$id" },
"value": { "$first": "$value" },
"contain": { "$first": "$contain" },
"childs": { "$push": "$childs" }
}}
])
If you wonder why you would $unwind as opposed to using $filter on the array, read "Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size" for all the detail on why this is generally necessary and far more optimal.
For MongoDB 3.6 and onwards, the more expressive "sub-pipeline" is generally what you want, since it "filters" the results of the foreign collection before anything gets returned into the array at all.
Back to the original answer, though, which describes why the question asked needs "no join" at all...
Original
Using $lookup like this is not the most "efficient" way to do what you want here. But more on this later.
As a basic concept, just use $filter on the resulting array:
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$project": {
"id": 1,
"value": 1,
"contain": 1,
"childs": {
"$filter": {
"input": "$childs",
"as": "child",
"cond": { "$eq": [ "$$child.value", "1" ] }
}
}
}}
]);
Or use $redact instead:
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$eq": [ "$value", "0" ] },
{ "$eq": [ "$value", "1" ] }
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
]);
Both get the same result:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
}
]
}
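To make the semantics concrete, the $lookup followed by $filter shown above can be imitated in plain JavaScript on in-memory arrays. This is only an illustration of what the two stages compute, using the documents from the question without their _id fields; lookupChilds and filterChilds are hypothetical helper names, and nothing here talks to MongoDB.

```javascript
// In-memory illustration of $lookup followed by $filter, using plain arrays.
// The documents are the ones from the question, without _id.
const testCollection = [
  { id: 100, value: "0", contain: [] },
  { id: 110, value: "1", contain: [100] },
  { id: 120, value: "1", contain: [100] },
  { id: 121, value: "2", contain: [100, 120] },
];

// Like $lookup with localField "id" and foreignField "contain".
function lookupChilds(parentDoc, collection) {
  return collection.filter((doc) => doc.contain.includes(parentDoc.id));
}

// Like $filter with cond { $eq: [ "$$child.value", "1" ] }.
function filterChilds(childs) {
  return childs.filter((child) => child.value === "1");
}

const parentDoc = testCollection.find((doc) => doc.id === 100);
const joined = lookupChilds(parentDoc, testCollection); // childs 110, 120, 121
const filtered = { ...parentDoc, childs: filterChilds(joined) };

console.log(filtered.childs.map((child) => child.id)); // [ 110, 120 ]
```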
The bottom line is that $lookup itself cannot "yet" query to select only certain data, so all "filtering" needs to happen after the $lookup.
But really, for this type of "self join" you are better off not using $lookup at all, avoiding the overhead of an additional read and "hash-merge" entirely. Just fetch the related items and $group instead:
db.test.aggregate([
{ "$match": {
"$or": [
{ "id": 100 },
{ "contain.0": 100, "value": "1" }
]
}},
{ "$group": {
"_id": {
"$cond": {
"if": { "$eq": [ "$value", "0" ] },
"then": "$id",
"else": { "$arrayElemAt": [ "$contain", 0 ] }
}
},
"value": { "$first": { "$literal": "0"} },
"childs": {
"$push": {
"$cond": {
"if": { "$ne": [ "$value", "0" ] },
"then": "$$ROOT",
"else": null
}
}
}
}},
{ "$project": {
"value": 1,
"childs": {
"$filter": {
"input": "$childs",
"as": "child",
"cond": { "$ne": [ "$$child", null ] }
}
}
}}
])
Which only comes out a little different because I deliberately removed the extraneous fields. Add them in yourself if you really want to:
{
"_id" : 100,
"value" : "0",
"childs" : [
{
"_id" : ObjectId("570557d4094a4514fc1291d7"),
"id" : 110,
"value" : "1",
"contain" : [ 100 ]
},
{
"_id" : ObjectId("570557d4094a4514fc1291d8"),
"id" : 120,
"value" : "1",
"contain" : [ 100 ]
}
]
}
So the only real issue here is "filtering" any null result from the array, created when the current document was the parent while processing items to $push.
What you also seem to be missing here is that the result you are looking for does not need aggregation or "sub-queries" at all. The structure that you have arrived at, or possibly found elsewhere, is "designed" so that you can get a "node" and all of its "children" in a single query request.
That means the "query" alone is all that is really needed, and collecting the data (which is all that is happening, since no content is really being "reduced") is just a matter of iterating the cursor result:
var result = {};
db.test.find({
"$or": [
{ "id": 100 },
{ "contain.0": 100, "value": "1" }
]
}).sort({ "contain.0": 1 }).forEach(function(doc) {
if ( doc.id == 100 ) {
result = doc;
result.childs = []
} else {
result.childs.push(doc)
}
})
printjson(result);
This does exactly the same thing:
{
"_id" : ObjectId("570557d4094a4514fc1291d6"),
"id" : 100,
"value" : "0",
"contain" : [ ],
"childs" : [
{
"_id" : ObjectId("570557d4094a4514fc1291d7"),
"id" : 110,
"value" : "1",
"contain" : [
100
]
},
{
"_id" : ObjectId("570557d4094a4514fc1291d8"),
"id" : 120,
"value" : "1",
"contain" : [
100
]
}
]
}
And it serves as proof that all you really need to do here is issue a "single" query to select both the parent and the children. The returned data is just the same, and all you are doing on either the server or the client is "massaging" it into another collected format.
This is one of those cases where you can get "caught up" in thinking about how you did things in a "relational" database and not realize that, since the way the data is stored has "changed", you no longer need to use the same approach.
That is exactly the point of the structure in the documentation example "Model Tree Structures with Child References": it makes it easy to select parents and children within one query.
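The cursor iteration shown earlier depends on the sort placing the parent first. If the documents are already in an array (for example after calling toArray() on a cursor in the Node.js driver), the same "massaging" can be done without relying on order. Here is a minimal sketch; assembleTree is a hypothetical helper, and the sample array stands in for the result of the single $or query.

```javascript
// Assemble a parent and its children from a flat array of documents,
// without relying on the order in which they arrive.
// assembleTree is a hypothetical helper; "childs" follows the question's naming.
function assembleTree(docs, parentId) {
  const parent = docs.find((doc) => doc.id === parentId);
  if (!parent) {
    return null; // the parent was not part of the query result
  }
  const childs = docs.filter((doc) => doc.id !== parentId);
  return { ...parent, childs };
}

// Documents as the single $or query might return them, children first on purpose.
const docs = [
  { id: 120, value: "1", contain: [100] },
  { id: 100, value: "0", contain: [] },
  { id: 110, value: "1", contain: [100] },
];

const tree = assembleTree(docs, 100);
console.log(tree.childs.map((child) => child.id)); // [ 120, 110 ]
```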