I need a way to match the closest number in an Elasticsearch document.
I want to use Elasticsearch to filter on quantifiable attributes. I have been able to achieve hard limits using range queries, except that documents outside those ranges are dropped from the results. I would prefer to get the results closest to multiple filter values, ranked by how close they are.
const query = {
query: {
bool: {
should: [
{
range: {
sales: {
gte: 5,
lte: 15
}
}
},
{
range: {
year: {
gte: 1979,
lte: 1989
}
}
}
]
}
}
}
const results = await client.search({
index: 'test',
body: query
})
Say I have some documents with year and sales fields. The snippet below is a small example of how this would be done in JavaScript. It runs through the entire list and calculates a score for each item, then sorts by that score. At no point are results filtered out; they are just ordered by relevance.
const data = [
{ "item": "one", "year": 1980, "sales": 20 },
{ "item": "two", "year": 1982, "sales": 12 },
{ "item": "three", "year": 1986, "sales": 6 },
{ "item": "four", "year": 1989, "sales": 4 },
{ "item": "five", "year": 1991, "sales": 6 }
]
const add = (a, b) => a + b
const findClosestMatch = (filters, data) => {
const scored = data.map(item => ({
...item,
// add the score to a copy of the data
_score: calculateDifferenceScore(filters, item)
}))
// mutate the scored array by sorting it
scored.sort((a, b) => a._score.total - b._score.total)
return scored
}
const calculateDifferenceScore = (filters, item) => {
const result = Object.keys(filters).reduce((acc, x) => ({
...acc,
// calculate the absolute difference between the filter and data point
[x]: Math.abs(filters[x] - item[x])
}), {})
// sum the total differences
result.total = Object.values(result).reduce(add)
return result
}
console.log(
findClosestMatch({ sales: 10, year: 1984 }, data)
)
I'm trying to achieve the same thing in Elasticsearch but having no luck with a function_score query, e.g.:
const query = {
query: {
function_score: {
functions: [
{
linear: {
"year": {
origin: 1984,
},
"sales": {
origin: 10,
}
}
}
]
}
}
}
const results = await client.search({
index: 'test',
body: query
})
There is no text to search; I'm filtering by numbers only. Am I doing something wrong, or is this not what Elasticsearch is made for? Are there any better alternatives?
With the query above, every document still has the default score, and I have not been able to get any function to modify the score.
Thanks for any help. I'm new to Elasticsearch, so links to articles or areas of the documentation are appreciated!
You had the right idea; you're just missing a few fields in your query to make it work.
It should look like this:
{
"query": {
function_score: {
functions: [
{
linear: {
"year": {
origin: 1984,
scale: 1,
decay: 0.999
},
"sales": {
origin: 10,
scale: 1,
decay: 0.999
}
}
},
]
}
}
}
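The query above puts both fields inside a single linear object. The documented form of a decay function targets one field per function, so a variant with one function per field may be safer. The sketch below builds such a query in JavaScript; buildClosestMatchQuery is a hypothetical helper, and the score_mode and boost_mode choices are assumptions meant to mirror the summed differences in the question's JavaScript example. It has not been run against a cluster.

```javascript
// Hypothetical helper: builds a function_score query with one linear decay
// function per numeric field. `targets` maps field name -> desired value.
const buildClosestMatchQuery = (targets, { scale = 1, decay = 0.999 } = {}) => ({
  query: {
    function_score: {
      functions: Object.entries(targets).map(([field, origin]) => ({
        linear: { [field]: { origin, scale, decay } }
      })),
      // assumption: add the per-field scores together, like the summed
      // differences in the question's JavaScript example
      score_mode: 'sum',
      // assumption: use the combined function score as the final score
      boost_mode: 'replace'
    }
  }
})

const closestQuery = buildClosestMatchQuery({ year: 1984, sales: 10 })
console.log(JSON.stringify(closestQuery, null, 2))
```

The resulting object can be passed as the body of client.search in the same way as the queries in the question.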
The scale field is mandatory for numeric fields because it tells Elasticsearch how quickly to decay the score; without it the query fails.
The decay field is optional and defaults to 0.5. With a linear function and a scale of 1, that default drops the score to 0 for every document 2 or more units away from the origin, so only documents very close to the origin get a useful score. Setting decay close to 1 stretches the slope so that distant documents still receive a nonzero score.
source docs.
Results are returned sorted by score, highest first, by default, so if you only want the top-scoring document you can limit the result size to 1.
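To see how scale and decay interact, here is a small JavaScript sketch of the linear decay formula as I understand it from the Elasticsearch documentation. The linearDecay helper is hypothetical and not part of any client library; the defaults (offset 0, decay 0.5) are the documented ones.

```javascript
// Linear decay as described in the function_score docs:
//   s     = scale / (1 - decay)
//   score = max((s - max(0, |value - origin| - offset)) / s, 0)
const linearDecay = (value, { origin, scale, offset = 0, decay = 0.5 }) => {
  const s = scale / (1 - decay)
  const distance = Math.max(0, Math.abs(value - origin) - offset)
  return Math.max((s - distance) / s, 0)
}

// the years from the question's sample data
const years = [1980, 1982, 1986, 1989, 1991]

// default decay (0.5): every year 2 or more away from 1984 scores 0
const withDefaultDecay = years.map(y => linearDecay(y, { origin: 1984, scale: 1 }))

// decay 0.999: the score falls off slowly, so distant years keep a
// nonzero score that is lower the further away they are
const withSlowDecay = years.map(y => linearDecay(y, { origin: 1984, scale: 1, decay: 0.999 }))

console.log(withDefaultDecay, withSlowDecay)
```

With the default decay all five sample years collapse to the same score, which is why the query needs a decay close to 1 (or a larger scale) to rank them.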
EDIT: (AVOID NULLS)
You can add a filter above the functions like so:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"bool": {
"filter": [
{
"bool": {
"must": [
{
"exists": {
"field": "year"
}
},
{
"exists": {
"field": "sales"
}
}
]
}
}
]
}
},
{
"match_all": {}
}
]
}
},
"functions": [
{
"linear": {
"year": {
"origin": 1999,
"scale": 1,
"decay": 0.999
},
"sales": {
"origin": 50,
"scale": 1,
"decay": 0.999
}
}
}
]
}
}
}
Notice I have a little hack going on with the match_all query. The filter clause sets the score to 0, so the match_all query resets it to 1 for all matched documents.
This can also be achieved in a more "proper" way by altering the functions, a path I chose not to take.
JSON Object:
{
"students_detail": [
{
"student_id": 1,
"name": "abc",
"roll_number": 10
},
{
"student_id": 2,
"name": "pqr",
"roll_number": 12
}
],
"subject_details": [
{
"subject_id": 1,
"subject_name": "math"
},
{
"subject_id": 2,
"subject_name": "english"
}
],
"exam_details": [
{
"exam_id": 1,
"exam_name": "Prelim"
}
],
"mark_details": [
{
"id": 1,
"exam_id": 1,
"subject_id": 1,
"student_id": 1,
"mark": 51
},
{
"id": 2,
"exam_id": 1,
"subject_id": 2,
"student_id": 2,
"mark": 61
}
]
}
Output:
{
"student_mark_details": [
{
"abc": {
"roll_number": 10,
"Prelim": [
{
"subject_name": "math",
"mark": 51
}
]
},
"pqr": {
"roll_number": 12,
"Prelim": [
{
"subject_name": "english",
"mark": 61
}
]
}
}
]
}
I tried using loops, accessing student_id in both objects and comparing them, but the code gets too messy and complex. Is there any way I can use map() or filter() for this, or any other method?
I have no idea where to start and my brain is fried. I know I'm asking a lot, but any help will be appreciated (a link or source where I can learn this is fine too).
Your output object has an unusual format: student_mark_details is an array of size 1 containing a single object that holds all your students. Anyway, this should give you what you need. The input format is one you will see often, because linking records through primary and foreign keys is used a lot in databases.
The way to handle it is to start from the core of what you are looking for (here you want to describe students, so start there) and then navigate to the information you need by following the keys. In JS, use find() when a key points to exactly one record (e.g. one mark belongs to one exam), and filter() when one record can be referenced by many others (e.g. one student has many marks).
I am not sure this is 100% what you need, because there may be rules that your example does not show, but it solves the problem as submitted. You may have to test it and adapt it to those rules. I don't know what your level is, so I commented a lot.
const data = {
"students_detail": [
{
"student_id": 1,
"name": "abc",
"roll_number": 10
},
{
"student_id": 2,
"name": "pqr",
"roll_number": 12
}
],
"subject_details": [
{
"subject_id": 1,
"subject_name": "math"
},
{
"subject_id": 2,
"subject_name": "english"
}
],
"exam_details": [
{
"exam_id": 1,
"exam_name": "Prelim"
}
],
"mark_details": [
{
"id": 1,
"exam_id": 1,
"subject_id": 1,
"student_id": 1,
"mark": 51
},
{
"id": 2,
"exam_id": 1,
"subject_id": 2,
"student_id": 2,
"mark": 61
}
]
}
function format(data) {
const output = {
"student_mark_details": [{}]
};
//I start by looping over the students_detail because in the output we want a summary by student
data.students_detail.forEach(student => {
//Initialization of an object for a particular student
const individualStudentOutput = {}
const studentId = student.student_id;
const studentName = student.name;
//The rollNumber is easy to get
individualStudentOutput.roll_number = student.roll_number;
//We then want to find the exams that are linked to our student. We do not have that link directly, but we know that our student is linked to some marks
//Finds all the marks that correspond to the student (match on the mark's student_id, not its own id)
const studentMarkDetails = data.mark_details.filter(mark => mark.student_id === studentId);
studentMarkDetails.forEach(individualMark => {
//Finds the exam that corresponds to our mark
const examDetail = data.exam_details.find(exam => individualMark.exam_id === exam.exam_id);
//Finds the subject that corresponds to our mark
const subjectDetail = data.subject_details.find(subject => individualMark.subject_id === subject.subject_id);
//We then create a grade that we will add to our exam
const grade = {
subject_name: subjectDetail.subject_name,
mark: individualMark.mark
}
//We then want to add our grade to our exam, but we don't know if our output has already have an array to represent our exam
//So in the case where it does not exist, we create one
if (!individualStudentOutput[examDetail.exam_name]) {
individualStudentOutput[examDetail.exam_name] = [];
}
//We then add our grade to the exam
individualStudentOutput[examDetail.exam_name].push(grade);
});
//Now that we have finished our individual output for a student, we add it to our object
output.student_mark_details[0][studentName] = individualStudentOutput;
})
return output;
}
console.log(JSON.stringify(format(data)))
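If the lists grow, the repeated find() calls can be replaced by lookups built once up front. The following is a minimal sketch of that idea on a reduced sample in the same shape as the question's data; indexBy and formatWithLookups are hypothetical names, and it assumes every mark refers to an existing exam and subject.

```javascript
// Reduced sample in the same shape as the question's data
const sample = {
  students_detail: [{ student_id: 1, name: 'abc', roll_number: 10 }],
  subject_details: [{ subject_id: 1, subject_name: 'math' }],
  exam_details: [{ exam_id: 1, exam_name: 'Prelim' }],
  mark_details: [{ id: 1, exam_id: 1, subject_id: 1, student_id: 1, mark: 51 }]
}

// Hypothetical helper: builds a Map from key value -> record
const indexBy = (records, key) => new Map(records.map(r => [r[key], r]))

const formatWithLookups = (input) => {
  // build each lookup once instead of calling find() for every mark
  const exams = indexBy(input.exam_details, 'exam_id')
  const subjects = indexBy(input.subject_details, 'subject_id')
  const students = {}
  for (const student of input.students_detail) {
    const entry = { roll_number: student.roll_number }
    // one student has many marks, so this step is still a filter
    const marks = input.mark_details.filter(m => m.student_id === student.student_id)
    for (const mark of marks) {
      const examName = exams.get(mark.exam_id).exam_name
      // create the exam array on first use, then append the grade
      if (!entry[examName]) entry[examName] = []
      entry[examName].push({
        subject_name: subjects.get(mark.subject_id).subject_name,
        mark: mark.mark
      })
    }
    students[student.name] = entry
  }
  return { student_mark_details: [students] }
}

const formatted = formatWithLookups(sample)
console.log(JSON.stringify(formatted))
```

The output shape is the same as in the answer above; only the way exams and subjects are looked up changes.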
I'm trying to keep only the score with the max date for each object. One id may have multiple dates. Below is the format of the data, where id 123 has two dates, so I am trying to take the max date for user 123. I used the sort method and stored array[0], but something is still missing.
var arr = [
{
"scores": [
{
"score": 10,
"date": "2021-06-05T00:00:00"
}
],
"id": "3212"
},
{
"scores": [
{
"score": 10,
"date": "2021-06-05T00:00:00"
},
{
"score": 20,
"date": "2021-05-05T00:00:00"
}
],
"id": "123"
},
{
"scores": [
{
"score": 5,
"date": "2021-05-05T00:00:00"
}
],
"id": "321"
}
]
What I tried is
_.each(arr, function (users) {
users.scores = users.scores.filter(scores => new Date(Math.max.apply(null, scores.date)));
return users;
});
Expecting the output to look like the following with the max date selected.
[
{
"scores": [
{
"score": 10,
"date": "2021-06-05T00:00:00"
}
],
"id": "3212"
},
{
"scores": [
{
"score": 10,
"date": "2021-06-05T00:00:00"
}
],
"id": "123"
},
{
"scores": [
{
"score": 5,
"date": "2021-05-05T00:00:00"
}
],
"id": "321"
}
]
Your filter callback does not perform a comparison, so it cannot select the correct element. Also, although taking the maximum of the dates as strings would work in your case (because of the date format you have), it is safer to convert the date strings into Date objects to consistently get correct results regardless of the format.
The solution below uses a combination of Array.map() and Array.sort() to copy your data and produce the expected result.
const data = [{
'scores': [{
'score': 10,
'date': '2021-06-05T00:00:00'
}],
'id': '3212'
}, {
'scores': [{
'score': 10,
'date': '2021-06-05T00:00:00'
}, {
'score': 20,
'date': '2021-05-05T00:00:00'
}],
'id': '123'
}, {
'scores': [{
'score': 5,
'date': '2021-05-05T00:00:00'
}],
'id': '321'
}];
// map the data and return the updated objects as the result
const result = data.map((user) => {
// copy the scores array to not mutate the original data
const sortedScores = user.scores.slice();
// sort the scores array by date descending
sortedScores.sort((a, b) => (new Date(b.date) - new Date(a.date)));
// return the same user with the first score from the sorted array
return {
...user,
scores: [sortedScores[0]]
};
});
console.log(result);
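Sorting is not strictly required to pick a single maximum. As an alternative sketch, a reduce() can keep track of the latest entry in one pass. latestScore is a hypothetical helper name, the sample is a reduced copy of the question's data, and the code assumes every user has at least one score (reduce without an initial value throws on an empty array).

```javascript
// Reduced copy of the question's data
const users = [
  {
    scores: [
      { score: 10, date: '2021-06-05T00:00:00' },
      { score: 20, date: '2021-05-05T00:00:00' }
    ],
    id: '123'
  },
  { scores: [{ score: 5, date: '2021-05-05T00:00:00' }], id: '321' }
]

// Hypothetical helper: returns the entry with the greatest date.
// Assumes `scores` is non-empty.
const latestScore = (scores) =>
  scores.reduce((latest, s) => (new Date(s.date) > new Date(latest.date) ? s : latest))

// build new user objects so the original data is not mutated
const latestPerUser = users.map(user => ({
  ...user,
  scores: [latestScore(user.scores)]
}))

console.log(JSON.stringify(latestPerUser))
```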
So, I have a json which looks a little bit like this:
{
"data": {
"user": {
"edge_followed_by": {
"count": 22,
"page_info": {
"has_next_page": true,
"end_cursor": "Base64"
},
"edges": [
{
"node": {
"id": "id",
"username": "Username",
"full_name": "played",
"profile_pic_url": "URL"
}
}
]
}
}
}
}
And I want to filter out the username. How do I do that?
You could retrieve it with a map function there
const dataSample = {
"data": {
"user": {
"edge_followed_by": {
"count": 22,
"page_info": {
"has_next_page": true,
"end_cursor": "Base64"
},
"edges": [
{
"node": {
"id": "id",
"username": "Username",
"full_name": "played",
"profile_pic_url": "URL"
}
}
]
}
}
}
}
const getUsernames = data => {
return data.data.user.edge_followed_by.edges.map(e => e.node.username)
}
console.log(getUsernames(dataSample))
:)
The question is a little tricky to interpret.
My interpretation is that you want to extract the usernames.
"Filtering" could also mean removing the items of a collection that fail a condition (or test) of some kind.
For example, removing all even numbers from an array:
let x = [1, 2, 4, 5, 6];
let filtered = x.filter(value => value % 2 !== 0); // keeps the odd numbers: [1, 5]
Now, I've looked at your JSON, and I think the best approach is to take the "edges" property, which is an array, and run it through a built-in function like map to get the usernames.
// here "data" is the "data" property of your JSON
data.user.edge_followed_by.edges.map(userObject => userObject.node.username)
That would effectively extract all usernames from the edges, if your tech stack of choice is JavaScript.
I got this info from a post like: https://coderin90.com/blog/2019/map-js
I'm sorting with priority, which was already solved in my previous post: Elastic - Sorting value with priority
But I found a new problem: when I filter the data on a specific field, the sort on timeInt doesn't work anymore.
Here is the query I tried:
{
query: {
bool: {
must: {
match: {
'flag_type': "contract"
}
},
should: [
{
match: {
timeInt: {
query: 0,
boost: 3
}
}
},
{
match: {
timeInt: {
query: 1,
boost: 2
}
}
}
]
}
},
sort: [
{ _score: "desc"},
{
timeInt: {
order: "desc"
}
}
]
}
NOTE: If I delete { _score: "desc" } from the sort, the descending sort on timeInt works properly, but the values 0 and 1 are no longer boosted to the top.
Expected result:
0, 1, 100, 99, 98, etc...
Current result:
If I delete { _score: "desc" }:
100, 99, 98, 97, 96, etc...
With query above:
0, 1, 15, 99, 100, 70, 2, etc...
What's wrong with my query?
Please help me
Thank you.
Your query means: first, sort by score. If several documents have the same score, then sort by the 'timeInt' field among them. In Elasticsearch's view the result is correct.
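The two-level ordering can be reproduced in plain JavaScript. In this sketch the _score values are made up for illustration (they are not real Elasticsearch output); the point is only that once every document has a different score, the timeInt tie-breaker never gets a chance to apply.

```javascript
// Hypothetical hits: the _score values are invented for illustration only
const hits = [
  { timeInt: 100, _score: 1.2 },
  { timeInt: 99, _score: 1.5 },
  { timeInt: 0, _score: 4.2 },
  { timeInt: 1, _score: 3.2 },
  { timeInt: 15, _score: 1.9 }
]

// Same ordering rule as
// sort: [{ _score: "desc" }, { timeInt: { order: "desc" } }]:
// compare by score first, and only fall back to timeInt on a tie
const sortedHits = [...hits].sort(
  (a, b) => (b._score - a._score) || (b.timeInt - a.timeInt)
)

console.log(sortedHits.map(h => h.timeInt)) // [0, 1, 15, 99, 100]
```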
Apologies for the delay in replying. Here is what you can do on top of the boosting logic.
I've created a custom sort. It first sorts the results by _score, and then by a script-based sort on the timeInt field.
Since 0 and 1 are boosted in the query and are also mapped to large values in the custom sort script, they are always returned first, which is what you are looking for.
This should help you get what you are looking for.
POST myintindex/_search
{
"query":{
"bool": {
"must": [
{
"match": {
"flag_type": "contract"
}
}
],
"should": [
{
"match": {
"timeInt": {
"query": "0",
"boost": 200000
}
}
},
{
"match": {
"timeInt": {
"query": "1",
"boost": 10000
}
}
}
]
}
},
"sort":[
{ "_score": {"order": "desc"}},
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"source":"""
String st = String.valueOf(doc['timeInt'].value);
if(params.scores.containsKey(st)) {
return params.scores[st];
}
return doc['timeInt'].value;
""",
"params":{
"scores":{
"0":200000,
"1":100000
}
}
},
"order":"desc"
}
}
]
}
Once again, I apologise it took me this long to respond, but I hope this helps!
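Another option that may be worth trying, offered only as an untested sketch: move the flag_type clause into filter context. As far as I know, clauses in filter context restrict the results without contributing to _score, so documents that match neither should clause would all tie on score and fall through to the timeInt sort. The body below reuses the clauses from the question; searchBody is just an illustrative variable name.

```javascript
// Sketch only: same clauses as the question, but flag_type is in filter
// context, so it narrows the results without affecting _score.
const searchBody = {
  query: {
    bool: {
      filter: [
        { match: { flag_type: 'contract' } }
      ],
      // optional boosts: only documents with timeInt 0 or 1 get a score bump
      should: [
        { match: { timeInt: { query: 0, boost: 3 } } },
        { match: { timeInt: { query: 1, boost: 2 } } }
      ]
    }
  },
  sort: [
    { _score: 'desc' },
    { timeInt: { order: 'desc' } }
  ]
}

console.log(JSON.stringify(searchBody, null, 2))
```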
I have a mapreduce function I want to write in mongoDB to count how many times a character has been played with. The relevant part from my json looks like this:
"playerInfo": {
"Player 1": {
"info":{
"characterId":17
}
},
"Player 2": {
"info":{
"characterId":20
}
}
}
I want to count how many times each "characterId" appears in my documents. There are 10 players, from Player 1 to Player 10.
Two questions:
1. How do I use mapReduce in MongoDB when I have a number as part of my key?
2. How do I concatenate strings in mapReduce so that the code shown below is correct?
db.LoL.mapReduce( function()
{
for (var i in this.playerInfo)
{
emit(this.playerInfo.'Player '+(i).info.characterId, 1);
}
},
function(keys, values) {
return Array.sum(values)
}, {out: { merge: "map_reduce_example5" } } )
Thank you very much for your answers!
So there are really a couple of things wrong with the structure here, and you really "should" change it.
The mapReduce is pretty simple, since you can just iterate the key names via Object.keys():
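db.LoL.mapReduce(
function() {
// keep a reference to the document: "this" is not the document inside the forEach callback
var doc = this;
Object.keys(doc.playerInfo).forEach(function(key) {
emit({ "player": key, "characterId": doc.playerInfo[key].info.characterId }, 1)
})
},
// reduce receives the emitted key first, then the array of values for that key
function(key, values) { return Array.sum(values) },
{
"query": { "playerInfo": { "$exists": true } },
"out": { "inline": 1 }
}
)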
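To make the map and reduce steps easier to follow, here is a plain JavaScript emulation that runs outside MongoDB. It is a sketch, not MongoDB's API: the sample documents are illustrative, and it keys the counts by characterId alone (what the question asks for) rather than by player and characterId together. Note that bracket notation such as doc.playerInfo[key] is what handles key names like "Player 1" that contain a space and a number.

```javascript
// Illustrative documents in the question's shape
const docs = [
  { playerInfo: { 'Player 1': { info: { characterId: 17 } }, 'Player 2': { info: { characterId: 20 } } } },
  { playerInfo: { 'Player 1': { info: { characterId: 20 } }, 'Player 2': { info: { characterId: 17 } } } },
  { playerInfo: { 'Player 1': { info: { characterId: 17 } } } }
]

// "map" step: emit one [characterId, 1] pair per player slot.
// Bracket notation reads keys such as "Player 1" without string tricks.
const emitted = docs.flatMap(doc =>
  Object.keys(doc.playerInfo).map(key => [doc.playerInfo[key].info.characterId, 1])
)

// "reduce" step: sum the emitted values for each key
const counts = {}
for (const [characterId, n] of emitted) {
  counts[characterId] = (counts[characterId] || 0) + n
}

console.log(counts) // { '17': 3, '20': 2 }
```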
If you instead change the data format to use an array, and properties with values instead of named keys:
{
"playerInfo": [
{ "player": "Player 1", "characterId": 17 },
{ "player": "Player 2", "characterId": 20 }
]
}
Then the .aggregate() method is much faster in processing this, and returns a cursor for large result sets:
db.collection.aggregate([
{ "$unwind": "$playerInfo" },
{ "$group": {
"_id": "$playerInfo",
"count": { "$sum": 1 }
}}
])
With MongoDB 3.4.4 and greater you can even use $objectToArray on your present structure:
db.LoL.aggregate([
{ "$project": {
"playerInfo": { "$objectToArray": "$playerInfo" }
}},
{ "$unwind": "$playerInfo" },
{ "$group": {
"_id": {
"player": "$playerInfo.k",
"characterId": "$playerInfo.v.info.characterId"
},
"count": { "$sum": 1 }
}}
])
Which is basically the same as the mapReduce, only a lot faster due to the native operators used as opposed to JavaScript evaluation, which runs much slower.