I have a search text field in a web GUI for an Elasticsearch index with two different types of fields that need to be searched: full text (description) and exact match (id).
Question 1 - How should I add the second, exact-match query for the id field? When I search for an ID, the exact ID is within the result set, but it should be the only result.
The description search seems to be working correctly, just not the ID search.
"multi_match": {
"fields": ["id", "description"],
"query": query,
"description": {
"fuzziness": 1,
"operator": "and"
}
}
I think that you are looking for something like this. Try it.
{
"query": {
"bool": {
"must": [ {
"match": {
"description": {
"fuzziness": 1,
"query": "yourfuzzinessdescription"
}
}
},
{
"term" : {
"id" : 1
}
}
]
}
}
}
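If it helps, here is a minimal sketch of how a single search-box input could be turned into a request body like the one above. It is not a drop-in solution: buildSearchBody is a made-up helper name, the boost value of 10 is arbitrary, and it uses should instead of must on the assumption that one input is either an ID or a description, never both.

```javascript
// Build a request body from one search-box input (a sketch only).
// `buildSearchBody` is a hypothetical helper, not part of any client library.
function buildSearchBody(input) {
  return {
    query: {
      bool: {
        // Assumption: the input is either an ID or description text,
        // so either clause may match ("should"), not both ("must").
        should: [
          // Exact match on id; the boost (arbitrary value) favors an exact ID hit.
          { term: { id: { value: input, boost: 10 } } },
          // Fuzzy full-text match on description, all terms required.
          { match: { description: { query: input, fuzziness: 1, operator: 'and' } } }
        ],
        minimum_should_match: 1
      }
    }
  };
}

console.log(JSON.stringify(buildSearchBody('1234'), null, 2));
```

With should, a description hit can still come back next to the ID hit, so if the ID must be the only result you would have to decide that in the application, for example by running the term clause alone when the input looks like an ID.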
Dani's query structure is probably what you are looking for, but perhaps you also need an alternative to the fuzziness aspect of the query. Or maybe not: can you please provide an example of a user input for the description field and what you expect it to match?
Looking at Match Query documentation and Elasticsearch Common Options - fuzziness, that fuzziness is based on Levenshtein Distance. So, that query corresponds to allowing an edit distance of 1 and will allow minor misspellings and such. If you keep the and operator in the original query, then all terms in the query must get matched. Given you have a document with a description like "search server based on Lucene", you will not be able to retrieve that with a description query like "lucene based search server". Using an analyzer with the stop filter and a stemming filter in combination with a match phrase query with a slop would work? But again, it depends on what you are trying.
I am working on a search application that uses Algolia for indexing. When the user types a search term into the text input box, we want to populate the autocompletion dropdown with events. Every event belongs to an event category as well.
Example:
{
"category": "Disney",
"events": [
{
"title": "Ice Skating"
},
{
"title": "Peter-Pan"
},
{
"title": "Roller Skating"
}
]
}
If someone searches for "skating", we want to pull back the parent category and the child events "Ice Skating" and "Roller Skating" but omit the "Peter-Pan" event.
Is this type of nested filtering possible with Algolia? If so, how would the filtering work? Would it need to be done with JS, will Algolia handle it for me or do we need to create separate indexes for Event Categories and then Individual Events?
Thanks!
Yes, Algolia will automatically filter out Peter Pan and return items with Skating.
As an example I've got some data setup like so:
announcements: [
{
id: ..,
..
author: {
id: ..,
name: 'Preston',
..
}
},
..
]
If I search Announcements for Preston it'll return any announcement that has Preston in the author attribute. This is the default for Algolia and will search the entire record for your search term. This can be slow.
You can go into the Algolia dashboard, or with the API, and under your index's Ranking tab define what you want to search by and ignore.
The first thing you need to do is add events.title to the searchable attributes. This will make sure that when the end-user types skat, it will match one of the titles.
Then you can check which parts of each result matched using _highlightResult, and more specifically filter based on the matchLevel:
full is when you have a match and none is when you don't.
This filtering should be possible in JS, in the template you use to display the results.
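As a rough sketch of that filtering, the function below keeps only the events whose title matched. It assumes that _highlightResult.events is an array that lines up index by index with the record's events array, and the sample hit is hand-written for illustration rather than copied from a real response; matchedEvents is a made-up name.

```javascript
// Keep only the child events whose title matched the query.
// Assumption: _highlightResult.events is an array parallel to hit.events.
function matchedEvents(hit) {
  const highlights = (hit._highlightResult && hit._highlightResult.events) || [];
  return hit.events.filter(function (event, i) {
    const title = highlights[i] && highlights[i].title;
    // matchLevel is "none" when the attribute did not match at all.
    return Boolean(title) && title.matchLevel !== 'none';
  });
}

// Hypothetical hit for the query "skating", shaped like the example record.
const hit = {
  category: 'Disney',
  events: [{ title: 'Ice Skating' }, { title: 'Peter-Pan' }, { title: 'Roller Skating' }],
  _highlightResult: {
    events: [
      { title: { value: 'Ice <em>Skating</em>', matchLevel: 'full' } },
      { title: { value: 'Peter-Pan', matchLevel: 'none' } },
      { title: { value: 'Roller <em>Skating</em>', matchLevel: 'full' } }
    ]
  }
};

console.log(matchedEvents(hit).map(function (e) { return e.title; }));
// → [ 'Ice Skating', 'Roller Skating' ]
```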
I’m migrating from Mongo to Firebase with Algolia on top to provide the search, but I've hit a snag coming up with a comparable way to search within individual elements of a record.
I have an object that stores when a room is available: from and to. Each record can have many individual from/to combos (see the sample below with 2). I want to be able to run a search something like:
roomavailable.from <= 1522195200 AND roomavailable.to >=1522900799
But only have the query search a match within each element, not any facet in all elements. An element query in Mongo works like that. But if I run that query on the record listed below, it will return the record, because the two roomavailable objects satisfy the .from and .to query. I think.
Is there a way to ensure the search is looking only at matching a pair of .from and .to in an individual object/element?
Below is the pertinent part of the record stored in Algolia so you can see the structure.
"roomavailable": [
{
"_id": "rJbdWvY9M",
"from": 1522195200,
"to": 1522799999
},
{
"_id": "r1H_-vKqz",
"from": 1523923200,
"to": 1524268799
}
],
And here is the Mongo (mongoose) equivalent, where it's searching inside individual elements (this works):
$elemMatch: {
from: {
$lte: moment(dateArray[0]).utc().startOf('day').format()
},
to: {
$gte: moment(dateArray[1]).utc().endOf('day').format()
}
}
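To make the difference concrete, here is a plain JavaScript sketch with no Algolia or Mongo involved. Both helper names are made up for illustration; the data is the roomavailable array from the record above.

```javascript
// The two availability windows from the sample record.
const slots = [
  { _id: 'rJbdWvY9M', from: 1522195200, to: 1522799999 },
  { _id: 'r1H_-vKqz', from: 1523923200, to: 1524268799 }
];

// What I seem to be getting: each condition may be satisfied by a different element.
function matchesAcrossElements(slots, from, to) {
  return slots.some(function (s) { return s.from <= from; }) &&
         slots.some(function (s) { return s.to >= to; });
}

// What I want ($elemMatch-style): one element must satisfy both conditions.
function matchesWithinOneElement(slots, from, to) {
  return slots.some(function (s) { return s.from <= from && s.to >= to; });
}

console.log(matchesAcrossElements(slots, 1522195200, 1522900799));   // → true
console.log(matchesWithinOneElement(slots, 1522195200, 1522900799)); // → false
```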
I have also tried this query, but it still seems to match the .from and .to in any of the individual roomavailable elements:
index.search({
query: '',
filters: filters,
facetFilters: ['roomavailable.from:1522195200', 'roomavailable.to:1524268799'],
attributesToRetrieve: [
"roomavailable",
],
restrictHighlightAndSnippetArrays: true
})
I found a couple of posts on Algolia discussing using 1 bracket vs. 2 brackets in the facetFilters. I've tried both. Neither works.
Any suggestions would be awesome. Thanks!
Edit: See discussion on Algolia Discourse:
https://discourse.algolia.com/t/how-to-match-multiple-attributes-in-nested-object-with-numericfilters/4887/8
Hi @kanec, thanks for clarifying your question!
Indeed what @Alefort suggested (using roomavailable in a separate index) would be the easiest option since the query I mentioned above will definitely return the results you want. This will mean that you'll have to query the room availability index separately in order to get which IDs are available, so you'll have to use multiple-queries:
https://www.algolia.com/doc/api-reference/api-methods/multiple-queries/
That said, I asked our core API team to see if there's a more reasonable way to approach this issue, but I fear that this is a filter limitation, due to performance reasons with arrays. You could transform your data structure into the following and index your rooms as an object instead:
[
{
"roomavailable": {
"0": {
"_id": "rJbdWvY9M",
"from": 1522195200,
"to": 1522799999
},
"1": {
"_id": "r1H_-vKqz",
"from": 1523923200,
"to": 1524268799
}
}
}
]
So you can apply the following filter:
{
"filters": "roomavailable.0.from <= 1522195200 AND roomavailable.0.to >= 1522799999 AND roomavailable.1.from <= 1522195200 AND roomavailable.1.to >=1522900799"
}
The downside of this is that you'll need to know the length of roomavailable in order to build the search query on the front-end (you can do so at indexing time by adding a roomavailable_count property), and this will probably be less performant with a considerable number of rooms per item. In that case, switching to a dedicated index makes total sense, for the following reasons:
If in your backend you frequently update available rooms you won't impact the other indices' build time
Filters will perform better (as explained above)
Indexing strategy will be simpler to handle
Let me know what you think about this and if it helps you out.
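For the dedicated-index route, a minimal sketch of the data preparation could look like the following. The helper names, the roomID field, and the availableFrom/availableTo attribute names are all assumptions made for this example, not something your data or the API requires.

```javascript
// Turn one room record into one record per availability window, so that
// a filter on from/to always applies to a single window.
// All names here (roomID, availableFrom, availableTo) are hypothetical.
function toAvailabilityRecords(room) {
  return room.roomavailable.map(function (slot) {
    return {
      objectID: room.objectID + '-' + slot._id, // unique per window
      roomID: room.objectID,                    // link back to the room record
      availableFrom: slot.from,
      availableTo: slot.to
    };
  });
}

// Build the numeric filter string for the dedicated index.
function availabilityFilter(from, to) {
  return 'availableFrom <= ' + from + ' AND availableTo >= ' + to;
}

const room = {
  objectID: 'room-1',
  roomavailable: [
    { _id: 'rJbdWvY9M', from: 1522195200, to: 1522799999 },
    { _id: 'r1H_-vKqz', from: 1523923200, to: 1524268799 }
  ]
};

console.log(toAvailabilityRecords(room));
console.log(availabilityFilter(1522195200, 1522799999));
```

The roomID values returned by the availability index can then be used to fetch or filter the room records themselves.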
I'm using Algolia's algoliasearch-client-js and autocomplete.js to power searches on my website. That works.
But I also want to include the excerpt/snippet of the text with which the search query matches. How to do that?
My current code is:
autocomplete('#search-input', { hint: true, autoselect: true }, [
{
source: autocomplete.sources.hits(index, { hitsPerPage: 7 }),
displayKey: 'title',
templates: {
footer: '<span class="search-foot">Powered by <img src="/static/assets/algolia-logo.png" width="47" height="15"></span>',
suggestion: function(suggestion) {
return '<div class="search-lang">' +
suggestion._highlightResult.platform.value +
'</div>' +
suggestion._highlightResult.title.value;
}
}
}
]).on('autocomplete:selected', function(event, suggestion, dataset) {
window.location.href = suggestion.url;
});
To highlight the excerpt that caused the query to match with a record, their FAQ says:
The AttributesToSnippet setting is a way to shorten ("snippet") your
long chunks of text to display them in the search results. Just think
about the small pieces of text displayed below a Google result: it's
built from a subset of the sentences of the page content, includes
your matching keywords, and avoid flooding the search results page.
For example, if you limit the number of words of the "description"
attribute to 10, the "_snippetResult.description.value" attribute of
the JSON answer will only contain the 10 best words of this
description.
There's no example of AttributesToSnippet, however. On their Github documentation I find a bit more info:
attributesToHighlight
scope: settings, search
type: array of strings
default: null
Default list of attributes to highlight. If set to null, all indexed
attributes are highlighted.
A string that contains the list of attributes you want to highlight
according to the query. Attributes are separated by commas. You can
also use a string array encoding (for example ["name","address"]). If
an attribute has no match for the query, the raw value is returned. By
default, all indexed attributes are highlighted (as long as they are
strings). You can use * if you want to highlight all attributes.
I'm struggling with translating their abstract, scattered information into a coherent piece of code. Any suggestions?
attributesToIndex, attributesToHighlight and attributesToSnippet are the three main settings used for highlighting.
attributesToIndex is an index setting (you can set it in your dashboard or your back-end, but not in the front-end).
attributesToHighlight is, if not set, equal to attributesToIndex. It can be set in your index settings, like attributesToIndex, but can also be overridden at query time (and can only contain attributes that are also in attributesToIndex).
attributesToSnippet is, if not set, equal to an empty array. Each attribute can have a modifier at the end, like :10, to say how many words you want in your snippet. Other than that, it works the same way as attributesToHighlight.
Let's take an example:
Index settings
attributesToIndex: ['title', 'description']
attributesToHighlight: ['title']
attributesToSnippet: ['description:3']
Record
{
"title": "Test article",
"description": "A long long long test description long long long",
"link": "https://test.com/test-article"
}
For the query "test", here's basically the JSON of a suggestion you'd get:
{
"title": "Test article",
"description": "A long long long test description long long long",
"link": "https://test.com/test-article",
"_highlightResult": {
"title": {
"value": "<em>Test</em> article"
}
},
"_snippetResult": {
"description": {
"value": "... long <em>test</em> description ..."
}
}
}
Notice that neither description nor link is in the _highlightResult object.
link was ignored by the search since it's not in attributesToIndex.
description is not in the _highlightResult because it's not in attributesToHighlight.
You can also notice that in both _highlightResult and _snippetResult, the test word is wrapped in <em></em> tags. Those are the tags you can use to show which words matched.
I've omitted some attributes of the answer that didn't help understand my answer. You can see them in your browser console by adding a small console.log(suggestion) at the beginning of your suggestion function.
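As a sketch of how the suggestion template could consume that response, the function below reads the highlighted title and the description snippet, and falls back when either is missing. renderSuggestion and the CSS class names are made up for this example; the sample object is the suggestion from above.

```javascript
// Render one suggestion from its highlight and snippet results.
// Falls back to the raw title (not HTML-escaped here) and to an empty
// snippet when the corresponding result is absent.
function renderSuggestion(suggestion) {
  const highlight = suggestion._highlightResult || {};
  const snippet = suggestion._snippetResult || {};
  const title = highlight.title ? highlight.title.value : suggestion.title;
  const description = snippet.description ? snippet.description.value : '';
  return '<div class="search-title">' + title + '</div>' +
         '<div class="search-snippet">' + description + '</div>';
}

const suggestion = {
  title: 'Test article',
  description: 'A long long long test description long long long',
  link: 'https://test.com/test-article',
  _highlightResult: { title: { value: '<em>Test</em> article' } },
  _snippetResult: { description: { value: '... long <em>test</em> description ...' } }
};

console.log(renderSuggestion(suggestion));
```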
I've fixed the problem myself, due to finding a setting in Algolia's dashboard by pure luck. To make the returned search results return the snippet too, I did two things:
1). There's an option in Algolia's dashboard that's named 'Attributes to snippet', which you can find in the 'Display' tab of the particular index that you're searching with.
In my case, I set that option to the record attribute that I wanted to highlight in my search queries, which was 'content'.
2). After I configured that setting, I could access _snippetResult in the function for the autocomplete.js library. Because the attribute that I added to the 'Attributes to snippet' option was 'content', I access the words that matched the search query with suggestion._snippetResult.content.value.
My code now is:
autocomplete('#search-input', { hint: true, autoselect: false }, [
{
source: autocomplete.sources.hits(index, { hitsPerPage: 7 }),
displayKey: 'title',
templates: {
footer: '<span class="search-foot">Powered by <img src="/static/assets/algolia-logo.png" width="47" height="15"></span>',
suggestion: function(suggestion) {
return '<div class="search-lang">' +
suggestion._highlightResult.platform.value +
'</div><div class="search-title">' +
suggestion._highlightResult.title.value +
'</div>' + '<div class="search-snippet">' +
suggestion._snippetResult.content.value + '</div>';
}
}
}
]).on('autocomplete:selected', function(event, suggestion, dataset) {
window.location.href = suggestion.url;
});
So to summarise, there is simply a manual option to enable the return of search snippets instead of having to use attributesToSnippet somewhere in the code.
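For anyone who would rather keep this in code: going by the other answer, attributesToSnippet can apparently also be passed at query time, so something like the sketch below might work as well. I have not switched to it myself; snippetSearchParams is a made-up helper and the 10-word limit is just an example.

```javascript
// Build the search parameters for the autocomplete source, including the
// snippet setting. `snippetSearchParams` is a hypothetical helper.
function snippetSearchParams(attribute, wordCount) {
  return {
    hitsPerPage: 7,
    // "attribute:N" asks for a snippet of roughly N words.
    attributesToSnippet: [attribute + ':' + wordCount]
  };
}

const params = snippetSearchParams('content', 10);
console.log(params);

// Assumed usage, in place of the literal { hitsPerPage: 7 } object:
//   source: autocomplete.sources.hits(index, params)
```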
Say I have a URL: aaa.something.com/id that is found in several collections, in many different fields.
I would like to change it to bbb.something.com/id via regex (or similar) to find and replace only the prefix of the URL string.
The following:
db.tests.find({ "url": /^aaa\.something\.com\// }).forEach(function(doc) {
doc.url = doc.url.replace(/^aaa\.something\.com\//, "bbb.something.com/");
db.tests.update({ "_id": doc._id },{ "$set": { "url": doc.url } });
});
assumes that the field is always known to be url.
But in the database, the URL could be found in a number of locations, such as:
content.photo
content.media
content.media[i].data
avatar
url
You can create a wildcard text index and then use $text to find the documents that contain the search string. Once you get these docs, you can write JavaScript code that finds the keys whose values match your regex and replaces them as needed.
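The JavaScript part could look roughly like the sketch below, which walks a document and applies the replacement to every string value, whatever key it sits under. replaceInStrings is a made-up helper, and the sample document is invented from the field list in the question; only plain objects and arrays are descended into, so values such as ObjectId or Date are left untouched.

```javascript
// Recursively apply a regex replacement to every string value in a document.
function replaceInStrings(value, pattern, replacement) {
  if (typeof value === 'string') {
    return value.replace(pattern, replacement);
  }
  if (Array.isArray(value)) {
    return value.map(function (item) {
      return replaceInStrings(item, pattern, replacement);
    });
  }
  // Only descend into plain objects; leave other types as they are.
  if (value !== null && typeof value === 'object' && value.constructor === Object) {
    const copy = {};
    Object.keys(value).forEach(function (key) {
      copy[key] = replaceInStrings(value[key], pattern, replacement);
    });
    return copy;
  }
  return value;
}

// Invented sample document using the locations listed in the question.
const doc = {
  url: 'aaa.something.com/1',
  avatar: 'aaa.something.com/2',
  content: { photo: 'aaa.something.com/3', media: [{ data: 'aaa.something.com/4' }] },
  views: 7
};

console.log(JSON.stringify(
  replaceInStrings(doc, /^aaa\.something\.com\//, 'bbb.something.com/'), null, 2));
```

The rewritten document can then be written back inside the same kind of forEach loop the question already uses.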
I'd like to make a simple "chat" where there is a post and answers for them (only 1 deep), I decided to go this way, so a single document would look like this
{
_id: ObjectId(...),
posted: date,
author: "name",
content: "content",
comments: [
{ posted: date,
author: "name2",
content: '...'
}, ... ]
}
My question is how should I search in the content this way? I'd first need to look for a match in the "parent" content, then the contents in the comments list. How should I do that?
If you can search for a regex within each content, you could use:
{$or : [
{'content':{$regex:'your search regex'}},
{'comments' : { $elemMatch: { 'content':{$regex:'your search regex'}}}}]}
Please note that when fetching for results, upon a match to either a parent or a child you will receive the entire mongo document, containing both the parent and the children.
If you want to avoid this (to be sure what you've found), you can possibly run first a regex query on the parent only, and then on the children only, instead of the single $or query.
For more details on $elemMatch take a look at: docs.mongodb.org/manual/reference/operator/query/elemMatch
As was stated in the comments earlier, the basic query to "find" is just a simple matter of using $or here, which also short-circuits, matching on the first condition that returns true. Only one field of the array elements is being tested here, so there is no need for $elemMatch; just use "dot notation", since matching multiple fields within the same element is not required:
db.messages.find({
"$or": [
{ "content": { "$regex": ".*Makefile.*" } },
{ "comments.content": { "$regex": ".*Makefile.*" } }
]
})
This does actually match the documents that would meet those conditions, and this is what .find() does. However what you seem to be looking for is something a little "funkier" where you want to "discern" between a "parent" result and a "child" result.
That is a little out of the scope for .find() and such manipulation is actually the domain of other operations with MongoDB. Unfortunately as you are looking for "part of a string" to match as your condition, doing a "logical" equivalent of a $regex operation does not exist in something such as the aggregation framework. It would be the best option if it did, but there is no such comparison operator for this, and a logical comparison is what you want. The same would apply to "text" based searches, as there is still a need to discern the parent from the child.
Not the most ideal approach since it does involve JavaScript processing, but the next best option here is mapReduce().
db.messages.mapReduce(
function() {
// Check parent
if ( this.content.match(re) != null )
emit(
{ "_id": this._id, "type": "P", "index": 0 },
{
"posted": this.posted,
"author": this.author,
"content": this.content
}
);
var parent = this._id;
// Check children
this.comments.forEach(function(comment,index) {
if ( comment.content.match(re) != null )
emit(
{ "_id": parent, "type": "C", "index": index },
{
"posted": comment.posted,
"author": comment.author,
"content": comment.content
}
);
});
},
function() {}, // no reduce as all are unique
{
"query": {
"$or": [
{ "content": { "$regex": ".*Makefile.*" } },
{ "comments.content": { "$regex": ".*Makefile.*" } }
]
},
"scope": { "re": /.*Makefile.*/ },
"out": { "inline": 1 }
}
)
The query input is basically the same as before, since it selects the "documents" you want. Using "scope" here just makes it a little easier to pass in the regular expression as an argument without rewriting the JavaScript code to include that value each time.
The logic there is simple enough: each "de-normalized" element is tested to see if the regular expression condition was a match for that particular element. The results are returned "de-normalized" and discern whether the matched element was a parent or a child.
You could take that further and not bother to check the children if the parent was a match, just by moving that check into an else. In the same way, you could even just return the "first" child match by some means or another if that was your desire.
Anyhow, this should set you on the path to whatever your final code looks like. This is the basic approach, and the only way you are going to get this distinction processed on the server; client-side post-processing would follow much the same pattern.
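For the client-side variant, a rough sketch of that same pattern in plain JavaScript might look like this. findMatches is a made-up name and the sample documents are invented; the input is assumed to be the array of documents already returned by the $or query.

```javascript
// Split already-fetched documents into parent ("P") and child ("C") matches,
// mirroring the keys emitted by the mapReduce version.
function findMatches(docs, re) {
  const results = [];
  docs.forEach(function (doc) {
    // Check parent
    if (re.test(doc.content)) {
      results.push({ _id: doc._id, type: 'P', index: 0, content: doc.content });
    }
    // Check children
    (doc.comments || []).forEach(function (comment, index) {
      if (re.test(comment.content)) {
        results.push({ _id: doc._id, type: 'C', index: index, content: comment.content });
      }
    });
  });
  return results;
}

// Invented sample data in the shape of the question's documents.
const docs = [
  { _id: 1, content: 'How do I write a Makefile?',
    comments: [{ content: 'Read the manual' }, { content: 'A Makefile lists targets' }] },
  { _id: 2, content: 'Lunch?',
    comments: [{ content: 'Check the Makefile first' }] }
];

// Use a regex without the "g" flag so that test() keeps no state between calls.
console.log(findMatches(docs, /Makefile/));
```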