How to Do Firebase Special Character Sensitive Search? - javascript

https://example.com/foods.json?orderBy="title"&startAt="Yoğurt"&endAt="Yoğurt\uf8ff"
With the link above, I can search data through the Firebase Realtime Database REST API using the startAt and endAt query parameters.
How can I find the results when searching with similar special characters?
For example, yoğurt and yogurt, kızartma and kizartma
I want to get the same result when written both ways.
If this is not possible with Firebase, is it possible with string methods in JavaScript or Dart?

There is nothing built into Firebase for performing this type of search, so you will have to do your own work to allow it.
The simplest way is to store a secondary value for each string that you want to search, where you map each character back to a single representative value you want to search on.
A very common mapping, for example, is to convert all text to a single case (uppercase or lowercase) so that searches become case-insensitive. You can do the same for accented characters, mapping ğ to g. Combining these two tricks, Yoğurt becomes yogurt.
Same for the ı you show (and İ), mapping all four I variants in the Turkish alphabet to i.
Doing this for all special characters gives you a single string value (in addition to the user input, which you should also still store) that you can use for searches.
For more on this, I recommend also reading Dan's answer here.
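The mapping described above can be sketched in JavaScript roughly as follows. This is a minimal sketch, not a complete transliteration table: the function name toSearchKey is a hypothetical helper, and the approach assumes that Unicode decomposition (NFD) plus removal of combining marks is good enough for your data.

```javascript
// Hypothetical helper: builds the secondary "search key" for a string.
function toSearchKey(text) {
  return String(text)
    // Fold the Turkish I variants first; lowercasing alone would turn
    // "İ" into "i" plus a combining dot and would leave "ı" untouched.
    .replace(/[İIı]/g, 'i')
    // Split accented letters into base letter + combining mark (ğ -> g + breve).
    .normalize('NFD')
    // Drop the combining marks (U+0300 to U+036F).
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase();
}
```

You would store the result next to the original title (for example in a field such as title_search, a name assumed here for illustration), run the user's input through the same function, and use the normalized value in orderBy, startAt and endAt.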

Related

What is the best way to do complicated string search on 5M records ? Application layer or DB layer?

I have a use case where I need to do complicated string matching on about 5.1 million records. By complicated string matching, I mean using a library to do fuzzy string matching. (http://blog.bripkens.de/fuzzy.js/demo/)
The database we use at work is SAP HANA, which is excellent for retrieving and querying because it is in-memory. I would like to avoid pulling the data out and re-populating it in memory on the application layer, but then I cannot take advantage of the libraries (there is an API for fuzzy matching in the DB, but it is not comprehensive enough for us).
What is the middle ground here? If I pre-process the data and associate words in the DB with certain keywords the user might search for, I can cut down the overhead, but are there any best practices for this?
In case it matters: the list is a list of billing descriptors (the ones that show up on credit card statements), so the user will search these descriptors to find out which company a descriptor belongs to.
Assuming your "billing descriptor" is a single column, probably of type (N)VARCHAR, I would start with a very simple SAP HANA fuzzy search, e.g.:
SELECT top 100 SCORE() AS score, <more fields>
FROM <billing_documents>
WHERE CONTAINS(<bill_descr_col>, <user_input>, FUZZY(0.7))
ORDER BY score DESC;
Maybe this is already good enough if you want to apply your JS library to the result set. If not, I would start experimenting with the similarCalculationMode option, e.g. 'similarcalculationmode=substringsearch'. I would also always keep an eye on the response times, which can be higher with some of the options.
Only if response times are too high, or many concurrent users are running your query, would I try to create a fuzzy search index on your search column. If you need more search options, you can also create a fulltext index.
But all of that really depends on your use case, the values you want to compare, etc.
There is a very comprehensive set of features and options for different use cases; check help.sap.com/hana/SAP_HANA_Search_Developer_Guide_en.pdf.
In one project we ran a free-style search on several address columns (name, surname, company name, post code, street) and got response times of 100-200 ms on about 6 million records WITHOUT using any special indexes.
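The two-stage idea mentioned above (let the database return a small candidate set, then apply your own matching to it on the application layer) could look roughly like this. It is only a sketch: it uses a plain Levenshtein edit distance as a stand-in for whatever fuzzy library you prefer, and the field name descriptor is an assumption.

```javascript
// Classic dynamic-programming edit distance between two strings.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Re-rank the candidate rows returned by the database (e.g. the top 100)
// by their edit distance to the user's input, closest first.
function rerank(rows, userInput, field = 'descriptor') {
  const needle = userInput.toLowerCase();
  return rows
    .map((row) => ({
      row,
      distance: levenshtein(needle, String(row[field]).toLowerCase()),
    }))
    .sort((x, y) => x.distance - y.distance)
    .map(({ row }) => row);
}
```

Because only the candidate set leaves the database, the application never has to hold all 5.1 million records in memory.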

Breeze JS: Wildcard in Where

Is it possible to somehow use wildcard characters (* or ?) in breeze queries?
For example: I have a search input field where I want people to be able to enter these characters so they can search for German names that have umlauts. Example: M*ller for Müller, Mueller or Muller.
I already tried % since I hoped that the where-predicate (contains) would get translated to an SQL-LIKE-Statement.
The next thing I would do if nothing helps is to split the string and create different where-predicates that are and/or connected but I'm still hoping for a better solution.

How to approach string length constraints when localization is brought into the equation?

Once there was a search input.
It was responsible for filtering data in a table based on user input.
But this search input was special: it would not do anything unless a minimum of 3 characters was entered.
Not because it was lazy, but because it didn't make sense otherwise.
Everything was good until a new and strange (compared to English) language came to town.
It was Japanese and now the minimum string length of 3 was stupid and useless.
I lost the last few pages of that story. Does anyone remember how it ends?
To fix the issue, you need to determine whether the user's input belongs to certain script(s). The most obvious way to do this is to use Unicode regular expressions:
var regexPattern = "[\\p{Katakana}\\p{Hiragana}\\p{Han}]+";
The only issue is that JavaScript does not support this kind of regular expression out of the box. You are in luck, though: there is a JS library called XRegExp, and its Scripts add-on seems to do exactly what you need. Now, the question is whether you want to require at least three characters for everyone except Japanese or Chinese users, or do it the other way around: require at least three characters for certain scripts (Latin, Common, Cyrillic, Greek and Hebrew) while allowing any other script to be searched on a single character. I'd suggest the second solution:
if (XRegExp('[\\p{Latin}\\p{Common}\\p{Cyrillic}\\p{Greek}\\p{Hebrew}]+').test(input)) {
  // test for string length and call AJAX if the string is long enough
} else {
  // call AJAX search method
}
You might want to pre-compile the regular expression for better performance, but that's basically it.
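In engines that support ES2018 Unicode property escapes (the \p{Script=...} syntax with the u flag), the same check can be sketched without a library. The function names and the minimum lengths of 3 and 1 below are assumptions for illustration.

```javascript
// Scripts for which a minimum of three characters is assumed to make sense.
// Compiled once, as the answer suggests, rather than on every keystroke.
const LONG_FORM_SCRIPTS =
  /[\p{Script=Latin}\p{Script=Common}\p{Script=Cyrillic}\p{Script=Greek}\p{Script=Hebrew}]/u;

// Hypothetical helper: minimum length required before a search is triggered.
function minSearchLength(input) {
  return LONG_FORM_SCRIPTS.test(input) ? 3 : 1;
}

// Hypothetical helper: decides whether the search request should be sent.
function shouldSearch(input) {
  const text = input.trim();
  // Count code points rather than UTF-16 units, so rare characters
  // outside the Basic Multilingual Plane count as one character.
  return [...text].length >= minSearchLength(text);
}
```

Note that, like the XRegExp version, this treats any input containing a Latin or Common character (digits, spaces, punctuation) as requiring three characters, even if the rest is Japanese.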
I guess it mainly depends on where you get that minimum length variable from. If it's hardcoded, you would probably be better off using a dynamic internationalization module:
int.getMinStringLength(int.getCurrentLanguage())
Either you have a dynamic bindings framework such as AngularJS, or you update that module when the user changes the language.
Now maybe you'd want to sort your supported languages by using grouping attributes such as "verbose" and "condensed".
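Such a module could be as small as a lookup table. The sketch below is purely hypothetical: createI18n, the language codes and the per-language minimums are assumptions, not part of any existing library.

```javascript
// Hypothetical per-language minimums; the values are illustrative only.
const MIN_LENGTH_BY_LANGUAGE = { en: 3, de: 3, ja: 1, zh: 1 };
const DEFAULT_MIN_LENGTH = 3;

// Hypothetical factory for a tiny internationalization module.
function createI18n(initialLanguage) {
  let current = initialLanguage;
  return {
    getCurrentLanguage: () => current,
    // Call this when the user switches language.
    setCurrentLanguage: (language) => { current = language; },
    // Fall back to the default for languages that are not listed.
    getMinStringLength: (language) =>
      Object.prototype.hasOwnProperty.call(MIN_LENGTH_BY_LANGUAGE, language)
        ? MIN_LENGTH_BY_LANGUAGE[language]
        : DEFAULT_MIN_LENGTH,
  };
}

const i18n = createI18n('en');
```

With this in place, i18n.getMinStringLength(i18n.getCurrentLanguage()) plays the role of the call shown above.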

Match Phone Numbers Regardless of Formatting

I've written a query for Mongo to search for a phone number. The gotcha is the phone entry is a String rather than a Number. At first I thought it was working fine, however now I realize that if the query isn't formatted correctly it will not match.
So I guess my question is what's the easiest way of matching a phone number regardless of formatting?
Worst case scenario, I use a $where statement and check equality by stripping the non-numeric characters from both values and doing a regex match on that. I'm just wondering whether there is a more optimal way of doing this.
I would store the phone numbers normalized in the DB in the first place (e.g. either stripped of non-numeric characters or formatted in a standard format). Since they are not normalized already, doing it on the fly for each search request will be expensive. If you don't have too many entries yet (e.g. if this is all still in development), you can run a script that normalizes all entries in one shot, or in several batches during off-peak hours if you have a production system.
Then your where clause just has to normalize the input, and the search becomes much easier.
The same goes for addresses, by the way: you have to normalize the data to search it well, or you'll have to develop some fuzzy matching algorithm, which is simply going to be slower (and might take you more time than you think).
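As a rough sketch of the "strip non-numeric characters" variant: the same function is applied once to every stored number (in the migration script) and once to each search input. The field name phoneNormalized is an assumption for illustration.

```javascript
// Keep only the digits of a phone number.
function normalizePhone(raw) {
  return String(raw).replace(/\D/g, '');
}

// Build an exact-match filter object on the normalized field.
// The field name is hypothetical; use whatever your documents store.
function buildPhoneFilter(userInput, field = 'phoneNormalized') {
  return { [field]: normalizePhone(userInput) };
}
```

An exact match on a stored, normalized field can use an ordinary index, which a $where clause cannot. This sketch does not handle country codes: "+1 555 123 4567" and "555 123 4567" still normalize to different strings, so you may need an additional rule for those.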

escape exactly what in javascript

Being a newbie in JavaScript, I have run into a situation where I need more information on escaping characters in a string.
Basically, I know that in order to escape " I need to replace it with \", but what I don't know is which characters need to be escaped in a given string. Is there a list of these "characters to escape", or is it any character that is not a-zA-Z0-9?
In my situation, I don't have control over the content that is displayed on my page. Users enter some text and save it. I then use a web service to extract it from the database, build a JSON array of objects, and iterate over the array when I need to display the entries. So I naturally have no idea what text the user has entered, and therefore which characters I need to escape. I also use jQuery for this specific project (just in case it has a function I am not aware of that does what I need).
Providing examples would be appreciated but I also want to learn the theory and logic behind it.
Hope someone can be of any help.
There's no need to escape everything that's not a-zA-Z0-9, take a look at this example:
http://www.c-point.com/javascript_tutorial/special_characters.htm
You may also want to check out this site, which has information about escaping strings, especially URLs:
http://www.the-art-of-web.com/javascript/escape/
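Which characters need escaping depends on where the text ends up. As a minimal sketch under that assumption: JSON.stringify produces a correctly escaped string literal (quotes, backslashes, control characters), while text inserted into HTML markup needs a different set of characters replaced. The helper names below are hypothetical.

```javascript
// Escape a value for use as a JavaScript/JSON string literal.
// JSON.stringify adds the surrounding quotes and escapes ", \ and
// control characters such as newlines.
function toJsStringLiteral(text) {
  return JSON.stringify(String(text));
}

// Escape the characters that are significant in HTML markup.
// The ampersand must be replaced first so later entities stay intact.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

With jQuery, setting content through .text() instead of .html() treats the value as plain text, so no manual HTML escaping is needed in that case.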
