Fuzzy searching using regex - javascript

I have an object whose keys are the names to be searched and whose values are their scores. It looks like this:
{
'a': 344,
'apple': 345354,
'orange': 320,
'mango': 39999,
.
.
.
}
The map has around half a million keys. I need to do a fuzzy search to build a text auto-suggest feature that tolerates typos: for input like ornge, it should match orange, and it should return all words that match. Matches can also occur anywhere in the string, not just at the beginning. How could I do this using regex? I was trying the following:
(?=.*\ornge\b).*
but it does not work. How could I do this?
Using regex might not be the best solution. Could you suggest an alternative method?

Using regex isn't going to be all that simple. I agree with @apple apple that it would make sense to look for a library. I was curious how long it would take to stub out a basic one by hand, so I whipped this up; feel free to use or improve it to handle the larger list you have.
The basic idea is that you set two thresholds: how much the length of a candidate word may differ from the input, and a minimum letter-match score a candidate must reach.
For each word in your set, you run a comparison function between that word and the input. The comparison function tests each letter of the input against the word to see whether it exists there; if so, it removes that letter from the word and moves on to the next one. This avoids inflated scores from repeated matches against the same letter: e.g. the string aaaaaa would score 6 against apple if the a weren't removed after the first match.
If you want multiple suggestions, you could replace the while(++i < listlen) loop with wordlist.filter and return all words that score above a certain threshold.
const wordlist = ["apple", "orange", "mango", "fruit", "banana", "kiwi", "grapefruit"],
  listlen = wordlist.length,
  wordLengthDifferential = 1, // max allowed length difference between input and candidate
  scoreDifferential = 3;      // minimum number of matching letters required
// Count how many letters of str also occur in word. Each matched letter is
// removed from word so that repeated letters in str aren't counted twice.
function compare(str, word) {
  let score = 0;
  str = str.split('');
  word = word.split('');
  while (str.length) {
    let idx = word.indexOf(str.pop());
    if (idx > -1) {
      ++score;
      word.splice(idx, 1);
    }
  }
  return score;
}
// Return the single highest-scoring word that passes both thresholds.
function getSuggestion(str) {
  let highScore = 0, suggestion = null, i = -1;
  while (++i < listlen) {
    let word = wordlist[i];
    if (Math.abs(word.length - str.length) <= wordLengthDifferential) {
      let score = compare(str, word);
      console.log(str, word, score); // debug output; remove for a large word list
      if (score > highScore && score >= scoreDifferential) {
        suggestion = word;
        highScore = score;
      }
    }
  }
  return suggestion || "no relevant matches";
}
document.querySelector('button').onclick = e => console.log(getSuggestion(document.querySelector('input').value));
// e.key replaces the deprecated e.keyCode (13 was Enter)
document.addEventListener('keydown', e => {
  if (e.key === 'Enter') console.log(getSuggestion(document.querySelector('input').value));
});
<input type="text" /><button>Go</button><em>(or press enter)</em>
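The answer above mentions replacing the while loop with wordlist.filter to return several suggestions instead of one. A minimal sketch of that variant, assuming the same letter-consuming comparison and the same two thresholds; getSuggestions and the sorting by score are illustrative additions, not part of the original answer:

```javascript
// Same letter-consuming comparison as above: each letter of str can match
// at most one letter of word.
function compare(str, word) {
  let score = 0;
  const remaining = word.split('');
  for (const ch of str) {
    const idx = remaining.indexOf(ch);
    if (idx > -1) {
      score++;
      remaining.splice(idx, 1); // consume the matched letter
    }
  }
  return score;
}

// Hypothetical multi-suggestion variant: keep every word that passes both
// thresholds, best scores first.
function getSuggestions(str, wordlist, maxLengthDiff = 1, minScore = 3) {
  return wordlist
    .filter(word => Math.abs(word.length - str.length) <= maxLengthDiff)
    .map(word => ({ word, score: compare(str, word) }))
    .filter(({ score }) => score >= minScore)
    .sort((a, b) => b.score - a.score)
    .map(({ word }) => word);
}

const wordlist = ["apple", "orange", "mango", "fruit", "banana", "kiwi", "grapefruit"];
console.log(getSuggestions("ornge", wordlist)); // ["orange", "mango"]
```

Here mango also reaches the threshold of 3 for ornge (it shares o, n and g), which the single-suggestion loop hides by keeping only the top scorer.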

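For text input inputVal and your object obj:
// Build a pattern like "o.*r.*n.*g.*e.*": the input's letters must appear in order,
// with anything in between. Regex metacharacters in the input are escaped.
let regexString = '';
inputVal.split('').forEach(char => regexString += char.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') + '.*');
const rgx = new RegExp(regexString, 'i');
const result = Object.keys(obj).filter(key => rgx.test(key));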
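Since the values in the object are scores, a natural next step is to rank the matches by them. A minimal sketch, assuming the object has the shape shown in the question (name mapped to a numeric score); fuzzyFind is a hypothetical name. Note that this in-order pattern tolerates missing letters (ornge still matches orange) but not substituted or swapped letters.

```javascript
// Escape regex metacharacters so input such as "c++" doesn't break the pattern.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: keys that contain the input's letters in order (with
// anything in between), sorted by their score, highest first.
function fuzzyFind(inputVal, obj) {
  const pattern = inputVal.split('').map(escapeRegExp).join('.*');
  const rgx = new RegExp(pattern, 'i');
  return Object.keys(obj)
    .filter(key => rgx.test(key))
    .sort((a, b) => obj[b] - obj[a]);
}

const obj = { a: 344, apple: 345354, orange: 320, mango: 39999 };
console.log(fuzzyFind("ornge", obj)); // ["orange"]
console.log(fuzzyFind("ae", obj));    // ["apple", "orange"]
```

With half a million keys this is still a linear scan per keystroke, so it may be worth debouncing the input or limiting the number of results returned.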

Related

Add spaces before Capital Letters then turn them to lowercase string

I'm trying to make a function, capSpace, that takes input like "iLikeSwimming" and outputs "i like swimming".
This is my attempt:
function isUpper(str) {
return !/[a-z]/.test(str) && /[A-Z]/.test(str);
}
function capSpace(txt) {
var arr = Array.from(txt);
for (let i = 1; i < txt.length; i++){
if (isUpper(txt[i]) == true) {
arr.splice((i),0,' ')
}
}
return arr.join('').toString().toLowerCase();
}
It works for strings with only one capital letter, but it gets weird with more than one.
Example inputs and outputs:
capSpace("iLikeSwimming"); // 'i lik eswimming'
capSpace("helloWorld"); // 'hello world'
I'd really appreciate it if someone could point out the issue with my code. I know there are other questions "similar" to this, but I'm trying to learn from my mistake rather than just copy a solution, and I couldn't make sense of any of the other questions. Thank you!

The reason it gets weird with strings that have more than one capital letter is that every time you find one, you insert a space into arr, which shifts every following character in arr one position to the right. Your loop index i, however, still refers to positions in the original txt, so each later space lands one position too early for every space already inserted.
It's a simple fix: keep a counter, splitCount, that tracks how many spaces you've added, and add it to i to correct the index.
function isUpper(str) {
return !/[a-z]/.test(str) && /[A-Z]/.test(str);
}
function capSpace(txt) {
var arr = Array.from(txt);
var splitCount = 0; // added a counter
for (let i = 1; i < txt.length; i++){
if (isUpper(txt[i]) === true) {
// sum it with i
arr.splice((i + splitCount),0,' ')
splitCount++; // increase every time you split
}
}
return arr.join('').toString().toLowerCase();
}
console.log(capSpace('iLikeSwimming'))
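Another way to avoid the shifting indices, offered here as an alternative rather than the answer's method, is to walk the array from the end: a splice at position i only moves the elements at i and beyond, which have already been visited, so the indices still to come stay valid. capSpaceBackwards is a hypothetical name:

```javascript
function capSpaceBackwards(txt) {
  const arr = Array.from(txt);
  // Iterate right to left; inserting at i never disturbs positions below i.
  // Stop before index 0 so a leading capital doesn't get a space in front of it.
  for (let i = arr.length - 1; i > 0; i--) {
    if (/[A-Z]/.test(arr[i])) {
      arr.splice(i, 0, ' ');
    }
  }
  return arr.join('').toLowerCase();
}

console.log(capSpaceBackwards('iLikeSwimming')); // "i like swimming"
console.log(capSpaceBackwards('helloWorld'));    // "hello world"
```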

1) You can achieve this simply with a regex and the string replace method:
const capSpace = (str) => str.replace(/([A-Z])/g, (match) => ` ${match.toLowerCase()}`);
console.log(capSpace("iLikeSwimming"));
console.log(capSpace("helloWorld"));
2) You can also do it with split, map and join:
const capSpace = (str) =>
str
.split("")
.map((s) => (/[A-Z]/.test(s) ? ` ${s.toLowerCase()}` : s))
.join("");
console.log(capSpace("iLikeSwimming"));
console.log(capSpace("helloWorld"));

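Here's a simple one I made. It matches the capital letters, then replaces each one with a space followed by its lowercase form.
const testString = "ILoveMoney";
function caps2Spaces(str) {
  // match() returns null when there are no capitals, so fall back to an empty array
  const matches = str.match(/[A-Z]/g) || [];
  for (const letter of matches) {
    str = str.replace(letter, ` ${letter.toLowerCase()}`);
  }
  return str.trim(); // drop the leading space added for an initial capital
}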
console.log(caps2Spaces(testString));

How to replace words in string and ignore numbers in string

What I'm trying to achieve is to search an array for a value that matches a specific variable, and then replace it with another variable (the translation of that word). The array value contains a number, and I need to translate only the words without touching the number. Here is an example:
var arr = ["18 pages"];
var item = "18 pages";
var translate = "pagina's";
if(arr.indexOf(item) !== -1) {
arr[0] = arr[0].replace(/[^0-9 ]/, translate);
alert(arr);
}
Output is: 18 pagina'sages
Expected output: 18 pagina's
So it should translate only the words and keep the numbers.
How can I do this properly?

Another way to solve this problem is to match a run of letters surrounded by word boundaries and replace it. The regular expression would be \b[a-zA-Z]+\b. Here is a runnable example:
let arr = ["18 pages"];
let item = "18 pages";
let translate = "pagina's";
if (arr.indexOf(item) !== -1) {
arr[0] = arr[0].replace(/\b[a-zA-Z]+\b/, translate);
console.log(arr);
}
If you run this snippet you will get the expected output: 18 pagina's.
Update:
Another alternative, [a-zA-Z][^0-9]+, first matches a letter and then one or more characters that are not digits, so it can replace longer phrases, not just a single word.
Here is a runnable example based on the fiddle at https://jsfiddle.net/9uta5bo4/2/:
let arr = "18 pagina’s per minuut";
let item = "pagina’s per minuut";
let translate = "pages par minute";
if (arr.indexOf(item) !== -1) {
arr = arr.replace(/[a-zA-Z][^0-9]+/, translate);
console.log(arr);
}
If you run this snippet, you will see the output:
18 pages par minute
Another alternative to [a-zA-Z][^0-9]+ is [^0-9 ][^0-9]+, which first matches any character that is neither a digit nor a space, and then one or more characters that are not digits.
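The practical difference between the two patterns shows up with a word that starts with a non-ASCII letter. A small illustration, using a hypothetical French input "18 élèves" (written with escapes so the accented letters are single characters):

```javascript
// Hypothetical example: a word that starts with a non-ASCII letter (é).
const text = "18 \u00e9l\u00e8ves"; // "18 élèves"
const translate = "students";

// [a-zA-Z] skips "é", so the match starts one character too late.
console.log(text.replace(/[a-zA-Z][^0-9]+/, translate)); // "18 éstudents"
// [^0-9 ] accepts any non-digit, non-space first character, including "é".
console.log(text.replace(/[^0-9 ][^0-9]+/, translate)); // "18 students"
```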

You're only matching a single character with [^0-9 ]. I suspect you want [^0-9 ]+.
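Applied to the code from the question, the one-character change looks like this:

```javascript
const arr = ["18 pages"];
// The + makes the class match the whole run "pages", not just its first letter "p".
arr[0] = arr[0].replace(/[^0-9 ]+/, "pagina's");
console.log(arr[0]); // "18 pagina's"
```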

let arr = ["18 pages"];
let item = "18 pages";
let translate = "pagina's";
if (arr.indexOf(item) !== -1) {
  const res = arr[0].replace(/[a-zA-Z]+/g, translate); // replace only runs of letters
  // exclude digits and whitespace (a | inside [...] would be a literal pipe, not "or")
  const res2 = arr[0].replace(/[^0-9\s]+/g, translate);
  const res3 = arr[0].replace(/[^\d\s]+/, translate); // same, but only the first match (no g flag)
  console.log(res);
  console.log(res2);
  console.log(res3);
}

Checking user input against an array, act as a guess Javascript

I am a new, aspiring dev, and I am trying to figure out how to build a game of hangman using only vanilla JS. I have put together a key event listener and got it to console-log the inputs. I have also got it to print the letters pushed into a "letters guessed" array.
document.addEventListener("keypress", letterPressed);
function letterPressed(event) {
var letter = String.fromCharCode(event.keyCode);
guessedLetters.push(letter);
document.getElementById("lettersGuessed").innerHTML = guessedLetters;
console.log(guessedLetters)
}
I also have an array of candidate words:
var wordList = ["Ravens", "Cardinals", "Falcons", "Bills",
"Panthers", "Bears", "Bengals", "Browns", "Cowboys",
"Broncos", "Lions", "Packers", "Texans", "Colts",
"Jaguars", "Chiefs", "Chargers", "Rams",
"Dolphins", "Vikings", "Patriots", "Saints",
"Giants", "Jets", "Raiders", "Eagles", "Steelers",
"Forty Niners", "Seahawks", "Buccaneers", "Titans",
"Redskins"];
and code that picks a random word from this array, uses a for loop to build one "_" per letter of the word, and prints the result to the HTML document in a div with the id "spaces".
var wordBlanks = [];
var guessedLetters = [];
var randomWord = wordList[Math.floor(Math.random() * wordList.length)];
for (var i = 0; i < randomWord.length; i++) {
wordBlanks[i] = "_";
console.log(wordBlanks,randomWord);
document.getElementById("spaces").innerHTML = wordBlanks.join(" ");
};
Where should I go from here? I want to check the input from the keystrokes (or the letters-guessed array; I'm not sure which would be best) against the chosen word, and have the "_" placeholders reveal the correct letters when they are guessed.
My question is mainly about being pointed in the right direction, so that I can properly teach myself. Any words of advice?
Thank you!

You'll have to deal with the upper/lowercase issue first (the words in wordList are mixed case, while a key press may produce a lowercase letter), but after that, something like this will work:
const randomWord = "BEARS";
const guessedLetters = ["S", "O", "E"];
const wordBlanks = randomWord.split('')
.map(letter => guessedLetters.indexOf(letter) >= 0 ? letter : "_")
.join(' ');
console.log(wordBlanks);
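One way to handle the case issue is to normalize both the word and the guesses to uppercase before comparing. A minimal sketch along the lines of the snippet above; renderBlanks is a hypothetical helper, and showing spaces (as in "Forty Niners") instead of hiding them is an assumption about how you want multi-word names displayed:

```javascript
// Hypothetical helper: show guessed letters and "_" for the rest.
// Both the word and the guesses are upper-cased, so a guess of "b" matches "B".
// Spaces are always shown rather than hidden behind "_".
function renderBlanks(word, guessedLetters) {
  const guessed = new Set(guessedLetters.map(letter => letter.toUpperCase()));
  return word
    .toUpperCase()
    .split('')
    .map(ch => (ch === ' ' || guessed.has(ch) ? ch : '_'))
    .join(' ');
}

console.log(renderBlanks("Bears", ["s", "o", "e"])); // "_ E _ _ S"
```

Because the display is recomputed from guessedLetters each time, keeping the guessed-letters array (as in the question) and re-rendering after every key press is enough.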

Instead of only pushing into the guessedLetters array, check whether the pressed letter can be found in randomWord.
If so, find the position of the letter and use it to replace the corresponding _ in the wordBlanks array.
Something like:
function letterPressed(event) {
  var letter = String.fromCharCode(event.keyCode);
  if (randomWord.indexOf(letter) >= 0) {
    wordBlanks[randomWord.indexOf(letter)] = letter;
  }
  console.log(wordBlanks);
}
Note that a letter may occur more than once in a word, and indexOf only finds the first occurrence.
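To reveal every occurrence, you can loop over the whole word instead of using a single indexOf. A minimal sketch; revealLetter is a hypothetical helper, wordBlanks is assumed to be an array of "_" with the same length as randomWord, and the comparison is made case-insensitive on purpose:

```javascript
// Hypothetical helper: reveal every position of the guessed letter, ignoring case.
// Returns true if the letter occurs in the word at least once.
function revealLetter(randomWord, wordBlanks, letter) {
  let found = false;
  for (let i = 0; i < randomWord.length; i++) {
    if (randomWord[i].toUpperCase() === letter.toUpperCase()) {
      wordBlanks[i] = randomWord[i]; // keep the word's original casing
      found = true;
    }
  }
  return found;
}

const word = "Steelers";
const blanks = Array.from(word, () => "_");
revealLetter(word, blanks, "e");
console.log(blanks.join(" ")); // "_ _ e e _ e _ _"
```

The boolean return value can drive the rest of the game, for example counting a wrong guess when it is false.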

RegExp "i" case insensitive VS toLowerCase() (javascript)

I'm hoping someone can explain to me why I need to use toLowerCase() if I'm already using a regular expression with the case-insensitive "i" flag.
The exercise is a pangram checker: the sentence may contain numbers and non-ASCII characters, but every letter of the alphabet must be present, in lowercase, uppercase, or mixed case. I wasn't able to solve this exercise correctly until I added toLowerCase(). It's one of the JavaScript exercises from exercism.io. Below is my code:
var Pangram = function (sentence) {
this.sentence = sentence;
};
Pangram.prototype.isPangram = function (){
var alphabet = "abcdefghijklmnopqrstuvwxyz", mustHave = /^[a-z]+$/gi,
x = this.sentence.toLowerCase(), isItValid = mustHave.test(x);
for (var i = 0; i < alphabet.length; i++){
if (x.indexOf(alphabet[i]) === -1 && isItValid === false){
return false;
}
}
return true;
};
module.exports = Pangram;

The regex may not be doing what you think it's doing. Here is your code commented with what's going on:
Pangram.prototype.isPangram = function (){
var alphabet = "abcdefghijklmnopqrstuvwxyz", mustHave = /^[a-z]+$/gi,
x = this.sentence.toLowerCase(), isItValid = mustHave.test(x);
// for every letter in the alphabet
for (var i = 0; i < alphabet.length; i++){
// check the following conditions:
// the letter does NOT exist in the lowercased sentence (indexOf is case sensitive)
// AND the sentence is NOT made up solely of letters a-z (^...$ anchor the whole string; case insensitive)
if (x.indexOf(alphabet[i]) === -1 && isItValid === false){
return false;
}
}
return true;
}
The logic that checks whether each letter is present has nothing to do with the regex; the two serve separate purposes. In fact, based on your description of the problem, the regex will cause your solution to fail in some cases. For example, for the string "abc" the regex tests true, so the if condition can never be met and the function returns true even though "abc" is not a pangram. Conversely, for "abcdefghijklmnopqrstuvwxyz-" the regex tests false even though that sentence is a valid pangram.
My advice would be to remove the regex, use toLowerCase on the sentence, and iterate through the alphabet checking whether the sentence has each letter, which seems to be the track you were on.
Below is a sample solution with some tests. Happy learning!
function isPangram (str) {
const alphabet = 'abcdefghijklmnopqrstuvwxyz'
const strChars = new Set(str.toLowerCase().split(''))
return alphabet.split('').every(char => strChars.has(char))
}
const tests = [
"abc",
"abcdefghijklmnopqrstuvwxyz",
"abcdefghijklmnopqRstuvwxyz",
"abcdefghijklmnopqRstuvwxyz-",
]
tests.forEach(test => {
console.log(test, isPangram(test))
})

It's because you're manually checking for lowercase letters:
if (x.indexOf(alphabet[i]) === -1)
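alphabet[i] will be one of the characters of your alphabet string, which you have defined in lowercase. Since indexOf is case sensitive, an uppercase letter in the sentence would not be found without toLowerCase(); the regex's "i" flag has no effect on indexOf.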
It looks like you don't need the regex at all here, or at least it's not doing what you think it's doing. Since your regex only allows for alpha characters, it will fail if your sentence has any spaces.
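To make the difference concrete: the i flag only changes how that particular regex matches. A sketch that relies on the flag instead of toLowerCase(), by testing each letter with its own case-insensitive regex (isPangramRegex is a hypothetical name, not from either answer):

```javascript
// Sketch: no toLowerCase(); each letter is tested with its own regex using the
// "i" flag, so a "T" in the sentence satisfies the check for "t".
function isPangramRegex(str) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz';
  return alphabet.split('').every(char => new RegExp(char, 'i').test(str));
}

console.log(isPangramRegex("The quick brown fox jumps over the lazy dog")); // true
console.log(isPangramRegex("abc")); // false
```

Building 26 regexes per call is more work than one toLowerCase() plus a Set lookup, so the Set-based solution above is likely the better choice in practice.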

Checking if combination of any amount of strings exists

I'm solving a puzzle and I have an idea of how to solve this problem, but I would like some guidance and hints.
Suppose I'm given n words as input, followed by m word combinations without spaces. The program should behave as follows:
4
this
is
my
dog
5
thisis // outputs 1
thisisacat // 0, since a and cat weren't among the four words
thisisaduck // 0, no a or duck
thisismy // 1, this, is and my are among the four words
thisismydog // 1
My thoughts
First, I was thinking of storing the four words in an array. Then, for each of the 5 combinations, I check whether any of those words appears at its start.
Example: check whether this is at the start of thisis. It is! Great, now remove this from thisis to get just is, remove this from the list of available words, and keep iterating over what's left (now is, my, dog are available). If we can keep repeating this until we get an empty string, we return 1; otherwise we return 0!
Are my thoughts on the right track? I think this would be a good approach. (By the way, I would like to implement this in JavaScript.)

Sorting words from long to short may in some cases help to find a solution quicker, but it is not a guarantee. Sentences that contain the longest word might only have a solution if that longest word is not used.
Take for instance this test case:
Words: toolbox, stool, boxer
Sentence: stoolboxer
If "toolbox" is taken as a word in that sentence, then the remaining characters cannot be matched with other valid words. Yet, there is a solution, but only if the word "toolbox" is not used.
Solution with a Regular Expression
When regular expressions are allowed as part of the solution, then it is quite simple. For the above example, the regular expression would be:
^(toolbox|stool|boxer)*$
If a sentence matches that expression, it is a solution. If not, then not. This is quite straightforward, and doesn't really require an algorithm. All is done by the regular expression interpreter. Here is a snippet:
var words = ['this','is','a','string'];
var sentences = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
var regex = new RegExp('^(' + words.join('|') + ')*$');
sentences.forEach(sentence => {
// search returns a position. It should be 0:
console.log(sentence + ': ' + (sentence.search(regex) ? 'No' : 'Yes'));
});
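The snippet above builds the pattern directly from the words, which breaks if a word contains regex metacharacters (for example, new RegExp('c++') throws a SyntaxError). A sketch with escaping and test() instead of search(), checked against the toolbox example; escapeRegExp and matchesAll are hypothetical names:

```javascript
// Escape regex metacharacters so words like "c++" are matched literally.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: true when the whole sentence is a concatenation of words.
function matchesAll(words, sentence) {
  const regex = new RegExp('^(?:' + words.map(escapeRegExp).join('|') + ')*$');
  return regex.test(sentence);
}

console.log(matchesAll(['toolbox', 'stool', 'boxer'], 'stoolboxer')); // true
console.log(matchesAll(['c++', 'java'], 'c++java')); // true
console.log(matchesAll(['this', 'is', 'a', 'string'], 'thisisaduck')); // false
```

The regex engine backtracks out of the toolbox alternative on its own, which is why stoolboxer still matches.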
But using regular expressions in an algorithm-challenge feels like cheating: you don't really write the algorithm, but rely on the regular expression implementation to do the job for you.
Without Regular Expressions
You could use this algorithm: first check whether a word matches at the start of the input sentence, and if so, remove that first occurrence from it. Then repeat this for the remaining part of the sentence. If this can be repeated until no characters are left over, you have a solution.
If characters are left over which cannot be matched with any word... well, then you cannot really conclude there is no solution for that sentence. It might be that some earlier made word choice was the wrong one, and there was an alternative. So to cope with that, your algorithm could backtrack and try other words.
This principle can be implemented through recursion. For memory efficiency, you could leave the original sentence intact and work with an index into that sentence instead.
The algorithm is implemented in the arrow function testString:
var words = ['this','is','a','string'];
var sentences = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
var testString = (words, str, i = 0) =>
i >= str.length || words.some( word =>
str.substr(i, word.length) == word && testString(words, str, i + word.length)
);
sentences.forEach(sentence => {
console.log(sentence + ': ' + (testString(words, sentence) ? 'Yes' : 'No'));
});
Or, the same in non-arrow-function syntax:
var words = ['this','is','a','string'];
var sentences = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
var testString = function (words, str, i = 0) {
return i >= str.length || words.some(function (word) {
return str.substr(i, word.length) == word
&& testString(words, str, i + word.length);
});
}
sentences.forEach(function (sentence) {
console.log(sentence + ': ' + (testString(words, sentence) ? 'Yes' : 'No'));
});
... and without some(), forEach() or ternary operator:
var words = ['this','is','a','string'];
var sentences = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
function testString (words, str, i = 0) {
if (i >= str.length) return true;
for (var k = 0; k < words.length; k++) {
var word = words[k];
if (str.substr(i, word.length) == word
&& testString(words, str, i + word.length)) {
return true;
}
}
return false; // no word fits at position i
}
for (var n = 0; n < sentences.length; n++) {
var sentence = sentences[n];
if (testString(words, sentence)) {
console.log(sentence + ': Yes');
} else {
console.log(sentence + ': No');
}
}
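The recursive backtracking above can end up re-checking the same position of the sentence many times, which may get slow for long sentences with many overlapping words. A common alternative, swapped in here rather than taken from the answer, is the dynamic-programming "word break" approach: record which positions of the sentence can be reached by concatenating words. Like testString, this sketch lets a word be used more than once; canSegment is a hypothetical name.

```javascript
// reachable[i] is true when str.slice(0, i) can be built by concatenating words.
function canSegment(words, str) {
  const reachable = new Array(str.length + 1).fill(false);
  reachable[0] = true; // the empty prefix is always buildable
  for (let i = 0; i < str.length; i++) {
    if (!reachable[i]) continue;
    for (const word of words) {
      // startsWith(word, i) checks for word at position i without copying substrings
      if (word.length > 0 && str.startsWith(word, i)) {
        reachable[i + word.length] = true;
      }
    }
  }
  return reachable[str.length];
}

const words = ['this', 'is', 'a', 'string'];
const sentences = ['thisis', 'thisisastring', 'thisisaduck', 'thisisastringg', 'stringg'];
sentences.forEach(sentence => {
  console.log(sentence + ': ' + (canSegment(words, sentence) ? 'Yes' : 'No'));
});
console.log(canSegment(['toolbox', 'stool', 'boxer'], 'stoolboxer')); // true
```

Each position is processed at most once, so the work grows roughly with the sentence length times the number of words, rather than with the number of possible word sequences.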

Take the 4 words, put them into a regex.
Use that regex to split each string.
Take the length of the resulting array (subtract one for the initial length of one).
var size = 'thisis'.split(/this|is|my|dog/).length - 1
Or if your list of words is an array
var search = new RegExp(words.join('|'))
var size = 'thisis'.split(search).length - 1
Either way you are splitting up the string by the list of words you have defined.
You can sort the words by length, longest first, so that longer words are tried first in the alternation:
words.sort(function (a, b) { return b.length - a.length })

Here is my solution, for anyone interested:
var input = ['this','is','a','string']; // this works for any input; this is just a test case
// Sort longest first so longer words are tried first in the alternation
var orderedInput = input.sort(function(a,b){
  return b.length - a.length;
});
var inputRegex = new RegExp(orderedInput.join('|'));
// The combinations to test can be an array of any size; hard-coded here because prompt() in JS is spammy
var testStrings = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
// A combination is valid when splitting on the words leaves no non-empty pieces
var foundCombos = (regex,str) => !str.split(regex).filter(part => part.length).length;
var finalResult = testStrings.reduce((all,str)=>{
  all[str] = foundCombos(inputRegex,str) ? 1 : 0;
  return all;
},{});
console.log(finalResult);
