How to find total possible values from length and characters?

I'm no math whiz, but with the great help of Stack Overflow (and a lot of trial and error) I have put together a function that generates a random serial number from a formula, a set of letters/numbers, and an array of existing values (so as not to generate duplicates).
So, my current formula is as follows:
$.extend({
generateSerial: function(formula, chrs, checks) {
var formula = formula && formula != "" ? formula : 'XXX-XXX-XXX-XXX-XXX', // Default Formula to use, should change to what's most commonly used!
chrs = chrs && chrs != "" ? chrs : "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", // Default characters to randomize, if not defined!
len = (formula.match(/X/g) || []).length,
indices = [],
rand;
// Get all "-" char indexes
for(var i=0; i < formula.length; i++) {
if (formula[i] === "-") indices.push(i);
}
do {
rand = Array(len).join().split(',').map(function() {
return chrs.charAt(Math.floor(Math.random() * chrs.length));
}).join('');
// Rebuild string!
if (indices && indices.length > 0)
{
for(var x=0; x < indices.length; x++)
rand = rand.insert(indices[x], '-');
}
} while (checks && $.inArray(rand, checks) !== -1);
return rand;
}
});
Ok, so, what I need to be able to do is to find total possible values and make sure that it is possible to generate a unique serial number before actually doing so.
For example:
var num = $.generateSerial('XX', 'AB', new Array('AB', 'BA', 'AA', 'BB'));
This causes an infinite loop, since every possible value is already in the exclusion array, and the browser hangs. What I need is to get the number of possible unique values and continue only if it is greater than 0; otherwise stop, perhaps with an alert for the error.
Also, keep in mind, could also do this in a loop so as to not repeat serials already generated:
var currSerials = [];
for (var x = 0; x < 5; x++)
{
var output = $.generateSerial('XXX-XXX-XXX', '0123456789', currSerials);
currSerials.push(output);
}
But the important thing here, is how to get total possible unique values from within the generateSerial function itself? We have the length, characters, and exclusions array also in here (checks). This would seem more like a math question, and I'm not expert in Math. Could use some help here.
Thanks guys :)
Here is a jsFiddle of it working nicely because there are more possible choices than 16: http://jsfiddle.net/qpw66bwb/1/
And here is a jsFiddle of the problem I am facing. Just click the "Generate Serials" button to see it (it loops continuously and never finishes): it tries to create 16 serials, but 16 unique serials are not possible with 2 positions and only the characters A and B: http://jsfiddle.net/qpw66bwb/2/
I need to catch the loop here and exit out of it, if it is not able to generate a random number somehow. But how?

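The number of possible serials is Math.pow(chrs.length, len), that is, chrs.length raised to the power len, assuming all the characters in chrs are different. The serial has len positions to fill in randomly, and each position can independently hold any of the chrs.length characters, so the choices multiply. For 'XX' with 'AB' that gives 2 * 2 = 4 serials. Subtract the number of matching serials already in checks to get how many are still available.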
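As a rough sketch of that check: every one of the len positions can hold any distinct character, so the total is the number of distinct characters raised to the power len, and the exclusions that fit the formula are subtracted from it. The helper name countAvailableSerials is made up for this example, and it assumes that every non-X character in the formula is a literal such as "-".

```javascript
// Hypothetical helper (not part of the original plugin): counts how many
// serials can still be generated for a formula, alphabet and exclusion list.
function countAvailableSerials(formula, chrs, checks) {
  // Duplicate characters in chrs do not create new serials, so count distinct ones.
  var alphabet = new Set(chrs.split(''));
  var len = (formula.match(/X/g) || []).length;
  var total = Math.pow(alphabet.size, len);

  // Only exclusions that fit the formula and alphabet use up a slot;
  // the Set also collapses duplicate entries in checks.
  var used = new Set((checks || []).filter(function (serial) {
    if (serial.length !== formula.length) return false;
    for (var i = 0; i < formula.length; i++) {
      var ok = formula[i] === 'X'
        ? alphabet.has(serial[i])     // random position: any allowed character
        : serial[i] === formula[i];   // literal position: must match exactly
      if (!ok) return false;
    }
    return true;
  }));

  return total - used.size;
}

console.log(countAvailableSerials('XX', 'AB', ['AB', 'BA', 'AA', 'BB'])); // 0
console.log(countAvailableSerials('XXX-XXX-XXX', '0123456789', []));      // 1000000000
```

Inside generateSerial, a result of 0 could be used to return early (or show an alert) before entering the do/while loop. For long formulas the total exceeds what a JavaScript number can represent exactly, but the "is anything left" comparison still works.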

Related

Javascript Help - selfDividingNumbers Algorithm producing all 0's

Greetings Stack Overflow!
First off, this is my first question!
I am trying to solve the selfDividingNumbers algorithm and I ran into this interesting problem. This function is supposed to take a range of numbers to check if they are self dividing.
Self Dividing example:
128 is a self-dividing number because
128 % 1 == 0, 128 % 2 == 0, and 128 % 8 == 0.
My attempt with Javascript.
/*
selfDividingNumbers( 1, 22 );
*/
var selfDividingNumbers = function(left, right) {
var output = [];
while(left <= right){
// convert number into an array of strings, size 1
var leftString = left.toString().split();
// initialize digit iterator
var currentDigit = leftString[0];
for(var i = 0; i < leftString.length; i++){
currentDigit = parseInt(leftString[i])
console.log( left % currentDigit );
}
// increment lower bound
left++;
}
return output
};
When comparing the current lower bound to the current digit of the lower bound, left % currentDigit it always produces zero! I figure this is probably a type error but I am unsure of why and would love for someone to point out why!
Would also like to see any other ideas to avoid this problem!
I figured this was a good chance to get a better handle on Javascript considering I am clueless as to why my program is producing this output. Any help would be appreciated! :)
Thanks Stack Overflow!
Calling split() isn't buying you anything. Remove it and you'll get the results you expect. You still have to write the code to populate output though.
The answer by @Joseph may fix your current code, but I think there is a potentially easier way to go about doing this. Consider the following script:
var start = 128;
var num = start;
var sd = true;
while (num > 0) {
var last = num % 10;
if (start % last != 0) {
sd = false;
break;
}
num = Math.floor(num / 10);
}
if (sd) {
print("Is self dividing");
}
else {
print("Is NOT self dividing");
}
To test each digit in the number for its ability to cleanly divide the original number, you can simply use a loop. In each iteration, check num % 10 to get the current (last) digit, and then divide the number by ten. If we never see a digit which cannot divide the original number evenly, then the number is self dividing; otherwise it is not.
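The same digit loop can be wrapped in a function and applied to a whole range, which also fills the output array from the question. This is only a sketch: the helper name isSelfDividing is made up here, and it assumes positive integers.

```javascript
// Returns true only if every digit of n divides n evenly (n assumed >= 1).
function isSelfDividing(n) {
  var rest = n;
  while (rest > 0) {
    var digit = rest % 10; // last remaining digit
    // A zero digit can never divide n, so it disqualifies the number.
    if (digit === 0 || n % digit !== 0) return false;
    rest = Math.floor(rest / 10); // drop the last digit
  }
  return true;
}

// Collects all self-dividing numbers in [left, right].
function selfDividingNumbers(left, right) {
  var output = [];
  for (var n = left; n <= right; n++) {
    if (isSelfDividing(n)) output.push(n);
  }
  return output;
}

console.log(selfDividingNumbers(1, 22));
// [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 22]
```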
The string split method takes a string and returns an array of its parts. It expects the separator as its parameter; if no separator is provided, it returns an array with a single element, the whole string. That is why left % currentDigit is always 0 in your code: currentDigit is the whole number, and any number modulo itself is 0. What you probably intended was to split the string into individual characters, which means the separator should be the empty string:
var leftString = left.toString().split('');
Since you are already familiar with console.log, note that you could also use it to debug your program. If you are confused about the output of left % currentDigit, one thing you could try is logging the variables just before the call,
console.log(typeof left, left, typeof currentDigit, currentDigit)
which might give you ideas about where to look next.
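To see the difference between the two calls side by side, here is a small illustration with the value 26 (chosen only because its digits give different remainders):

```javascript
var left = 26;
var whole = left.toString().split();    // no separator: ["26"]
var digits = left.toString().split(''); // empty separator: ["2", "6"]

console.log(whole.length, digits.length); // 1 2

// With split(), the only "digit" is the number itself, so the remainder is 0.
console.log(left % parseInt(whole[0], 10)); // 0

// With split(''), each digit is tested separately.
var remainders = digits.map(function (d) { return left % parseInt(d, 10); });
console.log(remainders); // [0, 2]
```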

Punch/Combine multiple strings into a single (shortest possible) string that includes all the chars of each strings in forward direction

My purpose is to punch multiple strings into a single (shortest) string that will contain all the character of each string in a forward direction. The question is not specific to any language, but more into the algorithm part. (probably will implement it in a node server, so tagging nodejs/javascript).
So, to explain the problem:
Let's consider I have few strings
["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"]
The Resultant string should be something like:
"sjmachppoalidveonrk"
jack: sjmachppoalidveonrk
apple: sjmachppoalidveonrk
solid: sjmachppoalidveonrk
====================================>>>> all in the forward direction
These were all evaluated by hand, so the example output may not be 100% perfect.
The point is that all the letters of each string must appear in the output in
forward direction (this is the actual problem). The server will probably send the final string, and numbers like 27594 will be generated and passed along to extract the token at the receiving end. If order did not matter, this would be much easier (in that case the unique characters alone would be enough). But in this case there are some constraints:
Letters can appear multiple times, but a letter should be reused wherever possible. For example, for solid and hold, o > l > d can be shared in forward direction. For apple (a > p) and spark (p > a), however, a letter has to be repeated: a comes before p in apple and after p in spark, so either a or p must appear twice. Even p > a > p does not cover both cases, because apple needs two p after a.
A single p cannot be used at the same index twice during extraction; when an input string contains a letter more than once, the output needs that many copies of it.
I am not sure whether multiple outputs are possible for a set of strings. The only concern is that the output has minimal length; which combination is used does not matter as long as it covers all the tokens in forward direction. All outputs (or one output) of minimal length need to be found.
EDIT: After reading the comments, I learned that this is an existing problem known as the shortest common supersequence problem. The resulting string is the shortest possible string from which any input string can be regenerated by simply removing some (0 to N) characters, which is the same as saying that every input can be found in forward direction in the result.
I have tried starting with an arbitrary string, then analysing the next string, splitting it into letters and placing them accordingly. After a while, though, it turns out that the current string's letters could have been placed better if the letters of the last (or an earlier) string had been placed with the current string in mind. But that earlier string was itself placed based on the strings processed before it, and placing letters in favour of a string that has not been processed yet seems difficult, because doing so requires processing it first. Would maintaining a tree of all processed/unprocessed strings help in building the final string? Is there any better way than this, since it seems like brute force?
Note: I know there are a lot of other transformation possible, please try not to suggest anything else to use, we are doing a bit research on it.
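The "forward direction" requirement is the same as saying that every input is a subsequence of the result, which makes any candidate output easy to verify. A minimal sketch (the helper names isSubsequence and coversAll are made up for this example):

```javascript
// True if all characters of word appear in candidate in the same order
// (not necessarily next to each other).
function isSubsequence(word, candidate) {
  var pos = 0; // index of the next character of word still to be matched
  for (var i = 0; i < candidate.length && pos < word.length; i++) {
    if (candidate[i] === word[pos]) pos++;
  }
  return pos === word.length;
}

// True if candidate contains every word in forward direction.
function coversAll(words, candidate) {
  return words.every(function (w) { return isSubsequence(w, candidate); });
}

var words = ["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"];
console.log(coversAll(words, "sjmachppoalidveonrk")); // true
console.log(isSubsequence("apple", "aple"));          // false, only one p
```

A check like this does not say whether a candidate is the shortest possible, only whether it is valid.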
I came up with a somewhat brute-force method. It finds the best way to combine 2 words and then repeats that for each element in the array.
The strategy works by trying to find the best possible way to combine 2 words, where the best result is the one with the fewest letters. Each word is fed into an ever-growing "merged" word. Each time a new word is added, the existing word is searched for a character that also exists in the word to be merged. Once one is found, both words are split into 2 parts at that character and joined (using the rules at hand: no need to add a letter if it already exists, etc.). The strategy generally yields good results.
The join_word method takes the 2 words you wish to join; the first parameter is the word into which the other is placed. It then searches for the best way to split both words into 2 separate parts to merge together, by looking for shared characters. This is where the splits_on_letter method comes in.
The splits_on_letter method takes a word and a letter which you wish to split on, then returns a 2d array of all the possible left and right sides of splitting on that character. For example splits_on_letter('boom', 'o') would return [["b","oom"],["bo","om"],["boo","m"]], this is all the combinations of how we could use the letter o as a split point.
The sort() at the beginning is an attempt to place like elements together. The order in which you merge the elements generally affects the length of the result. One approach I tried was to sort them by how many letters they have in common with their peers, but the results varied. In all my tests I had maybe 5 or 6 different word sets to test with; it's possible that with larger, more varied word arrays you would find different results.
Output is
spmjhooarckpplivden
var words = ["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"];
var result = minify_words(words);
document.write(result);
function minify_words(words) {
// Theres a good sorting method somewhere which can place this in an optimal order for combining them,
// hoever after quite a few attempts i couldnt get better than just a regular sort... so just use that
words = words.sort();
/*
Joins 2 words together ensuring each word has all its letters in the result left to right
*/
function join_word(into, word) {
var best = null;
// straight brute force each word down. Try to run a split on each letter and
for(var i=0;i<word.length;i++) {
var letter = word[i];
// split our 2 words into 2 segments on that pivot letter
var intoPartsArr = splits_on_letter(into, letter);
var wordPartsArr = splits_on_letter(word, letter);
for(var p1=0;p1<intoPartsArr.length;p1++) {
for(var p2=0;p2<wordPartsArr.length;p2++) {
var intoParts = intoPartsArr[p1], wordParts = wordPartsArr[p2];
// merge left and right and push them together
var result = add_letters(intoParts[0], wordParts[0]) + add_letters(intoParts[1], wordParts[1]);
if(!best || result.length <= best.length) {
best = result;
}
}
}
}
// its possible that there is no best, just tack the words together at that point
return best || (into + word);
}
/*
Splits a word at the index of the provided letter
*/
function splits_on_letter(word, letter) {
var ix, result = [], offset = 0;
while((ix = word.indexOf(letter, offset)) !== -1) {
result.push([word.substring(0, ix), word.substring(ix, word.length)]);
offset = ix+1;
}
result.push([word.substring(0, offset), word.substring(offset, word.length)]);
return result;
}
/*
Adds letters to the word given our set of rules. Adds them starting left to right, will only add if the letter isnt found
*/
function add_letters(word, addl) {
var rIx = 0;
for (var i = 0; i < addl.length; i++) {
var foundIndex = word.indexOf(addl[i], rIx);
if (foundIndex == -1) {
word = word.substring(0, rIx) + addl[i] + word.substring(rIx, word.length);
rIx += addl[i].length;
} else {
rIx = foundIndex + addl[i].length;
}
}
return word;
}
// For each of our words, merge them together
var joinedWords = words[0];
for (var i = 1; i < words.length; i++) {
joinedWords = join_word(joinedWords, words[i]);
}
return joinedWords;
}
A first try, not really optimized (183% shorter):
function getShort(arr){
var perfect="";
//iterate the array
arr.forEach(function(string){
//iterate over the characters in the array
string.split("").reduce(function(pos,char){
var n=perfect.indexOf(char,pos+1);//check if theres already a possible char
if(n<0){
//if its not existing, simply add it behind the current
perfect=perfect.substr(0,pos+1)+char+perfect.substr(pos+1);
return pos+1;
}
return n;//continue with that char
},-1);
})
return perfect;
}
This can be improved simply by running the code above with some variants (rotations) of the array (200% improvement):
var s=["jack",...];
var perfect=null;
for(var i=0;i<s.length;i++){
//shift
s.push(s.shift());
var result=getShort(s);
if(!perfect || result.length<perfect.length) perfect=result;
}
That's quite close to the minimum number of characters I've estimated (244% minimization might be possible in the best case).
I've also written a function to get the minimal number of chars and one to check whether a certain word fails; you can find them here
I have used dynamic programming to generate the shortest possible string that contains two inputs in forward direction, as stated in the OP. The result of each step is then passed as a parameter along with the next string in the list. Below is the working code in Java. I hope this helps in reaching an optimal solution, in case mine turns out to be non-optimal. Please feel free to report any counterexamples for the code below:
public String shortestPossibleString(String a, String b){
int[][] dp = new int[a.length()+1][b.length()+1];
//form the dynamic table holding the
//length of the shortest supersequence up to each point
for(int i=0;i<=a.length();i++){
for(int j=0;j<=b.length();j++){
if(i == 0)
dp[i][j] = j;
else if(j == 0)
dp[i][j] = i;
else if(a.charAt(i-1) == b.charAt(j-1))
dp[i][j] = 1+dp[i-1][j-1];
else
dp[i][j] = 1+Math.min(dp[i-1][j],dp[i][j-1]);
}
}
//Backtrack from here to build the shortest supersequence
char[] sQ = new char[dp[a.length()][b.length()]];
int s = dp[a.length()][b.length()]-1;
int i=a.length(), j=b.length();
while(i!=0 && j!=0){
// If current character in a and b are same, then
// current character is part of shortest supersequence
if(a.charAt(i-1) == b.charAt(j-1)){
sQ[s] = a.charAt(i-1);
i--;
j--;
s--;
}
else {
// If current character in a and b are different
if(dp[i-1][j] > dp[i][j-1]){
sQ[s] = b.charAt(j-1);
j--;
s--;
}
else{
sQ[s] = a.charAt(i-1);
i--;
s--;
}
}
}
// If b reaches its end, put remaining characters
// of a in the result string
while(i!=0){
sQ[s] = a.charAt(i-1);
i--;
s--;
}
// If a reaches its end, put remaining characters
// of b in the result string
while(j!=0){
sQ[s] = b.charAt(j-1);
j--;
s--;
}
return String.valueOf(sQ);
}
public void getCombinedString(String... values){
String sSQ = shortestPossibleString(values[0],values[1]);
for(int i=2;i<values.length;i++){
sSQ = shortestPossibleString(values[i],sSQ);
}
System.out.println(sSQ);
}
Driver program:
e.getCombinedString("jack", "apple", "maven", "hold",
"solid", "mark", "moon", "poor", "spark", "live");
Output:
jmapphsolivecparkonidr
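Each pairwise merge fills a table of size |a| x |b|, so it costs O(|a| * |b|) time. The accumulated string is never longer than the sum of the lengths of the inputs merged so far, so the total cost is at most quadratic in the total input length; the accumulated string is longest when all strings have distinct characters and not a single character matches between any pair of strings. Note that merging one pair at a time is optimal for each pair, but it is not guaranteed to give the shortest result for the whole list.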
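Since the question mentions Node, here is a rough JavaScript port of the same pairwise idea. It is a sketch of the approach above, not a guaranteed-optimal solver: each merge of two strings is shortest for that pair, but folding the list pair by pair may still miss the overall minimum. The function names are chosen for this example only.

```javascript
// Shortest common supersequence of two strings via a DP table.
function shortestSupersequence(a, b) {
  // dp[i][j] = length of the shortest supersequence of a[0..i) and b[0..j)
  var dp = [];
  for (var i = 0; i <= a.length; i++) {
    dp[i] = [];
    for (var j = 0; j <= b.length; j++) {
      if (i === 0) dp[i][j] = j;
      else if (j === 0) dp[i][j] = i;
      else if (a[i - 1] === b[j - 1]) dp[i][j] = 1 + dp[i - 1][j - 1];
      else dp[i][j] = 1 + Math.min(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  // Walk back from the bottom-right corner, collecting characters in reverse.
  var out = [];
  i = a.length;
  j = b.length;
  while (i > 0 && j > 0) {
    if (a[i - 1] === b[j - 1]) { out.push(a[i - 1]); i--; j--; } // shared character
    else if (dp[i - 1][j] > dp[i][j - 1]) { out.push(b[j - 1]); j--; }
    else { out.push(a[i - 1]); i--; }
  }
  while (i > 0) out.push(a[--i]); // leftover prefix of a
  while (j > 0) out.push(b[--j]); // leftover prefix of b
  return out.reverse().join('');
}

// Folds the list into one string, merging one word at a time.
function combineAll(words) {
  return words.reduce(function (acc, word) {
    return shortestSupersequence(word, acc);
  }, '');
}

var combined = combineAll(["jack", "apple", "maven", "hold", "solid",
                           "mark", "moon", "poor", "spark", "live"]);
console.log(combined, combined.length);
```

The exact string printed depends on how ties are broken in the walk back, so it does not have to match the Java output character for character.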
Here is an optimal solution based on dynamic programming in JavaScript, but it can only get through solid on my computer before it runs out of memory. It differs from @CodeHunter's solution in that it keeps the entire set of optimal solutions after each added string, not just one of them. You can see that the number of optimal solutions grows exponentially; even after solid there are already 518,640 optimal solutions.
const STRINGS = ["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"]
function map(set, f) {
const result = new Set
for (const o of set) result.add(f(o))
return result
}
function addAll(set, other) {
for (const o of other) set.add(o)
return set
}
function shortest(set) { //set is assumed non-empty
let minLength
let minMatching
for (const s of set) {
if (minLength === undefined || s.length < minLength) { // also correct when the shortest length is 0
minLength = s.length
minMatching = new Set([s])
}
else if (s.length === minLength) minMatching.add(s)
}
return minMatching
}
class ZipCache {
constructor() {
this.cache = new Map
}
get(str1, str2) {
const cached1 = this.cache.get(str1)
if (!cached1) return undefined
return cached1.get(str2)
}
set(str1, str2, zipped) {
let cached1 = this.cache.get(str1)
if (!cached1) {
cached1 = new Map
this.cache.set(str1, cached1)
}
cached1.set(str2, zipped)
}
}
const zipCache = new ZipCache
function zip(str1, str2) {
const cached = zipCache.get(str1, str2)
if (cached) return cached
if (!str1) { //str1 is empty, so only choice is str2
const result = new Set([str2])
zipCache.set(str1, str2, result)
return result
}
if (!str2) { //str2 is empty, so only choice is str1
const result = new Set([str1])
zipCache.set(str1, str2, result)
return result
}
//Both strings start with same letter
//so optimal solution must start with this letter
if (str1[0] === str2[0]) {
const zipped = zip(str1.substring(1), str2.substring(1))
const result = map(zipped, s => str1[0] + s)
zipCache.set(str1, str2, result)
return result
}
//Either do str1[0] + zip(str1[1:], str2)
//or str2[0] + zip(str1, str2[1:])
const zip1 = zip(str1.substring(1), str2)
const zip2 = zip(str1, str2.substring(1))
const test1 = map(zip1, s => str1[0] + s)
const test2 = map(zip2, s => str2[0] + s)
const result = shortest(addAll(test1, test2))
zipCache.set(str1, str2, result)
return result
}
let cumulative = new Set([''])
for (const string of STRINGS) {
console.log(string)
const newCumulative = new Set
for (const test of cumulative) {
addAll(newCumulative, zip(test, string))
}
cumulative = shortest(newCumulative)
console.log(cumulative.size)
}
console.log(cumulative) //never reached

HackerRank Anagrams

I am trying to solve the problem described here with JavaScript...
https://www.hackerrank.com/challenges/ctci-making-anagrams
I have to output the number of letters that would need to be removed from two strings in order for there to only be matching letters (compare the two strings for matching letters and total the letters that don't match)
for example...
string a = cde
string b = abc
The only letter that occurs in both strings is c, so I would be removing 4 letters (d, e, a, b).
My code works, but it keeps timing out. If I download the individual test cases, the ones that fail are those where a and b are large strings. If I test these individually, I get the right output, but I still also get the message "Terminated due to timeout".
I'm thinking it might be obvious to someone why my code times out. I'm not sure it's the most elegant way of solving the problem, but I'd love to get it working. Any help would be much appreciated...
function main() {
var a = readLine();
var b = readLine();
var arraya = a.split('');
var arrayb = b.split('');
var arraylengths = arraya.length + arrayb.length;
//console.log(arraylengths);
if (arraya.length <= arrayb.length) {
var shortestarray = arraya;
var longestarray = arrayb;
} else {
var shortestarray = arrayb;
var longestarray = arraya;
}
var subtract = 0;
for (x = 0; x < shortestarray.length; x++) {
var theletter = shortestarray[x];
var thenumber = x;
if (longestarray.indexOf(theletter, 0) > -1) {
var index = longestarray.indexOf(theletter, 0);
longestarray.splice(index, 1);
subtract = subtract + 2;
}
}
var total = arraylengths - subtract;
console.log(total);
}
Your algorithm is good. It's straightforward and easy to understand.
There are certain things you can do to improve the performance of your code.
You don't have to perform the indexOf operation twice; you can reduce it to one call.
The splice operation is the costliest one, because the JS engine has to delete the element from the array and reassign the indexes of all the elements after it.
Note that this re-indexing is an extra step that is not required for your purpose, so you can safely replace longestarray.splice(index, 1); with delete longestarray[index]
Here is a snippet which will increase your performance of the code without changing your logic
for (var x = 0; x < shortestarray.length; x++) {
var theletter = shortestarray[x];
var thenumber = longestarray.indexOf(theletter, 0); // <-- check only once
if (thenumber > -1) {
var index = thenumber;
delete longestarray[index]; // <-- cheaper than splice
subtract = subtract + 2;
}
}
Note: I am not suggesting that you use delete in all cases. It's useful here because you are not going to do much with the array after an element is deleted.
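To make the difference concrete, here is a small illustration of what each operation leaves behind. delete only removes the slot and keeps the length, and indexOf skips the resulting hole, so a later search moves on to the next matching letter:

```javascript
var viaSplice = ['a', 'b', 'a'];
viaSplice.splice(0, 1);               // removes the element and shifts the rest left
console.log(viaSplice.length);        // 2

var viaDelete = ['a', 'b', 'a'];
delete viaDelete[0];                  // leaves a hole at index 0, nothing is shifted
console.log(viaDelete.length);        // 3
console.log(0 in viaDelete);          // false
console.log(viaDelete.indexOf('a'));  // 2
```

Keep in mind that indexOf is still a linear scan on every iteration, so for very large inputs this may not be enough on its own.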
All the best. Happy Coding
I would suggest hashing. Use each character of a string as the key and its number of occurrences as the value, and build such a table for both strings. Then, for every character, take the difference between its count in string 1 and its count in string 2: that difference is how many copies of the character have to be deleted to make the counts equal. The sum of these differences over all characters is the number of delete operations.
ALGORITHM:
step 1: Let arr1[255]= an integer array for storing the count of string1[i]
and initialized to zero
ex: string1[i]='a', then arr1[97]=1, because ASCII value of a is 97
and its count is 1. so we made hash table for arr1 where key is
ASCII value of character and value is its no of occurrences.
step 2: Now declare an another array of same type and same size for string 2
step 3: For i=0 to length(string1):
do arr1[string1[i]]++;
step 4: For i=0 to length(string2):
do arr2[string2[i]]++;
step 5: Declare an boolean char_status[255] array to check if the
character is visited or not during traversing and initialize it to
false
step 6: set count=0;
step 7: For i=0 to length(string1):
if(char_status[string1[i]]==false):
count=count+abs(arr1[string1[i]]-arr2[string1[i]])
char_status[string1[i]]=true
step 8: For i=0 to length(string2):
if(char_status[string2[i]]==false):
count=count+abs(arr1[string2[i]]-arr2[string2[i]])
char_status[string2[i]]=true
step 9: print count
I have applied this algo just now and passed all test cases. You may improve this algo more if you have time.
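In JavaScript the same counting idea can be sketched with a single Map instead of two fixed-size arrays: add 1 for every character of the first string, subtract 1 for every character of the second, and sum the absolute values that remain. The function name makingAnagrams is just a label for this example.

```javascript
// Number of characters to delete from a and b so that the remaining
// letters of both strings are anagrams of each other.
function makingAnagrams(a, b) {
  var counts = new Map(); // character -> (occurrences in a) - (occurrences in b)
  for (var i = 0; i < a.length; i++) {
    counts.set(a[i], (counts.get(a[i]) || 0) + 1);
  }
  for (var j = 0; j < b.length; j++) {
    counts.set(b[j], (counts.get(b[j]) || 0) - 1);
  }
  var deletions = 0;
  counts.forEach(function (diff) {
    deletions += Math.abs(diff); // surplus copies on either side must go
  });
  return deletions;
}

console.log(makingAnagrams('cde', 'abc')); // 4
console.log(makingAnagrams('aab', 'ab'));  // 1
```

This makes one pass over each string, so the running time grows linearly with the input length rather than with the product of the two lengths.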

Find out the common parts of all the strings

I have a large array of almost 10,000 strings. I want to find all the common parts (longer than 3 characters) in these strings and count the occurrences of each part.
I implemented a simple method in JavaScript, but it takes a lot of time even after some optimization: for a short array of 1,000 strings it takes about 8 s in Firefox and 12 s in Chrome.
So I wonder whether there is a related technique or algorithm, since I think this is a common problem that comes up in many applications.
Build an array of all possible substrings, sort them and then look for blocks of consecutive equal strings.
The implementation below looks for suffixes of a certain length and imposes a minimal number of matches. It is not clear what you want exactly, but you need some constraints. It is easy to look for the longest common suffixes, but if you just want common suffixes, what does that mean? Are 20 occurrences of a 4-character string better than 10 occurrences of a 5-character string?
Also, note that the code below does not check for overlapping strings. If you look for matches of length 4 and have 30 words with "green" in it, the result will contain both "gree" and "reen".
It might not be what you want, but it should be easy to adapt. And it's reasonably fast. On 10,000 randomly generated strings with about 30 chars each, it takes less than a second to find common substrings of length 10 and maybe 4 seconds for 1,000,000 strings.
Anyway, here goes:
/*
* Return an array of all substrings of the given length
* that occur at least mincount times in all the strings in
* the input array strings.
*/
function substrings(strings, length, mincount) {
var suffix = [];
var res = [];
for (var i = 0; i < strings.length; i++) {
var s = strings[i];
for (var j = 0; j < s.length - length + 1; j++) {
suffix.push(s.substr(j, length));
}
}
suffix.sort();
suffix.push("");
var last = "";
var count = 1;
for (var i = 0; i < suffix.length; i++) {
var s = suffix[i];
if (s == last) {
count++;
} else {
if (count >= mincount) res.push(last);
count = 1;
}
last = s;
}
return res;
}
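Since the question also asks for the number of occurrences, a variant of the same idea can count the fixed-length substrings in a Map instead of sorting them. This is only a sketch with a made-up name (countSubstrings); like the function above, it works with one fixed length and does not merge overlapping matches.

```javascript
// Returns [substring, count] pairs for every substring of the given length
// that occurs at least mincount times across all strings, most frequent first.
function countSubstrings(strings, length, mincount) {
  var counts = new Map();
  strings.forEach(function (s) {
    for (var j = 0; j + length <= s.length; j++) {
      var part = s.slice(j, j + length);
      counts.set(part, (counts.get(part) || 0) + 1);
    }
  });
  var result = [];
  counts.forEach(function (count, part) {
    if (count >= mincount) result.push([part, count]);
  });
  return result.sort(function (x, y) { return y[1] - x[1]; });
}

console.log(countSubstrings(['green tea', 'evergreen', 'greed'], 4, 2));
// [['gree', 3], ['reen', 2]]
```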
Check out Crossfilter: http://square.github.io/crossfilter/ It will do whatever kind of map-reduce you want. JavaScript can be very slow for searching through big messes, though. Depending on your constraints, and how those 10,000 strings will grow over time, you might think of an RDBMS, like MySQL, since they are designed for this type of thing.
I had a slightly different issue where I needed to find the common prefixes across each word not just common substrings which could be in the middle or end of the word. This function will return the common prefixes across a set of words.
function findCommonPrefixes (words, min) {
const result = new Set();
for (const word of words) {
let partial = word.toLowerCase();
do {
const otherWords = words.filter(w => w !== word).map(w => w.toLowerCase());
for (const word of otherWords) {
if (word.startsWith(partial)) { // prefix match only, not a match in the middle or at the end
result.add(partial);
partial = '';
break;
}
}
if (partial) {
partial = partial.slice(0, (partial.length - 1))
}
} while (partial.length && partial.length >= min)
}
return Array.from(result);
}

Allow faults in typing

I want to make a quiz and the user should type the right answer.
Let's say the answer is correct if the answer matches 90%. For example, if the user types
Britney Spers instead of Britney Spears, the answer should be right.
I searched for Javascript functions to determine how accurate the answer is, I found some interesting functions for PHP, Ruby etc, but I need it in JavaScript.
Does anybody have experience with this kind of algorithm?
Thank you if you answer :)
You're looking for the edit distance (aka Levenshtein distance). Under this scheme, the distance between two strings is the number of insertions, deletions, or substitutions required to make the strings match. For example, if the right answer is "oranges", then:
"oranges" has a distance of 0 (they are the same word)
"orange" has a distance of 1 (delete s)
"roranger" has a distance of 2 (insert r, substitute s -> r)
"sponges" has a distance of 3 (substitute o -> s, substitute r -> p, substitute a -> o)
"" has a distance of 7 (insert every letter in oranges)
A simple algorithm for it in Javascript would look like this (adapted and modified from this gist):
function levenshtein(a, b){
// Return the number of characters in the other
// string if either string is blank.
if(a.length == 0) return b.length;
if(b.length == 0) return a.length;
// Otherwise, let's make a matrix to represent the possible choices
// we can take.
var matrix = [];
var i;
for(i = 0; i <= b.length; i++){
matrix[i] = [i];
}
var j;
for(j = 0; j <= a.length; j++){
matrix[0][j] = j;
}
for(i = 1; i <= b.length; i++){
for(j = 1; j <= a.length; j++){
if(b.charAt(i-1) == a.charAt(j-1)){
matrix[i][j] = matrix[i-1][j-1];
} else {
matrix[i][j] = Math.min(matrix[i-1][j-1] + 1, // substitution
Math.min(matrix[i][j-1] + 1, // insertion
matrix[i-1][j] + 1)); // deletion
}
}
}
return matrix[b.length][a.length];
};
One problem with your question is that the examples you wrote about what you're looking for (e.g. "matches 90%" or "accuracy of the answer") are not well-defined metrics.
There are a lot of ways an answer can be wrong. For example, let's say the right answer is "apple". Which of these should be accepted?
"APPLE" (wrong capitalization)
"ppple" (misspelled)
"apples" (plural, but you wanted the singular)
"Fuji apple" (too specific)
"fruit" (too broad)
and so on. Deciding which of these should be accepted is beyond the power of a simple edit-distance algorithm and will require heavier lifting, like NLP.
You're looking for an edit distance algorithm. Basically, you want to see how many character changes (add/delete/replace) it will take to get from one string to another. Of course now you have to have a dictionary of target strings to find the distance to.
http://en.wikipedia.org/wiki/Edit_distance
More specifically : http://en.wikipedia.org/wiki/Levenshtein_distance
The edit distance between Britney Spers and Britney Spears would be one: insert 'a'.
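One way to turn the distance into the "matches 90%" rule from the question is to divide it by the length of the longer string. The sketch below does that; the 0.9 threshold, the lower-casing and trimming, and the names editDistance and isCloseEnough are assumptions made for this example, not an established definition of a 90% match.

```javascript
// Levenshtein distance, keeping only two rows of the matrix at a time.
function editDistance(a, b) {
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1,         // deletion
                         curr[j - 1] + 1,     // insertion
                         prev[j - 1] + cost); // substitution or match
    }
    prev = curr;
  }
  return prev[b.length];
}

// Accepts the answer if it is at least `threshold` similar to the expected one.
function isCloseEnough(answer, expected, threshold) {
  var a = answer.trim().toLowerCase();
  var e = expected.trim().toLowerCase();
  var longest = Math.max(a.length, e.length);
  if (longest === 0) return true; // two empty strings are identical
  var similarity = 1 - editDistance(a, e) / longest;
  return similarity >= threshold;
}

console.log(editDistance('britney spers', 'britney spears'));       // 1
console.log(isCloseEnough('Britney Spers', 'Britney Spears', 0.9)); // true
console.log(isCloseEnough('Britney', 'Britney Spears', 0.9));       // false
```

With a relative threshold, short answers tolerate fewer typos than long ones: a single wrong letter in a 5-letter answer already drops the similarity to 0.8.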
