Allow faults in typing - javascript

I want to make a quiz and the user should type the right answer.
Let's say the answer is correct if the answer matches 90%. For example, if the user types
Britney Spers instead of Britney Spears, the answer should be right.
I searched for JavaScript functions to determine how accurate the answer is. I found some interesting functions for PHP, Ruby, etc., but I need it in JavaScript.
Does anybody have experience with these kinds of algorithms?
Thank you if you answer :)

You're looking for the edit distance (aka Levenshtein distance). Under this scheme, the distance between two strings is the number of insertions, deletions, or substitutions required to make the strings match. For example, if the right answer is "oranges", then:
"oranges" has a distance of 0 (they are the same word)
"orange" has a distance of 1 (delete s)
"roranger" has a distance of 2 (insert r, substitute s -> r)
"sponges" has a distance of 3 (substitute o -> s, substitute r -> p, substitute a -> o)
"" has a distance of 7 (insert every letter in oranges)
A simple algorithm for it in Javascript would look like this (adapted and modified from this gist):
function levenshtein(a, b){
  // Return the number of characters in the other
  // string if either string is blank.
  if(a.length == 0) return b.length;
  if(b.length == 0) return a.length;
  // Otherwise, let's make a matrix to represent the possible choices
  // we can take.
  var matrix = [];
  var i;
  for(i = 0; i <= b.length; i++){
    matrix[i] = [i];
  }
  var j;
  for(j = 0; j <= a.length; j++){
    matrix[0][j] = j;
  }
  for(i = 1; i <= b.length; i++){
    for(j = 1; j <= a.length; j++){
      if(b.charAt(i-1) == a.charAt(j-1)){
        matrix[i][j] = matrix[i-1][j-1];
      } else {
        matrix[i][j] = Math.min(matrix[i-1][j-1] + 1, // substitution
                       Math.min(matrix[i][j-1] + 1, // insertion
                                matrix[i-1][j] + 1)); // deletion
      }
    }
  }
  return matrix[b.length][a.length];
}
One problem with your question is that the examples you wrote about what you're looking for (e.g. "matches 90%" or "accuracy of the answer") are not well-defined metrics.
There are a lot of ways an answer can be wrong. For example, let's say the right answer is "apple". Which of these should be accepted?
"APPLE" (wrong capitalization)
"ppple" (misspelled)
"apples" (plural, but you wanted the singular)
"Fuji apple" (too specific)
"fruit" (too broad)
and so on. Deciding which of these should be accepted is beyond the power of a simple edit-distance algorithm and will require heavier lifting, like NLP.

You're looking for an edit distance algorithm. Basically, you want to see how many character changes (add/delete/replace) it will take to get from one string to another. Of course now you have to have a dictionary of target strings to find the distance to.
http://en.wikipedia.org/wiki/Edit_distance
More specifically : http://en.wikipedia.org/wiki/Levenshtein_distance
The edit distance between Britney Spers and Britney Spears would be one: insert 'a'.
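One way to turn that distance into the "matches 90%" rule from the question is to divide it by the length of the longer string. This is only a sketch under that assumption: the similarity formula, the name isCloseEnough, and the choice to ignore case and surrounding whitespace are illustrative, not something the question or the answers prescribe.

```javascript
// Levenshtein distance, keeping only two rows of the matrix at a time.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1,          // deletion
                         curr[j - 1] + 1,      // insertion
                         prev[j - 1] + cost)); // substitution or match
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: accept the answer if the similarity
// 1 - distance / (length of the longer string) reaches the threshold.
function isCloseEnough(answer, expected, threshold) {
  const a = answer.trim().toLowerCase();
  const b = expected.trim().toLowerCase();
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return true; // two empty strings match trivially
  return 1 - levenshtein(a, b) / longest >= threshold;
}

console.log(isCloseEnough("Britney Spers", "Britney Spears", 0.9)); // true (1 - 1/14)
console.log(isCloseEnough("sponges", "oranges", 0.9));              // false (1 - 3/7)
```

Note that a fixed percentage is stricter on short answers: a single typo in a five-letter word already drops the similarity to 80%.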

Related

Implementing the LLL algorithm as described on Wikipedia, but running into serious issues

I am not sure whether my issue is related to programming or to the concept of the LLL algorithm as described on Wikipedia.
I decided to implement the LLL algorithm as it is written on Wikipedia (step by step, line by line) to actually learn the algorithm and make sure it truly works, but I am getting unexpected or invalid results.
So, I used JavaScript (programming language) and node.js (JavaScript runtime) to implement it, and this is the git repository with the complete code.
Long story short, the value of k gets out of range. For example, when we have only 3 vectors (array size is 3, thus the maximum index is 2), k becomes 3, which is nonsense.
My code is a step-by-step (line-by-line) implementation of the algorithm described on Wikipedia, and implementing it is all I did. So I don't know what the issue is.
// ** important
// {b} set of vectors are denoted by this.matrix_before
// {b*} set of vectors are denoted by this.matrix_after
calculate_LLL() {
  this.matrix_after = new gs(this.matrix_before, false).matrix; // initialize after vectors: perform Gram-Schmidt, but do not normalize
  var flag = false; // invariant
  var k = 1;
  while (k <= this.dimensions && !flag) {
    for (var j = k - 1; j >= 0; j--) {
      if (Math.abs(this.mu(k, j)) > 0.5) {
        var to_subtract = tools.multiply(Math.round(this.mu(k, j)), this.matrix_before[j], this.dimensions);
        this.matrix_before[k] = tools.subtract(this.matrix_before[k], to_subtract, this.dimensions);
        this.matrix_after = new gs(this.matrix_before, false).matrix; // update after vectors: perform Gram-Schmidt, but do not normalize
      }
    }
    if (tools.dot_product(this.matrix_after[k], this.matrix_after[k], this.dimensions) >= (this.delta - Math.pow(this.mu(k, k - 1), 2)) * tools.dot_product(this.matrix_after[k - 1], this.matrix_after[k - 1], this.dimensions)) {
      if (k + 1 >= this.dimensions) { // invariant: there is some issue, something is wrong
        flag = true; // invariant is broken
        console.log("something bad happened ! (1)");
      }
      k++;
      // console.log("if; k, j");
      // console.log(k + ", " + j);
    } else {
      var temp_matrix = this.matrix_before[k];
      this.matrix_before[k] = this.matrix_before[k - 1];
      this.matrix_before[k - 1] = temp_matrix;
      this.matrix_after = new gs(this.matrix_before, false).matrix; // update after vectors: perform Gram-Schmidt, but do not normalize
      if (k === Math.max(k - 1, 1) || k >= this.dimensions || Math.max(k - 1, 1) >= this.dimensions) { // invariant: there is some issue, something is wrong
        flag = true; // invariant is broken
        console.log("something bad happened ! (2)");
      }
      k = Math.max(k - 1, 1);
      // console.log("else; k, j");
      // console.log(k + ", " + j);
    }
    console.log(this.matrix_before);
    console.log("\n");
  } // I added this flag variable to prevent getting exceptions and terminate the loop gracefully
  console.log("final: ");
  console.log(this.matrix_before);
}
// calculated mu as been mentioned on Wikipedia
// mu(i, j) = <b_i, b*_j> / <b*_j, b*_j>
mu(i, j) {
  var top = tools.dot_product(this.matrix_before[i], this.matrix_after[j], this.dimensions);
  var bottom = tools.dot_product(this.matrix_after[j], this.matrix_after[j], this.dimensions);
  return top / bottom;
}
Here is the screenshot of the algorithm that is on Wikipedia:
Update #1: I added more comments to the code to clarify the question hoping that someone would help.
Just in case you are wondering about an already available implementation, you can type LatticeReduce[{{0,1},{2,0}}] into Wolfram Alpha to see how this code is supposed to behave.
Update #2: I cleaned up the code more and added a validate function to make sure the Gram-Schmidt code is working correctly, but the code still fails and the value of k exceeds the number of dimensions (or number of vectors), which doesn't make sense.
The algorithm description in Wikipedia uses rather odd notation -- the vectors are numbered 0..n (rather than, say, 0..n-1 or 1..n), so the total number of vectors is n+1.
The code you've posted here treats this.dimensions as if it corresponds to n in the Wikipedia description. Nothing wrong with that so far.
However, the constructor in the full source file on GitHub sets this.dimensions = matrix[0].length. Two things about this look wrong. The first is that surely matrix[0].length is more like m (the dimension of the space) than n (the number of vectors, minus 1 for unclear reasons). The second is that if it's meant to be n then you need to subtract 1 because the number of vectors is n+1, not n.
So if you want to use this.dimensions to mean n, I think you need to initialize it as matrix.length-1. With the square matrix in your test case, using matrix[0].length-1 would work, but I think the code will then break when you feed in a non-square matrix. The name dimensions is kinda misleading, too; maybe just n to match the Wikipedia description?
Or you could call it something like nVectors, let it equal matrix.length, and change the rest of the code appropriately, which just means an adjustment in the termination condition for the main loop.
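To make the second option concrete, here is a minimal standalone sketch that counts vectors with nVectors = matrix.length and runs the main loop while k < nVectors, so k never indexes past the last vector. It assumes linearly independent input vectors and delta = 0.75. The helper names dot, gramSchmidt, and lll are made up for this illustration and are not taken from the repository. Like the code in the question, it simply recomputes Gram-Schmidt after every change instead of updating it incrementally.

```javascript
// Dot product of two equal-length vectors.
function dot(u, v) {
  return u.reduce((sum, ui, i) => sum + ui * v[i], 0);
}

// Gram-Schmidt without normalization: returns the orthogonal vectors b*.
function gramSchmidt(basis) {
  const ortho = [];
  for (const b of basis) {
    let v = b.slice();
    for (const o of ortho) {
      const coeff = dot(b, o) / dot(o, o);
      v = v.map((vi, i) => vi - coeff * o[i]);
    }
    ortho.push(v);
  }
  return ortho;
}

// LLL reduction of the rows of `input` (assumed linearly independent).
function lll(input, delta = 0.75) {
  const basis = input.map(row => row.slice()); // work on a copy
  const nVectors = basis.length;               // number of vectors, not their dimension
  let ortho = gramSchmidt(basis);
  const mu = (i, j) => dot(basis[i], ortho[j]) / dot(ortho[j], ortho[j]);
  let k = 1;
  while (k < nVectors) {                       // valid indices are 0..nVectors-1
    for (let j = k - 1; j >= 0; j--) {         // size reduction
      const m = mu(k, j);
      if (Math.abs(m) > 0.5) {
        const q = Math.round(m);
        basis[k] = basis[k].map((x, i) => x - q * basis[j][i]);
        ortho = gramSchmidt(basis);
      }
    }
    const lhs = dot(ortho[k], ortho[k]);
    const rhs = (delta - mu(k, k - 1) ** 2) * dot(ortho[k - 1], ortho[k - 1]);
    if (lhs >= rhs) {                          // Lovasz condition holds
      k++;
    } else {                                   // otherwise swap and step back
      [basis[k], basis[k - 1]] = [basis[k - 1], basis[k]];
      ortho = gramSchmidt(basis);
      k = Math.max(k - 1, 1);
    }
  }
  return basis;
}

console.log(lll([[1, 0], [5, 1]])); // [ [ 1, 0 ], [ 0, 1 ] ]
```

Because every vector length comes from the array itself, the same loop works for non-square input, where the number of vectors differs from the dimension of the space.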

How to find total possible values from length and characters?

I'm totally not a Math whiz kid here, but have put together a function with the great help of StackOverflow (and a lot of trial and error) that generates a random serial number from a Formula, group of Letters/Numbers, and array (so as to not duplicate values).
So, my current formula is as follows:
$.extend({
  generateSerial: function(formula, chrs, checks) {
    var formula = formula && formula != "" ? formula : 'XXX-XXX-XXX-XXX-XXX', // Default Formula to use, should change to what's most commonly used!
        chrs = chrs && chrs != "" ? chrs : "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", // Default characters to randomize, if not defined!
        len = (formula.match(/X/g) || []).length,
        indices = [],
        rand;
    // Get all "-" char indexes
    for(var i=0; i < formula.length; i++) {
      if (formula[i] === "-") indices.push(i);
    }
    do {
      rand = Array(len).join().split(',').map(function() {
        return chrs.charAt(Math.floor(Math.random() * chrs.length));
      }).join('');
      // Rebuild string!
      if (indices && indices.length > 0)
      {
        for(var x=0; x < indices.length; x++)
          rand = rand.insert(indices[x], '-');
      }
    } while (checks && $.inArray(rand, checks) !== -1);
    return rand;
  }
});
Ok, so, what I need to be able to do is to find total possible values and make sure that it is possible to generate a unique serial number before actually doing so.
For example:
var num = $.generateSerial('XX', 'AB', new Array('AB', 'BA', 'AA', 'BB'));
This will cause the code to go into an infinite loop, since there are no possibilities left other than the ones being excluded. So this will cause the browser to crash. What I need is to get the number of possible unique values and, if it is greater than 0, continue; otherwise, don't continue (maybe an alert for an error would be fine).
Also, keep in mind, could also do this in a loop so as to not repeat serials already generated:
var currSerials = [];
for (var x = 0; x < 5; x++)
{
  var output = $.generateSerial('XXX-XXX-XXX', '0123456789', currSerials);
  currSerials.push(output);
}
But the important thing here, is how to get total possible unique values from within the generateSerial function itself? We have the length, characters, and exclusions array also in here (checks). This would seem more like a math question, and I'm not expert in Math. Could use some help here.
Thanks guys :)
Here is a jsFiddle of it working nicely because there are more possible choices than 16: http://jsfiddle.net/qpw66bwb/1/
And here is a jsFiddle of the problem I am facing. Just click the "Generate Serials" button to see it (it loops continuously and never finishes): it wants to create 16 serials, but 16 different serials are not possible with 2 positions and only the characters A and B: http://jsfiddle.net/qpw66bwb/2/
I need to catch the loop here and exit out of it, if it is not able to generate a random number somehow. But how?
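The number of possible serials is Math.pow(chrs.length, len), that is, chrs.length raised to the power len, assuming all the characters in chrs are different. The serial contains len characters to fill in randomly, and each of those positions can independently hold any of the chrs.length characters, so the counts multiply. For 'XX' and 'AB' that gives 2 * 2 = 4 (AA, AB, BA, BB). If every entry in checks is a distinct valid serial, the number still available is that total minus checks.length.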
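A small sketch of that check, which could run before the do/while loop. The function name countAvailableSerials is made up here, and it assumes every entry in checks is a valid serial for the given formula (duplicates in chrs and checks are ignored by going through a Set).

```javascript
// Hypothetical helper: how many serials can still be generated?
// Assumes each entry in `checks` is a valid serial for `formula`.
function countAvailableSerials(formula, chrs, checks) {
  const len = (formula.match(/X/g) || []).length; // positions to fill
  const alphabetSize = new Set(chrs).size;        // distinct characters only
  const total = Math.pow(alphabetSize, len);      // alphabetSize ^ len
  const used = new Set(checks || []).size;        // distinct excluded serials
  return total - used;
}

console.log(countAvailableSerials('XX', 'AB', ['AB', 'BA', 'AA', 'BB'])); // 0
console.log(countAvailableSerials('XXX-XXX-XXX', '0123456789', []));      // 1000000000
```

If the result is 0, generateSerial can bail out (or show an alert) instead of entering the loop. For long formulas the total exceeds what a JavaScript number holds exactly, but a "greater than 0" test is still reliable as long as checks is much smaller than the total.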

Find out the common parts of all the strings

I have a large array of almost 10,000 strings. I want to find all the common parts (longer than 3 characters) in these strings and get the number of occurrences of each part.
I implemented a simple method in JavaScript, but it takes a lot of time. Even after some optimization, a short array of 1,000 strings takes about 8 s in Firefox and 12 s in Chrome.
So I wonder if there is any related technique or algorithm, as I think this is a common problem that comes up in many applications.
Build an array of all possible substrings, sort them and then look for blocks of consecutive equal strings.
The implementation below looks for substrings of a fixed length and imposes a minimum number of matches. It is not clear what you want exactly, but you need some constraints. It is easy to look for the longest common substrings, but if you just want common substrings, what does that mean? Are 20 occurrences of a 4-character string better than 10 occurrences of a 5-character string?
Also, note that the code below does not check for overlapping strings. If you look for matches of length 4 and have 30 words with "green" in them, the result will contain both "gree" and "reen".
It might not be what you want, but it should be easy to adapt. And it's reasonably fast. On 10,000 randomly generated strings with about 30 chars each, it takes less than a second to find common substrings of length 10 and maybe 4 seconds for 1,000,000 strings.
Anyway, here goes:
/*
 * Return an array of all substrings of the given length
 * that occur at least mincount times across all the strings in
 * the input array strings.
 */
function substrings(strings, length, mincount) {
  var suffix = [];
  var res = [];
  for (var i = 0; i < strings.length; i++) {
    var s = strings[i];
    for (var j = 0; j < s.length - length + 1; j++) {
      suffix.push(s.substr(j, length));
    }
  }
  suffix.sort();
  suffix.push(""); // sentinel so the last block gets flushed
  var last = "";
  var count = 0; // start at 0 so the initial empty "last" is never reported
  for (var i = 0; i < suffix.length; i++) {
    var s = suffix[i];
    if (s == last) {
      count++;
    } else {
      if (count >= mincount) res.push(last);
      count = 1;
    }
    last = s;
  }
  return res;
}
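Since the question also asks for the number of occurrences, here is a variant sketch that counts with a Map instead of sorting. It is an alternative to the approach above, not the same technique, and the name countSubstrings is made up for illustration. Like the code above, it counts every position, so overlapping pieces such as "gree" and "reen" are both reported if they are frequent enough.

```javascript
// Count every substring of the given length and keep those that
// occur at least mincount times. Returns an object: substring -> count.
function countSubstrings(strings, length, mincount) {
  const counts = new Map();
  for (const s of strings) {
    for (let j = 0; j + length <= s.length; j++) {
      const part = s.slice(j, j + length);
      counts.set(part, (counts.get(part) || 0) + 1);
    }
  }
  const result = {};
  for (const [part, n] of counts) {
    if (n >= mincount) result[part] = n;
  }
  return result;
}

console.log(countSubstrings(["green", "greet", "agree"], 4, 2)); // { gree: 3 }
```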
Check out Crossfilter: http://square.github.io/crossfilter/ It will do whatever kind of map-reduce you want. JavaScript can be very slow for searching through big messes, though. Depending on your constraints, and how those 10,000 strings will grow over time, you might think of an RDBMS, like MySQL, since they are designed for this type of thing.
I had a slightly different issue where I needed to find the common prefixes across the words, not just common substrings, which could be in the middle or at the end of a word. This function will return the common prefixes across a set of words.
function findCommonPrefixes (words, min) {
  const result = new Set();
  for (const word of words) {
    let partial = word.toLowerCase();
    do {
      const otherWords = words.filter(w => w !== word).map(w => w.toLowerCase());
      for (const word of otherWords) {
        // startsWith (not includes) so that only prefixes match
        if (word.startsWith(partial)) {
          result.add(partial);
          partial = '';
          break;
        }
      }
      if (partial) {
        partial = partial.slice(0, (partial.length - 1))
      }
    } while (partial.length && partial.length >= min)
  }
  return Array.from(result);
}

How to efficiently check a list item with all other list items?

Is there any way to optimize this method of searching?
for (var i=0;i<dots.length;i++) {
  var blist = [];
  for (var n=0;n<dots.length;n++) {
    if (dots[n][1]>(dots[i][1]-90)
      && dots[n][1]<(dots[i][1]+90)
      && dots[n][2]>(dots[i][2]-90)
      && dots[n][2]<(dots[i][2]+90)) {
      if (!(n === i)) blist.push(n);
    }
  }
}
dots[x][1] is the x-coordinate and dots[x][2] is the y-coordinate.
I have 1000 dots, and need to find the dots surrounding each dot, so that results in the
if (dots[n][1]>(dots[i][1]-90)
&& dots[n][1]<(dots[i][1]+90)
&& dots[n][2]>(dots[i][2]-90)
&& dots[n][2]<(dots[i][2]+90))
running a million times a second, so is there a way to optimize this?
Perhaps try using a data structure for your dots like this
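var Dot = function(){
  var x = 0;
  var y = 0;
  var Up;
  var Right;
  var Left;
  var Down;
  function init(xVal,yVal)
  {
    x = xVal;
    y = yVal;
  }
  function GetUp()
  {
    return Up;
  }
  function SetUp(UpDot)
  {
    Up = UpDot;
  }
  // The brace must be on the same line as return; otherwise automatic
  // semicolon insertion makes the function return undefined.
  return {
    init: init,
    GetUp: GetUp,
    SetUp: SetUp
  };
};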
and then use it like this
var Dots = [];
var firstDot = new Dot();
Dots.push(firstDot);
var secondDot = new Dot();
secondDot.init(0,90);
secondDot.SetUp(firstDot);
Dots.push(secondDot);
Obviously, more would need to be added and configured to match your situation. However, this would allow you to iterate through the dots and check whether a nearby dot exists, making the time O(n) instead of O(n^2) and thus saving you 900,000 checks.
One way to cut your time in half would be not to double-check each pair:
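var len = dots.length;
var blists = []; // blists[i] collects the neighbors of dot i
for (var i = 0; i < len; i++) blists.push([]);
for (var i = 0; i < len - 1; i++) {
  for (var n = i + 1; n < len; n++) {
    if (dots[n][1]>(dots[i][1]-90)
      && dots[n][1]<(dots[i][1]+90)
      && dots[n][2]>(dots[i][2]-90)
      && dots[n][2]<(dots[i][2]+90)) {
      // the relation is symmetric, so record the pair on both sides
      blists[i].push(n);
      blists[n].push(i);
    }
  }
}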
Note the change in loop boundaries. This allows me to check each pair only once and skip the (n === i) check.
I also cache dots.length; probably not a big deal, but worth doing for a tight loop.
Still, that should be an improvement of more than 50%. While that could help, it's not the orders of magnitude change that might be required for this sort of issue.
Here's a sketch of a solution. It may be the same idea TravisJ was suggesting, although that's not clear to me. It really is only a sketch, and would take significant code to implement.
If you partition your space into 90 unit x 90 unit sections, then a dot in a particular section can only be close enough to a dot in that section or to a dot in one of that section's eight neighbors. This could significantly reduce the number of pairs you have to compare. The cost, of course is algorithmic complexity:
First create a data structure to represent your grid sections. They can probably be represented just by top-left corners, since their heights and widths would be fixed at 90, except maybe at the trailing edges, where it probably wouldn't matter. Assuming a rectangular surface, each one could have three, five, or eight neighbors (corners, edges, inner sections respectively).
Loop through your dots, determining which section they live in. If your total grid starts at 0, this should be relatively straightforward, using some Math.floor(something / 90) operations.
For each section, run the loop above on itself and each of its neighbors to find the set of matches. You can use the shortened version of the loop from my earlier answer.
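For a further optimization, you can also reduce the number of neighbors to check. If Section3,7 does a comparison with Section3,8, then there is no reason for Section3,8 to also do the comparison with Section3,7. So each section checks itself plus only a fixed half of its neighbors, say those that come after it in row-major order: the section to its right and the three sections in the next row. (Requiring both components of the section number to be greater than or equal to its own would not be enough, because it skips one of the two diagonal directions entirely.)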
I have not tested this, except in my head. It should work, but I have not tried to write any code. And the code would not be trivial. I don't think it's weeks of work, but it's not something to whip together in a few minutes either.
I believe it could significantly increase the speed, but that will depend upon how many matches there are, how many dots there are relative to the number of sections.
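For what it's worth, a rough sketch of the grid idea might look like the following. It is a simplified version of the steps above: the sections are stored in a Map keyed by "column,row" instead of a dedicated structure, and every dot looks at all nine surrounding sections rather than only half of them. The name findNeighbors and the range parameter are made up for illustration. As in the question, dots[i][1] is the x-coordinate and dots[i][2] is the y-coordinate.

```javascript
// Returns an array where entry i lists the indices of all dots
// within `range` of dot i on both axes (same test as the question).
function findNeighbors(dots, range) {
  const cells = new Map();
  const key = (cx, cy) => cx + "," + cy;

  // Pass 1: put each dot index into the bucket of its grid section.
  dots.forEach((dot, i) => {
    const k = key(Math.floor(dot[1] / range), Math.floor(dot[2] / range));
    if (!cells.has(k)) cells.set(k, []);
    cells.get(k).push(i);
  });

  // Pass 2: compare each dot only with dots in its own section
  // and the eight sections around it.
  const neighbors = dots.map(() => []);
  dots.forEach((dot, i) => {
    const cx = Math.floor(dot[1] / range);
    const cy = Math.floor(dot[2] / range);
    for (let dx = -1; dx <= 1; dx++) {
      for (let dy = -1; dy <= 1; dy++) {
        const bucket = cells.get(key(cx + dx, cy + dy));
        if (!bucket) continue;
        for (const n of bucket) {
          if (n !== i &&
              Math.abs(dots[n][1] - dot[1]) < range &&
              Math.abs(dots[n][2] - dot[2]) < range) {
            neighbors[i].push(n);
          }
        }
      }
    }
  });
  return neighbors;
}
```

Calling findNeighbors(dots, 90) would replace the double loop from the question. The gain depends on how the dots are spread: if most of them fall into a handful of sections, it degrades back toward comparing every pair.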

JavaScript: check if an array is a subsequence of another array (write a faster naïve string search algo)

[5, 4, 4, 6].indexOfArray([4, 6]) // 2
['foo', 'bar', 'baz'].indexOfArray(['foo', 'baz']) // -1
I came up with this:
Array.prototype.indexOfArray = function(array) {
  var m = array.length;
  var found;
  var index;
  var prevIndex = 0;
  while ((index = this.indexOf(array[0], prevIndex)) != -1) {
    found = true;
    for (var i = 1; i < m; i++) {
      if (this[index + i] != array[i]) {
        found = false;
        break; // no need to compare the rest at this position
      }
    }
    if (found) {
      return index;
    }
    prevIndex = index + 1;
  }
  return index;
};
Later I found that Wikipedia calls it naïve string search:
In the normal case, we only have to look at one or two characters for each wrong position to see that it is a wrong position, so in the average case, this takes O(n + m) steps, where n is the length of the haystack and m is the length of the needle; but in the worst case, searching for a string like "aaaab" in a string like "aaaaaaaaab", it takes O(nm) steps.
Can someone write a faster indexOfArray method in JavaScript?
The algorithm you want is the KMP algorithm (http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm) used to find the starting index of a substring within a string -- you can do exactly the same thing for an array.
I couldn't find a javascript implementation, but here are implementations in other languages http://en.wikibooks.org/wiki/Algorithm_implementation/String_searching/Knuth-Morris-Pratt_pattern_matcher -- it shouldn't be hard to convert one to js.
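As a rough idea of what such a conversion could look like, here is a sketch of KMP over arrays. It is written as a plain function rather than an Array.prototype method, compares elements with ===, and returns 0 for an empty needle; all three are choices made for this illustration, and the names buildFailureTable and indexOfArray(haystack, needle) are not from any library.

```javascript
// table[i] = length of the longest proper prefix of needle[0..i]
// that is also a suffix of it.
function buildFailureTable(needle) {
  const table = new Array(needle.length).fill(0);
  let k = 0;
  for (let i = 1; i < needle.length; i++) {
    while (k > 0 && needle[i] !== needle[k]) k = table[k - 1];
    if (needle[i] === needle[k]) k++;
    table[i] = k;
  }
  return table;
}

// Index of the first occurrence of needle as a contiguous run
// inside haystack, or -1 if there is none. Runs in O(n + m).
function indexOfArray(haystack, needle) {
  if (needle.length === 0) return 0;
  const table = buildFailureTable(needle);
  let k = 0; // number of needle elements matched so far
  for (let i = 0; i < haystack.length; i++) {
    while (k > 0 && haystack[i] !== needle[k]) k = table[k - 1];
    if (haystack[i] === needle[k]) k++;
    if (k === needle.length) return i - needle.length + 1;
  }
  return -1;
}

console.log(indexOfArray([5, 4, 4, 6], [4, 6]));                     // 2
console.log(indexOfArray(['foo', 'bar', 'baz'], ['foo', 'baz']));    // -1
```

The haystack index i never moves backwards, which is what removes the O(nm) worst case of the naïve search.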
FWIW: I found this article a good read: Efficient substring searching. It discusses several variants of Boyer-Moore, although it's not in JavaScript. The Boyer-Moore-Horspool variant (Timo Raita's -- see first link for link) was going to be my "suggestion" for a potential practical speed gain (it does not reduce big-O though -- big-O is an upper limit only!). Pay attention to the Conclusion at the bottom of the article and the benchmarks above.
I'm mainly trying to put up opposition for the Knuth-Morris-Pratt implementation ;-)
