Optimizing JavaScript autocorrect implementation - javascript

I modified the jQuery Autocomplete implementation to generate autocorrect suggestions too. I used Levenshtein distance as the metric to decide closest matches.
My code runs at each keypress after the jQuery autocomplete does not have any more suggestions. Here's the code that I wrote:
// Returns edit distance between two strings
edit_distance : function(s1, s2) {
// Auxiliary 2D array
var arr = new Array(s1.length+1);
for(var i=0 ; i<s1.length+1 ; i++)
arr[i] = new Array(s2.length+1);
// Algorithm
for(var i=0 ; i<=s1.length ; i++)
for(var j=0 ; j<=s2.length ; j++)
arr[i][j] = 0;
for(var i=0 ; i<=s1.length ; i++)
arr[i][0] = i;
for(var i=0 ; i<=s2.length ; i++)
arr[0][i] = i;
for(var i=1 ; i<=s1.length ; i++)
for(var j=1 ; j<=s2.length ; j++)
arr[i][j] = Math.min(arr[i-1][j-1] + (s1.charAt(i-1)==s2.charAt(j-1) ? 0 : 1), arr[i-1][j]+1, arr[i][j-1]+1);
// Final answer
return arr[s1.length][s2.length].toString(10);
},
// This is called at each keypress
auto_correct : function() {
// Make object array for sorting both names and IDs in one go
var objArray = new Array();
for(var i=0 ; i<idArray.length ; i++) {
objArray[i] = new Object();
objArray[i].id = idArray[i];
objArray[i].name = nameArray[i];
}
// Sort object array by edit distance
var out = this;
companyObjArray.sort (
function(a,b) {
var input = jQuery("#inputbox").val().toLowerCase();
var d1 = a.name.toLowerCase();
var d2 = b.name.toLowerCase();
return out.editDistance(input,d1) - out.editDistance(input,d2);
}
);
// Copy some closest matches in arrays that are shown by jQuery
this.suggestions = new Array();
this.data = new Array();
for(var i=0 ; i<5 ; i++) {
this.suggestions.push(companyObjArray[i].name);
this.data.push(companyObjArray[i].id);
}
}
All names have IDs associated with them, so before sorting I'm just creating an object array out of them, and sorting the array.
Since the list to search from is in thousands, its slow. I found a data structure called a BK-tree, which can speed it up, but I'm can't implement it right now. I'm looking for optimization suggestions to speed this up. Any suggestions are welcome. Thanks in advance.
EDIT. I decided to use Sift3 as my string distance metric instead of Levenshein distance, it gives more meaningful results and is faster.

There are quite a few things you can optimize here, but most of it boils down to this: Only do each of your 'heavier' calculations once. You are doing a lot of work on each keypress, in every sort comparison, etc., and caching those values will help a lot.
However, for a significant added performance boost, there is another, quite nifty optimization trick you can use. It is used by every major search engine, including on-site search engines on sites such as Amazon.com.
It takes advantage of the fact that you really don't need to sort the whole list, since all you're displaying are the top 10 or 12 items. Whether the rest of the thousands of list items are in the correct order doesn't matter in the least. So all you really have to keep track of while running through the list, are the top 10 or 12 items you've seen so far, which, as it turns out, is a lot faster than full sorting.
Here's the idea in pseudocode:
The user types a character, creating a new search term
We define an empty shortlist array of length 12 (or however many suggestions we want)
Loop through the full list of words (we will call it the dictionary):
Calculate the (Levenshtein) edit distance between the dictionary word and the search term
If the distance is lower (better) than the current threshold, we:
Add the word to the top of the shortlist
Let the bottom word in the shortlist 'overflow' out of the list
Set our threshold distance to match the word now at the bottom of the shortlist
When the loop has finished, we have a shortlist containing some of the best words, but they are unsorted, so we sort just those 12 words, which will be really fast.
One small caveat: Depending on the dataset and the number of elements in your shortlist, the shortlist may differ slightly from the actual top elements, but this can be alleviated by increasing the shortlist size, e.g. to 50, and just removing the bottom 38 once the final sort is done.
As for the actual code:
// Cache important elements and values
var $searchField = $('#searchField'),
$results = $('#results'),
numberOfSuggestions = 12,
shortlistWindowSize = 50,
shortlist,
thresholdDistance;
// Do as little as possible in the keyboard event handler
$searchField.on('keyup', function(){
shortlist = [];
thresholdDistance = 100;
// Run through the full dictionary just once,
// storing just the best N we've seen so far
for (var i=0; i<dictionarySize; i++) {
var dist = edit_distance(this.value, dictionary[i]);
if (dist < thresholdDistance) {
shortlist.unshift({
word: dictionary[i],
distance: dist
});
if (shortlist.length > shortlistWindowSize) {
shortlist.pop();
thresholdDistance = shortlist[shortlistWindowSize-1].distance;
}
}
}
// Do a final sorting of just the top words
shortlist.sort(function(a,b){
return a.distance - b.distance;
});
// Finally, format and show the suggestions to the user
$results.html('<p>' + $.map(shortlist, function(el){
return '<span>[dist=' + el.distance + ']</span> ' + el.word;
}).slice(0,numberOfSuggestions).join('</p><p>') + '</p>').show();
});
Try the method out in this 12.000 word demo on jsFiddle

Related

Speeding up this Javasctipt string manipulation

I am changing the content of about 5000 HTML tags and as I read here doing 5000+ html rendering changes is slow and it is better just to redraw the HTML once,
Therefor I have created a function that loads the entire HTML into a JavaScript string and then goes through the text(looking for a label tag), changes the content of the tags and eventually redraws the HTML once.
With high hopes, this was also a failure and takes around 30 seconds on a 1000 tags test.
My function basically reverse counts all the visible label DIVs on the screen and adds numbering to them,
Here's the code, what am I doing wrong please? (here's an isolated jsfiddle: http://jsfiddle.net/tzvish/eLbTy/8/)
// Goes through all visible entries on the page and updates their count value
function updateCountLabels() {
var entries = document.getElementsByName('entryWell');
var entriesId = document.getElementsByName('entryId');
var entriesHtml = document.getElementById('resultContainer').innerHTML;
var visibleEntries = new Array();
var countEntries = 0 , pointer = 0;
// Create a list of all visible arrays:
for (i = 0; i < entries.length; i++) {
if ($(entries[i]).is(":visible")) {
countEntries++;
visibleEntries.push(entriesId[i].value);
}
}
// Update html of all visible arrays with numbering:
for (i = 0; i < visibleEntries.length; i++) {
currentId = visibleEntries[i];
// Get the position of insertion for that label element:
elementHtml = '<div id="entryCount'+currentId+'" class="entryCountLabel label label-info">';
pointer = entriesHtml.indexOf(elementHtml) + elementHtml.length;
pointerEnd = entriesHtml.indexOf("</div>",pointer);
entriesHtml = entriesHtml.substring(0, pointer) + countEntries + entriesHtml.substring(pointerEnd);
countEntries--;
}
// apply the new modified HTML:
document.getElementById('resultContainer').innerHTML = entriesHtml;
}
From here:
Loops are frequently found to be bottlenecks in JavaScript. To make a loop the most efficient, reverse the order in which you process the items so that the control condition compares the iterator to zero. This is far faster than comparing a value to a nonzero number and significantly speeds up array processing. If there are a large number of required iterations, you may also want to consider using Duff’s Device to speed up execution.
So change:
for (i = 0; i < entries.length; i++)
To
for (i = entries.length - 1; i ==0 ; i--)
Also notice following paragraph from mentioned link:
Be careful when using HTMLCollection objects. Each time a property is accessed on one of these objects, it requires a query of the DOM for matching nodes. This is an expensive operation that can be avoided by accessing HTMLCollection properties only when necessary and storing frequently used values (such as the length property) in local variables.
So change:
for (i = entries.length - 1; i == 0; i--)
To:
var len = entries.length;
for (i = len - 1; i ==0 ; i--)

Javascript code help - Wrong return

So I'm working on this script. When I'm done with it, it should be used for making 2-and-2 groups. But anyway; The 'input' array in the start of the script, will get 22 different inputs from my HTML file. As a standard, I gave them the values 1-22. The thing is my two blocks '1st number' and '2nd number' doesn't work very well: they don't return the right numbers. Cause I want every elev[x] to be used once! Not 2 times, not 0 times, once! And the blocks returns like some of them twice and some of them isn't even used. So how can I fix this?
function Calculate(){
var elev = [];
var inputs = document.getElementsByName("txt");
for(i=0; i<inputs.length; i++) {
elev[i] = {
"Value": inputs[i].value,
"Used": false
};
}
function shuffle(elev){
var len = elev.length;
for(i=1; i<len; i++){
j = ~~(Math.random()*(i+1));
var temp = elev[i];
arr[i] = elev[j];
arr[j] = temp;
}
}
for(var v=0; v<1; v++) {
shuffle(elev);
document.write(elev + '<br/>\n');
}}
Yes, I'm still new at programming and I just wanna learn what I can learn.
Problem solved by doing the Fisher-Yates shuffle.
The idea of shuffling an array and iterating over it is correct, however, sorting by a coin-flipping comparator (a common mis-implementation; suggested by the other answer) is not the correct way to shuffle an array; it produces very skewed results and is not even guaranteed to finish.
Unless you're willing to fetch underscore.js (arr = _.shuffle(arr)), It is recommended to use the Fisher-Yates shuffle:
function shuffle(arr){
var len = arr.length;
for(var i=1; i<len; i++){
var j = ~~(Math.random()*(i+1)); // ~~ = cast to integer
var temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
}
...
shuffle(elev);
for(var i=0; i<elev.length; i++){
//do something with elev[i]
console.log(elev[i].Value);
}
also, note that object fields should be lowercase, not uppercase (value', not 'Value'). Also, theUsed` field should not be neccessary since you now iterate in order. Also, any chance you could use an array of Values? They seem to be the only field in your objects.
Also, Don't use document.write(). It doesn't work as expected when used after the page load. If you want to append something to the document, you hate jQuery and other frameworks and you don't want to go the long way of creating and composing DOM nodes, document.body.innerHTML += is still better than document.write() (but still consider appending DocumentFragments instead').
demo: http://jsfiddle.net/honnza/2GSDd/1/
You are only getting random numbers once, and then processing them if they haven't already been used.
I would suggest shuffling the array (elev.sort(function() {return Math.random()-0.5})) and then just iterating through the result without further random numbers.

Generating random unique data takes too long and eats 100% CPU

WARNING: CPU Usage goes to 100%, be careful.
Link to the jsFiddle
This script has been written to design a dynamic snake and ladder board. Everytime the page is refreshed a new board is created. Most of the time all of the background images do not appear, and the CPU usage goes up to 100%. But on occasion all of them appear and the CPU usage is normal.
Opera shows some of the background images, Firefox lags and asks me if I wish to stop the script.
I believe that the problem is with these lines of code:
for(var key in origin) // Need to implement check to ensure that two keys do not have the same VALUES!
{
if(origin[key] == random_1 || origin[key] == random_2 || key == random_2) // End points cannot be the same AND starting and end points cannot be the same.
{
valFlag = 1;
}
console.log(key);
}
Your algorithm is very ineffective. When array is almost filled up, you literally do millions of useless iterations until you're in luck and RNG accidentally picks missing number. Rewrite it to:
Generate an array of all possible numbers - from 1 to 99.
When you need a random numbers, generate a random index in current bounds of this array, splice element and this random position, removing it from array and use its value as your desired random number.
If generated numbers don't fit some of your conditions (minDiff?) return them back to array. Do note, that you can still stall in loop forever if everything that is left in array is unable to fit your conditions.
Every value you pull from array in this way is guaranteed to be unique, since you originally filled it with unique numbers and remove them on use.
I've stripped drawing and placed generated numbers into array that you can check in console. Put your drawing back and it should work - numbers are generated instantly now:
var snakes = ['./Images/Snakes/snake1.png','./Images/Snakes/snake2.jpg','./Images/Snakes/snake3.gif','./Images/Snakes/snake4.gif','./Images/Snakes/snake5.gif','./Images/Snakes/snake6.jpg'];
var ladders = ['./Images/Ladders/ladder1.jpg','./Images/Ladders/ladder2.jpg','./Images/Ladders/ladder3.png','./Images/Ladders/ladder4.jpg','./Images/Ladders/ladder5.png'];
function drawTable()
{
// Now generating snakes.
generateRand(snakes,0);
generateRand(ladders,1);
}
var uniqNumbers = []
for(var idx = 1; idx < 100; idx++){ uniqNumbers.push(idx) }
var results = []
function generateRand(arr,flag)
{
var valFlag = 0;
var minDiff = 8; // Minimum difference between start of snake/ladder to its end.
var temp;
for(var i = 0; i< arr.length; ++i) {
var valid = false
// This is the single place it still can hang, through with current size of arrays it is highly unlikely
do {
var random_1 = uniqNumbers.splice(Math.random() * uniqNumbers.length, 1)[0]
var random_2 = uniqNumbers.splice(Math.random() * uniqNumbers.length, 1)[0]
if (Math.abs(random_1 - random_2) < minDiff) {
// return numbers
uniqNumbers.push(random_1)
uniqNumbers.push(random_2)
} else {
valid = true
}
} while (!valid);
if(flag == 0) // Snake
{
if(random_1 < random_2) // Swapping them if the first number is smaller than the second number.
{
var temp = random_1; random_1 = random_2; random_2 = temp
}
}
else // Ladders
{
if(random_1>random_2) // Swapping them if the first number is greater than the second number.
{
var temp = random_1; random_1 = random_2; random_2 = temp
}
}
// Just for debug - look results up on console
results.push([random_1, random_2])
}
}
drawTable()
I had a problem like this using "HighCharts", in a for loop - "browsers" have an in-built functionality to detect dead scripts or infinite loops. So the browsers halts or pop-ups up a message saying not responding. Not sure if you have that symptom!
This was resulted from a "loop" with a large pool of data. I wrote a tutorial on it on CodeProject, you might try it, and it might be your answer.
http://www.codeproject.com/Tips/406739/Preventing-Stop-running-this-script-in-Browsers

Numeric sort of a stream that keeps alpha order when the number is the same

I tried a few of the regex sorts I found on SO, but I think they may not like the + symbol in the stream i'm needing to sort.
So I'm getting a data stream that looks like this (3 to 30 letters '+' 0 to 64000 number)
userString = "AAA+800|BBB+700|CCC+600|ZZZ+500|YYY+400|XXX+300|XXA+300|XXZ+300";
the output needs to be in the format:
array[0] = "XXA+300" // 300 being the lowest num and XXA being before XXX
array[...]
array[7] = "AAA+800"
I wish to order it from lowest num to highest num and reversed.
Here is my inefficient code. which loops 8x8 times. (my stream maybe 200 items long)
It works, but it looks messy. Can someone help me improve it so it uses less iterations?
var array = userString.split('|');
array.sort();
for(var i=0; i<len; i++) { // array2 contains just the numbers
bits = array[i].split('+');
array2[i] = bits[1];
}
array2.sort();
if(sort_order==2)
array2.reverse();
var c=0;
for(var a=0;a<len;a++) { // loop for creating array3 (the output)
for(var i=0; i<len ; i++) { // loop thru array to find matching score
bits = array[i].split('+');
if(bits[1] == array2[a]) { // found matching score
array3[c++] = bits[0]+'+'+bits[1]; // add to array3
array[i]='z+z'; // so cant rematch array position
}
}
}
array = array3;
Kind Regards
Please forgive the terse answer (and lack of testing), as I'm typing this on an iPhone.
var userArr = userString.split('|');
userArr.sort(function(a, b) {
var aArr = a.split('+'),
bArr = b.split('+'),
aLetters = aArr[0],
bLetters = bArr[0],
aNumbers = parseInt( aArr[1] ),
bNumbers = parseInt( bArr[1] );
if (aNumbers == bNumbers) {
return aLetters.localeCompare( bLetters );
}
return aNumbers - bNumbers;
/*
// Or, for reverse order:
return -(aNumbers - bNumbers);
// or if you prefer to expand your terms:
return -aNumbers + bNumbers;
*/
});
Basically we're splitting on | then doing a custom sort in which we split again on +. We convert the numbers to integers, then if they differ (e.g. 300 and 800) we compare them directly and return the result (because in that case the letters are moot). If they're the same, though (300 and 300) we compare the first parts (XXA and XXX) and return that result (assuming you want an ordinary alphabetical comparison). In this fashion the whole array is sorted.
I wasn't entirely sure what you meant by "and reversed" in your question, but hopefully this will get you started.
As you may've guessed this isn't totally optimal as we do split and parseInt on every element in every iteration, even if we already did in a previous iteration. This could be solved trivially by pre-processing the input, but with just 200 elements you probably won't see a huge performance hit.
Good luck!

alternatives for excessive for() looping in javascript

Situation
I'm currently writing a javascript widget that displays a random quote into a html element. the quotes are stored in a javascript array as well as how many times they've been displayed into the html element. A quote to be displayed cannot be the same quote as was previously displayed. Furthermore the chance for a quote to be selected is based on it's previous occurences in the html element. ( less occurrences should result in a higher chance compared to the other quotes to be selected for display.
Current solution
I've currently made it work ( with my severely lacking javascript knowledge ) by using a lot of looping through various arrays. while this currently works ( !! ) I find this solution rather expensive for what I want to achieve.
What I'm looking for
Alternative methods of removing an array element from an array, currently looping through the entire array to find the element I want removed and copy all other elements into a new array
Alternative method of calculating and selecting a element from an array based on it's occurence
Anything else you notice I should / could do different while still enforcing the stated business rules under Situation
The Code
var quoteElement = $("div#Quotes > q"),
quotes = [[" AAAAAAAAAAAA ", 1],
[" BBBBBBBBBBBB ", 1],
[" CCCCCCCCCCCC ", 1],
[" DDDDDDDDDDDD ", 1]],
fadeTimer = 600,
displayNewQuote = function () {
var currentQuote = quoteElement.text();
var eligibleQuotes = new Array();
var exclusionFound = false;
for (var i = 0; i < quotes.length; i++) {
var iteratedQuote = quotes[i];
if (exclusionFound === false) {
if (currentQuote == iteratedQuote[0].toString())
exclusionFound = true;
else
eligibleQuotes.push(iteratedQuote);
} else
eligibleQuotes.push(iteratedQuote);
}
eligibleQuotes.sort( function (current, next) {
return current[1] - next[1];
} );
var calculatePoint = eligibleQuotes[0][1];
var occurenceRelation = new Array();
var relationSum = 0;
for (var i = 0; i < eligibleQuotes.length; i++) {
if (i == 0)
occurenceRelation[i] = 1 / ((calculatePoint / calculatePoint) + (calculatePoint / eligibleQuotes[i+1][1]));
else
occurenceRelation[i] = occurenceRelation[0] * (calculatePoint / eligibleQuotes[i][1]);
relationSum = relationSum + (occurenceRelation[i] * 100);
}
var generatedNumber = Math.floor(relationSum * Math.random());
var newQuote;
for (var i = 0; i < occurenceRelation.length; i++) {
if (occurenceRelation[i] <= generatedNumber) {
newQuote = eligibleQuotes[i][0].toString();
i = occurenceRelation.length;
}
}
for (var i = 0; i < quotes.length; i++) {
var iteratedQuote = quotes[i][0].toString();
if (iteratedQuote == newQuote) {
quotes[i][1]++;
i = quotes.length;
}
}
quoteElement.stop(true, true)
.fadeOut(fadeTimer);
setTimeout( function () {
quoteElement.html(newQuote)
.fadeIn(fadeTimer);
}, fadeTimer);
}
if (quotes.length > 1)
setInterval(displayNewQuote, 10000);
Alternatives considered
Always chose the array element with the lowest occurence.
Decided against this as this would / could possibly reveal a too obvious pattern in the animation
combine several for loops to reduce the workload
Decided against this as this would make the code to esoteric, I'd probably wouldn't understand the code anymore next week
jsFiddle reference
http://jsfiddle.net/P5rk3/
Update
Rewrote my function with the techniques mentioned, while I fear that these techniques still loop through the entire array to find it's requirements, at least my code looks cleaner : )
References used after reading the answers here:
http://www.tutorialspoint.com/javascript/array_map.htm
http://www.tutorialspoint.com/javascript/array_filter.htm
http://api.jquery.com/jQuery.each/
I suggest array functions that are mostly supported (and easily added if not):
[].splice(index, howManyToDelete); // you can alternatively add extra parameters to slot into the place of deletion
[].indexOf(elementToSearchFor);
[].filter(function(){});
Other useful functions include forEach and map.
I agree that combining all the work into one giant loop is ugly (and not always possible), and you gain little by doing it, so readability is definitely the winner. Although you shouldn't need too many loops with these array functions.
The answer that you want:
Create an integer array that stores the number of uses of every quote. Also, a global variable Tot with the total number of quotes already used (i.e., the sum of that integer array). Find also Mean, as Tot / number of quotes.
Chose a random number between 0 and Tot - 1.
For each quote, add Mean * 2 - the number of uses(*1). When you get that that value has exceeded the random number generated, select that quote.
In case that quote is the one currently displayed, either select the next or the previous quote or just repeat the process.
The real answer:
Use a random quote, at the very maximum repeat if the quote is duplicated. The data usages are going to be lost when the user reloads/leaves the page. And, no matter how cleverly have you chosen them, most users do not care.
(*1) Check for limits, i.e. that the first or last quota will be eligible with this formula.
Alternative methods of removing an array element from an array
With ES5's Array.filter() method:
Array.prototype.without = function(v) {
return this.filter(function(x) {
return v !== x;
});
};
given an array a, a.without(v) will return a copy of a without the element v in it.
less occurrences should result in a higher chance compared to the other quotes to be selected for display
You shouldn't mess with chance - as my mathematician other-half says, "chance doesn't have a memory".
What you're suggesting is akin to the idea that numbers in the lottery that haven't come up yet must be "overdue" and therefore more likely to appear. It simply isn't true.
You can write functions that explicitly define what you're trying to do with the loop.
Your first loop is a filter.
Your second loop is a map + some side effect.
I don't know about the other loops, they're weird :P
A filter is something like:
function filter(array, condition) {
var i = 0, new_array = [];
for (; i < array.length; i += 1) {
if (condition(array[i], i)) {
new_array.push(array[i]);
}
}
return new_array;
}
var numbers = [1,2,3,4,5,6,7,8,9];
var even_numbers = filter(numbers, function (number, index) {
return number % 2 === 0;
});
alert(even_numbers); // [2,4,6,8]
You can't avoid the loop, but you can add more semantics to the code by making a function that explains what you're doing.
If, for some reason, you are not comfortable with splice or filter methods, there is a nice (outdated, but still working) method by John Resig: http://ejohn.org/blog/javascript-array-remove/

Categories