Spell Check API - javascript

I have implemented the Jazzy spell-check API in my project to find misspelled words and provide suggestions for them. I've downloaded a ".dic" file to use with it. However, the dictionary file doesn't list its words in alphabetical order. Could anyone point out the reason why?
Also, there is a getSuggestions() method, which provides the suggestions for the misspelled words. Could anyone explain how it determines which suggestion is displayed first?

If you are going to loop through an array of words and compare a string to them, it makes a lot of sense to put the more frequent words, like "the" and "for", near the beginning so that your loop finds the correct answer sooner.
There are many ways to determine "suggestions"; one is the Levenshtein distance:
https://en.wikipedia.org/wiki/Levenshtein_distance
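As a rough illustration of ranking by edit distance, here is a minimal sketch. It is not how Jazzy's getSuggestions() is actually implemented; the function names `levenshtein` and `suggest`, the sample word list, and the tie-breaking rule (earlier dictionary position wins, which rewards a frequency-ordered file) are all assumptions made for this example.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  // prev[j] = distance between the empty prefix of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // distance from the first i chars of a to an empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical ranking: smallest distance first; ties keep dictionary order,
// so words placed earlier in the file (e.g. more frequent ones) come first.
function suggest(word, dictionary, limit = 3) {
  return dictionary
    .map((candidate, index) => ({
      candidate,
      index,
      distance: levenshtein(word, candidate),
    }))
    .sort((x, y) => x.distance - y.distance || x.index - y.index)
    .slice(0, limit)
    .map((entry) => entry.candidate);
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(suggest("helo", ["help", "hello", "held", "world"]));
```

Because ties are resolved by position, the order of the dictionary file directly affects which suggestion appears first in this sketch.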

Related

Is there a PHP or JS algorithm that can filter out the nouns on a string?

I was working on a forum and thought of making a tag generator, something like Quora.com but simpler. So, first I "purified" the string – meaning removed some irrelevant words like "for", "in"...
But I couldn't figure out how to get only the nouns in the string. For example, this thread's title, "Is there a PHP or JS algorithm that can filter out the nouns on a string?", would give us:
PHP
JS
algorithm
nouns
string
This is more or less good and accurate. But I also don't want to use a noun-list because I don't want to waste half of my years writing it. I'll also be glad if you know any good noun-lists. Thank you.
You need a "lexical dictionary" (a dictionary that maintains metadata about words and the connections between them) like Princeton WordNet. This is an English word semantic database you can query to compare things like nouns and verbs, or even synonyms and hypernyms.
This obviously would run on your server. You would have to parse the strings on the server side (you could use Ajax if you want it to look like it's on the client). There is no feasible way to maintain an entire English dictionary in browser memory, and to search through it, with anything resembling good performance.

Combining RegEx's

I have two regexes that I am trying to combine. One is email-specific and the other checks certain special characters. I have arrived at this solution following much toying:
"^([-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}|[\\w\\-ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸçÇŒœßØøÅåÆæÞþÐð _]){0,80}$"
It does seem to check what I need it to, but, for instance, the following is still returned as valid: abc@foo. It does not force a full email address.
Am I using the correct approach, or is there a simpler way to structure this regex? I'm on a learning curve with regex, so all advice is appreciated.
Move the multiplier {0,80} inside the parentheses:
"^([-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}|[\\w\\-ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸçÇŒœßØøÅåÆæÞþÐð _]{0,80})$"
// here __^^^^^^^
Also, [a-zA-Z]{2,4} is really poor for validating TLDs; have a look at IANA.
And me@localhost is a valid email address.
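To see what moving the quantifier changes, here is a small sketch. It uses simplified, ASCII-only stand-ins for the two alternatives (an email-like pattern with `@` as the separator and a short character class), so the patterns and the names `email`, `text`, `quantifierOutside` and `quantifierInside` are assumptions for illustration, not the poster's full expression.

```javascript
// Simplified stand-ins for the two alternatives in the question.
const email = "[-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\\.[a-zA-Z]{2,4}";
const text = "[\\w\\- ]";

// Quantifier outside the group: up to 80 repetitions of "an email OR one character",
// so plain text and email addresses can be freely mixed.
const quantifierOutside = new RegExp("^(" + email + "|" + text + "){0,80}$");

// Quantifier inside the group: either exactly one email address,
// or up to 80 plain characters, never a mixture.
const quantifierInside = new RegExp("^(" + email + "|" + text + "{0,80})$");

for (const sample of ["a@b.com", "hello world", "hello a@b.com"]) {
  console.log(sample, quantifierOutside.test(sample), quantifierInside.test(sample));
}
```

With these stand-ins, only the mixed input "hello a@b.com" is treated differently: it is accepted when the quantifier sits outside the group and rejected when it sits inside.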

How to approach string length constraints when localization is brought into the equation?

Once there was a search input.
It was responsible for filtering data in a table based on user input.
But this search input was special: it would not do anything unless a minimum of 3 characters was entered.
Not because it was lazy, but because it didn't make sense otherwise.
Everything was good until a new and strange (compared to English) language came to town.
It was Japanese and now the minimum string length of 3 was stupid and useless.
I lost the last few pages of that story. Does anyone remember how it ends?
To fix the issue, you need to determine whether the user's input belongs to certain script(s). The most obvious way to do this is to use Unicode regular expressions:
var regexPattern = "[\\p{Katakana}\\p{Hiragana}\\p{Han}]+";
The only issue is that JavaScript does not support this kind of regular expression out of the box. Luckily, there is a JS library called XRegExp, and its Scripts add-on seems to do exactly what you need. Now the question is whether you want to require at least three characters for everyone except Japanese or Chinese users, or to do it the other way around: require at least three characters for certain scripts (Latin, Common, Cyrillic, Greek and Hebrew) while allowing any other script to be searched with a single character. I'd suggest the second solution:
if (XRegExp('[\\p{Latin}\\p{Common}\\p{Cyrillic}\\p{Greek}\\p{Hebrew}]+').test(input)) {
    // test for string length and call AJAX if the string is long enough
} else {
    // call AJAX search method
}
You might want to pre-compile the regular expression for better performance, but that's basically it.
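Engines that implement ES2018 Unicode property escapes (the `u` flag with `\p{Script=...}`) can express the same check without a library. The following is a minimal sketch under that assumption; `shouldSearch`, `NEEDS_MIN_LENGTH` and `MIN_LENGTH` are illustrative names, not part of XRegExp or of the code above, and the empty-input guard is an addition.

```javascript
// Pre-compiled once: matches any character from a script that should
// require the minimum length (same script list as the XRegExp version).
const NEEDS_MIN_LENGTH =
  /[\p{Script=Latin}\p{Script=Common}\p{Script=Cyrillic}\p{Script=Greek}\p{Script=Hebrew}]/u;
const MIN_LENGTH = 3;

// Returns true when the search request should be sent.
function shouldSearch(input) {
  const text = input.trim();
  if (text.length === 0) {
    return false; // nothing to search for
  }
  if (NEEDS_MIN_LENGTH.test(text)) {
    // Count code points rather than UTF-16 units.
    return Array.from(text).length >= MIN_LENGTH;
  }
  return true; // other scripts (e.g. Han, Hiragana) may search on one character
}

console.log(shouldSearch("ab"));  // false
console.log(shouldSearch("abc")); // true
console.log(shouldSearch("猫"));  // true
```

Note that the Common script covers digits, spaces and punctuation, so any input containing those falls under the three-character rule in this sketch, just as it would with the pattern above.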
I guess it mainly depends on where you get that minimum-length variable from. If it's hardcoded, you'd probably be better off using a dynamic internationalization module:
int.getMinStringLength(int.getCurrentLanguage())
Either you have a dynamic bindings framework such as AngularJS, or you update that module when the user changes the language.
Now maybe you'd want to group your supported languages using attributes such as "verbose" and "condensed".
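A lookup like that could be as small as the sketch below. Everything in it is hypothetical: the `getMinStringLength` implementation, the "verbose"/"condensed" grouping table and the language codes chosen are assumptions for illustration, not an existing module.

```javascript
// Hypothetical grouping of languages by how much meaning one character carries.
const LANGUAGE_GROUP = new Map([
  ["en", "verbose"],
  ["fr", "verbose"],
  ["ru", "verbose"],
  ["ja", "condensed"],
  ["zh", "condensed"],
]);

// Minimum search length per group.
const MIN_LENGTH_BY_GROUP = { verbose: 3, condensed: 1 };
const DEFAULT_GROUP = "verbose";

// Accepts tags such as "en", "en-US" or "zh-Hant"; only the primary subtag is used.
function getMinStringLength(language) {
  const primary = language.toLowerCase().split("-")[0];
  const group = LANGUAGE_GROUP.get(primary) || DEFAULT_GROUP;
  return MIN_LENGTH_BY_GROUP[group];
}

console.log(getMinStringLength("en-US")); // 3
console.log(getMinStringLength("ja"));    // 1
```

Unknown languages fall back to the stricter "verbose" limit here, which is a design choice rather than a requirement.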

Maps (hashtables) in the real world

I'm trying to explain Map (aka hash table, dict) to someone who's new to programming. While the concepts of Array (=list of things) and Set (=bag of things) are familiar to everyone, I'm having a hard time finding a real-world metaphor for Maps (I'm specifically interested in Python dicts and JavaScript Objects). The often-used dictionary/phone book analogy is incorrect, because dictionaries are sorted, while Maps are not - and this point is important to me.
So the question is: what would be a real-world phenomenon or device that behaves like a Map in computing?
I agree with delnan that the human example is probably too close to that of an object. This works well if you are trying to transition into explaining how objects are implemented in loosely typed languages; however, a map is a concept that exists in Java and C# as well. This could potentially be very confusing if they begin to use those languages.
Essentially you need to understand that maps are instant look-ups that rely on a unique set of values as keys. These two things really need to be stressed, so here's a decent yet highly contrived example:
Let's say you're having a party and everyone is supposed to bring one thing. To help the organizer, everyone says what their first name is and what they're bringing. Now let's pretend there are two ways to store this information. The first is by putting it down on a list and the second is by telling someone with an eidetic memory. The contrived part is that he can only identify you through your first name (so he's blind and has a cochlear implant so everyone sounds like a robot, best I can come up with).
List: To add, you just append to the bottom of the list. To back out, you just remove yourself from the list. If you want to see whether someone is bringing something and what they're bringing, then you have to scan the entire list until you find them. If you don't find them after scanning, then they're clearly not on the list and not bringing anything. The list would clearly allow duplicates of people with the same first name.
Dictionary (contrived person): You don't append to the end of a list, you just tell him someone's first name and what they're bringing. If you want to know what someone is bringing, you just ask by name and he immediately tells you. Likewise, if two people of the same name tell him they're bringing something, he'll think it's the same person just changing what they're bringing. If someone hasn't signed up and you ask by name, he'd be confused and ask you what you're talking about. Also, you would have to say that when you tell him someone is no longer bringing something, he loses all memory of them, so yeah, highly contrived.
You might also want to show why the list is sufficient if you don't care who brings what, but just need to know what all is being brought. Maybe even leave the names off the list, to stress key/value pairs with the dictionary.
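The party example translates fairly directly into code. This is a minimal sketch with made-up guest names; `guestList`, `findInList` and `memory` are illustrative names only.

```javascript
// List: append entries, allow duplicate names, scan to look someone up.
const guestList = [];
guestList.push({ name: "Ann", item: "chips" });
guestList.push({ name: "Bob", item: "salsa" });
guestList.push({ name: "Ann", item: "soda" }); // a second Ann is just another row

// Looking someone up means walking the whole list.
function findInList(name) {
  return guestList
    .filter((entry) => entry.name === name)
    .map((entry) => entry.item);
}

// Dictionary (the person with perfect memory): one value per unique key.
const memory = new Map();
memory.set("Ann", "chips");
memory.set("Bob", "salsa");
memory.set("Ann", "soda"); // same key: he assumes Ann changed her mind

console.log(findInList("Ann"));  // [ 'chips', 'soda' ]
console.log(memory.get("Ann"));  // soda
console.log(memory.get("Zed"));  // undefined ("what are you talking about?")

memory.delete("Bob");            // he loses all memory of Bob
console.log(memory.has("Bob"));  // false
```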
Perhaps it would be the analogy of a human being that you're meeting for the first time:
Each person has an unordered set of attributes, and each of these attributes can only have one value at a time (like hair=long, eye_color=blue). And you would discover these attributes in no particular order.
So a person can have shoe_size=38, hair_color=brown and eye_color=blue, and when reciting these to someone else (human_dict.get('shoe_size')) you would mention the attributes in no particular order, identifying each only by its attribute name.
I have seen cases where a large list of people were binned according to their last N digits of their identifying number, in order to save on key search. This binning is somewhat similar to hashing, and may help explain it.
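The binning idea can be sketched as a tiny hand-rolled hash table. The choice of the last two digits, the sample ID numbers and the names `bucketOf`, `addPerson` and `findPerson` are assumptions for illustration.

```javascript
// Bin people by the last two digits of their ID: 100 bins in total.
const BUCKET_COUNT = 100;
const buckets = Array.from({ length: BUCKET_COUNT }, () => []);

// The "hash function": the last two digits of the ID pick the bin.
function bucketOf(id) {
  return id % BUCKET_COUNT;
}

function addPerson(id, name) {
  buckets[bucketOf(id)].push({ id, name });
}

// Only one small bin is searched instead of the whole population.
function findPerson(id) {
  const hit = buckets[bucketOf(id)].find((person) => person.id === id);
  return hit ? hit.name : undefined;
}

addPerson(104217, "Ann");
addPerson(553017, "Bob"); // shares bin 17 with Ann
addPerson(880042, "Cy");

console.log(bucketOf(553017));   // 17
console.log(findPerson(553017)); // Bob
```

Two people landing in the same bin, as Ann and Bob do here, is what a hash table calls a collision; the short scan inside the bin resolves it.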
Were you successful in explaining the array in a logical way? That is, an array is storage where elements are kept at the first position, second position, third position, and so on. First, second and third are basically keys.
Now extend it to say that maps are storage where the keys are not necessarily numbers. Let's say they are strings, or even numbers which are not consecutive and have no relationship to each other.
Conversely, you could say an array A (of int) is a map where index 1 is mapped to A's address, index 2 to A's address + 4, and so on.
In some restaurants, when you place your order at the counter, they give you a number to identify your order. The numbers:
Don't need to be sorted.
Don't need to be consecutive.
The only idea of the numbers is that they can find your order easily. In the map/hash table/associative array world the number would be the key and your order the value.
After you finish your order, they can use the same number for another order. So the number is basically the identifier for an order at a certain point in time. This would fit the JavaScript Object example, where the properties of an object can change their value.
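In code, the restaurant counter might look like the sketch below; the ticket numbers and dishes are made up for the example.

```javascript
// Ticket number -> order. Numbers are neither sorted nor consecutive.
const orders = new Map();
orders.set(57, "burger");
orders.set(12, "salad");
orders.set(903, "pasta");

// The kitchen finds an order by its number alone.
console.log(orders.get(57)); // burger

// Order 57 is picked up, so the number becomes free again...
orders.delete(57);
console.log(orders.has(57)); // false

// ...and can be handed to the next customer for a different order.
orders.set(57, "soup");
console.log(orders.get(57)); // soup
```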

Split word into syllables

How can I split a word into syllables using JavaScript? Is there an API for that? Any help will be appreciated.
Now, back to being constructive, what I suggest is that you find a couple online dictionary sites and look at their APIs (I know dictionary.com has a free API) and see if you can use it to access just the word split into syllables from a lookup.
Unfortunately, from what I have read, it looks like you would really need a dictionary of words split already to check against and there aren't any standalone versions out there.
Be the first and post it somewhere! :)
