Parsing localized JavaScript number strings

In JavaScript, I want to present a number to users in a format they understand so that they can edit it. Consequently, the system will need to parse a localized number.
This is because if they are in France they are likely to prefer to edit the number "1.000.000,5" whereas if they are in Australia, they are likely to prefer to edit the number "1 000 000.5" or "1,000,000.5". (To clarify the scope of the question: my code shouldn't have to know about the individual rules of this or that locale. Does any country use ! as a decimal point? I don't know, and I don't want to know.)
Modern JavaScript provides the Intl.NumberFormat API, but it only seems to deal with formatting numbers, not parsing them.
How can I parse a localized number?

The rules are going to have to be somewhere: a reasonable place is in your program, or you can use an external library if one exists. To answer your question in general: a number is broken up into groups of three digits to represent thousands, although some countries use groups of a different size; Japan, for example, groups numbers differently. Here is a script that breaks a number up into groups of three with a given spacing system:
https://repl.it/Eu7I/2
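One place where those rules already live is the locale data behind Intl.NumberFormat. The sketch below is a minimal illustration, not a complete parser: it assumes plain decimal input (no currency, percent signs, exponents or bidi control characters), asks the formatter via formatToParts which group, decimal and minus symbols and which digits a locale uses, and maps the user's string back to the form Number() understands. parseLocaleNumber is a hypothetical helper name.

```javascript
// Sketch: parse a string written in the number conventions of `locale` by
// asking Intl.NumberFormat which symbols that locale uses when formatting.
// Assumption: plain decimal numbers only (no currency, percent, exponents).
function parseLocaleNumber(text, locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(-1234567.8);
  const symbol = (type, fallback) =>
    parts.find((part) => part.type === type)?.value ?? fallback;
  const group = symbol("group", "");
  const decimal = symbol("decimal", ".");
  const minus = symbol("minusSign", "-");

  // Map the locale's own digits (e.g. Arabic-Indic digits) back to ASCII 0-9.
  const digitFormat = new Intl.NumberFormat(locale, { useGrouping: false });
  const digits = new Map();
  for (let d = 0; d <= 9; d++) digits.set(digitFormat.format(d), String(d));

  let normalized = "";
  for (const ch of text.trim()) {
    if (digits.has(ch)) normalized += digits.get(ch);
    else if (ch === decimal) normalized += ".";
    else if (ch === minus || ch === "-") normalized += "-";
    // Lenient choice: any whitespace also counts as a grouping separator,
    // so "1 000 000.5" is accepted even where the locale groups with ",".
    else if (ch === group || /\s/.test(ch)) continue;
    else return NaN; // a character this locale would never produce
  }
  // Number("") is 0, so treat empty input as "not a number" explicitly.
  return normalized === "" ? NaN : Number(normalized);
}
```

With this sketch, parseLocaleNumber("9,521", "en-US") gives 9521 while parseLocaleNumber("9,521", "fr-FR") gives 9.521: the code never needs to know any locale's rules itself, but it still has to be told which locale the user is in, which is the point the next answer makes.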

I don't think it's possible without having pre-set the format and knowing what you are converting to/from.
For example, how can a function differentiate between 9,521 in the US and in France? In the US, that's over nine thousand, in France it's nine and a half (and a bit).
I'd recommend you keep a list of regexes for the different formats you will be displaying (and accepting as input) and use the appropriate one to parse the number when you read it in.
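A minimal sketch of that regex-per-format approach, assuming the app supports exactly the three example formats shown. FORMATS and parseWithFormat are hypothetical names; each entry validates the whole string, then strips group separators and swaps the decimal separator for ".".

```javascript
// Hypothetical table: one entry per number format the app displays and accepts.
// Each pattern accepts either properly grouped digits or no grouping at all.
const FORMATS = {
  "en-US": {
    pattern: /^-?(\d{1,3}(,\d{3})*|\d+)(\.\d+)?$/,
    group: /,/g,
    decimal: ".",
  },
  "fr-FR": {
    // Space, no-break space or narrow no-break space between groups.
    pattern: /^-?(\d{1,3}([ \u00A0\u202F]\d{3})*|\d+)(,\d+)?$/,
    group: /[ \u00A0\u202F]/g,
    decimal: ",",
  },
  "de-DE": {
    pattern: /^-?(\d{1,3}(\.\d{3})*|\d+)(,\d+)?$/,
    group: /\./g,
    decimal: ",",
  },
};

function parseWithFormat(text, formatKey) {
  const format = FORMATS[formatKey];
  const trimmed = text.trim();
  if (!format || !format.pattern.test(trimmed)) return NaN;
  // Drop the group separators, then turn the decimal separator into ".".
  return Number(trimmed.replace(format.group, "").replace(format.decimal, "."));
}
```

Because every pattern is anchored, input such as "1,00" is rejected for en-US instead of being silently read as 100.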

Related

JavaScript Unicode standard format

Is there any standard on how to write Unicode characters in JavaScript/JSON?
For instance, is there any difference between \u011b and \u011B? Most web examples use the second format. Also, for characters up to U+00FF there is a short format like \xe1. Which format is preferable (or standard)? Is it good practice to mix these formats, and what about performance?
For the first question: both versions are valid. It is more a matter of coding convention: prefer the convention already used in your files or project. Then check your community (the conventions of other programs you rely on heavily, and what they prefer), and only as a last resort pick one yourself. In any case, be consistent.
Personally I prefer none of them in code: UTF-8 is so widely used, and browsers should understand it, that I would put the character in directly (as a character, not as an escape sequence). If the code point matters, I would add it in a comment. It is reasonable to expect all developers and tools to have UTF-8 editors.
JavaScript strings are sequences of 16-bit code units (often described as UCS-2, the precursor of UTF-16), so code points outside the Basic Multilingual Plane, such as many emoji, take two units (a surrogate pair).
The \xHH byte format should not be used for text: it hides the meaning. There are exceptions, e.g. checking which encoding you received from the user, or detecting a BOM, but only for such signatures. For other binary data it is fine to use escapes such as \x1e, e.g. for key identification.
Note: you should really follow a single coding style guide. Search for one and you will find many, e.g. Google's (which may be too much): https://google.github.io/styleguide/jsguide.html
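A short illustration of the points above (standard JavaScript string semantics, nothing library-specific): hex digits in escapes are case-insensitive, \xHH only reaches U+00FF, and a character outside the Basic Multilingual Plane occupies two code units unless you use the ES2015 \u{...} code point escape.

```javascript
// All of these denote the same one-character string ("ě", U+011B);
// the hex digits in an escape are case-insensitive.
const lower = "\u011b";
const upper = "\u011B";
const literal = "ě";
console.log(lower === upper && upper === literal); // true

// \xHH is shorthand for \u00HH, so it only covers U+0000..U+00FF.
console.log("\xe1" === "\u00e1"); // true ("á")

// U+1F600 lies outside the BMP: two 16-bit code units (a surrogate pair)...
const grin = "\uD83D\uDE00";
console.log(grin.length); // 2
// ...which the ES2015 code point escape writes as a single escape.
console.log(grin === "\u{1F600}"); // true
console.log(grin.codePointAt(0).toString(16)); // "1f600"
```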

Storing more info in QR Code

I am trying to develop a hybrid mobile app with QR code functionality. A QR code can only store a limited number of characters, so I am wondering: is it possible to compress the string to make it shorter, so that I can store more information in the QR code?
At lengths that short, most compression algorithms will actually make data longer, not shorter. There are some algorithms which may work well, though… smaz comes to mind. However, it is going to depend heavily on what you are trying to compress, and you haven't really provided any information about that.
Instead of thinking about compression, your best bet may be to find an encoding scheme which makes more sense for your data. For example, if you're encoding a date and time, store it as a single number instead of text. Think about whether you really need seconds. If you are storing numbers, consider using variable-length quantities. If your data is JSON, consider using protobuf instead.
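As one example of a variable-length quantity, here is a minimal sketch of an unsigned LEB128-style varint (the scheme protobuf uses for integers): 7 data bits per byte, with the high bit set when more bytes follow. encodeVarint and decodeVarint are hypothetical names; arithmetic is used instead of bit shifts so values above 2^31 work.

```javascript
// Unsigned varint: 7 data bits per byte, high bit = "more bytes follow".
function encodeVarint(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const bytes = [];
  do {
    let byte = n % 128;
    n = Math.floor(n / 128);
    if (n > 0) byte |= 0x80; // continuation bit
    bytes.push(byte);
  } while (n > 0);
  return Uint8Array.from(bytes);
}

// Returns the decoded value and the offset just past it.
function decodeVarint(bytes, offset = 0) {
  let value = 0;
  let scale = 1;
  let i = offset;
  for (;;) {
    const byte = bytes[i++];
    if (byte === undefined) throw new RangeError("truncated varint");
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) break;
    scale *= 128;
  }
  return { value, next: i };
}
```

For example, a timestamp stored as whole minutes since the Unix epoch, such as Math.floor(Date.UTC(2024, 0, 1) / 60000), encodes to 4 bytes, whereas the text "2024-01-01T00:00" takes 16.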
If what you have really is text, it may be worth considering coming up with your own character set. Instead of ASCII, where each character takes 8 bits, can you limit yourself to 64 characters? a-z, A-Z, 0-9, and two punctuation characters are only 64 possible symbols; if that is all you need, you could use a 6-bit encoding. If the strings aren't case-sensitive, you have tons of room for punctuation.
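A minimal sketch of such a 6-bit encoding. The alphabet below (with "-" and "." as the two punctuation characters) is an assumption for illustration, as are the names pack6 and unpack6. Bits are packed most-significant first; the decoder needs the character count because the final byte may contain padding bits.

```javascript
// Hypothetical 64-symbol alphabet: a-z, A-Z, 0-9, "-" and ".".
const ALPHABET =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.";

// Pack each character into 6 bits, most significant bit first.
function pack6(text) {
  const out = new Uint8Array(Math.ceil((text.length * 6) / 8));
  let bitPos = 0;
  for (const ch of text) {
    const code = ALPHABET.indexOf(ch);
    if (code < 0) throw new RangeError(`character not in alphabet: ${ch}`);
    for (let b = 5; b >= 0; b--) {
      if ((code >> b) & 1) out[bitPos >> 3] |= 0x80 >> (bitPos & 7);
      bitPos++;
    }
  }
  return out;
}

// Unpack `length` characters; the length must be stored or known separately.
function unpack6(bytes, length) {
  let text = "";
  let bitPos = 0;
  for (let i = 0; i < length; i++) {
    let code = 0;
    for (let b = 0; b < 6; b++) {
      const bit = (bytes[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
      code = (code << 1) | bit;
      bitPos++;
    }
    text += ALPHABET[code];
  }
  return text;
}
```

With this sketch, the 14-character string "Hello-World.42" packs into 11 bytes instead of 14.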

JavaScript - display number as non standard index?

In my JS, I've got a generated number (fairly enormous; it's normally between 95^5 and 95^10).
How do I stop this number from being displayed in standard form (scientific notation)?
You can't, at least not exactly with a plain Number. A Number is a 64-bit floating-point value, so integers above 2^53 cannot all be represented exactly, and the scientific notation helps emphasize that fact.
That said, you might be able to do some string manipulation on it: strip out the . and use the exponent to work out how many zeroes to append. Obviously the result won't be accurate, but that is because of the limited precision I mentioned.
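Since ES2020, JavaScript also has BigInt, which represents integers exactly; because the numbers here are integer powers, that may sidestep the problem entirely. The sketch below contrasts it with a Number formatted via toLocaleString with useGrouping: false, which prints without an exponent but whose trailing digits only reflect the nearest double, not the true value.

```javascript
// Exact: BigInt arithmetic keeps every digit and never uses exponent notation.
const exact = 95n ** 10n;
console.log(exact.toString()); // "59873693923837890625"

// Approximate: String() switches to exponent notation for values >= 1e21.
const approx = 95 ** 20;
console.log(String(approx)); // exponent form, e.g. "3.58...e+39"

// Formatting without grouping prints all digits, but beyond roughly the
// first 17 significant digits they are not meaningful.
console.log(approx.toLocaleString("en-US", { useGrouping: false }));
```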

How to approach string length constraints when localization is brought into the equation?

Once there was a search input.
It was responsible for filtering data in a table based on user input.
But this search input was special: it would not do anything unless a minimum of 3 characters was entered.
Not because it was lazy, but because it didn't make sense otherwise.
Everything was good until a new and strange (compared to English) language came to town.
It was Japanese and now the minimum string length of 3 was stupid and useless.
I lost the last few pages of that story. Does anyone remember how it ends?
In order to fix the issue, you obviously need to determine whether the user's input belongs to certain script(s). The most obvious way to do this is to use Unicode regular expressions:
var regexPattern = "[\\p{Katakana}\\p{Hiragana}\\p{Han}]+";
The only issue is that older JavaScript engines do not support this kind of regular expression out of the box (ES2018 added Unicode property escapes such as \p{Script=Han}, used with the u flag). Anyway, you are lucky: there is a JS library called XRegExp, and its Unicode Scripts add-on seems to be exactly what you need. Now the question is whether you want to require at least three characters for non-Japanese or non-Chinese users, or the other way around: require at least three characters for certain scripts (Latin, Common, Cyrillic, Greek and Hebrew) while allowing any other script to be searched with a single character. I'd suggest the second solution:
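if (XRegExp('^[\\p{Latin}\\p{Common}\\p{Cyrillic}\\p{Greek}\\p{Hebrew}]+$').test(input)) {
  // the whole input is in these scripts: check the string length and call AJAX only if it is long enough
} else {
  // the input contains another script (e.g. Japanese): call the AJAX search method right away
}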
You might want to pre-compile the regular expression for better performance, but that's basically it.
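On engines with ES2018 Unicode property escapes, the same pre-compiled check can be written without XRegExp. A minimal sketch, assuming the same script list as above; shouldSearch and its default minimum of 3 are hypothetical names, and the AJAX call itself is left to the caller.

```javascript
// Pre-compiled once. Inputs written only in these scripts need a minimum
// length before searching; the list mirrors the XRegExp example above.
const ALPHABETIC_ONLY =
  /^[\p{Script=Latin}\p{Script=Common}\p{Script=Cyrillic}\p{Script=Greek}\p{Script=Hebrew}]+$/u;

function shouldSearch(input, minAlphabeticLength = 3) {
  const trimmed = input.trim();
  if (trimmed === "") return false;
  if (ALPHABETIC_ONLY.test(trimmed)) {
    // Count code points rather than UTF-16 code units.
    return [...trimmed].length >= minAlphabeticLength;
  }
  // Contains e.g. Han, Hiragana or Katakana: one character is already meaningful.
  return true;
}
```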
I guess it mainly depends on where you get that min length variable from. If it's hardcoded, you'd probably be better off using a dynamic internationalization module:
int.getMinStringLength(int.getCurrentLanguage())
Either you have a dynamic bindings framework such as AngularJS, or you update that module when the user changes the language.
Then you might want to group your supported languages using attributes such as "verbose" and "condensed".
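A minimal sketch of such a grouping, assuming the minimum length is looked up by language tag. LANGUAGE_GROUPS, the language lists, the lengths and getMinStringLength are all hypothetical; only the primary subtag ("ja" in "ja-JP") is considered.

```javascript
// Hypothetical grouping: "condensed" languages carry more meaning per character.
const LANGUAGE_GROUPS = {
  condensed: { languages: ["ja", "zh"], minSearchLength: 1 },
  verbose: { languages: ["en", "fr", "de", "ru"], minSearchLength: 3 },
};
const DEFAULT_MIN_SEARCH_LENGTH = 3;

function getMinStringLength(languageTag) {
  // Use only the primary language subtag, e.g. "ja" from "ja-JP".
  const primary = languageTag.toLowerCase().split("-")[0];
  for (const group of Object.values(LANGUAGE_GROUPS)) {
    if (group.languages.includes(primary)) return group.minSearchLength;
  }
  return DEFAULT_MIN_SEARCH_LENGTH;
}
```

In a browser, the tag could come from navigator.language, or from whatever language the user has selected in the app.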

Match Phone Numbers Regardless of Formatting

I've written a query for Mongo to search for a phone number. The gotcha is the phone entry is a String rather than a Number. At first I thought it was working fine, however now I realize that if the query isn't formatted correctly it will not match.
So I guess my question is what's the easiest way of matching a phone number regardless of formatting?
Worst-case scenario, I use a $where statement and check equality by stripping non-numeric characters from both values and doing a regex match on that. I'm just wondering if there is a better way of doing this?
I would store the phone numbers normalized in the DB in the first place (e.g. either stripped of non-numeric characters, or formatted in a standard format). Since they are not normalized yet, doing it on the fly for each search request will be expensive. If you don't have too many entries yet (e.g. if this is all still in development), a script that normalizes all entries in one shot will do; on a production system you can run it in several batches during off-peak hours.
Then your where clause just has to normalize the input, and the search becomes much simpler.
The same goes for addresses, by the way: you have to normalize the data to search it well, or you'll have to develop a fuzzy matching algorithm, which is simply going to be slower (and might take you more time than you think).
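A minimal sketch of the "strip non-numeric characters" normalization, applied the same way to stored values and to search input. normalizePhone, phoneQuery and the phoneNormalized field are hypothetical names; the field is assumed to be filled by the one-off migration script described above.

```javascript
// Keep digits only, so "(555) 123-4567" and "555.123.4567" compare equal.
function normalizePhone(raw) {
  return String(raw).replace(/\D/g, "");
}

// Build a plain equality filter on the pre-normalized field, which can use
// an index instead of running $where over every document.
function phoneQuery(userInput) {
  return { phoneNormalized: normalizePhone(userInput) };
}
```

An index on the field (e.g. db.users.createIndex({ phoneNormalized: 1 }) in the mongo shell, where users is a placeholder collection name) keeps the lookup fast.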
