I've written a query for Mongo to search for a phone number. The gotcha is the phone entry is a String rather than a Number. At first I thought it was working fine, however now I realize that if the query isn't formatted correctly it will not match.
So I guess my question is what's the easiest way of matching a phone number regardless of formatting?
Worst case, I use a $where statement and check equality by stripping the non-numeric characters from both values and comparing what is left. Is there a more efficient way of doing this?
I would store the phone numbers normalized in the DB in the first place (e.g. either stripped of non-numeric characters, or formatted in a standard format). Since they are not already normalized, doing it on the fly for each search request will be expensive. If you don't have too many entries yet (e.g. if this is still all in development), you can run a script that normalizes all entries in one shot (or in several batches during off-peak hours if you have a production system).
Then your where clause will just normalize the input, and then the search will be much easier.
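A minimal sketch of that normalization, assuming "normalized" means digits only. The field name phoneNormalized is made up for illustration; the same function would be used both by the one-off migration script and on every search input.

```javascript
// Strip everything that is not a digit, e.g. "(555) 123-4567" -> "5551234567".
function normalizePhone(raw) {
  return String(raw).replace(/\D/g, '');
}

// Build an exact-match query document against the normalized field.
// "phoneNormalized" is a hypothetical field name, not part of the original schema.
function buildPhoneQuery(userInput) {
  return { phoneNormalized: normalizePhone(userInput) };
}
```

With the stored values normalized the same way, the search becomes a plain equality match that can use an ordinary index instead of $where.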
The same goes for addresses, by the way: you have to normalize the data to get good search results, or else develop some fuzzy matching algorithm, which is simply going to be slower (and might take you more time than you think).
In Javascript, I want to present a number to a user in a format they understand so that they can edit it. Consequently, the system will need to parse an international number.
This is because if they are in France they are likely to prefer to edit the number "1.000.000,5" whereas if they are in Australia, they are likely to prefer to edit the number "1 000 000.5" or "1,000,000.5". (To clarify the scope of the question: my code shouldn't have to know about the individual rules of this or that locale. Does any country use ! as a decimal point? I don't know, and I don't want to know.)
Modern Javascript provides the Intl.NumberFormat API, but it only seems to deal with producing numbers, not parsing them.
How can I parse a localized number?
The rules are going to have to be somewhere, and a reasonable place is in your program, or in an external library if one exists. To answer your question generally: a number is broken up into groups of three digits to represent thousands, although some countries group differently (Japan, for example). Here is a script that breaks a number up into groups of three with a given separator.
https://repl.it/Eu7I/2
I don't think it's possible without having pre-set the format and knowing what you are converting to/from.
For example, how can a function differentiate between 9,521 in the US and in France? In the US, that's over nine thousand, in France it's nine and a half (and a bit).
I'd recommend you keep a list of regexes for the different formats you will be displaying (and allowing as input) and use the appropriate one to parse the number when you read it in.
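One way to avoid maintaining that list by hand, as long as the locale is known up front, is to ask Intl.NumberFormat which separators it uses for that locale and reverse them. This is only a sketch: the function name is made up, it assumes ASCII digits, and it depends on the runtime shipping locale data for the locale you pass in.

```javascript
// Parse a number written in the given locale's format.
// Assumes ASCII digits and that the runtime has locale data for `locale`.
function parseLocalizedNumber(text, locale) {
  // Format a sample number to discover the group and decimal separators.
  const parts = new Intl.NumberFormat(locale).formatToParts(1234567.5);
  const find = (type) => (parts.find((p) => p.type === type) || {}).value;
  const group = find('group');
  const decimal = find('decimal') || '.';

  let cleaned = String(text).trim();
  if (group) cleaned = cleaned.split(group).join('');
  cleaned = cleaned.split(decimal).join('.');

  // Reject anything that is not a plain decimal number after cleanup.
  return /^[+-]?\d+(\.\d+)?$/.test(cleaned) ? Number(cleaned) : NaN;
}
```

Called with 'en-US', the string "9,521" is read as nine thousand and something; called with 'de-DE', the same string is read as nine and a bit. The ambiguity described above is still there; the caller resolves it by naming the locale.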
I would like to generate a unique number from a string. The string is a combination of username and password. I would like to generate a unique numeric ID (not a string) from this combination. I first md5 the combination and then convert it to a number. The number needs to be 10 digits long. Any suggestions?
It would be best if you can provide more details about the third-party you're trying to interface with, because this is a very odd request and it contains a fundamental flaw. You ask for the number to be unique, but you are allowing for only 10 decimal ("number id") digits, or ~10 billion possible values.
This sounds like an awful lot but it's really not. This gives you a hash of just over 33 bits. The simple hash collision probability calculator at http://davidjohnstone.net/pages/hash-collision-probability puts this at a 44% chance of a collision at just 100,000 entries. But that assumes full usage of all the available input characters. Since username and password combinations are almost always limited to alphabetic and numeric characters, the real collision chance is much worse at far fewer entries (can't be calculated without knowing the characters you allow for these fields - but it's bad).
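For reference, the figure above can be reproduced with the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2N), where n is the number of entries and N the number of possible hash values. The sketch below assumes a uniformly distributed 33-bit hash; the function name is made up.

```javascript
// Approximate probability of at least one collision among `entries`
// uniformly random values drawn from a space of 2^bits values.
function collisionProbability(entries, bits) {
  const space = Math.pow(2, bits);
  return 1 - Math.exp(-(entries * (entries - 1)) / (2 * space));
}

// 100,000 entries in a 33-bit space comes out at roughly 0.44.
const p = collisionProbability(100000, 33);
```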
NodeJS provides numerous crypto functions in the crypto module. A whole set of hashing functions is available, including the ideal-case SHA* options. These can be used to produce safe, irreversible hashes with astronomically low collision probabilities.
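As a rough illustration, this is what a SHA-256 digest looks like with Node's built-in crypto module. The helper name is made up. Note that the result is a 64-character hex string, not a 10-digit number, which is the whole point: squeezing it into 10 digits throws the collision resistance away.

```javascript
const crypto = require('crypto');

// Return the SHA-256 digest of a string as 64 hex characters.
function sha256Hex(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}
```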
If these options are not usable for you, I would suggest you have a fundamental design flaw. You're almost certainly mapping a user/pass combination to a userID in a remote system in a way that an attacker would find easy to compromise with a simple brute-force attack, given the high collision risk in your model.
If you are doing what I think you are doing, the "right" way to do this would be to have a simple database on a server somewhere. The user/pass would be assigned a unique ID in there, and it doesn't matter what this is - it could be an auto-increment ID field in a single MySQL table. The server would then contact this remote service with the ID value for any API calls necessary, and return the results to the user. This eliminates the security risk because the username/password are not actually hashed, just stored, and can be checked 100% on every call.
Never use a hash as a primary data value. It's a simplification, not a real value on its own.
I am trying to develop a hybrid mobile app with QR code functionality. A QR code can only store a limited number of characters, so I am wondering: is it possible to compress the string to make it shorter, so that I can store more info in the QR code?
At lengths that short, most compression algorithms will actually make data longer, not shorter. There are some algorithms which may work well, though… smaz comes to mind. However, it is going to depend heavily on what you are trying to compress, and you haven't really provided any information about that.
Instead of thinking about compression, your best bet may be to find an encoding scheme which makes more sense for your data. For example, if you're encoding a date and time, store it as a single number instead of text. Think about whether you really need seconds. If you are storing numbers, consider using variable-length quantities. If your data is JSON, consider using protobuf instead.
If what you have really is text, it may be worth coming up with your own character set. Instead of ASCII, where each character is 8 bits, can you limit yourself to 64 characters? a-z, A-Z, 0-9, and two punctuation characters make only 64 possible symbols… if that is all you need, you could use a 6-bit encoding. If the strings aren't case-sensitive, you have tons of room for punctuation.
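A sketch of such a 6-bit packing, assuming a 64-symbol alphabet of a-z, A-Z, 0-9, space and period (the choice of the two punctuation characters is arbitrary, and the function names are made up). Each character costs 6 bits instead of 8, so the packed form is about 25% smaller.

```javascript
// 64 symbols: a-z, A-Z, 0-9, space and period (an arbitrary choice).
const ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .';

// Pack text into an array of byte values, 6 bits per character.
function pack6(text) {
  const bytes = [];
  let buffer = 0;
  let bitCount = 0;
  for (const ch of text) {
    const code = ALPHABET.indexOf(ch);
    if (code === -1) throw new Error('Unsupported character: ' + ch);
    buffer = (buffer << 6) | code;
    bitCount += 6;
    while (bitCount >= 8) {
      bitCount -= 8;
      bytes.push((buffer >> bitCount) & 0xff);
    }
    buffer &= (1 << bitCount) - 1; // keep only the bits not yet written
  }
  // Flush the remaining bits, padded with zeros on the right.
  if (bitCount > 0) bytes.push((buffer << (8 - bitCount)) & 0xff);
  return bytes;
}

// Reverse of pack6; `length` is the original character count.
function unpack6(bytes, length) {
  let text = '';
  let buffer = 0;
  let bitCount = 0;
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bitCount += 8;
    while (bitCount >= 6 && text.length < length) {
      bitCount -= 6;
      text += ALPHABET[(buffer >> bitCount) & 0x3f];
    }
    buffer &= (1 << bitCount) - 1;
  }
  return text;
}
```

The packed bytes would then go into the QR code in its binary (byte) mode; the original length has to be stored or implied somewhere so the padding bits can be ignored.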
Once there was a search input.
It was responsible for filtering data in a table based on user input.
But this search input was special: it would not do anything unless a minimum of 3 characters was entered.
Not because it was lazy, but because it didn't make sense otherwise.
Everything was good until a new and strange (compared to English) language came to town.
It was Japanese and now the minimum string length of 3 was stupid and useless.
I lost the last few pages of that story. Does anyone remember how it ends?
To fix the issue, you need to determine whether the user's input belongs to certain script(s). The most obvious way to do this is to use Unicode regular expressions:
var regexPattern = "[\\p{Katakana}\\p{Hiragana}\\p{Han}]+";
The only issue is that JavaScript did not support this kind of regular expression out of the box when this was written (engines implementing ES2018 accept \p{Script=...} with the u flag). Anyway, you are lucky: there is a JS library called XRegExp, and its Scripts add-on seems to be exactly what you need. Now, the question is whether you want to require at least three characters for everyone except Japanese or Chinese users, or do it the other way around: require at least three characters for certain scripts (Latin, Common, Cyrillic, Greek and Hebrew) while allowing any other script to be searched on a single character. I'd suggest the second solution:
if (XRegExp('[\\p{Latin}\\p{Common}\\p{Cyrillic}\\p{Greek}\\p{Hebrew}]+').test(input)) {
    // test for string length and call AJAX if the string is long enough
} else {
    // call AJAX search method
}
You might want to pre-compile the regular expression for better performance, but that's basically it.
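For engines with native Unicode property escapes (ES2018 and later), the same idea can be sketched without a library. Note that this sketch takes the first of the two options above, not the one suggested: it special-cases Han, Hiragana and Katakana and keeps the three-character minimum for everything else. The function name and the default of 3 are assumptions.

```javascript
// Pre-compiled once: matches any Han, Hiragana or Katakana character.
const CJK_PATTERN = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;

// Decide whether the input is long enough to trigger the search.
function shouldSearch(input, minLength = 3) {
  const text = input.trim();
  if (text.length === 0) return false;
  // A single CJK character is already a meaningful query.
  return CJK_PATTERN.test(text) || text.length >= minLength;
}
```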
I guess it mainly depends on where you get that min length variable from. If it's hardcoded, you'd probably better use a dynamic internationalization module:
int.getMinStringLength(int.getCurrentLanguage())
Either you have a dynamic bindings framework such as AngularJS, or you update that module when the user changes the language.
Now maybe you'd want to sort your supported languages by using grouping attributes such as "verbose" and "condensed".
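A sketch of what such a module could look like, with languages mapped to the "verbose" and "condensed" groups. The language list and the minimum lengths are placeholder assumptions, not recommendations.

```javascript
// Minimum search length per group (placeholder values).
const MIN_LENGTH_BY_GROUP = { verbose: 3, condensed: 1 };

// Which group each language belongs to (illustrative, incomplete list).
const GROUP_BY_LANGUAGE = { en: 'verbose', fr: 'verbose', ja: 'condensed', zh: 'condensed' };

// Accepts tags such as "ja", "ja-JP" or "en-US"; unknown languages
// fall back to the "verbose" group.
function getMinStringLength(languageTag) {
  const primary = String(languageTag).toLowerCase().split('-')[0];
  const known = Object.prototype.hasOwnProperty.call(GROUP_BY_LANGUAGE, primary);
  const group = known ? GROUP_BY_LANGUAGE[primary] : 'verbose';
  return MIN_LENGTH_BY_GROUP[group];
}
```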
I would like to write a JavaScript function that validates a zip code, by checking if the zip code actually exists. Here is a list of all zip codes:
http://www.census.gov/tiger/tms/gazetteer/zips.txt (I only care about the 2nd column)
This is really a compression problem. I would like to do this for fun. OK, now that's out of the way, here is a list of optimizations over a straight hashtable that I can think of, feel free to add anything I have not thought of:
Break zipcode into 2 parts, first 2 digits and last 3 digits.
Make a giant if-else statement first checking the first 2 digits, then checking ranges within the last 3 digits.
Or, convert the zips into hex, and see if I can do the same thing using smaller groups.
Find out if within the range of all valid zip codes there are more valid zip codes vs invalid zip codes. Write the above code targeting the smaller group.
Break up the hash into separate files, and load them via Ajax as user types in the zipcode. So perhaps break into 2 parts, first for first 2 digits, second for last 3.
Lastly, I plan to generate the JavaScript files using another program, not by hand.
Edit: performance matters here. I do want to use this, if it doesn't suck. Performance of the JavaScript code execution + download time.
Edit 2: JavaScript only solutions please. I don't have access to the application server, plus, that would make this into a whole other problem =)
You could do the unthinkable and treat the code as a number (remember that it's not actually a number). Convert your list into a series of ranges, for example:
zips = [10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]
// becomes
zips = [[10000,10003], [23001,23003], [36001,36001]]
// make sure to keep this sorted
then to test:
function isValidZip(myzip) {
    // linear scan over the sorted [start, end] ranges
    for (var i = 0, l = zips.length; i < l; ++i) {
        if (myzip >= zips[i][0] && myzip <= zips[i][1]) {
            return true;
        }
    }
    return false;
}
isValidZip(23002); // true
This is just using a very naive linear search (O(n)). If you kept the list sorted and used binary searching, you could achieve O(log n).
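The binary search variant could look roughly like this. It assumes the ranges are sorted and do not overlap; the helper that collapses a sorted list into ranges is included so the example is self-contained. Both function names are made up.

```javascript
// Collapse a sorted list of numeric codes into [start, end] ranges.
function toRanges(sortedZips) {
  const ranges = [];
  for (const zip of sortedZips) {
    const last = ranges[ranges.length - 1];
    if (last && zip === last[1] + 1) {
      last[1] = zip; // extend the current run
    } else {
      ranges.push([zip, zip]); // start a new run
    }
  }
  return ranges;
}

// Binary search over sorted, non-overlapping ranges: O(log n).
function isInRanges(ranges, zip) {
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (zip < ranges[mid][0]) {
      hi = mid - 1;
    } else if (zip > ranges[mid][1]) {
      lo = mid + 1;
    } else {
      return true;
    }
  }
  return false;
}
```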
I would like to write a JavaScript function that validates a zip code
Might be more effort than it's worth, keeping it updated so that at no point someone's real valid ZIP code is rejected. You could also try an external service, or do what everyone else does and just accept any 5-digit number!
here is a list of optimizations over a straight hashtable that I can think of
Sorry to spoil the potential Fun, but you're probably not going to manage much better actual performance than JavaScript's Object gives you when used as a hashtable. Object member access is one of the most common operations in JS and will be super-optimised; building your own data structures is unlikely to beat it even if they are potentially better structures from a computer science point of view. In particular, anything using ‘Array’ is not going to perform as well as you think because Array is actually implemented as an Object (hashtable) itself.
Having said that, a possible space compression tool if you only need to know 'valid or not' would be to use a 100000-bit bitfield, packed into a string. For example for a space of only 100 ZIP codes, where codes 032-043 are ‘valid’:
var zipfield= '\x00\x00\x00\x00\xFF\x0F\x00\x00\x00\x00\x00\x00\x00';
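function isvalid(zip) {
    // anchored, so that only exactly three digits are accepted
    if (!/^[0-9]{3}$/.test(zip))
        return false;
    var z= parseInt(zip, 10);
    // byte z/8 holds the flag for code z in bit z%8
    return !!( zipfield.charCodeAt(Math.floor(z/8)) & (1<<(z%8)) );
}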
Now we just have to work out the most efficient way to get the bitfield to the script. The naive '\x00'-filled version above is pretty inefficient. Conventional approaches to reducing that would be eg. to base64-encode it:
var zipfield= atob('AAAAAP8PAAAAAAAAAA==');
That would get the 100000 flags down to 16.6kB. Unfortunately atob was Mozilla-only when this was written, so an additional base64 decoder was needed for other browsers (current browsers all provide atob). (It's not too hard, but it's a bit more startup time to decode.) It might also be possible to use an AJAX request to transfer a direct binary string (encoded in ISO-8859-1 text to responseText). That would get it down to 12.5kB.
But in reality probably anything, even the naive version, would do as long as you served the script using mod_deflate, which would compress away a lot of that redundancy, and also the repetition of '\x00' for all the long ranges of ‘invalid’ codes.
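Since the script files are going to be generated anyway, the bitfield itself can be built programmatically. The sketch below uses a Uint8Array in place of the packed string from the answer, which is a substitution on my part; the bit layout (bit z%8 of byte z/8) is the same, and the function names are made up.

```javascript
// Build a bitfield with one flag per possible code in [0, size).
function buildBitfield(validCodes, size) {
  const bits = new Uint8Array(Math.ceil(size / 8));
  for (const code of validCodes) {
    bits[code >> 3] |= 1 << (code & 7); // byte code/8, bit code%8
  }
  return bits;
}

// Look up a code given as a string of digits, e.g. "032".
function isValidCode(bits, zip) {
  if (!/^[0-9]+$/.test(zip)) return false;
  const z = parseInt(zip, 10);
  if (z >= bits.length * 8) return false;
  return (bits[z >> 3] & (1 << (z & 7))) !== 0;
}
```

For the full five-digit space, buildBitfield(codes, 100000) gives 12500 bytes, which matches the 12.5kB figure above before any transport compression.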
I use Google Maps API to check whether a zipcode exists.
It's more accurate.
Assuming you've got the zips in a sorted array (seems fair if you're controlling the generation of the datastructure), see if a simple binary search is fast enough.
So... You're doing client-side validation and want to optimize for file size? You probably cannot beat general compression. Fortunately, most browsers support gzip, so you get that much for free.
How about a simple JSON-encoded dict or list with the zip codes in sorted order, and a lookup on the dict? It'll compress well, since it's a predictable sequence; it imports easily, since it's JSON and uses the browser's built-in parser; and lookup will probably be very fast too, since that's a JavaScript primitive.
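A minimal sketch of the dict approach. The three codes in the JSON string are sample data standing in for the generated file, and the names are made up.

```javascript
// Stand-in for the generated, gzip-served JSON payload (sample data only).
const zipJson = '{"10001":1,"10002":1,"90210":1}';
const zipTable = JSON.parse(zipJson);

// Object used as a hashtable; hasOwnProperty avoids false positives
// from inherited names such as "toString".
function zipExists(zip) {
  return Object.prototype.hasOwnProperty.call(zipTable, zip);
}
```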
This might be useful:
PHP Zip Code Range and Distance Calculation
As well as List of postal codes.