Writing a JavaScript zip code validation function

I would like to write a JavaScript function that validates a zip code, by checking if the zip code actually exists. Here is a list of all zip codes:
http://www.census.gov/tiger/tms/gazetteer/zips.txt (I only care about the 2nd column)
This is really a compression problem. I would like to do this for fun. OK, now that's out of the way, here is a list of optimizations over a straight hashtable that I can think of; feel free to add anything I have not thought of:
1) Break the zip code into 2 parts: the first 2 digits and the last 3 digits.
2) Make a giant if-else statement, first checking the first 2 digits, then checking ranges within the last 3 digits.
3) Or, convert the zips into hex, and see if I can do the same thing using smaller groups.
4) Find out whether, within the range of all valid zip codes, there are more valid zip codes than invalid ones, and write the above code targeting the smaller group.
5) Break up the hash into separate files, and load them via Ajax as the user types in the zip code. So perhaps break it into 2 parts: the first for the first 2 digits, the second for the last 3.
Lastly, I plan to generate the JavaScript files using another program, not by hand.
Edit: performance matters here. I do want to use this, if it doesn't suck. Performance of the JavaScript code execution + download time.
Edit 2: JavaScript only solutions please. I don't have access to the application server, plus, that would make this into a whole other problem =)

You could do the unthinkable and treat the code as a number (remember that it's not actually a number). Convert your list into a series of ranges, for example:
zips = [10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]
// becomes
zips = [[10000,10003], [23001,23003], [36001,36001]]
// make sure to keep this sorted
then to test:
function isValidZip(myzip) {
    for (var i = 0, l = zips.length; i < l; ++i) {
        if (myzip >= zips[i][0] && myzip <= zips[i][1]) {
            return true;
        }
    }
    return false;
}
isValidZip(23002); // true
This is just a very naive linear search (O(n)). If you keep the list sorted and use binary search, you can achieve O(log n) - a sketch follows.
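A minimal sketch of that binary-search variant, assuming the same sorted, non-overlapping zips array of [low, high] ranges (the function name is illustrative):
// Binary search over sorted, non-overlapping [low, high] ranges.
function isValidZipBinary(myzip) {
    var lo = 0, hi = zips.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        if (myzip < zips[mid][0]) {
            hi = mid - 1;        // target is below this range
        } else if (myzip > zips[mid][1]) {
            lo = mid + 1;        // target is above this range
        } else {
            return true;         // myzip falls inside zips[mid]
        }
    }
    return false;
}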

I would like to write a JavaScript function that validates a zip code
Might be more effort than it's worth, keeping it updated so that at no point someone's real valid ZIP code is rejected. You could also try an external service, or do what everyone else does and just accept any 5-digit number!
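For that last option, the "accept any 5-digit number" check is a one-liner (the optional ZIP+4 suffix shown here is an assumption about what you might want to allow):
function looksLikeZip(zip) {
    return /^\d{5}(-\d{4})?$/.test(zip); // 5 digits, optional ZIP+4 suffix
}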
here is a list of optimizations over a straight hashtable that I can think of
Sorry to spoil the potential Fun, but you're probably not going to manage much better actual performance than JavaScript's Object gives you when used as a hashtable. Object member access is one of the most common operations in JS and will be super-optimised; building your own data structures is unlikely to beat it even if they are potentially better structures from a computer science point of view. In particular, anything using ‘Array’ is not going to perform as well as you think because Array is actually implemented as an Object (hashtable) itself.
Having said that, a possible space-compression tool, if you only need to know 'valid or not', would be to use a 100000-bit bitfield packed into a string. For example, for a space of only 100 ZIP codes, where codes 032-043 are 'valid':
var zipfield = '\x00\x00\x00\x00\xFF\x0F\x00\x00\x00\x00\x00\x00\x00';

function isvalid(zip) {
    if (!/^[0-9]{3}$/.test(zip)) // anchor the match: exactly 3 digits
        return false;
    var z = parseInt(zip, 10);
    return !!(zipfield.charCodeAt(Math.floor(z / 8)) & (1 << (z % 8)));
}
Now we just have to work out the most efficient way to get the bitfield to the script. The naive '\x00'-filled version above is pretty inefficient. A conventional approach to reducing that would be, e.g., to base64-encode it:
var zipfield= atob('AAAAAP8PAAAAAAAAAA==');
That would get the 100000 flags down to 16.6kB. Unfortunately atob has historically been Mozilla-only (it is standard in current browsers), so an additional base64 decoder would be needed for older ones. (It's not too hard, but it's a bit more startup time to decode.) It might also be possible to use an AJAX request to transfer a direct binary string (encoded as ISO-8859-1 text in responseText). That would get it down to 12.5kB.
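A sketch of that AJAX binary-string transfer (the charset override is the classic trick for keeping the bytes unmangled; mask each char code with 0xFF when reading):
var xhr = new XMLHttpRequest();
xhr.open('GET', 'zipfield.bin', true);
// Ask the browser not to reinterpret the bytes as UTF-8.
xhr.overrideMimeType('text/plain; charset=x-user-defined');
xhr.onload = function () {
    var zipfield = xhr.responseText;
    // Read byte i as: zipfield.charCodeAt(i) & 0xFF
};
xhr.send();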
But in reality probably anything, even the naive version, would do as long as you served the script using mod_deflate, which would compress away a lot of that redundancy, and also the repetition of '\x00' for all the long ranges of ‘invalid’ codes.
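On the generator side (the question plans to emit the JavaScript with another program), packing the valid codes into the bitfield and base64-encoding it might look like this Node.js sketch:
// Pack a list of valid ZIP codes (as integers) into a bitfield,
// then base64-encode it for embedding in the script.
function packZips(validZips, totalCodes) {
    var bytes = new Uint8Array(Math.ceil(totalCodes / 8));
    validZips.forEach(function (z) {
        bytes[Math.floor(z / 8)] |= 1 << (z % 8);
    });
    return Buffer.from(bytes).toString('base64');
}
// e.g. codes 32-43 valid in a 100-code space:
var valid = [];
for (var z = 32; z <= 43; ++z) valid.push(z);
console.log(packZips(valid, 100)); // "AAAAAP8PAAAAAAAAAA=="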

I use Google Maps API to check whether a zipcode exists.
It's more accurate.

Assuming you've got the zips in a sorted array (seems fair if you're controlling the generation of the datastructure), see if a simple binary search is fast enough.

So... you're doing client-side validation and want to optimize for file size? You probably cannot beat general compression. Fortunately, most browsers support gzip for you, so you get that much for free.
How about a simple JSON-coded list of the zip codes in sorted order, used to build a dict for lookups? It'll compress well, since it's a predictable sequence; it imports easily, since it's JSON, using the browser's built-in parser; and lookup will probably be very fast too, since object property access is a JavaScript primitive.
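A sketch of that, assuming the generator emits a sorted JSON array of codes to a file like zips.json (the file name is illustrative) and the server gzips it:
var xhr = new XMLHttpRequest();
xhr.open('GET', 'zips.json', true);
xhr.onload = function () {
    var zips = JSON.parse(xhr.responseText); // e.g. [501, 544, 601, ...]
    var valid = {};                          // build the lookup dict once
    for (var i = 0; i < zips.length; i++) valid[zips[i]] = true;
    // later: !!valid[90210]
};
xhr.send();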

This might be useful:
PHP Zip Code Range and Distance Calculation
As well as List of postal codes.

Related

Building large strings in JavaScript; Is the join method most efficient?

In writing a database to disk as a text file of JSON strings, I've been experimenting with how to most efficiently build the string of text that is ultimately converted to a blob for download to disk.
There are a number of questions that say not to concatenate strings with the + operator in a loop, but instead to write the component strings to an array and then use the join method to build one large string.
The best explanation I came across of why can be found here, by Joel Mueller:
"In JavaScript (and C# for that matter) strings are immutable. They can never be changed, only replaced with other strings. You're probably aware that combined + "hello " doesn't directly modify the combined variable - the operation creates a new string that is the result of concatenating the two strings together, but you must then assign that new string to the combined variable if you want it to be changed.
So what this loop is doing is creating a million different string objects, and throwing away 999,999 of them. Creating that many strings that are continually growing in size is not fast, and now the garbage collector has a lot of work to do to clean up after this."
The thread here was also helpful.
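For reference, the two patterns being compared look like this (a minimal sketch of the million-string case from the quote above):
// Pattern 1: repeated concatenation; every += allocates a new string.
var combined = '';
for (var i = 0; i < 1000000; i++) {
    combined += 'hello ';
}
// Pattern 2: collect the pieces in an array, then join once at the end.
var parts = [];
for (var j = 0; j < 1000000; j++) {
    parts.push('hello ');
}
var joined = parts.join('');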
However, using the join method didn't allow me to build the string I was aiming for without getting the error:
allocation size overflow
I was trying to write 50,000 JSON strings from a database into one text file, which simply may have been too large no matter what. I think it was reaching over 350MB. I was just testing the limit of my application and picked something far larger than a user of the application will likely ever create. So, this test case was likely unreasonable.
Nonetheless, this leaves me with three questions about working with large strings.
For the same amount of data overall, does altering the number of array elements joined in a single join operation affect the efficiency in terms of not hitting an allocation size overflow?
For example, I tried writing the JSON strings to a pseudo 3-D array of 100 (and then 50) elements per dimension, and then looped through the outer two dimensions joining them together. 100^3 = 1,000,000 and 50^3 = 125,000 both provide more than enough entries to hold the 50,000 JSON strings. (I know I'm not including the 0 index here.)
So, the 50,000 strings were held in an array from a[1][1][1] to a[5][100][100] in the first attempt and from a[1][1][1] to a[20][50][50] in the second attempt. If the dimensions are i, j, k from outer to inner, I joined all the k elements in each a[i][j], then joined all of those i x j joins, and lastly joined all of these i joins into the final text string.
All attempts still hit the allocation size overflow before completing.
So, is there any difference between joining 50,000 smaller strings in one join versus 50 larger strings, if the total data is the same?
Is there a better, more efficient way to build large strings than the join method?
Does the same principle described by Joel Mueller regarding string concatenation apply to reducing a string through substring, such as string = string.substring(position)?
The context of this third question is that when I read a text file in as a string and break it down into its component JSON strings before writing to the database, I use an array that is a map of the file layout; so, I know the length of each JSON string in advance and repeat three statements inside a loop:
l = map[i].l;
str = text.substring(0, l);  // take the next JSON string off the front
text = text.substring(l);    // drop it from the remaining text
It would appear that, since strings are immutable, this sort of reverse-concatenation step is as inefficient as using the + operator to concatenate.
Would it be more efficient not to delete str from text each iteration, and instead just keep track of the increasing start and end positions for the substrings as I step through the loop reading the entire text string?
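A sketch of that position-tracking alternative, assuming the same map of lengths (the text string itself is never sliced down):
var pos = 0;
for (var i = 0; i < map.length; i++) {
    var l = map[i].l;
    var str = text.substring(pos, pos + l); // extract one JSON string
    pos += l;                               // advance the cursor instead of shrinking text
    // ... write str to the database ...
}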
Response to message about duplicate question
I got a message, I guess from the stackoverflow system itself, asking me to edit my question explaining why it is different from the proposed duplicate.
Reasons are:
The proposed duplicate asks specifically and exclusively about the maximum size of a single string. None of the three bolded questions, here, asks about the maximum size of a single string, although that is useful to know.
This question asks about the most efficient way of building large strings, and that isn't addressed in the answers found in the proposed duplicate, apart from an efficient way of building a large test string. They don't address how to build a realistic string, comprised of actual application data.
This question provides a couple links to some information concerning the efficiency of building large strings that may be helpful to those interested in more than the maximum size alone.
This question also has a specific context of why the large string was being built, which led to some suggestions about how to handle that situation in a more efficient manner. Although, in the strictest sense, they don't specifically address the question by title, they do address the broader context of the question as presented, which is how to deal with the large strings, even if that means ways to work around them. Someone searching on this same topic might find real help in these suggestions that is not provided in the proposed duplicate.
So, although the proposed duplicate is somewhat helpful, it doesn't appear to be anywhere near a genuine duplicate of this question in its full context.
Additional Information
This doesn't answer the question concerning the most efficient way to build a large string, but it refers to the comments about how to get around the string size limit.
Converting each component string to a blob and holding them in an array, and then converting the array of blobs into a single blob, accomplished this. I don't know what the size limit of a single blob is, but did see 800MB in another question.
A process (or starting point) for creating the blob to write the database to disk and then to read it back in again can be found here.
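A sketch of that blob-of-blobs workaround (jsonStrings stands in for the per-record JSON strings built by the application):
// Convert each JSON string to a Blob as it is built, then combine
// the array of Blobs into one Blob for download.
var jsonStrings = ['{"id":1}', '{"id":2}']; // placeholder data
var blobs = jsonStrings.map(function (s) {
    return new Blob([s], { type: 'application/json' });
});
var file = new Blob(blobs, { type: 'application/json' });
var url = URL.createObjectURL(file); // usable as an <a download> href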
Regarding the idea of writing the blobs or strings to disk as they are generated on the client, as opposed to generating one giant string or blob for download: although that is the most logical and efficient method, it may not be possible in the scenario presented here of an offline application.
According to this question, web extensions no longer have access to the privileged javascript code necessary to accomplish this through the File API.
I asked this question related to the Streams API write stream method and something called StreamSaver.
In writing a database to disk as a text file of JSON strings.
I see no reason to store the data in a string or array of strings in this case. Instead you can write the data directly to the file.
In the simplest case you can write each string to the file separately.
To get better performance, you could first write some data to a smaller buffer, and then write that buffer to disk when it's full.
For best performance you could create a file of a certain size and create a memory mapping over that file. Then write/copy the data directly to the mapped memory (which is your file). The trick would be to know or guess the size up front, or you could resize the file when needed and then remap the file.
Joining or growing strings will trigger a lot of memory (re)allocations, which is unnecessary overhead in this case.
I don't want the user to have to download more than one file
If the goal is to let a user download that generated file, you could even do better by streaming those strings directly to the user without even creating a file. This also has the advantage that the user starts receiving data immediately instead of first having to wait till the whole file is generated.
Because the file size is not known up front, you could use chunked transfer encoding.
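A minimal Node.js sketch of that streaming idea (Node uses chunked transfer encoding automatically when no Content-Length is set; the records array is a placeholder for the real database rows):
var http = require('http');
var records = [{ id: 1 }, { id: 2 }, { id: 3 }]; // placeholder data

http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    records.forEach(function (r) {
        res.write(JSON.stringify(r) + '\n'); // each record goes out as a chunk
    });
    res.end();
}).listen(8080);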

In Javascript, why define an array with split?

I frequently see code where people define a populated array using the split method, like this:
var colors = "red,green,blue".split(',');
How does this differ from:
var colors = ["red","green","blue"];
Is it simply to avoid having to quote each value?
Splitting a string is a bad way of creating an array. There are several issues with the approach, including performance, stability, and memory consumption. It requires CPU time to parse the string, it is prone to errors (double commas, spaces in the string, etc.), and it means your script essentially has to store twice as much data in memory.
It's not a good idea and is most likely just a bad habit someone picked up when they first learned about strings and arrays. That or they're trying to be clever for some kind of coding exercise.
As a rule of thumb, the only time you should be parsing strings into arrays is if you're reading that string data from an external source and need to convert it to native types. If you already know the values ahead of time, you should create the array yourself.
The one possible reason someone might do this is to reduce the number of characters in their source code, trading performance for bandwidth. 'a,b,c,d,e,f,g'.split(',') is fewer characters than ['a','b','c','d','e','f','g'].
There is no difference, it's just bad practice and laziness if anything. The only reason I could think of using the first approach is if the data naturally came in string form and using an array literal made it completely unreadable.

Options for dynamic code generation

I have a (hypothetical) question and I think the solution would be to dynamically generate code.
I want to quickly evaluate an arbitrary mathematical function that a user has entered, say to find the sum i=1 to N of i^3+2i^2+6i+1. N is arbitrary and i^3+2i^2+6i+1 is arbitrary too (it need not be a polynomial, and it might contain trigonometric functions and other functions too). Suppose N can be very large. I want to know how I can evaluate the answer quickly, assuming that I have already parsed the user input to some bytecode or something else my program can understand.
If possible, I would also like my code to be easily compiled and run on different operating systems (including mobile).
I have thought of a few ways:
1) Write an interpreter that interprets and executes each command in my bytecode. This leaves me free to use any language, but it's slow.
2) Write in Java/C# and use dynamic code generation (e.g. Is it possible to dynamically compile and execute C# code fragments?). This would execute as fast as if I had written the function directly in my source code, with only a slight slowdown since C#/Java are both JIT-compiled to machine code. The limitation is that Java isn't widely supported on mobile, and C# is Windows-only.
3) Embed an assembler/C++ compiler/compiler for whatever compiled language I use. The limitation is that it won't work on mobile either - it won't let me execute a data file.
4) Write HTML/JavaScript, embed it in a web browser control, and put that in an application (I think this is how some people make a universal app that runs anywhere). But it's slow too, and writing real applications in JavaScript is a pain.
Which option do you think is most suitable? Or perhaps I should go with a mix, maybe my application code will create and execute a generated Javascript function?
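A sketch of that last option: compiling the parsed expression into a real JavaScript function with the Function constructor, so the loop body runs as ordinary JIT-compiled code (the expression string is assumed to be already validated by your parser - never feed raw user input to new Function):
function makeTerm(expr) {
    return new Function('i', 'return ' + expr + ';');
}
function sum(expr, N) {
    var f = makeTerm(expr); // compiled once, called N times
    var total = 0;
    for (var i = 1; i <= N; i++) total += f(i);
    return total;
}
console.log(sum('i*i*i + 2*i*i + 6*i + 1', 100)); // 26209600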
The fastest and simplest way to perform these calculations for large values of N is with raw maths instead of repeated summation.
Here are the standard closed forms for each kind of term; apply the matching one to every term in the expression and you are done:
\sum_{i=1}^{N} 1 = N, \quad \sum_{i=1}^{N} i = \frac{N(N+1)}{2}, \quad \sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6}, \quad \sum_{i=1}^{N} i^3 = \left(\frac{N(N+1)}{2}\right)^2
For terms of the form 1/i, \sum_{i=1}^{N} \frac{1}{i} = H[N], where H[N] is the Nth harmonic number.
There are multiple approaches to calculating H[N]. One is to compute the series up to the largest required N, caching any intermediate values that will be needed again.
Alternatively, store every 10,000th item of the series in a file and calculate H[N] from the nearest stored entry.
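For the polynomial example, a closed-form sketch in plain JavaScript (exact only while intermediate values stay below 2^53):
// Sum of i^3 + 2i^2 + 6i + 1 for i = 1..N, without a loop.
function sumExample(N) {
    var s1 = N * (N + 1) / 2;               // sum of i
    var s2 = N * (N + 1) * (2 * N + 1) / 6; // sum of i^2
    var s3 = s1 * s1;                       // sum of i^3
    return s3 + 2 * s2 + 6 * s1 + N;
}
console.log(sumExample(100)); // 26209600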

JavaScript: How to pass a bit array nicely as a URL parameter (AngularJS)

From the coding point of view, I'm looking for a way to pass a bit array as a URL parameter in a small JavaScript application, and I want to keep the parameter length as small as possible. I have an idea how to solve it, but it seems rather complicated for a problem that looks like a common one to me.
The context is the following: I'm currently trying out AngularJS and wrote a simple app that is a bit like a shopping cart / configurator. I have a fairly small number of objects (<1024) and each object has a small number of configuration options (fewer than 6). Someone using the app might pick ~20 of these objects in a typical scenario and configure them with the tool. I'd like to make that configuration 'bookmarkable'. The app stores this configuration as an array.
I could JSONify that array, I guess, and pass it as a URL parameter, but that would be horribly long. As indicated above, all I need for each object is 2 bytes (10 bits for the object ID and 6 bits for its configuration). Writing a function that translates my configuration array into a bit array shouldn't be difficult. If I'm not mistaken, I'm restricted to 32-bit integers in JavaScript, but that shouldn't be too much trouble either. Eventually I could translate that bit array into an integer version and use it as the URL parameter, but that would still be a fairly long parameter, and my guess is that I could make it much shorter if I used not only numbers but letters as well.
Since this more or less sounds like a fairly common problem, I'd be very happy if you could give me snippets or hints on where to look to solve it. If there are more efficient ways than my idea of using an array of bit arrays, that's great too. I just want to avoid creating a really clumsy solution if there is an elegant best-practice version, so to speak. The whole point is to get a bit more familiar with JavaScript / AngularJS.
Cheers
When representing the bytes as letters, you probably will need percent encoding anyway. However, you can simply use String.fromCharCode before:
> encodeURIComponent(String.fromCharCode.apply(String, [2, 10, 20, 30, 245]))
%02%0A%14%1E%C3%B5
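If you want something shorter than percent-encoding, a URL-safe base64 variant of the same idea might look like this (a sketch; btoa/atob assumed available):
function bytesToParam(bytes) {
    var bin = String.fromCharCode.apply(String, bytes);
    return btoa(bin).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
function paramToBytes(param) {
    var b64 = param.replace(/-/g, '+').replace(/_/g, '/');
    while (b64.length % 4) b64 += '='; // restore stripped padding
    var bin = atob(b64);
    var bytes = [];
    for (var i = 0; i < bin.length; i++) bytes.push(bin.charCodeAt(i));
    return bytes;
}
console.log(bytesToParam([2, 10, 20, 30, 245])); // "AgoUHvU"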
I have a fairly small number of objects and each object has a small number of configuration options ...
For me, the model is too complicated to put all of this in the URL.
I would create a JSON object, store it in cache / localStorage under a unique ID, and pass only the ID in the URL.
On the receiving side, extract all the stored data by the ID passed with the URL.
This way it will be easy to maintain the code and to debug in case of failure.
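A sketch of that approach (the key naming is illustrative); note that localStorage is per-browser, so such a 'bookmark' only works on the machine where it was created:
// Save: store the configuration under a generated ID, share only the ID.
function saveConfig(config) {
    var id = Date.now().toString(36) + Math.random().toString(36).slice(2, 8);
    localStorage.setItem('config:' + id, JSON.stringify(config));
    return id; // put this in the URL, e.g. ?config=<id>
}
// Load: look the configuration up again by the ID from the URL.
function loadConfig(id) {
    var json = localStorage.getItem('config:' + id);
    return json ? JSON.parse(json) : null;
}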
You can try converting your numbers to hex:
> [2,10,20,30,245].map(function(x) { return (x < 16 ? "0" : "") + x.toString(16) }).join("")
"020a141ef5"
and vice versa:
> "020a141ef5".match(/../g).map(function(x) { return parseInt(x, 16) })
[2, 10, 20, 30, 245]
This isn't the most compact coding, but the easiest one to implement.

Zip code check from the csv file

I am working on a web app where I need to get the user's zip code and see if it matches one of the zips in a csv file of 6000 zip codes (if yes, enter; else display an error). I was going to do this in SQL with a query built from the user's input, but wanted to know if there is a better approach (speed and other suggestions). JavaScript preferred. Thanks.
SQL's probably the easiest/fastest.
As a fun experiment, though, if you really wanted to do it in JavaScript, you could read in the file and create an index of sorts by splitting each number as a string into its characters, and then creating a multidimensional hash for the lookup.
You could then compare that performance to a simple Array.indexOf() call.
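A sketch of that character-by-character index (a trie built from nested objects; names illustrative):
// One object level per digit; a terminal flag marks a complete code.
function buildIndex(zips) {
    var root = {};
    zips.forEach(function (zip) {
        var node = root;
        String(zip).split('').forEach(function (ch) {
            node = node[ch] || (node[ch] = {});
        });
        node.end = true;
    });
    return root;
}
function hasZip(root, zip) {
    var node = root;
    for (var i = 0; i < zip.length; i++) {
        node = node[zip[i]];
        if (!node) return false;
    }
    return !!node.end;
}
var index = buildIndex(['90210', '10001']);
console.log(hasZip(index, '90210')); // true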
Those are probably reasonably fast, but more work than needed - just go w/ the SQL. ;)
