I am working on a web app where I need to get the user's zip code and see if it matches one of the zips in a CSV file of 6000 zip codes (if yes, let them in; else, display an error). I was going to do this in SQL with a query built from the user's input, but wanted to know if there is a better approach (speed and other suggestions). JavaScript preferred. Thanks.
SQL's probably the easiest/fastest.
As a fun experiment, though, if you really wanted to do it in JavaScript, you could read in the file and create an index of sorts by splitting each number as a string into its characters, and then creating a multidimensional hash for the lookup.
You could then compare that performance to a simple Array.indexOf() call.
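A minimal sketch of that index idea (function names are just for illustration):

// Build a nested hash ("trie") from the digits of each zip,
// then walk it character by character for lookups.
function buildIndex(zips) {
    var root = {};
    zips.forEach(function (zip) {
        var node = root;
        for (var i = 0; i < zip.length; i++) {
            node = node[zip[i]] = node[zip[i]] || {};
        }
        node.end = true; // mark a complete zip code
    });
    return root;
}

function hasZip(root, zip) {
    var node = root;
    for (var i = 0; i < zip.length && node; i++) {
        node = node[zip[i]];
    }
    return !!(node && node.end);
}

var index = buildIndex(['10001', '10002', '90210']);
hasZip(index, '10002'); // true
hasZip(index, '10003'); // false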
Those are probably reasonably fast, but it's more work than needed; just go with the SQL. ;)
I was working on a forum and thought of making a tag generator, something like Quora.com but simpler. So first I "purified" the string, meaning I removed some irrelevant words like "for", "in"...
But I couldn't figure out how to only get the nouns in the string. For example: In this thread's title "Is there a PHP or JS algorithm that can filter out the nouns on a string?" would give us:
PHP
JS
algorithm
nouns
string
This is more or less good and accurate. But I also don't want to use a noun list, because I don't want to waste half my life writing one. I'd also be glad if you know of any good noun lists. Thank you.
You need a "lexical dictionary" (dictionary that maintains metadata about words and connections between them) like Princeton Wordnet. This is an english word semantic database you can use to query and compare things like nouns / verbs or even synonyms / hypernyms.
This obviously would run on your server. You would have to parse the strings on the server side (you could use Ajax if you want it to look like its on the client). There is no feasible way to maintain an entire english dictionary in browser memory, and to search through it, with anything resembling good performance.
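As a rough Node.js sketch, assuming the natural npm package and its WordNet bindings (verify the exact API against its docs; the helper name here is hypothetical):

var natural = require('natural');        // npm install natural wordnet-db
var wordnet = new natural.WordNet();

// Hypothetical helper: keep only the words WordNet knows as nouns.
function filterNouns(text, done) {
    var words = text.toLowerCase().match(/[a-z]+/g) || [];
    var nouns = [];
    var pending = words.length;
    if (pending === 0) return done(nouns);
    words.forEach(function (word) {
        wordnet.lookup(word, function (results) {
            // each result carries a part-of-speech tag; 'n' means noun
            if (results.some(function (r) { return r.pos === 'n'; })) {
                nouns.push(word);
            }
            if (--pending === 0) done(nouns);
        });
    });
}

filterNouns('a PHP or JS algorithm that can filter the nouns in a string',
    function (nouns) { console.log(nouns); }); // e.g. ["algorithm", "string", ...]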
I guess I have a noob question here.
I have a huge JSON file (30 MB) that I would like to parse with a jQuery web app.
Now, ideally, I would load it into local storage, regex what I want, and show the results.
I would like it to start showing the results as soon as I type (Google-style), but with every attempt I've made, the app just hangs.
If I reduce the JSON file to 1 MB, then it works.
Does anybody know how to do that? Maybe with an example that I can see?
Thanks a lot!
I would recommend not using this method for search, because even if you manage to make your search quicker, it will take a very long time to download a 30 MB file.
What you could do is convert your data from JSON to SQL and do the search by sending AJAX calls to the server from your JavaScript code.
As a further optimization, you can send AJAX calls only after the search term's length exceeds a certain number of characters, probably 3 (see the sketch below).
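A minimal sketch of that, assuming a hypothetical "/search" endpoint on your server and an input with id "q":

var timer;
$('#q').on('keyup', function () {
    var term = $(this).val().trim();
    clearTimeout(timer);
    if (term.length < 3) return;          // wait for at least 3 characters
    timer = setTimeout(function () {      // debounce rapid keystrokes
        $.getJSON('/search', { q: term }, function (results) {
            // render the results here
        });
    }, 250);
});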
My real-world problem is: users of my mobile app type their city and I have to make sure it really exists and that it is correctly written (case-insensitive, so these are correct: New York, NEW york, new york; this is not correct: newyork).
There are online apis that work quite well (Google Geocode API for example) but:
After a very small number of requests, you have to pay (2,500/day right now)
Users must be connected to the internet
That's why I thought an offline, local solution would be better. There are many websites (like MaxMind) where you can download a list containing every city in the world. I could embed this huge txt/csv right inside my application and do a string search locally (it's a big file, OK, but not that big: it's just a one-time download of something like 30-40 MB of uncompressed .txt).
I'm trying to avoid jQuery at all costs and I don't want to use any PHP/MySQL solutions (even if full-text indexes could be handy), which is why I'm trying to do all this using just JavaScript.
Given a string as input, let's say "city3", what's the best/fastest way to check if it's inside an (external) huge list like:
city1,
city2,
city3,
city4,
[...]
After solving this (big) problem: if there are no exact matches, is there a way to search for the correct city without freezing the device for 10 minutes?
In the example before, let's say the user types "cit y3" or "cyty3" or "cìty3": can any JS function tell him that he might be looking for "city3"? Is this kind of search too slow in this scenario?
Thanks
If speed is an issue then I would recommend loading the data into a JavaScript object and performing an in-memory search rather than repeatedly scanning a big blob of text in a file.
Try formatting the data into JSON with the city names as keys; that will give you good search performance.
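A minimal sketch, assuming a hypothetical "cities.json" file that maps lowercased city names to 1 (e.g. { "new york": 1, "london": 1, ... }):

var cities = null;

var xhr = new XMLHttpRequest();
xhr.open('GET', 'cities.json');
xhr.onload = function () {
    cities = JSON.parse(xhr.responseText);  // one-time parse into memory
};
xhr.send();

function cityExists(name) {
    // Property lookup on an object is effectively constant time.
    return !!cities && cities.hasOwnProperty(name.trim().toLowerCase());
}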
A workaround is creating a database, either SQL or NoSQL, and querying this database through your JavaScript code, using jQuery's JSON functions.
For a SQL database, MySQL or MariaDB (an enhanced, drop-in replacement for MySQL) would be ideal.
In this solution you will probably need a backend such as PHP to fetch the data from your database, convert it to JSON format, and then get it through your JavaScript using the jQuery library, with the $.getJSON function.
For a NoSQL database, MongoDB would be ideal.
In this solution you can fetch your data directly from JavaScript, also with the $.getJSON function.
An example for MongoDB is provided here.
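A minimal sketch of the jQuery side, assuming a hypothetical "/api/city" endpoint that returns JSON like { "exists": true }:

function checkCity(name) {
    $.getJSON('/api/city', { name: name }, function (data) {
        if (data.exists) {
            // let the user in
        } else {
            // show an error / suggestions
        }
    });
}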
If you don't want to use a database, I think you can do this:
- First, instead of using one big file, split it into several files. (You can write a script for this and use it just once to split the big file.) In each file put the cities that start with, for example, "aa"; in the second file the cities that start with "ab"; and so on.
- Then, for each city, check its first letters and search only inside the matching file.
For example, if you need to search for the city "Ahmedabad", it will search only in the file with cities that start with "Ah" (see the sketch below). This is probably not the best solution; in the end you get 421 files instead of 1, but the search will be faster.
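A minimal sketch of that lookup, assuming hypothetical per-prefix files like "cities/ah.json", each holding an array of lowercased names:

function findCity(name, done) {
    var lower = name.trim().toLowerCase();
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'cities/' + lower.slice(0, 2) + '.json');
    xhr.onload = function () {
        var list = JSON.parse(xhr.responseText);  // only this prefix's cities
        done(list.indexOf(lower) !== -1);
    };
    xhr.send();
}

findCity('Ahmedabad', function (exists) { console.log(exists); });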
I have a problem I'd like to solve so that I don't have to spend a lot of manual analysis work as the alternative.
I have 2 JSON objects (returned from different web service APIs or HTTP responses). There is intersecting data between the 2 JSON objects, and they share a similar, but not identical, JSON structure. One JSON object (the smaller one) is more or less a subset of the bigger one.
I want to find all the intersecting data between the two objects. Actually, I'm more interested in the shared parameters/properties within the objects, not really the actual values of those parameters/properties, because I want to eventually use data from one JSON output to construct the other JSON as input to an API call. Unfortunately, I don't have the documentation that defines the JSON for each API. :(
What makes this tougher is that the JSON objects are huge. One spans a page if you print it out via Windows Notepad; the other spans 37 pages. The APIs return the JSON output compressed onto a single line. A normal text compare doesn't do much; I'd have to reformat manually, or with a script, to break the objects up with newlines, etc. for a text compare to work well. I tried with the Beyond Compare tool.
I could do a manual search/grep, but it's a pain to cycle through all the parameters inside the smaller JSON. I could write code to do it, but I'd also have to spend time writing it and testing that it works. Or maybe there's some ready-made code for that already...
Or I can look for JSON-diff-type tools. I searched for some and came across these:
https://github.com/samsonjs/json-diff or https://tlrobinson.net/projects/javascript-fun/jsondiff
https://github.com/andreyvit/json-diff
Both failed to do what I wanted, presumably because the JSON is either too complex or too large to process.
Any thoughts on the best solution? Or might the best solution for now be manual analysis with grep for each parameter/property?
In terms of a code solution, any language will do. I just need a parser or diff tool that will do what I want.
Sorry, I can't share the JSON data structure with you either; it may be considered confidential.
Beyond Compare works well, if you set up a JSON file format in it to use Python to pretty-print the JSON. Sample setup for Windows:
Install Python 2.7.
In Beyond Compare, go under Tools, under File Formats.
Click New. Choose Text Format. Enter "JSON" as a name.
Under the General tab:
Mask: *.json
Under the Conversion tab:
Conversion: External program (Unicode filenames)
Loading: c:\Python27\python.exe -m json.tool %s %t
Note that the second parameter in the command line must be %t; if you enter two %s's, you will suffer data loss.
Click Save.
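To sanity-check the Python side before wiring it into Beyond Compare, you can run the same conversion by hand from a command prompt (the file names here are placeholders):

c:\Python27\python.exe -m json.tool unformatted.json pretty.json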
Jeremy Simmons has created a better file format package for Beyond Compare, posted on their forum ("JsonFileFormat.bcpkg"), that does not require Python or the like to be installed.
Just download the file and open it with BC and you are good to go. So it's much simpler.
JSON File Format
I needed a file format for JSON files.
I wanted to pretty-print & sort my JSON to make comparison easy.
I have attached my bcpackage with my completed JSON File Format.
The formatting is done via jq - http://stedolan.github.io/jq/
Props to Stephen Dolan for the utility: https://github.com/stedolan.
I have sent a message to the folks at Scooter Software asking them to include it in the page with additional formats.
If you're interested in seeing it on there, I'm sure a quick reply to the thread with an up-vote would help them see the value of posting it.
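For reference, pretty-printing and key-sorting with jq directly is a one-liner (presumably close to what the package runs under the hood; -S sorts object keys):

jq -S . input.json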
I have a small GPL project that would do the trick for simple JSON. I have not added support for nested entities, as it is more of a simple ObjectDB solution and not actually JSON (despite the fact that it was clearly inspired by it).
Long and short, the API is pretty simple. Make a new group, populate it, and then pull a subset via whatever logical parameters you need.
https://github.com/danielbchapman/groups
The API is used basically like:
SubGroup items = group
    .notEqual("field", "value")
    .lessThan("field2", 50); // ...etc...
There's actually support for basic unions and joins which would do pretty much what you want.
Long and short, you probably want a Set as your data type. Considering your comparisons are probably complex, you need a more complex set of methods.
My only caution is that it is GPL. If your data is confidential, odds are you may not be interested in that license.
I would like to write a JavaScript function that validates a zip code, by checking if the zip code actually exists. Here is a list of all zip codes:
http://www.census.gov/tiger/tms/gazetteer/zips.txt (I only care about the 2nd column)
This is really a compression problem. I would like to do this for fun. OK, now that's out of the way, here is a list of optimizations over a straight hashtable that I can think of, feel free to add anything I have not thought of:
Break zipcode into 2 parts, first 2 digits and last 3 digits.
Make a giant if-else statement first checking the first 2 digits, then checking ranges within the last 3 digits.
Or, convert the zips into hex, and see if I can do the same thing using smaller groups.
Find out if within the range of all valid zip codes there are more valid zip codes vs invalid zip codes. Write the above code targeting the smaller group.
Break up the hash into separate files, and load them via Ajax as user types in the zipcode. So perhaps break into 2 parts, first for first 2 digits, second for last 3.
Lastly, I plan to generate the JavaScript files using another program, not by hand.
Edit: performance matters here. I do want to use this, if it doesn't suck. Performance of the JavaScript code execution + download time.
Edit 2: JavaScript only solutions please. I don't have access to the application server, plus, that would make this into a whole other problem =)
You could do the unthinkable and treat the code as a number (remember that it's not actually a number). Convert your list into a series of ranges, for example:
zips = [10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]
// becomes
zips = [[10000,10003], [23001,23003], [36001,36001]]
// make sure to keep this sorted
then to test:
function zipValid(zips, myzip) {
    for (var i = 0, l = zips.length; i < l; ++i) {
        // does myzip fall inside this [low, high] range?
        if (myzip >= zips[i][0] && myzip <= zips[i][1]) {
            return true;
        }
    }
    return false;
}

zipValid(zips, 23002); // true
This is just a very naive linear search (O(n)). If you keep the list sorted and use binary search (sketched below), you can achieve O(log n).
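A minimal sketch of that binary-search variant over the same sorted ranges array (the helper name is just for illustration):

function zipValidBinary(zips, myzip) {
    var lo = 0, hi = zips.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        if (myzip < zips[mid][0]) {
            hi = mid - 1;          // look in the lower half
        } else if (myzip > zips[mid][1]) {
            lo = mid + 1;          // look in the upper half
        } else {
            return true;           // inside this [low, high] range
        }
    }
    return false;
}

zipValidBinary([[10000,10003], [23001,23003], [36001,36001]], 23002); // true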
I would like to write a JavaScript function that validates a zip code
Might be more effort than it's worth, keeping it updated so that at no point someone's real valid ZIP code is rejected. You could also try an external service, or do what everyone else does and just accept any 5-digit number!
here is a list of optimizations over a straight hashtable that I can think of
Sorry to spoil the potential Fun, but you're probably not going to manage much better actual performance than JavaScript's Object gives you when used as a hashtable. Object member access is one of the most common operations in JS and will be super-optimised; building your own data structures is unlikely to beat it even if they are potentially better structures from a computer science point of view. In particular, anything using ‘Array’ is not going to perform as well as you think because Array is actually implemented as an Object (hashtable) itself.
Having said that, a possible space compression tool if you only need to know 'valid or not' would be to use a 100000-bit bitfield, packed into a string. For example for a space of only 100 ZIP codes, where codes 032-043 are ‘valid’:
var zipfield= '\x00\x00\x00\x00\xFF\x0F\x00\x00\x00\x00\x00\x00\x00';

function isvalid(zip) {
    if (!zip.match(/^[0-9]{3}$/))   // anchor the pattern so partial matches fail
        return false;
    var z= parseInt(zip, 10);
    // test bit number z of the packed bitfield
    return !!( zipfield.charCodeAt(Math.floor(z/8)) & (1<<(z%8)) );
}
Now we just have to work out the most efficient way to get the bitfield to the script. The naive '\x00'-filled version above is pretty inefficient. Conventional approaches to reducing that would be eg. to base64-encode it:
var zipfield= atob('AAAAAP8PAAAAAAAAAA==');
That would get the 100000 flags down to 16.6kB. Unfortunately atob is Mozilla-only, so an additional base64 decoder would be needed for other browsers. (It's not too hard, but it's a bit more startup time to decode.) It might also be possible to use an AJAX request to transfer a direct binary string (encoded in ISO-8859-1 text to responseText). That would get it down to 12.5kB.
But in reality probably anything, even the naive version, would do as long as you served the script using mod_deflate, which would compress away a lot of that redundancy, and also the repetition of '\x00' for all the long ranges of ‘invalid’ codes.
I use the Google Maps API to check whether a zip code exists.
It's more accurate.
Assuming you've got the zips in a sorted array (seems fair if you're controlling the generation of the data structure), see if a simple binary search is fast enough.
So... you're doing client-side validation and want to optimize for file size? You probably cannot beat general-purpose compression. Fortunately, most browsers support gzip for you, so you get that much for free.
How about a simple JSON-encoded dict or list with the zip codes in sorted order, and doing the lookup on the dict? It'll compress well, since it's a predictable sequence; it imports easily, since it's JSON, using the browser's built-in parser; and lookup will probably be very fast too, since that's a JavaScript primitive.
This might be useful:
PHP Zip Code Range and Distance Calculation
As well as List of postal codes.