Evaluating JSON from popular auto-suggests


I was evaluating the various ways in which the big guys implement auto-suggest. These are my observations (the search string used was "ab"). Questions are towards the end.
Yahoo returns JSONP: a call to a callback function with the JSON passed as its argument. The response is readable and serves the purpose.
Yahoo's response
yasearch({"q":"ab ","gprid":"Y435dN7TRFqnYqQhnBueJA","f":["k","m"],"r":[["ab de villiers",0],["ab exercises",0],["ab king pro",0],
["ab infi-net internet banking",0],["ab mujhe raat din",0],["ab workouts",0],["ab mp3",0],["ab meri bari",0],["ab ke baras",0],["ab meri baari",0]]})
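To make the JSONP idea concrete, here is a minimal sketch of how a page might consume a response like the one above. The body of yasearch is hypothetical (Yahoo's real handler is not shown here), and the payload is shortened to two of the suggestions from the response.

```javascript
// Hypothetical handler: the page defines the callback before the script loads.
const suggestions = [];
function yasearch(payload) {
  // Each entry in "r" is a [text, flag] pair; keep only the text.
  for (const [text] of payload.r) {
    suggestions.push(text);
  }
}

// A JSONP response is just a script. When the browser runs it,
// it calls the callback with the data as an ordinary object literal.
yasearch({"q":"ab ","r":[["ab de villiers",0],["ab exercises",0]]});

console.log(suggestions);
```

Because the response is executed as a script, no separate parsing step is needed on the client.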
Bing had a similar approach. It returns an "if" block; sa_inst.apiCB() seems to be the function that processes the JSON. Again, the response is readable and legit.
Bing's response
if(typeof sa_inst.apiCB == 'function') sa_inst.apiCB({"AS":{"Query":"ab","FullResults":1,"Results":[{"Type":"AS","Suggests":[{"Txt":"ab<strong>p</strong> <strong>news</strong>","Type":"AS","Sk":""},
{"Txt":"ab<strong>bottapp</strong>.ab<strong>bott</strong>.<strong>in</strong>","Type":"AS","Sk":"AS1"},{"Txt":"ab<strong>t</strong> <strong>travels</strong>","Type":"AS","Sk":"AS2"},{"Txt":"ab<strong>p</strong> <strong>ananda</strong>","Type":"AS","Sk":"AS3"},
{"Txt":"ab<strong>hibus</strong>","Type":"AS","Sk":"AS4"},{"Txt":"ab<strong>p</strong> <strong>maza</strong>","Type":"AS","Sk":"AS5"},{"Txt":"ab<strong>b</strong>","Type":"AS","Sk":"AS6"},
{"Txt":"ab<strong>outgoogle</strong>","Type":"AS","Sk":"AS7"}]}]}} /* pageview_candidate */);
Now comes Google. The response is sent as 2 JSON objects (separated by /*""*/). Most of it is unreadable.
Google's response
{e:"XteVUYKqDoKHrAfdz4D4Aw",c:0,u:"https://www.google.com/s?hl\x3den\x26gs_rn\x3d14\x26gs_ri\x3dpsy-ab\x26tok\x3dvsobDhICRmdcnY7ayKTGng\x26cp\x3d2\x26gs_id\x3dd\x26xhr\x3dt\x26q\x3dab\x26es_nrs\x3dtrue\x26pf\x3dp\x26output\x3dsearch\x26sclient
\x3dpsy-ab\x26oq\x3d\x26gs_l\x3d\x26pbx\x3d1\x26bav\x3don.2,or.r_cp.r_qf.\x26bvm\x3dbv.46751780,d.bmk\x26fp\x3d2647af89de6b6c61\x26biw\x3d1366\x26bih\x3d453\x26tch\x3d1\x26ech\x3d2\x26psi\x3dVteVUcYuzOGsB6LpgdgB.1368774484351.1",
p:true,d:"[\x22ab\x22,[[\x22ab\\u003Cb\\u003Ec\\u003C\\/b\\u003E\x22,0,[]],[\x22ab\\u003Cb\\u003Ec news\\u003C\\/b\\u003E\x22,0,[]],[\x22ab\\u003Cb\\u003Eercrombie\\u003C\\/b\\u003E\x22,0,[]],[\x22ab\\u003Cb\\u003Ecya\\u003C\\/b\\u003E\x22,0,[]]],
{\x22j\x22:\x22d\x22,\x22q\x22:\x22t8z6h8KhWvbkEX6xablxgYxDUq4\x22,\x22t\x22:
{\x22bpc\x22:false,\x22tlw\x22:false}}]"}
/*""*/{e:"XteVUYKqDoKHrAfdz4D4Aw",c:-1,u:"https://www.google.com/searchdata?hl\x3den\x26gs_rn\x3d14\x26gs_ri\x3dpsy-ab\x26tok\x3dvsobDhICRmdcnY7ayKTGng\x26cp\x3d2\x26gs_id\x3dd\x26xhr\x3dt\x26q\x3dab\x26es_nrs\x3dtrue
\x26pf\x3dp\x26output\x3dsearch\x26sclient\x3dpsy-ab\x26oq\x3d\x26gs_l\x3d\x26pbx\x3d1\x26bav\x3don.2,or.r_cp.r_qf.\x26bvm\x3dbv.46751780,d.bmk\x26fp\x3d2647af89de6b6c61\x26biw\x3d1366\x26bih\x3d453\x26tch\x3d1\x26ech\x3d2\x26psi\x3dVteVUcYuzOGsB6LpgdgB.1368774484351.1",
p:true,d:"{\x22snp\x22:1}"}/*""*/
Are those hex codes or what do you call them?
Why is there a need for 2 objects to be returned?
What is the need for encoding the JSON?
Which is the ideal format for JSON among these three?
Any thoughts on this are welcome.

Are those hex codes or what do you call them?
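Assuming you are referring to Google's response, I believe they are JavaScript string escape sequences:
the \xNN format is a hexadecimal escape for a single character (for example, \x3d is "=" and \x22 is a double quote)
the \uNNNN format is a Unicode escape (for example, \u003C is "<")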
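A quick way to see what those escapes mean is to let JavaScript decode them. This is only a sketch: the first string is a fragment in the style of the "u" field, and the second is a shortened copy of the "d" field from Google's response, which is itself JSON text stored inside a string.

```javascript
// \xNN escapes are resolved when the string literal is evaluated.
const hexEscaped = "hl\x3den\x26q\x3dab";
console.log(hexEscaped);

// Shortened "d" field: \x22 becomes a double quote at the JavaScript level,
// while \\u003C survives as the text \u003C and is decoded by JSON.parse.
const d = "[\x22ab\x22,[[\x22ab\\u003Cb\\u003Ec\\u003C\\/b\\u003E\x22,0,[]]]]";
const parsed = JSON.parse(d);

console.log(parsed[0]);
console.log(parsed[1][0][0]);
```

Note that \xNN is valid in a JavaScript string literal but not in strict JSON, which only allows the \uNNNN form.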
Why is there a need for 2 objects to be returned?
Not sure. One guess is that one of those objects is a list of recommendations, and the other is a list of previous searches you have made.
What is the need for encoding the JSON?
Basically, these guys would have needed a way of delivering structured data from their server to the client. Only two real possibilities come to mind: XML and JSON. In this case, JSON would probably always be the winner, as it uses less bandwidth and is easier to deal with in JavaScript.
Which is the ideal format for JSON among these three?
This is only opinion, but following from above, the ideal situation is the least amount of data, so based on that alone, I think Yahoo wins.

Related

When exchanging data between a browser and a server, the data can only be text. Why?

I understand why we use JSON instead of XML to exchange data between browser and server, but I don't understand why we only use the string type of JSON even though JSON has six different value data types. I mean, why can't we use an integer, a Boolean, or any other value data type?
Hope you guys understand what I'm trying to say. Thanks in advance.

If I understand correctly, the limitation is because of the way data needs to be encoded to be sent over HTTP and ultimately over the wire. Your JSON object (or XML, etc.) is ultimately just a payload for HTTP (which in turn is just a payload for TCP, and so on).
HTTP inherently does not and should not identify data types in the payload; to HTTP it is just an array of bytes. You can select how to represent this array, i.e. how to encode it: it can be a string (ASCII, UTF-8, etc.) or binary, but it has to be uniform for the whole payload.
HTTP does offer different encoding methods for the payload, which the receiver can identify by looking at the Content-Type header and then decode the data accordingly.
Hope this helps.

why we are using only string type of JSON
Uhm, we're not. I believe you're misunderstanding something here. HTTP responses can really contain anything; every time you download a PDF or an image from a web server, the web server is sending a binary payload, which can literally be anything. So it's not even true that all HTTP bodies must be text.
To exchange data between systems, you send bytes. For these bytes to mean anything, you need an encoding scheme. Image formats have a particular way in which bytes need to be arranged, and when properly doing so, you can send pictures with them. Same for PDFs, video, audio, and anything else (including text).
If you want to send structured data, you need to express that structure somehow. How do you send a, for example, PHP array over HTTP…? (Substitute your equivalent list data structure in your language of choice.) You can't. A PHP array is a specific data structure in memory of a PHP runtime, sending that as is over HTTP has no meaning (because it deals with internal pointers and such). This array needs to be serialised first. There are many possible serialisation methods, some of them using binary data, and some using formats which are human readable to varying degrees. You could simply join all array elements with commas and .split(',') them again on the other end, but that's rather simplistic and misses many more complex cases and edge cases.
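As a small illustration of why the comma approach is too simplistic, here is a sketch with made-up sample data in which one element itself contains a comma. The naive round trip changes the number of elements, while a real serialisation format such as JSON gets the same array back.

```javascript
// Sample data (made up for illustration): the first element contains a comma.
const original = ["Smith, John", "42", "true"];

// Naive serialisation: join on commas, split on the other end.
const naive = original.join(",").split(",");
console.log(naive.length); // the embedded comma produced an extra element

// JSON quotes and escapes each element, so the structure survives.
const roundTripped = JSON.parse(JSON.stringify(original));
console.log(roundTripped.length);
```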
JSON and XML (and YAML and whatnot) are human readable formats which can serialise data structures like arrays (and dictionaries and numbers and booleans etc), and which happen to be text-based (purposely, to make them developer-friendly). You can use any of those data types JSON allows. Nothing prevents you from doing so, and not using them is insane. JSON and XML also happen to be two formats easily parsed with tools built into every browser. You could use any other binary format too, but then you'd have to manually parse it in Javascript.

Communication between browser and server can be done in many ways. It's just that JSON comes out of the box. You can use protobuf, XML and a plethora of other data serialization techniques to communicate with the server, as long as both sides agree on the format. On the browser side, you will probably have to implement protobuf, XML, etc. serialization/deserialization on your own in JavaScript.
Any valid JSON is permitted for data exchange. Keys must be quoted strings, but values can be strings, numbers, booleans, null, arrays or other objects. Before transmission, though, everything is converted into a single string, and the receiving side parses it back into the correct types.
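A short sketch of that round trip, with a made-up message object (the field names are only for illustration): the whole thing travels as one string, yet the individual value types come back intact on the receiving side.

```javascript
// Hypothetical message with several JSON value types.
const message = { query: "ab", limit: 10, exact: false, tags: ["x", "y"], cursor: null };

// What actually goes over the wire is a single string.
const wireText = JSON.stringify(message);
console.log(typeof wireText);

// The receiver parses the text and gets typed values back.
const received = JSON.parse(wireText);
console.log(typeof received.limit, typeof received.exact, Array.isArray(received.tags));
```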

JSON diff of large JSON data, finding some JSON as a subset of another JSON

I have a problem that I'd like to solve without spending a lot of manual effort on analysis.
I have 2 JSON objects (returned from different web service API or HTTP responses). There is intersecting data between the 2 JSON objects, and they share similar JSON structure, but not identical. One JSON (the smaller one) is like a subset of the bigger JSON object.
I want to find all the intersecting data between the two objects. Actually, I'm more interested in the shared parameters/properties within the objects than in the actual values of those parameters/properties, because I eventually want to use data from one JSON output to construct the other JSON as input to an API call. Unfortunately, I don't have the documentation that defines the JSON for each API. :(
What makes this tougher is the JSON objects are huge. One spans a page if you print it out via Windows Notepad. The other spans 37 pages. The APIs return the JSON output compressed as a single line. Normal text compare doesn't do much, I'd have to reformat manually or w/ script to break up object w/ newlines, etc. for a text compare to work well. Tried with Beyond Compare tool.
I could do manual search/grep but that's a pain to cycle through all the parameters inside the smaller JSON. Could write code to do it but I'd also have to spend time to do that, and test if the code works also. Or maybe there's some ready made code already for that...
Or can look for JSON diff type tools. Searched for some. Came across these:
https://github.com/samsonjs/json-diff or https://tlrobinson.net/projects/javascript-fun/jsondiff
https://github.com/andreyvit/json-diff
Both failed to do what I wanted. Presumably the JSON is either too complex or too large to process.
Any thoughts on best solution? Or might the best solution for now be manual analysis w/ grep for each parameter/property?
In terms of a code solution, any language will do. I just need a parser or diff tool that will do what I want.
Sorry, can't share the JSON data structure with you either, it may be considered confidential.

Beyond Compare works well, if you set up a JSON file format in it to use Python to pretty-print the JSON. Sample setup for Windows:
Install Python 2.7.
In Beyond Compare, go under Tools, under File Formats.
Click New. Choose Text Format. Enter "JSON" as a name.
Under the General tab:
Mask: *.json
Under the Conversion tab:
Conversion: External program (Unicode filenames)
Loading: c:\Python27\python.exe -m json.tool %s %t
Note that the second parameter in the command line must be %t; if you enter %s twice you will suffer data loss.
Click Save.
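If Python is not an option, the same pretty-printing step can be sketched in JavaScript. This is an illustration rather than part of the Beyond Compare setup; the function names are made up, and sorting the keys is an extra assumption on my part to make a line-based diff more stable.

```javascript
// Recursively rebuild objects with their keys in sorted order.
function sortKeys(value) {
  if (Array.isArray(value)) {
    return value.map(sortKeys);
  }
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value;
}

// Turn single-line JSON text into indented text, one property per line.
function prettyPrint(jsonText) {
  return JSON.stringify(sortKeys(JSON.parse(jsonText)), null, 2);
}

console.log(prettyPrint('{"b":1,"a":{"d":2,"c":3}}'));
```

Running both API responses through the same formatter before comparing them gives a text diff tool something line-oriented to work with.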

Jeremy Simmons has created a better File Format package for Beyond Compare, posted on the forum as "JsonFileFormat.bcpkg", that does not require Python or anything similar to be installed.
Just download the file and open it with BC and you are good to go. So it's much simpler.
JSON File Format
I needed a file format for JSON files.
I wanted to pretty-print & sort my JSON to make comparison easy.
I have attached my bcpackage with my completed JSON File Format.
The formatting is done via jq - http://stedolan.github.io/jq/
Props to
Stephen Dolan for the utility https://github.com/stedolan.
I have sent a message to the folks at Scooter Software asking them to
include it in the page with additional formats.
If you're interested in seeing it on there, I'm sure a quick reply to
the thread with an up-vote would help them see the value posting it.

I have a small GPL project that would do the trick for simple JSON. I have not added support for nested entities, as it is more of a simple ObjectDB solution and not actually JSON (despite the fact that it was clearly inspired by it).
Long and short the API is pretty simple. Make a new group, populate it, and then pull a subset via whatever logical parameters you need.
https://github.com/danielbchapman/groups
The API is used basically like ->
SubGroup items = group
.notEqual("field", "value")
.lessThan("field2", 50); //...etc...
There's actually support for basic unions and joins which would do pretty much what you want.
Long and short you probably want a Set as your data-type. Considering your comparisons are probably complex you need a more complex set of methods.
My only caution is that it is GPL. If your data is confidential, odds are you may not be interested in that license.
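For the set-based idea applied to plain JSON, a minimal sketch might look like the following. It is not the groups API above; the function names and sample objects are made up. It collects every property path from each object into a Set and keeps the paths that appear in both, ignoring the values.

```javascript
// Collect dotted property paths; array elements share a "[]" marker.
function collectPaths(value, prefix = "", paths = new Set()) {
  if (Array.isArray(value)) {
    for (const item of value) {
      collectPaths(item, prefix + "[]", paths);
    }
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      const path = prefix ? prefix + "." + key : key;
      paths.add(path);
      collectPaths(child, path, paths);
    }
  }
  return paths;
}

// Paths present in both objects, regardless of their values.
function sharedPaths(a, b) {
  const pathsB = collectPaths(b);
  return [...collectPaths(a)].filter((p) => pathsB.has(p)).sort();
}

// Made-up sample data standing in for the two API responses.
const small = { id: 1, user: { name: "x" } };
const big = { id: 2, user: { name: "y", email: "z" }, items: [{ sku: 1 }] };
console.log(sharedPaths(small, big));
```

This only matches properties that sit at the same path in both structures. If the two APIs nest the same property differently, comparing bare key names instead of full paths may be the better starting point.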

Which is faster in JavaScript, JSON or SOAP parsing?

Here's the two scenarios.
We are using a manually built xml soap request with xmlhttprequest, sending it to a wcf soap service, getting back the response and using xPath to parse the data and fill out a drop down list.
We are sending a json request to a rest wcf service and getting a json response back and assigning the values to a drop down list
Which scenario is faster? My sense tells me #2 but I could be wrong.

JSON will be faster, since JSON is essentially JavaScript. But that shouldn't be the main motivation. Parsing the data will presumably be only a small part of your application anyway.
On the other hand, browsers are also well trained to parse XML.
The main difference is that XML, and therefore SOAP, is larger to send to the client, so the transfer may be a bigger slowdown than the parsing.
Anyway, if you want to know, you should just test and profile instead of guessing or asking.
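A rough way to test it yourself is sketched below. The data is synthetic, the helper names are made up, and the XML branch only runs where DOMParser exists (that is, in a browser), so treat the numbers as indicative at best.

```javascript
// Use the high-resolution timer when available.
function now() {
  return typeof performance !== "undefined" ? performance.now() : Date.now();
}

// Run fn several times and report the average time per run in milliseconds.
function timeIt(label, fn, runs = 20) {
  const start = now();
  for (let i = 0; i < runs; i++) {
    fn();
  }
  const perRun = (now() - start) / runs;
  console.log(label + ": " + perRun.toFixed(3) + " ms per run");
  return perRun;
}

// Synthetic payloads carrying the same data in both formats.
const items = Array.from({ length: 1000 }, (_, i) => ({ id: i, name: "item " + i }));
const jsonText = JSON.stringify(items);
const xmlText = "<items>" +
  items.map((it) => `<item id="${it.id}"><name>${it.name}</name></item>`).join("") +
  "</items>";

const jsonMs = timeIt("JSON.parse", () => JSON.parse(jsonText));

// DOMParser is a browser API; skip the XML measurement elsewhere.
if (typeof DOMParser !== "undefined") {
  const parser = new DOMParser();
  timeIt("DOMParser", () => parser.parseFromString(xmlText, "application/xml"));
}
```

For a fair comparison, the time spent walking the parsed XML (XPath queries and so on) should be measured as well, since JSON.parse hands back ready-to-use objects.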

Option two would generally be faster than option one, as JSON is a much simpler format than XML.
However, if you really need the parsing to be fast, you shouldn't use either, you should use a custom format that is really fast to parse using simple string operations. For example a comma separated string that could be parsed with a split(',').

After profiling in my scenario, I found out that JSON is actually much faster as far as processing time within the browser

Is parsing JSON faster than parsing XML

I'm creating a sophisticated JavaScript library for working with my company's server side framework.
The server side framework encodes its data to a simple XML format. There's no fancy namespacing or anything like that.
Ideally I'd like to parse all of the data in the browser as JSON. However, if I do this I need to rewrite some of the server side code to also spit out JSON. This is a pain because we have public APIs that I can't easily change.
What I'm really concerned about here is performance in the browser of parsing JSON versus XML. Is there really a big difference to be concerned about? Or should I exclusively go for JSON? Does anyone have any experience or benchmarks in the performance difference between the two?
I realize that most modern web developers would probably opt for JSON and I can see why. However, I really am just interested in performance. If there's a proven massive difference then I'm prepared to spend the extra effort in generating JSON server side for the client.

JSON should be faster since it's JS Object Notation, which means it can be recognized natively by JavaScript. In PHP on the GET side of things, I will often do something like this:
<script type="text/javascript">
var data = <?php echo json_encode($data); ?>;
</script>
For more information on this, see here:
Why is Everyone Choosing JSON Over XML for jQuery?
Also...what "extra effort" do you really have to put into "generating" JSON? Surely you can't be saying that you'll be manually building the JSON string? Almost every modern server-side language has libraries that convert native variables into JSON strings. For example, PHP's core json_encode function converts an associative array like this:
$data = array('test'=>'val', 'foo'=>'bar');
into
{"test": "val", "foo": "bar"}
Which is simply a JavaScript object (since there are no associative arrays (strictly speaking) in JS).

Firstly, I'd like to say thanks to everyone who's answered my question. I REALLY appreciate all of your responses.
In regards to this question, I've conducted some further research by running some benchmarks. The parsing happens in the browser. IE 8 is the only browser that doesn't have a native JSON parser. The XML is the same data as the JSON version.
Chrome (version 8.0.552.224), JSON: 92ms, XML: 90ms
Firefox (version 3.6.13), JSON: 65ms, XML: 129ms
IE (version 8.0.6001.18702), JSON: 172ms, XML: 125ms
Interestingly, Chrome seems to have almost the same speed. Please note, this is parsing a lot of data. With little snippets of data, this isn't probably such a big deal.

Benchmarks have been done. Here's one. The difference in some of the earlier browsers appeared to be an entire order of magnitude (on the order of 10s of milliseconds instead of 100s of ms), but not massive. Part of this is in server response time - XML is bulkier as a data format. Part of it is parsing time - JSON lets you send JavaScript objects, while XML requires parsing a document.
You could consider adding to your public API a method to return JSON instead of modifying existing functions if it becomes an issue, unless you don't want to expose the JSON.
See also the SO question When to prefer JSON over XML?

Performance isn't really a consideration, assuming that you're not talking about gigabytes of XML. Yes, it will take longer (XML is more verbose), but it's not going to be something that the user will notice.
The real issue, in my opinion, is support for XML within JavaScript. E4X is nice, but it isn't supported by Microsoft. So you'll need to use a third-party library (such as jQuery) to parse the XML.

If possible, it would make sense to just measure it. By 'if possible' I mean that tooling for javascript (esp. for performance analysis) may not be quite as good as for stand-alone programming languages.
Why measure? Because speculation based solely on properties of data formats is not very useful for performance analysis -- developers' intuitions are notoriously poor at predicting performance. In this case it just means that it all comes down to maturity of respective XML and JSON parser (and generators) in use. XML has the benefit of having been around longer; JSON is bit simpler to process. This based on having actually written libraries for processing both. In the end, if all things are equal (maturity and performance optimization of libraries), JSON can indeed be bit faster to process. But both can be very fast; or very slow with bad implementations.
However: I suspect that you should not worry all that much about performance, like many have already suggested. Both xml and json can be parsed efficiently, and with modern browsers, probably are.
Chances are that if you have performance problems it is not with reading or writing of data but something else; and first step would be actually figuring out what the actual problem is.

Since JSON is native in and designed FOR JavaScript, it's going to outperform XML parsing all day long. You didn't mention your server-side language; in PHP there is the json_encode/json_decode functionality built into the PHP core...

The difference in performance will be so tiny, you wouldn't even notice it (and you shouldn't think about performance problems until you have performance problems; there are a lot of more important points to care about, such as maintainable, readable and documented code...).
But, to answer your question: JSON will be faster to parse (because it's simple JavaScript object notation).

In this situation, I'd say stick with the XML. All major browsers have a DOM parsing interface that will parse well-formed XML. This link shows a way to use the DOMParser interface in Webkit/Opera/Firefox, as well as the ActiveX DOM Object in IE: https://sites.google.com/a/van-steenbeek.net/archive/explorer_domparser_parsefromstring

It also depends on how your JSON is structured. Tree-like structures tend to parse more efficiently than a list of objects. This is where one's fundamental understanding of data structures will be handy. I would not be surprised if you parse a list-like structure in JSON that might look like this:
[
  {
    "name": "New York",
    "country": "USA",
    "lon": -73.948753,
    "lat": 40.712784
  },
  {
    "name": "Chicago",
    "country": "USA",
    "lon": -23.948753,
    "lat": 20.712784
  },
  {
    "name": "London",
    "country": "UK",
    "lon": -13.948753,
    "lat": 10.712784
  }
]
and then compare it to a tree like structure in XML that might look like this:
<cities>
<country name="USA">
<city name="New York">
<long>-73.948753</long>
<lat>40.712784</lat>
</city>
<city name="Chicago">
<long>-23.948753</long>
<lat>20.712784</lat>
</city>
</country>
<country name="UK">
<city name="London">
<long>-13.948753</long>
<lat>10.712784</lat>
</city>
</country>
</cities>
The XML structure may yield a faster time than that of JSON since if I loop through the node of UK to find London, I don't have to loop through the rest of the countries to find my city. In the JSON example, I just might if London is near the bottom of the list. But, what we have here is a difference in structure. I would be surprised to find that XML is faster in either case or in a case where the structures are exactly the same.
Here is an experiment I did using Python - I know the question is looking at this strictly from a JavaScript perspective, but you might find it useful. The results show that JSON is faster than XML. However, the point is: how you structure is going to have an effect on how efficiently you are able to retrieve it.
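To illustrate the structure point in JavaScript terms, here is a small sketch using the sample cities from above. The tree-shaped variant is my own assumption about how the same data could be laid out in JSON; it is not something from the benchmark.

```javascript
// List-like structure: finding a city means scanning the array.
const flat = [
  { name: "New York", country: "USA", lon: -73.948753, lat: 40.712784 },
  { name: "Chicago", country: "USA", lon: -23.948753, lat: 20.712784 },
  { name: "London", country: "UK", lon: -13.948753, lat: 10.712784 }
];
const londonFromList = flat.find((city) => city.name === "London");

// Tree-like structure (assumed layout): keyed by country, then by city.
const byCountry = {
  USA: {
    "New York": { lon: -73.948753, lat: 40.712784 },
    Chicago: { lon: -23.948753, lat: 20.712784 }
  },
  UK: {
    London: { lon: -13.948753, lat: 10.712784 }
  }
};
// Direct property access, no need to visit the USA entries at all.
const londonFromTree = byCountry.UK.London;

console.log(londonFromList.lat, londonFromTree.lat);
```

So the tree shape is not an XML-only advantage: JSON can carry the same nesting, and the lookup cost then depends on the structure chosen rather than on the format.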

Another reason to stick with XML is, that if you switch to JSON, you modify the "maintenance contract". XML is more typed than JSON is, in the sense that it works more naturally with typed languages (i.e. NOT javascript).
If you change to JSON, some future maintainer of the code base might introduce a JSON array at some point which has mixed type content (e.g. [ "Hello", 42, false ]), which will present a problem to any code written in a typed language.
Yes, you could do that as well in XML but it requires extra effort, while in JSON it can just slip in.
And while it does not seem like a big deal at first glance, it actually is as it forces the code in the typed language to stick with a JSON tree instead of deserializing to a native type.
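A tiny sketch of how such a mixed array looks from the JavaScript side, and one way a consumer might detect it before handing data to typed code. The elementTypes helper is hypothetical, written only for this example.

```javascript
// Report the distinct JSON value types found in an array.
function elementTypes(array) {
  const names = array.map((v) => {
    if (v === null) return "null";
    if (Array.isArray(v)) return "array";
    return typeof v;
  });
  return [...new Set(names)];
}

// JSON accepts this without complaint.
const mixed = JSON.parse('["Hello", 42, false]');
console.log(elementTypes(mixed));

// A uniform array maps cleanly onto a typed list.
const uniform = JSON.parse("[1, 2, 3]");
console.log(elementTypes(uniform));
```

A check like this, or a schema validator, is the kind of extra guard the JSON route needs if typed consumers are part of the contract.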

The best example I have found comparing the two is:
http://www.utilities-online.info/xmltojson/#.VVGOlfCYK7M
To me, it shows that JSON is more human readable and understandable than XML.

Is it bad to store JSON on disk?

Mostly I have just used XML files to store config info and to provide elementary data persistence. Now I am building a website where I need to store some XML type data. However I am already using JSON extensively throughout the whole thing. Is it bad to store JSON directly instead of XML, or should I store the XML and introduce an XML parser.

Not bad at all. Although there are more XML editors, so if you're going to need to manually edit the files, XML may be better.

Differences between using XML and JSON are:
A lot easier to find an editor supporting nice way to edit XML. I'm aware of no editors that do this for JSON, but there might be some, I hope :)
Extreme portability/interoperability - not everything can read JSON natively whereas pretty much any language/framework these days has XML libraries.
JSON takes up less space
JSON may be faster to process, ESPECIALLY in a JavaScript app where it's native data.
JSON is more human readable for programmers (this is subjective but everyone I know agrees so).
Now, please notice the common thread: any of the benefits of using pure XML listed above are 100% lost immediately as soon as you store JSON as XML payload.
Therefore, the guidelines are as follows:
If wide interoperability is an issue and you talk to something that can't read JSON (like a DB that can read XML natively), use XML.
Otherwise, I'd recommend using JSON
NEVER EVER use JSON as XML payload unless you must use XML as a transport container due to existing protocol needs AND the cost of encoding and decoding JSON to/from XML is somehow prohibitively high as compared to network/storage lossage due to double encoding (I have a major trouble imagining a plausible scenario like this, but who knows...)

It's just data, like XML. There's nothing about it that would preclude saving it to disk.
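As a minimal sketch of what that looks like in practice, assuming a Node.js environment (the settings object and file name are made up, and a temporary directory is used so nothing is left behind):

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Made-up settings to persist.
const settings = { theme: "dark", pageSize: 25, features: ["search", "export"] };

// Write the JSON text to a file in a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "json-demo-"));
const file = path.join(dir, "settings.json");
fs.writeFileSync(file, JSON.stringify(settings, null, 2), "utf8");

// Read it back and parse it into an object again.
const loaded = JSON.parse(fs.readFileSync(file, "utf8"));
console.log(loaded.pageSize);

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```

Writing with an indent, as here, also keeps the file reasonably readable if someone needs to edit it by hand.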

Define "bad". They're both just plain-text formats. Knock yourself out.

If you're storing the data as a cache (meaning it was in one format and you had to process it programmatically to "make" it JSON), then I say no problem. As long as the consumer of your JSON reads native JSON, it's standard practice to save cache data to disk or memory.
However if you're storing a configuration file in JSON which needs human interaction to "process" then I may reconsider. Using JSON for simple Key:Value pairs is cool, but anything beyond that, the format may be too compact (meaning nested { and [ brackets can be hard to decipher).

One potential issue with JSON, when there is deep nesting, is readability:
you may actually see ]]]}], making debugging difficult.
