Consider this JSON response:
[{
"Name": "Saeed",
"Age": 31
}, {
"Name": "Maysam",
"Age": 32
}, {
"Name": "Mehdi",
"Age": 27
}]
This works fine for small amounts of data, but when you want to serve larger amounts (say, many thousands of records), it seems logical to somehow avoid repeating the property names in the response JSON.
I Googled the concept (DRYing JSON) and, to my surprise, didn't find any relevant results. One way, of course, is to compress the JSON with a simple home-made scheme and decompress it on the client side before consuming it:
[['Name', 'Age'],
['Saeed', 31],
['Maysam', 32],
['Mehdi', 27]]
However, an established best practice would be better than each developer reinventing the wheel. Have you guys seen a well-known, widely accepted solution for this?
First off, JSON is not meant to be the most compact way of representing data. It's meant to parse directly into a JavaScript data structure ready for immediate consumption without further processing. If you want to optimize for size, you probably don't want self-describing JSON: you have to let your code make a bunch of assumptions about how to handle the data, and do some manual parsing on the receiving end. Those assumptions and that extra coding work are what save you space.
If the property names and format of the server response are already known to the code, you could just return the data as an array of alternating values:
['Saeed', 31, 'Maysam', 32, 'Mehdi', 27]
or, if it's safe to assume that names don't include commas, you could even return a comma-delimited string that you split into its pieces and stick into your own data structures:
"Saeed, 31, Maysam, 32, Mehdi, 27"
or, if you still want valid JSON, you can wrap that string in an array, which is only slightly better than my first version where the items themselves are array elements:
["Saeed, 31, Maysam, 32, Mehdi, 27"]
These assumptions and this compactness put more of the parsing responsibility on your own JavaScript, but removing the self-describing nature of the full JSON you started with is precisely what makes it more compact.
One solution is the hpack algorithm:
https://github.com/WebReflection/json.hpack/wiki
You might be able to use a CSV format instead of JSON, as you would only specify the property names once. However, this would require a rigid structure like in your example.
JSON isn't really the kind of thing that lends itself to DRY, since it's already quite well-packaged considering what you can do with it. Personally, I've used bare arrays for JSON data that gets stored in a file for later use, but for simple AJAX requests I just leave it as it is.
DRY usually refers to what you write yourself, so if your object is being generated dynamically you shouldn't worry about it anyway.
Use gzip compression, which is readily built into most web servers and clients?
It will still take some (extra) time and memory to generate and parse the JSON at each end, but it won't take much time to send over the network, and it requires minimal implementation effort on your behalf.
It might be worth a shot even if you pre-compress your source data somehow.
It's actually not a problem for JSON that you often get massive duplication of strings or property names (nor is it for XML).
This is exactly what the duplicate-string-elimination component of the DEFLATE algorithm (used by gzip) addresses.
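As a quick illustration (a Node.js sketch using the built-in zlib module; the record shape is made up), the repeated property names all but disappear after gzip:

const zlib = require('zlib');

// Build a payload with a thousand repetitions of the "Name"/"Age" keys:
const rows = [];
for (let i = 0; i < 1000; i++) {
  rows.push({ Name: 'Person' + i, Age: 20 + (i % 50) });
}
const json = JSON.stringify(rows);
const gzipped = zlib.gzipSync(json);

console.log('raw bytes:    ', Buffer.byteLength(json));
console.log('gzipped bytes:', gzipped.length);
// DEFLATE's duplicate string elimination absorbs the repeated keys, so
// gzip gets close to what a hand-rolled de-duplicated format would.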
While most browser clients can accept gzip-compressed responses, traffic going back to the server won't be compressed.
Does that warrant using "JSON compression" (i.e. hpack or some other scheme)?
It's unlikely to be much faster than implementing GZip-compression in Javascript (which is not impossible; on a reasonably fast machine you can compress 100 KB in 250 ms).
It's pretty difficult to safely process untrusted JSON input. You need stream-based parsing and a maximum complexity threshold; otherwise your server might be in for a surprise. See, for instance, Armin Ronacher's Start Writing More Classes:
If your neat little web server is getting 10000 requests a second through gevent but is using json.loads then I can probably make it crawl to a halt by sending it 16MB of well crafted and nested JSON that hog away all your CPU.
Related
I'm working on the front-end of an application which communicates with a back-end through a REST API. The back-end is a kind of standalone device, not a standard web server, so it is not very powerful (but it can run PHP). The API is very generic and returns values as key-value pairs, e.g.
[{ key: "key1", value: "some value" }, { key: "key2", value: "1234" }]
The problem I'm facing is that it does not consider types; it returns everything as a string in quotes (numbers: "123", booleans: "1"). Recently I asked for a change (my argument was that manual type conversion is unnecessary work that has to be done by every client app, and can be avoided if the server does it), but I need some more convincing arguments. Some counterarguments to my request are:
RESTful communication is natively string-based (so regardless of whether you transfer a 1 or a "1", a client-side type conversion has to be done)
it is the responsibility of the GUI designer to understand the context of each parameter
the back-end follows the KISS principle: keeping everything as strings means no additional processing is needed on the back-end, and the conversion can be done in the GUI, which typically runs on a much more powerful PC
So what could be some good arguments to convince my colleagues that types in JSON responses are a good thing, for me and for them as well?
Thanks
RESTful communication and JSON are two different things. JSON is only the format of the data; it could be XML, or even CSV, or a custom format, and that wouldn't remove the RESTful aspect.
JSON is natively handled by pretty much every JavaScript library that deals with server communication: no parsing to do, and little conversion (maybe dates from timestamps and that kind of thing).
On the server side there are a lot of libraries that can handle the JSON for you too. And how do they generate the key-value format: a generic piece of introspection code, or tons of hand-written serializers for every class? That can lead to a lot of unnecessary technical code to write and test.
KISS doesn't mean keeping things totally stupid and not thinking about anything. Having a boolean as a number inside a string, and a number as a string, is nothing like simple for a developer; it's hell to handle across all the object conversions. If you need to check data constraints, it also makes you repeat yourself when you validate every field (testing whether the number really is a number, ...).
The simple thing is not to write your own library that converts everything to strings, probably with worse performance than a specialized library; it's to use libraries that do the job for you.
If you send everything as typed objects, your JSON deserializer on the back-end will do part of the validation for you: a boolean will be a boolean, a number will be a number. If you have a lot of validation to do, that means far less code to write to perform all the checks.
Client-side, I guess there is a lot of code to deal with all these key/value pairs and to convert the values. That slows development down badly, and if you do unit testing, it adds a lot of tests to write.
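For illustration, a small sketch of the conversion layer every client ends up writing for the stringly-typed key/value format, versus typed JSON (the field names here are made up):

// What the server currently returns: everything is a string.
const kvResponse = [
  { "key": "count",  "value": "1234" },
  { "key": "active", "value": "1" }
];

// Each client has to re-implement this by hand:
const fromKv = {};
kvResponse.forEach(function (pair) { fromKv[pair.key] = pair.value; });
const count  = parseInt(fromKv.count, 10);  // manual number conversion
const active = fromKv.active === "1";       // manual boolean convention

// With native JSON types, the data is ready after a single JSON.parse:
const typed = JSON.parse('{ "count": 1234, "active": true }');
console.log(count === typed.count, active === typed.active);  // true true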
it is the responsibility of the GUI designer to understand the context of each parameter
Well, this is true, but I feel that providing well-formatted data is the responsibility of the server. Enforcing a format leads to a fail-fast practice, which is a good thing. Have they really never had a production problem because of this generic format? They wouldn't with proper JSON.
Personally, my JSON code on the server is Java annotations plus one custom serializer, nothing more. I wonder how much code they had to write to serialize/deserialize and convert types.
First of all, the idea of using the format [{ "key": "key1", "value": "some value" }, { "key": "key2", "value": "1234" }] is kinda stupid, and is basically the same as { "key1": "some value", "key2": "1234" }.
In regards to the points your colleagues made:
1. While this is indeed a text-based transfer, and a conversion between the string representation and an actual object has to happen either way, there would be no need to recursively walk the tree of key/value pairs yourself if you transferred objects and other complex types as proper values.
Let's pretend that you have to transfer the following piece of data:
{ "key1": { "x": 10, "y": 20 } }
If you were to encode the inner object as a string, you would get something like this:
{ "key1": "{ \"x\": \"10\", \"y\": \"20\" }" }
Since the value arrives as a string, you'd have to call JSON.parse on the entire object first, and then again on the object stored as text inside it, which means walking the tree recursively yourself. On the other hand, with native types (objects, numbers, arrays) as values, you achieve the same effect with just one call to JSON.parse (which is internally recursive, but still better than managing it yourself).
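A tiny sketch of that double-parse cost (the wire string is a hypothetical example):

const wire = '{ "key1": "{ \\"x\\": 10, \\"y\\": 20 }" }';
const outer = JSON.parse(wire);        // first parse: the outer object
const inner = JSON.parse(outer.key1);  // second parse: the stringified value
console.log(inner.x + inner.y);        // 30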
The second point is valid, but I feel that any service should return data in ready-to-use form, or at least as ready as it can be. Wouldn't it be stupid if your service gave you a piece of raw XML instead of parsed and prepared data? The same principle applies here.
I feel like your colleagues are just being lazy, trying to cover their butts with the KISS principle. It is very likely that their server code has all the information it needs to encode the values with their proper types, which would be much harder to do on the client side. So their third point looks like a blatant 'we are too lazy, so you have to do it'.
I need to retrieve a large amount of data (coordinates plus an extra value) via AJAX. The format of the data is:
-72.781;;6,-68.811;;8
Note two different delimiters are being used: ;; and ,.
Shall I just return a delimited string and use String.split() (twice), or is it better to return a JSON string and use JSON.parse() to unpack the data? What are the pros and cons of each method?
Even if the data is really quite large, the odds of there being a performance difference you'd notice in the real world are quite low (data transfer time will dwarf decoding time). So, barring a real-world performance problem, it's best to focus on what's clearest in the code.
If the data is homogenous (you deal with each coordinate largely the same way), then there's nothing wrong with the String#split approach.
If you need to refer to the coordinates individually in your code, there would be an argument for assigning them proper names, which would suggest using JSON. I tend to lean toward clarity, so I would probably lean toward JSON.
Another thing to consider is size on the wire. If you only need to support nice fat network connections, it probably doesn't matter, but because JSON keys are reiterated for each object, the size could be markedly larger. That might argue for compressed JSON.
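For concreteness, here's a sketch of both decoding paths; the JSON wire shape in the second is an assumption about what the server could send instead:

const raw = "-72.781;;6,-68.811;;8";

// (1) Two levels of split on the delimited string:
const viaSplit = raw.split(',').map(function (pair) {
  const bits = pair.split(';;');
  return { coord: parseFloat(bits[0]), extra: parseInt(bits[1], 10) };
});

// (2) JSON.parse, assuming the server sent nested arrays instead:
const viaJson = JSON.parse('[[-72.781, 6], [-68.811, 8]]').map(function (p) {
  return { coord: p[0], extra: p[1] };
});

console.log(viaSplit, viaJson);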
I've created a performance test that describes your issue.
Although it depends on the browser implementation, in many cases, as the results show, split would be much faster, because JSON.parse does a lot of other work in the background. But you'd need the data served in a form that's easy to parse: in the test I added a case that uses split (along with replace) to parse an already-formatted JSON array, and the result speaks for itself.
All in all, I wouldn't go with a script that's a few milliseconds faster but n seconds harder to read and maintain.
I have an AJAX call that currently returns raw HTML, which I inject into the page.
My issue is that in some situations I need to return a count value along with the raw HTML, and if that count is > 10, I need to fire another jQuery operation to make something visible.
The best approach here would be to return JSON then, right? So I can do:
jsonReturned.counter
jsonReturned.html
Do you agree?
Also, out of curiosity more than anything: is JSON any more expensive performance-wise? It's just a simple object with properties, but I'm just asking.
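For concreteness, a minimal sketch of what that would look like (the endpoint and element ids are made up; jQuery assumed, since the question mentions it):

$.getJSON('/list-fragment', function (jsonReturned) {
  $('#list').html(jsonReturned.html);   // inject the raw HTML as before
  if (jsonReturned.counter > 10) {
    $('#show-more').show();             // the extra operation for counts > 10
  }
});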
This question leaves some room for opinion, but in mine there is no efficiency concern with returning JSON instead of raw HTML. As you stated, you can easily return multiple values without extra parsing (I'll often have a status, a message, and data, for example).
I haven't run any numbers, but I can tell you I've used JSON via AJAX in very heavy traffic (millions of requests) situations with no efficiency concerns.
I agree, JSON is the way to go. There is no doubt it's a performance hit; the question is whether it's a negligible one. My opinion is that it is. JavaScript is pretty fast in browsers these days. You should be OK.
JSON'll likely be more compact than HTML, since bracket/quote pairs are ALWAYS going to be terser than the smallest possible tag combos: {}/[]/"" vs. <a></a>, 2 chars vs. 7, is a win in my book. However, if your data requires huge amounts of escaping with \, then JSON would be a net loss, and could double the size of a given string.
Sources of additional overhead include:
Download size and load time of a JSON parser for older browsers
JSON parse time.
Larger download size for the HTML content, since the JSON string has to contain the HTML string plus at least the surrounding quotes.
The only way to know whether these are significant to your app is to measure them. Neither of them are obviously large and none of them are more than O(n).
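A rough way to measure the parse-time part in a browser console (the payload here is synthetic):

const payload = JSON.stringify({
  counter: 42,
  html: '<li>item</li>'.repeat(5000)   // a deliberately large HTML string
});
const t0 = performance.now();
const parsed = JSON.parse(payload);
console.log('JSON.parse took', (performance.now() - t0).toFixed(2), 'ms;',
            'counter =', parsed.counter);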
I'm creating a sophisticated JavaScript library for working with my company's server side framework.
The server side framework encodes its data to a simple XML format. There's no fancy namespacing or anything like that.
Ideally I'd like to parse all of the data in the browser as JSON. However, if I do this I need to rewrite some of the server side code to also spit out JSON. This is a pain because we have public APIs that I can't easily change.
What I'm really concerned about here is performance in the browser of parsing JSON versus XML. Is there really a big difference to be concerned about? Or should I exclusively go for JSON? Does anyone have any experience or benchmarks in the performance difference between the two?
I realize that most modern web developers would probably opt for JSON and I can see why. However, I really am just interested in performance. If there's a proven massive difference then I'm prepared to spend the extra effort in generating JSON server side for the client.
JSON should be faster, since it's JavaScript Object Notation and maps directly onto JavaScript's own data structures. In PHP, on the GET side of things, I will often do something like this:
<script type="text/javascript">
var data = <?php json_encode($data)?>;
</script>
For more information on this, see here:
Why is Everyone Choosing JSON Over XML for jQuery?
Also...what "extra effort" do you really have to put into "generating" JSON? Surely you can't be saying that you'll be manually building the JSON string? Almost every modern server-side language has libraries that convert native variables into JSON strings. For example, PHP's core json_encode function converts an associative array like this:
$data = array('test'=>'val', 'foo'=>'bar');
into
{"test": "val", "foo": "bar"}
Which is simply a JavaScript object (since there are no associative arrays (strictly speaking) in JS).
Firstly, I'd like to say thanks to everyone who's answered my question. I REALLY appreciate all of your responses.
In regards to this question, I've done some further research by running benchmarks. The parsing happens in the browser; of the browsers tested, IE 8 is the only one without a native JSON parser. The XML contains the same data as the JSON version.
Chrome (version 8.0.552.224), JSON: 92ms, XML: 90ms
Firefox (version 3.6.13), JSON: 65ms, XML: 129ms
IE (version 8.0.6001.18702), JSON: 172ms, XML: 125ms
Interestingly, in Chrome the two are almost the same speed. Note that this is parsing a lot of data; with little snippets, it's probably not such a big deal.
Benchmarks have been done. Here's one. The difference in some of the earlier browsers appeared to be an entire order of magnitude (tens of milliseconds instead of hundreds), but not massive. Part of this is server response time: XML is bulkier as a data format. Part of it is parse time: JSON lets you send JavaScript object literals, while XML requires parsing a document.
Rather than modifying existing functions, you could consider adding a method to your public API that returns JSON, if it becomes an issue, unless you don't want to expose JSON at all.
See also the SO question When to prefer JSON over XML?
Performance isn't really a consideration, assuming that you're not talking about gigabytes of XML. Yes, it will take longer (XML is more verbose), but it's not going to be something that the user will notice.
The real issue, in my opinion, is support for XML within JavaScript. E4X is nice, but it isn't supported by Microsoft. So you'll need a third-party library (such as jQuery) to work with the XML.
If possible, it would make sense to just measure it. By 'if possible' I mean that tooling for JavaScript (especially for performance analysis) may not be quite as good as for stand-alone programming languages.
Why measure? Because speculation based solely on properties of the data formats is not very useful for performance analysis; developers' intuitions are notoriously poor at predicting performance. In this case it comes down to the maturity of the respective XML and JSON parsers (and generators) in use. XML has the benefit of having been around longer; JSON is a bit simpler to process. This is based on having actually written libraries for processing both. In the end, if all things are equal (maturity and optimization of the libraries), JSON can indeed be a bit faster to process. But both can be very fast, or very slow with bad implementations.
However: I suspect you should not worry all that much about performance, as many have already suggested. Both XML and JSON can be parsed efficiently, and with modern browsers, probably are.
Chances are that if you have performance problems, they lie not in reading or writing the data but somewhere else; the first step would be figuring out what the actual problem is.
Since JSON is native to and designed FOR JavaScript, it's going to out-perform XML parsing all day long. You didn't mention your server-side language; in PHP there is json_encode/json_decode functionality built into the PHP core...
That said, the difference in performance will be so tiny you won't even notice it (and you shouldn't think about performance problems until you actually have them; there are far more important things to care about: maintainable, readable, documented code...).
But, to answer your question: JSON will be faster to parse (because it's simply JavaScript object notation).
In this situation, I'd say stick with the XML. All major browsers have a DOM parsing interface that will parse well-formed XML. This link shows a way to use the DOMParser interface in Webkit/Opera/Firefox, as well as the ActiveX DOM Object in IE: https://sites.google.com/a/van-steenbeek.net/archive/explorer_domparser_parsefromstring
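For a side-by-side feel of the two in-browser parsing paths (a modern-browser sketch; IE would need the ActiveX fallback from the link above):

const doc = new DOMParser().parseFromString(
  '<user><name>Saeed</name></user>', 'application/xml');
console.log(doc.getElementsByTagName('name')[0].textContent);  // "Saeed"

const obj = JSON.parse('{ "user": { "name": "Saeed" } }');
console.log(obj.user.name);                                    // "Saeed"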
It also depends on how your JSON is structured: tree-like structures tend to allow faster lookups than a flat list of objects. This is where a fundamental understanding of data structures comes in handy. I would not be surprised if you parse a list-like structure in JSON that might look like this:
[
{
"name": "New York",
"country":"USA",
"lon": -73.948753,
"lat": 40.712784
},
{
"name": "Chicago",
"country":"USA",
"lon": -23.948753,
"lat": 20.712784
},
{
"name": "London",
"country":"UK",
"lon": -13.948753,
"lat": 10.712784
}
]
and then compare it to a tree like structure in XML that might look like this:
<cities>
<country name="USA">
<city name="New York">
<long>-73.948753</long>
<lat>40.712784</lat>
</city>
<city name="Chicago">
<long>-23.948753</long>
<lat>20.712784</lat>
</city>
</country>
<country name="UK">
<city name="London">
<long>-13.948753</long>
<lat>10.712784</lat>
</city>
</country>
</cities>
The XML structure may yield a faster lookup than the JSON here, because if I descend into the UK node to find London, I don't have to loop through the rest of the countries. In the JSON example I just might, if London is near the bottom of the list. But what we have here is a difference in structure, not of format. I would be surprised to find XML faster in any case where the structures are exactly the same.
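To be fair to JSON, the same tree shape can be expressed there too; here's a sketch where the lookup can skip whole countries exactly like walking only the UK node in the XML:

const cities = {
  "USA": {
    "New York": { "lon": -73.948753, "lat": 40.712784 },
    "Chicago":  { "lon": -23.948753, "lat": 20.712784 }
  },
  "UK": {
    "London":   { "lon": -13.948753, "lat": 10.712784 }
  }
};
const london = cities["UK"]["London"];  // direct lookup, no list scan
console.log(london.lon, london.lat);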
Here is an experiment I did using Python. I know the question looks at this strictly from a JavaScript perspective, but you might find it useful. The results show JSON being faster than XML; but the main point stands: how you structure the data will affect how efficiently you can retrieve it.
Another reason to stick with XML is that if you switch to JSON, you modify the "maintenance contract". XML is more typed than JSON, in the sense that it works more naturally with typed languages (i.e. NOT JavaScript).
If you change to JSON, some future maintainer of the code base might introduce a JSON array at some point which has mixed type content (e.g. [ "Hello", 42, false ]), which will present a problem to any code written in a typed language.
Yes, you could do that as well in XML but it requires extra effort, while in JSON it can just slip in.
And while it does not seem like a big deal at first glance, it actually is, as it forces the code in the typed language to stick with a JSON tree instead of deserializing to a native type.
The best example I have found comparing these two is:
http://www.utilities-online.info/xmltojson/#.VVGOlfCYK7M
It shows that JSON is more human-readable and understandable than XML.
I'm passing a table of up to 1000 rows, consisting of name, ID, latitude and longitude values, to the client.
The list will then be processed by Javascript and converted to markers on a Google map.
I initially planned to do this with JSON, as I want the code to be readable and easy to deal with, and because we may be adding more structure to it over time.
However, my colleague suggested passing it down as a plain JavaScript array, as this would greatly reduce the size.
This made me think: maybe JSON is a bit redundant here. After all, for each row, the name of every field is output repeatedly, whereas with an array the position of each cell indicates the field.
However, would there really be a performance improvement by using an array?
The site uses GZIP compression. Is that compression effective enough to take care of the redundancy in a JSON string?
[edit]
I realize JSON is just a notation.
But my real question is - what notation is best, performance-wise?
If I use fully named attributes, then I can have code like this:
var x = resultset.rows[0].name;
Whereas if I don't, it will look less readable, like so:
var x = resultset.rows[0][2];
My question is - would the sacrifice in code readability be worth it for the performance gains? Or not?
Further notes:
According to Wikipedia, the Deflate compression algorithm (used by gzip) performs 'Duplicate string elimination'. http://en.wikipedia.org/wiki/DEFLATE#Duplicate_string_elimination
If this is correct, I have no reason to be concerned about any redundancy in JSON, as it's already been taken care of.
JSON is just a notation (Javascript Object Notation), and includes JS arrays -- even if there is the word "object" in its name.
See its grammar at http://json.org/, which defines an array like this (quoting):

An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).
This means this (taken from JSON Data Set Sample) would be valid JSON :
[ 100, 500, 300, 200, 400 ]
even though it doesn't include or declare any object at all.
In your case, I suppose you could use some array, storing data by position, and not by name.
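A minimal sketch of that positional approach (the column order here is an assumption both sides would have to agree on):

const FIELDS = ['id', 'name', 'lat', 'lng'];
const wireRows = [
  [1, 'Head Office', 51.5074, -0.1278],
  [2, 'Warehouse',   48.8566,  2.3522]
];

// Rebuild named objects once on the client, so the rest of the code
// stays as readable as resultset.rows[0].name:
const rows = wireRows.map(function (r) {
  const obj = {};
  FIELDS.forEach(function (f, i) { obj[f] = r[i]; });
  return obj;
});
console.log(rows[0].name);   // "Head Office"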
If you are worried about size, you could "compress" that data yourself on the server side and decompress it on the client side, but I wouldn't do that: it would mean you need more processing time/power on the client side...
I'd rather go with gzipping the page that contains the data: you have nothing to do, it's fully automatic, it works just fine, and the difference in size will probably not be noticeable.
I suggest using a simple CSV format. There is a nice article on the Flickr Development Blog where they talk about their experience with exactly this problem. But the best thing would be to try it yourself.