I have a large JSON object of about 8 MB from a server, stored in a browser client. If I shorten the property names that are duplicated throughout these lists, will that give any performance boost when manipulating the lists and updating the objects? For example, turning
{ "VenueLocationID" : 12 }
into
{ "vid" : 12 }
If you're transferring 8MB of data from server to client, probably. However, there are better ways to improve performance. If your JSON is coming through an HTTP response, activating gzip compression could net even better performance without reducing readability.
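If the server side happens to be Node with Express (an assumption on my part), enabling gzip can be a one-liner with the compression middleware; everything below (largeVenueArray, the route, the port) is a placeholder:

// A minimal sketch, assuming an Express server and the compression middleware
// (npm install express compression). Names like largeVenueArray are placeholders.
const express = require('express');
const compression = require('compression');

const largeVenueArray = [{ VenueLocationID: 12 } /* ... thousands more ... */];

const app = express();
app.use(compression()); // gzip responses whenever the client sends Accept-Encoding: gzip

app.get('/venues', (req, res) => {
  res.json(largeVenueArray); // repetitive property names compress very well
});

app.listen(3000);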
The best way to tune performance is to profile the application -- find out where the bottlenecks are, then address them. Profilers can sometimes find things that I never thought would be issues.
Another thing to look at is how the JSON is being built. I've helped some systems out by streaming the serialization: instead of stringifying one huge array in a single call, I serialized each element [and wrote it to a response stream], surrounded by the usual delimiters ('[', ']', and ',').
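For illustration, a rough sketch of that element-by-element serialization in Node, assuming an http/Express response object (writeJsonArray and items are made-up names; the sketch also ignores backpressure):

// Sketch: serialize one element at a time into the response stream instead of
// building one giant string with a single JSON.stringify call.
function writeJsonArray(res, items) {
  res.setHeader('Content-Type', 'application/json');
  res.write('[');
  items.forEach((item, i) => {
    if (i > 0) res.write(',');
    res.write(JSON.stringify(item)); // stringify each element individually
  });
  res.write(']');
  res.end();
}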
This is largely an 'am I doing it right / how can I do this better' kind of topic, with some concrete questions at the end. If you have other advice / remarks on the text below, even if I didn't specifically ask those questions, feel free to comment.
I have a MySQL table for users of my app that, along with a set of fixed columns, also has a text column containing a JSON config object. This is to store variable configuration data that cannot be stored in separate columns because it has different properties per user. There doesn't need to be any lookup / ordering / anything on the configuration data, so we decided this would be the best way to go.
When querying the database from my Node.JS app (running on Node 0.12.4), I assign the JSON text to an object and then use Object.defineProperty to create a getter property that parses the JSON string data when it is needed and adds it to the object.
The code looks like this:
user =
  uid: results[0].uid
  _c: results[0].user_config # JSON config data as string

Object.defineProperty user, 'config',
  get: ->
    @c = JSON.parse @_c if not @c?
    return @c
Edit: the above code is CoffeeScript; here's the (approximate) JavaScript equivalent for those of you who don't use CoffeeScript:
var user = {
uid: results[0].uid,
_c: results[0].user_config // JSON config data as string
};
Object.defineProperty(user, 'config', {
get: function() {
if(this.c === undefined){
this.c = JSON.parse(this._c);
}
return this.c;
}
});
I implemented it this way because parsing JSON blocks the Node event loop, and the config property is only needed about half the time (this is in a middleware function for an Express server), so this way the JSON is only parsed when it is actually needed. The config data itself can range from 5 to around 50 different properties organised in a couple of nested objects; not a huge amount of data, but still more than just a few lines of JSON.
Additionally, there are three of these JSON objects (I only showed one since they're all basically the same, just with different data in them). Each one is needed in different scenarios but all of the scenarios depend on variables (some of which come from external sources) so at the point of this function it's impossible to know which ones will be necessary.
So I had a couple of questions about this approach that I hope you guys can answer.
Is there a negative performance impact when using Object.defineProperty, and if yes, is it possible that it could negate the benefit from not parsing the JSON data right away?
Am I correct in assuming that not parsing the JSON right away will actually improve performance? We're looking at a continuously high number of requests and we need to process these quickly and efficiently.
Right now the three JSON data sets come from two different tables JOINed in a single SQL query, so that there is only one query per request instead of up to four. Keeping in mind that there are scenarios where none of the JSON data is needed, but also scenarios where all three data sets are needed (and of course scenarios in between), could it be an improvement to fetch a JSON data set from its table only at the point when it is actually needed? I avoided this because I feel like waiting for four separate SELECT queries would take longer than waiting for one query with two JOINed tables.
Are there other ways to approach this that would improve the general performance even more? (I know, this one's a bit of a subjective question, but ideas / suggestions of things I should check out are welcome). I'm not looking to spin off parsing the JSON data into a separate thread though, because as our service runs on a cluster of virtualised single-core servers, creating a child process would only increase overall CPU usage, which at high loads would have even more negative impact on performance.
Note: when I say performance, I mainly mean fast and efficient throughput. We prefer a somewhat larger memory footprint over heavier CPU usage.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." - Donald Knuth
What do I take from that quote? Too much time is spent on optimizations with dubious payoff instead of on design and clarity.
It's true that JSON.parse blocks the event loop, but every synchronous call does - this is just code execution and is not a bad thing.
The root concern is not that it blocks, but how long it blocks for. I remember a StrongLoop instructor saying 10ms was a good rule of thumb for the maximum execution time of a single call in an app at cloud scale; beyond 10ms is when it's time to start optimizing. Each app has to define its own threshold.
So, how much execution time will your lazy init actually save? This article says it takes about 1.5s to parse a 15MB JSON string, roughly 10,000 bytes/ms. Three configs of 50 properties each, at about 30 bytes per key/value pair, is around 4,500 bytes: about half a millisecond of parsing.
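If you want to sanity-check that estimate against your own payloads, a quick (unscientific) measurement is easy; the fake 50-property config below just stands in for a real user_config value:

// Rough timing sketch: measure JSON.parse on a config-sized string.
var fakeConfig = {};
for (var i = 0; i < 50; i++) fakeConfig['property' + i] = 'value-' + i;
var configJson = JSON.stringify(fakeConfig);

var start = process.hrtime();
for (var j = 0; j < 10000; j++) JSON.parse(configJson);
var diff = process.hrtime(start);
var elapsedMs = diff[0] * 1e3 + diff[1] / 1e6;
console.log('avg parse time:', (elapsedMs / 10000).toFixed(4), 'ms');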
When the time comes to optimize, I would look at having your lazy init do the MySQL call instead. A config is only needed about 50% of the time, the call won't block the event loop, and an external call to a db absolutely dwarfs a JSON.parse().
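A rough sketch of what that could look like, assuming the callback-style mysql driver and a made-up table/column; the query only runs if something actually asks for the config, and only once per user object:

// Sketch: lazy, cached config fetch. user_config_table and its columns are
// hypothetical; adapt to the real schema.
function attachLazyConfig(user, connection) {
  var cached = null;
  user.getConfig = function (callback) {
    if (cached) return callback(null, cached);
    connection.query(
      'SELECT user_config FROM user_config_table WHERE uid = ?',
      [user.uid],
      function (err, rows) {
        if (err) return callback(err);
        cached = JSON.parse(rows[0].user_config);
        callback(null, cached);
      }
    );
  };
}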
All of this to say: What you are doing is not necessarily bad or wrong, but if the whole app is littered with these types of dubious optimizations, how does that impact feature addition and maintenance? The biggest problems I see revolve around time to market, not speed. Code complexity increases time to market.
Q1: Is there a negative performance impact when using Object.defineProperty...
Check out this site for a hint.
Q2: ...not parsing the JSON right away will actually improve performance...
IMHO: inconsequentially
Q3: Right now the three JSON data sets come from two different tables...
The majority of the db query cost is usually the out-of-process call and the network data transport (unless you have a really bad schema or config). Getting all the data in one call is the right move.
Q4: Are there other ways to approach this that would improve the general performance
Impossible to tell. The place to start is with an observed behavior, then profiler tools to identify the culprit, then code optimization.
Consider this JSON response:
[
  { "Name": "Saeed", "Age": 31 },
  { "Name": "Maysam", "Age": 32 },
  { "Name": "Mehdi", "Age": 27 }
]
This works fine for small amounts of data, but when you want to serve larger amounts (say, many thousands of records), it seems logical to somehow avoid repeating the property names in the response JSON.
I Googled the concept (DRYing JSON) and, to my surprise, didn't find any relevant results. One way, of course, is to compress the JSON using a simple home-made algorithm and decompress it on the client side before consuming it:
[["Name", "Age"],
 ["Saeed", 31],
 ["Maysam", 32],
 ["Mehdi", 27]]
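The matching client-side "decompression" is only a few lines; a sketch, not a standard library:

// Sketch: expand the header-plus-rows layout back into an array of objects.
function inflate(packed) {
  var keys = packed[0];
  return packed.slice(1).map(function (row) {
    var obj = {};
    keys.forEach(function (key, i) { obj[key] = row[i]; });
    return obj;
  });
}

// inflate([["Name", "Age"], ["Saeed", 31]]) -> [{ Name: "Saeed", Age: 31 }]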
However, a best practice would be better than each developer trying to reinvent the wheel. Have you guys seen a well-known widely-accepted solution for this?
First off, JSON is not meant to be the most compact way of representing data. It's meant to be parseable directly into a JavaScript data structure designed for immediate consumption without further parsing. If you want to optimize for size, then you probably don't want self-describing JSON: you need to let your code make a bunch of assumptions about how to handle the data and put it to use, and do some manual parsing on the receiving end. It's those assumptions and the extra coding work that can save you space.
If the property names and format of the server response are already known to the code, you could just return the data as an array of alternating values:
['Saeed', 31, 'Maysam', 32, 'Mehdi', 27]
or, if it's safe to assume that names don't include commas, you could even just return a comma-delimited string that you split into its pieces and stick into your own data structures:
"Saeed, 31, Maysam, 32, Mehdi, 27"
or, if you still want it to be valid JSON, you can wrap that string in an array like this, which is only slightly better than my first version where the items themselves are array elements:
["Saeed, 31, Maysam, 32, Mehdi, 27"]
These assumptions and this compactness put more of the responsibility for parsing the data on your own JavaScript, but it is precisely the removal of the self-describing nature of the full JSON you started with that makes it more compact.
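As a sketch of that manual parsing for the comma-delimited variant (assuming every record is exactly a name followed by an age, and no value contains a comma):

// Sketch: rebuild objects from the comma-delimited string.
function parsePeople(csvish) {
  var parts = csvish.split(',').map(function (s) { return s.trim(); });
  var people = [];
  for (var i = 0; i < parts.length; i += 2) {
    people.push({ Name: parts[i], Age: parseInt(parts[i + 1], 10) });
  }
  return people;
}

// parsePeople("Saeed, 31, Maysam, 32, Mehdi, 27")
// -> [{ Name: "Saeed", Age: 31 }, { Name: "Maysam", Age: 32 }, { Name: "Mehdi", Age: 27 }]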
One solution is known as the hpack algorithm:
https://github.com/WebReflection/json.hpack/wiki
You might be able to use a CSV format instead of JSON, as you would only specify the property names once. However, this would require a rigid structure like in your example.
JSON isn't really the kind of thing that lends itself to DRY, since it's already quite well-packaged considering what you can do with it. Personally, I've used bare arrays for JSON data that gets stored in a file for later use, but for simple AJAX requests I just leave it as it is.
DRY usually refers to what you write yourself, so if your object is being generated dynamically you shouldn't worry about it anyway.
Why not use gzip compression, which is usually readily built into most web servers and clients?
It will still take some (extra) time and memory to generate and parse the JSON at each end, but it won't take much time to send over the network, and it requires minimal implementation effort on your part.
It might be worth a shot even if you already pre-compress your source data somehow.
It's actually not a problem for JSON that you've often got massive string or "property" duplication (nor is it for XML).
This is exactly what the duplicate string elimination component of the DEFLATE-algorithm addresses (used by GZip).
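You can see that effect directly with Node's built-in zlib; a quick sketch with made-up sample data:

// Sketch: how well repetitive JSON compresses under gzip (DEFLATE).
var zlib = require('zlib');

var records = [];
for (var i = 0; i < 10000; i++) {
  records.push({ Name: 'Person' + i, Age: 20 + (i % 50) });
}
var json = JSON.stringify(records);
var gzipped = zlib.gzipSync(json);

console.log('raw JSON bytes:', Buffer.byteLength(json));
console.log('gzipped bytes: ', gzipped.length);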
While most browser clients can accept gzip-compressed responses, traffic going back to the server generally won't be compressed.
Does that warrant using "JSON compression" (i.e. hpack or some other scheme) for that direction? It's unlikely to be much faster than implementing gzip compression in JavaScript, which is not impossible: on a reasonably fast machine you can compress 100 KB in about 250 ms.
It's pretty difficult to safely process untrusted JSON input. You need to use stream-based parsing and decide on a maximum complexity threshold, or else your server might be in for a surprise. See for instance Armin Ronacher's Start Writing More Classes:
If your neat little web server is getting 10000 requests a second through gevent but is using json.loads then I can probably make it crawl to a halt by sending it 16MB of well crafted and nested JSON that hog away all your CPU.
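One cheap first line of defence (a sketch, not a complete solution) is to refuse oversized bodies before they ever reach JSON.parse; limiting nesting depth or overall complexity still requires a streaming parser on top of this:

// Sketch: cap the raw payload size before parsing. The 1MB limit is arbitrary,
// and this does nothing about deeply nested structures within that limit.
var MAX_BODY_BYTES = 1024 * 1024;

function parseUntrustedJson(rawBody) {
  if (Buffer.byteLength(rawBody, 'utf8') > MAX_BODY_BYTES) {
    throw new Error('payload too large');
  }
  return JSON.parse(rawBody);
}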
For maximum load speed and page efficiency, is it better to have:
1. An 18MB JSON file, containing an array of dictionaries, that I can load and start using as a native JavaScript object (e.g. var myname = jsonobj[1]['name']).
2. A 4MB CSV file that I need to read using the jquery.csv plugin and then use index lookups to refer to: var nameidx = titles.getPos('name'); var myname = jsonobj[1][nameidx].
I'm not really expecting anyone to give me a definitive answer, but a general suspicion would be very useful. Or tips for how to measure - perhaps I can check the trade-off between load speed and efficiency using Developer Tools.
My suspicion is that any extra efficiency from using a native JavaScript object in (1) will be outweighed by the much smaller size of the CSV file, but I would like to know if others think the same.
Did you consider delivering the JSON content using gzip? Here are some gzip benchmarks: http://www.cowtowncoder.com/blog/archives/2009/05/entry_263.html
What is your situation? Are you writing some intranet site where you know what browser users are using and have something like a reasonable expectation of bandwidth, or is this a public-facing site?
If you have control of what browsers people use, for example because they're your employees, consider taking advantage of client-side caching. If you're trying to convince people to use this data you should probably consider breaking the data up into chunks and serving it via XHR.
If you really need to serve it all at once then:
Use gzip
Are you doing heavy processing of the data on the client side? How many of the items are you actually likely to go through? If you're only likely to access fewer than 1,000 of them in any given session then I would imagine that the 14MB savings would be worth it. If on the other hand you're comparing all kinds of things against each other all the time (because you're doing some sort of visualization or... anything) then I imagine that the JSON would pay off.
In other words: it depends. Benchmark it.
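A rough way to benchmark it in the browser (a sketch: the URLs are placeholders, and parseCsv is a trivial stand-in for whatever the jquery.csv plugin does in the real setup):

// Sketch: time download + parse for both formats, one after the other.
function parseCsv(text) {
  // trivial stand-in for jquery.csv; no quoting or escaping handled
  return text.trim().split('\n').map(line => line.split(','));
}

async function timeIt(label, fn) {
  const start = performance.now();
  const result = await fn();
  console.log(label, (performance.now() - start).toFixed(1), 'ms');
  return result;
}

timeIt('JSON: download + parse', async () => {
  const res = await fetch('/data.json');
  return res.json();
}).then(() => timeIt('CSV: download + parse', async () => {
  const res = await fetch('/data.csv');
  return parseCsv(await res.text());
}));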
4MB vs 18MB? Where is the problem? JSON is simply the standard format now; CSV may be just as good, and it's fine if you're using it. My opinion.
14MB of data is a HUGE difference, but I would first try serving both with GZIP/Deflate server-side compression and then compare the two requests (the CSV request will probably still win on content length).
Then I would also create some data-manipulation tests on jsPerf with both the CSV and the JSON data, using a real test case / common usage.
That depends a lot on the bandwidth of the connection to the user.
Unless this is only going to be used by people who have a super fast connection to the server, I would say that the best option would be an even smaller file that only contains the actual information that you need to display right away, and then load more data as needed.
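A sketch of that incremental approach, assuming a hypothetical paged endpoint (/api/items?page=N&size=M):

// Sketch: load the data in pages instead of one huge file.
async function loadPage(page, size) {
  const res = await fetch('/api/items?page=' + page + '&size=' + size);
  return res.json();
}

let loaded = [];
let nextPage = 0;

async function loadMore() {
  const batch = await loadPage(nextPage++, 500);
  loaded = loaded.concat(batch);
  return batch.length > 0; // false once the server runs out of data
}

// e.g. call loadMore() on scroll, or when the user asks for the next screenful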
I have an AJAX call that currently returns raw HTML, which I inject into the page.
Now my issue is that, in some situations, I need to return a count value along with the raw HTML, and if that count is > 10, I need to fire another jQuery operation to make something visible.
The best approach here would then be to return JSON, right? So I can do:
jsonReturned.counter
jsonReturned.html
Do you agree?
Also, out of curiosity more than anything: is JSON any more expensive performance-wise? It's just a simple object with properties, but I'm just asking.
This question leaves some room for opinion, but in my view there is no efficiency concern with returning JSON instead of raw HTML. As you stated, you can easily return multiple values without any extra parsing (I'll often have a status, a message, and data, for example).
I haven't run any numbers, but I can tell you I've used JSON via AJAX in very heavy traffic (millions of requests) situations with no efficiency concerns.
I agree, JSON is the way to go. There is no doubt that it is a performance hit; the question is whether it's a negligible hit. My opinion is that it is negligible: JavaScript is pretty fast in the browser these days. You should be OK.
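For what it's worth, the consuming side stays simple either way; a sketch assuming jQuery, with a made-up URL and element IDs:

// Sketch: consume a { counter, html } JSON response.
$.getJSON('/widget/refresh', function (data) {
  $('#content').html(data.html); // inject the raw HTML as before
  if (data.counter > 10) {
    $('#overflow-notice').show(); // extra behaviour driven by the count
  }
});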
JSON will likely be more compact than HTML, since bracket/quote pairs are always going to be terser than the smallest possible tag combos: {}/[]/"" vs. <a></a>, i.e. 2 chars vs. 7 is a win in my book. However, if your data requires huge amounts of escaping with \, then JSON would be a net loss and could double the size of any given string.
Sources of additional overhead include
Download size and load time of a JSON parser for older browsers
JSON parse time.
Larger download size for the HTML content since the JSON string has to contain the HTML string plus quotes at least.
The only way to know whether these are significant to your app is to measure them. None of them is obviously large, and none of them is more than O(n).
What would be the best format for storing a relatively large amount of data (essentially a big hashmap) for quick retrieval using javascript? It would need to support Unicode as well.
XML, JSON?
Gigantic JavaScript objects are generally a sign that you're trying to do something you really shouldn't be doing. XML is even worse; it has to be parsed before it yields meaningful data.
In this case an AJAX query to a RESTful interface backed by a proper database would probably serve you well.
JavaScript object access (particularly for any query beyond looking up a single item by its key) is very slow compared to even a basic database.
There is some nice research from the people at Flickr on this topic. They ended up using CSV over XML and JSON.
JSON definitely beats XML for performance reasons.
But a query against a DB on the backend would probably be the only feasible solution once a certain scale is reached, since local resources simply cannot match a proper database for retrieval from a large data store.