Splitting Delimited String vs JSON Parsing Efficiency in JavaScript - javascript

I need to retrieve a large amount of data (coordinates plus an extra value) via AJAX. The format of the data is:
-72.781;;6,-68.811;;8
Note two different delimiters are being used: ;; and ,.
Should I just return a delimited string and use String.split() (twice), or is it better to return a JSON string and use JSON.parse() to unpack my data? What are the pros and cons of each method?

Even if the data is really quite large, the odds of there being a performance difference noticeable in the real world are quite low (data transfer time will trump decoding time). So, barring a real-world performance problem, it's best to focus on what's best from a code-clarity viewpoint.
If the data is homogenous (you deal with each coordinate largely the same way), then there's nothing wrong with the String#split approach.
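For illustration, a minimal sketch of the double-split approach for the format in the question (the function name is just for the example):

// Parse "-72.781;;6,-68.811;;8" into [[-72.781, 6], [-68.811, 8]]
function parsePairs(raw) {
    return raw.split(',').map(function (item) {
        var parts = item.split(';;');
        return [parseFloat(parts[0]), parseFloat(parts[1])];
    });
}

var data = parsePairs('-72.781;;6,-68.811;;8');
// data[0] => [-72.781, 6]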
If you need to refer to the coordinates individually in your code, there would be an argument for assigning them proper names, which would suggest using JSON. I tend to lean toward clarity, so I would probably lean toward JSON.
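By way of comparison, the JSON route might look something like this (the property names "lng" and "value" are placeholders, not anything given in the question):

// Hypothetical JSON equivalent with named properties
var json = '[{"lng":-72.781,"value":6},{"lng":-68.811,"value":8}]';
var data = JSON.parse(json);
// data[0].lng => -72.781, data[0].value => 6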
Another thing to consider is size on the wire. If you only need to support nice fat network connections, it probably doesn't matter, but because JSON keys are reiterated for each object, the size could be markedly larger. That might argue for compressed JSON.

I've created a performance test that describes your issue.
Although it depends on the browser implementation, in many cases, as the results show, split will be much faster, because JSON.parse does a lot of other work in the background. You would, however, need the data served in an easily parseable form: in the test I've added a case where split (along with replace) is used to parse an already-formatted JSON array, and the result speaks for itself.
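As a rough sketch of that kind of test case (assuming a flat JSON array of numbers, not the test's exact code):

// Parsing an already-formatted JSON array of numbers with
// replace/split instead of JSON.parse -- only safe when the
// exact format is known in advance
var json = '[-72.781,6,-68.811,8]';
var values = json.replace('[', '').replace(']', '').split(',').map(parseFloat);
// values => [-72.781, 6, -68.811, 8]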
All in all, I wouldn't go with a script that's a few milliseconds faster but n seconds harder to read and maintain.

Related

In Javascript, why define an array with split?

I frequently see code where people define a populated array using the split method, like this:
var colors = "red,green,blue".split(',');
How does this differ from:
var colors = ["red","green","blue"];
Is it simply to avoid having to quote each value?
Splitting a string is a bad way of creating an array. There are several issues with the approach, including performance, stability, and memory consumption. It requires CPU time to parse the string, it is prone to errors (double commas, spaces in the string, etc.), and it means your script essentially has to store twice as much data in memory.
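A quick illustration of that fragility:

// Stray spaces and double commas survive the split verbatim
'red, green,,blue'.split(',');
// => ['red', ' green', '', 'blue']  (leading space, empty string)

// The array literal says exactly what you mean, with no parsing step
var colors = ['red', 'green', 'blue'];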
It's not a good idea and is most likely just a bad habit someone picked up when they first learned about strings and arrays. That or they're trying to be clever for some kind of coding exercise.
As a rule of thumb, the only time you should be parsing strings into arrays is if you're reading that string data from an external source and need to convert it to native types. If you already know the values ahead of time, you should create the array yourself.
The one possible reason someone might do this is to reduce the number of characters in their source code, trading performance for bandwidth. 'a,b,c,d,e,f,g'.split(',') is fewer characters than ['a','b','c','d','e','f','g'].
There is no difference, it's just bad practice and laziness if anything. The only reason I could think of using the first approach is if the data naturally came in string form and using an array literal made it completely unreadable.

What is the complexity of JSON.parse() in JavaScript?

The title says it all. I'm going to be parsing a very large JSON string and was curious what the complexity of this built-in method was.
I would hope that it's θ(n), where n is the number of characters in the string, since the parser has to examine every character to determine whether there is a syntax error or not.
I tried searching but couldn't come up with anything.
JSON is a very simple grammar that does not even require lookahead. As long as GC is not involved, it is purely O(n).
I do not know the browser implementations, but your assumption is correct up to a point. If the JSON consists mainly of strings, parsing is straightforward and very linear. If there are many floating-point numbers, converting them takes a bit of time, but again it's quite linear (numbers with more digits take slightly longer, but compared to a long string... very similar).
Since in most cases arrays and objects are declared as maps, memory allocation grows as required and will generally be linear. Many (if not most) implementations run on a garbage-collected backend. That means it is quite impossible to know for sure how much time will be required to transform all the data, as it very much depends on things such as the size of the memory model used on the target computer and how often the garbage collector runs. However, allocation should generally just grow as items are added to the map, so it will mostly look linear as well. I would not expect an implementation to use a realloc()-style strategy, which would mean copying data and thus getting slower and slower as an array/object grows bigger and bigger.
I was curious, so to add more info: I believe this is the "high-level" implementation of JSON.parse. I tried to find whether Chromium has its own source for it, and I'm not sure if this is it. This is going off of the source from GitHub.
Things to note:
The worst case is probably the handling of objects, which requires O(N) time where N is the number of characters.
If a reviver function is passed, the parser has to re-walk the entire object after it's created, but that only happens once, so it's fairly negligible. It also depends on what the reviver function does; you'll have to account for its own time complexity.
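For reference, the reviver is the optional second argument to JSON.parse; a minimal sketch of the extra pass it adds:

// The reviver runs once per key/value pair, bottom-up, after the
// structure has been built -- effectively a second O(N) pass
var result = JSON.parse('{"a":1,"b":{"c":2}}', function (key, value) {
    return typeof value === 'number' ? value * 10 : value;
});
// result => { a: 10, b: { c: 20 } }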

Is there any well-known method for DRYing JSON

Consider this JSON response:
[{
"Name": "Saeed",
"Age": 31
}, {
"Name": "Maysam",
"Age": 32
}, {
"Name": "Mehdi",
"Age": 27
}]
This works fine for small amounts of data, but when you want to serve larger amounts (say, many thousands of records), it seems logical to somehow prevent those repetitions of property names in the response JSON.
I Googled the concept (DRYing JSON) and, to my surprise, didn't find any relevant results. One way, of course, is to compress the JSON using a simple home-made algorithm and decompress it on the client side before consuming it:
[["Name", "Age"],
["Saeed", 31],
["Maysam", 32],
["Mehdi", 27]]
However, a best practice would be better than each developer trying to reinvent the wheel. Have you guys seen a well-known widely-accepted solution for this?
First off, JSON is not meant to be the most compact way of representing data. It's meant to be parseable directly into a javascript data structure designed for immediate consumption without further parsing. If you want to optimize for size, then you probably don't want self-describing JSON, and you need to allow your code to make a bunch of assumptions about how to handle the data and put it to use, doing some manual parsing on the receiving end. It's those assumptions and the extra coding work that can save you space.
If the property names and format of the server response are already known to the code, you could just return the data as an array of alternating values:
['Saeed', 31, 'Maysam', 32, 'Mehdi', 27]
or, if it's safe to assume that names don't include commas, you could even just return a comma-delimited string that you could split into its pieces and stick into your own data structures:
"Saeed, 31, Maysam, 32, Mehdi, 27"
or, if you still want it to be valid JSON, you can put that string in an array like this, which is only slightly better than my first version, where the items themselves are array elements:
["Saeed, 31, Maysam, 32, Mehdi, 27"]
These assumptions and this compactness put more of the responsibility for parsing the data on your own javascript, but it is that removal of the self-describing nature of the full JSON you started with that leads to its more compact nature.
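For instance, a sketch of what the receiving side might do with the alternating-values form (assuming the Name/Age layout from the question):

// Rebuild objects from the flat alternating array on the client
var flat = ['Saeed', 31, 'Maysam', 32, 'Mehdi', 27];
var people = [];
for (var i = 0; i < flat.length; i += 2) {
    people.push({ Name: flat[i], Age: flat[i + 1] });
}
// people => [{ Name: 'Saeed', Age: 31 }, ...]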
One solution is known as the hpack algorithm:
https://github.com/WebReflection/json.hpack/wiki
You might be able to use a CSV format instead of JSON, as you would only specify the property names once. However, this would require a rigid structure like in your example.
JSON isn't really the kind of thing that lends itself to DRY, since it's already quite well-packaged considering what you can do with it. Personally, I've used bare arrays for JSON data that gets stored in a file for later use, but for simple AJAX requests I just leave it as it is.
DRY usually refers to what you write yourself, so if your object is being generated dynamically you shouldn't worry about it anyway.
Why not use gzip compression, which is usually readily built into most web servers & clients?
It will still take some (extra) time & memory to generate & parse the JSON at each end, but it will not take that much time to send over the network, and will take minimal implementation effort on your behalf.
Might be worth a shot even if you pre-compress your source-data somehow.
It's actually not a problem for JSON that you've often got massive string or "property" duplication (nor is it for XML).
This is exactly what the duplicate string elimination component of the DEFLATE-algorithm addresses (used by GZip).
While most browser clients can accept GZip-compressed responses, traffic back to the server won't be compressed.
Does that warrant using "JSON compression" (i.e. hpack or some other scheme)?
It's unlikely to be much faster than implementing GZip-compression in Javascript (which is not impossible; on a reasonably fast machine you can compress 100 KB in 250 ms).
It's pretty difficult to safely process untrusted JSON input. You need to use stream-based parsing and decide on a maximum complexity threshold, or else your server might be in for a surprise. See for instance Armin Ronacher's Start Writing More Classes:
If your neat little web server is getting 10000 requests a second through gevent but is using json.loads then I can probably make it crawl to a halt by sending it 16MB of well crafted and nested JSON that hog away all your CPU.

Does json add any overhead compared to returning raw html?

I have an ajax call that currently returns raw html that I inject into the page.
Now my issue is that, in some situations, I need to return a count value along with the raw html, and if that count is > 10, I need to fire another jquery operation to make something visible.
The best approach here would be to return json then, right? So I can do:
jsonReturned.counter
jsonReturned.html
Do you agree?
Also, out of curiosity more than anything, is json any more expensive performance-wise? It's just a simple object with properties, but I'm just asking.
This question leaves some room for discretion, but in my opinion, there is no efficiency concern with returning JSON instead of raw HTML. As you stated, you can easily return multiple messages without the need for extra parsing (I'll often have a status, a message, and data, for example).
I haven't run any numbers, but I can tell you I've used JSON via AJAX in very heavy traffic (millions of requests) situations with no efficiency concerns.
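For the scenario in the question, a minimal sketch of the JSON round trip (the endpoint and selectors here are made up):

// Server is assumed to return e.g. {"counter": 12, "html": "<div>...</div>"}
$.getJSON('/my/endpoint', function (jsonReturned) {
    $('#target').html(jsonReturned.html);
    if (jsonReturned.counter > 10) {
        $('#extra-widget').show(); // the extra operation from the question
    }
});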
I agree, json is the way to go. There is no doubt that it is a performance hit. The question is: is it a negligible hit? My opinion is that it is negligible. Javascript is pretty fast these days in the browser. You should be ok.
JSON'll likely be more compact than HTML, since bracket/quote pairs are ALWAYS going to be terser than the smallest possible tag combos. {}/[],"" vs. <a></a>: 2 chars vs. 7 is a win in my book. However, if your data requires huge amounts of escaping with \, then JSON would be a net loss, and could double the size of any given string.
Sources of additional overhead include
Download size and load time of a JSON parser for older browsers
JSON parse time.
Larger download size for the HTML content, since the JSON string has to contain the HTML string plus, at minimum, the surrounding quotes.
The only way to know whether these are significant to your app is to measure them. None of them is obviously large, and none of them is more than O(n).
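To make the third point concrete, here is a made-up example of the quoting overhead:

var html = '<div class="row">hi</div>';
var wrapped = JSON.stringify({ counter: 12, html: html });
// wrapped => {"counter":12,"html":"<div class=\"row\">hi</div>"}
// The outer quotes and backslash escapes are pure size overhead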

String concatenation vs string buffers in Javascript

I was reading this book - Professional Javascript for Web Developers - where the author mentions that string concatenation is an expensive operation compared to using an array to store strings and then using the join method to create the final string. Curious, I did a couple of tests to see how much time it would save, and this is what I got -
http://jsbin.com/ivako
Somehow, Firefox usually produces similar times for both ways, but in IE, string concatenation is much, much faster. So, can this idea now be considered outdated (browsers have probably improved since)?
Even if it were true and join() were faster than concatenation, it wouldn't matter. We are talking about tiny numbers of milliseconds here, which are completely negligible.
I would always prefer well structured and easy to read code over microscopic performance boost and I think that using concatenation looks better and is easier to read.
Just my two cents.
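For concreteness, the two styles being compared are roughly (a trivial sketch):

// Plain concatenation -- arguably the more readable option
var s = '';
for (var i = 0; i < 10000; i++) {
    s += 'piece' + i;
}

// Array-and-join -- the approach the book recommends
var parts = [];
for (var j = 0; j < 10000; j++) {
    parts[j] = 'piece' + j;
}
var joined = parts.join('');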
On my system (IE 8 on Windows 7) the StringBuilder times in that test vary over a range of about 70-100% -- that is, they are not stable -- although the mean is about 95% of that of normal appending.
While it's easy now just to say "premature optimization" (and I suspect that in almost every case it is) there are things worth considering:
The problem with repeated string concatenation comes from repeated memory allocations and repeated data copies (advanced string data types can reduce/eliminate much of this, but let's keep assuming a simplistic model for now). From this, let's raise some questions:
What memory allocation is used? In the naive case, each str+=x requires str.length+x.length of new memory to be allocated. The standard C malloc, for instance, is a rather poor memory allocator. JS implementations have undergone changes over the years, including, among other things, better memory subsystems. Of course these changes don't stop there and touch really all aspects of modern JS code. The fact that now-ancient implementations may have been incredibly slow at certain tasks does not necessarily imply that the same issues still exist, or to the same extent.
As above, the implementation of Array.join is very important. If it does NOT pre-allocate memory for the final string before building it, then it only saves on data-copy costs -- how many GB/s is main memory these days? 10,000 x 50 is hardly pushing a limit. A smart Array.join operation with a POOR MEMORY ALLOCATOR would be expected to perform a good bit better, simply because the number of re-allocations is reduced. This difference would be expected to shrink as allocation cost decreases.
The micro-benchmark code may be flawed, depending on whether the JS engine creates a new object for each UNIQUE string literal or not. (This would bias it towards the Array.join method, but needs to be considered in general.)
The benchmark is indeed a micro benchmark :)
Increasing the growing size should have an impact on performance based on any or all (and then some) of the above conditions. It is generally easy to show extreme cases favoring one method or another -- the expected use case is generally of more importance.
Although, quite honestly, for any form of sane string building, I would just use normal string concatenation until such a time it was determined to be a bottleneck, if ever.
I would re-read the above statement from the book and see if there are perhaps other implicit considerations the author was meaning to invoke, such as "for very large strings" or "insane amounts of string operations" or "in JScript/IE6", etc. If not, then such a statement is about as useful as "insert sort is O(n*n)" [the realized costs depend upon the state of the data and the size of n, of course].
And the disclaimer: the speed of the code depends upon the browser, operating system, the underlying hardware, moon gravitational forces and, of course, how your computer feels about you.
In principle the book is right. Joining an array should be much faster than repeatedly concatenating to the same string. As a simple algorithm on immutable strings it is demonstrably faster.
The trick is: JavaScript authors, being largely non-expert dabblers, have written a load of code out there in the wild that uses concatenation, and relatively little 'good' code that uses methods like array-join. The upshot is that browser authors can get a better improvement in speed on the average web page by catering for and optimising the 'bad', more common option of concatenation.
So that's what happened. The newer browser versions have some fairly hairy optimisation stuff that detects when you're doing a load of concatenations, and hacks it about so that internally it is working more like an array-join, at more or less the same speed.
I actually have some experience in this area, since my primary product is a big, IE-only webapp that does a LOT of string concatenation in order to build up XML docs to send to the server. For example, in the worst case a page might have 5-10 iframes, each with a few hundred text boxes that each have 5-10 expando properties.
For something like our save function, we iterate through every tab (iframe) and every entity on that tab, pull out all the expando properties on each entity and stuff them all into a giant XML document.
When profiling and improving our save method, we found that using string concatenation in IE7 was a lot slower than using the array-of-strings method. Some other points of interest were that accessing DOM object expando properties is really slow, so we put them all into javascript arrays instead. Finally, generating the javascript arrays themselves is actually best done on the server; you then write them onto the page as a literal control to be executed when the page loads.
As we know, not all browsers are created equal. Because of this, performance in different areas is guaranteed to differ from browser to browser.
That aside, I noticed the same results as you did; however, after removing the unnecessary buffer class, and just using an array directly and a 10000 character string, the results were even tighter/consistent (in FF 3.0.12): http://jsbin.com/ehalu/
Unless you're doing a great deal of string concatenation, I would say this type of optimization is a micro-optimization. Your time might be better spent limiting DOM reflows and queries (generally the use of document.getElementById/getElementsByTagName), implementing caching of AJAX results (where applicable), and exploiting event bubbling (there's a link somewhere, I just can't find it now).
Okay, regarding this, here is a related module:
http://www.openjsan.org/doc/s/sh/shogo4405/String/Buffer/0.0.1/lib/String/Buffer.html
This is an effective means of creating String buffers, by using
var buffer = new String.Buffer();
buffer.append("foo", "bar");
This is the fastest sort of String-buffer implementation I know of. First of all, if you are implementing string buffers, don't use push: it's a built-in method and it's slow, since for one thing push iterates over the entire arguments array rather than just adding one element.
It all really depends upon the implementation of the join method; some implementations of join are really slow, while some are relatively fast.
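To illustrate, a minimal sketch of such a buffer that indexes into the array directly instead of calling push (hypothetical; not the linked module's actual source):

function StringBuffer() {
    this._parts = [];
}

StringBuffer.prototype.append = function (str) {
    // Direct index assignment sidesteps push()'s per-call overhead
    this._parts[this._parts.length] = str;
    return this; // allow chaining: buf.append('a').append('b')
};

StringBuffer.prototype.toString = function () {
    return this._parts.join('');
};

var buf = new StringBuffer();
buf.append('foo').append('bar');
buf.toString(); // "foobar"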
