For example:
var myName = 'Bob';
myName += ' is a good name';
For long runs of operations like this, is there a better way to do it? Maybe with a StringBuffer type of structure?
Thanks! :)
The ‘better’ way would be:
var nameparts = ['Bob'];
nameparts.push(' is a good name');
...
nameparts.join('');
However, most modern JavaScript implementations now detect naïve concatenation and can in many cases optimise it away, because so many people have (alas) written code this way. So in practice the ‘good’ method won't be as much faster today as it once was.
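If you want to see where the two approaches stand on whatever engine you're running, a rough timing sketch like the one below will show it; the loop count is arbitrary and the absolute numbers are meaningless across machines, only the ratio matters.

// Naive += in a loop:
function viaConcat(n) {
    var s = '';
    for (var i = 0; i < n; i++) s += ' is a good name';
    return s;
}
// Collect-then-join:
function viaJoin(n) {
    var parts = [];
    for (var i = 0; i < n; i++) parts.push(' is a good name');
    return parts.join('');
}
var t0 = Date.now();
viaConcat(100000);
var t1 = Date.now();
viaJoin(100000);
var t2 = Date.now();
console.log('concat: ' + (t1 - t0) + 'ms, join: ' + (t2 - t1) + 'ms');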
A huge performance boost can be obtained by simply using intermediate strings! It is also possible to create a StringBuffer-like class in JavaScript to gain an even bigger boost.
See the complete article and graphs here.
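The article's class isn't reproduced above, but the general shape of such a StringBuffer-like wrapper is roughly the following; this is a hypothetical sketch, not the article's actual code.

// A hypothetical StringBuffer-like class wrapping an array:
function StringBuffer() {
    this.parts = [];
}
StringBuffer.prototype.append = function (s) {
    this.parts.push(s);
    return this; // allow chaining
};
StringBuffer.prototype.toString = function () {
    return this.parts.join('');
};

var sb = new StringBuffer();
sb.append('Bob').append(' is a good name');
console.log(sb.toString()); // "Bob is a good name"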
Efficiency of string concatenation will depend on the browser you are using. You can google for statistics; there is also a Google Tech Talk available on YouTube. From what I can remember, most browsers deal with string concatenation efficiently when the number of elements is below a few thousand. After that, IE slows down at an exponential rate, while Firefox, Chrome and Safari do much better. This may change, since IE9 isn't that far away now.
I once read an article about this subject which offered some code to build buffered strings with arrays:
http://www.softwaresecretweapons.com/jspwiki/javascriptstringconcatenation
I tested it myself and it was way faster in IE... and way slower in Firefox!
To sum up: there are many JavaScript engines out there and we can't really rely on this sort of implementation detail. If it's ever an issue, you'll notice. Until then, don't worry about it too much.
I am writing a transpiler from my desktop programming language to JavaScript.
I use gmp on the desktop, so am writing a thin wrapper to mimic the same entry points but use BigInt under the hood.
(NB: Emscripten etc. are NOT involved.) So far mpz and mpq are working pretty well, ~30 entry points each, done by hand, so now I'm wondering about mpfr.
Could mpfr be done as mpq with an implied/capped denominator of 10^k (where k can be negative), and an accordingly truncated BigInt numerator? I expect a bit of a struggle with mpfr_const_pi(), mpfr_sin/log/exp(), etc. I say 10^k but am not even certain of that vs 2^k.
I have studied https://github.com/MikeMcl/big.js and friends, but (no offence meant) all of that seems to pre-date BigInt, and I simply cannot find anything that implements floats via BigInt.
In short, what code needs to be in mpfr.js so that the following will work (ideally unaltered)? Obviously any partial ideas, hints, or tips are just as welcome as a full-blown working example. You can assume (e.g.) mpz_get_str() is available, or of course you can go with using (say) BigInt.toString() etc. directly, and not overly panic about precisely where the decimal point has to go, or any "%.75Rf"-related nuances. I just need something to get the ball rolling.
<script src="mpfr.js"></script>
<script>
mpfr_set_default_prec(252); // (enough for 75 decimal places)
let one_third = mpfr_init(1); // (ok, non-std syntax, anyway init to 1)
mpfr_div_si(one_third,one_third,3);
console.log(mpfr_sprintf("%.75Rf", one_third));
</script>
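For what it's worth, here is a minimal sketch of the 10^k idea, just enough to make that exact snippet run. Everything in it is a hypothetical toy (fixed decimal scaling, truncation instead of rounding, a made-up precision-to-digits rule, no transcendentals), not GNU MPFR semantics:

// Each "mpfr" value is a mutable wrapper around a BigInt numerator with an
// implied denominator of 10^DIGITS. Toy model only.
let DIGITS = 75;                    // decimal digits carried internally
let SCALE = 10n ** BigInt(DIGITS);  // the implied denominator

function mpfr_set_default_prec(bits) {
    // Assumption: ~bits * log10(2) decimal digits; not MPFR's actual rule.
    DIGITS = Math.floor(bits * Math.log10(2));
    SCALE = 10n ** BigInt(DIGITS);
}

function mpfr_init(n) {             // non-std: init to the integer n
    return { num: BigInt(n) * SCALE };
}

function mpfr_div_si(rop, op, d) {  // rop = op / d, truncated
    rop.num = op.num / BigInt(d);
}

function mpfr_sprintf(fmt, x) {     // only understands "%.<k>Rf"
    const k = parseInt(fmt.match(/%\.(\d+)Rf/)[1], 10);
    const neg = x.num < 0n ? '-' : '';
    const abs = x.num < 0n ? -x.num : x.num;
    const frac = (abs % SCALE).toString().padStart(DIGITS, '0').slice(0, k);
    return neg + (abs / SCALE).toString() + '.' + frac;
}

With those four functions defined, the snippet above prints 0.333...3 (75 threes).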
I finally found this https://jrsinclair.com/articles/2020/sick-of-the-jokes-write-your-own-arbitrary-precision-javascript-math-library/ and I've now got pretty much everything I needed working.
While it is exactly what I was looking for, I should point out that it is deeply flawed. For instance, there is a frankly outrageous memoize() function liberally applied, which no doubt vastly improved some pointless benchmark but would totally cripple real-world use, and there are other gross inefficiencies, such as exp10(n) returning BigInt(`1${[...new Array(n)].map(() => 0).join("")}`) instead of the much saner 10n ** BigInt(n). Nevertheless it is quite spirited and undeniably well meant, with plenty of good ideas.
Should anyone wish to see the results of my efforts I have uploaded the latest version: https://github.com/petelomax/Phix/blob/master/pwa/builtins/mpfr.js
The title says it all. I'm going to be parsing a very large JSON string and was curious what the complexity of this built-in method is.
I would hope that it's Θ(n), where n is the number of characters in the string, since the parser has to examine every character anyway to determine whether there is a syntax error.
I tried searching but couldn't come up with anything.
JSON is a very simple grammar that does not even require lookahead. As long as GC is not involved, it is purely O(n).
I do not know the implementations in browsers, but your assumption is correct up to a certain point. If the JSON consists mainly of strings, parsing will be straightforward and very linear. If you have many floating-point numbers, it will take a bit of time to convert them, but again it's quite linear (numbers with more digits take slightly longer, but compared to a long string... very similar).
Since in most cases arrays and objects are represented as maps, the memory allocation grows as required and will generally be linear. Many (if not most) implementations run on a garbage-collected backend, and thus it is quite impossible to know for sure how much time will be required to transform all the data, as it will very much depend on things such as the size of the memory model used on the target computer and how often the garbage collector runs. However, allocation should generally just grow as items are added to the map, so it will mostly look linear as well. I would not expect an implementation to make use of a realloc()-style scheme, which would mean copying data and thus getting slower and slower as an array/object grows bigger and bigger.
I was curious a little more, so to add more info: I believe this is the "high-level" implementation of JSON.parse. I tried to find whether Chromium has its own source for it, and I'm not sure if this is it. This is going off of the source from GitHub.
Things to note:
worst case is probably the scenario of handling objects, which requires O(N) time where N is the number of characters.
in the case a reviver function is passed, it has to re-walk the entire object after it's created, but that only happens once, so it's fairly negligible. It also depends on what the reviver function is doing, and you will have to account for its own time complexity.
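If you want to sanity-check the linearity empirically, a rough sketch along these lines is enough (the sizes are arbitrary; expect noise from GC and warm-up):

// Double the input, expect roughly double the parse time:
function makeJson(n) {
    const arr = [];
    for (let i = 0; i < n; i++) arr.push({ id: i, name: 'item' + i, value: i * 1.5 });
    return JSON.stringify(arr);
}
for (const n of [100000, 200000, 400000]) {
    const s = makeJson(n);
    const t0 = Date.now();
    JSON.parse(s);
    console.log(n + ' items, ' + s.length + ' chars: ' + (Date.now() - t0) + 'ms');
}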
The Benchmark: http://jsperf.com/substringing
So, I'm starting up my very first HTML5 browser-based client-side project. It's going to have to parse very, very large text files into, essentially, an array of arrays of objects. I know how I'm going to go about coding it; my primary concern right now is getting the parser code as fast as I can get it, and my primary testbed is Chrome. However, while looking at the differences between substring methods (I haven't touched JavaScript in a long, long time), I noticed that this benchmark was incredibly slow in Chrome compared to Firefox. Why?
My first assumption is that it has to do with the way Firefox's JS engine handles string objects: that for Firefox this operation is simple pointer manipulation, while Chrome is actually doing hard copies. But I'm not sure why Chrome wouldn't do pointer manipulation, or why Firefox would. Anyone have some insight?
JSPerf appears to be throwing out my Firefox results, not displaying them on the BrowserScope. For me, I'm getting 9,568,203 ±1.44% ops/sec on .substr() in FF4.
Edit: I see an FF3.5 performance result down there that is actually below Chrome, so I decided to test my pointers hypothesis. This brought me to a 2nd revision of my Substrings test, which is doing 1,092,718 ±1.62% ops/sec in FF4 versus 1,195 ±3.81% ops/sec in Chrome; that's down to "only" roughly 1000x faster, but still an inexplicable difference in performance.
A postscriptum: No, I'm not concerned one lick about Internet Explorer. I'm concerned about trying to improve my skills and getting to know this language on a deeper level.
In the case of SpiderMonkey (the JS engine in Firefox), a substring() call just creates a new "dependent string": a string object that stores a pointer to the thing it's a substring of, plus the start and end offsets. This is precisely to make substring() fast, and it is an obvious optimization given immutable strings.
As for why V8 does not do that... A possibility is that V8 is trying to save space: in the dependent string setup if you hold on to the substring but forget the original string, the original string can't get GCed because the substring is using part of its string data.
In any case, I just looked at the V8 source, and it looks like they just don't do any sort of dependent strings at all; the comments don't explain why, though.
[Update, 12/2013]: A few months after I gave the above answer V8 added support for dependent strings, as Paul Draper points out.
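If substring() really is just a pointer plus offsets, its cost should not grow with the length of the slice. Here is a rough probe of that prediction (a hypothetical sketch; results will vary by engine and version):

// Time many substring() calls with very different slice lengths.
// The .length read keeps the result observable so the call isn't optimised away.
var big = new Array(1 << 20).join('x'); // ~1M chars

function timeSlices(len, iters) {
    var sink = 0;
    var t0 = Date.now();
    for (var i = 0; i < iters; i++) sink += big.substring(0, len).length;
    return (Date.now() - t0) + 'ms (sink=' + sink + ')';
}

console.log('short slices: ' + timeSlices(10, 1000000));
console.log('long slices:  ' + timeSlices(big.length - 1, 1000000));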
Have you eliminated the reading of .length from your benchmark results?
I believe V8 has a few representations of a string:
1. a sequence of ASCII bytes
2. a sequence of UTF-16 code units.
3. a slice of a string (result of substring)
4. a concatenation of two strings.
Number 4 is what makes string += efficient.
I'm just guessing, but if they're trying to pack two string pointers and a length into a small space, they may not be able to cache large lengths alongside the pointers, so they may end up walking the joined linked list in order to compute the length. This assumes, of course, that Array.prototype.join creates strings of form (4) from the array parts.
It does lead to a testable hypothesis which would explain the discrepancy even absent buffer copies.
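To make the guess concrete, here is a toy model of a cons string (form 4), showing why reading .length could mean walking the whole tree if the length were not cached on the node. This is an illustration only, not V8's actual representation:

// A "cons string": a node holding two halves instead of flat character data.
function Cons(left, right) {
    this.left = left;
    this.right = right;
}

// If length were not stored on the node, reading it would walk the tree:
function lengthOf(s) {
    if (typeof s === 'string') return s.length;
    return lengthOf(s.left) + lengthOf(s.right);
}

// Flattening copies everything exactly once, which is what makes += cheap:
function flatten(s) {
    if (typeof s === 'string') return s;
    return flatten(s.left) + flatten(s.right);
}

var rope = new Cons(new Cons('foo', 'bar'), 'baz');
console.log(lengthOf(rope)); // 9
console.log(flatten(rope));  // "foobarbaz"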
EDIT:
I looked through the V8 source code, and StringBuilderConcat is where I would start pulling, especially in runtime.cc.
I was reading this book - Professional JavaScript for Web Developers - where the author mentions that string concatenation is an expensive operation compared to using an array to store the strings and then using the join method to create the final string. Curious, I did a couple of tests here to see how much time it would save, and this is what I got -
http://jsbin.com/ivako
Somehow, Firefox usually produces somewhat similar times for both ways, but in IE, string concatenation is much, much faster. So, can this idea now be considered outdated (browsers have probably improved since)?
Even if it were true and join() were faster than concatenation, it wouldn't matter. We are talking about tiny amounts of milliseconds here, which are completely negligible.
I would always prefer well-structured and easy-to-read code over a microscopic performance boost, and I think that using concatenation looks better and is easier to read.
Just my two cents.
On my system (IE 8 on Windows 7) the times of StringBuilder in that test vary over a range of about 70-100% -- that is, it is not stable -- although the mean is about 95% of that of normal appending.
While it's easy now to just say "premature optimization" (and I suspect that in almost every case it is), there are things worth considering:
The problem with repeated string concatenation comes from repeated memory allocations and repeated data copies (advanced string data types can reduce/eliminate much of this, but let's keep assuming a simplistic model for now). From this, let's raise some questions:
What memory allocator is used? In the naive case, each str += x requires str.length + x.length of new memory to be allocated. The standard C malloc, for instance, is a rather poor memory allocator. JS implementations have undergone changes over the years, including, among other things, better memory subsystems. Of course these changes don't stop there and touch really all aspects of modern JS code. The fact that now-ancient implementations may have been incredibly slow at certain tasks does not necessarily imply that the same issues still exist, or to the same extent.
As with the above, the implementation of Array.join is very important. If it does NOT pre-allocate memory for the final string before building it, then it only saves on data-copy costs -- and how many GB/s is main memory these days? 10,000 x 50 is hardly pushing a limit. A smart Array.join operation with a POOR MEMORY ALLOCATOR would be expected to perform a good bit better, simply because the number of re-allocations is reduced. This difference would be expected to shrink as allocation cost decreases.
The micro-benchmark code may be flawed, depending on whether the JS engine creates a new object for each UNIQUE string literal or not. (This would bias it towards the Array.join method, but it needs to be considered in general; see the sketch after this list.)
The benchmark is indeed a micro-benchmark :)
Increasing the growing size should have an impact on performance based on any or all (and then some) of the above conditions. It is generally easy to show extreme cases favoring one method or another -- the expected use case is generally of more importance.
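As a concrete illustration of the string-literal point above, a benchmark can dodge that bias by feeding both methods a fresh, unique string each iteration (a hypothetical sketch):

// uniquePiece() defeats any per-literal caching the engine might do:
function uniquePiece(i) {
    return 'chunk-' + i + '-' + Math.random();
}

var viaConcat = '', parts = [];
for (var i = 0; i < 10000; i++) {
    var piece = uniquePiece(i); // the same fresh string fed to both methods
    viaConcat += piece;
    parts[parts.length] = piece;
}
var viaJoin = parts.join('');
console.log(viaConcat.length === viaJoin.length); // sanity check: true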
Although, quite honestly, for any form of sane string building, I would just use normal string concatenation until such a time it was determined to be a bottleneck, if ever.
I would re-read the above statement from the book and see if there are perhaps other implicit considerations the author was meaning to invoke, such as "for very large strings" or "insane amounts of string operations" or "in JScript/IE6", etc. If not, then such a statement is about as useful as "insertion sort is O(n*n)" [the realized costs depend upon the state of the data and the size of n, of course].
And the disclaimer: the speed of the code depends upon the browser, operating system, the underlying hardware, moon gravitational forces and, of course, how your computer feels about you.
In principle the book is right. Joining an array should be much faster than repeatedly concatenating to the same string. As a simple algorithm on immutable strings it is demonstrably faster.
The trick is: JavaScript authors, being largely non-expert dabblers, have written a load of code out there in the wild that uses concatenation, and relatively little ‘good’ code that uses methods like array-join. The upshot is that browser authors can get a better improvement in speed on the average web page by catering for and optimising the ‘bad’, more common option of concatenation.
So that's what happened. The newer browser versions have some fairly hairy optimisation stuff that detects when you're doing a load of concatenations, and hacks it about so that internally it is working more like an array-join, at more or less the same speed.
I actually have some experience in this area, since my primary product is a big, IE-only webapp that does a LOT of string concatenation in order to build up XML docs to send to the server. For example, in the worst case a page might have 5-10 iframes, each with a few hundred text boxes that each have 5-10 expando properties.
For something like our save function, we iterate through every tab (iframe) and every entity on that tab, pull out all the expando properties on each entity and stuff them all into a giant XML document.
When profiling and improving our save method, we found that using string concatenation in IE7 was a lot slower than using the array-of-strings method. Some other points of interest: accessing DOM object expando properties is really slow, so we put them all into JavaScript arrays instead. Finally, generating the JavaScript arrays themselves is actually best done on the server; you then write them onto the page as a literal control to be executed when the page loads.
As we know, not all browsers are created equal. Because of this, performance in different areas is guaranteed to differ from browser to browser.
That aside, I noticed the same results as you did; however, after removing the unnecessary buffer class and just using an array directly with a 10,000-character string, the results were even tighter/more consistent (in FF 3.0.12): http://jsbin.com/ehalu/
Unless you're doing a great deal of string concatenation, I would say that this type of optimization is a micro-optimization. Your time might be better spent limiting DOM reflows and queries (generally the use of document.getElementById/getElementsByTagName), implementing caching of AJAX results (where applicable), and exploiting event bubbling (there's a link somewhere, I just can't find it now).
Okay, regarding this, here is a related module:
http://www.openjsan.org/doc/s/sh/shogo4405/String/Buffer/0.0.1/lib/String/Buffer.html
This is an effective means of creating String buffers, by using
var buffer = new String.Buffer();
buffer.append("foo", "bar");
This is the fastest sort of implementation of string buffers I know of. First of all, if you are implementing string buffers, don't use push: it is a built-in method and it is slow, for one thing because push iterates over the entire arguments array rather than just adding one element.
It all really depends upon the implementation of the join method; some implementations of the join method are really slow and some are relatively fast.
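For what it's worth, here is a minimal buffer along those lines, appending by direct index assignment instead of push (a sketch of the idea, not the module's actual source):

function Buffer() {
    this.items = [];
    this.count = 0;
}
Buffer.prototype.append = function (s) {
    this.items[this.count++] = s; // index assignment avoids the push() overhead
    return this;
};
Buffer.prototype.toString = function () {
    return this.items.join('');
};

var b = new Buffer();
b.append('foo').append('bar');
console.log(b.toString()); // "foobar"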
I'll be doing some work on a line-separated string. Which will be faster: splitting the text via String.split first and then walking the resultant array, or walking the whole text directly via a regexp and constructing the final array along the way?
Well, the best way to get your answer is to just take 2 minutes and write a loop that does it both ways a thousand times, then check Firebug to see which one is faster ;)
I've had to optimize a lot of string munging while working on MXHR and in my experience, plain String methods are significantly faster than RegExps in current browsers. Use RegExps on the shortest Strings possible and do everything you possibly can with String methods.
For example, I use this little number in my current code:
var mime = mimeAndPayload.shift().split('Content-Type:', 2)[1].split(";", 1)[0].replace(' ', '');
It's ugly as hell, but believe it or not it's significantly faster than the equivalent RegExp under high load.
While this is 2½ years late, hopefully it helps shed some light on the matter for any future viewers: http://jsperf.com/split-join-vs-regex-replace (includes benchmark results for multiple browsers, as well as the functional benchmark code itself)
I expect that using split() will be much faster. It depends upon many specifics: the number of lines vs. their length, the complexity of the regex, etc.
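To make the comparison concrete, the two approaches look roughly like this; which wins depends on the engine and the data, so measure on your own input:

var text = 'alpha\nbeta\ngamma\n';

// 1. Split first, then walk the resulting array:
var lines = text.split('\n');
for (var i = 0; i < lines.length; i++) {
    if (lines[i]) console.log('split: ' + lines[i]);
}

// 2. Walk the text directly with a global regex:
var re = /([^\n]+)\n/g, m;
while ((m = re.exec(text)) !== null) {
    console.log('regex: ' + m[1]);
}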