When writing performance-sensitive code in JavaScript that operates on large numeric arrays (think a linear algebra package operating on integers or floating-point numbers), one always wants the JIT to help out as much as possible. Roughly, this means:
We always want our arrays to be packed SMIs (small integers) or packed Doubles, depending on whether we're doing integer or floating-point calculations.
We always want to be passing the same type of thing to functions, so that they don't get labelled "megamorphic" and deoptimised. For instance, we always want to be calling vec.add(x, y) with both x and y being packed SMI arrays, or both packed Double arrays.
We want functions to be inlined as much as possible.
When one strays outside of these cases, a sudden and drastic performance dropoff occurs. This can happen for various innocuous reasons:
You might turn a packed SMI array into a packed Double array via a seemingly innocuous operation, like the equivalent of myArray.map(x => -x). This is actually the "best" bad case, since packed Double arrays are still very fast.
You might turn a packed array into a generic boxed array, for example by mapping the array over a function which (unexpectedly) returned null or undefined. This bad case is fairly easy to avoid.
You might deoptimise a whole function such as vec.add() by passing in too many types of things and turning it megamorphic. This could happen if you want to do "generic programming", where vec.add() is used both in cases where you're not being careful about types (so it sees a lot of types come in) and in cases where you want to eke out maximum performance (where it should only ever receive packed Double arrays, for instance). A sketch illustrating these transitions follows below.
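To make these cases concrete, here is a small sketch (the element-kind transitions are how V8 is commonly described as behaving; the exact internals differ between engines and are not observable from script):

// Assume this starts as a packed-SMI array.
const smis = [1, 2, 3, 4];

// Mapping to fractional values keeps the array numeric, but the backing
// store may become packed doubles -- still fast, just no longer SMIs.
const doubles = smis.map(x => x * 0.5);   // holds 0.5, 1, 1.5, 2

// A map callback that unexpectedly returns undefined forces a generic
// (boxed) element kind, which is the slow case to avoid.
const lookup = { 1: 10, 2: 20 };
const boxed = smis.map(x => lookup[x]);   // contains undefined -> generic elements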
My question is more of a soft question, about how one writes high-performance Javascript code in light of the considerations above, while still keeping the code nice and readable. Some specific sub-questions so that you know what kind of answer I'm aiming for:
Is there a set of guidelines somewhere on how to program while staying in the world of packed SMI arrays (for instance)?
Is it possible to do generic high-performance programming in Javascript without using something like a macro system to inline things like vec.add() into callsites?
How does one modularise high-performance code into libraries in light of things like megamorphic call sites and deoptimisations? For instance, if I am happily using Linear Algebra package A at high speed, and then I import a package B that depends on A, but B calls it with other types and deoptimises it, then suddenly (without my code changing) my code runs slower.
Are there any good easy to use measurement tools for checking what the Javascript engine is doing internally with types?
V8 developer here. Given the amount of interest in this question, and the lack of other answers, I can give this a shot; I'm afraid it won't be the answer you were hoping for though.
Is there a set of guidelines somewhere on how to program while staying in the world of packed SMI arrays (for instance)?
Short answer: it's right here: const guidelines = ["keep your integers small enough"].
Longer answer: giving a comprehensive set of guidelines is difficult for various reasons. In general, our opinion is that JavaScript developers should write code that makes sense to them and their use case, and JavaScript engine developers should figure out how to run that code fast on their engines. On the flip side, there are obviously some limitations to that ideal, in the sense that some coding patterns will always have higher performance costs than others, regardless of engine implementation choices and optimization efforts.
When we talk about performance advice, we try to keep that in mind, and carefully estimate what recommendations have a high likelihood of remaining valid across many engines and many years, and also are reasonably idiomatic/non-intrusive.
Getting back to the example at hand: using Smis internally is supposed to be an implementation detail that user code doesn't need to know about. It'll make some cases more efficient, and shouldn't hurt in other cases. Not all engines use Smis (for example, AFAIK Firefox/Spidermonkey historically hasn't; I've heard that for some cases they do use Smis these days; but I don't know any details and can't speak with any authority on the matter). In V8, the size of Smis is an internal detail, and has actually been changing over time and over versions. On 32-bit platforms, which used to be the majority use case, Smis have always been 31-bit signed integers; on 64-bit platforms they used to be 32-bit signed integers, which recently seemed like the most common case, until in Chrome 80 we shipped "pointer compression" for 64-bit architectures, which required lowering Smi size to the 31 bits known from 32-bit platforms. If you happened to have based an implementation on the assumption that Smis are typically 32 bits, you'd get unfortunate situations like this.
Thankfully, as you noted, double arrays are still very fast. For numerics-heavy code, it probably makes sense to assume/target double arrays. Given the prevalence of doubles in JavaScript, it is reasonable to assume that all engines have good support for doubles and double arrays.
Is it possible to do generic high-performance programming in Javascript without using something like a macro system to inline things like vec.add() into callsites?
"generic" is generally at odds with "high-performance". This is unrelated to JavaScript, or to specific engine implementations.
"Generic" code means that decisions have to be made at runtime. Every time you execute a function, code has to run to determine, say, "is x an integer? If so, take that code path. Is x a string? Then jump over here. Is it an object? Does it have .valueOf? No? Then maybe .toString()? Maybe on its prototype chain? Call that, and restart from the beginning with its result". "High-performance" optimized code is essentially built on the idea to drop all these dynamic checks; that's only possible when the engine/compiler has some way to infer types ahead of time: if it can prove (or assume with high enough probability) that x is always going to be an integer, then it only needs to generate code for that case (guarded by a type check if unproven assumptions were involved).
Inlining is orthogonal to all this. A "generic" function can still get inlined. In some cases, the compiler might be able to propagate type information into the inlined function to reduce polymorphism there.
(For comparison: C++, being a statically compiled language, has templates to solve a related problem. In short, they let the programmer explicitly instruct the compiler to create specialized copies of functions (or entire classes), parameterized on given types. That's a nice solution for some cases, but not without its own set of drawbacks, for example long compile times and large binaries. JavaScript, of course, has no such thing as templates. You could use eval to build a system that's somewhat similar, but then you'd run into similar drawbacks: you'd have to do the equivalent of the C++ compiler's work at runtime, and you'd have to worry about the sheer amount of code you're generating.)
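To illustrate the runtime code-generation idea mentioned above, here is a hypothetical sketch (the makeVecAdd name and its interface are made up for illustration, not a recommendation or an existing API):

// Hypothetical "poor man's template": generate a specialized add() for a given
// element expression, so the hot loop contains no dynamic dispatch.
function makeVecAdd(elementExpr) {
  // elementExpr is a string like "a + b"; new Function compiles it once.
  return new Function("out", "x", "y", `
    for (let i = 0; i < out.length; i++) {
      const a = x[i], b = y[i];
      out[i] = ${elementExpr};
    }
  `);
}

const addDoubles = makeVecAdd("a + b");   // specialized copy for plain addition
const out = new Array(3).fill(0);
addDoubles(out, [1.5, 2.5, 3.5], [0.5, 0.5, 0.5]);   // out is [2, 3, 4]

As the answer notes, this moves the specialization work to runtime, and you then have to manage how much code you generate.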
How does one modularise high-performance code into libraries in light of things like megamorphic call sites and deoptimisations? For instance, if I am happily using Linear Algebra package A at high speed, and then I import a package B that depends on A, but B calls it with other types and deoptimises it, then suddenly (without my code changing) my code runs slower.
Yes, that's a general problem with JavaScript. V8 used to implement certain builtins (things like Array.sort) in JavaScript internally, and this problem (which we call "type feedback pollution") was one of the primary reasons why we have entirely moved away from that technique.
That said, for numerical code, there aren't all that many types (only Smis and doubles), and as you noted they should have similar performance in practice, so while type feedback pollution is indeed a theoretical concern, and in some cases can have significant impact, it's also fairly likely that in linear algebra scenarios you won't see a measurable difference.
Also, inside the engine there are many more situations than "one type == fast" and "more than one type == slow". If a given operation has seen both Smis and doubles, that's totally fine. Loading elements from two kinds of arrays is fine too. We use the term "megamorphic" for the situation when a load has seen so many different types that it's given up on tracking them individually and instead uses a more generic mechanism that scales better to large numbers of types -- a function containing such loads can still get optimized. A "deoptimization" is the very specific act of having to throw away optimized code for a function because a new type is seen that hasn't been seen previously, and that the optimized code therefore isn't equipped to handle. But even that is fine: just go back to unoptimized code to collect more type feedback, and optimize again later. If this happens a couple of times, then it's nothing to worry about; it only becomes a problem in pathologically bad cases.
So the summary of all that is: don't worry about it. Just write reasonable code, let the engine deal with it. And by "reasonable", I mean: what makes sense for your use case, is readable, maintainable, uses efficient algorithms, doesn't contain bugs like reading beyond the length of arrays. Ideally, that's all there is to it, and you don't need to do anything else. If it makes you feel better to do something, and/or if you're actually observing performance issues, I can offer two ideas:
Using TypeScript can help. Big fat warning: TypeScript's types are aimed at developer productivity, not execution performance (and as it turns out, those two perspectives have very different requirements from a type system). That said, there is some overlap: e.g. if you consistently annotate things as number, then the TS compiler will warn you if you accidentally put null into an array or function that's supposed to only contain/operate on numbers. Of course, discipline is still required: a single number_func(random_object as number) escape hatch can silently undermine everything, because the correctness of the type annotations is not enforced anywhere.
Using TypedArrays can also help. They have a little more overhead (memory consumption and allocation speed) per array compared to regular JavaScript arrays (so if you need many small arrays, then regular arrays are probably more efficient), and they're less flexible because they can't grow or shrink after allocation, but they do provide the guarantee that all elements have exactly one type.
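A minimal sketch of the TypedArray suggestion (Float64Array is standard; the vecAdd helper here is purely illustrative):

// Fixed-size, always-double storage: every element is an unboxed float64.
const x = new Float64Array([1.5, 2.5, 3.5]);
const y = new Float64Array([0.5, 0.5, 0.5]);

function vecAdd(out, a, b) {
  for (let i = 0; i < out.length; i++) {
    out[i] = a[i] + b[i];
  }
  return out;
}

const out = vecAdd(new Float64Array(x.length), x, y);   // Float64Array [2, 3, 4]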
Are there any good easy to use measurement tools for checking what the Javascript engine is doing internally with types?
No, and that's intentional. As explained above, we don't want you to specifically tailor your code to whatever patterns V8 can optimize particularly well today, and we don't believe that you really want to do that either. That set of things can change in either direction: if there's a pattern you'd love to use, we might optimize for that in a future version (we have previously toyed with the idea of storing unboxed 32-bit integers as array elements... but work on that hasn't started yet, so no promises); and sometimes if there's a pattern we used to optimize for in the past, we might decide to drop that if it gets in the way of other, more important/impactful optimizations. Also, things like inlining heuristics are notoriously difficult to get right, so making the right inlining decision at the right time is an area of ongoing research and corresponding changes to engine/compiler behavior; which makes this another case where it would be unfortunate for everyone (you and us) if you spent a lot of time tweaking your code until some set of current browser versions does approximately the inlining decisions you think (or know?) are best, only to come back half a year later to realize that then-current browsers have changed their heuristics.
You can, of course, always measure performance of your application as a whole -- that's what ultimately matters, not what choices specifically the engine made internally. Beware of microbenchmarks, for they are misleading: if you only extract two lines of code and benchmark those, then chances are that the scenario will be sufficiently different (e.g., different type feedback) that the engine will make very different decisions.
Take, for instance, a JS web engine that implements associative arrays (objects) using hash tables. As we know, hash tables have a worst case of O(n) because collisions are inevitable.
Suppose I begin to develop a new data structure in JavaScript, such as a LinkedList, which has a worst case of O(1) for insert/delete. But since I implement it with objects/arrays, it must then be true that my implementation also has a worst case of at least O(n).
I'm aware that the engine optimizes very well, and that a good hash function gives O(1) on average. However, I just want to confirm my realization that this isn't as straightforward as the textbook says. Or is it?
I suppose that at the root, all data structures are implemented with an array, since its access is always O(1). So shouldn't all data structures be built on an array, without intermediary structures? Also, a dynamic array still has O(n) delete; can't that be the same problem that trickles down, just like in my earlier example?
Is this where using a low-level programming language is better than using a high-level one, in that at the low level there isn't so much abstraction and the textbook complexity numbers can actually match?
I apologize if my ideas are all over the place.
But since I implement [my custom data structure] with objects/arrays, it must then be true that my implementation also has a worst case of at least O(n).
No. Your linked list does not use an object with n keys anywhere. You'll have a
const linkedListNode = {
value: …,
next: null,
};
but even if this was implemented using a HashTable with O(n) worst-case member access, in your case n=2. There are not arbitrarily many properties in your object, just two. That's how you get back to O(1).
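To make that concrete, here is a minimal sketch of inserting at the head of a linked list (the names are illustrative): each step touches an object with a fixed, small number of properties, so the per-operation cost does not grow with the length of the list.

function prepend(head, value) {
  // Creates one two-property node and rewires one reference: O(1),
  // regardless of how many nodes the list already contains.
  return { value: value, next: head };
}

let list = null;
list = prepend(list, 3);
list = prepend(list, 2);
list = prepend(list, 1);   // list is 1 -> 2 -> 3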
Is this where using a low-level programming language is better than using a high-level one, in that at the low level there isn't so much abstraction and the textbook complexity numbers can actually match?
No. Even in a lower-level programming language, you can begin to question the underlying abstraction. You think in C, an indexed array access in memory is constant time? No, since page faults and other caching shenanigans come into play.
This is why textbook complexity is always defined in terms of a machine model. As long as you define object property access in your JavaScript execution model as constant-time (and it's a very reasonable assumption to do that! It closely resembles the real world), your numbers do apply to JavaScript code as well. Sure, you can try unravelling abstractions and analyse your high-level algorithms in terms of the primitives of a lower level, but there's no point in doing that. It's precisely why we have these abstractions in the first place.
"associative arrays (objects) by using hash tables" -> Javascript objects are complicated and are much more than just "it's a hash map". I don't know the exact technical details but I think they change to hash map after a certain amount of values are stored in them, along with that they also store metadata which is used on other algorithms like Object.keys to automatically sort the keys after they've been pulled out of the hash map. Again I don't know the technical details but I do know that, it's not straight forward.
"as we know hash tables has worse case O(n) because collisions are inevitable" -> It depends on what hashing you're using, but more than that even if collisions are inevitable it's not correct to just claim "it's worse case O(n)" and leave it at that because the probability of it being O(n) logarithmically declines to 0, the chances of it finding a collision time after time again and again is extremely unlikely, so while it can perhaps find a collision that doesn't effectively describe the time complexity.
"it must be true that my implementation is also at minimum worse case O(n) as well" -> Not correct, you're speaking about two different things. If you build a linked list each node will be connected to the next node using a heap reference, which has nothing to do with javascript objects. Iterating through an entire linked list will be O(n) but that's because of having to iterate over every next node, not because of anything with objects or hashes.
"worse case O(1) for insert/delete" -> This is only true if you have the reference to the node where you want to insert/delete it, otherwise you'll have to search through it before insertion/deletion. But that's exactly the same in javascript.
"then shouldn't all data structure be built with array without intermediary structures" -> Most data structures I know (like a list, stack, queue) are implemented on top of a normal array. The ones that aren't (like a binary tree, dictionary/map or a linked list) are not implemented on an array because it wouldn't really make sense. For example the whole point of using object references with a linked list is so that you can directly insert/delete something, using an array under the hood would just defeat the entire point of using a linked list when you're specifically trying to take advance of the object references.
"also dynamic array still has delete O(n) cant that be the same problem which can trickle down just like my earlier example" -> Not necessarily because when you wrap things inside an object and use an internal array inside, you can add metadata, indexes, hashes, things stored outside of the array (private to the object) and all sorts of other things to speed up and keep track of things on that array. So the complexity of what's used internally doesn't just automatically spill over to using it in another object. But you do need to be careful, like if you use a list then the inner workings of it in languages like C# is that it will double the internal array when you try to add more elements to it after it's full, this can result in a lot of memory waste.
That being said, the use case for JavaScript in 99% of cases is not "optimize this by another 10ms". JavaScript is used because of its non-blocking IO, streaming, async/await reactive programming, and its rapid speed of development; it's also used 90% of the time for web communication, not for some highly optimized graphics engine. So there are very, very few edge cases where you need over-9000 complexity optimizations; feature development, code readability, maintainability, and things like that are a much bigger deal in JS in general. Along with that, in most use cases you aren't going to request 1M data records from your DB using JS; usually it's more like 50 that you want to display on a page, and for that you'll use the DB to optimize your query. There's hardly ever any need for such large data structures in JS development (or web development in general); it's a lot better to pull what you need and to request more, or continuously stream what you need, to the client. So a lot of the data structures (like a binary tree) aren't really relevant to things in JS unless it's a very specific use case.
I am writing a transpiler from my desktop programming language to JavaScript.
I use gmp on the desktop, so am writing a thin wrapper to mimic the same entry points but use BigInt under the hood.
(NB: Emscripten etc. NOT involved.) So far mpz and mpq are working pretty well (~30 entry points each, done by hand), so now I'm wondering about mpfr.
Could mpfr be done as mpq with implied/capped denominator of 10^k (where k can be negative), and
accordingly truncated/BigInt numerator? I expect a bit of a struggle with mpfr_const_pi(), mpfr_sin/log/exp(), etc. I say 10^k but am not even certain of that vs 2^k.
I have studied https://github.com/MikeMcl/big.js and friends but (no offence meant) all of that seems to pre-date BigInt, and I simply cannot find anything that implements floats via BigInt.
In short, what code needs to be in mpfr.js so that the following will work (ideally unaltered)? Obviously any partial ideas, hints, or tips are just as welcome as a full-blown working example. You can assume (e.g.) mpz_get_str() is available, or of course you can go with using (say) BigInt.toString() etc. directly, and not overly panic about precisely where the decimal point has to go, or any "%.75Rf"-related nuances. I just need something to get the ball rolling.
<script src="mpfr.js"></script>
<script>
mpfr_set_default_prec(252); // (enough for 75 decimal places)
let one_third = mpfr_init(1); // (ok, non-std syntax, anyway init to 1)
mpfr_div_si(one_third,one_third,3);
console.log(mpfr_sprintf("%.75Rf", one_third));
</script>
I finally found this https://jrsinclair.com/articles/2020/sick-of-the-jokes-write-your-own-arbitrary-precision-javascript-math-library/ and I've now got pretty much everything I needed working.
While it is exactly what I was looking for, I should point out that it is deeply flawed. For instance, there is a frankly outrageous memoize() function liberally applied, which no doubt vastly improved some pointless benchmark but would totally cripple real-world use, and there are other gross inefficiencies, such as exp10(n) returning BigInt(`1${[...new Array(n)].map(() => 0).join("")}`) instead of the much saner 10n ** BigInt(n). Nevertheless it is quite spirited and undeniably well meant, with plenty of good ideas.
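As a starting point for others, here is a minimal, purely illustrative sketch of the scaled-BigInt idea from the original question (an integer mantissa with an implied denominator of 10^k); the function names merely mimic mpfr entry points and are not taken from any particular library:

// Represent a value as mantissa * 10^(-DIGITS), i.e. a fixed number of
// decimal places held in a BigInt. Purely illustrative, not the real mpfr.js.
const DIGITS = 75n;
const SCALE = 10n ** DIGITS;

function mpfr_init(n) { return BigInt(n) * SCALE; }      // init from a small integer

function mpfr_div_si(x, si) { return x / BigInt(si); }   // truncating division

function mpfr_to_string(x) {
  const neg = x < 0n ? "-" : "";
  const abs = x < 0n ? -x : x;
  const whole = abs / SCALE;
  const frac = (abs % SCALE).toString().padStart(Number(DIGITS), "0");
  return `${neg}${whole}.${frac}`;
}

let one_third = mpfr_init(1);
one_third = mpfr_div_si(one_third, 3);
console.log(mpfr_to_string(one_third));   // 0.333...3 (75 decimal places)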
Should anyone wish to see the results of my efforts I have uploaded the latest version: https://github.com/petelomax/Phix/blob/master/pwa/builtins/mpfr.js
The title says it all. I'm going to be parsing a very large JSON string and was curious what the complexity of this built in method was.
I would hope that it's θ(n) where n is the number of characters in the string since it can determine whether there is a syntax error or not.
I tried searching but couldn't come up with anything.
JSON is a very simple grammar that does not even require lookahead. As long as GC is not involved, it is purely O(n).
I do not know of the implementations in browsers, but your assumption is correct to a certain point. If the JSON includes mainly strings, it will be straight forward and very linear. If you have many floating points, it will take a bit of time to convert the numbers, but again quite linear (numbers with more digits take slightly longer, but in comparison to a long string... very similar).
Since in most cases arrays and objects are built as maps, the memory allocation grows as required and will generally be linear. Many (if not most) implementations run inside a garbage-collected runtime, which makes it nearly impossible to know for sure how much time will be required to transform all the data, as it very much depends on things such as the size of the memory model on the target computer and how often the garbage collector runs. However, memory should generally just grow as items are added to the map, so it will mostly look linear as well. I would not expect an implementation to use something like realloc(), which would mean copying data and thus getting slower and slower as an array/object grows bigger.
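A rough way to sanity-check the linear behaviour yourself (a crude measurement sketch, subject to the usual caveats about GC and JIT warm-up):

// Build JSON strings of increasing size and time JSON.parse on each.
for (const n of [1e4, 1e5, 1e6]) {
  const json = JSON.stringify(Array.from({ length: n }, (_, i) => i * 0.5));
  const start = performance.now();
  JSON.parse(json);
  const ms = performance.now() - start;
  console.log(`${n} numbers: ${ms.toFixed(1)} ms`);   // should scale roughly linearly
}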
I was curious to dig a little more, so to add more info: I believe this is the "high-level" implementation of JSON.parse. I tried to find whether Chromium has its own source for it, and I'm not sure if this is it. This is going off of the source from GitHub.
Things to note:
The worst case is probably the scenario of handling objects, which requires O(N) time, where N is the number of characters.
In the case a reviver function is passed, it has to re-walk the entire object after it's created, but that only happens once, so it's fairly negligible. It also depends on what the reviver function is doing, and you will have to account for its own time complexity. A small example of the reviver pass is below.
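For reference, this is what a reviver looks like in use (a small illustrative example); it is invoked once per key/value pair after the initial parse:

// Revive ISO date strings into Date objects during the post-parse walk.
const parsed = JSON.parse('{"name":"job","due":"2024-01-15T00:00:00.000Z"}', (key, value) => {
  if (typeof value === "string" && /^\d{4}-\d{2}-\d{2}T/.test(value)) {
    return new Date(value);   // each returned value replaces the original
  }
  return value;
});
console.log(parsed.due instanceof Date);   // true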
I just ran into a very interesting issue when someone posted a jsperf benchmark that conflicted with a previous, nearly identical, benchmark I ran.
Chrome does something drastically different between these two lines:
new Array(99999); // jsperf ~50,000 ops/sec
new Array(100000); // jsperf ~1,700,000 ops/sec
benchmarks: http://jsperf.com/newarrayassign/2
I was wondering if anyone has any clue as to what's going on here!
(To clarify, I'm looking for some low-level details on the V8 internals, such as whether it's using a different data structure for one vs. the other, and what those structures are.)
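To make the comparison reproducible outside jsperf, a crude timing sketch like this can be used (results vary by browser and version, and engines may optimize away unused allocations, so treat the numbers with suspicion):

function timeAlloc(length, iterations) {
  let sink = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    sink += new Array(length).length;   // read something so the allocation isn't trivially dead
  }
  const ms = performance.now() - start;
  return ms.toFixed(1) + " ms (sink=" + sink + ")";
}

console.log("99999:  " + timeAlloc(99999, 1000));
console.log("100000: " + timeAlloc(100000, 1000));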
Just because this sounded pretty interesting, I searched through the V8 codebase for a constant defined as 100000, and I found the kInitialMaxFastElementArray variable, which is subsequently used in the builtin ArrayConstructInitializeElements function. While I'm not a C programmer and don't know the nitty-gritty here, you can see that it uses an if statement to check whether the length is smaller than 100,000, and returns at different points based on that.
Well, there always is some threshold number when you design algorithms that adapt to the size of data (for example SharePoint changes the way it works when you add 1000 items to a list). So, the guess would be that you have found the actual number and the performance differs, as different data structures or algorithms are used.
I don't know what operating system you're using, but if this is Linux, I'd suspect that Chrome (i.e. malloc) is allocating memory from a program-managed heap (size determined using the sbrk system call, and the free lists are managed by the C standard library), but when you reach a certain size threshold, it switches to using mmap to ask the kernel to allocate large chunks of memory that don't interfere with the sbrk-managed heap.
Doug Lea describes how malloc works in the GNU C Library, better than I could. He wrote it.
Or maybe 100000 hits some kind of magic threshold for the amount of space needed that it triggers the garbage collector more frequently when trying to allocate memory.
I was reading this book - Professional Javascript for Web Developers - where the author mentions that string concatenation is an expensive operation compared to using an array to store strings and then using the join method to create the final string. Curious, I did a couple of tests here to see how much time it would save, and this is what I got -
http://jsbin.com/ivako
Somehow, Firefox usually produces similar times for both ways, but in IE, string concatenation is much, much faster. So, can this idea now be considered outdated (browsers have probably improved since)?
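For reference, the comparison in question boils down to something like this (a simplified sketch of the benchmark, not the exact jsbin code):

const N = 10000;

// Repeated concatenation onto one string.
console.time("concat");
let s = "";
for (let i = 0; i < N; i++) {
  s += "some text ";
}
console.timeEnd("concat");

// Collect pieces in an array, join once at the end.
console.time("join");
const parts = [];
for (let i = 0; i < N; i++) {
  parts.push("some text ");
}
const joined = parts.join("");
console.timeEnd("join");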
Even if it were true and join() were faster than concatenation, it wouldn't matter. We are talking about tiny amounts of milliseconds here, which are completely negligible.
I would always prefer well structured and easy to read code over microscopic performance boost and I think that using concatenation looks better and is easier to read.
Just my two cents.
On my system (IE 8 on Windows 7) the times of StringBuilder in that test vary over about a 70-100% range -- that is, it is not stable -- although the mean is about 95% of that of normal appending.
While it's easy now just to say "premature optimization" (and I suspect that in almost every case it is) there are things worth considering:
The problem with repeated string concatenation comes from repeated memory allocations and repeated data copies (advanced string data types can reduce/eliminate much of this, but let's assume a simplistic model for now). From this, let's raise some questions:
What memory allocator is used? In the naive case, each str += x requires str.length + x.length bytes of new memory to be allocated. The standard C malloc, for instance, is a rather poor memory allocator. JS implementations have undergone changes over the years, including, among other things, better memory subsystems. Of course these changes don't stop there and really touch all aspects of modern JS code. The fact that now-ancient implementations may have been incredibly slow at certain tasks does not necessarily imply that the same issues still exist, or to the same extent.
As above, the implementation of Array.join is very important. If it does NOT pre-allocate memory for the final string before building it, then it only saves on data-copy costs -- how many GB/s is main memory these days? 10,000 x 50 is hardly pushing a limit. A smart Array.join operation with a POOR MEMORY ALLOCATOR would be expected to perform a good bit better, simply because the number of re-allocations is reduced. This difference would be expected to shrink as allocation cost decreases.
The micro-benchmark code may be flawed, depending on whether the JS engine creates a new object per UNIQUE string literal or not. (This would bias it towards the Array.join method, but it needs to be considered in general.)
The benchmark is indeed a micro benchmark :)
Increasing the size of the data being built should have an impact on performance based on any or all (and then some) of the above conditions. It is generally easy to show extreme cases favoring one method or another -- the expected use case is generally of more importance.
Although, quite honestly, for any form of sane string building, I would just use normal string concatenation until such a time it was determined to be a bottleneck, if ever.
I would re-read the above statement from the book and see if there are perhaps other implicit considerations the author was meaning to invoke, such as "for very large strings" or "for insane amounts of string operations" or "in JScript/IE6", etc. If not, then such a statement is about as useful as "insertion sort is O(n*n)" [the realized costs depend upon the state of the data and the size of n, of course].
And the disclaimer: the speed of the code depends upon the browser, operating system, the underlying hardware, moon gravitational forces and, of course, how your computer feels about you.
In principle the book is right. Joining an array should be much faster than repeatedly concatenating to the same string. As a simple algorithm on immutable strings it is demonstrably faster.
The trick is: JavaScript authors, being largely non-expert dabblers, have written a load of code out there in the wild that uses concatenation, and relatively little ‘good’ code that uses methods like array-join. The upshot is that browser authors can get a better improvement in speed on the average web page by catering for and optimising the ‘bad’, more common option of concatenation.
So that's what happened. The newer browser versions have some fairly hairy optimisation stuff that detects when you're doing a load of concatenations, and hacks it about so that internally it is working more like an array-join, at more or less the same speed.
I actually have some experience in this area, since my primary product is a big, IE-only webapp that does a LOT of string concatenation in order to build up XML docs to send to the server. For example, in the worst case a page might have 5-10 iframes, each with a few hundred text boxes that each have 5-10 expando properties.
For something like our save function, we iterate through every tab (iframe) and every entity on that tab, pull out all the expando properties on each entity and stuff them all into a giant XML document.
When profiling and improving our save method, we found that using string concatenation in IE7 was a lot slower than using the array-of-strings method. Some other points of interest were that accessing DOM object expando properties is really slow, so we put them all into JavaScript arrays instead. Finally, generating the JavaScript arrays themselves is actually best done on the server; then you write them onto the page as a literal control to be executed when the page loads.
As we know, not all browsers are created equal. Because of this, performance in different areas is guaranteed to differ from browser to browser.
That aside, I noticed the same results as you did; however, after removing the unnecessary buffer class, and just using an array directly and a 10000 character string, the results were even tighter/consistent (in FF 3.0.12): http://jsbin.com/ehalu/
Unless you're doing a great deal of string concatenation, I would say that this type of optimization is a micro-optimization. Your time might be better spent limiting DOM reflows and queries (generally the use of document.getElementById/getElementsByTagName), implementing caching of AJAX results (where applicable), and exploiting event bubbling (there's a link somewhere, I just can't find it now).
Okay, regarding this, here is a related module:
http://www.openjsan.org/doc/s/sh/shogo4405/String/Buffer/0.0.1/lib/String/Buffer.html
This is an effective means of creating String buffers, by using
var buffer = new String.Buffer();
buffer.append("foo", "bar");
This is the fastest sort of String buffer implementation I know of. First of all, if you are implementing String buffers, don't use push, because that built-in method is slow; for one thing, push iterates over the entire arguments array, rather than just adding one element.
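The alternative the answer alludes to is indexed assignment at the current length (a tiny illustrative sketch; whether this still beats push in modern engines is something you would need to measure):

const buffer = [];

// Append without calling push: write directly to the next free index.
buffer[buffer.length] = "foo";
buffer[buffer.length] = "bar";

const result = buffer.join("");   // "foobar"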
It all really depends upon the implementation of the join method; some implementations of join are really slow and some are relatively fast.