Would you ever implement a linked list in JavaScript?

I'm learning about data structures formally for the first time. To me, some of the benefits traditionally ascribed to linked lists (easier memory allocation and faster insertion into and deletion from the body of the list) seem moot in JS given the way arrays work (like objects with numbered keys).
Can anyone give an example of why I'd want to use a linked list in javascript?

As the comments note, you'd do this if you need constant time insertion/deletion from the list.
There are two ways Array could be reasonably implemented that allow for populating non-contiguous indices:
As an actual C-like contiguous block of memory large enough to contain all the indices used; unpopulated indices would contain a reference to a dummy value so they wouldn't be treated as populated entries (and excess capacity beyond the max index could be left as garbage, since the length says it's not important)
As a hash table keyed by integers (based on a C-like array)
In either case, the cost to insert at the end is going to be amortized O(1), but with spikes of O(n) work done whenever the capacity of the underlying C-like array is exhausted (or the hash load threshold exceeded) and a reallocation is necessary.
If you insert at the beginning or in the middle (and the index in question is in use), you have the same possible work spikes as before, and new problems. No, the data for the existing indices doesn't change. But all the keys above the entry you're forcing in have to be incremented (actually or logically). If it's a plain C-like array implementation, that's mostly just a memmove (modulo the needs of garbage collections and the like). If it's a hash table implementation, you need to essentially read every element (from the highest index to the lowest, which means either sorting the keys or looking up every index below the current length, even if it's a sparse Array), pop it out, and reinsert it with a key value that is one higher. For a big Array, the cost could be enormous. It's possible the implementation in some browsers might do some clever nonsense by using an "offset" that would internally use negative "keys" relative to the offset to avoid the rehash while still inserting before index 0, but it would make all operations more expensive in exchange for making shift/unshift cheaper.
Point is, a linked list written in JavaScript would incur overhead for being JS (which usually runs more slowly than built-ins for similar magnitude work). But (ignoring the possibility of the memory manager itself introducing work spikes), it's predictable:
If you want to insert or delete from the beginning or the end, it's fixed work (one allocation or deallocation, and reference fixups)
If you are iterating and inserting/deleting as you go, aside from the cost of iteration, it's the same fixed work
If it turns out that offsets aren't used to implement shift/unshift in your browser's implementation (with them, shift would usually be cheap, and unshift cheap until you've unshift-ed more than you've shift-ed), then you'd definitely want to use a linked list when working with a FIFO queue of potentially unbounded size
It's wholly possible all browsers use offsets to optimize (avoiding memmove or re-keying under certain conditions, though it can't avoid occasional realloc and memmove or rehashing without wasting memory). I don't know one way or the other what the major browsers do, but if you're trying to write portable code with consistent performance, you probably don't want to assume that they sacrificed general performance to make the shift/unshift case faster with huge Arrays. They might have preferred to make all other operations faster and assumed shift/unshift would only be used with small Arrays where the cost is trivial.
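To make the FIFO case above concrete, here is a minimal sketch of a queue built on a singly linked list. The class name LinkedQueue and its method names are made up for illustration; this is not a built-in or a library API, just one way the "fixed work per operation" idea could look.

```javascript
// Minimal FIFO queue backed by a singly linked list.
// LinkedQueue and its method names are illustrative, not a standard API.
class LinkedQueue {
  constructor() {
    this.head = null; // next node to dequeue
    this.tail = null; // most recently enqueued node
    this.size = 0;
  }

  // O(1): allocate one node and fix up one or two references.
  enqueue(value) {
    const node = { value, next: null };
    if (this.tail === null) {
      this.head = node;
    } else {
      this.tail.next = node;
    }
    this.tail = node;
    this.size += 1;
  }

  // O(1): unlink the head node; no other element is touched.
  dequeue() {
    if (this.head === null) return undefined;
    const { value } = this.head;
    this.head = this.head.next;
    if (this.head === null) this.tail = null;
    this.size -= 1;
    return value;
  }
}

const queue = new LinkedQueue();
queue.enqueue('a');
queue.enqueue('b');
queue.enqueue('c');
const firstOut = queue.dequeue();
const secondOut = queue.dequeue();
```

Neither operation depends on how many items are queued, which is the predictability argued for above. Whether it beats Array push/shift in a given engine is something you would have to measure.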

I think there are some legit cases / reasons to prefer linked lists:
Reason 1:
As others already described, insertion and deletion run in O(1) time for linked lists, provided you already hold a reference to the node in question. This might be a significant advantage depending on your problem.
Reason 2:
You can do things with linked lists that you can't do with arrays. This is due to the nature of a linked list: every list entry has a reference to its successor (and to its predecessor if it's a doubly linked list).
Example1:
So if you have a linked list of items, you could store a reference to a "currentItem" in a variable. If you need to access the item's neighbors you could just write:
curItem.getNext();
or
curItem.getPrev();
Now you could argue that you could do the same with arrays while curItem is just the current Index. Basically this is true (and in most cases I would use that), but remember that in javascript it is possible to skip indices. So if your array looks like this, the index-method would not work as easily as thought:
myArray = [];
myArray[10] = 'a';
myArray[20] = 'b';
If you find yourself in that kind of situation, maybe a linked list is the better choice.
However, if you need random access to the data (which is more seldom than it seems in most cases) you would go with arrays almost every time.
Example2:
If you want to "split" your list into 2 separate lists, this is also possible in O(1) time. With arrays you'd need to use slice, which is slower because it copies elements. However, this is only an issue if you work with large datasets and perform this operation often. 20 repetitions of slicing an array of 10 million strings took about 4 seconds on my machine, whereas the separation of one list into 2 took <1 second (provided you already have a reference to the list element where you want to start the separation, of course!).
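As a rough sketch of that split (the helper names fromArray, toArray and splitAfter are hypothetical, and a singly linked list is assumed), the whole operation is two reference updates once you hold the node where the cut should happen:

```javascript
// Build a singly linked list from an array and return its head node.
function fromArray(values) {
  let head = null;
  for (let i = values.length - 1; i >= 0; i -= 1) {
    head = { value: values[i], next: head };
  }
  return head;
}

// Collect list values into an array (O(n), only used here for inspection).
function toArray(head) {
  const out = [];
  for (let node = head; node !== null; node = node.next) out.push(node.value);
  return out;
}

// O(1) split: everything after `node` becomes a second list.
function splitAfter(node) {
  const secondHead = node.next;
  node.next = null;
  return secondHead;
}

const firstList = fromArray(['a', 'b', 'c', 'd']);
const cut = firstList.next; // reference to the 'b' node, assumed already known
const secondList = splitAfter(cut);
```

Unlike slice, no elements are copied; the cost of finding the cut node in the first place is not included here.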
Conclusion:
In some cases you would benefit from a list's nature and its performance. In other cases, you would suffer from its poor performance (no efficient random access to the data).
I've never used a list in javascript, but similar structures like trees or graphs are used for data representation (in both backend and frontend javascript). So analyzing/learning list implementations in javascript is a good idea for more complex structures.

#noob-in-need I recommend you watch this video about the JavaScript garbage collector: https://www.youtube.com/watch?v=RWmzxyMf2cE
In it he explains why using a linked list can give you finer-grain control over your code's speed (as ShadowRanger discusses in depth) and also prevent unexpected garbage collection slowdowns. Plus it was filmed on talk-like-a-pirate day. :)

This boils down to the very basic differences of array vs linkedlist.
Inserting a new element in an array of elements is expensive, because room has to be created for the new elements and to create room, the existing elements need to be shifted. But for a linked list it's just change of references.
But reading and random access are easier in an array than in a linked list. A linked list does not allow random access: we have to access elements sequentially, starting from the first node. So we cannot do an efficient binary search on a linked list.
Extra memory space for a pointer which is used to store reference for the next is required with each element of the list.

It is surprisingly rare to find use cases where linked lists outperform data structures built on top of arrays:
Arrays tend to be more cache friendly and adding to the back is fast on average (O(n) worst case, but O(1) amortized)
The constant-time benefits of a linked list fall apart once you need to search for the location. So, if you need to find the location in the list, you can remove it in O(1), but finding it is already O(n) and the array structure will most likely outperform the linked list.
Still, scenarios exist where linked lists are used and where their constant-time operations shine. Schedulers are a good example because latency is important and guaranteed O(1) becomes a factor. Linked lists are used in the Linux kernel, but since you asked for a JavaScript example, the Node.js runtime uses them for implementing timers:
Timer (design & implementation): lib/internal/timer.js
Linked list implementation: internal/linkedlist.js
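To illustrate why a scheduler-style workload likes this structure, here is a small sketch of a circular doubly linked list with a sentinel node. It is not Node's actual internal code; the function names (createList, append, remove, values) and the 'timer-N' values are assumptions made for the example. The point is that cancelling an entry you already hold a reference to needs no search.

```javascript
// Circular doubly linked list with a sentinel node.
// The sentinel's next is the first entry, its prev is the last entry.
function createList() {
  const list = { value: undefined };
  list.next = list;
  list.prev = list;
  return list;
}

// O(1): insert `node` just before the sentinel, i.e. at the tail.
function append(list, node) {
  node.prev = list.prev;
  node.next = list;
  list.prev.next = node;
  list.prev = node;
  return node;
}

// O(1): unlink `node` using only its own references.
function remove(node) {
  node.prev.next = node.next;
  node.next.prev = node.prev;
  node.next = null;
  node.prev = null;
}

// O(n) walk, only used here to inspect the result.
function values(list) {
  const out = [];
  for (let n = list.next; n !== list; n = n.next) out.push(n.value);
  return out;
}

const pending = createList();
const t1 = append(pending, { value: 'timer-1' });
const t2 = append(pending, { value: 'timer-2' });
const t3 = append(pending, { value: 'timer-3' });
remove(t2); // cancel the middle entry without searching for it
```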

Below you can find a simple comparison between Linked List and Array
Ref: https://en.wikipedia.org/wiki/Linked_list#Disadvantages

Related

How does JavaScript's engine design affect user data structure implementations in the worst case? [closed]

Take, for instance, a JS web engine that implements associative arrays (objects) by using hash tables. As we know, hash tables have worst case O(n) because collisions are inevitable.
Suppose I begin to develop a new data structure in JavaScript, such as a LinkedList that has worst case O(1) for insert/delete. Since I implement it with objects/arrays, it must be true that my implementation is also at minimum worst case O(n).
I'm aware that this engine optimizes very well, and a good hash function will give O(1) on average. However, I just want to confirm my realization that this isn't all as straightforward as the textbook says. Or is it?
I suppose that at the root all data structures are implemented with an array. Since array access is always O(1), shouldn't all data structures be built with an array without intermediary structures? Also, the dynamic array still has O(n) delete. Can't that be the same problem that trickles down, just like in my earlier example?
Is this where a low-level programming language is better than a high-level language? At the low level there isn't so much abstraction, so the textbook complexity numbers can actually match?
I apologize if my ideas are all over the place.
"But since I implement [my custom data structure] with object/array. Then it must be true that my implementation is also at minimum worst case O(n) as well."
No. Your linked list does not use an object with n keys anywhere. You'll have a
const linkedListNode = {
value: …,
next: null,
};
but even if this was implemented using a HashTable with O(n) worst-case member access, in your case n=2. There are not arbitrarily many properties in your object, just two. That's how you get back to O(1).
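A small sketch may help here. The helper names createNode, insertAfter and removeAfter are made up for this example; the point is only that every node object has exactly two properties, so each operation touches a fixed number of properties no matter how long the list is.

```javascript
// Each node is a plain object with exactly two properties.
function createNode(value, next = null) {
  return { value, next };
}

// O(1) insertion after a known node: a fixed number of property
// reads and writes, regardless of list length.
function insertAfter(node, value) {
  const inserted = createNode(value, node.next);
  node.next = inserted;
  return inserted;
}

// O(1) deletion of the node that follows a known node.
function removeAfter(node) {
  const removed = node.next;
  if (removed !== null) node.next = removed.next;
  return removed;
}

const headNode = createNode(1, createNode(3));
insertAfter(headNode, 2); // list is now 1 -> 2 -> 3
const removedNode = removeAfter(headNode.next); // unlinks the node holding 3
```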
"Is this where the benefit of using a low level programming language is better than using high level language? Such that low level there isn't so much abstraction and the textbook complexity numbers can actually match?"
No. Even in a lower-level programming language, you can begin to question the underlying abstraction. You think in C, an indexed array access in memory is constant time? No, since page faults and other caching shenanigans come into play.
This is why textbook complexity is always defined in terms of a machine model. As long as you define object property access in your JavaScript execution model as constant-time (and it's a very reasonable assumption to do that! It closely resembles the real world), your numbers do apply to JavaScript code as well. Sure, you can try unravelling abstractions and analyse your high-level algorithms in terms of the primitives of a lower level, but there's no point in doing that. It's precisely why we have these abstractions in the first place.
"associative arrays (objects) by using hash tables" -> Javascript objects are complicated and are much more than just "it's a hash map". I don't know the exact technical details, but I think they switch to a hash map after a certain number of values are stored in them. Along with that, they also store metadata which is used by other algorithms like Object.keys to automatically order the keys after they've been pulled out of the hash map. Again, I don't know the technical details, but I do know that it's not straightforward.
"as we know hash tables has worse case O(n) because collisions are inevitable" -> It depends on what hashing you're using, but more than that, even if collisions are inevitable it's not correct to just claim "it's worst case O(n)" and leave it at that. The probability of actually hitting the O(n) case shrinks rapidly toward 0: finding a collision time after time is extremely unlikely, so while a lookup can occasionally hit a collision, the worst case doesn't effectively describe the time complexity.
"it must be true that my implementation is also at minimum worse case O(n) as well" -> Not correct, you're speaking about two different things. If you build a linked list each node will be connected to the next node using a heap reference, which has nothing to do with javascript objects. Iterating through an entire linked list will be O(n) but that's because of having to iterate over every next node, not because of anything with objects or hashes.
"worse case O(1) for insert/delete" -> This is only true if you have the reference to the node where you want to insert/delete it, otherwise you'll have to search through it before insertion/deletion. But that's exactly the same in javascript.
"then shouldn't all data structure be built with array without intermediary structures" -> Most data structures I know (like a list, stack, queue) are implemented on top of a normal array. The ones that aren't (like a binary tree, dictionary/map or a linked list) are not implemented on an array because it wouldn't really make sense. For example, the whole point of using object references with a linked list is so that you can directly insert/delete something; using an array under the hood would just defeat the entire point of using a linked list when you're specifically trying to take advantage of the object references.
"also dynamic array still has delete O(n) cant that be the same problem which can trickle down just like my earlier example" -> Not necessarily, because when you wrap things inside an object and use an internal array, you can add metadata, indexes, hashes, things stored outside of the array (private to the object) and all sorts of other things to speed up and keep track of things in that array. So the complexity of what's used internally doesn't just automatically spill over to the object that uses it. But you do need to be careful: for example, a list in languages like C# doubles its internal array when you try to add more elements after it's full, which can result in a lot of memory waste.
That being said, the use case for JavaScript is in 99% of cases not "optimize this by another 10ms". JavaScript is used because of its non-blocking IO nature, streaming, async/await, reactive programming and its rapid speed of development. It's also used 90% for web communication, not some highly optimized graphics engine. So there are very few edge cases where you need "over 9000" complexity optimizations; feature development, code readability, maintainability and things like that are a much bigger deal in JS in general. Along with that, in most use cases you aren't going to request 1m data records from your DB using JS, but usually something like 50 that you want to display on a page, and for that you'll use the DB to optimize your query. There's hardly ever any need for such large data structures in JS development (or web development in general). It's a lot better to pull what you need and to request more, or continuously stream what you need to the client. So a lot of the data structures (like a binary tree) aren't really relevant to JS unless it's a very specific use case.
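Going back to the earlier point about wrapping an internal array with extra indexes: here is one possible sketch, under the assumption that element order does not matter. IndexedBag is a hypothetical name. A Map from value to array position lets delete avoid the O(n) shift that a plain splice would cause.

```javascript
// A bag of unique values: dense array for storage, Map for value -> index.
// IndexedBag is a hypothetical name used only for this sketch.
class IndexedBag {
  constructor() {
    this.items = [];
    this.indexOf = new Map();
  }

  add(value) {
    if (this.indexOf.has(value)) return false;
    this.indexOf.set(value, this.items.length);
    this.items.push(value);
    return true;
  }

  // Move the last item into the hole left by the deleted one, so nothing
  // is shifted. Element order is not preserved.
  delete(value) {
    const index = this.indexOf.get(value);
    if (index === undefined) return false;
    const last = this.items.pop();
    if (index < this.items.length) {
      this.items[index] = last;
      this.indexOf.set(last, index);
    }
    this.indexOf.delete(value);
    return true;
  }
}

const bag = new IndexedBag();
bag.add('a');
bag.add('b');
bag.add('c');
bag.delete('a'); // 'c' moves into slot 0
```

The trade-off is extra memory for the Map and the loss of ordering, which is exactly the kind of design decision the wrapper object gets to make.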

Fastest way to check for an object's existence in a list in JavaScript

I've got a list of 100,000 items that live in memory (all of them big ints stored as strings).
The data structure these come in doesn't really matter. Right now they live in an array like so:
const list = ['1', '2', '3', /* ... */ '100000'];
Note: The above is just an example - in reality, each entry is an 18 digit string.
I need to check for an object's existence. Currently I'm doing:
const needToCheck = '3';
const doesInclude = list.includes(needToCheck);
However there's a lot of ways I could do this existence check. I need this to be as performant as possible.
A few other avenues I could follow are:
Create a Map with the value being undefined
Create an object ({}) and create the keys of the object as the entries in list, then use hasOwnProperty.
Use a Set()
Use some other sort of data structure (a tree?), since these are all numbers. However, because they are all 18 digits in length, maybe that'll be less performant.
I can accept a higher upfront cost to build the data structure to get a bigger speed increase later, as this is for a URL route that will be hit >1MM times a day.
Array.prototype.includes is an O(n) operation, which is not desirable - every time you want to check whether a value exists, you'll have to iterate over much of the collection (perhaps the entire collection).
A Map, Set, or object are better, since checking whether they have a value is an O(1) operation.
A tree is not desirable either, because lookup will necessarily take a number of operations down the tree, which could be an issue if the tree is large and you want to lookup frequently - so the O(1) solution is better.
A Map, while it works, probably isn't appropriate because you just want to see if a value exists - you don't need key-value pairs, just values. A Set is composed of only values (and Set.has is indeed O(1)), so that's the best choice for this situation. An object with keys, while it could work too, might not be a good idea because it may create many unnecessary hidden classes - a Set is more designed towards dynamic values at runtime.
So, the Set approach looks to be the most performant and appropriate choice.
You might also consider the possibility of moving the calculation to the server. 100,000 items isn't necessarily too much, but it's still a surprisingly large amount to see client-side.
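A minimal sketch of the Set approach, using three made-up 18-digit strings in place of the real 100,000 entries:

```javascript
// Placeholder IDs; the real list would hold 100,000 18-digit strings.
const ids = ['100000000000000001', '100000000000000002', '100000000000000003'];

// One-time O(n) cost to build the Set.
const idSet = new Set(ids);

// Each check is a hash-based lookup instead of a scan over the array.
const found = idSet.has('100000000000000002');
const missing = idSet.has('999999999999999999');
```

Build the Set once at startup (or whenever the list changes) and reuse it across requests; rebuilding it per request would throw away the benefit.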
Unconventionally, you could also use an object and set each of your 100,000 items as a property because under the hood, the JavaScript Object is implemented with a hash table.
For example,
var numbers = {
  "1": 1243213,
  "2": 4314121,
  "3": 3142123,
  // ...
};
You could then very quickly check if an item existed by checking if numbers["1"] !== undefined. And not only that, but you can also get the value of the property at the same time.
However, this method does come with some drawbacks like iterating through the list becoming a lot more complicated (though still possible).
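One more caveat worth sketching: comparing against undefined cannot tell a missing key from a key whose value is undefined, and a normal object literal also inherits keys such as toString. The variation below is an assumption on my part rather than part of the approach described above; it uses Object.create(null) and the in operator to sidestep both issues.

```javascript
// Object.create(null) has no prototype, so there are no inherited keys.
const lookup = Object.create(null);
lookup['100000000000000001'] = 1243213;
lookup['100000000000000002'] = undefined; // present, but its value is undefined

// `in` tests for the key itself, so it still reports the second entry.
const hasFirst = '100000000000000001' in lookup;
const hasSecond = '100000000000000002' in lookup;
const hasOther = 'toString' in lookup;

// A comparison against undefined would wrongly treat the second entry as absent.
const looksAbsent = lookup['100000000000000002'] === undefined;
```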
For reference, see https://stackoverflow.com/a/24196259/8250558

Is the advantage of Typed Arrays in JavaScript that they work the same as, or similarly to, arrays in C?

I've been playing around with Typed Arrays in JavaScript.
var buffer = new ArrayBuffer(16);
var int32View = new Int32Array(buffer);
I imagine normal arrays ([1, 257, true]) in JavaScript have poor performance because their values could be of any type, therefore, reaching an offset in memory is not trivial.
I originally thought that JavaScript array subscripts worked the same as objects (as they have many similarities), and were hash map based, requiring a hash based lookup. But I haven't found much credible information to confirm this.
So, I'd assume the reason why Typed Arrays perform so well is because they work like normal arrays in C, where they're always typed. Given the initial code example above, and wishing to get the 10th value in the typed array...
var value = int32View[10];
The type is Int32, so each value must consist of 32 bits or 4 bytes.
The subscript is 10.
So the location in memory of that value is <array offset> + (4 * 10), and then read 4 bytes to get the total value.
I basically just want to confirm my assumptions. Are my thoughts around this correct? If not, please elaborate.
I checked out the V8 source to see if I could answer it myself, but my C is rusty and I'm not too familiar with C++.
Typed Arrays were designed by the WebGL standards committee, for performance reasons. Typically Javascript arrays are generic and can hold objects, other arrays and so on - and the elements are not necessarily sequential in memory, like they would be in C. WebGL requires buffers to be sequential in memory, because that's how the underlying C API expects them. If Typed Arrays are not used, passing an ordinary array to a WebGL function requires a lot of work: each element must be inspected, the type checked, and if it's the right thing (e.g. a float) then copy it out to a separate sequential C-like buffer, then pass that sequential buffer to the C API. Ouch - lots of work! For performance-sensitive WebGL applications this could cause a big drop in the framerate.
On the other hand, like you suggest in the question, Typed Arrays use a sequential C-like buffer already in their behind-the-scenes storage. When you write to a typed array, you are indeed assigning to a C-like array behind the scenes. For the purposes of WebGL, this means the buffer can be used directly by the corresponding C API.
Note your memory address calculation isn't quite enough: the browser must also bounds-check the array, to prevent out-of-range accesses. This has to happen with any kind of Javascript array, but in many cases clever Javascript engines can omit the check when it can prove the index value is already within bounds (such as looping from 0 to the length of the array). It also has to check the array index is really a number and not a string or something else! But it is in essence like you describe, using C-like addressing.
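Here is a small sketch of the addressing and bounds-check behaviour described above, reusing the 16-byte buffer from the question. The DataView read is only there to show that element i really sits at byte offset i * 4 in the buffer. Note that a 16-byte buffer holds just four Int32 values, so index 10 from the question is out of range.

```javascript
const buffer = new ArrayBuffer(16);       // 16 bytes
const int32View = new Int32Array(buffer); // 16 / 4 = 4 elements

int32View[3] = 42;

// Element i lives at byte offset i * BYTES_PER_ELEMENT within the buffer.
const byteOffset = 3 * Int32Array.BYTES_PER_ELEMENT;

// Typed arrays use the platform's byte order, so detect it before
// reading the same 4 bytes through a DataView.
const littleEndian = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;
const sameValue = new DataView(buffer).getInt32(byteOffset, littleEndian);

// Index 10 is past the end: the bounds check yields undefined instead of
// reading adjacent memory as C would.
const outOfRange = int32View[10];
```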
BUT... that's not all! In some cases clever Javascript engines can also deduce the type of ordinary Javascript arrays. In an engine like V8, if you make an ordinary Javascript array and only store floats in it, V8 may optimistically decide it's an array of floats and optimise the code it generates for that. The performance can then be equivalent to typed arrays. So typed arrays aren't actually necessary to reach maximum performance: just use arrays predictably (with every element the same type) and some engines can optimise for that as well.
So why do typed arrays still need to exist?
Optimisations like deducing the type of arrays are really complicated. If V8 deduces an ordinary array has only floats in it, then you store an object in an element, it has to de-optimise and regenerate code that makes the array generic again. It's quite an achievement that all this works transparently. Typed Arrays are much simpler: they're guaranteed to be one type, and you just can't store other things like objects in them.
Optimisations are never guaranteed to happen; you may store only floats in an ordinary array, but the engine may decide for various reasons not to optimise it.
The fact they're much simpler means other less-sophisticated javascript engines can easily implement them. They don't need all the advanced deoptimisation support.
Even with really advanced engines, proving optimisations can be used is extremely difficult and can sometimes be impossible. A typed array significantly simplifies the level of proof the engine needs to be able to optimise around it. A value returned from a typed array is certainly of a certain type, and engines can optimise for the result being that type. A value returned from an ordinary array could in theory have any type, and the engine may not be able to prove it will always have the same type result, and therefore generates less efficient code. Therefore code around a typed array is more easily optimised.
Typed arrays remove the opportunity to make a mistake. You just can't accidentally store an object and suddenly get far worse performance.
So, in short, ordinary arrays can in theory be equally fast as typed arrays. But typed arrays make it much easier to reach peak performance.
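The "you just can't store other things" point can be seen directly. In this short sketch (variable names are arbitrary), every write to a typed array is converted to the element type, while an ordinary array keeps whatever it is given:

```javascript
const floats = new Float64Array(3);
floats[0] = 1.5;
floats[1] = '2.5'; // numeric string is converted to the number 2.5
floats[2] = {};    // a plain object converts to NaN; no object is stored

const ints = new Int32Array(1);
ints[0] = 7.9;     // fractional part is dropped, leaving 7

// An ordinary array keeps whatever it is given, so element types can diverge.
const plain = [1.5, 2.5];
plain.push({});
```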
Yes, you are mostly correct. With a standard JavaScript array, the JavaScript engine has to assume that the data in the array is all objects. It can still store this as a C-like array/vector, where the access to the memory is still like you described. The problem is that the data is not the value, but something referencing that value (the object).
So, performing a[i] = b[i] + 2 requires the engine to:
1. access the object in b at index i;
2. check what type the object is;
3. extract the value out of the object;
4. add 2 to the value;
5. create a new object with the newly computed value from step 4;
6. assign the new object from step 5 into a at index i.
With a typed array, the engine can:
1. access the value in b at index i (including placing it in a CPU register);
2. increment the value by 2;
3. assign the new value from step 2 into a at index i.
NOTE: These are not the exact steps a JavaScript engine will perform, as that depends on the code being compiled (including surrounding code) and the engine in question.
This allows the resulting computations to be much more efficient. Also, the typed arrays have a memory layout guarantee (arrays of n-byte values) and can thus be used to directly interface with data (audio, video, etc.).
When it comes to performance, things can change fast. As AshleysBrain says, it comes down to whether the VM can deduce that a normal array can be implemented as a typed array quickly and accurately. That depends on the particular optimizations of the particular JavaScript VM, and it can change in any new browser version.
This Chrome developer comment provides some guidance that worked as of June 2012:
Normal arrays can be as fast as typed arrays if you do a lot of sequential access. Random access outside the bounds of the array causes the array to grow.
Typed arrays are fast for access, but slow to be allocated. If you create temporary arrays frequently, avoid typed arrays. (Fixing this is possible, but it's low priority.)
Micro-benchmarks such as JSPerf are not reliable for real-world performance.
If I might elaborate on the last point, I've seen this phenomenon with Java for years. When you test the speed of a small piece of code by running it over and over again in isolation, the VM optimizes the heck out of it. It makes optimizations which only make sense for that specific test. Your benchmark can get a hundredfold speed improvement compared to running the same code inside another program, or compared to running it immediately after running several different tests that optimize the same code differently.
I'm not really a contributor to any JavaScript engine and have only done some reading on V8, so my answer might not be completely accurate:
Values in arrays (only normal arrays with no holes/gaps, i.e. not sparse; sparse arrays are treated as objects) are all either pointers or numbers with a fixed length (in V8 they are 32 bits: a 31-bit integer is tagged with a 0 bit at the end, otherwise it's a pointer).
So I don't think finding the memory location is any different than for a typed array, since the number of bytes per element is the same all over the array. The difference is that if the element is an object, you have to add one unboxing layer, which doesn't happen for typed arrays.
And of course, accessing typed arrays doesn't involve the type checks that a normal array has (though those might be removed in highly optimized code, which is only generated for hot code).
For writing, if it's the same type it shouldn't be much slower. If it's a different type then the JS engine might generate polymorphic code for it, which is slower.
You can also try making some benchmarks on jsperf.com to confirm.

Are JavaScript arrays actually linked lists?

I'm new to Javascript, and notice that you don't need to specify an array's size and often see people dynamically creating arrays one element at a time. This would be a huge performance problem in other languages as you would constantly need to reallocate memory for the array as it increases in size.
Is this not a problem in JavaScript? If so, then is there a list structure available?
Javascript arrays are typically implemented as hashmaps (just like Javascript objects) with one added feature: there is an attribute length, which is one higher than the highest positive integer that has been used as a key. Nothing stops you from also using strings, floating-point numbers, even negative numbers as keys. Nothing except good sense.
It most likely depends on what JavaScript engine you use.
Internet Explorer uses a mix of sparse arrays and dense arrays to make that work. Some of the more gory details are explained here: http://blogs.msdn.com/b/jscript/archive/2008/04/08/performance-optimization-of-arrays-part-ii.aspx.
The thing about dynamic languages is, well, that they're dynamic. Just like ArrayList in Java, or arrays in Perl, PHP, and Python, an Array in JavaScript will allocate a certain amount of memory and when it gets to be too big, the language automatically appends to the object. Is it as efficient as C++ or even Java? No (C++ can run circles around even the best implementations of JS), but people aren't building Quake in JS (just yet).
It is actually better to think of them as HashMaps with some specialized methods too anyway -- after all, this is valid: var a = []; a['cat']='meow';.
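A quick sketch of how length reacts to the different kinds of keys mentioned above (variable names are arbitrary):

```javascript
const a = [];
a['cat'] = 'meow';                 // string key: an ordinary property, not an element
const lengthAfterCat = a.length;   // still 0

a[5] = 'x';                        // index key: length becomes highest index + 1
const lengthAfterIndex = a.length; // 6

const hasHole = !(2 in a);         // indices 0..4 were never populated
const keys = Object.keys(a);       // index keys first, then string keys
```

So non-index keys work, but they are invisible to length and to array methods such as forEach or join, which is a good reason to keep them off arrays.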
No.
What JavaScript arrays are and aren't is determined by the language specification, specifically section 15.4. Array is defined in terms of the operations it provides, not the implementation details of the memory layout of any particular data structure.
Could Array be implemented on top of a linked list? Yes. This might make certain operations, such as shift and unshift, more efficient, but Array is also frequently accessed by index, which is not efficient with linked lists.
It's also possible to get the best of both worlds without linked lists. Contiguous-memory data structures, such as circular queues, have both efficient insertion/removal at the front and efficient random access.
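As an illustration of the circular-queue idea, here is a fixed-capacity sketch. RingDeque and its method names are hypothetical, and growing the buffer when it fills up is deliberately left out. Front insertion and removal only move a start offset, and indexed reads are a single modulo calculation.

```javascript
// Fixed-capacity circular buffer: O(1) at both ends plus O(1) indexed reads.
// RingDeque is a hypothetical name; growth on overflow is left out for brevity.
class RingDeque {
  constructor(capacity) {
    this.slots = new Array(capacity);
    this.start = 0; // physical index of logical element 0
    this.length = 0;
  }

  pushBack(value) {
    if (this.length === this.slots.length) throw new RangeError('deque is full');
    this.slots[(this.start + this.length) % this.slots.length] = value;
    this.length += 1;
  }

  // Like unshift, but without moving any existing element:
  // step the start offset back by one slot.
  pushFront(value) {
    if (this.length === this.slots.length) throw new RangeError('deque is full');
    this.start = (this.start - 1 + this.slots.length) % this.slots.length;
    this.slots[this.start] = value;
    this.length += 1;
  }

  // Like shift: advance the start offset by one slot.
  popFront() {
    if (this.length === 0) return undefined;
    const value = this.slots[this.start];
    this.slots[this.start] = undefined; // release the reference
    this.start = (this.start + 1) % this.slots.length;
    this.length -= 1;
    return value;
  }

  // Random access by logical index.
  at(index) {
    if (index < 0 || index >= this.length) return undefined;
    return this.slots[(this.start + index) % this.slots.length];
  }
}

const deque = new RingDeque(4);
deque.pushBack('b');
deque.pushBack('c');
deque.pushFront('a');
```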
In practice, most interpreters optimize dense arrays by using a data structure based around a resizable or reallocable array similar to a C++ vector or Java ArrayList.
Javascript arrays are not true arrays like in C/C++ or other languages. Therefore, they aren't as efficient, but they are arguably easier to use and do not throw out of bounds exceptions.
They are actually more like custom objects that use the properties as indexes.
Example:
var a = { "0": 1, "1": 2 }; // index keys start at 0, as in a real array
a.length = 2;
for (var i = 0; i < a.length; i++)
  console.log(a[i]);
a will behave almost like an array, and you can also call functions from the Array.prototype on it.
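For instance, a sketch with a separate, made-up object (arrayLike) shows generic Array.prototype methods working on anything that has index keys and a length:

```javascript
// Array-like object: index keys starting at 0 plus a length property.
const arrayLike = { 0: 'x', 1: 'y', length: 2 };

// Generic Array.prototype methods only rely on length and index access.
const joined = Array.prototype.join.call(arrayLike, '-');
const upper = Array.prototype.map.call(arrayLike, (s) => s.toUpperCase());

// It is still not a real Array: length does not update itself, for example.
const isRealArray = Array.isArray(arrayLike);
```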

String concatenation vs string buffers in Javascript

I was reading this book - Professional Javascript for Web Developers where the author mentions string concatenation is an expensive operation compared to using an array to store strings and then using the join method to create the final string. Curious, I did a couple of tests here to see how much time it would save and this is what I got -
http://jsbin.com/ivako
Somehow, Firefox usually produces somewhat similar times for both ways, but in IE, string concatenation is much, much faster. So, can this idea now be considered outdated (browsers have probably improved since)?
Even if it were true and join() was faster than concatenation, it wouldn't matter. We are talking about a few milliseconds here, which are completely negligible.
I would always prefer well structured and easy to read code over microscopic performance boost and I think that using concatenation looks better and is easier to read.
Just my two cents.
On my system (IE 8 in Windows 7) the times of StringBuilder in that test vary from about 70-100% in range -- that is, it is not stable -- although the mean is about 95% of that of the normal appending.
While it's easy now just to say "premature optimization" (and I suspect that in almost every case it is) there are things worth considering:
The problem with repeated string concatenation comes from repeated memory allocations and repeated data copies (advanced string data-types can reduce/eliminate much of this, but let's keep assuming a simplistic model for now). From this, let's raise some questions:
What memory allocation is used? In the naive case each str+=x requires str.length+x.length new memory to be allocated. The standard C malloc, for instance, is a rather poor memory allocator. JS implementations have undergone changes over the years including, among other things, better memory subsystems. Of course these changes don't stop there and touch really all aspects of modern JS code. Just because now-ancient implementations may have been incredibly slow at certain tasks does not necessarily imply that the same issues still exist, or to the same extent.
As above, the implementation of Array.join is very important. If it does NOT pre-allocate memory for the final string before building it then it only saves on data-copy costs -- how many GB/s is main memory these days? 10,000 x 50 is hardly pushing a limit. A smart Array.join operation with a POOR MEMORY ALLOCATOR would be expected to perform a good bit better simply because the number of re-allocations is reduced. This difference would be expected to be minimized as allocation cost decreases.
The micro-benchmark code may be flawed depending on if the JS engine creates a new object per each UNIQUE string literal or not. (This would bias it towards the Array.join method but needs to be considered in general).
The benchmark is indeed a micro benchmark :)
Increasing the size of each appended piece should have an impact on performance based on any or all (and then some) of the above conditions. It is generally easy to show extreme cases favoring one method or another -- the expected use case is generally of more importance.
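To put rough numbers on the simplistic model described above, here is a small sketch that counts characters copied. It assumes every += allocates a fresh string and copies everything into it, and that join pre-allocates the final string. Real engines are free to do something smarter, so this is only an illustration of the model, and the function names are invented for the example.

```javascript
// Characters copied when appending `pieces` chunks of `size` characters each,
// assuming every += allocates a new string and copies old contents plus chunk.
function naiveConcatCopies(pieces, size) {
  var copied = 0;
  var length = 0;
  for (var i = 0; i < pieces; i++) {
    length += size;   // length of the new string after this append
    copied += length; // the whole new string is written once
  }
  return copied;
}

// Characters copied by a join that pre-allocates the final string.
function preallocatedJoinCopies(pieces, size) {
  return pieces * size;
}

console.log(naiveConcatCopies(10000, 50));      // 2500250000
console.log(preallocatedJoinCopies(10000, 50)); // 500000
```

Under this model the 10,000 x 50 case copies about 2.5 billion characters with naive concatenation versus 500,000 with a pre-allocating join, which is why the quality of the implementation matters more than the choice of method.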
Although, quite honestly, for any form of sane string building, I would just use normal string concatenation until such a time it was determined to be a bottleneck, if ever.
I would re-read the above statement from the book and see if there are perhaps other implicit considerations the author meant to invoke, such as "for very large strings" or "insane amounts of string operations" or "in JScript/IE6", etc. If not, then such a statement is about as useful as "Insertion sort is O(n*n)" [the realized costs depend upon the state of the data and the size of n, of course].
And the disclaimer: the speed of the code depends upon the browser, operating system, the underlying hardware, moon gravitational forces and, of course, how your computer feels about you.
In principle the book is right. Joining an array should be much faster than repeatedly concatenating to the same string. As a simple algorithm on immutable strings it is demonstrably faster.
The trick is: JavaScript authors, being largely non-expert dabblers, have written a load of code out there in the wild that uses concatenation, and relatively little ‘good’ code that uses methods like array-join. The upshot is that browser authors can get a better improvement in speed on the average web page by catering for and optimising the ‘bad’, more common option of concatenation.
So that's what happened. The newer browser versions have some fairly hairy optimisation stuff that detects when you're doing a load of concatenations, and hacks it about so that internally it is working more like an array-join, at more or less the same speed.
I actually have some experience in this area, since my primary product is a big, IE-only webapp that does a LOT of string concatenation in order to build up XML docs to send to the server. For example, in the worst case a page might have 5-10 iframes, each with a few hundred text boxes that each have 5-10 expando properties.
For something like our save function, we iterate through every tab (iframe) and every entity on that tab, pull out all the expando properties on each entity and stuff them all into a giant XML document.
When profiling and improving our save method, we found that using string concatenation in IE7 was a lot slower than using the array of strings method. Some other points of interest were that accessing DOM object expando properties is really slow, so we put them all into javascript arrays instead. Finally, generating the javascript arrays themselves is actually best done on the server; you then write them onto the page as a literal control to be executed when the page loads.
As we know, not all browsers are created equal. Because of this, performance in different areas is guaranteed to differ from browser to browser.
That aside, I noticed the same results as you did; however, after removing the unnecessary buffer class and just using an array directly with a 10000 character string, the results were even tighter and more consistent (in FF 3.0.12): http://jsbin.com/ehalu/
Unless you're doing a great deal of string concatenation, I would say that this type of optimization is a micro-optimization. Your time might be better spent limiting DOM reflows and queries (generally the use of document.getElementById/getElementsByTagName), implementing caching of AJAX results (where applicable), and exploiting event bubbling (there's a link somewhere, I just can't find it now).
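For reference, a comparison of this kind (plain concatenation versus an array used directly, with no buffer class) might look roughly like the sketch below. This is not the code behind the jsbin links; the piece size, iteration count, and function names are assumptions chosen for illustration, and timings will differ by browser.

```javascript
// Build the same string two ways and time each; not the jsbin test itself.
function buildByConcat(piece, count) {
  var s = "";
  for (var i = 0; i < count; i++) {
    s += piece;
  }
  return s;
}

function buildByJoin(piece, count) {
  var parts = [];
  for (var i = 0; i < count; i++) {
    parts[parts.length] = piece;
  }
  return parts.join("");
}

// Returns elapsed milliseconds and the length of the string that was built.
function timeIt(fn, piece, count) {
  var start = Date.now();
  var result = fn(piece, count);
  return { ms: Date.now() - start, length: result.length };
}

var piece = "0123456789"; // assumed piece size
var count = 10000;        // assumed number of appends
var concatRun = timeIt(buildByConcat, piece, count);
var joinRun = timeIt(buildByJoin, piece, count);
console.log("concat:", concatRun.ms, "ms; join:", joinRun.ms, "ms");
```

A single run like this is noisy; repeating it several times and comparing averages gives a fairer picture.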
Okay, regarding this here is a related module:
http://www.openjsan.org/doc/s/sh/shogo4405/String/Buffer/0.0.1/lib/String/Buffer.html
This is an effective means of creating String buffers, by using
var buffer = new String.Buffer();
buffer.append("foo", "bar");
This is the fastest sort of implementation of String buffers I know of. First of all, if you are implementing String buffers, don't use push, because it is a built-in method and it is slow: for one thing, push iterates over its entire arguments array rather than just adding one element.
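A buffer along those lines, storing each piece by direct index assignment instead of push, could be sketched as follows. SimpleBuffer is a hypothetical name and this is not the source of the String.Buffer module linked above; whether index assignment actually beats push depends on the engine.

```javascript
// Hypothetical SimpleBuffer; not the String.Buffer module's actual source.
function SimpleBuffer() {
  this.parts = [];
}

// Accepts one or more values; each is stored by direct index assignment.
SimpleBuffer.prototype.append = function () {
  for (var i = 0; i < arguments.length; i++) {
    this.parts[this.parts.length] = String(arguments[i]);
  }
  return this; // allow chaining
};

// Joins all stored pieces into the final string.
SimpleBuffer.prototype.toString = function () {
  return this.parts.join("");
};

var buffer = new SimpleBuffer();
buffer.append("foo", "bar").append(42);
console.log(buffer.toString()); // foobar42
```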
It all really depends upon the implementation of the join method, some implementations of the join method are really slow and some are relatively large.
