Cross-browser key lookup performance in JavaScript objects

I'm doing a data-intensive project in Javascript, where there are thousands of objects with short strings as "IDs" that need to be looked up efficiently by ID. My (possibly naive) approach was to create an object with a property for each object, keyed by the ID.
How do different browsers/JS engines implement key lookup in very large objects like this? I know that V8 is highly optimized for objects with a small number of properties, but what happens if there are thousands of properties that are constantly added and deleted? Are objects backed by hash tables or tries in the different browsers? Either way, I'd imagine there's a benefit to using shorter keys, but does anyone have insight into whether that's a large benefit? And are there any browsers that are so bad at key lookup (i.e. do any use sequential search?) that it would be more performant to roll my own data structure?
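For reference, a minimal sketch of the approach described above, plus a Map, which modern engines also handle well for heavy add/delete/lookup workloads (the IDs and record fields here are made up):

// Plain object used as a lookup table keyed by short string IDs.
const byId = Object.create(null); // no prototype, so keys can't collide with Object.prototype
byId["a1"] = { id: "a1", name: "first" };
byId["b2"] = { id: "b2", name: "second" };
console.log(byId["a1"].name); // "first"
delete byId["b2"]; // frequent deletes can push some engines into a slower dictionary-like representation

// Map is designed for exactly this add/delete/lookup pattern.
const byIdMap = new Map();
byIdMap.set("a1", { id: "a1", name: "first" });
console.log(byIdMap.get("a1").name); // "first"
byIdMap.delete("a1");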

How very coincidental. There was a question a few days ago about this. The OP generated some benchmarks to test it and posted them only a few hours ago. I suggest you check it out: JavaScript Objects as Hashes? Is the complexity greater than O(1)?

Related

How does JavaScript engine design affect worst-case performance of user data structure implementations? [closed]

Take, for instance, a JS web engine that implements associative arrays (objects) using hash tables. As we know, hash tables have a worst case of O(n) because collisions are inevitable.
Suppose I begin to develop a new data structure in JavaScript, such as a LinkedList, which has a worst case of O(1) for insert/delete. But since I implement it with objects/arrays, it must be true that my implementation also has a worst case of at least O(n).
I'm aware that these engines optimize very well, and a good hash function gives O(1) on average. However, I just want to confirm my realization that this isn't all as straightforward as the textbook says. Or is it?
I suppose at the root, all data structures are implemented with an array, since access is always O(1). Shouldn't all data structures then be built on an array, without intermediary structures? Also, a dynamic array still has O(n) delete; can't that be the same problem trickling down, just like my earlier example?
Is this where a low-level programming language has the benefit over a high-level one? At a low level there isn't so much abstraction, so the textbook complexity numbers can actually match?
Apologies if my ideas are all over the place.
But since I implement [my custom data structure] with objects/arrays, it must be true that my implementation also has a worst case of at least O(n).
No. Your linked list does not use an object with n keys anywhere. You'll have a
const linkedListNode = {
  value: …,
  next: null,
};
but even if this were implemented using a hash table with O(n) worst-case member access, in your case n = 2. There are not arbitrarily many properties in your object, just two. That's how you get back to O(1).
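To make the O(1) claim concrete, here's a minimal sketch (the function name is just illustrative): inserting after a node you already hold a reference to touches a fixed number of fields, no matter how long the list is.

// Insert a new node right after `node`: a constant number of steps,
// regardless of the length of the list.
function insertAfter(node, value) {
  const newNode = { value: value, next: node.next };
  node.next = newNode;
  return newNode;
}

const head = { value: 1, next: null };
insertAfter(head, 2); // list is now 1 -> 2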
Is this where a low-level programming language has the benefit over a high-level one? At a low level there isn't so much abstraction, so the textbook complexity numbers can actually match?
No. Even in a lower-level programming language, you can begin to question the underlying abstraction. You think in C, an indexed array access in memory is constant time? No, since page faults and other caching shenanigans come into play.
This is why textbook complexity is always defined in terms of a machine model. As long as you define object property access in your JavaScript execution model as constant-time (and it's a very reasonable assumption to do that! It closely resembles the real world), your numbers do apply to JavaScript code as well. Sure, you can try unravelling abstractions and analyse your high-level algorithms in terms of the primitives of a lower level, but there's no point in doing that. It's precisely why we have these abstractions in the first place.
"associative arrays (objects) by using hash tables" -> Javascript objects are complicated and are much more than just "it's a hash map". I don't know the exact technical details but I think they change to hash map after a certain amount of values are stored in them, along with that they also store metadata which is used on other algorithms like Object.keys to automatically sort the keys after they've been pulled out of the hash map. Again I don't know the technical details but I do know that, it's not straight forward.
"as we know hash tables has worse case O(n) because collisions are inevitable" -> It depends on what hashing you're using, but more than that even if collisions are inevitable it's not correct to just claim "it's worse case O(n)" and leave it at that because the probability of it being O(n) logarithmically declines to 0, the chances of it finding a collision time after time again and again is extremely unlikely, so while it can perhaps find a collision that doesn't effectively describe the time complexity.
"it must be true that my implementation is also at minimum worse case O(n) as well" -> Not correct, you're speaking about two different things. If you build a linked list each node will be connected to the next node using a heap reference, which has nothing to do with javascript objects. Iterating through an entire linked list will be O(n) but that's because of having to iterate over every next node, not because of anything with objects or hashes.
"worse case O(1) for insert/delete" -> This is only true if you have the reference to the node where you want to insert/delete it, otherwise you'll have to search through it before insertion/deletion. But that's exactly the same in javascript.
"then shouldn't all data structure be built with array without intermediary structures" -> Most data structures I know (like a list, stack, queue) are implemented on top of a normal array. The ones that aren't (like a binary tree, dictionary/map or a linked list) are not implemented on an array because it wouldn't really make sense. For example the whole point of using object references with a linked list is so that you can directly insert/delete something, using an array under the hood would just defeat the entire point of using a linked list when you're specifically trying to take advance of the object references.
"also dynamic array still has delete O(n) cant that be the same problem which can trickle down just like my earlier example" -> Not necessarily because when you wrap things inside an object and use an internal array inside, you can add metadata, indexes, hashes, things stored outside of the array (private to the object) and all sorts of other things to speed up and keep track of things on that array. So the complexity of what's used internally doesn't just automatically spill over to using it in another object. But you do need to be careful, like if you use a list then the inner workings of it in languages like C# is that it will double the internal array when you try to add more elements to it after it's full, this can result in a lot of memory waste.
That being said the use case for javascript is in 99% of cases not "optimize this by another 10ms", javascript is used because of its non IO blocking nature, streaming, async/await reactive programming and it's rapid speed of development, it's also used 90% for web communication, not some highly optimized graphics engine. So there's very very few edge cases where you need over 9000 complexity optimizations, feature development, code readability, maintainability, things like that are a much bigger deal in JS in general. Along with that in most use cases you aren't going to request 1m data records from your DB using JS, usually like 50 that you want to display on a page and for that you'll use the DB to optimize your query, there's hardly ever any need for using such large data structures in any JS development (or web development in general). It's a lot better to pull what you need and to request more or continuously stream what you need to the client. So a lot of the data structures (like a binary tree) aren't really relevant to things in JS unless it's a very specific use case.

Complexity of accessing data in an object

In some of the projects I'm working on as part of my day job, I need to access data in very large JS objects (on the order of thousands of key-value pairs). I'm trying to improve the efficiency of my code, so I came up with a few questions:
What is the runtime complexity of accessing a field in such an object? My initial hunch is that it's O(n).
Is there a difference when accessing through dot notation or bracket notation? (e.g. obj.field vs obj[field])
I'm guessing there is a different answer for different runtime engines - is there a place where I can see the difference between them?
JavaScript objects are essentially hashes, so the lookup complexity is O(1) in all engines.
obj.field is an alias for obj['field'], so they have the same performance.
You can find some JS hash performance tests here, unfortunately only for your own browser's engine.
In the worst case a JS object is represented as a hash table and has the same lookup complexity: O(1) on average but O(n) at worst. The hash table representation is probably your case, I guess, because you have so many items in the object. There is no difference in how you access a property; obj.field and obj['field'] are the same.
It's also worth mentioning that the complexity isn't always equal to that of a hash table lookup; it's faster in many cases. Modern JS engines use techniques called hidden classes and inline caching to speed up the lookup. How these techniques work is a pretty big question in itself; I've explained it in another answer.
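As a rough illustration of the shape/inline-cache idea (a sketch, not engine-specific; makePoint and sumX are made-up names): objects created with the same properties in the same order share a hidden class, so a property access site that only ever sees that one shape stays fast.

// Both objects get the same hidden class/shape: same keys, added in the same order.
function makePoint(x, y) {
  return { x: x, y: y };
}

function sumX(points) {
  let total = 0;
  for (const p of points) {
    total += p.x; // this access site sees a single shape, so the inline cache hits
  }
  return total;
}

console.log(sumX([makePoint(1, 2), makePoint(3, 4)])); // 4
// Adding properties in a different order, or deleting them, creates new shapes
// and can make the same access site polymorphic or fall back to a dictionary.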
Some related resources:
JavaScript engine fundamentals: Shapes and Inline Caches and a YouTube video as well.
Fast properties in V8
A tour of V8: object representation
JavaScript Engines Hidden Classes (and Why You Should Keep Them in Mind)

Is the advantage of Typed Arrays in JavaScript that they work the same as (or similarly to) arrays in C?

I've been playing around with Typed Arrays in JavaScript.
var buffer = new ArrayBuffer(16);
var int32View = new Int32Array(buffer);
I imagine normal arrays ([1, 257, true]) in JavaScript have poor performance because their values could be of any type, so reaching an offset in memory is not trivial.
I originally thought that JavaScript array subscripts worked the same as objects (as they have many similarities), and were hash map based, requiring a hash based lookup. But I haven't found much credible information to confirm this.
So, I'd assume the reason why Typed Arrays perform so well is because they work like normal arrays in C, where they're always typed. Given the initial code example above, and wishing to get the 10th value in the typed array...
var value = int32View[10];
The type is Int32, so each value must consist of 32 bits or 4 bytes.
The subscript is 10.
So the location in memory of that value is <array offset> + (4 * 10), and then read 4 bytes to get the total value.
I basically just want to confirm my assumptions. Are my thoughts on this correct? If not, please elaborate.
I checked out the V8 source to see if I could answer it myself, but my C is rusty and I'm not too familiar with C++.
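Here's a small check of that arithmetic with a DataView, assuming a 64-byte buffer so index 10 actually exists; note typed arrays read and write in the platform's byte order, which is little-endian on essentially all current hardware:

var buffer = new ArrayBuffer(64); // room for 16 Int32 values
var int32View = new Int32Array(buffer);
int32View[10] = 42;

var byteOffset = 10 * Int32Array.BYTES_PER_ELEMENT; // 10 * 4 = 40
var view = new DataView(buffer);
console.log(view.getInt32(byteOffset, true)); // 42 (true = read little-endian)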
Typed Arrays were designed by the WebGL standards committee, for performance reasons. Typically Javascript arrays are generic and can hold objects, other arrays and so on - and the elements are not necessarily sequential in memory, like they would be in C. WebGL requires buffers to be sequential in memory, because that's how the underlying C API expects them. If Typed Arrays are not used, passing an ordinary array to a WebGL function requires a lot of work: each element must be inspected, the type checked, and if it's the right thing (e.g. a float) then copy it out to a separate sequential C-like buffer, then pass that sequential buffer to the C API. Ouch - lots of work! For performance-sensitive WebGL applications this could cause a big drop in the framerate.
On the other hand, like you suggest in the question, Typed Arrays use a sequential C-like buffer already in their behind-the-scenes storage. When you write to a typed array, you are indeed assigning to a C-like array behind the scenes. For the purposes of WebGL, this means the buffer can be used directly by the corresponding C API.
Note your memory address calculation isn't quite enough: the browser must also bounds-check the array, to prevent out-of-range accesses. This has to happen with any kind of Javascript array, but in many cases clever Javascript engines can omit the check when it can prove the index value is already within bounds (such as looping from 0 to the length of the array). It also has to check the array index is really a number and not a string or something else! But it is in essence like you describe, using C-like addressing.
BUT... that's not all! In some cases clever Javascript engines can also deduce the type of ordinary Javascript arrays. In an engine like V8, if you make an ordinary Javascript array and only store floats in it, V8 may optimistically decide it's an array of floats and optimise the code it generates for that. The performance can then be equivalent to typed arrays. So typed arrays aren't actually necessary to reach maximum performance: just use arrays predictably (with every element the same type) and some engines can optimise for that as well.
So why do typed arrays still need to exist?
Optimisations like deducing the type of arrays are really complicated. If V8 deduces an ordinary array has only floats in it, and then you store an object in an element, it has to de-optimise and regenerate code that makes the array generic again. It's quite an achievement that all this works transparently. Typed Arrays are much simpler: they're guaranteed to be one type, and you just can't store other things like objects in them.
Optimisations are never guaranteed to happen; you may store only floats in an ordinary array, but the engine may decide for various reasons not to optimise it.
The fact they're much simpler means other less-sophisticated javascript engines can easily implement them. They don't need all the advanced deoptimisation support.
Even with really advanced engines, proving optimisations can be used is extremely difficult and can sometimes be impossible. A typed array significantly simplifies the level of proof the engine needs to be able to optimise around it. A value returned from a typed array is certainly of a certain type, and engines can optimise for the result being that type. A value returned from an ordinary array could in theory have any type, and the engine may not be able to prove it will always have the same type result, and therefore generates less efficient code. Therefore code around a typed array is more easily optimised.
Typed arrays remove the opportunity to make a mistake. You just can't accidentally store an object and suddenly get far worse performance.
So, in short, ordinary arrays can in theory be equally fast as typed arrays. But typed arrays make it much easier to reach peak performance.
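A tiny illustration of the "remove the opportunity to make a mistake" point (values are arbitrary): a plain array happily accepts a non-number and may fall back to generic storage, while a typed array coerces the value and stays numeric.

var plain = [1.5, 2.5, 3.5];
plain[1] = { oops: true }; // legal; the array can no longer be treated as floats-only

var typed = new Float64Array([1.5, 2.5, 3.5]);
typed[1] = { oops: true }; // value is coerced to a number
console.log(typed[1]); // NaN — still a float slot, never an object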
Yes, you are mostly correct. With a standard JavaScript array, the JavaScript engine has to assume that the data in the array is all objects. It can still store this as a C-like array/vector, where the access to the memory is still like you described. The problem is that the data is not the value, but something referencing that value (the object).
So, performing a[i] = b[i] + 2 requires the engine to:
1. access the object in b at index i;
2. check what type the object is;
3. extract the value out of the object;
4. add 2 to the value;
5. create a new object with the newly computed value from step 4;
6. assign the new object from step 5 into a at index i.
With a typed array, the engine can:
1. access the value in b at index i (including placing it in a CPU register);
2. increment the value by 2;
3. assign the new value from step 2 into a at index i.
NOTE: These are not the exact steps a JavaScript engine will perform, as that depends on the code being compiled (including surrounding code) and the engine in question.
This allows the resulting computations to be much more efficient. Also, the typed arrays have a memory layout guarantee (arrays of n-byte values) and can thus be used to directly interface with data (audio, video, etc.).
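As a sketch of the a[i] = b[i] + 2 example in both forms (array sizes and values are arbitrary), the typed version is the one an engine can compile down to plain numeric loads and stores:

// Ordinary arrays: elements could be anything, so each read may need a type check.
var a = new Array(1000).fill(0);
var b = new Array(1000).fill(1);
for (var i = 0; i < a.length; i++) {
  a[i] = b[i] + 2;
}

// Typed arrays: every element is a 32-bit integer, so no per-element type checks or boxing.
var ta = new Int32Array(1000);
var tb = new Int32Array(1000).fill(1);
for (var j = 0; j < ta.length; j++) {
  ta[j] = tb[j] + 2;
}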
When it comes to performance, things can change fast. As AshleysBrain says, it comes down to whether the VM can deduce that a normal array can be implemented as a typed array quickly and accurately. That depends on the particular optimizations of the particular JavaScript VM, and it can change in any new browser version.
This Chrome developer comment provides some guidance that worked as of June 2012:
Normal arrays can be as fast as typed arrays if you do a lot of sequential access. Random access outside the bounds of the array causes the array to grow.
Typed arrays are fast for access, but slow to be allocated. If you create temporary arrays frequently, avoid typed arrays. (Fixing this is possible, but it's low priority.)
Micro-benchmarks such as JSPerf are not reliable for real-world performance.
If I might elaborate on the last point, I've seen this phenomenon with Java for years. When you test the speed of a small piece of code by running it over and over again in isolation, the VM optimizes the heck out of it. It makes optimizations which only make sense for that specific test. Your benchmark can get a hundredfold speed improvement compared to running the same code inside another program, or compared to running it immediately after running several different tests that optimize the same code differently.
I'm not really a contributor to any JavaScript engine and have only done some reading on V8, so my answer might not be completely accurate:
Values in arrays (only normal arrays with no holes/gaps, not sparse ones; sparse arrays are treated as objects) are all either pointers or numbers with a fixed length (in V8 they are 32 bits: if the value is a 31-bit integer it's tagged with a 0 bit at the end, otherwise it's a pointer).
So I don't think finding the memory location is any different from a typed array, since the number of bytes is the same across the array. But the difference is that if the element is an object, you have to add one unboxing layer, which doesn't happen for typed arrays.
And of course, accessing typed arrays definitely doesn't involve the type checks a normal array has (though those might be removed in highly optimized code, which is only generated for hot code).
For writing, if it's the same type it shouldn't be much slower. If it's a different type, the JS engine might generate polymorphic code for it, which is slower.
You can also try making some benchmarks on jsperf.com to confirm.

Javascript Relational Database

Long story short : I'd like to treat several javascript associative arrays as a database (where the arrays are tables). The relations could be represented by special fields inside the arrays. I'm not interested in the persistence aspect of a database, I only want to be able to query the arrays with a SQL-like language and retrieve sets of data in the form of associative arrays.
My question: Is there any JavaScript library that has such features? Otherwise, is there any library that can at least take care of the SQL-like language part?
Thanks
I believe the closest thing to what you need is the jLinq library. It can operate on JS objects and arrays much the same way you would with a database, though in a slightly different style: you don't really write queries, you use methods to construct them. Overall I think it's way better.
Some googling found this: http://ajaxian.com/archives/two-js-solutions-to-run-sql-like-statements-on-arrays-and-objects which seemed interesting.
Can I ask why you want to do this?
I came across this question while searching for something related. I wanted to share (9 years later, I realize) that I often have the same want/need: the script I'm working on has a lot of cross-referencing to do between various sources of information. I use PowerShell. Enumerating arrays of objects from within a loop that is enumerating other objects is just bad/slow/horrible.
To date, my solution has been to take all my arrays and build hashtables from them, where the key/name is a property value common across all the arrays (e.g. ObjectId (GUID)) and the value is the entire object from the array. With this, while my loop is enumerating Array#1, I can check for the presence of the current item in any of the other arrays simply by checking whether the key exists in the corresponding hashtable. That way there's no enumerating the other array, just direct, constant-effort access to the correct item (really coming from the newly built hashtable).
So my arrays are just temporary collection buckets, then everything I do from there uses the hashtables, which are just index/lookup tables.
What I was searching for when I stumbled here, was for solutions to keep track of all the different hashtables in the building/planning phase of my scripts.
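For what it's worth, here is the same index/lookup-table idea sketched in JavaScript with a Map (the table and field names are invented): build the index once, then do direct lookups instead of nested loops.

// Two "tables" that share an id field (sample data).
const users = [{ id: "u1", name: "Ada" }, { id: "u2", name: "Lin" }];
const orders = [{ id: "o1", userId: "u1" }, { id: "o2", userId: "u2" }];

// Index one side by its key once: O(n) to build, then O(1) per lookup.
const usersById = new Map(users.map(u => [u.id, u]));

// "Join" without enumerating users for every order.
const joined = orders.map(o => ({ order: o.id, buyer: usersById.get(o.userId).name }));
console.log(joined); // [{ order: "o1", buyer: "Ada" }, { order: "o2", buyer: "Lin" }]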

Are JavaScript arrays actually linked lists?

I'm new to JavaScript, and I notice that you don't need to specify an array's size; I often see people dynamically creating arrays one element at a time. This would be a huge performance problem in other languages, as you would constantly need to reallocate memory for the array as it grows.
Is this not a problem in JavaScript? If so, then is there a list structure available?
JavaScript arrays are typically implemented as hashmaps (just like JavaScript objects) with one added feature: there is a length attribute, which is one higher than the highest non-negative integer that has been used as a key. Nothing stops you from also using strings, floating-point numbers, even negative numbers as keys. Nothing except good sense.
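For example (illustrative only): length tracks only integer indexes; other keys are stored but ignored by it.

var a = [];
a[5] = "x";
console.log(a.length); // 6 — one more than the highest integer index used
a["cat"] = "meow"; // stored as an ordinary property
a[-1] = "negative"; // negative numbers aren't array indexes either
console.log(a.length); // still 6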
It most likely depends on what JavaScript engine you use.
Internet Explorer uses a mix of sparse arrays and dense arrays to make that work. Some of the more gory details are explained here: http://blogs.msdn.com/b/jscript/archive/2008/04/08/performance-optimization-of-arrays-part-ii.aspx.
The thing about dynamic languages is, well, that they're dynamic. Just like ArrayList in Java, or arrays in Perl, PHP, and Python, an Array in JavaScript allocates a certain amount of memory and, when it gets too big, the runtime automatically expands it. Is it as efficient as C++ or even Java? No (C++ can run circles around even the best implementations of JS), but people aren't building Quake in JS (just yet).
It is actually better to think of them as HashMaps with some specialized methods too anyway -- after all, this is valid: var a = []; a['cat']='meow';.
No.
What JavaScript arrays are and aren't is determined by the language specification specifically section 15.4. Array is defined in terms of the operations it provides not implementation details of the memory layout of any particular data structure.
Could Array be implemented on top of a linked list? Yes. This might make certain operations, such as shift and unshift, more efficient, but Array is also frequently accessed by index, which is not efficient with linked lists.
It's also possible to get the best of both worlds without linked lists. Contiguous-memory data structures, such as circular queues, have both efficient insertion/removal at the front and efficient random access.
In practice, most interpreters optimize dense arrays by using a data structure based around a resizable or reallocable array similar to a C++ vector or Java ArrayList.
Javascript arrays are not true arrays like in C/C++ or other languages. Therefore, they aren't as efficient, but they are arguably easier to use and do not throw out of bounds exceptions.
They are actually more like custom objects that use the properties as indexes.
Example:
var a = { "1": 1, "2": 2};
a.length = 2;
for(var i=0;i<a.length;i++)
console.log(a[i]);
a will behave almost like an array, and you can also call functions from the Array.prototype on it.
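For instance, generic Array.prototype methods can be borrowed with call (a small sketch):

var arrayLike = { "0": 1, "1": 2, length: 2 };
console.log(Array.prototype.join.call(arrayLike, "-")); // "1-2"
console.log(Array.prototype.map.call(arrayLike, function (x) { return x * 10; })); // [10, 20]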
