I want a reusable list with my own custom behaviors (e.g. updateOrAppendByItemId) to hold a retrieved array for enumeration via ngRepeat. The main purpose is to hold live collections - contents might be changed locally or remotely (via SignalR). It should:
enumerate properly in an ng-repeat directive
not render my custom behaviors as items (rows)
retain sort order from de-serialized source
behave consistently across modern browsers
avoid unnecessary list copying
How should it generally be constructed?
I've immediately run into conceptual problems building it a few different ways. A JavaScript Array can't be subclassed (at least not cleanly in ES5). Copying my behavioral prototype directly onto an Array instance results in undesired enumerable properties. Storing contents in object properties loses the source item order.
Each of those problems currently has only theoretical severity to me, with real-world consequences that depend on both undocumented AngularJS behavior and browser differences. I'm hoping someone who has touched this problem before can save me lots of trial-and-error.
Research:
Since I'm starting that potentially lengthy investigation I was hoping to avoid, I'll post my work here.
Angular's ngRepeat first tests whether a collection is array-like via its internal isArrayLike() function. If it is, it simply iterates numerically from 0 to the collection's length. That seems highly intentional (although I wouldn't want to depend on my interpretation of the internal function isArrayLike()).
In the one browser I've tested so far, neither of the following changes the effective array contents.
// safe
array.myMethod = true;
array['myMethod'] = true;
However, the following is disastrous: /grin
array['5'] = true;
So, per the ECMA spec:
A property name P (in the form of a String value) is an array index if
and only if ToString(ToUint32(P)) is equal to P and ToUint32(P) is not
equal to 2^32−1.
I may not have to go any further. I'll try this solution and report back.
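Here's the shape of the solution I plan to try: a minimal sketch, assuming items carry an id property (updateOrAppendByItemId and the id field are my own names, not anything Angular provides). Object.defineProperty creates non-enumerable properties by default, so the method can never collide with an array index, never shows up in for-in enumeration, and ngRepeat's numeric 0-to-length iteration ignores it anyway:
// Attach a behavior to a plain Array instance as a NON-enumerable property.
function addListBehaviors(list) {
  Object.defineProperty(list, 'updateOrAppendByItemId', {
    value: function (item) {
      for (var i = 0; i < this.length; i++) {
        if (this[i].id === item.id) {
          this[i] = item;   // replace in place, preserving source order
          return this;
        }
      }
      this.push(item);      // not present yet: append
      return this;
    }
  });
  return list;              // same array instance: no copying
}
Usage would be var live = addListBehaviors(itemsFromServer); followed by live.updateOrAppendByItemId(changedItem) when a SignalR update arrives.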
Related
In a state-managing JavaScript framework (e.g. React), if you have a collection of objects to store in state, which is the more useful and/or performant dataset type to hold them all: an object or an array? Here are a few of the differences I can think of that might come up when using them in state:
Referencing entries:
With objects you can reference an entry directly by its key, whereas with an array you would have to use a function like dataset.find(). The performance difference might be negligible when doing a single lookup on a small dataset, but I imagine it gets larger if the find function has to pore over a large set, or if you need to reference many entries at once.
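To make that concrete, a quick sketch (datasetObj and datasetArr are hypothetical names; assumes each entry carries its own id):
// O(1): direct property access on an object keyed by id.
const entryFromObject = datasetObj['abc123'];
// O(n): linear scan over an array of entries.
const entryFromArray = datasetArr.find(e => e.id === 'abc123');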
Updating dataset:
With objects you can add new entries with {...dataset, [newId]: newEntry}, edit old entries with {...dataset, [id]: alteredEntry} and even edit multiple entries in one swoop with {...dataset, [id1]: alteredEntry1, [id2]: alteredEntry2}. Whereas with arrays, adding is easy [...dataset, newEntry1, newEntry2], but to edit you have to use find(), and then probably write a few lines of code cloning the dataset and/or the entry for immutability's sake. And then for editing multiple entries it's going to either require a loop of find() functions (which sounds bad for large lists) or use filter() and then have to deal with adding them back into the dataset afterwards.
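For example, a single immutable edit in each shape might look like this (hypothetical names again):
// Object: spread and overwrite one key.
const nextObj = { ...datasetObj, [id]: { ...datasetObj[id], ...changes } };
// Array: map and replace only the matching entry, cloning just that one.
const nextArr = datasetArr.map(e => (e.id === id ? { ...e, ...changes } : e));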
Deleting
To delete a single entry from the object dataset you would do delete dataset[id], and for multiple entries you would either use a loop or a lodash function like _.omit(). To remove entries from an array (and keep it dense) you'd have to either use findIndex() and then .splice(index, 1) (or slice around the index for immutability's sake), or just use filter(), which works nicely for single or multiple deletes. I'm not sure about the performance implications of any of these options.
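Sketched out (hypothetical names), the usual immutable versions look like:
// Object: omit one key without mutating (avoids the mutating delete operator).
const { [id]: removedEntry, ...objWithoutEntry } = datasetObj;
// Array: filter keeps the result dense and handles one id or many.
const arrWithoutEntries = datasetArr.filter(e => !idsToRemove.includes(e.id));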
Looping/Rendering: For an array you can use dataset.map() or even easily render a specialized set on the fly with dataset.filter() or dataset.sort(). For the object to render in React you would have to use Object.values(dataset) before running one of the other iteration functions on it, which I suppose might create a performance hit depending on dataset size.
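That is, something along these lines (plain functions rather than JSX, names hypothetical):
// Array: iterate directly.
const rows = datasetArr.map(renderRow);
// Object: materialize the values first, then iterate.
const rowsFromObject = Object.values(datasetObj).map(renderRow);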
Are there any points I'm missing here? Does the usability of either one depend on how large the dataset is, or perhaps on how frequently "look up" operations are needed? Just trying to pin down what circumstances might dictate the superiority of one or the other.
There's no one real answer; the only valid answer is It Depends™.
There are different use-cases that require different solutions. It all boils down to how the data is going to be used.
A single array of objects
Best used when the order matters and when the data is likely rendered as a whole list, where each item is passed down directly from the list loop and items are rarely accessed individually.
This is the quickest (least developer-time-consuming) way of storing received data, if the data already uses this structure to begin with, which is often the case.
Pros of array state
Item order can be tracked easily,
Easy looping, where the individual items are passed down from the list.
It's often the original structure returned from API endpoints,
Cons of an array state
Updating an item would trigger a render of the full list.
Needs a little more code to find/edit individual items.
A single object by id
Best used when the order doesn't matter, and it's mostly used to render individual items, like on an edit item page. It's a step in the direction of a normalized state, explained in the next section.
Pros of an object state
Quick and easy to access/update by id
Cons of an object state
Can't re-order items easily
Looping requires an extra step (e.g. Object.keys(dataset).map(...))
Updating an item would trigger a render of the full list,
Likely needs to be parsed into the target state object structure
Normalized state
Implemented using both an object of all items by id, and an array of all the id strings.
{
  items: {
    byId: { /* ... */ },
    allIds: ['abc123', 'zxy456', /* etc. */],
  }
}
This becomes necessary when:
all use-cases are equally likely,
performance is a concern (e.g. huge list),
the data is deeply nested and/or duplicated at different levels,
re-rendering the list has undesirable side-effects.
An example of an undesirable side-effect: updating an item, which triggers a full list re-render, could lose a modal open/close state.
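For illustration, here's what updating one item looks like in that shape (a sketch with hypothetical names); only byId and the edited entry are replaced, so a list that loops over allIds keeps its other rows untouched:
function updateItem(state, id, changes) {
  return {
    ...state,
    items: {
      ...state.items,
      byId: {
        ...state.items.byId,
        // Clone only the edited entry; allIds is left as-is.
        [id]: { ...state.items.byId[id], ...changes },
      },
    },
  };
}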
Pros
Item order can be tracked,
Referencing individual items is quick,
Updating an item:
Requires minimal code
Doesn't trigger a full list render since the full list loops over allIds strings,
Changing the order is quick and clear, with minimal rendering,
Adding an item is simple but requires adding it to both datasets
Avoids duplicated objects in nested data structures
Cons
Individual removal is the worst-case scenario, though not a huge deal either.
A little more code needed to manage the state overall.
Might be confusing to keep both state datasets in sync.
This approach is a common normalization process used in a lot of places; here are some additional references:
Redux's state normalization is a strongly recommended best practice,
The normalizr lib.
Objects in JavaScript can be used as a hashtable (the key must be a String).
Does it perform well as a hashtable data structure? I mean, is it implemented as a hashtable behind the scenes?
Update: (1) I changed HashMap to hashtable. (2) I guess most browsers implement it the same way; if not, why not? Is there any requirement on how to implement it in the ECMAScript spec?
Update 2: I understand; I just wonder how V8 and the Firefox JS VM implement the object property getters/setters.
V8 doesn't implement object property access as a hashtable; it actually implements it in a better way (performance-wise).
So how does it work? "V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes" - this makes access to properties almost as fast as accessing properties of C++ objects.
Why? Because with a fixed class, each property can be found at a specific fixed offset location.
So, in general, accessing a property of an object in V8 is faster than a hashtable lookup.
I'm not sure how it works on other VMs.
More info can be found here: https://v8.dev/blog/fast-properties
You can also read more regarding hashtables in JS here (my blog): http://simplenotions.wordpress.com/2011/07/05/javascript-hashtable/
"I guess most of the browser implement it the same, if not why not? is there any requirement how to implement it in the ECMAScript specs?"
I am no expert, but I can't think of any reason why a language spec would detail exactly how its features must be implemented internally. Such a constraint would have absolutely no purpose, since it does not impact the functioning of the language in any way other than performance.
In fact, this is absolutely correct: the implementation-independence of the ECMA-262 spec is specifically described in section 8.6.2 of the spec:
"The descriptions in these tables indicate their behaviour for native
ECMAScript objects, unless stated otherwise in this document for particular kinds of native ECMAScript objects. Host objects may support these internal properties with any implementation-dependent behaviour as long as it is consistent with the specific host object restrictions stated in this document"
"Host objects may implement these internal methods in any manner unless specified otherwise;"
The word "hash" appears nowhere in the entire ECMA-262 specification.
(original, continued)
The implementations of JavaScript in, say, Internet Explorer 6.0 and Google Chrome's V8 have almost nothing in common, but (more or less) both conform to the same spec.
If you want to know how a specific JavaScript interpreter does something, you should research that engine specifically.
Hashtables are an efficient way to create cross references. They are not the only way. Some engines may optimize the storage for small sets (for which the overhead of a hashtable may be less efficient) for example.
At the end of the day, all you need to know is: they work. There may be faster ways to create lookup tables of large sets, using ajax, or even in memory. For example, see the interesting discussion on this post from John Resig's blog about using a trie data structure.
But that's neither here nor there. Your choice of whether to use this, or native JS objects, should not be driven by information about how JS implements objects. It should be driven only by performance comparison: how does each method scale. This is information you will get by doing performance tests, not by just knowing something about the JS engine implementation.
Most modern JS engines use a pretty similar technique to speed up object property access. The technique is based on so-called hidden classes, or shapes. It's important to understand how this optimization works to write efficient JS code.
A JS object looks like a dictionary, so why not use one to store the properties? A hash table has O(1) access complexity, so it looks like a good solution. And in fact the first JS engines implemented objects this way. But in statically typed languages like C++ or Java, class instance property access is lightning fast. In such languages a class instance is just a segment of memory, and every property has its own constant offset, so to get the property value we just need to take the instance pointer and add the offset to it. In other words, at compile time an expression like point.x is simply replaced by its address in memory.
Maybe we can implement a similar technique in JS? But how? Let's look at a simple JS function:
function getX(point) {
return point.x;
}
How do we get the point.x value? The first problem here is that we don't have a class (or shape) which describes the point. But we can calculate one, and that is what modern JS engines do. Most JS objects at runtime have a shape bound to them. The shape describes the properties of the object and where those property values are stored. It's very similar to how a class definition describes the class in C++ or Java. How the shape of an object is calculated is a pretty big question; I won't describe it here. I recommend this article, which contains a great explanation of shapes in general, and this post, which explains how things are implemented in V8. The most important thing you should know about shapes is that all objects with the same properties, added in the same order, will have the same shape. There are a few exceptions: for example, if an object has a lot of properties which are frequently changed, or if you delete some of the object's properties using the delete operator, the object will be switched into dictionary mode and won't have a shape.
Now, let's imagine that the point object has an array of property values, and we have a shape attached to it which describes where the x value in this property array is stored. But there is another problem: we can pass any object to the function; it's not even necessary that the object has an x property. This problem is solved by a technique called inline caching. It's pretty simple: when getX() is executed for the first time, it remembers the shape of the point and the result of the x lookup. When the function is called a second time, it compares the shape of the point with the previous one. If the shape matches, no lookup is required; we can take the previous lookup result.
The primary takeaway is that all objects which describe the same thing should have the same shape, i.e. they should have the same set of properties, added in the same order. This also explains why it's better to always initialize object properties, even if they are undefined by default; here is a great explanation of the problem.
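A small sketch of what that advice means in practice (illustrative code, not tied to any particular engine):
// Same properties added in the same order: these share one shape.
const a = { x: 1, y: 2 };
const b = { x: 3, y: 4 };
// Same properties in a different insertion order: a different shape.
const c = {};
c.y = 6;
c.x = 5;
// Initializing every property up front (even to undefined) keeps all
// instances on a single shape instead of forking it later.
function Point(x, y) {
  this.x = x;
  this.y = y;
  this.label = undefined; // declared now, assigned later
}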
Related resources:
JavaScript engine fundamentals: Shapes and Inline Caches and a YouTube video
A tour of V8: object representation
Fast properties in V8
JavaScript Engines Hidden Classes (and Why You Should Keep Them in Mind)
Should I put default values of attributes on the prototype to save space?
This article explains how they are implemented in V8, the engine used by Node.js and most versions of Google Chrome:
https://v8.dev/blog/fast-properties
Apparently the "tactic" can change over time, depending on the number of properties, going from an array of named values to a dictionary.
V8 also takes the type into account; a number or string will not be treated in the same way as an object (or a function, which is a type of object).
If I understand this correctly, a property accessed frequently, for example in a loop, will be cached.
V8 optimizes code on the fly by observing what it's actually doing, and how often.
V8 will identify objects with the same set of named properties, added in the same order (like a class constructor would produce, or a repetitive bit of JSON), and handle them in the same way.
see the article for more details, then apply at google for a job :)
JavaScript arrays used to be just objects with some special syntax, but JIT engines can optimize an array to behave like a true array with O(1) access rather than O(log n) access. I am currently writing a CPU-intensive loop where I need to be certain I get O(1) access rather than O(log n). But the problem is that I may need to do array[5231] = obj1 right after the array is created, even though the array will eventually be filled. I am worried that such behaviour may trick the JIT into thinking I am using it as a sparse array, and thus I wouldn't get the O(1) access time I require. My question is: are there ways I can tell, or at least hint to, the JavaScript engine that I want a true array?
In particular, do I need to initialize the array with values first? Like filling all of it with references to a dummy object (my array will only contain references to objects of the same prototype, or undefined). Would just setting array.length = 6000 be enough?
EDIT: based on http://jsperf.com/sparse-array-v-true-array, it seems filling the array beforehand is a good idea.
I realized I made a big derp in my profiling test: it is an animation program with a varying frame rate, and my profiling was focused on detecting where the CPU-intensive parts are. It was not suitable for comparing the two methods, as one method would simply run slower while the relative CPU time between different function calls stayed the same.
Anyhow, I wrote a test on jsperf: http://jsperf.com/sparse-array-v-true-array and (unless I made an error; if so, please do correct me!) a randomly accessed array runs twice as slow in Chrome and Firefox when it isn't filled first, compared to the filled version. So it seems it is a good idea to fill the array first if you are writing things like a spatial hashmap.
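The pre-filling in that test boils down to this (the size and someObject are made up for illustration):
// Dense, pre-filled backing store: every slot holds a value from the start,
// so a later assignment like cells[5231] is an in-bounds store on a dense
// array rather than punching a value into a sparse one.
var SIZE = 6000;
var cells = new Array(SIZE);
for (var i = 0; i < SIZE; i++) {
  cells[i] = null; // or a shared dummy object
}
cells[5231] = someObject;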
[UPDATE: The discussion of "slow mode" below also pointed me to Why the convertToFastObject function make it fast?, which has a nice discussion of the V8 internals that #benjamin-gruenbaum referenced.]
I was wondering about the read performance of JavaScript objects depending on how they were created, so I wrote a JSPerf test suite, and the results are non-intuitive.
Firstly, the question "Performance of key lookup in JavaScript object" discusses the internals of V8's object creation, but I don't see why the method of creation would change the read performance after the object has been created.
Overview
I want to test read access on various data structures. I chose 3 structures:
An object literal, created and assigned in one statement
An object literal, created in one statement, then values assigned in subsequent statements
An array literal, created in one statement, but populated as an associative array, which presumably gets translated to an object in the background.
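Roughly, the three variants look like this (illustrative keys, not the actual generated test data):
// 1. Object literal, created and assigned in one statement.
var oneShot = { aardvark: 1, abacus: 2, abandon: 3 };
// 2. Object created empty, then populated key by key.
var incremental = {};
incremental.aardvark = 1;
incremental.abacus = 2;
incremental.abandon = 3;
// 3. Array literal used as an associative array (string keys on an Array,
//    which are really object properties, not array indexes).
var pseudoAssoc = [];
pseudoAssoc['aardvark'] = 1;
pseudoAssoc['abacus'] = 2;
pseudoAssoc['abandon'] = 3;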
Test
http://jsperf.com/object-read-varies-by-creation-method
Methodology
In the setup method, I create the 3 data structures and populate them with random words from /usr/share/dict/words, in randomized order (but the same order on each test run). Each test case retrieves the same 10 random words from /usr/share/dict/words, but not necessarily words that are in the list, to simulate misses as well as hits.
Results
What I see on most browsers is pretty intuitive: the two objects perform better than the pseudo-associative array. My assumption is that the array literal is being proxied by an object in those cases, increasing the overall time it takes to access.
What I don't get is the Chrome 33 case, where creating and assigning an object as a single statement performs dramatically better than populating the structure after creation.
If the test included setup time, I could understand that difference, but as it stands, I don't fully get why the two object literals don't perform about the same.
The best guess I've come up with so far is that Chrome's object-creation algorithm is able to optimize its hash tree better during an act of creation that includes all keys at once than it is when inserting the keys one at a time after the fact.
Can anyone more familiar with these structures account for the difference?