I've been looking for a straight answer for this (I can think of lots of possiblities, but I'd like to know the true reason):
jQuery provides a .data() method for associating data with DOM Element objects. What makes this necessary? Is there a problem adding properties (or methods) directly to DOM Element Objects? What is it?
Is there a problem adding properties (or methods) directly to DOM Element Objects?
Potentially.
There is no web standard that says you can add arbitrary properties to DOM nodes. They are ‘host objects’ with browser-specific implementations, not ‘native JavaScript objects’ which according to ECMA-262 you can do what you like with. Other host objects will not allow you to add arbitrary properties.
In reality since the earliest browsers did allow you to do it, it's a de facto standard that you can anyway... unless you deliberately tell IE to disallow it by setting document.expando= false. You probably wouldn't do that yourself, but if you're writing a script to be deployed elsewhere it might concern you.
There is a practical problem with arbitrary-properties in that you don't really know that the arbitrary name you have chosen doesn't have an existing meaning in some browser you haven't tested yet, or in a future version of a browser or standard that doesn't exist yet. Add a property element.sausage= true, and you can't be sure that no browser anywhere in space and time will use that as a signal to engage the exciting DOM Sausage Make The Browser Crash feature. So if you do add an arbitrary property, make sure to give it an unlikely name, for example element._mylibraryname_sausage= true. This also helps prevent namespace conflicts with other script components that might add arbitrary properties.
There is a further problem in IE in that properties you add are incorrectly treated as attributes. If you serialise the element with innerHTML you'll get an unexpected attribute in the output, eg. <p _mylibraryname_sausage="true">. Should you then assign that HTML string to another element, you'll get a property in the new element, potentially confusing your script.
(Note this only happens for properties whose values are simple types; Objects, Arrays and Functions do not show up in serialised HTML. I wish jQuery knew about this, because the way it works around it to implement the data method is absolutely terrible, results in bugs, and slows down many simple DOM operations.)
I think you can add all the properties you want, as long as you only have to use them yourself and the property is not a method or some object containing methods. What's wrong with that is that methods can create memory leaks in browsers. Especially when you use closures in such methods, the browser may not be able to complete garbage cleaning which causing scattered peaces of memory to stay occupied.
This link explains it nicely.
here you'll find a description of several common memory leak patterns
It has to do with the fact that DOM in IE is not managed by JScript, which makes it completely different environment to access. This leads to the memory leaks http://www.crockford.com/javascript/memory/leak.html. Another reason is that, when people use innerHTML to copy nodes, all those added properties are not transfered.
Related
JavaScript lets you add arbitrary properties and methods to any object, including DOM nodes. Assuming the property was well namespaced (something like _myLib_propertyName) so that it would be unlikely to create a conflict, are there any good reasons not to stash data in DOM nodes?
Are there any good use cases for doing so?
I imagine doing this frequently could contribute to a sloppy coding style or code that is confusing or counter-intuitive, but it seems like there would also be times when tucking "custom" properties into DOM nodes would be an effective and expedient technique.
No, it is usually a bad idea to store your own properties on DOM nodes.
DOM nodes are host objects and host objects can do what they like. Specifically, there is no requirement in the ECMAScript spec for host objects to allow this kind of extension, so browsers are not obliged to allow it. In particular, new browsers may choose not to, and existing code relying on it will break.
Not all host objects in existing browsers allow it. For example, text nodes and all ActiveX objects (such as XMLHttpRequest and XMLDOM objects, used for parsing XML) in IE do not, and failure behaviour varies from throwing errors to silent failure.
In IE, the ability to add properties can be switched off for a whole document, including all nodes within the document, with the line document.expando = false;. Thus if any of the code in your page includes this line, all code relying on adding properties to DOM nodes will fail.
I think more than anything, the best reason not to store data in the DOM is because the DOM is meant to represent content structure and styling for an HTML page. While you could surely add data to the nodes in a well-namespaced manner as to avoid conflicts, you're introducing data state into the visual representation.
I'm trying to find an example for or against storing data in the DOM, and scratching my head to find a convincing case. In the past, I've found that the separation of the data state from the visual state has saved headache during re-designs of sites I've worked on.
If you look at HTML 5 there is the data attribute for fields that will allow you to store information for the field.
Don't be surprised to run into trouble with IE 6 & 7. They are very inconsistent with setAttribute vs set as a property.
And if you're not careful you could set circular references and give yourself a memory leak.
http://www.ibm.com/developerworks/web/library/wa-memleak/
I always set node properties as a last resort, and when I do I'm extra careful with my code and test more heavily than usual.
For what it's worth, the D3.js library makes extensive use of custom properties on DOM nodes. In particular for the following packages that introduce interaction behaviors:
https://github.com/d3/d3-zoom - the __zoom property
https://github.com/d3/d3-brush - the __brush property
D3 also provides a mechanism for putting your own custom properties on DOM elements called d3.local. The point of that utility is to generate property names that are guaranteed not to conflict with the DOM API.
The primary risk I can see associated with custom properties on DOM nodes is accidentally picking a name that already has some meaning in the DOM API. If you use d3.local or maybe prefix everything with __, that should avoid conflicts.
Objects in JavaScript can be used as Hashtable
(the key must be String)
Is it perform well as Hashtable the data structure?
I mean , does it implemented as Hashtable behind the scene?
Update: (1) I changed HashMap to hashtable (2) I guess most of the browser implement it the same, if not why not? is there any requirement how to implement it in the ECMAScript specs?
Update 2 : I understand, I just wonder how V8 and the Firefox JS VM implements the Object.properties getters/setters?
V8 doesn't implement Object properties access as hashtable, it actually implement it in a better way (performance wise)
So how does it work? "V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes" - that make the access to properties almost as fast as accessing properties of C++ objects.
Why? because in fixed class each property can be found on a specific fixed offset location..
So in general accessing property of an object in V8 is faster than Hashtable..
I'm not sure how it works on other VMs
More info can be found here: https://v8.dev/blog/fast-properties
You can also read more regarding Hashtable in JS here:(my blog) http://simplenotions.wordpress.com/2011/07/05/javascript-hashtable/
"I guess most of the browser implement it the same, if not why not? is there any requirement how to implement it in the ECMAScript specs?"
I am no expert, but I can't think of any reason why a language spec would detail exactly how its features must be implemented internally. Such a constraint would have absolutely no purpose, since it does not impact the functioning of the language in any way other than performance.
In fact, this is absolutely correct, and is in fact the implementation-independence of the ECMA-262 spec is specifically described in section 8.6.2 of the spec:
"The descriptions in these tables indicate their behaviour for native
ECMAScript objects, unless stated otherwise in this document for particular kinds of native ECMAScript objects. Host objects may support these internal properties with any implementation-dependent behaviour as long as it is consistent with the specific host object restrictions stated in this document"
"Host objects may implement these internal methods in any manner unless specified otherwise;"
The word "hash" appears nowhere in the entire ECMA-262 specification.
(original, continued)
The implementations of JavaScript in, say, Internet Explorer 6.0 and Google Chrome's V8 have almost nothing in common, but (more or less) both conform to the same spec.
If you want to know how a specific JavaScript interpreter does something, you should research that engine specifically.
Hashtables are an efficient way to create cross references. They are not the only way. Some engines may optimize the storage for small sets (for which the overhead of a hashtable may be less efficient) for example.
At the end of the day, all you need to know is, they work. There may be faster ways to create lookup tables of large sets, using ajax, or even in memory. For example see the interesting discussion on this post from John Reseig's blog about using a trie data structure.
But that's neither here nor there. Your choice of whether to use this, or native JS objects, should not be driven by information about how JS implements objects. It should be driven only by performance comparison: how does each method scale. This is information you will get by doing performance tests, not by just knowing something about the JS engine implementation.
Most modern JS engines use pretty similar technique to speed up the object property access. The technique is based on so called hidden classes, or shapes. It's important to understand how this optimization works to write efficient JS code.
JS object looks like a dictionary, so why not use one to store the properties? Hash table has O(1) access complexity, it looks like a good solution. Actually, first JS engines have implemented objects this way. But in static typed languages, like C++ or Java a class instance property access is lightning fast. In such languages a class instance is just a segment of memory, end every property has its own constant offset, so to get the property value we just need to take the instance pointer and add the offset to it. In other words, in compile time an expression like this point.x is just replaced by its address in memory.
May be we can implement some similar technique in JS? But how? Let's look at a simple JS function:
function getX(point) {
return point.x;
}
How to get the point.x value? The first problem here is that we don't have a class (or shape) which describes the point. But we can calculate one, that is what modern JS engines do. Most of JS objects at runtime have a shape which is bound to the object. The shape describes properties of the object and where these properties values are stored. It's very similar to how a class definition describes the class in C++ or Java. It's a pretty big question, how the Shape of an object is calculated, I won't describe it here. I recommend this article which contains a great explanation of the shapes in general, and this post which explains how the things are implemented in V8. The most important thing you should know about the shapes is that all objects with the same properties which are added in the same order will have the same shape. There are few exceptions, for example if an object has a lot of properties which are frequently changed, or if you delete some of the object properties using delete operator, the object will be switched into dictionary mode and won't have a shape.
Now, let's imagine that the point object has an array of property values, and we have a shape attached to it, which describes where the x value in this property array is stored. But there is another problem - we can pass any object to the function, it's not even necessary that the object has the x property. This problem is solved by the technique called Inline caching. It's pretty simple, when getX() is executed the first time, it remembers the shape of the point and the result of the x lookup. When the function is called second time, it compares the shape of the point with the previous one. If the shape matches no lookup is required, we can take the previous lookup result.
The primary takeaway is that all objects which describe the same thing should have the same shape, i.e. they should have the same set of properties which are added in the same order. It also explains why it's better to always initialize object properties, even if they are undefined by default, here is a great explanation of the problem.
Relative resources:
JavaScript engine fundamentals: Shapes and Inline Caches and a YouTube video
A tour of V8: object representation
Fast properties in V8
JavaScript Engines Hidden Classes (and Why You Should Keep Them in Mind)
Should I put default values of attributes on the prototype to save space?
this article explains how they are implemented in V8, the engine used by Node.js and most versions of Google Chrome
https://v8.dev/blog/fast-properties
apparently the "tactic" can change over time, depending on the number of properties, going from an array of named values to a dictionary.
v8 also takes the type into account, a number or string will not be treated in the same way as an object (or function, a type of object)
if i understand this correctly a property access frequently, for example in a loop, will be cached.
v8 optimises code on the fly by observing what its actually doing, and how often
v8 will identify the objects with the same set of named properties, added in the same order (like a class constructor would do, or a repetitive bit of JSON, and handle them in the same way.
see the article for more details, then apply at google for a job :)
I've been writing Javascript for years, but just now realized I don't entirely understand how these "HTML-y" methods actually interact with the DOM.
My assumption is that when HTML is interpreted into the DOM, the browser keeps references as to which HTML element became which DOM node.
So for .getElementById when you select an element, the browser returns the relevant node by reference. Why not just do .getNodeById then and be more correct?
My assumption is that .innerHTML is interpreted by the browser into the DOM at runtime. However, in older browsers like IE7, .innerHTML cannot be used to create table nodes. This suggests to me that it was originally intended just to modify the text properties of existing nodes, but then it seems strange that it exists at all, and we don't just use .innerText.
I guess I'm just confused by some Javascript history.
Well, the thing is, things like .innerHTML and .getElementById aren't really JavaScript objects at all. They're what the language specification calls "Host Objects" - it's a form of FFI. What happens when you call .getElementById in most browsers is that the string is marshalled from a JS string to a DOM string and then passed to C++ code.
In fact, they're allowed to have pretty much whatever semantics they please - the language makes no guarantees about how they act (this is also true for other DOM objects like timers and XHR).
well if you really want to know about the internals and want to digg into the deep rabbit hole of rendering engines i'd give you a start by pointing you to certain source files of chromium:
http://src.chromium.org/viewvc/blink/trunk/Source/core/dom/Element.cpp
you could find stuff like innerHTML and how it actually works in it. (around line 2425)
.getElementById() will just resolve to a document element that matches the specified id and will return a reference to it.
its some kind of identifier with prototyped methods, getters and setters to interact with that element. as soon as you interact with methods like innerHTML etc. you will simply tell the engine to execute that action on the element that is referenced inside the element reference.
just digg around inside the source code. you might get a good insight:
http://src.chromium.org/viewvc/blink/trunk/Source/core/dom/
Objects in JavaScript can be used as Hashtable
(the key must be String)
Is it perform well as Hashtable the data structure?
I mean , does it implemented as Hashtable behind the scene?
Update: (1) I changed HashMap to hashtable (2) I guess most of the browser implement it the same, if not why not? is there any requirement how to implement it in the ECMAScript specs?
Update 2 : I understand, I just wonder how V8 and the Firefox JS VM implements the Object.properties getters/setters?
V8 doesn't implement Object properties access as hashtable, it actually implement it in a better way (performance wise)
So how does it work? "V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes" - that make the access to properties almost as fast as accessing properties of C++ objects.
Why? because in fixed class each property can be found on a specific fixed offset location..
So in general accessing property of an object in V8 is faster than Hashtable..
I'm not sure how it works on other VMs
More info can be found here: https://v8.dev/blog/fast-properties
You can also read more regarding Hashtable in JS here:(my blog) http://simplenotions.wordpress.com/2011/07/05/javascript-hashtable/
"I guess most of the browser implement it the same, if not why not? is there any requirement how to implement it in the ECMAScript specs?"
I am no expert, but I can't think of any reason why a language spec would detail exactly how its features must be implemented internally. Such a constraint would have absolutely no purpose, since it does not impact the functioning of the language in any way other than performance.
In fact, this is absolutely correct, and is in fact the implementation-independence of the ECMA-262 spec is specifically described in section 8.6.2 of the spec:
"The descriptions in these tables indicate their behaviour for native
ECMAScript objects, unless stated otherwise in this document for particular kinds of native ECMAScript objects. Host objects may support these internal properties with any implementation-dependent behaviour as long as it is consistent with the specific host object restrictions stated in this document"
"Host objects may implement these internal methods in any manner unless specified otherwise;"
The word "hash" appears nowhere in the entire ECMA-262 specification.
(original, continued)
The implementations of JavaScript in, say, Internet Explorer 6.0 and Google Chrome's V8 have almost nothing in common, but (more or less) both conform to the same spec.
If you want to know how a specific JavaScript interpreter does something, you should research that engine specifically.
Hashtables are an efficient way to create cross references. They are not the only way. Some engines may optimize the storage for small sets (for which the overhead of a hashtable may be less efficient) for example.
At the end of the day, all you need to know is, they work. There may be faster ways to create lookup tables of large sets, using ajax, or even in memory. For example see the interesting discussion on this post from John Reseig's blog about using a trie data structure.
But that's neither here nor there. Your choice of whether to use this, or native JS objects, should not be driven by information about how JS implements objects. It should be driven only by performance comparison: how does each method scale. This is information you will get by doing performance tests, not by just knowing something about the JS engine implementation.
Most modern JS engines use pretty similar technique to speed up the object property access. The technique is based on so called hidden classes, or shapes. It's important to understand how this optimization works to write efficient JS code.
JS object looks like a dictionary, so why not use one to store the properties? Hash table has O(1) access complexity, it looks like a good solution. Actually, first JS engines have implemented objects this way. But in static typed languages, like C++ or Java a class instance property access is lightning fast. In such languages a class instance is just a segment of memory, end every property has its own constant offset, so to get the property value we just need to take the instance pointer and add the offset to it. In other words, in compile time an expression like this point.x is just replaced by its address in memory.
May be we can implement some similar technique in JS? But how? Let's look at a simple JS function:
function getX(point) {
return point.x;
}
How to get the point.x value? The first problem here is that we don't have a class (or shape) which describes the point. But we can calculate one, that is what modern JS engines do. Most of JS objects at runtime have a shape which is bound to the object. The shape describes properties of the object and where these properties values are stored. It's very similar to how a class definition describes the class in C++ or Java. It's a pretty big question, how the Shape of an object is calculated, I won't describe it here. I recommend this article which contains a great explanation of the shapes in general, and this post which explains how the things are implemented in V8. The most important thing you should know about the shapes is that all objects with the same properties which are added in the same order will have the same shape. There are few exceptions, for example if an object has a lot of properties which are frequently changed, or if you delete some of the object properties using delete operator, the object will be switched into dictionary mode and won't have a shape.
Now, let's imagine that the point object has an array of property values, and we have a shape attached to it, which describes where the x value in this property array is stored. But there is another problem - we can pass any object to the function, it's not even necessary that the object has the x property. This problem is solved by the technique called Inline caching. It's pretty simple, when getX() is executed the first time, it remembers the shape of the point and the result of the x lookup. When the function is called second time, it compares the shape of the point with the previous one. If the shape matches no lookup is required, we can take the previous lookup result.
The primary takeaway is that all objects which describe the same thing should have the same shape, i.e. they should have the same set of properties which are added in the same order. It also explains why it's better to always initialize object properties, even if they are undefined by default, here is a great explanation of the problem.
Relative resources:
JavaScript engine fundamentals: Shapes and Inline Caches and a YouTube video
A tour of V8: object representation
Fast properties in V8
JavaScript Engines Hidden Classes (and Why You Should Keep Them in Mind)
Should I put default values of attributes on the prototype to save space?
this article explains how they are implemented in V8, the engine used by Node.js and most versions of Google Chrome
https://v8.dev/blog/fast-properties
apparently the "tactic" can change over time, depending on the number of properties, going from an array of named values to a dictionary.
v8 also takes the type into account, a number or string will not be treated in the same way as an object (or function, a type of object)
if i understand this correctly a property access frequently, for example in a loop, will be cached.
v8 optimises code on the fly by observing what its actually doing, and how often
v8 will identify the objects with the same set of named properties, added in the same order (like a class constructor would do, or a repetitive bit of JSON, and handle them in the same way.
see the article for more details, then apply at google for a job :)
Is it possible to augment a DOM element by adding a new property to it as augmenting a normal JavaScript object? For example, can we add a new property of a button or an input, say a value that indicates when was the last time user click on it?
Yes. You can sometimes find it described as bad form. The obvious drawback is that in later versions of Web standards, the DOM element may have new properties added to it which will clash with your custom property, so if you do use this technique, choose a name that is unlikely to clash, e.g. incorporate your company name into it.
Another potential problem is that DOM objects in some browsers are not properly garbage collected. Instead they are reference counted. This means that if you create a system of circular references through custom properties, you may cause a memory leak.
For example, this may cause a memory leak, because there is a circular reference between the DOM and javascript objects:
var myJsObj = {};
var myDomElt = document.getElementById('myId');
myJsObj.domElt = myDomElt;
myDomElt.jsObj = myJsObj;
It is, however, safe to add a property that just holds a primitive value:
myDomElt.fullTitle = "My DOM Element";
Although the warning that the DOM standard may change is still important; what if HTML7 defines a fullTitle property on DOM elements? Your site will have unpredictable behaviour.
An alternative is to use the $.data feature of jQuery:
$(myDomElt).data('fullTitle', 'My Dom Element');
var retrieved = $(myDomElt).data('fullTitle');
It's possible but is a really bad idea. Daniel Earwicker's answer mentions some reasons why it's a bad idea. I'd add these:
DOM nodes are host objects and host objects can do what they like. Specifically, there is no requirement in the ECMAScript spec for host objects to allow this kind of extension, so browsers are not obliged to allow it. In particular, new browsers may choose not to, and existing code relying on it will break.
Not all host objects in existing browsers allow it. For example, text nodes and all ActiveX objects (such as XMLHttpRequest in older IE and XMLDOM objects, used for parsing XML) in IE do not, and failure behaviour varies from throwing errors to silent failure.
In IE, the ability to add properties can be switched off for a whole document, including all nodes within the document, with the line document.expando = false;. Thus if any of the code in your page includes this line, all code relying on adding properties to DOM nodes will fail.