javascript cost of comparison with undefined - javascript

While messing around with JavaScript I found that comparing an array element with undefined was very interesting. Considering:
L = [1,2,3];
if (L[1] == undefined)
console.log('no element for key 1');
else
console.log('Value for key 1'+L[1]);
I think that's an awesome way to check for values in sequences in JavaScript, instead of iterating over sequences or other containers, but my question is: is it error-prone or inefficient? What's the cost of such a comparison?

The code does not test if a particular value exists; it tests if an [Array] index was assigned a non-undefined value. (It will also incorrectly detect some false-positive values like null due to using ==, but that's for another question ..)
Consider this:
L = ["hello","world","bye"]
a = L["bye"]
b = L[1]
What is the value of a and what does it say about "bye"? What is the value of b and how does 1 relate to any of the values which may (or may not) exist as elements of L?
That is, iterating an Array - to find a value of unknown index? to perform an operation on multiple values? - and accessing an element by index are two different operations and cannot be generally interchanged.
On the flip side, object properties can be used to achieve a similar (but useful) effect:
M = {hello: 1, world: 1, bye: 1}
c = M["hello"]
What is the value of c now? How does the value used as the key relate to the data?
In this case the property name (used as a lookup key) relates to the data being checked and can say something useful about it - yes, there is a "hello"! (This can detect some false-positives without using hasOwnProperty, but that's for another question ..)
And, of course .. for a small sequence, or an infrequent operation, iterating (or using a handy method like Array.indexOf or Array.some) to find existence of a value is "just fine" and will not result in a "performance impact".
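To make the distinction concrete, here is a small sketch that contrasts index access with value lookup. The names `words` and `seen` are illustrative and not taken from the question; the checks themselves use only standard methods.

```javascript
const words = ["hello", "world", "bye"];

// Index access only says whether a slot holds a non-undefined value.
const byIndex = words[1] !== undefined;     // true: slot 1 holds "world"
const byName = words["bye"] !== undefined;  // false: "bye" is a value, not a key

// Finding a value of unknown index means scanning the array.
const viaIndexOf = words.indexOf("bye") !== -1;   // true
const viaSome = words.some((w) => w === "bye");   // true

// An object keyed by the value gives a direct lookup instead.
const seen = { hello: 1, world: 1, bye: 1 };
const viaKey = Object.prototype.hasOwnProperty.call(seen, "bye");        // true

// Without hasOwnProperty, inherited properties produce false positives.
const inherited = seen["toString"] !== undefined;                        // true
const ownOnly = Object.prototype.hasOwnProperty.call(seen, "toString");  // false
```

For a handful of elements the scanning variants are perfectly adequate; the keyed object only starts to matter when the lookup is frequent or the collection is large.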

In V8, reading an array out of bounds to elicit undefined is extremely slow: if it happens in optimized code, for instance, that optimized code is thrown away and the function is deoptimized. In other languages an exception would be thrown, which is also very slow, and in an unmanaged language your program would have undefined behavior and might, if you're lucky, crash.
So always check the .length of the collection to ensure you don't access it out of bounds.
Also, for performance prefer void 0 over undefined, as it is a compile-time constant rather than a runtime variable lookup.
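A bounds-checked read might look like the sketch below. The helper name `elementAt` and its `fallback` parameter are hypothetical, and the sketch only shows the pattern; it makes no claim about how much faster it is in any particular engine.

```javascript
// Hypothetical helper: returns the element at `index`, or `fallback`
// when the index is outside 0 .. list.length - 1.
function elementAt(list, index, fallback) {
  const inBounds = Number.isInteger(index) && index >= 0 && index < list.length;
  return inBounds ? list[index] : fallback;
}

const L = [1, 2, 3];
console.log(elementAt(L, 1, null)); // 2
console.log(elementAt(L, 7, null)); // null, without reading past the end

// `void 0` always evaluates to undefined.
console.log(void 0 === undefined); // true
```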

Related

Isn't string.length actually a method in JavaScript?

I would like to get a better understanding of what is actually going on when I find the length of a string. I tried looking on W3, ECMA, and at the V8 Ignition website but not much luck.
I keep reading that 'JavaScript treats primitive values as objects when executing methods and properties.' But, I can't seem to find out how exactly this happens. If I call a method/property on a primitive which, I assume gets interpreted as an object by Ignition, doesn't the String class need to call a function at some point to iterate the string? I feel like myString.length should be called a method and String.length could MAYBE be called a property, depending on at which point the "property" is found and how it's found.
Basically, I don't understand why it's touted as a property if it doesn't seem to be inherent and has to be fetched/determined. That seems like a method to me (let alone the fact that string.length isn't even a real thing and is interpreted).
(V8 developer here.)
I can see several issues here that can be looked at separately:
1. From a language specification perspective, is something a method or a property?
Intuitively, the distinction is: if you write a function call like obj.method(), then it's a method; if you write obj.property (no ()), then it's a property.
Of course in JavaScript, you could also say that everything is a property, and in case the current value of the property is a function, then that makes it a method. So obj.method gets you a reference to that function, and obj.method() gets and immediately calls it:
var obj = {};
obj.foo = function() { console.log("function called"); return 42; }
var x = obj.foo(); // A method!
var func = obj.foo; // A property!
x = func(); // A call!
obj.foo = 42;
obj.foo(); // A TypeError!
2. When it looks like a property access, is it always a direct read/write from/to memory, or might some function get executed under the hood?
The latter. JavaScript itself even provides this capability to objects you can create:
var obj = {};
Object.defineProperty(obj, "property", {
get: function() { console.log("getter was called"); return 42; },
set: function(x) { console.log("setter was called"); }
});
// *Looks* like a pair of property accesses, but will call getter and setter:
obj.property = obj.property + 1;
The key is that users of this obj don't have to care that getters/setters are involved, to them .property looks like a property. This is of course very much intentional: implementation details of obj are abstracted away; you could modify the part of the code that sets up obj and its .property from a plain property to a getter/setter pair or vice versa without having to worry about updating other parts of the code that read/write it.
Some built-in objects rely on this trick. The most common example is arrays' .length: while it's specified to be a property with certain "magic" behavior, the most straightforward way for engines to implement it is a getter/setter pair under the hood, where in particular the setter does the work of truncating any extra array elements if you set the length to a smaller value than before.
3. So what does "abc".length do in V8?
It reads a property directly from memory. All strings in V8 always have a length field internally. As commenters have pointed out, JavaScript strings are immutable, so the internal length field is written only once (when the string is created), and then becomes a read-only property.
Of course this is an internal implementation detail. Hypothetically, an engine could use a "C-style" string format internally, and then it would have to use a strlen()-like function to determine a string's length when needed. However, on a managed heap, being able to quickly determine each object's size is generally important for performance, so I'd be surprised if an engine actually made this choice. "Pascal-style" strings, where the length is stored explicitly, are more suitable for JavaScript and similar garbage-collected languages.
So, in particular, I'd say it's fair to assume that reading myString.length in JavaScript is always a very fast operation regardless of the string's length, because it does not iterate the string.
4. What about String.length?
Well, this doesn't have anything to do with strings or their lengths :-)
String is a function (e.g. you can call String(123) to get "123"), and all functions have a length property describing their number of formal parameters:
function two_params(a, b) { }
console.log(two_params.length); // 2
As for whether that's a "simple property" or a getter under the hood: there's no reason to assume that it's not a simple property, but there's also no reason to assume that engines can't internally do whatever they want (so long as there's no observable functional difference) if they think it increases performance or saves memory or simplifies things or improves some other metric they care about :-)
(And engines can and do make use of this freedom, for various forms of "lazy"/on-demand computation, caching, optimization -- there are plenty of internal function calls that you probably wouldn't expect, and on the flip side what you "clearly see" as a function call in the JS source might (or might not!) get inlined or otherwise optimized away. The details change over time, and across different engines.)
Length is not a method, it is a property. It doesn't actually do anything but return the length of an array, a string, or the number of parameters expected by a function. When you use .length, you are just asking the JavaScript interpreter to return a variable stored within an object; you are not calling a method.
Also, note that the String.length property gives the actual number of code units in a string, rather than a literal character count. One code unit is 16 bits as defined by UTF-16 (used by JavaScript). However, some special characters use 32 bits which means that in a string containing one of these characters the String.length property might give you a higher character count than the literal number of characters.
Reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length
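The difference between code units and characters is easy to see with a character outside the Basic Multilingual Plane. The sketch below uses U+1F600 as an arbitrary example; the variable names are illustrative.

```javascript
const plain = "abc";
const emoji = "\u{1F600}"; // U+1F600, stored as a surrogate pair

console.log(plain.length);       // 3
console.log(emoji.length);       // 2: two 16-bit code units
console.log([...emoji].length);  // 1: string iteration walks code points

// Mixed string: 4 code units, 3 code points.
const mixed = "a\u{1F600}b";
console.log(mixed.length);             // 4
console.log(Array.from(mixed).length); // 3
```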
Also, assigning to length behaves very differently for strings than for arrays:
let myString = "bluebells";
myString.length = 4;
console.log(myString); //bluebells
console.log(myString.length); //9
//--
let myArr = [5,6,8,2,4,7];
myArr.length = 2;
console.log(myArr); //[5, 6]
console.log(myArr.length); //2

What Does the Javascript Spec say should be returned from an out of bounds array index on a NodeList

I recently started a new job working on a browser-based application and am still learning the finer points of Javascript. I've got a question about what a JS interpreter following the spec should do with the following code snippet.
var node_list_foo = document.getElementsByName("bar");
// node_list_foo is now a NodeList with a length of, let's say, 3.
var element_from_item = node_list_foo.item(4);
// element_from_item is now null according to https://dom.spec.whatwg.org/#interface-nodelist
var element_from_index = node_list_foo[4];
// my browsers are setting element_from_index to undefined in this case, but what should it be?
https://dom.spec.whatwg.org/#interface-nodelist appears to defer defining the behavior of out of range indices to the definition of supported-property-indices, am I reading that right?
https://heycam.github.io/webidl/#dfn-supported-property-indices has this to say about using array style syntax to access objects with a getter. NodeList has a relevant getter - item(index).
If an indexed property getter was specified using an operation with an identifier, then the value returned when indexing the object with a given supported property index is the value that would be returned by invoking the operation, passing the index as its only argument. If the operation used to declare the indexed property getter did not have an identifier, then the interface definition must be accompanied by a description of how to determine the value of an indexed property for a given index.
If I'm reading that right, the spec says the behavior of the two access methods should be the same? So both should return undefined?
Curiously, https://developer.mozilla.org/en-US/docs/Web/API/NodeList, documents the observed behavior and not the behavior that I think the spec calls for.
What's the correct interpretation of the spec and is there another approach I should take to finding out how these kinds of things work?
Finally, if my interpretation of the spec is correct, nobody seems to actually follow it for array-style syntax. So, as a paranoid and perfectionist programmer, should I prefer nodelist.item(x)?
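Whichever reading of the spec is right, code that must not depend on the null/undefined difference can normalize the result itself. The sketch below is only an illustration: `nodeAt` is a hypothetical helper, and `fakeList` is a plain array-like stand-in for a NodeList so that the snippet runs outside a browser.

```javascript
// Hypothetical helper: returns the item at `index` or null, for any
// array-like object with a numeric `length` (such as a NodeList).
function nodeAt(list, index) {
  return index >= 0 && index < list.length ? list[index] : null;
}

// Plain array-like stand-in so the sketch runs without a DOM.
const fakeList = { length: 3, 0: "a", 1: "b", 2: "c" };

console.log(fakeList[4]);         // undefined: no such property
console.log(nodeAt(fakeList, 4)); // null
console.log(nodeAt(fakeList, 1)); // "b"
```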

Storing components in an Entity System

Note: this introduction is about entity systems. But, even if you don't know what these are, or haven't implemented them yourself, it's pretty basic and if you have general Javascript experience you will probably qualify more than enough to answer.
I am reading articles about Entity Systems on the T=machine blog.
The author, Adam, suggests that an entity should just be an id that can be used to obtain its components (i.e. the actual data that the entity is supposed to represent).
I chose the model where all entities are stored in "one place". My primary candidate for implementing this storage is the array-of-arrays approach many people use, which implies dynamic entity ids that are indices into arrays of components, with components grouped by type in that "one place" (from now on I'll just call it "storage"). I plan to implement the storage as a Scene: an object that handles entity composition and storage, and can do some basic operations on entities (.addComponent(entityID, component) and such).
I am not concerned about the Scene object; I'm pretty sure that it's a good design. What I am not sure about is the implementation of the storage.
I have two options:
A) Go with the array-of-array approach, in which the storage looks like this:
// storage[i][j]: i denotes the component type and j the entity; this returns a component instance
// j is the entity id
[
[ComponentPosition, ComponentPosition, ComponentPosition],
[ComponentVelocity, undefined, ComponentVelocity],
[ComponentCamera, undefined, undefined]
]
//It's obvious that the entity `1` doesn't have the velocity and camera components, for example.
B) Implement the storage object as a dictionary (technically an object in Javascript)
{
"componentType":
{
"entityId": ComponentInstance
}
}
The dictionary approach would imply that entity id's are static, which seems like a very good thing for implementing game loops and other functionality outside the Entity System itself. Also, this means that systems could easily store an array of entity ids that they are interested in. The entityId variable would also be a string, as opposed to an integer index, obviously.
The reason I am against the array-of-arrays approach is that deleting a single entity would change the ids of other entities.
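For illustration, option B could be sketched roughly as follows. This is only a sketch under assumptions: the class name `ComponentStorage` and its methods are hypothetical, and it uses `Map` rather than plain objects as the dictionary, which is a substitution that also allows non-string ids. It says nothing about relative performance.

```javascript
// Hypothetical storage for option B: component type -> (entity id -> component).
class ComponentStorage {
  constructor() {
    this.byType = new Map();
  }

  addComponent(entityId, type, component) {
    if (!this.byType.has(type)) this.byType.set(type, new Map());
    this.byType.get(type).set(entityId, component);
  }

  getComponent(entityId, type) {
    const components = this.byType.get(type);
    return components ? components.get(entityId) : undefined;
  }

  // Removing an entity leaves every other entity id untouched.
  removeEntity(entityId) {
    for (const components of this.byType.values()) components.delete(entityId);
  }
}

const storage = new ComponentStorage();
storage.addComponent("e1", "position", { x: 0, y: 0 });
storage.addComponent("e2", "position", { x: 5, y: 5 });
storage.addComponent("e2", "velocity", { dx: 1, dy: 0 });
storage.removeEntity("e1");
console.log(storage.getComponent("e2", "position")); // { x: 5, y: 5 }
console.log(storage.getComponent("e1", "position")); // undefined
```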
Actual implementation details may vary, but I would like to know which approach would be better performance-wise.
Things that I am also interested in (please be as cross-platform as possible, but if need be, use V8 as an example):
How big is the overhead when accessing properties, and how is that implemented under the hood? Let's say that they are being accessed from inside the local scope.
What is undefined in memory, and how much space does it take? I ask because in the array-of-arrays approach all of the inner arrays must be of the same length, and if an entity doesn't have a certain component, that field is set to undefined.
Don't worry about the Array. It is an Object in JavaScript, i.e. there are no "real" arrays; the indices are just numeric "names" for the properties of the object (dictionary, hash, map).
The idea is simple: an Array has a length property that lets loops know where to stop iterating. Simply removing an element from the Array (remember, it's an Object) doesn't actually change the length property. So...
// create an array object
var array = ['one', 'two', 'three'];
console.log(array.length); // 3
// these don't actually change the length
delete array[1]; // 'remove' the property with key 1 (the value 'two'), leaving a hole
console.log(array.length); // 3
array[1] = undefined; // store undefined as the value of the property with key 1
console.log(array.length); // 3
array.splice(1, 1); // remove the second element and shift the rest down
console.log(array.length); // 2
console.log(array); // ['one', 'three']
You've got to realize that JavaScript doesn't "work" like you expect. Performance-wise, objects and arrays are the same, i.e. arrays are accessed like dictionaries;
Scope is not like in other "C-style" languages. Variables declared with var have only global or function scope, i.e. no block scope (never write for(var i) inside another for(var i)); let and const, added in ES2015, are block scoped;
undefined takes exactly the same amount of memory as null. The difference is that null is a deliberate absence of a value, while undefined is an accidental (non-deliberate) one;
Don't check whether a field exists with if(array['two']), because a field can hold a falsy value (undefined, null, 0, "", false) and the check then evaluates to false. Always check with if('two' in array);
When looping with for(key in array), always use if(array.hasOwnProperty(key)) so you don't iterate over a prototype's properties (the parent's, in a manner of speaking). Also, a loop over an object created by a constructor function might include the 'constructor' key.
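The last two points can be checked with a short sketch. The object and key names below are made up for illustration; only standard operators and methods are used.

```javascript
const components = { position: 0, velocity: undefined };

console.log(Boolean(components.position)); // false, although the key exists
console.log("position" in components);     // true
console.log("velocity" in components);     // true: present, value is undefined
console.log("camera" in components);       // false
console.log("toString" in components);     // true: found on the prototype chain

// for...in also visits enumerable inherited keys, so filter to own keys.
const base = { shared: true };
const child = Object.create(base);
child.own = 1;

const ownKeys = [];
for (const key in child) {
  if (Object.prototype.hasOwnProperty.call(child, key)) ownKeys.push(key);
}
console.log(ownKeys); // [ 'own' ]
```

Note that `in` also reports inherited properties, so `hasOwnProperty` is the stricter test when only the object's own keys should count.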

calling 5 times foo.length() more efficient or setting it to a variable and then calling it more efficient

I am new to JavaScript and I have a situation where I check something like
if(foo.length>0){
// do something...
}
I do it in 8 places. I wanted to know whether it would be equivalent to store the foo.length condition in a variable and then use that variable in the if statements. Which one is more efficient, and how can I do it?
Is there a more efficient way to do this kind of task in JavaScript, the way Java has Apache Commons StringUtils, which is efficient and easy to use?
I would store it in a variable just for better readability, and a slightly smaller footprint. Now as a bonus you get imperceptibly better performance - and nobody can accuse you of unnecessary micro-optimization when you throw those arguments at them :) (Yes, I probably need help).
The performance gain is negligible. The string object already has the information about the length of the string, so using the length property doesn't do something like looping through all the characters in the string to find the end of it.
There is actually a drawback to putting the length of the string in a variable. You get two variables that are loosely coupled, and changing the string requires you to also update the length variable for the code to work properly. If you forget to update the length variable somewhere, you will have inconsistent values.
Also, keeping the length variable updated can reduce the performance rather than increase it. You have to update the length variable every time the string changes, even if you don't use the length in between. You may end up reading the length property more often than you would if you didn't have the variable.
Edit:
An actual performance test shows that the performance differs between browsers, but using the length property is generally faster than using a variable:
http://jsperf.com/length-in-a-variable
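The coupling drawback described above shows up in a few lines. This is a minimal sketch with made-up variable names; it illustrates the stale-value problem only and measures nothing.

```javascript
let foo = "abc";
const cachedLength = foo.length; // 3 at the time of caching

foo += "def"; // strings are immutable, so foo now refers to a new string

console.log(foo.length);   // 6
console.log(cachedLength); // 3, now stale

// Caching is safe when the value cannot change between the reads.
const hasContent = foo.length > 0;
if (hasContent) console.log("non-empty");
```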
You can surely save the length of an array in a variable, and it performs better in some cases, especially if foo is not a real array but a list of DOM elements like the one you get when calling document.getElementsByTagName().
If you need the value of foo.length in 8 different places, then yes, put it in a variable. But you're doing this to prevent redundancy foremost, not so much for efficiency.
Btw, the length property of array and array-like objects is not computed on retrieval. It is populated with a Number value at all times. The JavaScript engine updates its value automatically. (For instance, when you add a new array element to the array.) So, foo.length is an ordinary property retrieval - no methods or algorithms are executed internally on the array object.
Perhaps a function will help?
// returns true if obj.length is greater than the specified len
// returns false if the object doesn't have a length property or if
// obj.length is <= len
function checkLen(obj, len) {
try {
if(obj.length > len) return true;
} catch(e) {
console.log(e);
}
return false;
}

If I set only a high index in an array, does it waste memory?

In Javascript, if I do something like
var alpha = [];
alpha[1000000] = 2;
does this waste memory somehow? I remember reading something about Javascript arrays still setting values for unspecified indices (maybe sets them to undefined?), but I think this may have had something to do with delete. I can't really remember.
See this topic:
Are JavaScript arrays sparse?
In most implementations of Javascript (probably all modern ones) arrays are sparse. That means no, it's not going to allocate memory up to the maximum index.
If it's anything like a Lua implementation there is actually an internal array and dictionary. Densely populated parts from the starting index will be stored in the array, sparse portions in the dictionary.
This is an old myth. The other indexes on the array will not be assigned.
When you assign a property whose name is an "array index" (e.g. alpha[10] = 'foo', a name that represents an unsigned 32-bit integer) and that index is greater than or equal to the current value of the length property of the Array object, two things happen:
The "index named" property is created on the object.
The length is set to that index + 1.
Proof of concept:
var alpha = [];
alpha[10] = 2;
alpha.hasOwnProperty(0); // false, the property doesn't exist
alpha.hasOwnProperty(9); // false
alpha.hasOwnProperty(10); // true, the property exist
alpha.length; // 11
As you can see, the hasOwnProperty method returns false when we test the presence of the 0 or 9 properties, because they don't exist physically on the object, whereas it returns true for 10, the property was created.
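A few more standard checks point the same way. This sketch repeats the `alpha` example and adds a counter named `visited`, which is purely illustrative.

```javascript
const alpha = [];
alpha[10] = 2;

console.log(alpha.length);       // 11
console.log(Object.keys(alpha)); // [ '10' ]
console.log(5 in alpha);         // false: a hole, not a stored undefined
console.log(alpha[5]);           // undefined when read

let visited = 0;
alpha.forEach(() => { visited += 1; }); // forEach skips holes
console.log(visited);            // 1
```

None of this measures memory directly; it only shows that, at the language level, the intermediate indexes are not properties of the array.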
This misconception probably comes from popular JS consoles, like Firebug, because when they detect that the object being printed is an array-like one, they will simply make a loop, showing each of the index values from 0 to length - 1.
For example, Firebug detects array-like objects simply by checking whether they have a length property whose value is an unsigned 32-bit integer (less than 2^32 - 1) and a splice property that is a function:
console.log({length:3, splice:function(){}});
// Firebug will log: `[undefined, undefined, undefined]`
In the above case, Firebug internally runs a sequential loop to show each of the property values, but none of the indexes really exist. Showing [undefined, undefined, undefined] gives you the false impression that those properties exist, or that they were "allocated", but that's not the case...
It has always been this way; it is specified even in the ECMAScript 1st Edition Specification (1997), so you shouldn't have to worry about implementation differences.
About a year ago, I did some testing on how browsers handle arrays (obligatory self-promotional link to my blog post.) My testing was aimed more at CPU performance than at memory consumption, which is much harder to measure. The bottom line, though, was that every browser I tested with seemed to treat sparse arrays as hash tables. That is, unless you initialized the array from the get-go by putting values in consecutive indexes (starting from 0), the array would be implemented in a way that seemed to optimize for space.
So while there's no guarantee, I don't think that setting array[100000] will take any more room than setting array[1] -- unless you also set all the indexes leading up to those.
I don't think so, because JavaScript treats arrays kind of like dictionaries, but with integer keys.
alpha[1000000] === alpha["1000000"] // true: both name the same property
I don't really know javascript, but it would be pretty odd behaviour if it DIDN'T allocate space for the entire array. Why would you think it wouldn't take up space? You're asking for a huge array. If it didn't give it to you, that would be a specific optimisation.
This obviously ignores OS optimisations such as memory overcommit and other kernel and implementation specifics.
