Readability of anonymous closures in nested loops - javascript

A friend of mine got bitten by the all too famous 'anonymous functions in loop' javascript issue. (It's been explained to death on SO, and I'm actually expecting someone to mark my question as a duplicate, which would probably be fair game).
The issue amounts to what John Resig explained in this tutorial :
http://ejohn.org/apps/learn/#62
var count = 0;
for ( var i = 0; i < 4; i++ ) {
setTimeout(function(){
assert( i == count++, "Check the value of i." );
}, i * 200);
}
To a new user it looks like it should work, but indeed "i always has the same value", they say, hence the crying and gnashing of teeth.
I explained the issue with a lot of hand waving and some stuff about scopes, and pointed him to some solutions provided on SO or other sites (really, when you know it is such a common issue, google is your friend).
Of course the real answer is that in JS the scope is at the function level. So when the anonymous functions run, 'i' is not defined in the scope of any of them, but it is defined in the global scope, and it has the value of the end of the loop, 4.
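The shared variable can be seen without any timers. This is a minimal sketch in plain JavaScript; the names `callbacks` and `seen` are made up for illustration and are not part of the tutorial:

```javascript
// Collect closures in a loop, then call them only after the loop has finished.
var callbacks = [];
for (var i = 0; i < 4; i++) {
  callbacks.push(function () {
    return i; // reads the single function-scoped `i`, not a per-iteration copy
  });
}

// By now the loop has ended and `i` is 4, so every closure returns 4.
var seen = callbacks.map(function (cb) { return cb(); });
console.log(seen); // [ 4, 4, 4, 4 ]
```

The setTimeout version behaves the same way, because the callbacks only run after the loop has completed.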
Since we've all most likely been trained in languages that use block-level scope, it's yet-another-thing-that-js-does-a-little-bit-different-than-the-rest-of-the-world (meaning of "this", anyone ?)
What bugs me is that the common answer, actually even the one provided by John himself, is the following :
var count = 0;
for ( var i = 0; i < 4; i++ ) (function(i){
setTimeout(function(){
assert( i == count++, "Check the value of i." );
}, i * 200);
})(i);
Which obviously works and demonstrates mastery of the language and a taste for nested parentheses that suspiciously makes you look like a LISP-er.
And yet I can't help thinking that other solutions would be much more readable and easier to explain.
This one is just pushing the anonymous closure a bit closer to the setTimeout (or closer to the addEventListener in 99.9% of the cases where it bites someone) :
var count = 0;
for (var i = 0 ; i < 4 ; i++) {
setTimeout((function (index) {
return function () {
assert( index == count++, "Check the value of i." );
}
})(i), i*200);
};
This one is about explaining what we're doing, with an explicit function factory :
var count = 0
function makeHandler(index) {
return function() {
assert(index == count ++);
};
};
for (var i = 0 ; i < 4 ; i++) {
setTimeout(makeHandler(i), i*200);
};
Finally, there is another solution that removes the problem altogether, and would seem even more natural to me (although I agree that it somehow sidesteps the problem 'by accident')
var count = 0;
function prepareTimeout(index) {
setTimeout(function () {
assert(index == count++);
}, index * 200);
};
for (var k = 0; k < 4 ; k++) {
prepareTimeout(k);
};
Are those solutions entirely equivalent, in terms of memory usage, number of scope created, possible leaking ?
Sorry if this is really a FAQ or subjective or whatever.

Solution #1
for (var i = 0; i < 4; i++) (function (i) {
// scope #1
setTimeout(function () {
// scope #2
}, i*200);
})(i);
is actually quite nice as it reduces indentation and therefore perceived code complexity. On the other hand the for loop does not have its own block which is something jsLint (rightfully, in my opinion) would complain about.
Solution #2
for (var i = 0; i < 4; i++) {
setTimeout((function (i) {
// scope #1
return function () {
// scope #2
}
})(i), i*200);
};
Is the way I would do it most of the time. The for loop has an actual block, which I find increases readability and falls more into place with what you'd expect from a conventional loop (as in "for x do y", opposed to the "for x create anonymous function and execute it right away" from #1). But that's in the eye of the beholder, really, the more experience you have, the more these approaches start to look the same for you.
Solution #3
function makeHandler(index) {
// scope #1
return function() {
// scope #2
};
};
for (var i = 0 ; i < 4 ; i++) {
setTimeout(makeHandler(i), i*200);
};
as you say, makes things clearer in the for loop itself. The human mind adapts more easily to named blocks that do something predefined than to a bunch of nested anonymous functions.
Solution #4
function prepareTimeout(index) {
// scope #1
setTimeout(function () {
// scope #2
}, index * 200);
};
for (var k = 0; k < 4 ; k++) {
prepareTimeout(k);
};
is the absolute same thing. I don't see any "issue sidestepping" here, it's just equivalent to #3.
As I see it, the approaches do not differ in the least bit, semantically - only syntactically. Sometimes there are reasons to prefer one over the other (the "function factory" approach is very re-usable, for example), but that doesn't apply to the standard situation you describe.
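To illustrate that equivalence, here is a rough sketch that runs the inline-closure variant and the named-factory variant side by side. The timers are replaced by plain arrays so the result can be inspected synchronously; `inlineHandlers`, `factoryHandlers` and `callAll` are illustrative names, not part of the question:

```javascript
// Variant A: inline IIFE that returns the handler (like solution #2).
var inlineHandlers = [];
for (var i = 0; i < 4; i++) {
  inlineHandlers.push((function (index) {
    return function () { return index; };
  })(i));
}

// Variant B: named factory (like solution #3).
function makeHandler(index) {
  return function () { return index; };
}
var factoryHandlers = [];
for (var j = 0; j < 4; j++) {
  factoryHandlers.push(makeHandler(j));
}

// Call every handler and collect what each one captured.
function callAll(handlers) {
  return handlers.map(function (h) { return h(); });
}
console.log(callAll(inlineHandlers));  // [ 0, 1, 2, 3 ]
console.log(callAll(factoryHandlers)); // [ 0, 1, 2, 3 ]
```

In both variants one extra function scope is created per iteration, and that scope holds the captured value.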
In any case, there are three concepts that a new user of JavaScript must get his head around:
functions are objects and can be passed around like integers (and therefore they don't need a name)
function scope and preservation of scope (how closures work)
how asynchronicity works
Once these concepts have sunk in, there will be a point at which you no longer see such a big difference in those approaches. They are just different ways to "put it". Until then you simply choose the one you are most comfortable with.
EDIT: You could argue that that point is when you transition from a "you must do it like this" attitude to a "you could do it like this, or like this, or like this" attitude. This applies to programming languages, cooking and pretty much anything else.
To say something about the implicit question in the title: Readability is also in the eye of the beholder. Ask any Perl programmer. Or someone comfortable with regular expressions.

In my opinion the last pattern (using a predefined named function) is the most readable, 'debuggable' and on top of that the most usable in my editors (KomodoEdit or Visual Studio in combination with Resharper 6.0), where it's easy to jump to the function definition from the function call. It just asks for a bit more discipline in coding.

Related

Is this a valid pattern and what is it called?

I find myself writing the following JavaScript more and more and I would like to know if this is a common pattern and if so, what is it called?
Part of the code and pattern:
var fruits = ["pear", "apple", "banana"];
var getNextFruit = function() {
var _index = 0,
_numberOfFruits = fruits.length;
getNextFruit = function() {
render(fruits[_index]);
_index = (_index + 1) % _numberOfFruits;
}
getNextFruit();
};
I have a function which takes no parameters, inside the function I redefine the function and immediately call it. In a functional language this might be a function being returned, JavaScript just makes it easier because you can reuse the name of the function. Thus you are able to extend the functionality without having to change your implementation.
I can also imagine this pattern to be very useful for memoization where your "cache" is the state we wrap around.
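As a rough illustration of the memoization idea, here is a sketch in which the cache is the state held by the closure. The `memoize` helper and the `square` example are hypothetical, and the sketch returns a new function rather than redefining an existing one:

```javascript
// Wrap a one-argument function so that each distinct argument is computed once.
function memoize(fn) {
  var cache = {}; // private state, only reachable through the returned function
  return function (n) {
    if (!Object.prototype.hasOwnProperty.call(cache, n)) {
      cache[n] = fn(n);
    }
    return cache[n];
  };
}

var calls = 0;
var square = memoize(function (n) {
  calls++; // count how often the real computation runs
  return n * n;
});

square(4);
square(4);
console.log(square(4), calls); // 16 1
```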
I even sometimes implement this with a get or a set method on the function where I can get the state if it's meaningful. The added fiddle shows an example of this.
Because this is a primarily JavaScript oriented question: The obligatory fiddle
I have a function which takes no parameters, inside the function I redefine the function and immediately call it.
Is this a valid pattern and what is it called?
A function redefining itself is usually an antipattern, as it complicates stuff a lot. Yes, it sometimes can be more efficient to swap out the whole function than to put an if (alreadyInitialised) condition inside the function, but it's very rarely worth it. When you need to optimise performance, you can try and benchmark both approaches, but otherwise the advice is to keep it as simple as you can.
The pattern "initialises itself on the first call" is known as laziness for pure computations (in functional programming) and as a singleton for objects (in OOP).
However, most of the time there's no reason to defer the initialisation of the object/function/module whatever until it is used for the first time. The resources taken for it (both time and memory) are insignificant, especially when you are sure that you will need it in your program at least once. For that, use an IIFE in JavaScript, which is also known as the module pattern when creating an object.
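The two options can be sketched roughly as follows. The `render` function here is a hypothetical stand-in that just records its argument, and `getNextFruitLazy` and `getNextFruitEager` are illustrative names:

```javascript
// Hypothetical stand-in for the question's render(); it only records calls.
var rendered = [];
function render(fruit) { rendered.push(fruit); }

var fruits = ["pear", "apple", "banana"];

// Lazy variant: the state is set up on the first call, guarded by a check.
var lazyState = null;
function getNextFruitLazy() {
  if (lazyState === null) {
    lazyState = { index: 0 };
  }
  render(fruits[lazyState.index]);
  lazyState.index = (lazyState.index + 1) % fruits.length;
}

// Eager variant: an IIFE sets up private state once, at definition time.
var getNextFruitEager = (function () {
  var index = 0;
  return function () {
    render(fruits[index]);
    index = (index + 1) % fruits.length;
  };
}());

getNextFruitLazy();
getNextFruitLazy();
getNextFruitEager();
getNextFruitEager();
console.log(rendered); // [ 'pear', 'apple', 'pear', 'apple' ]
```

Neither variant reassigns the function itself, which keeps the behaviour easier to follow.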
Creating a function via a closure is a pretty common pattern in JavaScript. I would personally do that differently:
var fruits = ["pear", "apple", "banana"];
var getNextFruit = function(fruits) {
var index = 0,
numberOfFruits = fruits.length;
function getNextFruit() {
render(fruits[index]);
index = (index + 1) % numberOfFruits;
}
return getNextFruit;
}(fruits);
There's no good reason (in my opinion) to clutter up the variable names with leading underscores because they're private to the closure anyway. The above also does not couple the workings of the closure with the external variable name. My version can be made a reusable service:
function fruitGetter(fruits) {
var index = 0, numberOfFruits = fruits.length;
function getNextFruit() {
render(fruits[index]);
index = (index + 1) % numberOfFruits;
}
return getNextFruit;
}
// ...
var getNextFruit = fruitGetter(someFruits);
var otherFruits = fruitGetter(["kumquat", "lychee", "mango"]);

Javascript, counters and real-time applications

I'm currently working on a game and I have decided to go with javascript to make a prototype. During the development I noticed that I use a lot of counters in this way:
function update() {
for(var i = 0; i < n; i++) {
// do stuff
}
for(var i = 0; i < m; i++) {
// do other stuff
}
}
Keeping in mind that this is a real-time application, so the update function is executed almost 60 times per second, we can say that I'm creating a lot of variables. I was wondering how that piece of code would affect performance (does the javascript engine make some optimization here?) and how the garbage collector behaves in this situation (I don't even know how the GC manages primitive types...).
For now i changed the code to look like this:
var counters = {};
function update() {
for(counters['taskA'] = 0; counters['taskA'] < n; counters['taskA']++) {
// do stuff
}
for(counters['taskB'] = 0; counters['taskB'] < m; counters['taskB']++) {
// do other stuff
}
}
Does this code make any difference?
There shouldn't be any significant performance difference. However, the counters variable will not be garbage collected if it's in the global scope. It will only get GCed when it goes out of scope, so if it's within another function that will be mostly fine.
In your first example the i variables definitely get GCed, as they're within the update function.

Is it bad practice to use the same variable name in multiple for-loops?

I was just linting some JavaScript code using JSHint. In the code I have two for-loops both used like this:
for (var i = 0; i < somevalue; i++) { ... }
So both for-loops use the var i for iteration.
Now JSHint shows me an error for the second for-loop: "'i' is already defined". I can't say that this isn't true (because it obviously is) but I always thought this wouldn't matter as the var i is only used in that specific place.
Is it bad practice to use for-loops this way? Should I use a different variable for each for-loop in my code like
//for-loop 1
for (var i = 0; ...; i++) { ... }
//for-loop 2
for (var j = 0; ...; j++) { ... }
Or is this one of the errors I can ignore (because it doesn't break my code, it still does what it is supposed to do)?
JSLint btw. stops validating at the first for loop because I don't define var i at the top of the function (that's why I switched to JSHint in the first place). So according to the example in this question: Should I use JSLint or JSHint JavaScript validation? – I should use for-loops like this anyway to conform to JSLint:
...
var i;
...
//for-loop 1
for (i = 0; ...; i++) { ... }
...
//for-loop 2
for (i = 0; ...; i++) { ... }
This also looks good to me, because this way I should avoid both errors in JSLint and JSHint. But what I am uncertain about is if I should use a different variable for each for-loop like this:
...
var i, j;
...
//for-loop 1
for (i = 0; ...; i++) { ... }
//for-loop 2
for (j = 0; ...; j++) { ... }
So is there a best practice for this or could I just go with any of the code above, meaning I choose "my" best practice?
Since variable declarations are hoisted to the top of the scope in which they appear the interpreter will effectively interpret both versions in the same way. For that reason, JSHint and JSLint suggest moving the declarations out of the loop initialiser.
The following code...
for (var i = 0; i < 10; i++) {}
for (var i = 5; i < 15; i++) {}
... is effectively interpreted as this:
var i;
for (i = 0; i < 10; i++) {}
for (i = 5; i < 15; i++) {}
Notice that there is really only one declaration of i, and multiple assignments to it - you can't really "redeclare" a variable in the same scope.
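The hoisting can be observed directly. A small sketch, where `hoistingDemo` is just an illustrative name:

```javascript
function hoistingDemo() {
  // `i` is already declared here because of hoisting, it is just not assigned yet.
  // Without hoisting this read would throw a ReferenceError.
  var before = i;
  for (var i = 0; i < 3; i++) {}
  var afterFirst = i;            // still visible after the loop ends
  for (var i = 5; i < 7; i++) {} // same variable, assigned again rather than redeclared
  return [before, afterFirst, i];
}

console.log(hoistingDemo()); // [ undefined, 3, 7 ]
```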
To actually answer your question...
is there a best practice for this or could I just go with any of the code above?
There are varying opinions on how best to handle this. Personally, I agree with JSLint and think the code is clearer when you declare all variables together at the top of each scope. Since that's how the code will be interpreted, why not write code that looks as it behaves?
But, as you've observed, the code will work regardless of the approach taken so it's a style/convention choice, and you can use whichever form you feel most comfortable with.
It has been mentioned only in the comment by @TSCrowder: if your environment supports it (Firefox, Node.js), in ES6 you can use a let declaration
//for-loop 1
for (let i = 0; ...; i++) { ... }
//for-loop 2
for (let i = 0; ...; i++) { ... }
which limits the scope to within the for-loop. Bonus: JSHint stops complaining.
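A side effect worth knowing: a let declared in a for-loop header gets a fresh binding on every iteration, so closures created in the loop each keep their own value. A minimal sketch, assuming an ES2015-capable environment; the array names are made up for illustration:

```javascript
// With var, all closures share one function-scoped variable.
var withVar = [];
for (var i = 0; i < 3; i++) {
  withVar.push(function () { return i; });
}

// With let, each iteration gets its own binding of k.
var withLet = [];
for (let k = 0; k < 3; k++) {
  withLet.push(function () { return k; });
}

var varResults = withVar.map(function (f) { return f(); });
var letResults = withLet.map(function (f) { return f(); });
console.log(varResults); // [ 3, 3, 3 ]
console.log(letResults); // [ 0, 1, 2 ]
```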
Variables in javascript are function scoped (not block scoped).
When you define var i in a loop, it is visible not only inside the loop but throughout the function that contains the loop.
See below,
function myfun() {
//for-loop 1
for (var i = 0; ...; i++) { ... }
// i is already defined, its scope is visible outside of the loop1.
// so you should do something like this in second loop.
for (i = 0; ...; i++) { ... }
// But doing so is fragile, as you need to remember whether
// `i` has already been declared. If not, `i` would become a global variable.
}
The reason JSHint shows the error is that in JS variables are function-scoped and variable declarations are hoisted to the top of the function.
In Firefox you can use the let keyword to get block scope, but it is not currently supported by other browsers.
The let keyword is included in the ECMAScript 6 specification.
I know this question has been answered, but if you want super for loops, write them like this:
var names = ['alex','john','paul','nemo'],
name = '',
idx = 0,
len = names.length;
for(;idx<len;++idx)
{
name = names[idx];
// do processing...
}
A couple of things going on here...
The array length is being stored in len. This stops JS evaluating names.length every iteration
The idx increment is a PRE-INCREMENT (e.g. ++idx NOT idx++). Pre-increments are natively faster than Post-increments.
The storing of a reference to name. This is optional but recommended if you'll be using the name variable a lot. Every call to names[idx] requires finding the index in the array. Whether this search be a linear search, tree search or hash table, the find is still happening. So store a reference in another variable to reduce lookups.
Finally, this is just my personal preference, and I have no proof of any performance benefits. However, I always like initialising variables to the type they're going to be, e.g. name = ''.
The best practice is to reduce the scope of variables, so the best way to declare iteration variable for the loops is
//for-loop 1
for (var i = 0; ...; i++) { ... }
//for-loop 2
for (var j = 0; ...; j++) { ... }
I know the scope of the variables declared with var, but I am talking about code readability here.

Why can't I assign for loop to a variable?

So I am just wondering why the following code doesn't work. I am looking for a similar strategy to put the for loop in a variable.
var whatever = for (i=1;i<6;i++) {
console.log(i)
};
Thanks!
Because a for loop is a statement and in JavaScript statements don't have values. It's simply not something provided for in the syntax and semantics of the language.
In some languages, every statement is treated as an expression (Erlang for example). In others, that's not the case. JavaScript is in the latter category.
It's kind-of like asking why horses have long stringy tails and no wings.
edit — look into things like the Underscore library or the "modern" add-ons to the Array prototype for "map" and "reduce" and "forEach" functionality. Those allow iterative operations in an expression evaluation context (at a cost, of course).
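As a rough sketch of those array methods (the variable names are illustrative, and `Array.from` assumes an ES2015-capable environment):

```javascript
// Build the values 1..5 first; Array.from with a mapping function is ES2015.
var numbers = Array.from({ length: 5 }, function (_, idx) { return idx + 1; });

// map and reduce are expressions, so their results can be assigned.
var doubled = numbers.map(function (n) { return n * 2; });
var total = numbers.reduce(function (sum, n) { return sum + n; }, 0);

// forEach runs only for its side effects and evaluates to undefined.
var whatever = numbers.forEach(function (n) { console.log(n); });

console.log(doubled);  // [ 2, 4, 6, 8, 10 ]
console.log(total);    // 15
console.log(whatever); // undefined
```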
I suppose what you look for is function:
var whatever = function(min, max) {
for (var i = min; i < max; ++i) {
console.log(i);
}
}
... and later ...
whatever(1, 6);
This approach allows you to encapsulate the loop (or any other code, even declarations of other functions) within a variable.
Your issue is that for loops do not return values. You could construct an array with enough elements to hold all the iterations of your loop, then assign to it within the loop:
arry[j++] = i;
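Spelled out in full, that might look like the sketch below. The declarations of `arry` and `j` are assumptions, since the snippet above only shows the assignment:

```javascript
var arry = []; // collects one value per iteration
var j = 0;     // next free index in arry

for (var i = 1; i < 6; i++) {
  arry[j++] = i;
}

console.log(arry); // [ 1, 2, 3, 4, 5 ]
```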
You can do this, but it seems that you might want to check out anonymous functions. With an anonymous function you could do this:
var whatever = function(){
for (var i=1;i<6;i++) {
console.log(i);
}
};
and then
whatever(); // runs console.log(i) five times, for i = 1 to 5

Is not having local functions a micro optimisation?

Would moving the inner function outside of this one, so that it's not created every time the function is called, be a micro-optimisation?
In this particular case the doMoreStuff function is only used inside doStuff. Should I worry about having local functions like these?
function doStuff() {
var doMoreStuff = function(val) {
// do some stuff
}
// do something
for (var i = 0; i < list.length; i++) {
doMoreStuff(list[i]);
for (var j = 0; j < list[i].children.length; j++) {
doMoreStuff(list[i].children[j]);
}
}
// do some other stuff
}
An actual example would be, say:
function sendDataToServer(data) {
var callback = function(incoming) {
// handle incoming
}
ajaxCall("url", data, callback);
}
Not sure if this falls under the category "micro-optimization". I would say no.
But it depends on how often you call doStuff. If you call it often, then creating the function over and over again is just unnecessary and will definitely add overhead.
If you don't want to have the "helper function" in global scope but avoid recreating it, you can wrap it like so:
var doStuff = (function() {
var doMoreStuff = function(val) {
// do some stuff
}
return function() {
// do something
for (var i = 0; i < list.length; i++) {
doMoreStuff(list[i]);
}
// do some other stuff
}
}());
As the function which is returned is a closure, it has access to doMoreStuff. Note that the outer function is immediately executed ( (function(){...}()) ).
Or you create an object that holds references to the functions:
var stuff = {
doMoreStuff: function() {...},
doStuff: function() {...}
};
More information about encapsulation, object creation patterns and other concepts can be found in the book JavaScript Patterns.
For optimal speed with a nested function (function within internal scope of an outer function), I suspect you should use declarations, not expressions.
The question asks about "local functions" and optimization, but doesn't specify how the local functions are created. But it should, because the question's answer probably is different for the different techniques by which the "inner function" can be created.
Looking at the answer and test results by @cleong, I suspect that only his answer is using the optimal technique for function creation. There are three ways to create a function, and @cleong is showing us the one that provides fast execution. The three techniques are:
constructor
declaration
expression
Constructor isn't used much, it requires a string that has the text of the function body. This would be useful in reflective programming, where you do a "toString()" to get the function body, modify, then construct a new function. And that, of course, is more-or-less never done.
Declaration is used, but mostly for outer functions, not inner functions (by "inner function" I mean a function nested within another). Yet, based upon @cleong's tests, it seems to be very fast; just as fast as an outer function.
Expressions are what everyone uses. This might not be the best idea; but it's what everyone does.
One major difference between function declarations and function expressions is that the declarations are subject to hoisting. Everyone knows that "var" declarations are hoisted; but so are "function" declarations. For things that are hoisted, computations are performed at compile time to determine the memory space that will be needed for the thing. Presumably, one would expect that the inner function is compiled at compile time, and can run much as would a compiled outer function.
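The hoisting difference itself is easy to observe. The sketch below only shows when each kind of inner function becomes available and says nothing about execution speed; `hoistedFunctionsDemo`, `declared` and `expressed` are illustrative names:

```javascript
function hoistedFunctionsDemo() {
  var results = [];
  results.push(typeof declared);  // the declaration is hoisted together with its body
  results.push(typeof expressed); // only the var is hoisted, the value is not assigned yet
  results.push(declared());       // so the declared function is callable before its definition

  function declared() { return "declared"; }
  var expressed = function () { return "expressed"; };

  results.push(expressed());      // the expression is usable only after the assignment
  return results;
}

console.log(hoistedFunctionsDemo());
// [ 'function', 'undefined', 'declared', 'expressed' ]
```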
I have a copy of Flanagan's "The Definitive Guide" book from about six years ago, and I remember reading the reverse of what I just wrote here. He said something like: expressions are compiled, and declarations are not. While he is the world's "definitive guide" to JavaScript, I have always suspected he might have gotten this one mixed up and backwards. I suspect that inner function declarations are more "ready to go" than are function expressions. The test results on this Stack Overflow page seem to confirm my long-held suspicions.
Looking at the @cleong test results, it just seems that declaration, not expression, is the way to go for inner functions, if optimal execution speed is a concern.
The original question was asked in 2011. Given the rise of Node.js since then, I thought it's worth revisiting the issue. In a server environment, a few milliseconds here and there can matter a lot. It could be difference between remaining responsive under load or not.
While inner functions are nice conceptually, they can pose problems for the JavaScript engine's code optimizer. The following example illustrates this:
function a1(n) {
return n + 2;
}
function a2(n) {
return 2 - n;
}
function a() {
var k = 5;
for (var i = 0; i < 100000000; i++) {
k = a1(k) + a2(k);
}
return k;
}
function b() {
function b1(n) {
return n + 2;
}
function b2(n) {
return 2 - n;
}
var k = 5;
for (var i = 0; i < 100000000; i++) {
k = b1(k) + b2(k);
}
return k;
}
function measure(label, fn) {
var s = new Date();
var r = fn();
var e = new Date();
console.log(label, e - s);
}
for (var i = 0; i < 4; i++) {
measure('A', a);
measure('B', b);
}
The command for running the code:
node --trace_deopt test.js
The output:
[deoptimize global object # 0x2431b35106e9]
A 128
B 130
A 132
[deoptimizing (DEOPT eager): begin 0x3ee3d709a821 b (opt #5) #4, FP to SP delta: 72]
translating b => node=36, height=32
0x7fffb88a9960: [top + 64] <- 0x2431b3504121 ; rdi 0x2431b3504121 <undefined>
0x7fffb88a9958: [top + 56] <- 0x17210dea8376 ; caller's pc
0x7fffb88a9950: [top + 48] <- 0x7fffb88a9998 ; caller's fp
0x7fffb88a9948: [top + 40] <- 0x3ee3d709a709; context
0x7fffb88a9940: [top + 32] <- 0x3ee3d709a821; function
0x7fffb88a9938: [top + 24] <- 0x3ee3d70efa71 ; rcx 0x3ee3d70efa71 <JS Function b1 (SharedFunctionInfo 0x361602434ae1)>
0x7fffb88a9930: [top + 16] <- 0x3ee3d70efab9 ; rdx 0x3ee3d70efab9 <JS Function b2 (SharedFunctionInfo 0x361602434b71)>
0x7fffb88a9928: [top + 8] <- 5 ; rbx (smi)
0x7fffb88a9920: [top + 0] <- 0 ; rax (smi)
[deoptimizing (eager): end 0x3ee3d709a821 b #4 => node=36, pc=0x17210dec9129, state=NO_REGISTERS, alignment=no padding, took 0.203 ms]
[removing optimized code for: b]
B 1000
A 125
B 1032
A 132
B 1033
As you can see, function A and B ran at the same speed initially. Then for some reason a deoptimization event occurred. From then on B is nearly an order of magnitude slower.
If you're writing code where performance is important, it's best to avoid inner functions.
It completely depends on how often the function is called. If it's an OnUpdate function that is called 10 times per second, it is a decent optimisation. If it's called three times per page, it is a micro-optimisation.
Though handy, nested function definitions are never needed (they can be replaced by extra arguments for the function).
Example with nested function:
function somefunc() {
var localvar = 5
var otherfunc = function() {
alert(localvar);
}
otherfunc();
}
Same thing, now with argument instead:
function otherfunc(localvar) {
alert(localvar);
}
function somefunc() {
var localvar = 5
otherfunc(localvar);
}
It is absolutely a micro-optimization. The whole reason for having functions in the first place is so that you make your code cleaner, more maintainable and more readable. Functions add a semantic boundary to sections of code. Each function should only do one thing, and it should do it cleanly. So if you find your functions performing multiple things at the same time, you've got a candidate for refactoring it into multiple routines.
Only optimize when you've got something working that's too slow (If it's not working yet, it's too early to optimize. Period). Remember, nobody ever paid extra for a program that was faster than their needs/requirements...
Edit: Considering that the program isn't finished yet, it's also a premature optimization. Why is that bad? Well, first you're spending time working on something that may not matter in the long run. Second, you don't have a baseline to see if your optimizations improved anything in a realistic sense. Third, you're reducing maintainability and readability before you've even got it running, so it'll be harder to get running than if you went with clean concise code. Fourth, you don't know if you'll need doMoreStuff somewhere else in the program until you've finished it and understand all your needs (perhaps a longshot depending on the exact details, but not outside the realm of possibility).
There's a reason that Donald Knuth said "Premature optimization is the root of all evil"...
A quick "benchmark" run on an average PC (I know there are lots of unaccounted-for variables, so don't comment on the obvious, but it's interesting in any case):
count = 0;
t1 = +new Date();
while(count < 1000000) {
p = function(){};
++count;
}
t2 = +new Date();
console.log(t2-t1); // milliseconds
It could be optimised by moving the increment to the condition for example (brings running time down by about 100 milliseconds, although it doesn't affect the difference between with and without function creation, so it isn't really relevant)
Running 3 times gave:
913
878
890
Then comment out the function creation line, 3 runs gave:
462
458
464
So purely on 1,000,000 empty function creations you add about half a second. Even assuming your original code is running 10 times a second on a handheld device (let's say that device's overall performance is 1/100 of this laptop, which is exaggerated - it's probably closer to 1/10, although it will provide a nice upper bound), that's equivalent to 1000 function creations/sec on this computer, which happens in 1/2000 of a second. So every second the handheld device is adding overhead of 1/2000 second of processing... half a millisecond every second isn't very much.
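The estimate can be written out as a short calculation. The variable names are made up, the numbers are the rounded figures from the paragraph above, and one function creation per call is assumed:

```javascript
var extraMsPerMillion = 500; // "about half a second" extra per 1,000,000 creations
var callsPerSecond = 10;     // the original code runs 10 times a second
var slowdownFactor = 100;    // handheld assumed to be 100x slower than the laptop

// 10 creations/sec on the slow device cost about as much as 1000 on the laptop.
var laptopEquivalentCreations = callsPerSecond * slowdownFactor;

// Scale the measured cost per million down to that number of creations.
var overheadMsPerSecond = extraMsPerMillion * laptopEquivalentCreations / 1000000;

console.log(laptopEquivalentCreations); // 1000
console.log(overheadMsPerSecond);       // 0.5
```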
From this primitive test I would conclude that on a PC this is definitely a micro-optimisation, and if you're developing for weaker devices, it is almost certainly as well.
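For anyone who wants to repeat the measurement, here is a rough sketch with the increment moved into the loop condition and the two cases wrapped in functions. The function names are illustrative, and a modern engine may optimise the empty loop or the unused function away, so the numbers should be treated with caution:

```javascript
// Time a loop that creates one empty function per iteration.
function timeWithCreation(iterations) {
  var count = 0, p;
  var start = Date.now();
  while (count++ < iterations) {
    p = function () {};
  }
  return Date.now() - start; // elapsed milliseconds
}

// Time the same loop without the function creation, as a baseline.
function timeWithoutCreation(iterations) {
  var count = 0;
  var start = Date.now();
  while (count++ < iterations) {}
  return Date.now() - start; // elapsed milliseconds
}

var withFn = timeWithCreation(1000000);
var withoutFn = timeWithoutCreation(1000000);
console.log("with creation:", withFn, "ms");
console.log("without creation:", withoutFn, "ms");
console.log("difference:", withFn - withoutFn, "ms");
```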
