I've been trying to run an experiment to see whether the local variables in functions are stored on a stack.
So I wrote a little performance test
function test(fn, times){
    var i = times;
    var t = Date.now()
    while(i--){
        fn()
    }
    return Date.now() - t;
}
function straight(){
    var a = 1
    var b = 2
    var c = 3
    var d = 4
    var e = 5
    a = a * 5
    b = Math.pow(b, 10)
    c = Math.pow(c, 11)
    d = Math.pow(d, 12)
    e = Math.pow(e, 25)
}
function inversed(){
    var a = 1
    var b = 2
    var c = 3
    var d = 4
    var e = 5
    e = Math.pow(e, 25)
    d = Math.pow(d, 12)
    c = Math.pow(c, 11)
    b = Math.pow(b, 10)
    a = a * 5
}
I expected the inversed function to be much faster. Instead, an amazing result came out.
Until I test one of the functions, it runs about 10 times faster than it does after testing the second one.
Example:
> test(straight, 10000000)
30
> test(straight, 10000000)
32
> test(inversed, 10000000)
390
> test(straight, 10000000)
392
> test(inversed, 10000000)
390
Same behaviour when testing in the opposite order.
> test(inversed, 10000000)
25
> test(straight, 10000000)
392
> test(inversed, 10000000)
394
I've tested it both in the Chrome browser and in Node.js, and I have absolutely no clue why this would happen.
The effect lasts until I refresh the current page or restart the Node REPL.
What could be the source of such a significant (~12 times worse) performance difference?
PS. Since it seems to occur only in some environments, please mention the environment you're using to test it.
Mine were:
OS: Ubuntu 14.04
Node v0.10.37
Chrome 43.0.2357.134 (Official Build) (64-bit)
/Edit
On Firefox 39 it takes ~5500 ms for each test regardless of the order. It seems to occur only on specific engines.
/Edit2
Inlining the function into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?
Once you call test with two different functions, the fn() call site inside it becomes megamorphic and V8 is unable to inline at it.
Function calls (as opposed to method calls o.m(...)) in V8 are accompanied by a one-element inline cache instead of a true polymorphic inline cache.
Because V8 is unable to inline at the fn() call site, it is unable to apply a variety of optimizations to your code. If you look at your code in IRHydra (I uploaded compilation artifacts to a gist for your convenience) you will notice that the first optimized version of test (when it was specialized for fn = straight) has a completely empty main loop.
V8 just inlined straight and removed all the code you hoped to benchmark via its dead code elimination (DCE) optimization. On an older version of V8, instead of DCE, V8 would simply hoist the code out of the loop via LICM - because the code is completely loop-invariant.
When straight is not inlined, V8 can't apply these optimizations - hence the performance difference. Newer versions of V8 still apply DCE to straight and inversed themselves, turning them into empty functions, so the performance difference is not that big (around 2-3x). Older V8 was not aggressive enough with DCE, and that manifested in a bigger difference between the inlined and non-inlined cases, because the peak performance of the inlined case was solely the result of aggressive loop-invariant code motion (LICM).
On a related note, this shows why benchmarks should never be written like this: their results are not of any use, because you end up measuring an empty loop.
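To illustrate that point, a minimal sketch (my own construction, not from the original post) of a version that resists dead code elimination would make the benchmarked functions return their results and have the harness consume them:

function straightKept(){
    var a = 1, b = 2, c = 3, d = 4, e = 5;
    a = a * 5;
    b = Math.pow(b, 10);
    c = Math.pow(c, 11);
    d = Math.pow(d, 12);
    e = Math.pow(e, 25);
    return a + b + c + d + e;     // the result is now observable
}

function testKept(fn, times){
    var i = times;
    var sink = 0;
    var t = Date.now();
    while(i--){
        sink += fn();             // consuming the return value keeps the work alive
    }
    var elapsed = Date.now() - t;
    console.log(sink);            // print the sink so it cannot be discarded either
    return elapsed;
}

The fn() call site in testKept still becomes megamorphic once you pass it two different functions, so the caveat above about inlining still applies.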
If you are interested in polymorphism and its implications in V8, check out my post "What's up with monomorphism" (the section "Not all caches are the same" talks about the caches associated with function calls). I also recommend reading through one of my talks about the dangers of microbenchmarking, e.g. the most recent "Benchmarking JS" talk from GOTO Chicago 2015 (video) - it might help you avoid common pitfalls.
You're misunderstanding the stack.
While the "real" stack indeed only has the Push and Pop operations, this doesn't really apply for the kind of stack used for execution. Apart from Push and Pop, you can also access any variable at random, as long as you have its address. This means that the order of locals doesn't matter, even if the compiler doesn't reorder it for you. In pseudo-assembly, you seem to think that
var x = 1;
var y = 2;
x = x + 1;
y = y + 1;
translates to something like
push 1 ; x
push 2 ; y
; get y and save it
pop tmp
; get x and put it in the accumulator
pop a
; add 1 to the accumulator
add a, 1
; store the accumulator back in x
push a
; restore y
push tmp
; ... and add 1 to y
In truth, the real code is more like this:
push 1 ; x
push 2 ; y
add [bp], 1
add [bp+4], 1
If the thread stack really was a real, strict stack, this would be impossible, true. In that case, the order of operations and locals would matter much more than it does now. Instead, by allowing random access to values on the stack, you save a lot of work for both the compilers, and the CPU.
To answer your actual question: I suspect neither of the functions actually does anything. You're only ever modifying locals, and your functions aren't returning anything, so it's perfectly legal for the compiler to completely drop the function bodies, and possibly even the function calls. If that's indeed so, whatever performance difference you're observing is probably just a measurement artifact, or something related to the inherent costs of calling a function / iterating.
Inlining the function into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?
Yes, this seems to be exactly what you are observing. As already mentioned by @Luaan, the compiler likely drops the bodies of your straight and inversed functions anyway, because they have no side effects and only manipulate some local variables.
When you are calling test(…, 10000000) for the first time, the optimising compiler realises after some iterations that the fn() being called is always the same, and inlines it, avoiding the costly function call. All that it does now is decrement a variable 10 million times and test it against 0.
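In effect (a rough hand-written sketch of what the optimised code is equivalent to, not actual compiler output), the first optimised version of test behaves as if it were:

function testAfterInlining(times){
    var i = times;
    var t = Date.now();
    while(i--){
        // the inlined body of straight had no observable effects,
        // so nothing is left inside the loop
    }
    return Date.now() - t;
}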
But when you then call test with a different fn, it has to de-optimise. It may later do some other optimisations again, but now, knowing that there are two different functions to be called, it cannot inline them any more.
Since the only thing you're really measuring is the function call, that leads to the grave differences in your results.
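One way to sidestep that - a hedged sketch of my own, assuming you really want to compare the two bodies rather than the cost of a polymorphic call - is to give each benchmarked function its own copy of the harness, so that each call site only ever sees a single target:

function testStraight(times){
    var i = times;
    var t = Date.now();
    while(i--){
        straight();               // this call site only ever calls straight
    }
    return Date.now() - t;
}

function testInversed(times){
    var i = times;
    var t = Date.now();
    while(i--){
        inversed();               // this call site only ever calls inversed
    }
    return Date.now() - t;
}

The bodies can of course still be dead-code-eliminated, as described above, so this only removes the megamorphic-call-site effect, not the empty-loop problem.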
An experiment to see if the local variables in functions are stored on a stack
Regarding your actual question, no, single variables are not stored on a stack (stack machine), but in registers (register machine). It doesn't matter in which order they are declared or used in your function.
Yet, they are stored on the stack, as part of so called "stack frames". You'll have one frame per function call, storing the variables of its execution context. In your case, the stack might look like this:
[straight: a, b, c, d, e]
[test: fn, times, i, t]
…
Related
I discovered a weird behavior in nodejs/chrome/v8. It seems this code:
var x = str.charCodeAt(5);
x = str.charCodeAt(5);
is faster than this
var x = str.charCodeAt(5); // x is not greater than 170
if (x > 170) {
    x = str.charCodeAt(5);
}
At first I thought maybe the comparison is more expensive than the actual second call, but when the content inside the if block is not calling str.charCodeAt(5), the performance is the same as with a single call.
Why is this? My best guess is that V8 is optimizing/deoptimizing something, but I have no idea how exactly to figure this out or how to prevent it from happening.
Here is the link to jsperf that demonstrates this behavior pretty well at least on my machine:
https://jsperf.com/charcodeat-single-vs-ifstatment/1
Background: The reason I discovered this is that I was trying to optimize the token reading inside of babel-parser.
I tested and str.charCodeAt() is twice as fast as str.codePointAt(), so I thought I could replace this code:
var x = str.codePointAt(index);
with
var x = str.charCodeAt(index);
if (x >= 0xaa) {
    x = str.codePointAt(index);
}
But the second code does not give me any performance advantage because of above behavior.
V8 developer here. As Bergi points out: don't use microbenchmarks to inform such decisions, because they will mislead you.
Seeing a result of hundreds of millions of operations per second usually means that the optimizing compiler was able to eliminate all your code, and you're measuring empty loops. You'll have to look at generated machine code to see if that's what's happening.
When I copy the four snippets into a small stand-alone file for local investigation, I see vastly different performance results. Which of the two is closer to your real-world use case? No idea. And that kind of makes any further analysis of what's happening here meaningless.
As a general rule of thumb, branches are slower than straight-line code (on all CPUs, and with all programming languages). So (dead code elimination and other microbenchmarking pitfalls aside) I wouldn't be surprised if the "twice" case actually were faster than either of the two "if" cases. That said, calling String.charCodeAt could well be heavyweight enough to offset this effect.
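If you do want to poke at this outside of jsperf, a minimal stand-alone sketch (my own construction; the string, the iteration count and the threshold are just placeholders) that at least keeps its results alive could look like this:

const str = "hello world";
let sink = 0;                         // consuming results defeats dead code elimination

let t0 = Date.now();
for (let i = 0; i < 1e7; i++) {
    let x = str.charCodeAt(5);
    x = str.charCodeAt(5);
    sink += x;
}
console.log("two unconditional calls:", Date.now() - t0, "ms");

t0 = Date.now();
for (let i = 0; i < 1e7; i++) {
    let x = str.charCodeAt(5);
    if (x > 170) {
        x = str.charCodeAt(5);
    }
    sink += x;
}
console.log("call behind a branch:", Date.now() - t0, "ms");
console.log(sink);                    // keep the sink observable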
I usually prefer polymorphism instead of switch when it's possible. I find it more readable and it requires fewer lines. I believe these facts are enough to continue using it. But what about performance? I've created a pretty simple (and bad) benchmark, and it looks like switch is faster in my case. Could you please explain why?
https://jsfiddle.net/oqzpfqcg/1/
var class1 = { GetImportantValue: () => 1 };
var class2 = { GetImportantValue: () => 2 };
var class3 = { GetImportantValue: () => 3 };
var class4 = { GetImportantValue: () => 4 };
var class5 = { GetImportantValue: () => 5 };
getImportantValueSwitch = (myClassEnum) => {
    switch (myClassEnum.type) {
        case 'MyClass1': return 1;
        case 'MyClass2': return 2;
        case 'MyClass3': return 3;
        case 'MyClass4': return 4;
        case 'MyClass5': return 5;
    }
}
getImportantValuePolymorphism = (myClass) => myClass.GetImportantValue();
test = () => {
    var ITERATION_COUNT = 10000000;
    var t0 = performance.now();
    for (var i = 0; i < ITERATION_COUNT; i++) {
        getImportantValuePolymorphism(class1);
        getImportantValuePolymorphism(class2);
        getImportantValuePolymorphism(class3);
        getImportantValuePolymorphism(class4);
        getImportantValuePolymorphism(class5);
    }
    var t1 = performance.now();
    var t2 = performance.now();
    for (var i = 0; i < ITERATION_COUNT; i++) {
        getImportantValueSwitch({type: 'MyClass1'});
        getImportantValueSwitch({type: 'MyClass2'});
        getImportantValueSwitch({type: 'MyClass3'});
        getImportantValueSwitch({type: 'MyClass4'});
        getImportantValueSwitch({type: 'MyClass5'});
    }
    var t3 = performance.now();
    var first = t1 - t0;
    var second = t3 - t2;
    console.log("The first sample took " + first + " ms");
    console.log("The second sample took " + second + " ms");
    console.log("first / second = " + (first/second));
};
test();
So as far as I understand, the first sample has one dynamic/virtual runtime call, myClass.GetImportantValue(), and that's it. But the second has a dynamic property lookup as well, myClassEnum.type, and then checks the condition in the switch.
Most probably I have some mistakes in the code, but I cannot find them. The only thing that I suppose could affect the result is performance.now(), but I think it does not affect it that much.
V8 developer here. Your intuition is right: this microbenchmark isn't very useful.
One issue is that all your "classes" have the same shape, so the "polymorphic" case is in fact monomorphic. (If you fix this, be aware that V8 has vastly different performance characteristics for <= 4 and >= 5 polymorphic cases!)
One issue is that you're relying on on-stack replacement (OSR) for optimization, so the performance impact of that pollutes your timings in a misleading way -- especially for functions that have this pattern of two subsequent long-running loops: they get OSR-optimized for the first loop, deoptimized in the middle, then OSR-optimized again for the second loop.
One issue is that the compiler inlines many things, so the actually executed machine code can have a very different structure from the JavaScript code you wrote. In particular in this case, getImportantValueSwitch gets inlined, the {type: 'MyClass*'} constant object creations get elided, and the resulting code is just a few comparisons, which are very fast.
One issue is that with small functions, call overhead pretty much dominates everything else. V8's optimizing compiler doesn't currently do polymorphic inlining (because that's not always a win), so significant time is spent calling the () => 1 etc functions. That's unrelated to the fact that they're dynamically dispatched -- retrieving the right function from the object is pretty fast, calling it is what has the overhead. For larger functions, you wouldn't notice it much, but for almost-empty functions, it's quite significant compared to the switch-based alternative that doesn't do any calls.
Long story short: in microbenchmarks, one tends to measure weird effects unrelated to what one intended to measure; and in larger apps, most implementation details like this one don't have measurable impact. Write the code that makes sense to you (is readable, maintainable, etc), let the JavaScript engine worry about the rest! (Exception: Sometimes profiling indicates that your app has a particular bottleneck -- in such cases, hand-optimizing things can have big impact, but that's usually achieved by taking context into account and making the overall algorithm / control flow more efficient, rather than following simple rules of thumb like "prefer polymorphism over switch statements" (or the other way round).)
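As a hedged illustration of the first point above (the class names here are mine, purely for demonstration), giving the objects genuinely different shapes could look like this:

// Instances of different classes have different hidden classes/shapes,
// so the myClass.GetImportantValue() site really does become polymorphic:
class MyClass1 { GetImportantValue() { return 1; } }
class MyClass2 { GetImportantValue() { return 2; } }
class MyClass3 { GetImportantValue() { return 3; } }
class MyClass4 { GetImportantValue() { return 4; } }
class MyClass5 { GetImportantValue() { return 5; } }

const instances = [new MyClass1(), new MyClass2(), new MyClass3(),
                   new MyClass4(), new MyClass5()];

// Five distinct shapes crosses the <= 4 threshold mentioned above,
// so expect different behaviour than with four or fewer shapes.
for (const obj of instances) {
    getImportantValuePolymorphism(obj);
}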
I do not see a "mistake" in your script. Although I really do not encourage performance testing this way, I might still be able to say a couple of things based on my intuition. I do not have solid, well-tested results with control groups etc., so take everything I say with a pinch of salt.
Now, for me it is quite normal to assume that the first option is going to eat the dust of the second, because there are a couple of things more expensive than variable access in JS:
object property access (presumably O(1) hash table, but still slower than variable access)
function call
If we count the function call and object access:
first case: 5 calls [to getImportantValuePolymorphism] x (1 object access [to myClass] + 1 function call [to GetImportantValue]) ===> TOTAL OF 10 function calls + 5 object accesses
second case: 5 calls [to getImportantValueSwitch] + 5 object accesses [to myClassEnum] ===> TOTAL OF 5 function calls + 5 object accesses
One more thing to mention: in the first case you have a function that calls another function, so you end up with a longer scope chain. The net effect of this is minute, but still detrimental in terms of performance.
If we account for all of the above factors, the first will be slower. But how much slower? That is not easy to answer as it will depend on vendor implementations, but in your case it is about 25 times slower in Chrome. Assuming we have double the function calls in the first case plus a scope chain, one would expect it to be 2 or 3 times slower, but not 25.
This disproportionate decrease in performance, I presume, is due to the fact that you are starving the event loop: when you give JS a synchronous task, since it is single-threaded, if the task is a cumbersome one, the event loop cannot proceed and gets stuck for a good second or so. This question comes up when people see strange behavior of setTimeout or other async calls firing way off from their target time frame; as I said, that is because the previous synchronous task is taking way too long. In your case you have a synchronous for loop that iterates 10 million times.
To test my hypothesis, decrease ITERATION_COUNT to 100000, that is 100 times less, and you will see that in Chrome the ratio decreases from ~20 to ~2. So, bottom line 1: part of the inefficiency you observe stems from the fact that you are starving the event loop, but it still does not change the fact that the first option is slower.
To test that function calls are indeed the bottleneck here, change the relevant parts of your script to this:
class1 = class1.GetImportantValue;
class2 = class2.GetImportantValue;
class3 = class3.GetImportantValue;
class4 = class4.GetImportantValue;
class5 = class5.GetImportantValue;
and for the test:
for (var i = 0; i < ITERATION_COUNT; i++) {
    class1();
    class2();
    class3();
    class4();
    class5();
}
Resulting fiddle: https://jsfiddle.net/ibowankenobi/oqzpfqcg/2/
This time you will see that the first one is faster, because it is (5 function calls) vs (5 function calls + 5 object accesses).
Given an array with .length 100, containing elements with values 0 to 99 at the respective indexes, where the requirement is to find the element of the array equal to n: 51.
Why is using a loop to iterate from start of array to end faster than iterating both start to end and end to start?
const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start");
for (let i = 0; i < len; i++) {
    if (arr[i] === n) break;
}
console.timeEnd("iterate from start");
const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start and end");
for (let i = 0, k = len - 1; i < len && k >= 0; i++, k--) {
    if (arr[i] === n || arr[k] === n) break;
}
console.timeEnd("iterate from start and end");
jsperf https://jsperf.com/iterate-from-start-iterate-from-start-and-end/1
The answer is pretty obvious:
More operations take more time.
When judging the speed of code, you look at how many operations it will perform. Just step through and count them. Every instruction will take one or more CPU cycles, and the more there are, the longer it will take to run. That different instructions take different amounts of cycles mostly does not matter - while an array lookup might be more costly than integer arithmetic, both of them basically take constant time, and if there are too many of them, their sheer number dominates the cost of our algorithm.
In your example, there are few different types of operations that you might want to count individually:
comparisons
increments/decrements
array lookup
conditional jumps
(we could be more granular, such as counting variable fetch and store operations, but those hardly matter - everything is in registers anyway - and their number basically is linear to the others).
Now both of your loops iterate about 50 times - the element on which they break the loop is in the middle of the array. Ignoring off-by-a-few errors, these are the counts:
               | forwards | forwards and backwards
---------------+----------+------------------------
>=/===/<       |      100 |                    200
++/--          |       50 |                    100
a[b]           |       50 |                    100
&&/||/if/for   |      100 |                    200
Given that, it's not unexpected that doing twice the work takes considerably longer.
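If you'd rather verify those counts empirically than step through by hand, a quick sketch of my own (counters are bolted on just for illustration) gives roughly the numbers in the table:

const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;

let lookups = 0;
for (let i = 0; i < len; i++) {
    lookups++;                              // one array lookup per iteration
    if (arr[i] === n) break;
}
console.log("forwards lookups:", lookups);  // ~52

lookups = 0;
for (let i = 0, k = len - 1; i < len && k >= 0; i++, k--) {
    lookups++;                              // arr[i]
    if (arr[i] === n) break;
    lookups++;                              // arr[k] (only reached when arr[i] !== n)
    if (arr[k] === n) break;
}
console.log("both-ends lookups:", lookups); // ~98, about twice as many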
I'll also answer a few questions from your comments:
Is additional time needed for the second object lookup?
Yes, every individual lookup counts. It's not like they could be performed at once, or optimised into a single lookup (imaginable if they had looked up the same index).
Should there be two separate loops for each start to end and end to start?
Doesn't matter for the number of operations, just for their order.
Or, put differently still, what is the fastest approach to find an element in an array?
There is no "fastest" regarding the order, if you don't know where the element is (and they are evenly distributed) you have to try every index. Any order - even random ones - would work the same. Notice however that your code is strictly worse, as it looks at each index twice when the element is not found - it does not stop in the middle.
But still, there are a few different approaches at micro-optimising such a loop - check these benchmarks.
let is (still?) slower than var, see Why is using `let` inside a `for` loop so slow on Chrome? and Why is let slower than var in a for loop in nodejs?. This tear-up and tear-down (about 50 times) of the loop body scope in fact does dominate your runtime - that's why your inefficient code isn't completely twice as slow.
comparing against 0 is marginally faster than comparing against the length, which puts looping backwards at an advantage. See Why is iterating through an array backwards faster than forwards, JavaScript loop performance - Why is to decrement the iterator toward 0 faster than incrementing and Are loops really faster in reverse?
in general, see What's the fastest way to loop through an array in JavaScript?: it changes from engine update to engine update. Don't do anything weird, write idiomatic code, that's what will get optimised better.
@Bergi is correct. More operations means more time. Why? More CPU clock cycles.
Time is really a reference to how many clock cycles it takes to execute the code.
In order to get to the nitty-gritty of that you need to look at the machine level code (like assembly level code) to find the true evidence. Each CPU (core?) clock cycle can execute one instruction, so how many instructions are you executing?
I haven't counted clock cycles in a long time, not since programming Motorola CPUs for embedded applications. If your code is taking longer, then it is in fact generating a larger set of machine-code instructions, even if the loop is shorter or runs an equal number of times.
Never forget that your code is actually getting compiled into a set of commands that the CPU is going to execute (memory pointers, instruction-code level pointers, interrupts, etc.). That is how computers work, and it's easier to understand at the microcontroller level, like an ARM or Motorola processor, but the same is true for the sophisticated machines we are running on today.
Your code simply does not run the way you write it (sounds crazy, right?). It is run as it is compiled into machine-level instructions (writing a compiler is no fun). Mathematical expressions and logic can be compiled into quite a heap of assembly / machine-level code, and that is up to how the compiler chooses to interpret it (bit shifting, etc.; remember binary mathematics, anyone?).
Reference:
https://software.intel.com/en-us/articles/introduction-to-x64-assembly
Your question is hard to answer, but as @Bergi stated, the more operations, the longer it takes. Why? The more clock cycles it takes to execute your code. Dual core, quad core, threading, assembly (machine language) - it is complex. No code gets executed as you have written it - C++, C, Pascal, JavaScript, Java - unless you are writing in assembly (and even that compiles down to machine code), which is closer to the actual execution code.
In a master's in CS you will get to counting clock cycles and sort times. You will likely make your own language framed on machine instruction sets.
Most people say who cares? Memory is cheap today and CPUs are screaming fast and getting faster.
But there are some critical applications where 10 ms matters, where an immediate interrupt is needed, etc.
Commerce, NASA, a Nuclear power plant, Defense Contractors, some robotics, you get the idea . . .
I vote let it ride and keep moving.
Cheers,
Wookie
Since the element you're looking for is always roughly in the middle of the array, you should expect the version that walks inward from both the start and end of the array to take about twice as long as one that just starts from the beginning.
Each variable update takes time, each comparison takes time, and you're doing twice as many of them. Since you know it takes only one or two fewer iterations of the loop to terminate in this version, you should reason that it will cost about twice as much CPU time.
This strategy still has O(n) time complexity since it only looks at each item once; it's just specifically worse when the item is near the center of the list. If it's near the end, this approach will have a better expected runtime. Try looking for item 90 with both, for example.
The selected answer is excellent. I'd like to add another aspect: try findIndex(); it's 2-3 times faster than using loops:
const arr = Array.from({length: 900}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start");
for (let i = 0; i < len; i++) {
    if (arr[i] === n) break;
}
console.timeEnd("iterate from start");
console.time("iterate using findIndex");
var i = arr.findIndex(function(v) {
    return v === n;
});
console.timeEnd("iterate using findIndex");
The other answers here cover the main reasons, but I think an interesting addition could be mentioning cache.
In general, sequentially accessing an array will be more efficient, particularly with large arrays. When your CPU reads an array from memory, it also fetches nearby memory locations into cache. This means that when you fetch element n, element n+1 is also probably loaded into cache. Now, cache is relatively big these days, so your 100 int array can probably fit comfortably in cache. However, on an array of much larger size, reading sequentially will be faster than switching between the beginning and the end of the array.
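A rough way to observe the general effect (a sketch of my own; the array size, stride and timings vary a lot between CPUs) is to compare a sequential scan of a large typed array with a cache-unfriendly strided scan over the same elements:

const N = 1 << 24;                          // 16M doubles, far larger than typical caches
const data = new Float64Array(N);
for (let i = 0; i < N; i++) data[i] = i;

let sum = 0;

let t0 = Date.now();
for (let i = 0; i < N; i++) sum += data[i]; // sequential: prefetcher-friendly
console.log("sequential:", Date.now() - t0, "ms");

t0 = Date.now();
const STRIDE = 4096;                        // jump 32 KB between consecutive reads
for (let s = 0; s < STRIDE; s++) {
    for (let i = s; i < N; i += STRIDE) sum += data[i];
}
console.log("strided:", Date.now() - t0, "ms");

console.log(sum);                           // keep the result observable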
Let's say we need to check 1m users, how should it be done?
for (var i = 0; i < 1000000; i++) {
    users[i].abc();
    users[i].abc2();
}
or
for (var i = 0; i < 1000000; i++) {
    var user = users[i];
    user.abc();
    user.abc2();
}
Which one would be faster and why?
The second loop is about 20%-30% faster. See the results of the snippet below. I.e. creation of a reference takes less time than addressing by index in an array.
var users = [];
for (var i = 0; i < 1000000; i++) {
    users.push({abc: function() {}, abc2: function() {}});
}
var now = new Date();
for (var i = 0; i < 1000000; i++) {
    users[i].abc();
    users[i].abc2();
}
console.log('The first loop requires ' + (new Date().getTime() - now.getTime()) + 'ms');
now = new Date();
for (var i = 0; i < 1000000; i++) {
    var user = users[i];
    user.abc();
    user.abc2();
}
console.log('The second loop requires ' + (new Date().getTime() - now.getTime()) + 'ms');
Loop version 1 will run slower but will use less memory. The reason is that it accesses your array via the index i twice per loop iteration.
Loop version 2 will run faster but will consume more memory. The reason is that it accesses the array only once per loop iteration, but creates a variable instance (user).
Having said that, both versions are very similar and all performance / memory usage differences are basically insignificant.
According to https://en.wikipedia.org/wiki/Chrome_V8, the V8 compiler will compile your code to native machine code.
Depending on the optimizations made by the compiler, you have no precise way to guarantee which version will be faster.
As pointed out in other answers, the difference, if any, will not be relevant.
The compiled code is additionally optimized (and re-optimized)
dynamically at runtime, based on heuristics of the code's execution
profile. Optimization techniques used include inlining, elision of
expensive runtime properties, and inline caching, among many others.
So, the points to take in consideration for your case are not based on execution speed.
I would say that if you access users[i] a lot, then dereferencing it into a local user variable is OK, because in the long run it saves you characters to type ("s[i]").
If you access users[i] only once or twice, then stay with that, because dereferencing will only add more lines of code.
In short, I would opt for the code which is the more compact.
UPDATE:
I tried @Alexander Elgin's code and it shows huge differences in local execution, from a 50% to a 20% speed gain, so it is not 'irrelevant' as I or others stated (+1 for him).
But I stand by the idea that it all depends on the optimizations performed by the execution engine; indeed, on my Node.js version, the dereferencing version seems to be a lot faster on huge loops.
I wrote an algorithm to solve N-puzzle problem using breadth-first search. In trying to make things faster I decided to allocate a large array up front instead of repeatedly pushing and shifting values to an empty array.
By chance I noticed an odd behavior where allocating the large array twice actually made the wall-clock time shorter. I created a gist with the complete code, but the part that's giving me odd behavior is here:
var values = new Array(1000000);
function breadthFirstSearch(state, goalState) {
    values = new Array(1000000);
    // implementation of a breadth first search for the n-puzzle problem
}
The typical runtime when values is initialized twice using node.js is about 550ms on my machine. When I comment out one of the places where I am instantiating values so I am only instantiating it once, the runtime increases to about 650ms - 700ms. It seems to me like the runtime should decrease when only allocating the array one time. I created a video to explain what I am doing and that shows the run times on my machine here.
Oddly, when I ran it on repl.it the runtime is about the same whether it's commented out or not, which makes me think it has something to do with the v8 engine.
Can anyone give me an explanation why the wall clock time increases when doing what should be less work?
I have written most of what will be in this answer in the comments, because at the time of writing those comments, this question was locked.
I have extended the original code with better time-measuring calls, which use performance.now, available in modern browsers or as the Node module performance-now. The code is uploaded as this gist.
What I've changed:
Added function init(), which (re)initializes all global variables, otherwise the program won't work across multiple runs.
Duplicated function breadthFirstSearch into breadthFirstSearchUncommented and breadthFirstSearchCommented; the first one initializes values = new Array(1000000); on each call and the second one uses the global variable values as it is, i.e. the statement mentioned before is commented out / removed.
Modified function time to accept which BFS function should be used.
Added function measureTime, which calls time with given BFS function argument and measures min, max and average of runs (argument) measures. At the end, it outputs total time (sum of times for all time calls). Code for this function is below and of course on the gist, at the end of index.js file.
Removed all logging messages; the only log messages left are at the beginning and end of each measureTime call.
Function measureTime:
function measureTime(bfsFn, label, runs) {
    console.log('starting', runs, 'measures for', label);
    var diff = 0;
    var min = Number.MAX_VALUE, max = Number.MIN_VALUE;
    for (var i = 0; i < runs; i++) {
        init();
        var start = now();
        time(bfsFn);
        var d = now() - start;
        diff += d;
        min = Math.min(d, min);
        max = Math.max(d, max);
    }
    var avg = diff / runs;
    console.log('%s - total time for %d measures: %sms', label, runs, diff.toFixed(3));
    console.log('%s - avg time: %sms (%sms - %sms, Δt = %sms)',
        label, avg.toFixed(3), min.toFixed(3), max.toFixed(3), (max - min).toFixed(3));
}
The actual answer: the question is why the program that defines the new array twice, i.e. once at the beginning and once at each BFS function call, was faster than when the re-initialization code was commented out.
The answer is that there is no explanation for this. It's not a result of how JavaScript VMs work, it's not any JavaScript quirks, it's much simpler: The statement in question is actually false.
What the OP was saying - and he was later even kind enough to provide a video explanation - was true only for single runs. The most important reason is that, as we all know, modern operating systems are doing a lot of work simultaneously -- I know this is not exactly true, but bear with me for the sake of this explanation. This means that it's very difficult to get the exact same runtime in multiple runs, so we see some fluctuation in execution times across runs.
To get more accurate measurements, I called the function measureTime 100 times for the commented version and another 100 times for the uncommented version. This created a sample which minimizes the fluctuations.
I created a comparison table for the different execution environments. All tests were run on OS X Yosemite, using the most recent stable versions of the browsers and of io.js as of 6/18/2015.
| Environment | Uncommented time [ms] | Commented time [ms] |
| :-------------- | :---------------------- | :-------------------- |
| Chrome | 460.254 | 424.725 |
| Firefox | 796.471 | 781.662 |
| Safari | 529.492 | 474.580 |
| Node.js (io.js) | 411.804 | 394.807 |
This is the best table I could do since Stack Overflow doesn't support table markdown. For an actual table style, take a look at the gist.
In this table, it's clear that the original statement - that the code with the uncommented array re-initialization is faster - is false for all four environments I tested.
The lesson to be learned here is: do not draw performance conclusions based on just a few runs. Data from multiple test runs should be collected and interpreted as an average of all values, or, in some cases, more sophisticated statistical methods should be used to get more accurate results.
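As a concrete example, a small helper of my own (the names and the number of runs are arbitrary) that reports mean, standard deviation, min and max over many runs already says far more than a single timing does:

function runStats(fn, runs) {
    const times = [];
    for (let i = 0; i < runs; i++) {
        const start = Date.now();           // performance.now() gives finer resolution where available
        fn();
        times.push(Date.now() - start);
    }
    const mean = times.reduce((a, b) => a + b, 0) / runs;
    const variance = times.reduce((a, b) => a + (b - mean) * (b - mean), 0) / runs;
    return {
        mean: mean,
        stddev: Math.sqrt(variance),
        min: Math.min(...times),
        max: Math.max(...times)
    };
}

// Hypothetical usage:
// console.log(runStats(() => breadthFirstSearchCommented(initialState, goalState), 100));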