Performance: Switch vs Polymorphism - JavaScript

I usually prefer polymorphism instead of switch when it is possible. I find it more readable and it requires fewer lines. I believe these facts are enough to keep using it. But what about performance? I've created a pretty simple (and bad) benchmark, and it looks like switch is faster in my case. Could you please explain why?
https://jsfiddle.net/oqzpfqcg/1/
var class1 = { GetImportantValue: () => 1 };
var class2 = { GetImportantValue: () => 2 };
var class3 = { GetImportantValue: () => 3 };
var class4 = { GetImportantValue: () => 4 };
var class5 = { GetImportantValue: () => 5 };
getImportantValueSwitch = (myClassEnum) => {
    switch (myClassEnum.type) {
        case 'MyClass1': return 1;
        case 'MyClass2': return 2;
        case 'MyClass3': return 3;
        case 'MyClass4': return 4;
        case 'MyClass5': return 5;
    }
}
getImportantValuePolymorphism = (myClass) => myClass.GetImportantValue();
test = () => {
    var ITERATION_COUNT = 10000000;
    var t0 = performance.now();
    for (var i = 0; i < ITERATION_COUNT; i++) {
        getImportantValuePolymorphism(class1);
        getImportantValuePolymorphism(class2);
        getImportantValuePolymorphism(class3);
        getImportantValuePolymorphism(class4);
        getImportantValuePolymorphism(class5);
    }
    var t1 = performance.now();
    var t2 = performance.now();
    for (var i = 0; i < ITERATION_COUNT; i++) {
        getImportantValueSwitch({type: 'MyClass1'});
        getImportantValueSwitch({type: 'MyClass2'});
        getImportantValueSwitch({type: 'MyClass3'});
        getImportantValueSwitch({type: 'MyClass4'});
        getImportantValueSwitch({type: 'MyClass5'});
    }
    var t3 = performance.now();
    var first = t1 - t0;
    var second = t3 - t2;
    console.log("The first sample took " + first + " ms");
    console.log("The second sample took " + second + " ms");
    console.log("first / second = " + (first / second));
};
test();
So as far as I understand, the first sample has one dynamic/virtual runtime call, myClass.GetImportantValue(), and that's it. But the second has a dynamic property access as well, myClassEnum.type, and then has to check the conditions in the switch.
Most probably I have some mistake in the code, but I cannot find it. The only thing that I suppose could affect the result is performance.now(), but I don't think it affects it that much.

V8 developer here. Your intuition is right: this microbenchmark isn't very useful.
One issue is that all your "classes" have the same shape, so the "polymorphic" case is in fact monomorphic. (If you fix this, be aware that V8 has vastly different performance characteristics for <= 4 and >= 5 polymorphic cases!)
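For instance, a sketch of one way (among others, and my assumption of a setup) to give the five objects genuinely distinct shapes, using distinct constructors; the fifth class is what crosses the polymorphic threshold mentioned above:
class MyClass1 { GetImportantValue() { return 1; } }
class MyClass2 { GetImportantValue() { return 2; } }
class MyClass3 { GetImportantValue() { return 3; } }
class MyClass4 { GetImportantValue() { return 4; } }
class MyClass5 { GetImportantValue() { return 5; } } // 5th shape: megamorphic territory
// Instances of distinct classes get distinct hidden classes (shapes),
// so the GetImportantValue lookup becomes genuinely polymorphic:
var class1 = new MyClass1(), class2 = new MyClass2(), class3 = new MyClass3(),
    class4 = new MyClass4(), class5 = new MyClass5();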
One issue is that you're relying on on-stack replacement (OSR) for optimization, so the performance impact of that pollutes your timings in a misleading way -- especially for functions that have this pattern of two subsequent long-running loops: they get OSR-optimized for the first loop, deoptimized in the middle, then OSR-optimized again for the second loop.
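One way around that (a sketch under my assumptions; even this harness has pitfalls of its own, e.g. the fn() callsite inside timeIt itself becomes polymorphic) is to give each timed loop its own function and do an untimed warm-up run first:
var timeIt = (label, fn, n) => {
    var t0 = performance.now();
    for (var i = 0; i < n; i++) fn();
    console.log(label + ": " + (performance.now() - t0) + " ms");
};
var benchPoly = () => getImportantValuePolymorphism(class1);
var benchSwitch = () => getImportantValueSwitch({type: 'MyClass1'});
timeIt("warmup poly", benchPoly, 100000);     // warm-up: let the optimizer kick in
timeIt("poly", benchPoly, 10000000);          // timed run
timeIt("warmup switch", benchSwitch, 100000);
timeIt("switch", benchSwitch, 10000000);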
One issue is that the compiler inlines many things, so the actually executed machine code can have a very different structure from the JavaScript code you wrote. In particular in this case, getImportantValueSwitch gets inlined, the {type: 'MyClass*'} constant object creations get elided, and the resulting code is just a few comparisons, which are very fast.
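If you would rather take those allocations out of the equation yourself instead of relying on elision, a sketch (my assumption of a fairer setup) is to hoist the enum objects out of the timed loop:
// Hoisted once; V8 may elide the literal allocations anyway:
var enum1 = {type: 'MyClass1'};
var enum2 = {type: 'MyClass2'};
var enum3 = {type: 'MyClass3'};
var enum4 = {type: 'MyClass4'};
var enum5 = {type: 'MyClass5'};
// ...then, inside the timed loop:
getImportantValueSwitch(enum1);
getImportantValueSwitch(enum2);
getImportantValueSwitch(enum3);
getImportantValueSwitch(enum4);
getImportantValueSwitch(enum5);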
One issue is that with small functions, call overhead pretty much dominates everything else. V8's optimizing compiler doesn't currently do polymorphic inlining (because that's not always a win), so significant time is spent calling the () => 1 etc functions. That's unrelated to the fact that they're dynamically dispatched -- retrieving the right function from the object is pretty fast, calling it is what has the overhead. For larger functions, you wouldn't notice it much, but for almost-empty functions, it's quite significant compared to the switch-based alternative that doesn't do any calls.
Long story short: in microbenchmarks, one tends to measure weird effects unrelated to what one intended to measure; and in larger apps, most implementation details like this one don't have measurable impact. Write the code that makes sense to you (is readable, maintainable, etc), let the JavaScript engine worry about the rest! (Exception: Sometimes profiling indicates that your app has a particular bottleneck -- in such cases, hand-optimizing things can have big impact, but that's usually achieved by taking context into account and making the overall algorithm / control flow more efficient, rather than following simple rules of thumb like "prefer polymorphism over switch statements" (or the other way round).)

I do not see a "mistake" in your script. Although I really do not encourage performance testing this way, I might still be able to say a couple of things based on my intuition. I do not have solid, well-tested results with control groups etc., so take everything I say with a pinch of salt.
Now, for me it is quite normal to assume that the first option is going to eat the dust of the second, because there are a couple of things more expensive than variable access in JS:
object property access (presumably O(1) hash table, but still slower than variable access)
function call
If we count the function calls and object accesses:
first case: 5 calls [to getImportantValuePolymorphism] x (1 object access [to myClass] + 1 function call [to GetImportantValue]) ===> TOTAL OF 10 function calls + 5 object accesses
second case: 5 calls [to getImportantValueSwitch] + 5 object accesses [to myClassEnum] ===> TOTAL OF 5 function calls + 5 object accesses
One more thing to mention: in the first case, you have a function that calls another function, so you end up with a scope chain. The net effect of this is minute, but still detrimental in terms of performance.
If we account for all the above factors, the first will be slower. But how much slower? That is not easy to answer, as it will depend on vendor implementations, but in your case it is about 25 times slower in Chrome. Assuming we have double the function calls in the first case plus a scope chain, one would expect it to be 2 or 3 times slower, but not 25.
This exponential decrease in performance, I presume, is due to the fact that you are starving the event loop. When you give a synchronous task to JS, since it is single threaded, if the task is a cumbersome one, the event loop cannot proceed and gets stuck for a good second or so. This question comes up when people see strange behavior of setTimeout or other async calls, firing way off from the target time frame. That is, as I said, because a previous synchronous task is taking way too long. In your case you have a synchronous for loop that iterates 10 million times.
To test my hypothesis, decrease the ITERATION_COUNT to 100000, that is, 100 times less; you will see that in Chrome the ratio decreases from ~20 to ~2. So, bottom line 1: part of the inefficiency you observe stems from starving the event loop, but it still does not change the fact that the first option is slower.
To test that function calls are indeed the bottleneck here, change the relevant parts of your script to this:
class1 = class1.GetImportantValue;
class2 = class2.GetImportantValue;
class3 = class3.GetImportantValue;
class4 = class4.GetImportantValue;
class5 = class5.GetImportantValue;
and for the test:
for (var i = 0; i < ITERATION_COUNT; i++) {
    class1();
    class2();
    class3();
    class4();
    class5();
}
Resulting fiddle: https://jsfiddle.net/ibowankenobi/oqzpfqcg/2/
This time you will see that the first one is faster, because it is (5 function calls) vs (5 function calls + 5 object accesses).

Related

Javascript loop performance

Let's say we need to check 1M users; how should it be done?
for (var i = 0; i < 1000000; i++){
    users[i].abc();
    users[i].abc2();
}
or
for (var i = 0; i < 1000000; i++){
    var user = users[i];
    user.abc();
    user.abc2();
}
Which one would be faster and why?
The second loop is about 20%-30% faster. See the results of the snippet below. That is, creating a reference takes less time than addressing the array by index.
var users = [];
for (var i = 0; i < 1000000; i++){
    users.push({abc: function() {}, abc2: function() {}});
}
var now = new Date();
for (var i = 0; i < 1000000; i++){
    users[i].abc();
    users[i].abc2();
}
console.log('The first loop requires ' + (new Date().getTime() - now.getTime()) + 'ms');
now = new Date();
for (var i = 0; i < 1000000; i++){
    var user = users[i];
    user.abc();
    user.abc2();
}
console.log('The second loop requires ' + (new Date().getTime() - now.getTime()) + 'ms');
Loop version 1 will run slower but use less memory. The reason is that it accesses the array via the iterator i twice per loop iteration.
Loop version 2 will run faster but consume more memory. The reason is that it accesses the array only once per loop iteration, but creates a variable (user) holding the reference.
Having said that, both versions are very similar, and all performance / memory usage differences are basically insignificant.
According to https://en.wikipedia.org/wiki/Chrome_V8, the V8 compiler will compile your code to native machine code.
Depending on the optimizations made by the compiler, you have no precise way to guarantee which version will be faster.
As pointed out in other answers, the difference, if any, will not be relevant.
The compiled code is additionally optimized (and re-optimized) dynamically at runtime, based on heuristics of the code's execution profile. Optimization techniques used include inlining, elision of expensive runtime properties, and inline caching, among many others.
So, the points to take into consideration for your case are not based on execution speed.
I would say that if you access users[i] a lot, then dereferencing it into a local user variable is fine, because in the long run it saves you characters to type ("s[i]").
If you access users[i] only once or twice, then stay with that, because dereferencing would only add more lines of code.
In short, I would opt for whichever code is more compact.
UPDATE:
I tried @Alexander Elgin's code and it shows huge differences on local execution, ranging from a 20% to 50% speed gain, so it is not 'irrelevant' as I and others stated (+1 for him).
But I stand by the idea that it all depends on the optimizations performed by the execution engine; indeed, on my Node.js version, the dereferencing seems to be a lot faster on huge loops.
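As an aside (not from the original answers): modern JavaScript also offers for...of, which sidesteps the indexing-versus-reference question entirely, at the cost of some iterator-protocol overhead:
// Neither double indexing nor a manual temporary variable:
for (const user of users) {
    user.abc();
    user.abc2();
}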

Which Boolean is Faster? < or <=

I'm doing some work involving processing an insane amount of data in the browser. As a result I'm trying to optimize everything down to the nuts and bolts. I don't need anyone telling me that I'm wasting my time or that premature optimization is the root of all evil.
I would just like to know if anyone who understands how JS works can tell whether a less-than comparison runs faster than a less-than-or-equals comparison. What I mean by that is, would:
return (i<2? 0:1)
Be parsed and run faster than:
return (i<=1? 0:1)
In this example we're assuming that i is an integer. Thanks.
The JavaScript standard describes the steps that need to be taken in order to evaluate those expressions. You can take a look at the ECMAScript 2015 Language Specification, section 12.9.3.
Be aware that even if there is a slight difference between the steps of those two operations, other things in your application that you cannot control in JavaScript will have much more influence on performance than these simple operations, for example the work of the garbage collector or the just-in-time compiler.
Even if you try measuring time in JavaScript, this will not work, as just taking timestamps has a much bigger influence on performance than the actual expression you want to measure. Also, the code that you wrote might not be the one that is really evaluated, as some pre-optimizations might be applied by the engine prior to actually running the code.
I wouldn't call this micro-optimisation, but rather nano-optimisation.
The cases are so similar that you'll most likely have a measurement precision below the gain you can expect...
(Edit)
If this code is optimised, the generated assembly code will just change from JA to JAE (on x86), and they use the same cycle count: a 0.0000% change.
If it is not, you might win one step within a select of the engine...
The annoying thing is that it makes you miss the larger picture: unless I'm wrong, you need a branch here, and if you're that worried about time, the statistical distribution of your input will influence the execution time WAY more (but still not that much...).
So take a step back and compare:
if (i < 2)
    return 0;
else
    return 1;
and:
if (i >= 2)
    return 1;
else
    return 0;
You see that for (100, 20, 10, 1, 50, 10), (1) will branch way more, and for (0, 1, 0, 0, 20, 1), (2) branches more.
That will make much more of a difference... and it might just as well be very difficult to measure!
(As a question left to the reader, I wonder how return +(i>1) compiles, and if there's a trick to avoid branching...)
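On that aside, a quick sketch of branch-free variants at the JavaScript source level (whether the engine emits a machine-level branch underneath is up to the JIT):
function flagA(i) { return +(i > 1); }    // boolean coerced to 0/1
function flagB(i) { return (i > 1) | 0; } // bitwise coercion, same result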
(By the way, I'm not against early optimisation; I even posted some advice here, if it might interest you: https://gamealchemist.wordpress.com/2016/04/15/writing-efficient-javascript-a-few-tips/ )
I have created a fiddle using the performance.now API and the console.time API.
Both APIs report how many ms it took to execute the functions/loops.
I feel the major difference is in the result: performance.now gives a more accurate value, i.e. up to 1/1000th of a ms.
https://jsfiddle.net/ztacgxf1/
function lessThan(){
    var t0 = performance.now();
    console.time("lessThan");
    for (var i = 0; i < 100000; i++){
        if (i < 1000){}
    }
    console.timeEnd("lessThan");
    var t1 = performance.now();
    console.log("Perf -- >>" + (t1 - t0));
}
function lessThanEq(){
    var t0 = performance.now();
    console.time("lessThanEq");
    for (var i = 0; i < 100000; i++){
        if (i <= 999){}
    }
    console.timeEnd("lessThanEq");
    var t1 = performance.now();
    console.log("Perf -- >>" + (t1 - t0));
}
lessThan();
lessThanEq();
I didn't see much difference. Maybe iterating more would give a different result.
Hope this helps you.

What makes this function run much slower?

I've been trying to run an experiment to see if the local variables in functions are stored on a stack.
So I wrote a little performance test:
function test(fn, times){
    var i = times;
    var t = Date.now();
    while (i--){
        fn();
    }
    return Date.now() - t;
}
function straight(){
    var a = 1;
    var b = 2;
    var c = 3;
    var d = 4;
    var e = 5;
    a = a * 5;
    b = Math.pow(b, 10);
    c = Math.pow(c, 11);
    d = Math.pow(d, 12);
    e = Math.pow(e, 25);
}
function inversed(){
    var a = 1;
    var b = 2;
    var c = 3;
    var d = 4;
    var e = 5;
    e = Math.pow(e, 25);
    d = Math.pow(d, 12);
    c = Math.pow(c, 11);
    b = Math.pow(b, 10);
    a = a * 5;
}
I expected the inversed function to run much faster. Instead, an amazing result came out:
whichever function I test first runs about 10 times faster than it does after I have also tested the second one.
Example:
> test(straight, 10000000)
30
> test(straight, 10000000)
32
> test(inversed, 10000000)
390
> test(straight, 10000000)
392
> test(inversed, 10000000)
390
Same behaviour when tested in the alternative order.
> test(inversed, 10000000)
25
> test(straight, 10000000)
392
> test(inversed, 10000000)
394
I've tested it both in the Chrome browser and in Node.js, and I have absolutely no clue why this would happen.
The effect lasts until I refresh the current page or restart the Node REPL.
What could be the source of such a significant (~12 times worse) performance drop?
PS. Since it seems to occur only in some environments, please mention the environment you're using to test it.
Mine were:
OS: Ubuntu 14.04
Node v0.10.37
Chrome 43.0.2357.134 (Official Build) (64-bit)
/Edit
On Firefox 39 it takes ~5500 ms for each test regardless of the order. It seems to occur only on specific engines.
/Edit2
Inlining the function into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it is always the same function?
Once you call test with two different functions, the fn() callsite inside it becomes megamorphic and V8 is unable to inline at it.
Function calls (as opposed to method calls o.m(...)) in V8 are accompanied by a one-element inline cache instead of a true polymorphic inline cache.
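To illustrate the two callsite kinds being contrasted (a minimal sketch, not a picture of V8 internals):
function callFunction(fn) { return fn(); } // fn() - function call: one-element inline cache
function callMethod(o) { return o.m(); }   // o.m() - method call: true polymorphic inline cache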
Because V8 is unable to inline at the fn() callsite, it is unable to apply a variety of optimizations to your code. If you look at your code in IRHydra (I uploaded the compilation artifacts to a gist for your convenience), you will notice that the first optimized version of test (when it was specialized for fn = straight) has a completely empty main loop.
V8 just inlined straight and removed all the code you hoped to benchmark with its Dead Code Elimination optimization. On an older version of V8, instead of DCE, V8 would just hoist the code out of the loop via LICM, because the code is completely loop-invariant.
When straight is not inlined, V8 can't apply these optimizations - hence the performance difference. A newer version of V8 would still apply DCE to straight and inversed themselves, turning them into empty functions, so the performance difference is not that big (around 2-3x). Older V8 was not aggressive enough with DCE, which would manifest in a bigger difference between the inlined and non-inlined cases, because the peak performance of the inlined case was solely the result of aggressive loop-invariant code motion (LICM).
On a related note, this shows why benchmarks should never be written like this: their results are not of any use, as you end up measuring an empty loop.
If you are interested in polymorphism and its implications in V8, check out my post "What's up with monomorphism" (the section "Not all caches are the same" talks about the caches associated with function calls). I also recommend reading through one of my talks about the dangers of microbenchmarking, e.g. the most recent "Benchmarking JS" talk from GOTO Chicago 2015 (video) - it might help you avoid common pitfalls.
You're misunderstanding the stack.
While the "real" stack indeed only has the Push and Pop operations, this doesn't really apply to the kind of stack used for execution. Apart from Push and Pop, you can also access any variable at random, as long as you have its address. This means that the order of locals doesn't matter, even if the compiler doesn't reorder them for you. In pseudo-assembly, you seem to think that
var x = 1;
var y = 2;
x = x + 1;
y = y + 1;
translates to something like
push 1 ; x
push 2 ; y
; get y and save it
pop tmp
; get x and put it in the accumulator
pop a
; add 1 to the accumulator
add a, 1
; store the accumulator back in x
push a
; restore y
push tmp
; ... and add 1 to y
In truth, the real code is more like this:
push 1 ; x
push 2 ; y
add [bp], 1
add [bp+4], 1
If the thread stack really were a real, strict stack, this would be impossible, true. In that case, the order of operations and locals would matter much more than it does now. Instead, by allowing random access to values on the stack, you save a lot of work for both the compilers and the CPU.
To answer your actual question: I suspect neither of the functions actually does anything. You're only ever modifying locals, and your functions aren't returning anything, so it's perfectly legal for the compiler to completely drop the function bodies, and possibly even the function calls. If that's indeed so, whatever performance difference you're observing is probably just a measurement artifact, or something related to the inherent costs of calling a function / iterating.
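A sketch of one possible fix along those lines (my assumption, not the only way): return the computed values and consume them, so dead code elimination cannot drop the bodies.
function straightObservable(){
    var a = 1 * 5;
    var b = Math.pow(2, 10);
    var c = Math.pow(3, 11);
    var d = Math.pow(4, 12);
    var e = Math.pow(5, 25);
    return a + b + c + d + e; // returning the result keeps the body live
}
function testObservable(fn, times){
    var sum = 0;
    var t = Date.now();
    while (times--){
        sum += fn();
    }
    console.log((Date.now() - t) + 'ms, checksum: ' + sum); // using sum defeats DCE
}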
Inlining the function to the test function makes it run always the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?
Yes, this seems to be exactly what you are observing. As already mentioned by @Luaan, the compiler likely drops the bodies of your straight and inversed functions anyway, because they have no side effects and only manipulate some local variables.
When you call test(…, 10000000) for the first time, the optimising compiler realises after some iterations that the fn() being called is always the same, and inlines it, avoiding the costly function call. All that it does now is decrement a variable 10 million times and test it against 0.
But when you then call test with a different fn, it has to de-optimise. It may later do some other optimisations again, but now, knowing that there are two different functions to be called, it cannot inline them any more.
Since the only thing you're really measuring is the function call, that leads to the big differences in your results.
An experiment to see if the local variables in functions are stored on a stack
Regarding your actual question, no, single variables are not stored on a stack (stack machine), but in registers (register machine). It doesn't matter in which order they are declared or used in your function.
Yet, they are stored on the stack as part of so-called "stack frames". You'll have one frame per function call, storing the variables of its execution context. In your case, the stack might look like this:
[straight: a, b, c, d, e]
[test: fn, times, i, t]
…

Javascript perf, weird results

I want to know which is the better way to code in JavaScript for my Node.js project, so I did this:
function clas(){
}
clas.prototype.index = function(){
    var i = 0;
    while (i < 1000){
        i++;
    }
};
var t1 = new clas();
var f = 0;
var d1 = new Date();
while (f < 1000){
    t1.index();
    f++;
}
console.log("t1: " + (new Date() - d1) + "ms");
f = 0;
var d2 = new Date();
while (f < 1000){
    var t2 = new clas();
    t2.index();
    f++;
}
console.log("t2: " + (new Date() - d2) + "ms");
In my browser, the first and the second are the same: 1ms. But with Node.js, I get t1 = 15ms and t2 = 1ms. Why? Why does the first take more time than the second, given that it doesn't initialise my class?
There are several issues here. Your example shows that you have very little experience in benchmarking or system performance. That is why I recommend brushing up on the very basics, and until you have more feel for it, don't try optimizing at all. Optimizing prematurely is generally a bad thing. If done by someone who does not know anything about performance optimization in the first place, "optimizations" end up being pure noise: some work and some don't, pretty much at random.
For completeness, here are some things that are wrong with your test case:
First of all, 1000 iterations is not enough for a performance test. You want iterations on the order of millions for your CPU to actually spend a remarkable amount of time on it.
Secondly, for benchmarking, you want to use a high-performance timer. The reason node gives you 15ms is that it uses a coarse-grained system timer whose smallest unit is about 15ms, which most probably corresponds to your system's scheduling granularity. (A sketch of a higher-resolution alternative follows after this list.)
Thirdly, regarding your actual question: allocating a new object inside your loop, if not necessary, is almost always a bad choice for performance. There is a lot going on under the hood, including the possibility of heap allocations. However, in your simple case, most runtimes will probably optimize away most of the overhead, for two reasons:
Your test case is too simple, and the optimizer can easily optimize simple code segments, but has a much harder time in real situations.
Your test case is transient. If the optimizer is smart enough, it will detect that, and it will skip the entire loop.
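For example, a sketch using Node's high-resolution timer, process.hrtime (the iteration count here is my assumption; adjust to taste):
var t1 = new clas();
var start = process.hrtime();              // [seconds, nanoseconds]
for (var i = 0; i < 1000000; i++) {
    t1.index();
}
var diff = process.hrtime(start);          // elapsed time as [seconds, nanoseconds]
console.log('t1: ' + (diff[0] * 1e9 + diff[1]) + ' ns');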
It is because Node does just-in-time (JIT) compilation optimizations to the code.
By JIT optimization, we mean that Node tries to optimize the code while it is being executed.
So... the first call to the function takes more time, and Node realizes that it can optimize this loop, as it does nothing at all, whereas for all other calls the optimized loop is executed.
So... subsequent calls will take less time.
You can try changing the order: the first call will take more time.
Whereas in some browsers, the code is optimized ahead of time (i.e. before running the code).

Improve performance in js `for` loop

So I'm looking for some advice on the best method for toggling the class (a set of three) of an element in a loop running 360 iterations. I'm trying to avoid nested loops and ensure good performance.
What I have:
// jQuery flavour js
// vars
var framesCount = '360'; // total frames
var framesInterval = '5000'; // interval
var statesCount = 3; // number of states
var statesCountSplit = framesInterval/statesCount;
var $scene = $('#scene');
var $counter = $scene.find('.counter');
// An early brain dump
for (f = 1; f < framesCount; f += 1) {
    $counter.text(f);
    for (i = 1; i < statesCount; i += 1) {
        setTimeout(function() {
            $scene.removeClass().addClass('state-' + i);
        }, statesCountSplit);
    }
}
So you see, for each of the 360 frames there are three class switchouts at intervals. Although I haven't tested it, I'm concerned about the performance hit here once that frames value goes into the thousands (which it might).
This snippet is obviously flawed (very), please let me know what I can do to make this a) work, b) work efficiently. Thanks :-)
Some general advice:
1) Don't declare functions in a loop
Does this really need to be done in a setTimeout?
for (i = 1; i < statesCount; i += 1) {
    setTimeout(function() {
        $scene.removeClass().addClass('state-' + i);
    }, statesCountSplit);
}
2) DOM operations are expensive
Is this really necessary? This will toggle so fast that you won't notice the counter going up. I don't understand the intent here, but it seems unnecessary.
$counter.text(f);
3) Don't optimize early
In your question, you stated that you haven't profiled the code in question. Currently, there are only about 1000 iterations, which shouldn't be that bad. DOM operations aren't too bad as long as you aren't inserting/removing elements and are just modifying them.
I really wouldn't worry about performance at this point. There are other micro-optimizations you could apply (like changing the for loop into a decrementing while loop to save on a compare; a sketch follows below), but you gave no indication that performance is a problem.
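For what it's worth, that decrementing-while pattern would look like this (a sketch; note it counts down, and any gain is typically negligible in modern engines):
var f = framesCount;
while (f--) {
    $counter.text(f); // runs framesCount times, in reverse order
}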
Closing thoughts
If I understand the logic correctly, your code doesn't match it. The code will currently increment .counter as fast as the processor can iterate over your loops (it should only take a few milliseconds for everything), and each of your "class switchouts" will fire 360 times within a few milliseconds of each other.
Fix your logic errors first, then worry about optimization if it becomes a problem.
Don't use a for loop for this. It will generate lots of setTimeout events, which is known to slow browsers down. Use a single setTimeout chain instead:
function animate(framesCount, statesCount) {
    $scene.removeClass().addClass('state-' + statesCount);
    if (framesCount) {
        setTimeout(
            function(){
                animate(framesCount - 1, (statesCount % 3) + 1);
            },
            statesCountSplit
        );
    }
}
animate(360 * 3, 1);
