What's the point of running unit tests in random order? - javascript

I noticed the default behavior of Karma for my Angular builds is to run the Jasmine unit tests in random order. What is the benefit of running your tests in random order, vs running them in the same order every time?

Sometimes tests modify state that doesn't get reset between test runs. Maybe a test modifies a global variable that is shared by all tests. Maybe a test writes something to the database and doesn't clean it up when the test is done. These things shouldn't happen, but sometimes they do.
For example:
// Stupid contrived example. Don't ever do this.
let num = 0
it('A', () => {
  expect(num + 1).toEqual(1)
})
it('B', () => {
  num = 10
  expect(num + 2).toEqual(12)
})
it('C', () => {
  expect(num + 3).toEqual(13) // passes if B runs before C, otherwise fails
})
This may work fine if you always run test A, then B, then C. But strange things may happen if you run only test C. Because of the change that test B made, C passes when you run them all together. But run test C alone and it fails.
And now you are staring at your terminal output dumbfounded, muttering "how on earth can test C fail?! It just passed!"
Randomizing the order helps with this a bit. Each test should be totally isolated and it shouldn't matter what order they run in. Randomizing them may help uncover this badness.
So, given that the order shouldn't matter, then why not randomize the order?
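If you want to control that behaviour in Karma, the karma-jasmine adapter lets you pass Jasmine's own random and seed options through the client config. A minimal karma.conf.js sketch (the seed value is just an example; pinning it lets you reproduce a particular failing order):
// karma.conf.js - sketch assuming the karma-jasmine adapter
module.exports = function (config) {
  config.set({
    frameworks: ['jasmine'],
    client: {
      jasmine: {
        random: true, // run specs in random order (Jasmine's default since 3.0)
        seed: '4321'  // pin the seed to reproduce a specific order
      }
    }
  })
}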

Related

How to Avoid or Test Randomness in Functional Programming

I started making a dungeon game that randomly creates new dungeons.
Currently I'm writing the function createRoom, which returns a new room object based on its arguments. This function is called in createDungeon, but it receives random parameters there.
function createRoom (width = 1, height = 1, topLeftCoordinate = {x: 0, y: 0}) {
  return {
    width,
    height,
    topLeftCoordinate
  }
}
function createDungeon (numberOfRooms, rooms = []) {
  if (numberOfRooms <= 0) {
    return rooms
  }
  const randomWidth = getRandomNumber(5, 10)
  const randomHeight = getRandomNumber(5, 10)
  const randomTopLeftCoordinate = {x: getRandomNumber(5, 10), y: getRandomNumber(5, 10)}
  return createDungeon(
    numberOfRooms - 1,
    rooms.concat(createRoom(randomWidth, randomHeight, randomTopLeftCoordinate))
  )
}
I don't know if this is the right way, because I don't know how to test createDungeon. I can only test whether this function returns an array and the length of the array. Is this enough, or is there a design pattern for the randomness?
Well, first off I'm assuming that your getRandomNumber is in fact a pseudorandom seed-based generator with a global seed. To make it more in the spirit of true FP, you'd need to make the seed/generator passing and mutation explicit, but that's not something you absolutely have to do.
Now, the answer depends on what you want to test. If you need to make sure that your random generation provides the same values for a given seed (which is important when e.g. you want to have "world seeds" like Minecraft does), then it's enough to hardcode the seed and then proceed with known output.
An important note: when using a global random number generator, every number "drawn out" of it affects the numbers that follow. This means that if you later change your test code to draw some other numbers before the existing test cases, your hardcoded expected values will no longer match. You can mitigate that by ensuring that every independent test run starts with a fresh generator.
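For illustration, here is a minimal sketch of that idea. The makeGenerator function and its linear-congruential formula are made up for the example, not part of the question's code; the point is only that every test starts from its own generator instance with a known seed:
// Hypothetical seeded generator (a tiny LCG), for illustration only.
function makeGenerator (seed) {
  let state = seed
  return {
    getRandomNumber (min, max) {
      state = (state * 1103515245 + 12345) % 2147483648
      return min + (state % (max - min + 1))
    }
  }
}

let generator

beforeEach(() => {
  // Each test gets a fresh generator, so draws made by earlier tests
  // can't shift the values that later tests see.
  generator = makeGenerator(42)
})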
If you want to test the behavior in a more "reasonable" way, that is, whether the function generally behaves ok, you'll need to use more seeds and run it multiple times. Now whether the seeds are themselves random or hardcoded doesn't matter; the important difference is that your validation rules now can't test for specific value equality, but instead need to check for boundaries or some other range criteria.
In general, test execution should be deterministic and repeatable, so dealing with randomness in tests should be avoided (there's a great book, Lasse Koskela's "Effective Unit Testing", where you can find out how to deal with randomness in tests, among many other issues).
This means, for example, that if you write a test to check what is the result of createDungeon(3), the results of that call should always be the same, so the assertions you make about that result are always valid.
May I suggest a small refactor to your example: pass the random number generator as an argument to createDungeon:
function createDungeon (numberOfRooms, randomGenerator, rooms = []) {
  ...
  const randomWidth = randomGenerator.getRandomNumber(5, 10)
  ...
}
In your tests, you pass a test double (a mock object) for your randomGenerator, previously set up to return some known values. You can use a mocking framework like JsMockito, and then you can just do something like:
generator = mock(RandomGenerator);
when(generator).getRandomNumber(5, 10).thenReturn(8);
// run the test
createDungeon(3, generator);
// verify results
assert(...)
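If you'd rather not pull in a mocking framework, a plain hand-rolled stub works just as well. A minimal Jasmine-style sketch, assuming the refactored createDungeon(numberOfRooms, randomGenerator) signature suggested above:
describe('createDungeon', () => {
  it('creates the requested number of rooms with the stubbed dimensions', () => {
    // Hand-rolled test double: always returns a known value.
    const stubGenerator = { getRandomNumber: () => 8 }

    const dungeon = createDungeon(3, stubGenerator)

    expect(dungeon.length).toEqual(3)
    dungeon.forEach(room => {
      expect(room.width).toEqual(8)
      expect(room.height).toEqual(8)
      expect(room.topLeftCoordinate).toEqual({x: 8, y: 8})
    })
  })
})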

Automatically detect test coupling in Protractor (randomizing test execution order)

The Problem:
We have a rather large test codebase. From time to time, instead of executing all the tests, we execute them individually or in packs. But sometimes we see unexpected test failures because the tests are interconnected, coupled. For example, one test assumes there is data created by a previous test - running that kind of test individually will fail.
The Question:
Is it possible to automatically detect which Protractor tests are coupled in the project?
Our current idea is to somehow randomize the test execution order, or to randomly pick a pack of tests from all the available tests and check that there are no failures. Hence the other question: is it possible to change/randomize Protractor test discovery and change the order of test execution?
Inspired by Ned Batchelder's "Finding test coupling" blog post and the Python nose test runner's nose-randomly plugin:
Randomness in testing can be quite powerful to discover hidden flaws
in the tests themselves, as well as giving a little more coverage to
your system.
By randomly ordering the tests, the risk of surprising inter-test
dependencies is reduced - a technique used in many places, for example
Google’s C++ test runner googletest.
You can run tests randomly (at the file level) by setting the random property in your config. You can also set the seed so it's reproducibly random.
/**
* If true, run specs in semi-random order
*/
random?: boolean,
/**
* Set the randomization seed if randomization is turned on
*/
seed?: string,
You could also turn on shardTestFiles (parallel test runs), which should also be very telling in how coupled your tests are.
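A minimal conf.js sketch combining the two ideas; the random and seed property names are the ones quoted from the config above, and the concrete values are placeholders:
// conf.js - sketch; treat the values as placeholders
exports.config = {
  framework: 'jasmine',
  specs: ['specs/**/*.spec.js'],

  // Run spec files in semi-random order, with a fixed seed so a
  // failing order can be reproduced.
  random: true,
  seed: '54321',

  // Running spec files in parallel also tends to expose coupling quickly.
  capabilities: {
    browserName: 'chrome',
    shardTestFiles: true,
    maxInstances: 4
  }
};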
Did you try shuffling the "it" blocks like below?
var shuffle = function (items) {
  var item, randomIndex;
  for (var i = 0; i < items.length; i++) {
    randomIndex = (Math.random() * items.length) | 0;
    item = items[i];
    items[i] = items[randomIndex];
    items[randomIndex] = item;
  }
}
describe('Suite', function() {
  it("should a", function () {
    console.log("execute a");
  });
  it("should b", function () {
    console.log("execute b");
  });
  it("should c", function () {
    console.log("execute c");
  });
  shuffle(this.children); // shuffle the 'it' blocks
});
Source: Can protractor tests be run in a random order?
One problem is you likely have no idea how tests might be coupled. If one test referenced some variables from another test, you might be able to find those automatically but that's only one way tests might be coupled and probably not a likely scenario.
My first thought was to just run them individually and see which ones fail. The problem is that if you aren't cleaning state between tests, you might change the order (randomizing them, as you suggested), but if test 50 expects data that test 20 set up, and in the new order test 20 still runs before test 50, then test 50 will still pass. You will find some couplings, but probably not all, until you run all of the tests in a random order several times.
You don't describe your application, but my second thought was that if there were a way to get back to a clean slate between tests, you should be able to find the tests that rely on other tests to set up data. I'm kind of surprised you aren't doing that already, but if there's a long setup process that has to run to restore a clean slate, that might be an issue. Depending on your system, you might be able to snapshot a VM after a clean install and restore it to "quickly" get back to clean, or you may be able to roll back SQL tables, etc. It really depends on your system, and without more details it's hard to offer advice.
Another option is to go to those that wrote or maintain the tests and have them self-identify the tests they own that are coupled and fix them. This likely won't find them all but it might be a semi-quick start.
Oh... I just thought of one more thing. If you could reverse the order of test execution, that should be better than randomizing it. With the order reversed, NO test would run after its former predecessor, and you should be able to find them all in one go.
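A rough sketch of that idea, mirroring the shuffle approach from the earlier answer (it relies on the same undocumented this.children detail, so treat it as an experiment rather than a supported API):
describe('Suite', function () {
  it("should a", function () { console.log("execute a"); });
  it("should b", function () { console.log("execute b"); });
  it("should c", function () { console.log("execute c"); });

  // Reverse the 'it' blocks: no test runs after its former predecessor,
  // so data set up by an earlier test is never there for a later one.
  this.children.reverse();
});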

Whack-A-Mole game with huge bug! Can I get some help fixing it?

I am writing a Whack-A-Mole game for class using HTML5, CSS3 and JavaScript. I have run into a very interesting bug where, at seemingly random intervals, my moles will stop changing their "onBoard" variables and, as a result, will stop being assigned to the board. Something similar has also happened with the holes, but not as often in my testing. All of this is completely independent of user interaction.
You guys and gals are my absolute last hope before I scrap the project and start completely from scratch. This has frustrated me to no end. Here is the Codepen and my github if you prefer to have the images.
Since Codepen links apparently require accompanying code, here is the function where I believe the problem is occurring.
// Run the game
function run() {
  var interval = (Math.floor(Math.random() * 7) * 1000);
  if (firstRound) {
    renderHole(mole(), hole(), lifeSpan());
    firstRound = false;
  }
  setTimeout(function() {
    renderHole(mole(), hole(), lifeSpan());
    run();
  }, interval);
}
What I believe is happening is this. The function runs at random intervals, between 0-6 seconds. If the function runs too quickly, the data that is passed to my renderHole() function gets overwritten with the new data, thus causing the previous hole and mole to never be taken off the board (variable wise at least).
EDIT: It turns out that my issue came from my not having returns on my recursive function calls. Having come from a different language, I was not aware that, in JavaScript, functions return "undefined" if nothing else is indicated. I am, however, marking GameAlchemist's answer as the correct one due to the fact that my original code was convoluted and confusing, as well as redundant in places. Thank you all for your help!
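Here is a minimal illustration of the pattern I had wrong; the pickFreeHole helper is made up for the example, not my actual code (and it assumes at least one free hole exists):
// Buggy: when the picked hole is occupied, the recursive call's result
// is thrown away and nothing is returned, so the caller gets undefined.
function pickFreeHole (holes) {
  var candidate = holes[Math.floor(Math.random() * holes.length)];
  if (candidate.occupied) {
    pickFreeHole(holes); // result lost
  } else {
    return candidate;
  }
}

// Fixed: return the result of the recursive call as well.
function pickFreeHoleFixed (holes) {
  var candidate = holes[Math.floor(Math.random() * holes.length)];
  if (candidate.occupied) {
    return pickFreeHoleFixed(holes);
  }
  return candidate;
}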
You have made some design mistakes here and there in your code that, one after another, make the code hard to read and follow, and quite impossible to debug.
The mole() function might return a mole... or not... or create a timeout to call itself later. What will be done with the result when mole calls itself again? Nothing, so the mole will just be marked as onBoard, never to be seen again.
--->>> Have a clear definition and a single responsibility for mole(): for instance 'returns an available non-displayed mole character or null'. And that's all, no count, no marking of the objects, just KISS (Keep It Simple S...) : it should always return a value and never trigger a timeout.
Quite the same goes for hole() : return a free hole or null, no marking, no timeout set.
renderHole should be simplified: get a mole, get a hole; if either can't be found, bail out. If a mole and a hole were found, just set up the new mole/hole pair plus its event handler (in a separate function). Your main run function will keep trying, again and again, to spawn moles.
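A rough sketch of that structure; availableMoles, freeHoles and placeOnBoard are placeholders for your own data and rendering code, not an exact implementation:
// Each helper has a single responsibility and always returns a value.
function mole () {
  // an available, non-displayed mole, or null
  return availableMoles.length ? availableMoles.pop() : null;
}

function hole () {
  // a free hole, or null
  return freeHoles.length ? freeHoles.pop() : null;
}

function renderHole (m, h, span) {
  if (!m || !h) { return; }  // nothing to spawn this round
  placeOnBoard(m, h, span);  // set up the mole/hole pair + its handler
}

// run() is the only place that schedules anything.
function run () {
  renderHole(mole(), hole(), lifeSpan());
  setTimeout(run, Math.floor(Math.random() * 7) * 1000);
}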

What makes this function run much slower?

I've been trying to run an experiment to see whether the local variables in functions are stored on a stack.
So I wrote a little performance test
function test(fn, times){
  var i = times;
  var t = Date.now()
  while(i--){
    fn()
  }
  return Date.now() - t;
}
function straight(){
  var a = 1
  var b = 2
  var c = 3
  var d = 4
  var e = 5
  a = a * 5
  b = Math.pow(b, 10)
  c = Math.pow(c, 11)
  d = Math.pow(d, 12)
  e = Math.pow(e, 25)
}
function inversed(){
  var a = 1
  var b = 2
  var c = 3
  var d = 4
  var e = 5
  e = Math.pow(e, 25)
  d = Math.pow(d, 12)
  c = Math.pow(c, 11)
  b = Math.pow(b, 10)
  a = a * 5
}
I expected the inversed function to run much faster. Instead, an amazing result came out.
Whichever function I test first runs about 10 times faster than it does once I've also tested the second one.
Example:
> test(straight, 10000000)
30
> test(straight, 10000000)
32
> test(inversed, 10000000)
390
> test(straight, 10000000)
392
> test(inversed, 10000000)
390
The same behaviour occurs when they are tested in the opposite order.
> test(inversed, 10000000)
25
> test(straight, 10000000)
392
> test(inversed, 10000000)
394
I've tested it both in the Chrome browser and in Node.js and I've got absolutely no clue why would it happen.
The effect lasts till I refresh the current page or restart Node REPL.
What could be the source of such a significant (~12 times) performance degradation?
PS. Since it seems to happen only in some environments, please mention the environment you're using to test it.
Mine were:
OS: Ubuntu 14.04
Node v0.10.37
Chrome 43.0.2357.134 (Official Build) (64-bit)
/Edit
On Firefox 39 it takes ~5500 ms for each test regardless of the order. It seems to occur only on specific engines.
/Edit2
Inlining the function into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?
Once you call test with two different functions, the fn() callsite inside it becomes megamorphic and V8 is unable to inline at it.
Function calls (as opposed to method calls o.m(...)) in V8 are accompanied by a one-element inline cache instead of a true polymorphic inline cache.
Because V8 is unable to inline at the fn() callsite, it is unable to apply a variety of optimizations to your code. If you look at your code in IRHydra (I uploaded the compilation artifacts to a gist for your convenience), you will notice that the first optimized version of test (when it was specialized for fn = straight) has a completely empty main loop.
V8 just inlined straight and removed all the code you hoped to benchmark via the Dead Code Elimination (DCE) optimization. On an older version of V8, instead of DCE, V8 would just hoist the code out of the loop via LICM - because the code is completely loop-invariant.
When straight is not inlined, V8 can't apply these optimizations - hence the performance difference. A newer version of V8 would still apply DCE to straight and inversed themselves, turning them into empty functions, so the performance difference is not that big (around 2-3x). Older V8 was not aggressive enough with DCE, and that would manifest as a bigger difference between the inlined and non-inlined cases, because the peak performance of the inlined case was solely the result of aggressive loop-invariant code motion (LICM).
On a related note, this shows why benchmarks should never be written like this - their results are of no use, because you end up measuring an empty loop.
If you are interested in polymorphism and its implications in V8 check out my post "What's up with monomorphism" (section "Not all caches are the same" talks about the caches associated with function calls). I also recommend reading through one of my talks about dangers of microbenchmarking, e.g. most recent "Benchmarking JS" talk from GOTO Chicago 2015 (video) - it might help you to avoid common pitfalls.
You're misunderstanding the stack.
While the "real" stack indeed only has the Push and Pop operations, this doesn't really apply for the kind of stack used for execution. Apart from Push and Pop, you can also access any variable at random, as long as you have its address. This means that the order of locals doesn't matter, even if the compiler doesn't reorder it for you. In pseudo-assembly, you seem to think that
var x = 1;
var y = 2;
x = x + 1;
y = y + 1;
translates to something like
push 1 ; x
push 2 ; y
; get y and save it
pop tmp
; get x and put it in the accumulator
pop a
; add 1 to the accumulator
add a, 1
; store the accumulator back in x
push a
; restore y
push tmp
; ... and add 1 to y
In truth, the real code is more like this:
push 1 ; x
push 2 ; y
add [bp], 1
add [bp+4], 1
If the thread stack really was a real, strict stack, this would be impossible, true. In that case, the order of operations and locals would matter much more than it does now. Instead, by allowing random access to values on the stack, you save a lot of work for both the compilers, and the CPU.
To answer your actual question, I suspect neither of the functions actually does anything. You're only ever modifying locals, and your functions aren't returning anything - it's perfectly legal for the compiler to drop the function bodies completely, and possibly even the function calls. If that's indeed so, whatever performance difference you're observing is probably just a measurement artifact, or something related to the inherent costs of calling a function / iterating.
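If you really want to measure the arithmetic, one option is to make the results observable - return the computed values and consume them - so the optimiser can't drop the work as dead code. A sketch of that adjustment (still a toy benchmark, with all the caveats above; inversed would be changed the same way):
function straight () {
  var a = 1, b = 2, c = 3, d = 4, e = 5;
  a = a * 5;
  b = Math.pow(b, 10);
  c = Math.pow(c, 11);
  d = Math.pow(d, 12);
  e = Math.pow(e, 25);
  return a + b + c + d + e; // make the results observable
}

function test (fn, times) {
  var i = times, sum = 0;
  var t = Date.now();
  while (i--) {
    sum += fn(); // consume the return value
  }
  console.log(sum); // using sum keeps the loop from being eliminated
  return Date.now() - t;
}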
Inlining the function into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?
Yes, this seems to be exactly what you are observing. As already mentioned by @Luaan, the compiler likely drops the bodies of your straight and inversed functions anyway, because they have no side effects and only manipulate some local variables.
When you call test(…, 100000) for the first time, the optimising compiler realises after some iterations that the fn() being called is always the same, and inlines it, avoiding the costly function call. All it does now is decrement a variable 10 million times and test it against 0.
But when you then call test with a different fn, it has to de-optimise. It may later do some other optimisations again, but now, knowing that there are two different functions to be called, it cannot inline them any more.
Since the only thing you're really measuring is the function call, that leads to the grave differences in your results.
An experiment to see if the local variables in functions are stored on a stack
Regarding your actual question, no, single variables are not stored on a stack (stack machine), but in registers (register machine). It doesn't matter in which order they are declared or used in your function.
Yet, they are stored on the stack, as part of so-called "stack frames". You'll have one frame per function call, storing the variables of its execution context. In your case, the stack might look like this:
[straight: a, b, c, d, e]
[test: fn, times, i, t]
…

Testing/validating/evaluating the outcome of every path in a function?

Disclaimer - I've tried finding an answer to this via google/stackoverflow, but I don't know how to define the problem (I don't know the proper term)
I have many small AI snippets such as what follows. There is an ._ai snippet (like below) per enemy type, with one function next() which is called by the finite state machine in the main game loop (fyi: the next function doesn't get called every update iteration, only when the enemy is shifted from the queue).
The question: How do I test every case (taking into account some enemy AI snippets might be more complex, having cases that may occur 1 in 1000 turns) and ensure the code is valid?
In the example below, if I added the line blabla/1 under count++, the error might not crop up for a long time, as the JavaScript interpreter won't catch the error until it hits that particular path. In compiled languages, adding garbage such as blabla/1 would be caught at compile time.
// AI Snippet
this._ai = (function(commands){
  var count = 0;
  return {
    next: function(onDone, goodies, baddies) {
      // If the internal counter reaches
      // 2, launch a super attack and
      // reset the count
      if(count >= 2) {
        commands.super(onDone);
        count = 0;
      }
      else {
        // If not performing the super attack
        // there is a 50% chance of calling
        // the `attack` command
        if(chance(50)) {
          var target = goodies[0];
          commands.attack(onDone, target);
        }
        // Or a 50% chance of calling the
        // `charge` command
        else {
          commands.charge(onDone);
          count++;
        }
      }
    }
  };
})(this._commands);
I could rig the random generator to return a table of values from 0-n and run next thousands of times against each number. I just don't feel like that will concretely tell me every path is error-free.
As you say, unit tests must exercise every path so you can be sure everything works.
But you should be able to decide which path the method will follow before calling it in your tests, so you'll know whether the method's behaviour is the expected one, and whether there is any error.
So, for example, if there is a path that will be followed in only one of every 1000 executions, you don't need to test all 0, 1, 2 ... 999 cases. You only need one test per combination of results that behaves distinctly.
For example, in the snippet shown you have these cases:
the counter has reached 2
the counter has not reached 2 and chance returns true
the counter has not reached 2 and chance returns false
One way to achieve this is to take control of the counter and of the chance method by mocking them.
If you want to know what happens when the counter has reached 2 and the next method is called, just pass in a counter of 2 and call next. You don't need to reach 2 on the counter by actually running through all the code.
As for the randomizer, you don't need to try until the randomizer returns the value you want to test. Make it a mock and configure it to behave as you need for each case.
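A Jasmine-flavoured sketch of what that can look like. It assumes you refactor the snippet so the starting count and the chance function can be injected; makeAi below is a hypothetical factory wrapping your IIFE body, which gives you exactly the seam needed to pick the path up front:
// Hypothetical factory: same logic as the original IIFE, but the
// initial count and the chance function are injectable.
function makeAi (commands, chance, count = 0) {
  return {
    next: function (onDone, goodies, baddies) {
      if (count >= 2) {
        commands.super(onDone);
        count = 0;
      } else if (chance(50)) {
        commands.attack(onDone, goodies[0]);
      } else {
        commands.charge(onDone);
        count++;
      }
    }
  };
}

describe('enemy AI', () => {
  let commands;

  beforeEach(() => {
    commands = jasmine.createSpyObj('commands', ['super', 'attack', 'charge']);
  });

  it('launches the super attack once the counter reaches 2', () => {
    const ai = makeAi(commands, () => false, 2); // force the first path
    ai.next(() => {}, [], []);
    expect(commands.super).toHaveBeenCalled();
  });

  it('attacks the first goodie when chance says yes', () => {
    const ai = makeAi(commands, () => true);
    const hero = {};
    ai.next(() => {}, [hero], []);
    expect(commands.attack).toHaveBeenCalledWith(jasmine.any(Function), hero);
  });

  it('charges and counts up when chance says no', () => {
    const ai = makeAi(commands, () => false);
    ai.next(() => {}, [], []);
    expect(commands.charge).toHaveBeenCalled();
  });
});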
I hope this helps.
