Giacomo Debidda

A 5 minute Intro to Hypothesis

October 14, 2017 | 7 min Read

Let’s say you have a Python module called example.py where you have a function called add_numbers that you want to test.

# example.py
def add_numbers(a, b):
    return a + b

This function adds two numbers, so among other things you might want to check that the commutative property holds for all the inputs that the function receives. You create a file called test_example.py and start writing a simple unit test to prove it.

A simple unit test

# test_example.py
import unittest
from your_python_module.example import add_numbers


class TestAddNumbers(unittest.TestCase):

    def test_add_numbers_is_commutative(self):
        self.assertEqual(add_numbers(1.23, 4.56), add_numbers(4.56, 1.23))


if __name__ == '__main__':
    unittest.main()

That’s cool, but you actually didn’t prove that the commutative property holds in general. You have just proved that for this specific case such property holds.

You realize that the combination of 1.23 and 4.56 is a very tiny subset of the entire input space of numbers that your function can receive, so you write more tests.

A common solution: write more test cases

# test_example.py

# more tests here...
    def test_add_numbers_is_commutative_another_case(self):
        self.assertEqual(add_numbers(0.789, 321), add_numbers(321, 0.789))
# more tests here...

Not a huge gain. You have just proved that for these other specific cases that you wrote the commutative property holds. And obviously you don’t want to write a million test cases by hand.

Maybe you have heard about fuzzing, and you want to use it to create random test cases every time you run the test.

A better solution: fuzzing and ddt

You can create a pair of random floats, so every time you run the test you have a new test case. That’s a bit dangerous though, because if you have a failure you cannot reproduce it easily.

# test_example.py
import random
import unittest
from your_python_module.example import add_numbers


class TestAddNumbers(unittest.TestCase):

    def test_add_numbers_is_commutative(self):
        a = random.random() * 10.0
        b = random.random() * 10.0
        self.assertEqual(add_numbers(a, b), add_numbers(b, a))


if __name__ == '__main__':
    unittest.main()
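One way to mitigate the reproducibility problem (a sketch of mine, not part of the original test) is to draw the inputs from an explicitly seeded random.Random and include the seed in the failure message, so that a failing run can be replayed later. The add_numbers below is a local stand-in for the one in your_python_module.example.

```python
# test_example.py
import random
import unittest


def add_numbers(a, b):
    # Stand-in for your_python_module.example.add_numbers.
    return a + b


class TestAddNumbers(unittest.TestCase):

    def test_add_numbers_is_commutative(self):
        # Draw the seed once, then derive all random inputs from it:
        # re-running with the same seed replays exactly the same inputs.
        seed = random.randrange(2**32)
        rng = random.Random(seed)
        a = rng.random() * 10.0
        b = rng.random() * 10.0
        self.assertEqual(add_numbers(a, b), add_numbers(b, a),
                         msg=f'failed with seed={seed}')
```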

You can also use a library called ddt to generate many random test cases.

# test_example.py
import random
import unittest
from ddt import ddt, idata, unpack
from your_python_module.example import add_numbers


def float_pairs_generator():
    num_test_cases = 100
    for i in range(num_test_cases):
        a = random.random() * 10.0
        b = random.random() * 10.0
        yield (a, b)


@ddt
class TestAddNumbers(unittest.TestCase):

    @idata(float_pairs_generator())
    @unpack
    def test_add_floats_ddt(self, a, b):
        self.assertEqual(add_numbers(a, b), add_numbers(b, a))


if __name__ == '__main__':
    unittest.main()

Here the float_pairs_generator generator function creates 100 random pairs of floats. With this trick you can multiply the number of test cases while keeping your tests easy to maintain.

That’s definitely a step in the right direction, but we are still testing only some random combinations of numbers between 0.0 and 10.0. Not a very extensive portion of the input domain of the function add_numbers.

You have two options:

  1. find a way to generate domain objects that your function can accept. In this case the domain objects are the floats that add_numbers can receive.
  2. use hypothesis

I don’t know about you, but I’m going for the second one.

The best solution: Hypothesis

Here is how you write a test that checks that the commutative property holds for a pair of floats.

# test_example.py
from hypothesis import given
from hypothesis.strategies import floats
from your_python_module.example import add_numbers


@given(a=floats(), b=floats())
def test_add_numbers(a, b):
    assert add_numbers(a, b) == add_numbers(b, a)


if __name__ == '__main__':
    test_add_numbers()

When I ran this test I was shocked. It failed!

WTF! How is it possible that this test fails, after I tried 100 test cases with ddt?

Luckily with hypothesis you can increase the verbosity level of your test by using the @settings decorator.

Let’s say you also want to test a specific test case: a == 1.23 and b == 4.56. For this you can use the @example decorator. This is nice because now your test provides some documentation to anyone who wants to use the add_numbers function, and at the same time you are testing a specific case that you know about or that might be particularly hard to hit.

# test_example.py
from hypothesis import given, example, settings, Verbosity
from hypothesis.strategies import floats
from your_python_module.example import add_numbers


@settings(verbosity=Verbosity.verbose)
@example(a=1.23, b=4.56)
@given(a=floats(), b=floats())
def test_add_numbers(a, b):
    assert add_numbers(a, b) == add_numbers(b, a)


if __name__ == '__main__':
    test_add_numbers()

Obviously the test fails again, but this time you get more insights about it.

Trying example: test_add_numbers(a=1.23, b=4.56)
Trying example: test_add_numbers(a=0.0, b=nan)
Traceback (most recent call last):
  # Traceback here...
AssertionError
Trying example: test_add_numbers(a=0.0, b=nan)
Traceback (most recent call last):
  # Traceback here...
AssertionError
Trying example: test_add_numbers(a=0.0, b=1.0)
Trying example: test_add_numbers(a=0.0, b=4293918720.0)
Trying example: test_add_numbers(a=0.0, b=281406257233920.0)
Trying example: test_add_numbers(a=0.0, b=7.204000185188352e+16)
Trying example: test_add_numbers(a=0.0, b=inf)
# more test cases...
You can add @seed(247616548810050264291730850370106354271) to this test to reproduce this failure.

That’s really helpful. You can see all the test cases that were successful and the ones that caused a failure. You also get a seed that you can use to reproduce this very specific failure at a later time or on a different computer. This is so awesome.

Anyway, why does this test fail? It fails because in Python nan and inf are valid floats, so the floats() strategy might generate test cases where a == nan and/or b == inf. And since nan is not equal to anything, not even to itself, the assertion add_numbers(a, b) == add_numbers(b, a) fails as soon as one of the inputs is nan.

Are nan and inf valid inputs for your application? Maybe. It depends on your application.

If you are absolutely sure that add_numbers will never receive a nan or an inf as input, you can write a test that never generates either. You just have to set allow_nan and allow_infinity to False in the floats() strategy. Easy peasy.

# test_example.py
from hypothesis import given
from hypothesis.strategies import floats
from your_python_module.example import add_numbers


@given(
  a=floats(allow_nan=False, allow_infinity=False),
  b=floats(allow_nan=False, allow_infinity=False))
def test_add_numbers(a, b):
    assert add_numbers(a, b) == add_numbers(b, a)


if __name__ == '__main__':
    test_add_numbers()

But what if add_numbers could in fact receive nan or inf as inputs (a much more realistic assumption)? In this case the test should be able to generate nan and inf, your function should raise specific exceptions that you will have to handle somewhere else in your application, and the test should not count such specific exceptions as failures.

Here is how example.py might look:

# example.py
import math

class NaNIsNotAllowed(ValueError):
    pass

class InfIsNotAllowed(ValueError):
    pass

def add_numbers(a, b):
    if math.isnan(a) or math.isnan(b):
        raise NaNIsNotAllowed('nan is not a valid input')
    elif math.isinf(a) or math.isinf(b):
        raise InfIsNotAllowed('inf is not a valid input')
    return a + b

And here is how you write the test with hypothesis:

# test_example.py
from hypothesis import given, reject
from hypothesis.strategies import floats
from your_python_module.example import add_numbers, NaNIsNotAllowed, InfIsNotAllowed


@given(a=floats(), b=floats())
def test_add_numbers(a, b):
    try:
        assert add_numbers(a, b) == add_numbers(b, a)
    except (NaNIsNotAllowed, InfIsNotAllowed):
        reject()


if __name__ == '__main__':
    test_add_numbers()

This test will generate some nan and inf inputs, add_numbers will raise either NaNIsNotAllowed or InfIsNotAllowed, and the test will catch these exceptions and call reject(), which tells hypothesis to discard the example as invalid rather than count it as a failure.
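An alternative worth mentioning (my sketch, not from the original article) is hypothesis.assume, which discards an invalid example as soon as its condition is False, with no try/except needed. Here add_numbers and its exceptions are redefined locally to keep the sketch self-contained:

```python
# test_example.py
import math

from hypothesis import assume, given
from hypothesis.strategies import floats


class NaNIsNotAllowed(ValueError):
    pass


class InfIsNotAllowed(ValueError):
    pass


def add_numbers(a, b):
    # Stand-in for your_python_module.example.add_numbers.
    if math.isnan(a) or math.isnan(b):
        raise NaNIsNotAllowed('nan is not a valid input')
    elif math.isinf(a) or math.isinf(b):
        raise InfIsNotAllowed('inf is not a valid input')
    return a + b


@given(a=floats(), b=floats())
def test_add_numbers(a, b):
    # Discard invalid inputs up front instead of catching exceptions.
    assume(not math.isnan(a) and not math.isnan(b))
    assume(not math.isinf(a) and not math.isinf(b))
    assert add_numbers(a, b) == add_numbers(b, a)
```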

Can you really afford to reject nan as an input value for add_numbers? Maybe not. Let’s say your code needs to sum two samples in a time series, and one sample of the time series is missing: nan would be a perfectly valid input for add_numbers in such case.
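In that scenario one option (again a hypothetical variant, not the article's code) is to let add_numbers propagate nan instead of raising. Python's float addition already propagates nan on its own, so the variant below only needs to keep the inf check:

```python
# example.py
import math


def add_numbers(a, b):
    # Hypothetical variant: a nan input (a missing sample) is propagated
    # to the result rather than rejected. Python's float addition already
    # returns nan when either operand is nan, so no special casing is
    # needed for it; inf is still rejected as before.
    if math.isinf(a) or math.isinf(b):
        raise ValueError('inf is not a valid input')
    return a + b
```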

Written by Giacomo Debidda, Pythonista & JS lover (D3, React). You can find me on Twitter & Github.