SICP 2.4.1 and Unit Testing
April 26, 2015 11:00
There are no exercises for this section, but that’s no reason not to examine the code. Even with it already written and functioning, it’s an opportune moment to talk more about testing. This section happens to be a great example to use. It has the same interface with different underlying implementations, meaning that the same external tests can be used whatever the implementation is.
Up until now we have often tested for correct answers by defining check functions. For instance, we had check-is-element when we wanted to test if something was an element of a set. These check functions would give us a true or false result, although in a few cases the wrong answer would be returned to help figure out where the error lay (as with check-equal? in 2.3.4). Much of the time these functions ended up looking pretty similar to each other, even if they were short to write.
It should come as no surprise that dedicated frameworks have been written to help us with testing. These not only provide us with pre-written functions, but they make it easier to run a whole group at once, and also do a better job of handling errors and detailing test failure.
The one I’ll be using and discussing for this section is RackUnit since at the core my environment is Racket. The expressions will mostly look like Scheme, so it should be fairly easy to comprehend. Most testing frameworks share the same capabilities as RackUnit.
In RackUnit, a check is a special function that tests whether some condition is true. If the condition holds, the test passes and nothing happens. But if it does not hold, a failure is signaled which leads to a report of information on the test (including, in many circumstances, what the expected and actual values were). The program can then continue, typically to other tests. One point where the testing framework differs from our old check functions is that it can also handle errors without exiting. In other words, if any error occurs (whether directly related to the test or not), the program can just stop testing that part and continue. This feature makes it possible to run through many tests at once even with very buggy or incomplete code.
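As a minimal sketch of how checks behave (the values are made up for illustration and have nothing to do with the complex number code), a few built-in RackUnit checks might look like this. The last check is wrong on purpose, to show that a failure is reported and evaluation carries on:

```racket
#lang racket
(require rackunit)

;; Passing checks are silent.
(check-equal? (+ 1 2) 3)
(check-true (even? 4))

;; check-= compares two numbers, allowing a maximum difference (here 0.0001).
(check-= (sqrt 2) 1.41421 0.0001)

;; A deliberately failing check: RackUnit reports the actual and expected
;; values, and the program keeps going.
(check-equal? (* 2 3) 7 "deliberately wrong expected value")

;; This definition is still reached after the failure above.
(define after-failure (* 2 3))
```

Run as a module, this should print one failure report and then finish normally.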
Since multiple tests can be run despite errors, tests related to the same feature of the program (this is where the term ‘unit’ comes in) are usually organized so that they can be run as a set. RackUnit has the test-case, which is some amount of code that includes at least one check. These test cases can then be grouped into a test-suite, which is just a named group of test cases (and possibly other suites).
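A small sketch of that grouping, using a hypothetical square procedure as the thing under test (it is not part of the exercise code). Test cases inside a test-suite are not run when the suite is defined; they wait until the suite is handed to a runner:

```racket
#lang racket
(require rackunit)

;; Hypothetical procedure under test, only for illustration.
(define (square x) (* x x))

;; Each test-case has a name and a body containing a check.
;; The test-suite gives the group a name so it can be run as a set.
(define square-tests
  (test-suite
   "square"
   (test-case "positive input"
     (check-equal? (square 3) 9))
   (test-case "negative input"
     (check-equal? (square -3) 9))
   (test-case "zero"
     (check-equal? (square 0) 0))))
```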
With our tests defined and grouped, we then need some way to run them and report the results, and so we will have a test runner. Separating the test definition from its execution allows us to do things like modify the information reported from the tests, or to change our procedures and repeat the tests on only the portion we changed. For this section, I’m just using RackUnit’s text UI, which must be loaded separately (refer to the first couple of lines of the test file). This test runner will let us know in the interpreter window how many tests passed and how many failed or had an error, along with the information generated for each test that did not pass. There is also a GUI test runner, which you’re free to experiment with; it can be used without needing to modify the test definitions. Refer to the RackUnit documentation linked above.
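Here is a sketch of the text runner at work, on a made-up suite containing one passing test, one failing test, and one test that raises an error. The suite and its contents are assumptions for illustration; run-tests comes from rackunit/text-ui:

```racket
#lang racket
(require rackunit
         rackunit/text-ui)   ; the text runner is a separate library

(define demo-tests
  (test-suite
   "runner demo"
   (test-case "passes"
     (check-equal? (+ 2 2) 4))
   (test-case "fails"
     (check-equal? (* 2 3) 7))        ; deliberately wrong expected value
   (test-case "raises an error"
     (check-equal? (car '()) 1))))    ; car of the empty list is an error

;; run-tests prints a report and a summary line, and returns the number of
;; tests that did not succeed (the failure and the error both count).
(define unsuccessful (run-tests demo-tests))
```

All three test cases run even though the last one raises an error, which is the behavior our hand-written check functions could not offer.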
With the test system discussion out of the way, we can talk about how it applies to the actual topic of this section. There are two implementations of the complex number system. They both use the same interface. This means that, as long as we stick to that interface, we can use the exact same tests to see if they are working properly.
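One way to picture this in a single file is to pass the interface procedures into a function that builds the suite. This is only a sketch: the actual test file simply relies on whichever definitions have been loaded, and the names rect-make, polar-make, interface-tests, and so on are invented here so that both representations can sit side by side.

```racket
#lang racket
(require rackunit
         rackunit/text-ui)

;; Rectangular representation: a pair of (real . imaginary).
(define (rect-make x y) (cons x y))
(define (rect-real z) (car z))
(define (rect-imag z) (cdr z))

;; Polar representation: a pair of (magnitude . angle).
(define (polar-make x y)
  (cons (sqrt (+ (* x x) (* y y))) (atan y x)))
(define (polar-real z) (* (car z) (cos (cdr z))))
(define (polar-imag z) (* (car z) (sin (cdr z))))

;; The same tests, written only in terms of the interface.
(define (interface-tests name make-from-real-imag real-part imag-part)
  (test-suite
   name
   (test-case "real part round-trips"
     (check-= (real-part (make-from-real-imag 3 4)) 3 1e-9))
   (test-case "imaginary part round-trips"
     (check-= (imag-part (make-from-real-imag 3 4)) 4 1e-9))))

(define rect-unsuccessful
  (run-tests (interface-tests "rectangular" rect-make rect-real rect-imag)))
(define polar-unsuccessful
  (run-tests (interface-tests "polar" polar-make polar-real polar-imag)))
```

The polar version goes through sqrt, atan, cos, and sin, so its results are only approximately 3 and 4; that is why the checks use check-= with a small tolerance rather than check-equal?.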
The first thing we do is define our own custom check function. We’d like an easy way to compare complex numbers to test the results. One issue we face is that our results often come from calculations with floating-point numbers, which are rarely exact. Normally RackUnit has this part covered, with check-=. It takes the two values to compare and then a delta, which is the largest allowed difference between them. Unfortunately our complex numbers can’t be compared directly with the = operator, so we must write a custom check that will compare them by parts. Custom checks are pretty easy to create for RackUnit; all you really need is a predicate function and a name for the check.
Here’s the code that defines our equality predicate, and the custom check:
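A minimal sketch of what such definitions could look like, not necessarily identical to the test file. The value of complex-delta, the name check-complex-=, and the stand-in pair representation are assumptions made so the example runs on its own:

```racket
#lang racket
(require rackunit)

;; Stand-in rectangular representation so the sketch is self-contained;
;; in the exercise these come from the loaded complex number definitions.
(define (make-from-real-imag x y) (cons x y))
(define (real-part z) (car z))
(define (imag-part z) (cdr z))

;; Assumed tolerance; the real test file may use a different value.
(define complex-delta 1e-6)

;; Two complex numbers count as equal when both parts are within the delta.
(define (complex-= z1 z2)
  (and (<= (abs (- (real-part z1) (real-part z2))) complex-delta)
       (<= (abs (- (imag-part z1) (imag-part z2))) complex-delta)))

;; Form: (define-binary-check (name predicate actual expected))
(define-binary-check (check-complex-= complex-= actual expected))

;; Passes: the real parts differ by about 1e-7, which is inside the delta.
(check-complex-= (make-from-real-imag 1.0000001 2.0)
                 (make-from-real-imag 1.0 2.0))
```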
Here we’ve defined a complex-= predicate that compares the real and imaginary parts separately to see if they are within some predefined range (marked as complex-delta). Then we just create the check with define-binary-check. Calling this a ‘binary check’ means it operates on two values. We add two more arguments for the actual and expected values, so the test report can include that information in the case of a failure. While it won’t exactly be a serious problem, it’s advisable to ensure the order of the actual and expected arguments is correct for those moments when it does matter. Different test frameworks adopt different conventions, and I myself am always getting them mixed up.
There are two test suites in the test file. The first suite tests the interface to our complex numbers, using the various defined accessors and constructors. The second one tests the actual math operations to ensure that they, too, are functioning properly. This is in some ways related to ‘unit’ testing, in which each part of the system might be tested separately, and then the components may be tested at a higher level when they work together (this distinction will be easier to see in future sections).
Something that I also did is put several check functions into one test case. This was deliberate, to allow for some experimentation. It’s considered good practice to just have one check/test per test case, in order to have a better idea of precisely what the problem is. Consider what happens when only some of the checks within a case fail or pass (experiment by changing some of the expected values to be wrong).
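For anyone who wants a starting point for that experiment, here is a sketch with made-up arithmetic checks. The counter checks-reached is a hypothetical helper added only to record how far the combined test case gets:

```racket
#lang racket
(require rackunit
         rackunit/text-ui)

;; Hypothetical helper: counts how many checks in the combined case are reached.
(define checks-reached 0)
(define (note!) (set! checks-reached (+ checks-reached 1)))

;; Three checks in one test case; the second is deliberately wrong.
(define combined
  (test-suite
   "combined"
   (test-case "three checks, one case"
     (note!) (check-equal? (+ 1 1) 2)
     (note!) (check-equal? (+ 1 1) 3)
     (note!) (check-equal? (+ 2 2) 4))))

;; The same three checks, one per test case.
(define separate
  (test-suite
   "separate"
   (test-case "first"  (check-equal? (+ 1 1) 2))
   (test-case "second" (check-equal? (+ 1 1) 3))
   (test-case "third"  (check-equal? (+ 2 2) 4))))

(define combined-unsuccessful (run-tests combined))
(define separate-unsuccessful (run-tests separate))
```

In the combined version the failing check ends the test case, so the third check is never reached and the runner sees a single failed test. In the separate version the runner sees three tests and can say that the first and third passed.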
Something we may discover in the tests is that some of our calculation tests (involving 0) end up failing for one of the given implementations. (This depends on the built-in math functions, so it is Scheme implementation-dependent and might not have occurred for you.) It would be a mistake, however, to think that just because the tests reveal an apparent flaw in a particular approach, that approach is the only one with the problem. This particular issue affects both styles of complex numbers, just in different ways; only one of them was detected by the tests. Sometimes something like this might never cause trouble, but there’s a good chance that future changes would reveal it. With no other exercises to do, consider it as an extra exercise: see what sort of tests would reveal the problem for Alyssa’s implementation, and how to fix it in both cases.
One last note: The tests are contained within a separate file from the definitions. This is not only so that they can be used in more than one section, but also to keep more of the code within the exercise files as pure Scheme. It also allows for using tests tailored to other interpreters, although for this section only Racket/Pretty Big is supported. The file containing the tests is placed in a new directory, the ‘library’, which will contain files that will be used in more than one set of exercises. Be sure that this directory structure exists, or alternatively change the load command at the start of any files that reference it.