SICP 2.5 and testing
June 28, 2015 10:25
This is an extra post in which I’ll discuss some more advanced testing procedures, which will be used in the next section. It requires one special type of function not handled in the book, but otherwise only uses concepts we’ve already covered. While this testing system is not as full-featured or as elegant as something like rackunit, it is a definite improvement over a simple list of check functions. Understanding how it works is entirely optional; the tests can be applied without much explanation merely by having the files present, subject to some modification depending on your Scheme implementation.
The first part of the test-functions file is a set of check functions. These work the same as those used in older exercises. The only difference is that the underlying procedures check for failure instead of a true state. These check functions take the opposite of that result, and thus work by ensuring that the false outcome does not hold. The reason for this will be clearer in a moment, but for now let’s compare a few of the failure and check functions.
These three functions check similar operations, and consequently they have a similar pattern to them. There is an if statement that uses the particular function for the check, and works with an ‘expected’ and ‘observed’ argument. They do all vary in the way results are reported. This is deliberate; you can decide what the merits and failings of each style are. (Note that the check functions don’t use this report; they only return true or false. The reports will be used elsewhere.)
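As a rough illustration of the pattern, here is a minimal sketch of one failure procedure and its matching check function, written for Racket. The name fails-equal? and the list-shaped report are assumptions made for this example, not the definitions from the test-functions file.

```racket
#lang racket

;; Hypothetical failure procedure: returns #f when nothing went wrong,
;; and a report (here, a simple list) when the values differ.
(define (fails-equal? observed expected)
  (if (equal? observed expected)
      #f
      (list 'expected expected 'observed observed)))

;; Check function: take the opposite of the failure result, so the
;; caller only ever sees a plain true or false.
(define (check-equal observed expected)
  (not (fails-equal? observed expected)))

(check-equal (+ 1 2) 3)   ; evaluates to #t
(check-equal (+ 1 2) 4)   ; evaluates to #f
```

The report is discarded by the check function, which is why the reporting style of the failure procedure can vary freely.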
Using true-false check functions is the same thing we’ve done before. They determine whether we have the correct answer or the wrong one. However, very often the problem in our code causes an error to occur, stopping execution immediately. That means it’s only possible to look at one problem at a time, and each issue must be fixed in sequence before continuing. That can make it tougher to figure out exactly what is going wrong, especially in a more complex system. To get around that problem, we need a different sort of function. I’ve named this new type of function using test, to distinguish it from the check functions.
This is a test function for equality. The first line assigns testname from nameargs, or uses a default name of test-equal if nameargs is empty. This allows the test name to be optional. We then use exec-test to actually perform the test. The second argument, the expected value, is passed via a list, and we use the same failure-checking function for equal? that check-equal had.
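The optional-name pattern can be sketched as follows. This is a simplified stand-in, assuming Racket: fails-equal? is a hypothetical failure procedure, and this version of exec-test has no exception handling and returns its result rather than printing it, so it only shows how the arguments are passed along.

```racket
#lang racket

;; Hypothetical failure procedure: #f on success, a report otherwise.
(define (fails-equal? observed expected)
  (if (equal? observed expected)
      #f
      (list 'expected expected 'observed observed)))

;; Simplified stand-in for exec-test (no error handling in this sketch).
;; It calls the observed thunk, then applies the failure procedure to
;; the observed value followed by the remaining arguments.
(define (exec-test failure-proc observed-thunk other-args testname)
  (let ((failure (apply failure-proc (observed-thunk) other-args)))
    (if failure
        (list testname failure)
        'pass)))

;; The rest argument nameargs makes the test name optional.
(define (test-equal observed expected . nameargs)
  (let ((testname (if (null? nameargs) "test-equal" (car nameargs))))
    (exec-test fails-equal? observed (list expected) testname)))

(test-equal (lambda () (+ 1 2)) 3)                 ; uses the default name
(test-equal (lambda () (+ 1 2)) 4 "addition test") ; uses the given name
```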
To really understand the system, we need to know what that call to exec-test does. Moving on to that procedure, we see this:
There’s a special form here. It’s called with-handler, and it takes a handler function and an expression. What this does is execute a given expression within a sort of bubble. Inside this bubble, errors do not lead directly to the program aborting. Program control is instead passed to that handler function when an error occurs. Once the handler is done, the with-handler block can exit normally and the program can proceed instead of exiting.
This is generally known as exception handling (or ‘error handling’) and is a feature of many programming languages. When an error occurs, the normal flow of the program is skipped in some way. A special response is created, usually containing information about the error. This allows either the interpreter or some programmer-provided mechanism to decide what to do about the problem. All the errors you’ve encountered in Scheme are in fact exceptions, being handled with the ‘default’ handler that will just end execution entirely, after reporting where it stopped. While Scheme at the time SICP was written didn’t really have a formal specification for exceptions, most variations on the language have had something like this with-handler procedure (there’s a slight tweak for each implementation in the files). Without getting too far into the implementation details, we can go through the procedures as they’re used here as a demonstration.
We’ll start with how with-handler works. The first argument to with-handler is the handler, which needs to be a procedure to identify the type of exception that occurred and what to do with it. We have defined our handler to simply be exc-display, and that is what gets executed once an exception occurs inside our test block and we have something to handle. In our case we want to report the error and then continue from after the failed test. The function exception-message lets us get the information associated with the exception. That means the ‘exception’ is some sort of data structure that can give us a message about what happened, using this procedure as an interface. That information is then something we can use with display (in general, it will be a string).
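To make the idea concrete, here is one way these pieces could be defined in Racket. The bodies of with-handler, exception-message and exc-display below are assumptions built on Racket's own with-handlers and exn-message; the definitions in the actual files may look different, particularly for other implementations.

```racket
#lang racket

;; Assumed glue: with-handler as a thin wrapper over Racket's
;; with-handlers, catching anything that satisfies exn:fail?.
(define-syntax-rule (with-handler handler expr)
  (with-handlers ((exn:fail? handler)) expr))

;; Assumed interface procedure: pull the message out of the exception.
(define (exception-message exc)
  (exn-message exc))

;; Assumed handler: report the error, then return normally.
(define (exc-display exc)
  (display "ERROR: ")
  (display (exception-message exc))
  (newline))

;; The error is raised inside the bubble, so the handler runs and the
;; program carries on with the next expression.
(with-handler exc-display (error "boom"))
(display "still running")
(newline)
```

Without the with-handler wrapper, the call to error would stop the program before the final display was reached.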
With our handlers in place, we can get on with how to execute a test so it can be handled specially when errors occur. This is done by assigning to failure the result of applying the test procedure to the arguments given. There’s also something important that is done with the ‘expression under test’ as it is passed to apply: it is executed as a procedure. Looking back at our test functions, we see that this is what ‘observed’ was, and therefore we know it must be a procedure. The reason for doing this is so that the observed value is only executed within the with-handler block. If it were simply passed as an argument, the expression would be evaluated as an argument, prior to entering the bubble. We would not be able to use our own handler for it and go on to the next test, and the error would instead be handled by whatever was in place at the higher level. (You may note here that exception handlers can conceivably be nested.)
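A minimal sketch of such an exec-test, again assuming Racket, shows why the procedure wrapper matters. The argument names, the returned lists, and fails-equal? are illustrative choices for this example, not the code from the files.

```racket
#lang racket

;; Assumed glue: with-handler built on Racket's with-handlers.
(define-syntax-rule (with-handler handler expr)
  (with-handlers ((exn:fail? handler)) expr))

;; Hypothetical failure procedure: #f on success, a report otherwise.
(define (fails-equal? observed expected)
  (if (equal? observed expected)
      #f
      (list 'expected expected 'observed observed)))

(define (exec-test test-predicate expression-under-test other-args testname)
  (with-handler
   (lambda (exc) (list 'error (exn-message exc)))
   ;; The thunk is only called here, inside the bubble, so any error
   ;; it raises reaches our handler instead of stopping the program.
   (let ((failure (apply test-predicate
                         (expression-under-test)
                         other-args)))
     (if failure
         (list 'failed testname failure)
         'pass))))

;; (car '()) raises an error, but only once the thunk is called.
(exec-test fails-equal? (lambda () (car '())) (list 1) "car of empty list")
;; The next test still runs.
(exec-test fails-equal? (lambda () (+ 1 1)) (list 2) "addition")
```

Had the first call been written with a bare (car '()) in place of the lambda, the error would have been raised while evaluating the arguments, before exec-test was ever entered.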
This special treatment to ensure execution inside the exception-handling bubble is only used on the ‘observed’ expression. That does make the observed argument unique in the tests. While this was done here merely as a matter of convenience, there could be some value in treating the tests in this fashion. It would enforce the condition that all computations that might result in an error are confined to the ‘observed’ section, not the ‘expected’ answer. However, it also makes testing slightly less flexible, as there are situations where it’s more natural and preferable to use computed values from the system under test for the expected results as well.
Whatever test-predicate is, it is supposed to return false if nothing went wrong, and may return anything else if failure occurs. This way, newly-written test functions can report failure in any format desired. Success is merely indicated with a ‘pass’ message. It is a convention in testing that success should be quiet, and report nothing or nearly nothing, since it’s only the failures that require attention. Tests typically are run repeatedly (and continuously if possible) and generating a lot of noisy ‘success’ messages can make it harder to find the parts that actually need fixing.
Exception handling also allows us to add another type of test: one to ensure that an expected error actually does occur. This can be quite useful, as there are exercises that require an error to happen on certain improper input.
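One possible shape for such a test is sketched below, assuming Racket. The name test-error and the returned values are hypothetical; the idea is simply that the handler being reached counts as success, and reaching the end of the bubble without an error counts as failure.

```racket
#lang racket

;; Assumed glue: with-handler built on Racket's with-handlers.
(define-syntax-rule (with-handler handler expr)
  (with-handlers ((exn:fail? handler)) expr))

;; Hypothetical test: passes only if calling the thunk raises an error.
(define (test-error observed-thunk . nameargs)
  (let ((testname (if (null? nameargs) "test-error" (car nameargs))))
    (with-handler
     (lambda (exc) 'pass)               ; an error occurred, as expected
     (begin
       (observed-thunk)                 ; should raise an error
       (list 'failed testname "no error raised")))))

(test-error (lambda () (error "bad input")))       ; evaluates to 'pass
(test-error (lambda () (+ 1 2)) "should complain") ; evaluates to a failure list
```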
Testing Example
To see how this works in action, here are some examples from the tests used for the ‘Generic Arithmetic’ system:
We see that each test requires its first argument to be a procedure, and this is accomplished using a lambda expression with no arguments. (A similar approach was used when measuring execution time in Racket.) The first two tests also provide the optional name, which is only displayed if a failure occurs. Note that if errors occur we cannot display the test name, since that isn’t provided as part of the exception data.
The second test shown here highlights the potential for problems when only one ‘observed’ value is allowed. If an error occurs when evaluating the ‘expected’ result of (mul n2 n1), the normal program flow will still be halted. One possible way around that is to use something like test-true and put all computation inside the lambda ‘bubble’, similar to the way the final test shown here uses equ? inside a test-false statement.
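The difference can be sketched as follows, assuming Racket. Everything here is a stand-in for illustration: this test-true is a hypothetical simplified version, mul and equ? are replaced with plain numeric operations, and broken-mul simulates an operation that raises an error.

```racket
#lang racket

;; Assumed glue: with-handler built on Racket's with-handlers.
(define-syntax-rule (with-handler handler expr)
  (with-handlers ((exn:fail? handler)) expr))

;; Hypothetical simplified test-true: passes if the thunk returns a
;; true value, and catches any error raised while running it.
(define (test-true observed-thunk . nameargs)
  (let ((testname (if (null? nameargs) "test-true" (car nameargs))))
    (with-handler
     (lambda (exc) (list 'error (exn-message exc)))
     (if (observed-thunk)
         'pass
         (list 'failed testname)))))

;; Stand-ins for the generic arithmetic operations (assumptions).
(define (mul a b) (* a b))
(define (equ? a b) (= a b))
(define (broken-mul a b) (error "mul not implemented"))
(define n1 3)
(define n2 4)

;; Both products are computed inside the lambda, so an error in either
;; one is caught by the handler.
(test-true (lambda () (equ? (mul n1 n2) (mul n2 n1))) "mul commutes")
(test-true (lambda () (equ? (mul n1 n2) (broken-mul n2 n1))) "mul commutes")
```

The second call returns an error report instead of halting, because neither product is evaluated until the thunk runs inside the bubble.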
Which format to use may also depend on the purpose of the test. What is important to test when checking for commutativity? Only that the two expressions yield identical results, not that either is the ‘correct’ value. Since neither is really being tested more than the other, using ‘observed’ and ‘expected’ in this manner is arguably inaccurate. On the other hand, adding a test-true wrapper is adding extra words to the expression and perhaps obfuscating the purpose of the test a bit. I prefer the more concise expression here, but feel free to modify the tests if you disagree. Note that in the future, we’ll also find a way around the need for these lambda ‘wrappers’: special forms will be used to avoid the issue altogether.
The first file given below is just the test function framework. The second one contains tests used for the next set of exercises. Note that the ‘test-functions’ definition file will need modification, depending on your implementation (see the comments for what to do). It will also require the appropriate exception handler file.