SICP 3.3.5 Exercises
January 19, 2020 10:55
Even though this constraint system is only used for this section, it is still a complex system, and testing our components requires us to make a fairly large sets of checks. Moreover, there will be frequent points during the tests that we’ll need to reset or somehow clear the system, so that we can test with a new set of values, and not have to deal with prior tests having an effect on those conducted after them. As we head deeper into the book, and especially once we get to Chapter 4, we will want a more robust approach to testing than some of the methods we’ve been using.
In this section, the tests will make use of an already-written test suite, namely the one created for SRFI 64 . What is SRFI 64? It’s a SRFI (Scheme Request for Implementation) for a testing API, so that tests can be written and run on any implementation of the SRFI, and thus on many implementations of Scheme. The SRFIs are not quite standards, but serve nearly the same purpose by providing an agreed-upon set of procedures for working with various programming aspects that are not explictly part of the Scheme language itself. They are typically designed to work in any implementation of Scheme. While I don’t plan on using SRFI 64-compatible test suites for future tests, it’s a good way to see the kind of features that can be available in them. It was chosen because it is the one most likely to have an implementation for a broad range of interpreters.
It should be clarified that the SRFI itself is not the direct source of the test procedures used. It only provides the format and how they should function. However, each SRFI also includes a sample implementation. That means that in theory there’s a good chance the reference implementation will work for multiple implementations of Scheme.
However, there is one slight problem with that. SRFI-64 was proposed quite a while ago, and thus the file originally provided with it is unlikely to work now, even if it names the interpreter you are using. Nonetheless, it is still probable that you can find one that works with your implementation. Racket, for example, has SRFI implementations available for inclusion (and these will work with its R5RS, so we can stick with that for now as well). In Chicken Scheme, there is an ‘egg’ for SRFI-64 (see the exercise file for how to load it). I have also found this project for Chibi Scheme specifically, and this one that tries to give SRFI implementations for R7RS. I have not done any work to get those working, although I did include notes in the exercises file for the Racket and Chicken methods.
I could not find any implementation of SRFI 64 for MIT Scheme that would work for me. What I did find is a somewhat similar testing system, called Test Manager. While the test sequence mostly looks identical to the SRFI-64 tests, there are enough differences in syntax to make this require a separate test definition list, including a slight difference in the way that tests are defined and invoked. Another exercise file is included for MIT Scheme using this system. You will need to download the test-manager project yourself so that the exercise file can load it. See the file for further instructions if you need help with that.
Here are some examples of the tests I’ve written for SRFI 64, so that you can add on to them or modify if you want. If you wish, you can skip this section and just go down to the next section, which describes the actual tests on the constraint system. What follows here is provided to explain how automated testing commonly works. There will also be a short explanation of how the MIT Scheme tests are different, which might be useful even if you are not using MIT Scheme itself.
I should also clarify that this isn’t the only or even the best way to specify tests, but it is one that will serve our purposes. Here are the first few tests used for the temperature converter:
Each set of tests here is defined as a procedure of no arguments, to make them a bit easier to re-run when we make changes to the system. The parts that actually signal tests are the
test-group labels. We want at least one test group, so that we can run the different types of tests in our file (temperature converter tests, squarer tests, etc) separately. The additional test groups simply provide more labels for the test report. At the heart of the test we use a
test-group-with-cleanup. This sets up another test group, with the difference that the last statement in the sequence is a ‘clean-up’ procedure and will always be run, even if there’s an error during testing. With the constraint system, we want there to be no constraints already set when we’re done with a test, so we make our clean-up routine clear out all the constraints. Technically there is no need for the
begin statement wrapping the testing sequence, but in my opinion it clarifies the clean-up routine by setting it apart.
We can see that within the test groups we can execute regular procedures along with test case statements. The test case statements will run one test each. The
test-assert test case statement is provided by SRFI 64, and
test-connector-= is actually not a test case, but a procedure that I wrote that has its own test case statement. This
test-connector-= procedure can handle the case when the connector has no value, which is considered an instant failure. As shown here, any test statement can also have its own optional label, again to make the report easier to read.
When a test case statement is encountered, what happens next is handled by something called a ‘test runner’. The test runner is given the results of the test and in theory it can do whatever it wants with them. The default test runner (which is what we’ll use) should print a summary of the tests it ran and then output more detailed results to a log file. Custom test runners could also be written, but the default will work fine for our needs. There is one more detail regarding the test runner, though; in some implementations the same tests cannot be run through this default test runner multiple times. I made a separate procedure
run-tests that creates a new test runner every time to deal with this issue. That procedure also takes care of disabling probe reporting, if any are active, so that running tests does not produce a large amount of text from any probes. However, it might be useful to have the probes enabled while debugging, so feel free to comment out the line that quiets the probes if you want to run the tests with them in.
Getting back to test reporting, our test results will only be of the kind “pass” or “fail”. Any error or exception during testing is likely reported as a failure (unless you use a test case specifically checking for an exception). The test-runner will execute all tests within a group, and then give the summary statement, with more detailed information when failures occurred. While there is no specific requirement of how the test runner reports results, the default one should include these details. Here’s an example of the output of tests using R5RS in DrRacket that included some failures (the full file path has been trimmed down here):
As the tests are run, failures are normally reported immediately. When the test group is finished there is the final summary, telling how many tests passed and how many failed (the phrasing ‘expected’/‘unexpected’ is a quirk of SRFI 64, which we don’t need to worry about now). When the tests fail, we are told the line number in the file of the test case statement. Here we see one slight failing of using the
test-connector-= approach, as all errors point to the same place in the source file.
To get more information about which tests are actually failing, we would then look at the log file (helpfully indicated by the test runner above). This is again a feature of the default test runner, not just the test system itself. The log file gives a report for every test that is run, and what that result was. As each test group starts (or ends), the name of the group should be marked in the file. The actual format of this log file (or even the test summary) might vary slightly between implementations, since there is no requirement on how the test runner presents its results.
All of this may seem a little much to deal with, but for most of the exercises, you can get by with simply letting the tests run and figure out where they are failing.
The MIT Scheme tests look basically the same, although the labels used is different for the tests. As an example, here’s the version of
test-connector (compare to the one above):
The difference is that
assert starts off each test case instead of
test, and the order of the arguments is different as well. If you look in the file, you will also see the use of
check in most of the tests, which functions much the same way as the
assert method does.
The other big change is that the test groups are not defined as their own procedure, but as a ‘top-level’ test group with a label. That label is then used to call the tests (although the way it has been defined, running the tests looks identical to the SRFI-64 method I used). The difference here is that the procedures do not now have internally defined connectors and blocks, but they are globally defined.
That works without issue up until we get to the tests for the squarer. One of the main reasons for setting up tests like this is that we can re-use them as we fix problems with the procedure, and we don’t have to create a new set of tests each time. But if the squarer used in the tests is globally defined, that causes a problem if we want to change the squarer itself. Our particular system has no way of destroying blocks. We wouldn’t want to have our connectors hooked up to both old, likely non-working blocks while also connected to new blocks. It would also be cumbersome to redefine the global test connectors every single time we want to run the tests (though we do end up doing this for the Celsius/Fahrenheit converter).
Fortunately, we have a way that handles that re-definition more cleanly. The test-manager system as a
define-group-setup procedure that will create a ‘setup’ that is run once when the group is run (while it’s not done here, one reason for having test groups is so that they can be combined and re-used in other groups). That ensures that a newly defined connector and block is assigned when the test is run later, and indeed allows the tests to be run even if the procedures are modified directly via the REPL.
Of note, there is also a ‘teardown’ procedure that is defined within several of the groups. This performs the same function as
test-group-with-cleanup, with the difference being that it will be run for every test sequence within the group, and we don’t need to explicitly call it.
The Constraint System Tests
This set of exercises has a preliminary set of tests to run (on the temperature converter) that are not connected to any exercise. Since the constraint system is being used along with a new testing system, I think it is beneficial to have a working example to start from. This also has the benefit of being a monitor on the system itself. Since these tests should always pass, we’ll know that nothing was inadvertently modified that breaks either the testing or constraint system in the course of working through the exercises. This also allows a means to experiment with other tests and get used to SRFI-64. Also, the set of tests used for the temperature converter will end up being reused in Exercise 3.37, so it’s not a totally unconnected set of tests. Since the means of constructing the temperature converter itself varies, this set of tests relies on externally (i.e. globally) defined variables, while the other tests create their own connectors when the tests are run.
The exercise file also includes a ‘blank’ implementation of the
averager for Exercise 3.33. With this blank implementation that does nothing, the tests can be run and several of them will actually pass! This is unsurprising, however. Several of the tests check to see that signals are not being set improperly. Thus, an implementation that does nothing will indeed avoid setting any improper signals, and therefore pass the tests. We do want to run all of these tests, to ensure that whenever we actually make something that produces the correct output, it’s not also giving us unexpected behavior along with it.
Finally, the squarer tests included to work with Exercise 3.34 and 3.35 demonstrate a faulty implementation followed by a correct one. Here, the tests are blank and should be filled in (use the other tests as a model if needed). Building a set of tests this way, to show the expected behavior of a squarer procedure, is partly a solution to Exercise 3.34 itself. I leave at as part of the exercise to figure out what tests are needed to show the faults of the first version, which is to say, write some tests that fail. Then, once the fixed version is written for Exercise 3.35, you can use the same test set to prove that the new version does not have the same problems. But don’t forget to check additional behavior, which might well be passing in both versions but still should be verified.