May 3, 2020 12:06
This section is fairly lengthy, yet it still kind of rushes through one of the more important topics in computer science. One of the ways SICP really shows its age is that it does not cover concurrency in as much depth as the topic deserves, especially in consideration of how modern processors work. It represents only a tiny introduction to the topic of designing software with concurrency in mind.
In order to even implement any of the procedures for this set of exercises, you will need a Scheme interpreter that supports concurrency, and not all of them do. That said, only a few exercises rely on actual implementations, and it’s therefore possible to get something out of this section even without testing on the computer. In truth, I feel like this section is almost one that it might not be worth it to put too much effort into running it in an interpreter, as there are likely better ways to learn about concurrency that will include more modern methods. Nevertheless, I did run it in mine, and will detail how I went about it if you want to follow my example.
Even with concurrency available in the interpreter, there is still the matter of testing reliably. Verifying that concurrent procedures are actually working as intended is no easy matter. So this post is as much about what’s needed for the interpreter as it is about how the testing can be done.
For myself, I am using Dr Racket and reverting for this section to Pretty Big, the legacy language that lets in many standard Racket procedures while hewing closer to the formatting of Scheme (and also allows for redefinition of procedures). The Racket routines used for concurrency are not exactly the same as in the text, so I defined some that mostly fit the procedures as described in the text. I have also written a similar set of routines that will work for MIT/GNU Scheme.
If your Scheme supports some means of atomic operations (as in
test-and-set!) and parallel execution in some form, it should be possible to use the testing approach, although some tweaking may be required. I will note that SRFI-18 is one that supports the mutex construct, as well as ‘threads’ (which can be used to accomplish parallel execution). It does not include atomic procedures like
In order to do meaningful tests of concurrency in these exercises, we want to somehow force the concurrent processes to vary in the time they take, so that we can see what sort of overlap can occur in real-time operation. Most of these procedures are quite simple and would probably execute too quickly to ever run into concurrency issues. To easily extend the time required, we need some way of telling the thread (which is where the process is taking place) to
sleep, usually some random amount of time, so that intermixing of the concurrent procedures is more likely. If your interpreter doesn’t have this built-in you can probably create a time-wasting function using some complex calculation — after all, we do not need precise times, just something that’s variable and takes enough time to make a difference.
It is also necessary for the interpreter to be able to monitor the concurrent processes for testing, since it’s possible that an incorrect solution might get stuck and never finish its task. You will likely need to get used to how to stop ongoing concurrent processes in your interpreter, as often while debugging you might need to end the procedure externally (usually called ‘killing’ it). Even then, it might be necessary to reset the interpreter entirely, to clear out the problems that can result from processes staying in memory or variables that can’t be accessed getting stuck in some state. It may save some headaches to simply start the interpreter fresh with every run, since the exercises and tests are not all designed to be run through more than once.
All of the routines to support concurrency are wrapped up in a separate file. There’s one for Racket, and another for MIT/GNU Scheme. Please note that running this under MIT Scheme needs a ‘redefine’ file that must only be loaded once (it’s there to allow procedures like
random to work as they do in Racket/Pretty Big). If you are working with another variety of Scheme that does support concurrency, you’ll need to implement your own version of these files to handle parallel execution and the other procedures.
Additionally, at the top of the exercise file for this section is a procedure that’s not part of an exercise, but only there to ensure the set-up of concurrency. It’s called
demo-interleave and will demonstrate interleaving routines — similar to what was done in the previous exercise set, but with actual repeated (random) trials. It should verify that concurrency is working as intended.
There was one other Racket-specific change that might affect debugging. Since Pretty Big does not by default allow list mutation (
set-cdr! and the like), I had to specially define a number of the list routines so that they can be still be used transparently. This means that when working with lists, procedures will be jumping into these defined routines instead of using the built-in or primitive ones. That can occasionally result in unclear or confusing error messages.
Once we have the ability to run (and test) concurrent routines, we need to look at how the testing is actually done. Although the number of exercises is many, the only ones that require implementation are 3.47 and 3.48. That said, testing semaphores (in 3.47) requires a fairly complicated testing apparatus, so I’ll explain it here in some detail.
A semaphore is intended to work by allowing a set number of concurrent accessors to its resource. To test it, we want another way to keep track of the number of concurrent procedures accessing that resource. I used a fixed-length buffer that removes items only after they are read; the semaphore would then control access to that buffer. It is much like a queue, but multiple procedures can write to it, making the output order unpredictable.
In some ways, this could be compared to the operation of a restaurant kitchen. Customers could come in with orders at any time, and multiple orders can be processed at once, but the kitchen staff is limited in the total number of items that it can handle at any one time. Once a dish is prepared, it leaves the kitchen. Of course in real life, there’d be a separate ‘waiting queue’ when too many orders come in at once, but we aren’t modeling that here; only the semaphore is what controls what’s allowed to go into this buffer.
The semaphore should not allow the buffer to have more items in it than the allowed limit, so it needs to be informed of both reads and writes. The listing below shows how the buffer’s read and write routines are implemented. You can see that the buffer has its own mutex; it’s called
bufman (for ‘buffer manager’). It controls access to the actual storage of the buffer (a list), while the
limiter is the external controller, presumably the semaphore. This separation makes sure that while the buffer’s storage can only ever be accessed sequentially, the semaphore takes its own approach to sharing the buffer as a whole.
When a write occurs, we acquire the limiter. The buffer manager then does its own acquiring and releasing in order to access the buffer storage. The limiter is not released until a read occurs, which could happen at any later point.
For testing purposes, the buffer also keeps track of how many items have been put into it. If too many items are written in, an overflow flag is set. So if the limiter (i.e. semaphore) isn’t limiting correctly, then we will know by checking this flag during or after the test.
Since all of these operations are being done at unknown points in time, we need to have another concurrent monitoring process. The procedure
test-control is used for this; it is given the number of trials to run, and a procedure that it will continually run to check if a test is finished. If that procedure returns true, then the trial is considered complete. Another routine is run once a trial is finished (to record or report results), and another one can run when all testing is done, in case values need to be reset or there is ‘clean-up’ to do after the tests have completed. Slightly unusual operations are also possible by using the ‘check’ procedure to do more than just check; see the
test-control-read-write for an example.
One important aspect of
test-control is that it has a short pause (implemented using the
sleep function) between each tests. Without that in place, the continual attempts to check on the tests could actually slow the other threads down and even prevent them from running. This is something to keep in mind when working on your own routines — it may be necessary to insert a short break between attempts at acquiring a mutex, or else other attempts might be locked out from it entirely. (Fixing this sort of thing is doable, but well beyond the scope of this introductory section).
There are two tests provided to exercise the limiter and buffer interaction. One waits for the buffer to fill for each trial. This should show that the semaphore is allowing the proper number of writes but not allowing overflow. The second test is a more ‘real-world’ test using concurrent reads and writes, to ensure that the release and acquire process does not lead to problems. Be aware that with the procedures as set up, the random delays are significantly long, and it could take several seconds to output any results.
Exercise 3.48, fortunately, does not require as elaborate a testing process. We can test for deadlock by running
wait-with-timeout on the parallel exchange sequence, since when deadlock is avoided the timeout should not occur. The serializer is additionally modified here to make the deadlock situation more likely, by adding a fixed time delay in the acquire step.
You should ensure that deadlock is actually occurring in your interpreter by running the unmodified serialized-exchange and allowing it to time out, before implementing the deadlock avoidance. It can be somewhat tricky to detect this, since deadlock is not guaranteed to occur. Even once we have a revised version written, it’s not necessarily the case that it is avoiding deadlock, as we cannot easily tell when such a situation might be possible.
On other note is that any deadlock avoidance procedure cannot be run on the same set of accounts as the unmodified one, since once deadlock has happened the mutex cells are already set and likely cannot be cleared. If you do run both versions in the same file, make sure the accounts used are distinct.