[Haskell] Haddock, QuickCheck, and Functional Design by Contract
Robert Will
robertw at stud.tu-ilmenau.de
Tue Feb 17 10:50:30 EST 2004
> import QuickCheck
Dear developers of QuickCheck and Haddock,
Dear Haskell-Community,
((Summary: I'm trying to find notations for Design by Contract in Haskell
which can be standardised to be used by tools like Haddock and
QuickCheck. This will give Haskell programmers an order-of-magnitude
gain in productivity, since it brings semantic abstraction into the
programming language. See [1] and my earlier posts to this list [2].))
I consider it the most important advantage of Haddock, that it
encourages the separation of module internal and external
documentation (marked with -- | or -- ^). This is added value, even
if a module user only reads the source, not the generated
documentation.
I consider it one of the most important advantanges of QuickCheck,
that it allows the automatic verification of (formal) documentation,
namely algebraic properties of functions. Yes, I think that that
(well-written) QuickCheck properties are excellent documentation, much
more worth than just type signatures (which I can look at in the
interpreter).
I consider it the essence of Design by Contract to have an explicit
redundancy in the code, namely Pre- and Post-Conditions, Assertions,
whose consistency with the rest of the code is automatically checked
during run-time, making it a tight net in which bugs are caught. And
what's more, this redundant view is more abstract than the original
code, so that it makes for excellent documentation, which can never be
out of date (due to the run-time checks).
Both the principles "Only one document for code and documentation" and
"Contracts have to be run-time checked" were already articulated in
1988 [3]. A standarised notation for contracts should therefore be
part of any programming language. The following six points summarise
a functional version of Design by Contract. Point 4 is still unclear,
others can still change syntax. The rest is relatively straight-
forward and we should make it a standard as soon as possible.
1. A notation for postconditions: No problem, use QuickCheck
properties.
> prop_sort_post xs = let xs' = sort xs
> in is_sorted xs' && bag xs == bag xs'
- Haddock can automatically include this "postcondition" in the
documentation for function @sort@ (removing the prefix "prop_"
and the suffix "_post" from the QuickCheck-Property).
- For QuickCheck it is already in the right format.
2. A notation for specificative functions. That is, we document /
specify a function by giving a more readable (but less efficient)
version with the same semantics.
For example, the best way to document exponentiation (besides the
informal "@x ^ n@ is the @n at th power of @x at .") is the following
code:
> x ^ 0 = 1
> x ^ n = x * x^(n-1)
Or the standard function @scanl@ is specified as
> scanl f e = map (foldl f e) . inits
> -- @inits@ see appendix.
Both of those definitions are not the actual implementation (which
is asymptotically faster in both cases), but they are perfectly
suited for inclusion in the documentation and as raw material for
QuickCheck. We can realise that by a simple notation:
> scanl_spec f e = map (foldl f e) . inits
Haddock would then publish the definition @scanl f e = ..@
(i.e. with the suffix "_spec" removed) in the documentation.
QuickCheck would generate the testing property:
prop_scanl_spec f e = scanl f e == scanl_spec f e
The same can be done for functions with operator names, e.g. with
"//" as suffix for the specification function and "\\" as prefix in
"prop_"'s place:
x \\^// n = x ^ n == x ^// n
(We could also generate names, like "roof_spec" and
"prop_roof_spec" for use in the code. The documentation, however,
will only show the clear @x ^ n at .)
3. A notation to mark self-specifying functions. These are functions
where the implementation is at the same time the simplest (still
formal) specification.
For example the function @square@ from the Haddock manual:
square x = x * x -- ||
The double bar tells Haddock to export the complete function
definition to the user documentation. Another example comes from
DData:
(<>) = append -- || Only syntactic sugar.
Or:
lookup k map | member k map = Just (find k map) -- ||
| otherwise = Nothing -- ||
-- | Perhaps implement more efficiently later.
The documentation that contains those defintions does not reveal
whether the functions are really implemented as given, or whether
it's only a specificative functions according to point 2. However,
since point 2 guarantees semantic equivalence through tests, users
don't need to care.
For many people self-specifying code seems to be opposed to design
by contract, however, this is not at all the case. The modern view
of Programming by Refinement (see, e.g. [5]) softens the frontier
between specifications and implementations to allow practical
compromises between readability and efficiency while at the same
time avoiding negative redundancy between code and spec
(i.e. simply duplicated information instead of a useful different
view). Incidentally most of the Prelude functions are self-
specifying in this way. The examples of point 2 are notable
exceptions.
4. A notation for preconditions. For simple functions a Precondition
can be calculated automatically from the Patterns:
> head (x:xs) = x
Gives the Precondition @xs /= []@ for @head xs at . This only needs
some simple knowledge about @data@ and @newtype@ declarations. (We
could also choose to leave it out, if it's too much work, for too
few cases where it works. Then programmers would have to state all
preconditions explicitly as shown next.)
Presently I use the following coding style:
> take n xs | not (n >= 9) = error "take.Precondition"
The precondition can simply be extracted as <<the bracketed
expression behind @not@ in a line containing the words @error@ and
"Precondition">>.
However it has the disadvantage that I can't simply disable
run-time checking of the Precondition (as I could when using
@assert@). Furthermore it is also rather ugly. Does anyone have a
better idea?
The use of Preconditions is expecially important:
- Automatically checked on run-time (like @assert@).
- Used in documentation.
- Used by QuickCheck to create (==>) guards in the automatic
@prop_foo_spec@ properties.
5. Default implementations of type class members are often a special
case of specification functions: the overwriting provides the same
semantics, more efficiently. (Occurs often in Abstract Data
Structures, see [2].)
A typical example is:
> is_empty coll = size coll == 0 -- ||
We can use the double bar (or whatever we decide to standardise),
to mark the function definition for export to the documentation.
However, it is more difficult to create test properties which
compare the two implementations of a function, since the default
implementation is no more accessible after having been overwritten.
The property generator will have to copy the default definition to
a definition with name @is_empty_spec@ (for example) and can then
generate the usual @prop_is_empty_spec@ (as in point 2). The
copying of the function definition happens only for testing and is
not seen in normal code or documentation.
(The overwriting corresponds slightly to the contract refinement in
object-oriented inheritance. See [3].)
By the way, we can use -- || on a type class header, saying to
export all it't default definitions to the documentation. (In my
data structure library this is always the case. See, e.g.,
http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/AlgebraBasic.hs )
6. Invariants are an important part of Design by Contract according to
[3].
In Functional Programming it has been proven practical to check
(and at the same time: document) data structure invariants in a
smart constructor; the invariant resides only in this place (is
not redundantly mentioned in the code) and is run-time checked on
each creation of a data cell. This works with @assert@ and doesn't
need QuickCheck properties. Also, such internal invariants are not
relevant for the external documentation. (Otherwise we could
documentation- export the definition of the smart constructor,
using -- ||.)
External Invariants are nothing else than algebraic properties of
data structures. They can be documented using normal QuickCheck-
Properties, and these can be documentation- exported via -- ||.
As a consequence, invariants need no special notation. I only
mention them her for methodological reasons.
In order to keep the standard short, I suggest only to include this
notations, the discrimination between internal and external comments
(as in Haddock) and the discrimination of identifiers in comments
(also editors can have hyperlinking and different colors.) Further
markup concerns only the documentation generator and not other tools.
Users of QuickCheck have reported an order-of-magnitude gain in
productivity. That's because redundancy in form of checkable
documentation comes very near to semantically automated software
engineering. Together with proper documentation mechanisms, we can
make such rendundant programming more literate, thus bringing an
order-of-magnitude gain in productivity to whole programmer _teams_.
Finally, I think, that the topic "Functional Programming and Design by
Contract" is also a topic for a keynote lecture. Those who are in the
relevant positions might about that. Similar applies to the authors
of functional programming text books (and teachers, by the way). Or,
and fans of program proving (hello LARCH and OBJ2) might want to
statically check those contracts...
so far,
Robert
[1] myself:
"Modularity without Abstraction is no more than Syntactic Sugar"
http://www.stud.tu-ilmenau.de/~robertw/eiffel/modularity_abstraction.txt
[2] myself: "Posts on Functional Data Structures"
http://www.mail-archive.com/haskell@haskell.org/msg13980.html
http://www.mail-archive.com/haskell@haskell.org/msg13979.html
[3] Bertrand Meyer: "Object-Oriented Software Construction", 1st
edition, 1988
[4] Richard Bird: "Introduction to Functional Programming using
Haskell", 2nd ed.
[5] Eric Hehner: "A Practical Theory for Programming", 2nd ed., 2004
http://www.cs.toronto.edu/~hehner/aPToP/
> -- | List of initial segments (including []) of a list
> -- in increasing order of size.
> inits [] = [[]]
> inits (x:xs) = [] :
> map (x:) (inits xs)
>
> prop_inits_initial xs = forAll (elements $ inits xs) (\ ys ->
> ys == take (length ys) xs )
>
> prop_inits_increasing_length xs
> = striclty_increasing $ map length $ inits xs
> -- Also checks, that no segment occurs twice.
More information about the Haskell
mailing list