[Haskell] Haddock, QuickCheck, and Functional Design by Contract

Tue Feb 17 10:50:30 EST 2004

>  import QuickCheck

Dear developers of QuickCheck and Haddock,
Dear Haskell-Community,

((Summary: I'm trying to find notations for Design by Contract in Haskell
  which can be standardised to be used by tools like Haddock and
  QuickCheck. This will give Haskell programmers an order-of-magnitude
  gain in productivity, since it brings semantic abstraction into the
  programming language.  See [1] and my earlier posts to this list [2].))

I consider it the most important advantage of Haddock, that it
encourages the separation of module internal and external
documentation (marked with -- | or -- ^).  This is added value, even
if a module user only reads the source, not the generated
documentation.

I consider it one of the most important advantanges of QuickCheck,
that it allows the automatic verification of (formal) documentation,
namely algebraic properties of functions.  Yes, I think that that
(well-written) QuickCheck properties are excellent documentation, much
more worth than just type signatures (which I can look at in the
interpreter).

I consider it the essence of Design by Contract to have an explicit
redundancy in the code, namely Pre- and Post-Conditions, Assertions,
whose consistency with the rest of the code is automatically checked
during run-time, making it a tight net in which bugs are caught.  And
what's more, this redundant view is more abstract than the original
code, so that it makes for excellent documentation, which can never be
out of date (due to the run-time checks).

Both the principles "Only one document for code and documentation" and
"Contracts have to be run-time checked" were already articulated in
1988 [3].  A standarised notation for contracts should therefore be
part of any programming language.  The following six points summarise
a functional version of Design by Contract.  Point 4 is still unclear,
others can still change syntax.  The rest is relatively straight-
forward and we should make it a standard as soon as possible.

1. A notation for postconditions: No problem, use QuickCheck
   properties.

>  prop_sort_post xs = let xs' = sort xs
>                      in is_sorted xs' && bag xs == bag xs'

   - Haddock can automatically include this "postcondition" in the
     documentation for function @sort@ (removing the prefix "prop_"
     and the suffix "_post" from the QuickCheck-Property).

   - For QuickCheck it is already in the right format.

2. A notation for specificative functions.  That is, we document /
   specify a function by giving a more readable (but less efficient)
   version with the same semantics.

   For example, the best way to document exponentiation (besides the
   informal "@x ^ n@ is the @n at th power of @x at .") is the following
   code:

>  x ^ 0 = 1
>  x ^ n = x * x^(n-1)

   Or the standard function @scanl@ is specified as

>  scanl f e = map (foldl f e) . inits
>  -- @inits@ see appendix.

   Both of those definitions are not the actual implementation (which
   is asymptotically faster in both cases), but they are perfectly
   suited for inclusion in the documentation and as raw material for
   QuickCheck.  We can realise that by a simple notation:

>  scanl_spec f e = map (foldl f e) . inits

   Haddock would then publish the definition @scanl f e = ..@
   (i.e. with the suffix "_spec" removed) in the documentation.

   QuickCheck would generate the testing property:

   prop_scanl_spec f e =  scanl f e == scanl_spec f e

   The same can be done for functions with operator names, e.g. with
   "//" as suffix for the specification function and "\\" as prefix in
   "prop_"'s place:

   x \\^// n =  x ^ n == x ^// n

   (We could also generate names, like "roof_spec" and
   "prop_roof_spec" for use in the code.  The documentation, however,
   will only show the clear @x ^ n at .)

3. A notation to mark self-specifying functions.  These are functions
   where the implementation is at the same time the simplest (still
   formal) specification.

   For example the function @square@ from the Haddock manual:
   square x = x * x -- ||

   The double bar tells Haddock to export the complete function
   definition to the user documentation.  Another example comes from
   DData:

   (<>) = append -- || Only syntactic sugar.

   Or:
   lookup k map | member k map = Just (find k map)   -- ||
  	        | otherwise    = Nothing             -- ||
   -- | Perhaps implement more efficiently later.

   The documentation that contains those defintions does not reveal
   whether the functions are really implemented as given, or whether
   it's only a specificative functions according to point 2.  However,
   since point 2 guarantees semantic equivalence through tests, users
   don't need to care.

   For many people self-specifying code seems to be opposed to design
   by contract, however, this is not at all the case.  The modern view
   of Programming by Refinement (see, e.g. [5]) softens the frontier
   between specifications and implementations to allow practical
   compromises between readability and efficiency while at the same
   time avoiding negative redundancy between code and spec
   (i.e. simply duplicated information instead of a useful different
   view).  Incidentally most of the Prelude functions are self-
   specifying in this way.  The examples of point 2 are notable
   exceptions.

4. A notation for preconditions.  For simple functions a Precondition
   can be calculated automatically from the Patterns:

>  head (x:xs) = x

   Gives the Precondition @xs /= []@ for @head xs at .  This only needs
   some simple knowledge about @data@ and @newtype@ declarations.  (We
   could also choose to leave it out, if it's too much work, for too
   few cases where it works.  Then programmers would have to state all
   preconditions explicitly as shown next.)

   Presently I use the following coding style:

>  take n xs | not (n >= 9) = error "take.Precondition"

   The precondition can simply be extracted as <<the bracketed
   expression behind @not@ in a line containing the words @error@ and
   "Precondition">>.

   However it has the disadvantage that I can't simply disable
   run-time checking of the Precondition (as I could when using
   @assert@).  Furthermore it is also rather ugly.  Does anyone have a
   better idea?

   The use of Preconditions is expecially important:

   - Automatically checked on run-time (like @assert@).

   - Used in documentation.

   - Used by QuickCheck to create (==>) guards in the automatic
     @prop_foo_spec@ properties.

5. Default implementations of type class members are often a special
   case of specification functions: the overwriting provides the same
   semantics, more efficiently.  (Occurs often in Abstract Data
   Structures, see [2].)

   A typical example is:

>  is_empty coll = size coll == 0 -- ||

   We can use the double bar (or whatever we decide to standardise),
   to mark the function definition for export to the documentation.
   However, it is more difficult to create test properties which
   compare the two implementations of a function, since the default
   implementation is no more accessible after having been overwritten.
   The property generator will have to copy the default definition to
   a definition with name @is_empty_spec@ (for example) and can then
   generate the usual @prop_is_empty_spec@ (as in point 2).  The
   copying of the function definition happens only for testing and is
   not seen in normal code or documentation.

   (The overwriting corresponds slightly to the contract refinement in
   object-oriented inheritance.  See [3].)

   By the way, we can use -- || on a type class header, saying to
   export all it't default definitions to the documentation.  (In my
   data structure library this is always the case.  See, e.g.,
   http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/AlgebraBasic.hs )

6. Invariants are an important part of Design by Contract according to
   [3].

   In Functional Programming it has been proven practical to check
   (and at the same time: document) data structure invariants in a
   smart constructor; the invariant resides only in this place (is
   not redundantly mentioned in the code) and is run-time checked on
   each creation of a data cell.  This works with @assert@ and doesn't
   need QuickCheck properties.  Also, such internal invariants are not
   relevant for the external documentation.  (Otherwise we could
   documentation- export the definition of the smart constructor,
   using -- ||.)

   External Invariants are nothing else than algebraic properties of
   data structures.  They can be documented using normal QuickCheck-
   Properties, and these can be documentation- exported via -- ||.

   As a consequence, invariants need no special notation.  I only
   mention them her for methodological reasons.

In order to keep the standard short, I suggest only to include this
notations, the discrimination between internal and external comments
(as in Haddock) and the discrimination of identifiers in comments
(also editors can have hyperlinking and different colors.)  Further
markup concerns only the documentation generator and not other tools.

Users of QuickCheck have reported an order-of-magnitude gain in
productivity.  That's because redundancy in form of checkable
documentation comes very near to semantically automated software
engineering.  Together with proper documentation mechanisms, we can
make such rendundant programming more literate, thus bringing an
order-of-magnitude gain in productivity to whole programmer _teams_.

Finally, I think, that the topic "Functional Programming and Design by
Contract" is also a topic for a keynote lecture.  Those who are in the
relevant positions might about that.  Similar applies to the authors
of functional programming text books (and teachers, by the way).  Or,
and fans of program proving (hello LARCH and OBJ2) might want to
statically check those contracts...

so far,
   Robert

[1] myself:
    "Modularity without Abstraction is no more than Syntactic Sugar"
http://www.stud.tu-ilmenau.de/~robertw/eiffel/modularity_abstraction.txt

[2] myself: "Posts on Functional Data Structures"
    http://www.mail-archive.com/haskell@haskell.org/msg13980.html
    http://www.mail-archive.com/haskell@haskell.org/msg13979.html

[3] Bertrand Meyer: "Object-Oriented Software Construction", 1st
    edition, 1988

[4] Richard Bird: "Introduction to Functional Programming using
    Haskell", 2nd ed.

[5] Eric Hehner: "A Practical Theory for Programming", 2nd ed., 2004
    http://www.cs.toronto.edu/~hehner/aPToP/

>  -- | List of initial segments (including []) of a list
>  --   in increasing order of size.
>  inits []     = [[]]
>  inits (x:xs) = [] :
>                 map (x:) (inits xs)
>
>  prop_inits_initial xs = forAll (elements $ inits xs) (\ ys ->
>                          ys == take (length ys) xs )
>
>  prop_inits_increasing_length xs
>     = striclty_increasing $ map length $ inits xs
>  -- Also checks, that no segment occurs twice.