Bug in library report

25 Jul 2002 19:54:47 +0100

Alastair Reid:
> I think we'd want a modified version of quickcheck which generated a
> file of results which were then checked by an external tool.  The
> problem being that there's a wide range of compiler bugs which can
> make a program return 'True' without actually executing the program
> correctly.

Koen:
> I do not understand what you mean here. Maybe an example helps?

The problem in using Quickcheck for compiler regression testing
stems from the fact that Quickcheck and the code it depends on (i.e.,
bits of the Prelude) are compiled with the same compiler as the
library being tested.

IIRC, Quickcheck's output is either:

  'passed'
or
  'failed with example x=42, y=27'

That first answer ('passed') typically implies that a large number of
minor tests succeeded.  For example, I might write a quickcheck spec
to make sure that

   a+b == a - (-b)

and quickcheck would confirm this by testing 1000 pairs of values of
a and b.

Now imagine a buggy compiler which, because of some property of the
way Quickcheck is written, the way the library under test is written,
or the way the Quickcheck specification is written, makes the compiled
test always return True even if the test fails for some of the inputs.
Quickcheck will then fail to report the bug.  [And, yes, the bugs in a
mature compiler are sometimes so specific that it takes a combination
of two or three libraries to reveal them.]

The solution is in two parts:

1) Make the 'trusted computing base' smaller so that less of the
   testing apparatus will break if the compiler is broken.

   Do this by using Quickcheck to generate a file of data and a
   separate tool (e.g., a perl script) to check for errors.  For
   example, quickcheck could report:

     Testing 'a+b == a - (-b)':
       0 == 0    -- when a=0, b=0
       1 == 1    -- when a=0, b=1
       2 == 2    -- when a=0, b=2
       1 == 1    -- when a=1, b=0
       2 == 2    -- when a=1, b=1
       3 == 3    -- when a=1, b=2

   and a separate tool would test that the text on the left of the ==
   is textually identical to the text on the right of the ==.

   [In practice, I'd probably make the output from quickcheck a bit
   easier for a machine to read at the expense of making it a bit less
   pleasant for a human to read.  It should still be human readable,
   of course, because humans have to be able to look at the output and
   figure out what went wrong.]

2) Compare the output from today's run against the output from
   yesterday's run (or, better, the output from the last successful
   run).
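
To make the two parts concrete, here is a minimal sketch of what the
external tool might look like.  It is written in Python rather than
Perl purely for illustration; the point is only that it is not built
by the compiler under test.  The line format is assumed to be exactly
the sample output shown in part 1, and the names check_results and
diff_against_baseline are hypothetical, not part of Quickcheck.

```python
import difflib
import re

# Assumed line format, copied from the sample output in part 1:
#   "<lhs> == <rhs>    -- when <inputs>"
RESULT_LINE = re.compile(r"^\s*(.*?)\s+==\s+(.*?)\s+--\s+when\s+(.*?)\s*$")


def check_results(lines):
    """Part 1: return (line_number, reason, text) for every suspicious line."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith("Testing "):
            continue  # blank lines and per-property headers carry no result
        match = RESULT_LINE.match(text)
        if match is None:
            # Garbage in the file is itself a sign that something broke.
            problems.append((number, "unparseable", text))
        elif match.group(1) != match.group(2):
            # Purely textual comparison: no arithmetic is redone here.
            problems.append((number, "mismatch", text))
    return problems


def diff_against_baseline(baseline_lines, current_lines):
    """Part 2: unified diff of two runs; an empty list means no change."""
    return list(difflib.unified_diff(baseline_lines, current_lines, lineterm=""))


sample = [
    "Testing 'a+b == a - (-b)':",
    "  0 == 0    -- when a=0, b=0",
    "  3 == 4    -- when a=1, b=2",
]
print(check_results(sample))
```

Run on the three sample lines, this reports line 3 as a mismatch.  The
textual check alone cannot notice a run in which the compiled program
silently emits fewer test cases than it should; the comparison against
the last successful run is what catches that.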

Is this any clearer?

--
Alastair Reid                 alastair@reid-consulting-uk.ltd.uk  
Reid Consulting (UK) Limited  http://www.reid-consulting-uk.ltd.uk/alastair/