[Haskell] Eliminating Array Bound Checking through Non-dependent types

Mon Aug 9 06:22:16 EDT 2004

Hello again

Me:

>>What if I don't trust your kernel?

[..]

>>What's the downloadable object that can be machine-checked to satisfy
>>my paranoid insurance company?

Oleg:
> I hope that you can forgive me if I reply by
> quoting the above two paragraphs back to you, with the substitution 
> s/kernel/compiler/.
> 
> 	What if I don't trust your compiler?
> 
> I have heard a similar question asked of J. Strother Moore and
> J. Harrison. J. Strother Moore said that most of ACL2 is built by
> bootstrapping, from lemmas and strategies that ACL2 itself has
> proven. However, the core of ACL2 just has to be trusted.

It's not an issue of forgiveness. I actively encourage such questions,
of myself and other people. I'd also recommend this paper by Randy
Pollack:

   `How to believe a machine-checked proof'
   http://www.dcs.ed.ac.uk/home/rap/export/believing.ps.gz

But the one-line summary is `by rechecking it independently'.
Of course, we'll never achieve _certainty_ in any absolute
sense, but we can do a lot better than `I can't show you the
evidence for security reasons, but there's no doubt that...'.
In order to do that, you need more than a machine which
says `yes'. You need a language of evidence which is simple
enough that people can write their own checker in whatever way
they choose. Martin-Loef's type theory provides a source of
good candidates for such a language of evidence. Its core
proof-checking/type-checking algorithm is very small. This is
what comes out the other end of the Epigram elaborator.

[..]

>>Hongwei Xi's code contains the evidence I'm asking for.
>>The verification conditions are added by hand in the program as
>>annotations, just as yours are annotations outside the program.  His
>>are checked by Presburger arithmetic, just as yours could be.
> 
> Actually, as far as the PLDI'98 paper is concerned, they specifically say
> they do not use the full Presburger arithmetics. Rather, they solve
> the constraints by Fourier variable elimination. Anyway, even if
> the various elaboration and decision rules are proven to be sound and
> complete, what is the evidence that their implementation in the
> extended SML compiler is also sound and complete?

Another very good question. Every time you appeal to a proof-search
oracle or to an axiomatization of a library, or anything which hides
the evidence, the amount of stuff you `just have to trust' gets
bigger, and the harder it is to perform independent rechecking.

But what if some bright spark (volunteers welcome) were to implement
Presburger Arithmetic in one of the following ways...?
   (1) External to the system, but outputting checkable proof terms,
         instead of just `yes' or `no'.
   (2) Within the system, with a checkable proof that its yea be
         yea and its nay be nay. [Traditional implementation and
         correctness proof.]
   (3) Within the system, with a type which actually guarantees
         what the algorithm proves in the first place. [The
         `two-level' approach.]

> The conclusion specifically states that the algorithm is currently
> incomplete.

Clearly, soundness is more important. But it is nice to have a good idea
which problems the machine will get and which it won't.

Me:
>>...this proposition is false. The bounds function returns bounds which
>>are outside the range of the array when the array is empty.
>>You'll notice that Hongwei Xi's program correctly handles this case.
>>
>>Don't get me wrong: I think your branded arrays and indices are a very
>>good idea. You could clearly fix this problem by Maybe-ing up bbounds
>>or (better?) by refusing to brand empty arrays in the first place.

Oleg:
> I have noticed that the branding trick would work very well with
> number-parameterized types. The latter provide missing guarantees, for
> example, statically outlaw empty arrays. Hongwei Xi's code has another
> neat example: a dot product of two arrays where one array is
> statically known to be no longer that the other. Number-Parameterized
> types can statically express that inequality constraint too.

Yes, of course. And any dependently typed solution to these problems
would naturally parametrize the data-structures by their size,
represented as ordinary numbers. Statically expressing the constraints
is easy.

The abstract `brand' is just a type-level proxy for the bounding
interval, and the library of operations provides interval-respecting
operations on indices. This is a very neat solution in Haskell, but it
goes round an extra corner which isn't necessary with dependent types,
where you can just talk about the interval directly. The library-writer
would develop and verify the same convenient operations for working
with intervals and indices; the proofs would be independently
recheckable terms in type theory.

> The
> Number-Parameterized types paper considers a more difficult example --
> and indeed the typechecker forced me to give it a term that is
> verifiably a proof of the property (inequality on the sizes) stated in
> term's inferred type. The typecheker truly demanded a proof; shouting
> didn't help.

Rightly so. What tools would be useful in such a situation?

> Incidentally, the paper is being considered for JFP, I guess. I don't
> know if the text could be made available. I still can post the link to
> the code:
> 	http://pobox.com/~oleg/ftp/Haskell/number-param-vector-code.tar.gz
> 
> I should emphasize that all proper examples use genuine Haskell arrays
> rather than nested tuples.

[..]

There's no reason why a dependently typed language should not have
genuine arrays.

> Yet the type of the array includes its
> size, conveniently expressed in decimal notation. One can specify
> arithmetic equality and inequality constraints on the sizes of the
> array, in the type of the corresponding functions. The constraints can
> be inferred. One example specifically deals with the case when the
> sizes of those arrays are not known until the run-time -- moreover,
> change in the course of a computation that involves general
> recursion. It seems that branding trick nicely complements
> number-parameterized arrays and makes `dynamic' cases easier to
> handle.

Again, all of this sounds like a good thing to do, but less like hard
work if you have a language designed to support it.

I don't know why you emphasize general recursion. It's no big deal in
programs which we execute only at run-time. Sure, I prefer structural
recursion to general recursion, because I like to have termination for
free, and also to be able to use programs safely in types. But I am not 
a bottom-pincher: you can have general recursion at run-time and still
decide typechecking. Here's a cheap and nasty trick that does it (and
we're kicking around some less nasty tricks at the moment). Add a
hypothesis

   general :: forall p . (p -> p) -> p

This is all you need to write general recursive programs. In Epigram,
your program would invoke `general' once, and then make whatever
recursive calls you like. None of these programs can loop during
typechecking, because `general' has no computational behaviour. We
can tell which programs/proofs to trust by which use `general'. Only
when we output run-time code do we make

   general f = f (general f)

Of course, if you really want to run this program in types---at
your own risk---we plan to let you. There are three security levels:

   paradise:    a complete absence of generals
   purgatory:   there are generals, but their orders are not obeyed
   pandemonium: what happens when you listen to generals

A compiler-switch would do, but annotations on individual functions
would be better.

So please, everybody, no more `What about general recursion? What
about decidable typechecking?'. It's no big deal. Why do some people
presume that Cayenne's particular problems are intrinsic to any 
dependently typed language?

>>You'll notice that Hongwei Xi's program correctly handles this case.
> 
> But what if I specified the dependent type with a mistake that
> overlook the empty array case? Would the dependent ML compiler catch
> me red-handed? In all possible cases? Where is the formal proof of
> that?

I can't speak for dependent ML, but you wouldn't get away with it in
Epigram. Epigram rules out _provably_ impossible cases and provides
a language in which to provide this guarantee. In the mundane
majority of cases, this proof amounts to `constructors of datatype
indices are disjoint', which gets handled automatically.

> 	I have failed to emphasize the parallels between Hongwei Xi's
> annotations and the corresponding Haskell code.

It wasn't lost on me...

[..]

> Yes, there is a trusted kernel involved -- just
> as there is a Dependent SML system to be implicitly trusted.

I agree: the appeal to the constraint oracle is a weakness. As is
the appeal to informal proofs, or axiomatizations of the library.

> However,
> in the given example the trusted kernel is compact Haskell code plus
> the GHC system. The latter is complex -- but it is being used by
> thousands of people over extended period of time -- and so has higher
> confidence than experimental extensions (unless the latter have been
> formally proven -- I mean the code itself -- by a trusted system such as
> ACL2 or Coq).

Confidence of what? Remember that Haskell is logically unsound. I can
fake a branded index by taking the head of an empty list. Fortunately,
this fake will always be discovered at run-time before unsafeAt goes
haywire (that ain't true for the empty array bug). However, it somewhat
weakens the static guarantee!

> 	I do not wish to sound as being against Dependant Type
> systems and implementations. I merely wish to point out poor-man
> approaches.

Don't get me wrong either: I agree with you. I'm in favour of cranking
as much static confidence out of Haskell's type system as possible. And
some pretty good stuff is possible (modulo bottom). However, it's very
easy to pick up such programs, implemented by hook or by crook (I'm a
crook) in Haskell and say: `See? Who needs dependent types?'. The truth
of the matter is that these programs are rather like the dependently
typed programs you might write, except that you need to use type-level
proxies for values and type-level Prolog trickery, instead of ordinary
values and ordinary programs. The dependently typed versions of these
programs are simpler than the Haskell versions. In the short term,
Haskell is the here and now, so do what you can; in the longer term,
choosing type class unicycling over dependent types is a false economy.

[..]

> 	- Try to make `safety' checks algorithmical: use newtypes to
> `lift' the safety checks such as range checks and see if they can be
> merged with the tests the algorithm has to do anyway.

Yes, this is what often happens with dependently typed programs. The
key thing to ensure is that the tests have types which show their
meaning statically. A Bool is a bit uninformative.

> 	- Use explicitly-quantified type variables to associate
> `unforgeable' types with values, so that the following property holds:
> the equality of such types entails the equality of the corresponding
> values -- even if we don't know what they values are until the
> run-time.

Exactly: type-level proxies for values. It's the right thing to do
if you can't have values in types. I'm in favour of the `datakind'
idea: Haskell's type-level proxies for values should look as much
like values as possible!

> 	- Convenient (decimal) type arithmetics at compile time -- and
> even `semi-compile' time.

There's nothing which forces numbers in dependent type systems to be
unary. That's just the path of least resistance in prototypes. Also,
it reflects the importance of the inductive structure of numbers
when they are being used as indices. I wonder how much this really buys
you when you're developing indexed operations, as opposed to when you
want to type actual values. I guess I'll read the paper.

> I'm interested in just how far those principle may take me.

That's a good thing to do. I'm trying to deliver a language which makes
exactly that kind of program much easier to write. I hope that sounds
like a good thing too.

> 	Thank you very much for your message!

And thank you!

All the best

Conor