[Haskell] Eliminating Array Bound Checking through Non-dependent types

oleg at pobox.com oleg at pobox.com
Fri Aug 6 20:50:27 EDT 2004


Hello!

> What if I don't trust your kernel? The guarantees you require of
> your kernel are not statically checked. What guarantee do I have
> that the propositions which you identify are even the ones which
> are really needed to eliminate bounds checking? How does the
> machine replace ! by unsafeAt reliably, all by itself?
>
> Yes, you can convince _me_ that something of the sort will do, because
> I can follow the math. But what is the mechanism? What is the evidence?
> What's the downloadable object that can be machine-checked to satisfy
> my paranoid insurance company?

That is very well said! I hope that you can forgive me if I reply by
quoting the above two paragraphs back to you, with the substitution 
s/kernel/compiler/.

	What if I don't trust your compiler?

I have heard a similar question asked of J. Strother Moore and
J. Harrison. J. Strother Moore said that most of ACL2 is built by
bootstrapping, from lemmas and strategies that ACL2 itself has
proven. However, the core of ACL2 just has to be trusted. ACL2 has
been used for quite a while and so there is a confidence in its
soundness. Incidentally, both the NSA and NIST found this argument
persuasive when they accepted proofs by ACL2 as evidence of high
assurance in awarding Orange Book A1 and FIPS 140-1 ratings -- the
highest security ratings -- to some products.

> Hongwei Xi's code contains the evidence I'm asking for.
> The verification conditions are added by hand in the program as
> annotations, just as yours are annotations outside the program.  His
> are checked by Presburger arithmetic, just as yours could be.

Actually, as far as the PLDI'98 paper is concerned, the authors
specifically say that they do not use full Presburger arithmetic.
Rather, they solve the constraints by Fourier variable elimination.
Anyway, even if
the various elaboration and decision rules are proven to be sound and
complete, what is the evidence that their implementation in the
extended SML compiler is also sound and complete? Speaking of
completeness, the PLDI'98 paper itself notes, "Note that we have
been able to eliminate all the existential variables in the above
constraint. This is true in all our examples, but, unfortunately, we
have not yet found a clear theoretical explanation why this is so."
The conclusion specifically states that the algorithm is currently
incomplete.


> ...this proposition is false. The bounds function returns bounds which
> are outside the range of the array when the array is empty.
> You'll notice that Hongwei Xi's program correctly handles this case.
>
> Don't get me wrong: I think your branded arrays and indices are a very
> good idea. You could clearly fix this problem by Maybe-ing up bbounds
> or (better?) by refusing to brand empty arrays in the first place.

I have noticed that the branding trick would work very well with
number-parameterized types. The latter provide the missing guarantees:
for example, they statically outlaw empty arrays. Hongwei Xi's code has
another neat example: a dot product of two arrays where one array is
statically known to be no longer than the other. Number-parameterized
types can statically express that inequality constraint too. The
Number-Parameterized Types paper considers a more difficult example --
and indeed the typechecker forced me to give it a term that is
verifiably a proof of the property (inequality on the sizes) stated in
the term's inferred type. The typechecker truly demanded a proof;
shouting didn't help.
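
To give an idea of how such a constraint can look, here is a rough
sketch of mine -- not the code from the paper: it uses unary type-level
naturals rather than the paper's decimal encoding, and the names Vec,
LE and dotProduct are made up for illustration. The `no longer than'
requirement on the dot product is stated in the type:

{-# LANGUAGE EmptyDataDecls, MultiParamTypeClasses, FlexibleInstances #-}
-- (with GHC of that era, -fglasgow-exts enables the same extensions)

data Z                          -- type-level zero
data S n                        -- type-level successor

newtype Vec n e = Vec [e]       -- a vector whose length n is a phantom type

nil :: Vec Z e
nil = Vec []

cons :: e -> Vec n e -> Vec (S n) e
cons x (Vec xs) = Vec (x : xs)

class LE n m                    -- the type-level relation  n <= m
instance           LE Z m
instance LE n m => LE (S n) (S m)

-- the first vector is statically known to be no longer than the second
dotProduct :: (Num e, LE n m) => Vec n e -> Vec m e -> e
dotProduct (Vec xs) (Vec ys) = sum (zipWith (*) xs ys)

-- dotProduct (cons 1 (cons 2 nil)) (cons 1 (cons 2 (cons 3 nil)))   -- accepted
-- dotProduct (cons 1 (cons 2 (cons 3 nil))) (cons 1 (cons 2 nil))   -- rejected

An empty vector has the type Vec Z e, so a function that demands an
argument of type Vec (S n) e statically outlaws it.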

Incidentally, the paper is being considered for JFP, I guess. I don't
know if the text could be made available. I can still post the link to
the code:
	http://pobox.com/~oleg/ftp/Haskell/number-param-vector-code.tar.gz

I should emphasize that all proper examples use genuine Haskell arrays
rather than nested tuples. Yet the type of the array includes its
size, conveniently expressed in decimal notation. One can specify
arithmetic equality and inequality constraints on the sizes of the
arrays in the types of the corresponding functions. The constraints can
be inferred. One example specifically deals with the case when the
sizes of those arrays are not known until run time -- and, moreover,
change in the course of a computation that involves general
recursion. It seems that the branding trick nicely complements
number-parameterized arrays and makes `dynamic' cases easier to
handle.
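
For the `dynamic' case, here is another rough sketch of mine (again not
the code in the archive; it repeats the toy Vec type from the sketch
above): a length that becomes known only at run time is reflected into
the type by building the vector under a continuation that must work for
every possible size, after which size-respecting operations apply.

{-# LANGUAGE EmptyDataDecls, RankNTypes #-}
data Z
data S n
newtype Vec n e = Vec [e]

nil :: Vec Z e
nil = Vec []

cons :: e -> Vec n e -> Vec (S n) e
cons x (Vec xs) = Vec (x : xs)

-- reflect the run-time length of a list into the type parameter n;
-- the continuation cannot assume any particular size
withVec :: [e] -> (forall n. Vec n e -> r) -> r
withVec []       k = k nil
withVec (x : xs) k = withVec xs (k . cons x)

-- a function whose type records that its two arguments have equal sizes
vzip :: Vec n a -> Vec n b -> Vec n (a, b)
vzip (Vec xs) (Vec ys) = Vec (zip xs ys)

-- e.g.  withVec input (\v -> ... vzip v v ...)   -- both v's share the size n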


> You'll notice that Hongwei Xi's program correctly handles this case.

But what if I specified the dependent type with a mistake that
overlooks the empty array case? Would the Dependent ML compiler catch
me red-handed? In all possible cases? Where is the formal proof of
that?

 
	I have failed to emphasize the parallels between Hongwei Xi's
annotations and the corresponding Haskell code. What Hongwei Xi
expressed in types, the previously posted code expressed in terms. The
terms were specifically designed in such a way that the consequences of
various tests were visible to the type system, and so the
corresponding conclusions could be propagated as a part of regular
type inference. Number-parameterized types also rely on the type
inference of Haskell. Yes, there is a trusted kernel involved -- just
as there is a Dependent SML system to be implicitly trusted. However,
in the given example the trusted kernel is compact Haskell code plus
the GHC system. The latter is complex -- but it has been used by
thousands of people over an extended period of time -- and so inspires
higher confidence than experimental extensions (unless the latter have been
formally proven -- I mean the code itself -- by a trusted system such as
ACL2 or Coq).
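
For concreteness, here is a minimal illustration of that term-level
style -- a sketch of mine in the spirit of the previously posted code,
not that code itself; the names brand, bindex and (!.) are invented.
The range check is an ordinary run-time test, but its successful
outcome is recorded in a type that only the check can produce, so the
type system propagates the conclusion for free:

{-# LANGUAGE RankNTypes #-}
import Data.Array

newtype BArray s i e = BArray (Array i e)   -- an array carrying a brand s
newtype BIndex s i   = BIndex i             -- an index carrying the same brand

-- the brand s is an eigenvariable: it cannot escape, so the only
-- branded indices in scope are those checked against this very array
brand :: Array i e -> (forall s. BArray s i e -> r) -> r
brand a k = k (BArray a)

-- the sole way to obtain a branded index: an ordinary range test
-- whose positive outcome becomes visible to the typechecker
bindex :: Ix i => BArray s i e -> i -> Maybe (BIndex s i)
bindex (BArray a) i
  | inRange (bounds a) i = Just (BIndex i)
  | otherwise            = Nothing

-- indexing with a branded index needs no further check; (!) here
-- could just as well be unsafeAt
(!.) :: Ix i => BArray s i e -> BIndex s i -> e
(BArray a) !. (BIndex i) = a ! i

-- e.g.  brand (listArray (0,4) "abcde") (\a -> fmap (a !.) (bindex a 2))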


        I do not wish to sound as if I were against Dependent Type
systems and their implementations. I merely wish to point out poor-man's
approaches. Here are the principles, which I'm sorry to have failed to
elucidate:

        - Try to make `safety' checks algorithmic: use newtypes to
`lift' the safety checks such as range checks, and see if they can be
merged with the tests the algorithm has to do anyway.

	- Use explicitly-quantified type variables to associate
`unforgeable' types with values, so that the following property holds:
the equality of such types entails the equality of the corresponding
values -- even if we don't know what their values are until
run time.

        - Convenient (decimal) type arithmetic at compile time -- and
even at `semi-compile' time.

I'm interested in just how far those principles may take me.
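
To give at least a flavour of the last principle, here is a toy of my
own devising (the encoding in the Number-Parameterized Types paper is
different and far more complete): a size is written as nested
decimal-digit type constructors, and its run-time value is recovered by
a class.

{-# LANGUAGE EmptyDataDecls, ScopedTypeVariables #-}
data Sz                                        -- the empty numeral, i.e. 0
data D0 n; data D1 n; data D2 n; data D8 n     -- digits D3..D9 are analogous

-- the outermost constructor is the least significant digit,
-- so the type  D8 (D2 (D1 Sz))  denotes 1*100 + 2*10 + 8 = 128
class Card n where card :: n -> Int
instance Card Sz where card _ = 0
instance Card n => Card (D0 n) where card _ = 10 * card (undefined :: n) + 0
instance Card n => Card (D1 n) where card _ = 10 * card (undefined :: n) + 1
instance Card n => Card (D2 n) where card _ = 10 * card (undefined :: n) + 2
instance Card n => Card (D8 n) where card _ = 10 * card (undefined :: n) + 8

size128 :: Int
size128 = card (undefined :: D8 (D2 (D1 Sz)))  -- evaluates to 128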


	Thank you very much for your message!


