[Haskell] Eliminating Array Bound Checking through Non-dependent types

Sun Aug 8 19:50:33 EDT 2004

Hello!

Bjorn Lisper wrote:

> What is the relation to the sized types by Lars Pareto and John Hughes?

It is orthogonal and complementary, as the message in response to
Conor T. McBride indicated.

> What is the relation to classical range analyses for (e.g.) array index
> expressions, which have been known for a long time for imperative languages?

It is just like the classical range analysis, but _reified_ in the
programming language itself. Given a piece of code:

finda x arr = loop lo
    where (lo,hi) = bounds arr
	  loop i = if i <= hi then
		      if x == arr ! i then Just i
		      else loop (i + 1)
		   else Nothing

the analysis sees that 'i' starts at the lower bound of 'arr' and is
incremented afterwards. When the analysis sees the test "i <= hi" it
infers that in the `then' branch of that test `i' does not exceed the
upper bound of the array. Therefore, the indexing operation `arr ! i'
is safe and no range check is needed.

In the `branding' framework, the programmer makes the result of the
test "i <= hi" and the corresponding implication that `i' is in range
known to the type system, by branding the index `i'. To be more
precise, the programmer would replace the first `if' statement with
	if_in_range:: (Ix i) => i -> BArray s i e -> (BIndex s i->r)
		      -> r ->r
	if_in_range i arr on_within_range on_outside_rage ...
If `i' turns out to be in range, that fact would be recorded by
passing to the continuation on_within_range a branded index. Thus the
logical implication that was implicit in the range checker is made
explicit to the typechecker here.

> A program analysis like range analysis is not exact, of course: it must make
> safe approximations sometimes and will sometimes say that an array index
> might be out of bounds when it actually won't. In your framework, this seems
> to correspond to the fact that you must verify your propositions about index
> expressions.

True, just as the range analysis must verify the rules of the
analysis. The difference is that the conventional range analyzer is a
part of the _compiler_, typically hidden from view (of a regular
programmer). Here, the analyzer is a part of a _library_.

It is also true that our analysis can't be exact: if the code includes 
	let i = very_complex_function j
and we know that j is in range, it may be very difficult to ascertain
that 'i' will always be in range. In that case, we do the following
	let j_unbranded = unbrand j
	    i = very_complex_function j_unbranded
        in if_in_range i arr on_within_range on_outside_rage

That is, we intentionally forget the branding information, do a
complex index transformation, followed by a run-time witnessing to
recover the branding. If we somehow know that very_complex_function
keeps its result in range, we can replace `on_outside_rage' with the
function that raises an exception, crashes the computer, etc. If we
are not sure if `i' is in range, then our program 
must do the range check anyway; if `i' turns out of range, the
program should do what the algorithm prescribes in that case. The
upshot is that `if_in_range' makes the programmer explicitly consider the
consequences of the range check. We turn the range check from a
routine safety check into an algorithmically significant decision.

Incidentally, if we can prove that `very_complex_function' leaves the
index in range, then we can make the function return a branded index,
and thus eliminate the if_in_range check above. Because the creation
of a branded index can only be done in a trusted kernel, we must put
such a function into the kernel, after the appropriate rigorous
verification -- perhaps formal verification.