[Haskell] Eliminating Array Bound Checking through Non-dependent types

Thu Aug 5 22:31:36 EDT 2004

There is a view that in order to gain static assurances such as an
array index being always in range or tail being applied to a
non-empty list, we must give up on something significant: on data
structures such as arrays (to be replaced with nested tuples), on
general recursion, on annotation-free programming, on clarity of code,
on well-supported programming languages. That does not have to be the
case. This message shows a non-trivial example involving native
Haskell arrays, index computations, and general recursion. All arrays
indexing operations are statically guaranteed to be safe -- and so we
can safely use an efficient unsafeAt provided by GHC seemingly for
that purpose. The code is efficient; the static assurances cost us no
run-time overhead. The example uses only Haskell98 + higher-ranked
types. No new type classes are introduced. The safety is based on:
Haskell type system, quantified type variables, and a compact
general-purpose trusted kernel.

Our example is `bsearch', taken from the famous paper "Eliminating
Array Bound Checking Through Dependent Types" by Hongwei Xi and Frank
Pfenning (PLDI'98).  Hongwei Xi's code was written in SML extended
with a restricted form of dependent types. Here is the original code
of the example (taken from Figure 3 of that paper, see also
http://www-2.cs.cmu.edu/~hwxi/DML/examples/)

] datatype 'a answer = NONE | SOME of int * 'a
]
] assert sub <| {n:nat, i:nat | i < n } 'a array(n) * int(i) -> 'a
] assert length  <| {n:nat} 'a array(n) -> int(n)
]
] fun('a){size:nat}
] bsearch cmp (key, arr) =
] let
]     fun look(lo, hi) =
]         if hi >= lo then
]             let
]                 val m = (hi + lo) div 2
]                 val x = sub(arr, m)
]             in
]                 case cmp(key, x) of
]                     LESS => look(lo, m-1)
]                   | EQUAL => (SOME(m, x))
]                   | GREATER => look(m+1, hi)
]             end
]         else NONE
]     where look <|
]     {l:nat, h:int | 0 <= l <= size /\ 0 <= h+1 <= size } int(l) * int(h) 
]             -> 'a answer
] in
]     look (0, length arr - 1)
] end
] where bsearch <| ('a * 'a -> order) -> 'a * 'a array(size) -> 'a answer

The text after `<|' are dependent type annotations. They _must_ be
specified by the programmer -- even for internal functions such as
`look'.

Here's our code, deliberately written to be as close to Hongwei Xi's
code as possible (This message is the complete code):

> {-# OPTIONS -fglasgow-exts #-}
> module Dep where
> import Data.Array
>
> bsearch cmp (key, arr) = brand arr (\arr' -> bsearch' cmp (key, arr'))
>
> bsearch' cmp (key,arr) = look lo hi
>  where
>       (lo,hi) = bbounds arr
>       look lo hi = let m = bmiddle lo hi
>                        x = arr !. m
> 		   in case cmp (key,x) of
> 		      LT -> bpred lo m (look lo) Nothing
> 		      EQ -> Just (unbi m, x)
> 		      GT -> bsucc hi m (\m' -> look m' hi) Nothing

This code is just as algorithmically efficient as the Dependent SML
code: one middle index computation, one element comparison, one index
comparison, one index increment or decrement per iteration. There are
no type annotations as none are needed. Operator (!.) is a statically
safe array indexing operator. The type system and the trust properties
of the kernel below guarantee that in the expression "arr !. m" the
index `m' is positively in range of the array `arr' bounds.

> barr1 = listArray (5,5 + (length s)-1) s where s = "abcdefgh"
> btest1 = bsearch (uncurry compare) ('c',barr1)
> btest2 = bsearch (uncurry compare) ('x',barr1)

The code relies on a compact general-purpose trusted kernel explained
below. That code should be preferably put into a separate module.

First we introduce tags for Branded arrays and Branded indices:

>     -- those two must *not* be exported!
> newtype BArray s i a = BArray (Array i a)
> newtype BIndex s i   = BIndex i
>
> unbi (BIndex i) = i

These are `newtype's and so impose no run-time overhead. Of interest
is a phantom type variable 's', which marks a _brand_ of an array and
of an array index. An index is branded if it is certainly within the
range of the array of its brand. The type variable 's' is similar to
that in the ST monad. The latter relies on 's' to enforce
serialization. We, OTH, do not impose any linearity constraints on 's'
-- it may be freely duplicated (see bbounds below) and discarded (see
`unbi'). It is created however under controlled conditions.  The
safety depends on the trusted way of creating branded types: the
constructors BIndex and BArray should be used in the trusted kernel
only, and should not be available anywhere else.  The uniqueness of 's'
afforded by the explicit universal quantification prevents mixing up of
different brands.

We must re-iterate that safety depends on assurances of the code that
constructs BIndex values. Because of the high assurance, we must
formulate the safety properties as propositions, and prove
them. Fortunately, the code below is compact and straightforward, as
well as general purpose.

> bbounds:: (Ix i) => BArray s i a -> (BIndex s i, BIndex s i)
> bbounds (BArray a) = let (l,h) = bounds a in (BIndex l, BIndex h)

Proposition: the two indices returned by bbounds are within the range
of the array 'a'. Proof: from the semantics of the function `bounds',
taken here as an axiom.

> bmiddle:: (Integral i) => BIndex s i -> BIndex s i -> BIndex s i
> bmiddle (BIndex i1) (BIndex i2) = BIndex ((i1 + i2) `div` 2)

Proposition: l <= i1 <= h, l <= i2 <= h |- l <= (i1 + i2) `div` 2 <= h
Proof: plain arithmetics.

We should stress that the type of bmiddle assures that all indices
involved have the same brand -- that is, the same lower and upper
boundaries. A brand 's' is a (type-level) representation of index
bounds. At compile time, we don't know what they are, but the
unforgeability of 's' (see below) statically guarantees that the same
's' represents the same bounds.

> bsucc:: (Ord i,Num i) 
> 	=> BIndex s i -> BIndex s i -> (BIndex s i -> r) -> r -> r
> bsucc (BIndex upb) (BIndex i) on_within on_out
> 	= let i'    = i + 1
> 	  in if i' <= upb then (on_within (BIndex i')) else on_out

The function `bsucc' takes two branded indices that correspond to the
same bounds (see the variable 's'). The function also takes two
continuations, on_within and on_out. The first index is considered to
be an upper limit. The function increments the second index. If the
result does not exceed the upper limit, we invoke on_within and pass
it the result. Otherwise, we invoke `on_out'.

Safety Proposition: l <= upb <= h, l <= i <= h, (i+1) <= upb |-
		    l <= (i+1) <= h
Proof: from i < (i+1) and properties of inequalities.
The safety proposition justifies our use of the data constructor
BIndex.

> bpred:: (Ord i,Num i)
> 	=> BIndex s i -> BIndex s i -> (BIndex s i -> r) -> r -> r
> bpred (BIndex lwb) (BIndex i) on_within on_out
> 	= let i'    = i - 1
> 	  in if i' >= lwb then (on_within (BIndex i')) else on_out

The dual of `bsucc'.
Safety Proposition: l <= lwb <= h, l <= i <= h, (i-1) >= lwb |-
		    l <= (i-1) <= h

Because a branded index is assuredly within the bounds of the array of
the same brand, we can write

> infixl 5 !.
> (!.):: (Ix i) => BArray s i e -> (BIndex s i) -> e
> (BArray a) !. (BIndex i) = a ! i

actually, we may _safely_ replace a ! i with `unsafeAt a i'

Finally, we need an introduction rule for BArray:

> brand:: (Ix i) => Array i e -> (forall s. BArray s i e -> w) -> w
> brand (a::Array i e) k = k ((BArray a)::BArray () i e)

The function has a higher-rank type. It is the existential
quantification of 's' as well as the absence of BArray constructor
elsewhere guarantees that the same brand entails the same bounds.