[Haskell] Infinite, open, statically constrained HLists

Thu Oct 26 03:38:38 EDT 2006

We show a presentation of heterogeneous sequences that admits infinite
sequences and permits post-hoc addition of new elements, even to an
already infinite sequence. We can also assert static constraints,
e.g., the sequence must be made of non-decreasing Peano numerals. Our
sequences permit all the common operations such as `head', `tail',
`cons', `map', `take' and `fold'. It is a type error to attempt to
take `head' of an empty sequence. Any finite sequence is
inter-convertible with the HCons-based HList.

The full code is available at
	http://darcs.haskell.org/HList/CHList.hs

The presentation is inspired by the problem posed by Greg Buchholz on
the following thread:

  http://www.haskell.org/pipermail/haskell-cafe/2006-October/018948.html
  http://www.haskell.org/pipermail/haskell-cafe/2006-October/019030.html

He observed how easy it is to define _some_ (potentially) infinite
heterogeneous data structures. For example,

	data SeqI a = ConsI a (SeqI (S a)) | NilI

defines the type of a sub-sequence of the sequence of natural Peano
numerals. We can construct finite 
	seqi1 = ConsI Z (ConsI (S Z) NilI)
and even infinite sequences
	seqi_inf :: n -> SeqI n
	seqi_inf n = ConsI n (seqi_inf (S n))
but not
	seqi2 = ConsI Z (ConsI Z NilI)
The latter ill-formed sequence is ill-typed. However, it is difficult
to generalize these constructions, for example, to the sequences of
Fibonacci Peano numerals or to the sequences of non-decreasing Peano
numerals.

The HList presentation, with familiar "HCons a b" and "HNil" types,
makes it straightforward to define sequences with arbitrary
constraints on the adjacent elements. The structure of such HList
values is reflected in their types, and so we can statically constrain
them. Alas, the type of an HList has as many HCons type constructors as
there are HCons data constructors. That seems to rule out infinite
sequences.

There is a trivial way out of this predicament, via a different
presentation. A sequence is a partial function from a range of natural
numbers. In other words, a sequence is a mapping from a Natural number
to {HNothing, HJust a}. We consider the HNothing element with the
smallest index to be the end of the sequence. This obvious idea leads
to the straightforward implementation:

> class Nat n => CHList l n r | l n -> r where
>     chel :: l -> n -> r
>     chel = undefined
>
> -- Default `cut-off' instance
> instance (Nat n, TypeCast r HNothing) => CHList l n r

where

> data HNothing = HNothing
> newtype HJust x = HJust x
> fromHJust (HJust x) = x		-- this is the total function!

The class 'CHList l n r' establishes the n-th element of the sequence
named 'l' to be 'r'. The empty sequence is defined then by the single
declaration:

> data LEmpty = LEmpty

Here is a bit more elaborate sequence, of two elements. Both are
Ints. Homogeneous sequences are just a subset of heterogenous ones.

> data L1 = L1
> instance CHList L1 Z (HJust Int) where
>     chel _ _ = HJust 0
> instance CHList L1 (S Z) (HJust Int) where
>     chel _ _ = HJust 1

We can convert such sequences to the regular HList; this is quite
handy for showing them. For example, 

> test1 = show $ chl_to_hl L1

yields "HCons 0 (HCons 1 HNil)"

Not surprisingly, we can implement all of the common List API, for
example,

> cl_head l = fromHJust (chel l Z)

It will be a type error attempting to apply cl_head to an empty
sequence. We can define tail

> newtype CLTail l = CLTail l
> instance (Nat n, CHList l (S n) r) => CHList (CLTail l) n r where
>     chel (CLTail l) n = chel l (S n)
>
> -- ensure the list to take the tail of is non-empty
> cl_tail :: CHList l Z (HJust e) => l -> CLTail l
> cl_tail l = CLTail l
>
> test_h3 = cl_head (cl_tail L1) -- 1

and cons

> data CLCons a l = CLCons a l
> cl_cons = CLCons
>
> instance CHList (CLCons a l) Z (HJust a) where
>     chel (CLCons x _ ) _ = HJust x
>
> instance (Nat n, CHList l n r) => CHList (CLCons a l) (S n) r where
>     chel (CLCons _ l) (S n) = chel l n

after which we can introduce and extend our sequences in the `normal'
way:

> test_c1 = cl_head (cl_tail (cl_cons ((-2)::Int) (cl_cons ((-1)::Int) L1)))
> test_c2 = cl_head (cl_tail (cl_cons () (cl_cons True LEmpty)))
> test_c3 = show $ chl_to_hl $ cl_cons () L1

Defining other functions like `take', `map' and `fold' is
straightforward and elided. The latter require the generalized `Apply'
class defined in the HList library. We can write, for example,

> test_m1 = cl_map (succ :: Int->Int) L1
> test_m2 = show $ chl_to_hl test_m1

because regular functions are already instances of the Apply class and
so apply-able. It is worth noting a close resemblance between test_m1
above and

> test_m1' = map (succ :: Int->Int) l1
>     where l1 = [0,1]

The difference seems to be in the case of the identifier: `l1' vs
`L1'. Indeed, both l1 and L1 _denote_ certain values. The rules of
looking up the value given the name are implicit in the case of l1 and
explicit (and defined by us) in the case of L1. The type-checking is
more explicit in the latter case too -- that's why we are able to
assert more constraints, as we shall see soon.

It goes without saying that our sequences can as well be infinite:

> data LNats = LNats  -- lists of successive numerals
> instance Nat n => CHList LNats n (HJust n) where
>     chel _ n = HJust n

which defines the infinite list of Peano numerals. That is quite akin
to "seqi_inf Z", described in the beginning of this message. Now,
however, we can introduce other infinite sequences, e.g., by mapping
arbitrary function over LNats. We can also define sequences by
recursion, e.g., LFibs.

Perhaps more interesting is the fact that our lists are open, even
when they are extensible. For example, if we define only

> data LCyc = LCyc
> instance (Nat n, CHList LCyc n r) => CHList LCyc (S (S n)) r where
>     chel l (S (S n)) = chel l n

then

> test_cyc = show $ chl_to_hl $ cl_take five LCyc

yields "HNil". If we later add another instance

> instance CHList LCyc Z (HJust ()) where chel _ _ = HJust ()

the same test_cyc will give "HCons () HNil". The addition the third
instance

> instance CHList LCyc (S Z) (HJust Bool) where chel _ _ = HJust True

makes our list infinite. test_cyc now gives
  "HCons () (HCons True (HCons () (HCons True (HCons () HNil))))"

We may still add instances, for example

> instance CHList LCyc (S (S Z)) (HJust Bool) where chel _ _ = HJust False

causes test_cyc to return
 "HCons () (HCons True (HCons False (HCons True (HCons False HNil))))"

Finally, we show constrained lists. A constraint is a binary
predicate; it is apply-ed to two consecutive elements of the
sequence, and must return HTrue. For simplicity, we introduce the
following non-generic class of constrained lists, whose elements must
be non-decreasing numerals:

> data CK l = CK l
> class (Nat n, Apply (CK l) (n,r) HTrue) => C1 l n r | l n -> r where
>     chel1 :: l -> n -> r; chel1 = undefined
> instance (Nat n, TypeCast r HNothing, Apply (CK l) (n, r) HTrue) => C1 l n r
>
> instance Apply (CK l) (Z,r) HTrue
> instance Apply (CK l) (S n,HNothing) HTrue
> instance (C1 l n (HJust ep), Apply PLEQ (ep,e) r) 
>     => Apply (CK l) (S n,HJust e) r

The constraint is specified as a class constraint; therefore, all
instances of C1 must obey it. All instances of C1 are CHLists,
after we say

> newtype C1toCL cl = C1toCL cl
> instance C1 cl n r => CHList (C1toCL cl) n r where
>     chel (C1toCL l) n = chel1 l n

We introduce

> data LC1 = LC1
> instance C1 LC1 Z (HJust Z)
> instance C1 LC1 (S Z) (HJust Z)
> instance C1 LC1 (S (S Z)) (HJust (S (S (S Z))))
> test_cl1 = show $ chl_to_hl $ C1toCL LC1

but we cannot add the instance
	instance C1 LC1 (S (S (S Z))) (HJust Z)
because it fails the non-decreasing constraint. The constrained list
may be infinite:

> data LC2 = LC2
> instance C1 LC2 Z (HJust Z)
>           -- this constraint suggested by the typechecker
> instance (Apply PLEQ (r, S (S r)) HTrue, 
> 	  C1 LC2 n (HJust r))
>     => C1 LC2 (S n) (HJust (S (S r)))
>
> test_cl2 = show $ chl_to_hl $ cl_take five $ C1toCL LC2

We check the constraint by applying the predicate to the current and
the previous element of the sequence (if it exists). Even in case of
the infinite sequence, the prefix is finite, so our constraint is
effectively verifiable.