[Haskell] Even higher-order abstract syntax: typeclasses vs GADT

Sun Jan 21 22:06:33 EST 2007

We show the typeclass implementation of the example used to make the
case for GADTs. We demonstrate the higher-order abstract-syntax-based
embedding of a language in Haskell and implement static and dynamic
semantics of the language. The interpreter of the language is tagless
and statically assured: Only well-typed terms may be evaluated, and
the evaluation of those does not get stuck. We use no tags and *no*
run-time pattern-matching, therefore, the `eval' function has no
possibility of raising a run-time error.  Our language is
_non_-strongly normalizing and non-structurally inductive due to the
presence of Fix; yet the typechecking is decidable and our typeclass
programs always terminate.

The running example has been used as an argument for generalized
recursive data types (one of the variants of GADTs). This message is
meant to clarify the expressiveness of GADTs with respect to
typeclasses. Our implementation is also aimed at the sometimes
exuberant enthusiasm for GADTs.

In positive contribution, we demonstrate the _automatic_ promotion of
a value-level higher-order abstract syntax (HOAS) term to the
type-level. The small-lambda HOAS becomes the big-Lambda HOAS and can
be operated by the typechecker. We demonstrate the typechecking of our
term language by abstract interpretation.  The latter is performed by
the Haskell typechecker itself, following straightforward declarative
rules expressed via instance declarations.  The code works in GHC and
Hugs.

This message has been inspired by the paper 
	`Guarded recursive datatype constructors'
	by Hongwei Xi, Chiyan Chen and Gang Chen (POPL2003)
to be called `the GRDT paper' below. The particular inspiration comes
from a comment by Meng Wang on Haskell-Cafe (Jan 13, 2007).

GADTs are often motivated by a tagless interpreter. When we embed a
language in Haskell, the corresponding data types may admit
meaningless terms like an application of a number to a pair. If we get
the Haskell typechecker to reject such terms, we gain the (static)
type checking for our object language. Furthermore, the evaluator for
the language no longer has to deal with ill-formed terms and report
run-time `type' errors. There is no longer any need in `dynamic
typing' and in maintaining and checking type tags.

The paper `Wobbly types: type inference for generalised algebraic data
types' by Peyton-Jones, Washburn, and Weirich used the embedding of a
first-order language to motivate GADTs. That example is _trivially_
implemented in Haskell without GADTs, with simple typeclasses.
	http://pobox.com/~oleg/ftp/Haskell/GADT-interpreter.hs

Far more interesting is the embedding of a higher-order language,
described in the GRDT paper. The first challenge is representing
higher-order syntax at the type level. More demanding is dealing with
the fixpoint operator of the GRDT example. The HOAS fixpoint destroys
the structural induction principle.  If we fully expose the structure
of our terms at the type level, how can we write divergent terms? Do
we need (equi-)recursive types? Mainly, would the Haskell typechecker
itself diverge when processing such terms? Who would have thought that
divergent terms can be so interesting.

The examples in the code include several terms whose run-time
evaluation does diverge, yet the typechecking of those terms
terminates.  The termination of our typechecker is easy to see as each
step eliminates one term constructor. The key to the terminating
typechecking is abstract interpretation, which permits the
introduction and the elimination of assumptions/hypothesis.

The full code is available at
	http://pobox.com/~oleg/ftp/Haskell/GADT-HOAS.hs

The syntax of our language is as follows. We took the first-order
language described previously and added higher-order terms and the
fixpoint. The resulting language is a little bit larger than that
introduced in the GRDT paper.

> newtype Lit a = Lit a                   -- aka, Lift in GRDT
> newtype Inc a = Inc a
> newtype IsZ a = IsZ a
> newtype If  a b c = If (a,b,c)
> newtype Fst a = Fst a
> newtype Snd a = Snd a
> newtype Pair a b = Pair (a,b)
>
> newtype Lam f t = Lam f                 -- t is the type of the bound arg
> newtype App a b = App (a,b)
> newtype Fix f t = Fix f                 -- ditto
>
> newtype AbsValue t = AbsValue t

The Lam and Fix terms with binders contain in their type the type of
the bound variable. The GRDT paper imposes the same type annotation
requirement. That type, however, can be inferred in many cases.  The
term `AbsValue' is used for typechecking of functions by abstract
interpretation.

Here is a sample higher-order term in our language, with latent
non-termination

> t5f = Lam $ \p -> If( IsZ p, (Pair (p,Inc p)), (Pair (p,Fix id)) )

The body of Lam is the ordinary Haskell function. We define the static
semantics by
> class Of a t | a -> t
asserting that the term 'a' has the type 't'.

The type-checking rules, defined by the instances of `Of' are
straightforward:

> instance Of (Lit t) t
> instance Of a Int => Of (Inc a) Int

More interesting is the typechecking of functions and fixpoints: the
Hyp introduction and elimination rules.

> instance (Apply f (AbsValue t1) y, Of y t2) => Of (Lam f t1) (t1->t2)
> instance (Apply f (AbsValue t) y, Of y t) => Of (Fix f t) t
> instance Of (AbsValue t) t  -- Hyp elimination rule

The Lam rules says that the function mapping the  term `x' to the term 'y'
has the type t1->t2, if the function maps the term AbsValue t1 
to the term that has the type t2. In other words, if the body of the 
function has the type t2 _assuming_ the argument has the type t1.
The AbsValue makes the typechecking of the HOAS Fix terminating.

If we define 
> teval :: Of e t => e -> t		-- static evaluator
> teval = undefined			-- all computation is static
> et5f1 = teval $ t5f

we can see the type inferred for t5f
> *HOASL> :t et5f1
> et5f1 :: Int -> (Int, Int)

We define the dynamic semantics as the big-step evaluator
> class Eval a t | a->t where eval:: a -> t

Most of the instances are straightforward:

> instance Of (Lit a) a => Eval (Lit a) a where
>     eval (Lit i) = i
> instance Eval t Int => Eval (Inc t) Int where
>     eval (Inc t) = eval t + 1

Again, more interesting is the evaluation of higher-order terms

> instance (Eval a (t1->t2), Eval b t1) => Eval (App a b) t2 where
>     eval (App (a,b)) = (eval a) (eval b)
>
> instance (Apply f (Lit t1) b, Eval b t2) => Eval (Lam f t1) (t1->t2) where
>     eval (Lam f) = \x -> eval (apply f (Lit x))
>
> instance (Apply f (Lit t) y, Eval y t) => Eval (Fix f t) t where
>     eval (Fix f) = fix (\x -> eval $ apply f (Lit x))

We can now evaluate the term and its applications

> et5f1 = teval $ t5f
> et5f2 =  eval $ t5f
> et5f3 =  eval $ App (t5f, Lit (0::Int)) -- yields (0,1)
> et5f4t= teval $ App (t5f, Lit (1::Int)) -- typechecking terminates
> et5f4 =  eval $ App (t5f, Lit (1::Int)) -- evaluation diverges!

The full code contains a more meaningful example of using Fix, to add
two numbers using recursion with accumulation. That example too is
potentially diverging on some inputs.