[Haskell-cafe] Finally tagless - stuck with implementation of "lam"

Mon Oct 5 17:42:53 EDT 2009

Hi Günther,

The underlying problem with the implementation of 'lam' is that 
you have to pick an evaluation order for the side effects you want in
the semantics of your embedded language. The two obvious options are
call-by-name and call-by-value.

The tricky point is that the types of the embedded language cannot be
interpreted directly as Haskell types wrapped in a monad. You need to 
either take the call-by-name or the call-by-value interpretation of
types:

For CBN:  [[ Int    ]] = IO Int
          [[ A -> B ]] = [[ A ]] -> [[ B ]]

For CBV:  [[ Int    ]] = Int
          [[ A -> B ]] = [[ A ]] -> IO [[ B ]]

(obviously, there is nothing special about the IO monad here, or in the
rest of this email: we could replace it with any other monad).

To implement the translation of embedded language types to Haskell types
in Haskell we use type families.

Here is the code for call-by-name:

> module CBN where

First of all, we define some dummy types so that we have a different
representation of types in the embedded language than those in the
meta-language, and we don't get confused:

> data IntT
> data a :-> b
> infixr 5 :->

Now we give the definition of the higher-order abstract syntax of our
embedded language:

> class HOAS exp where
>     lam :: (exp a -> exp b) -> exp (a :-> b)
>     app :: exp (a :-> b) -> exp a -> exp b
>
>     int :: Int -> exp IntT
>     add :: exp IntT -> exp IntT -> exp IntT

This is as in your definition, except I have used the different type
names to highlight the distinction between embedded and meta-language
types.

Next, we give the CBN semantics for the types, using the definition
above, with a type family.

> type family Sem a :: *
> type instance Sem IntT      = IO Int
> type instance Sem (a :-> b) = Sem a -> Sem b

Now we can give an instance of the syntax's type class, that translates
a piece of embedded code into its denotational semantics.

> newtype S a = S { unS :: Sem a }

> instance HOAS S where
>   int = S . return
>   add (S x) (S y) = S $ do a <- x
>                            b <- y
>                            putStrLn "Adding"
>                            return (a + b)
>
>   lam f           = S (unS . f . S)
>   app (S x) (S y) = S (x y)

As an example, we can give a term and see what happens when we evaluate
it:

> let_ :: HOAS exp => exp a -> (exp a -> exp b) -> exp b
> let_ x y = (lam y) `app` x

> t :: HOAS exp => exp IntT
> t = (lam $ \x -> let_ (x `add` x)
                      $ \y -> y `add` y) `app` int 10

Evaluating 'unS t' in GHCi gives the output:

Adding
Adding
Adding
40

Note that the add operation is invoked three times: the "x `add` x" that
is let-bound is evaluated twice.

The second option is call-by-value. This is a bit more interesting
because we have to change the syntax of the embedded language to deal
with the different substitution rules for CBV.

> module CBV where

Again, we use a different representation for embedded-language types:

> data IntT
> data a :-> b
> infixr 5 :->

There is a difference in the syntax between CBN and CBV that is not
always obvious from the usual paper presentations. There is a split
between pieces of syntax that are "values" and those that are
"computations". Values do not have side-effects, while computations may.
In this presentation, I have chosen that the only values are variables,
while everything else is a computation.

To represent this split, we use a multi-parameter type class. Note that
the technique of using multi-parameter type classes is generally useful
for representing abstract syntaxes in the HOAS style with multiple
mutually defined syntactic categories.

> class HOAS exp val | exp -> val, val -> exp where
>    val :: val a -> exp a
>
>    lam :: (val a -> exp b) -> exp (a :-> b)
>    app :: exp (a :-> b) -> exp a -> exp b
>
>    int :: Int -> exp IntT
>    add :: exp IntT -> exp IntT -> exp IntT

The 'val' operation injects values into computations: this will be
interpreted by the 'return' of the monad.

We now give the semantics of types for CBV, following the definition
above:

> type family   Sem a         :: *
> type instance Sem IntT      = Int
> type instance Sem (a :-> b) = Sem a -> IO (Sem b)

We need two newtypes for the two different syntactic categories. Values
are interpeted by 'SV', and are just pure values; computations are
interpreted by wrapping them in the monad.

> newtype SV a = SV { unSV :: Sem a }
> newtype SC a = SC { unSC :: IO (Sem a) }

Now we can define the semantics of the syntax in terms of these types:

> instance HOAS SC SV where
>     val = SC . return . unSV
>
>     lam f             = SC $ return (unSC . f . SV) 
>     app (SC x) (SC y) = SC $ do f <- x; a <- y; f a
> 
>     int i             = SC $ return i
>     add (SC x) (SC y) = SC $ do a <- x
>                                 b <- y
>                                 putStrLn "Adding"
>                                 return (a + b)

The same example term is expressed in CBV as follows:

> let_ :: HOAS exp val => exp a -> (val a -> exp b) -> exp b
> let_ x y = (lam y) `app` x

> t :: HOAS exp val => exp IntT
> t = (lam $ \x -> let_ (val x `add` val x)
>                     $ \y -> val y `add` val y) `app` int 10

Running 'unSC t' in GHCi gives the output:

Adding
Adding
40

Note that the add operation is only invoked twice this time: the
let-bound expression is evaluated to a value, and then this value is
used twice in the body of the let.

The other obvious evaluation strategy is call-by-need, as used by
Haskell itself, but I don't know if it is possible to implement this. I
guess it may be possible using the ST monad as the mutable storage of
thunks.

Hope this was helpful.

Bob

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.