[Haskell-cafe] carry "state" around ....

Mon Jul 28 23:01:40 EDT 2008

Excerpts from Galchin, Vasili's message of Mon Jul 28 21:14:56 -0500 2008:
> ok guys .. what is this "phantom type" concept? Is it a type theory thing or
> just Haskell type concept?
> 
> Vasili

Phantom types are more of an idiom than anything else; they are types
with no real concrete representation, i.e. they only exist at compile
time, and not at runtime.

They can be used to hijack the type system, essentially. For example:

> data Expr = EInt  Int
>           | EBool Bool
>           | ECond Expr Expr Expr
>           | EAdd Expr Expr
>  deriving (Show)

This basically represents the constructs for a simple expression
language. However, wouldn't it be nice if we could say that 'EAdd' can
only take an 'EInt' or that ECond's first parameter must be 'EBool'
and check that at *compile* time? In this case, we can't. We would
have to check at runtime when we try to evaluate things that indeed,
the types fit together properly for our little 'expression language.'

But not all is lost.

With phantom types, we can 'hijack' the type system so that
it'll verify this for us. We can do this by simply making a newtype
over 'Expr' and giving it a type variable that does not show up on the
right side:

> newtype Expression a = E Expr
>  deriving (Show)

In this case, the 'a' is the phantom type. It does not show up in the
constructors, so it does not exist at runtime.

Now all we simply have to do is 'lift' all of the constructors of Expr
into their own functions that stick the data into an 'Expression'
using the E constructor:

> intE :: Int -> Expression Int
> intE = E . EInt
>
> boolE :: Bool -> Expression Bool
> boolE = E . EBool
>
> condE :: Expression Bool 
>       -> Expression a 
>       -> Expression a
>       -> Expression a
> condE (E a) (E b) (E c) = E $ ECond a b c
>
> addE :: Expression Int -> Expression Int -> Expression Int
> addE (E a) (E b) = E $ EAdd a b

You'll notice: in the type signatures above, we give the phantom type
of Expression a concrete type such as 'Int' or 'Bool'.

What does this get is? It means that if we construct values via intE
and boolE, and then subsequently use them with condE or addE, we can't
use values of the wrong type in place of one another and the type system
will make sure of it.

For example:

$ ghci
...
Prelude> :l phantom.lhs
[1 of 1] Compiling Main             ( phantom.lhs, interpreted )
Ok, modules loaded: Main.
*Main> let e1 = boolE True
*Main> let e2 = intE 12
*Main> let e3 = intE 21
*Main> condE e1 e2 e3
E (ECond (EBool True) (EInt 12) (EInt 21))
*Main> condE e2 e1 e3

<interactive>:1:6:                                                                                                                                       
    Couldn't match expected type `Bool' against inferred type `Int'
      Expected type: Expression Bool
      Inferred type: Expression Int
    In the first argument of `condE', namely `e2'
    In the expression: condE e2 e1 e3
*Main>

As you can see, we 'hijack' the type system so we can specify exactly
what type of 'Expression' functions like intE/boolE will return. We
then simply have other functions which operate over them, and also
specify *exactly* what kind of expression is necessary for them to
work. The phantom type never exists, it only is there to verify we're
"doing the right thing."

That was a bit OT though.

In the case of empty data declarations, it's essentially the same
idea. It's just a type with no actual runtime representation. You can
exploit them to force invariants in your types or numerous other
things; a good example is type arithmetic which can be found on the wiki:

http://www.haskell.org/haskellwiki/Type_arithmetic

The example here is based off of Lennart's blog post about
representing DSLs in haskell, and he also shows an example of the same
thing which uses GADTs which allow you to do the same thing (roughly.)
So I'd say give the credit wholly to him. :)

http://augustss.blogspot.com/2007/06/representing-dsl-expressions-in-haskell.html

Austin