[Haskell-cafe] Musings on type systems

Fri Nov 19 22:05:01 EST 2010

On Fri, Nov 19, 2010 at 1:05 PM, Andrew Coppin
<andrewcoppin at btinternet.com> wrote:
> So is Either what is meant by a "sum type"?
> Similarly, (X, Y) [...] is this a "product type"?

Yes, and yes, although more generally speaking

data Test = A Int Bool | B Test | C [Test]

is a recursive ADT composed of sums (A ...| B... | C...) and products
((Int, Bool), Test, [Test])

> Notionally (->) is just another
> type constructor, so functions aren't fundamentally different to any other
> types - at least, as far as the type system goes.

Sort of, but I think your discussion later gets into exactly why it
*is* fundamentally different.

> Where *the hell* do GADTs fit in here? Well, they're usually used with
> phantom types, so I guess we need to figure out where phantom types fit in.

Well, I find it's better to think of GADTs as types that have extra
elements holding proofs about their contents which you can unpack.

For example:

data Expr a where
   Prim :: a -> Expr a
   Lam :: (Expr x -> Expr y) -> Expr (x -> y)
   App :: Expr (x -> a) -> Expr x -> Expr a

(The type variables are arbitrary; I chose these specific odd ones for
a reason which will shortly become clear)

This can be reconstructed as an ADT with existentials and constraints:

data Expr a =
   Prim a
 | forall x y. (a ~ (x -> y)) => Lam (Expr x -> Expr y)
 | forall x. App (Expr x -> a) (Expr x)

Conceptually, here is what these types look like in memory; [x] is a
box in memory holding a pointer to an object of type x

Expr a =
   tag_Prim, [a]
or
   tag_Lam, type x, type y, proof that a = (x -> y), [Expr x -> Expr y]
or
   tag_App, type x, [Expr x -> a], [Expr x]

Now, because we have type safety the compiler can actually erase all
the types and equality proofs (although in the case of typeclass
constraints it keeps the typeclass dictionary pointer around so that
it can be used to call functions in the class).

> To the type checker, Foo Int and Foo Bool are two totally seperate types. In
> the phantom type case, the set of possible values for both types are
> actually identical. So really we just have two names for the same set. The
> same thing goes for a type alias (the "type" keyword).

Not exactly; in the phantom type case, the two sets ARE disjoint in a
way; there are no objects that are both Foo Int and Foo Bool (although
there may be objects of type forall a. Foo a -- more on that later).
Whereas the type keyword really creates two names for the same set.

> So what would happen if some crazy person decided to make the kind system
> more like the type system? Or make the type system more like the value
> system? Do we end up with another layer to our cake? Is it possible to
> generalise it to an infinite number of layers? Or to a circular hierachy? Is
> any of this /useful/ in any way?

Absolutely!  In fact, dependent type systems do exactly this; values
can be members of types and vice versa.  For example, in a dependently
typed language, you can write something like

data Vector :: * -> Int -> * where
    Nil :: forall a::*. Vector a 0
    Cons :: forall a::*, n::Int. a -> Vector a n -> Vector a (n+1)

append :: forall a::*, m::Int, n::Int. Vector a m -> Vector a n ->
Vector a (m+n)
append a 0 n Nil v = v
append a m n (Cons a as) v = Cons a (append as v)

However, -> is special here, because it works at both the type level
and the value level!  That is, (Vector Int :: Int -> *) can be applied
to the value 5 to create (Vector Int 5), the set of vectors with 5
elements of type Int.  And in fact, -> is just a generalization of
'forall' that can't refer to the object on it's right hand side:

append :: (
   forall a :: *.
   forall m :: Int.
   forall n :: Int.
   forall v1 :: Vector a m.
   forall v2 :: Vector a n.
   Vector a (m+n)
 )

Since we don't refer to v1 or v2, we can abbreviate the 'forall' with ->:

append :: (
   forall a :: *.
   forall m :: Int.
   forall n :: Int.
   Vector a m ->
   Vector a n ->
   Vector a (m+n)
 )

Now, however, you can run arbitrary code at the type level; just
create an object with the type Vector Int (fib 5) and the compiler
needs to calculate (fib 5) to make sure the length is correct!

In compiler circles it's generally considered a bad thing when the
compiler doesn't terminate, (one reason: how do I know if the bug is
in my code or the compiler?) so these languages tend to force your
code (or at least the code that can run at the type level) to
terminate.  There's another benefit to this: if it's possible that a
type-level expression might not terminate, you lose type erasure and
have to do the calculation at runtime.  Otherwise you can create
unsound expressions:

cast :: forall a::*, b::*. (a = b) -> a -> b
cast a b p x = x  -- note that p is never evaluated, we just require
the proof in scope!

bottom :: forall a. a
bottom a = bottom a -- if we evaluate this, we infinite loop

unsound :: forall a::*, b::*. a -> b
unsound a b x = cast a b (bottom (a=b)) x

print (unsound Int String 0) -- BOOM! Headshot to your type-safety.

So, these languages tend to be very heavily on the 'not turing
complete' side of the fence, because they have restrictions on
nonterminating programs.  But they still manage to be useful!

There's one nit remaining, though: that pesky *.  What's its type?

Sadly, * :: * does not work, it leads to a paradox.  But you can make
a hierarchy: * :: *1, *1 :: *2, *2 :: *3, etc.  Some existing
compilers do this, some even going so far as to automatically lift
your types into the right 'universe' (*-level), and then complaining
if there is a cycle that can't be resolved (x :: *n, y :: *m, with n >
m and m > n).  And sometimes they even provide a switch (like
Haskell's UndecidableInstances) that lets you collapse the *'s and not
care.  You just have to promise not to write Russell's Paradox at the
type level. :)

  -- ryan