[Haskell-cafe] Type system speculation

Wed Dec 9 15:38:14 EST 2009

People in the Haskell community get awfully excited about Haskell's type 
system.

When I was first learning Haskell, I found this rather odd. After all, a 
"type" is just a flat name that tells the compiler how many bits to 
allocate and which operations to allow, right?

As I read further, I quickly discovered that in Haskell, the type system 
is something much more powerful. In a normal language, the type system 
prevents you from accidentally trying to multiply a string by a binary 
tree of integers, or pass a function an signed integer when it's 
expecting an unsigned integer. But in Haskell, the combination of 
parametric polymorphism, typeclasses and automatic type inference 
combine to form a potent mixture. Even without any exotic language 
extensions, in Haskell you can use the type system to *prove stuff about 
your programs*! And not necessary stuff that's anything to do with the 
usual notion of a "data type". You can prove that lists are non-empty or 
even of a specific size. You can prove that optional values get checked 
before they are used. And you can prove that user input is sanitised 
before being passed to sensitive functions. (Or rather, you can make 
your program simply not compile if the required properties are not 
guaranteed.)

Let's take an example. Suppose you have some functions that obtain input 
from a user, and you have some other functions that do something with 
this data. What you can do is something like this:

  newtype Data x y = Data x

  data Checked
  data Unchecked

  check_string :: Data String Unchecked -> Data String Checked

Now make it so that everything that obtains data from the user returns 
Data String Unchecked, and everything that uses a string to do something 
potentially hazardous demands Data String Checked. (And any functions 
that don't care whether the string is checked or not can just use Data 
String x.)

Only one problem: Any functions that expect a plain String now won't 
work. Particularly, the entirity of Data.List will refuse to accept this 
stuff. Which is kind of galling, considering that a Data String Checked 
*is* a String!

What we're really trying to do here is attach additional information to 
a value - information which exists only in the type checker's head, but 
has no effect on runtime behaviour (other than determining whether we 
*get* to runtime). As far as I can tell, Haskell does not provide any 
way to take an existing type and attach additional information to it in 
such a way that code which cares about the new information can make use 
of it, but existing code which doesn't care continues to work. Any 
comments on this one?

On a rather unrelated note, I found myself thinking about the Type 
Families extension (or whatever the hell it's called). I found that I 
can do this:

  class Collection c where
    type Element c :: *
    empty :: c
    first :: c -> Maybe (Element c)
    ...

This works for collections that can contain *anything*:

  instance Collection [x] where
    type Element [x] = x
    ...

It works for collections that can only contain *one* type:

  instance Collection ByteString where
    type Element ByteString = Word8
    ...

And it works for collections that can handle more than one type, but 
less than *every* type:

  instance Ord x => Collection (Set x) where
    type Element (Set x) = x
    ...

Something quite interesting is happening here: this "Element" thing is 
almost like a *function* which takes one type and returns another. A 
function that operates on types themselves.

Now suppose I want to write a function that converts one kind of 
collection into another:

  convert :: (Collection c1, Collection c2) => c1 -> c2

Erm, no. That doesn't work AT ALL. What we want to say is that the 
element types are constrained (specifically, they're identical).

I searched for quite some time looking for a way to say this. In the 
end, I gave up and let the compiler deduce the type signature 
automatically. Apparently the correct signature is

  convert :: (Collection c1, Collection c2, Element c1 ~ Element c2) => 
c1 -> c2

Apparently the bizare syntax "Element c1 ~ Element c2" means that these 
two types are supposed to be equal. (Why not just an equals sign?)

This got me thinking... Maybe what we'd really like to say is something like

  convert :: (Collection c1 {Element = x}, Collection c2 {Element = x}) 
=> c1 -> c2

In other words, make the assosiated type like a named field of the 
class, using syntax similar to the named fields that values can have.

And then, if we can do it to classes, why not types?

  data Map {keys :: *, values :: *} = ...

  insert :: (Ord k) => k -> v -> Map {keys = k, values = v} -> Map {keys 
= k, values = v}

Now, it seems reasonable that if one of the type fields is 
unconstrained, you do not need to mention it:

  keys :: (Ord k) => Map {keys = k} -> [k]

  values :: Map {values = v} -> [v]

If we can have some mechanism for adding new type fields to an existing 
type, we are one step away from the problem at the beginning. You could 
take the existing String type, and add (by whatever the mechanism is) a 
new "safety" field to it. All of Data.List ignores this field, but any 
new code you write can choose to use it if required.

At least, that's my vague idea. Anybody like it?

(My first thought is that "Map {keys = k, values = v}" is going to start 
getting tedious to type very, very quickly. Perhaps we should do 
something like values :: (m :: Map {values = v}) => m -> [v], or even 
allow values :: (m :: Map {}) => m -> [values m] as an alternative? IDK.)

So there you have it. Multiple paragraphs of semi-directed rambling 
about type systems. All written by somebody who doesn't even know what a 
Skolem variable is. :-}