Rank-2 polymorphism and pattern matching

Sun Jan 6 11:20:42 EST 2008

On Jan 4, 2008 5:15 AM, Simon Peyton-Jones <simonpj at microsoft.com> wrote:
> | > The following won't compile for me
> | >
> | > isnull :: (forall a . [a]) -> Bool
> | > isnull ([] :: forall b . [b]) = True
> | >
> | >    Couldn't match expected type `forall b. [b]'
> | >            against inferred type `[a]'
> | >     In the pattern: []
>
> This is a pretty strange thing to do, to match a polymorphic argument against a data constructor.  I guess you'd expect this to work too, perhaps?
>
> f :: forall a.  (forall b. Either a b) -> a
> f (Left x) = x
>
> I grant that arguably these should work, but I think it'd be quite tricky to make it do so, because it'd involve re-generalising in the pattern.  Furthermore, I can't see any use for it.  Just remove the type signature from the pattern.
>
> One could argue that it's a bad sign that the pattern typechecker should have difficulty with this.  But until it turns out to be important I'm not going to lose sleep over it!

This is literate Haskell. It's in the LaTeX style, since my mail
reader won't change the quoting mark from '>'. It is not a valid LaTeX
file.

My reason for wanting pattern matching on values of polymorphic types
is a kind of first-level refinement types. I'm going to demonstrate
using the "risers" function, as presented in Dana N. Xu's ESC/Haskell
(http://research.microsoft.com/Users/simonpj/papers/verify/escH-hw.ps
or http://www.cl.cam.ac.uk/~nx200/research/escH-hw.ps ), which
references Neil Mitchell's Catch
(http://www-users.cs.york.ac.uk/~ndm/catch/ ). I'll also be
referencing an example from Tim Freeman's thesis on refinement types
for ML (http://www.cs.ucla.edu/~palsberg/tba/papers/freeman-thesis94.pdf
)

\begin{code}

{-# OPTIONS -fglasgow-exts #-}
-- The LANGUAGE pragma is usually a pain for exploratory programming.

\end{code}

This is the risers function, as presented by Xu. It returns the sorted
sublists of a list. None of the lists in the return value are empty,
and if the argument is non-empty, the return value is also non-empty.

\begin{code}

risersXu :: (Ord t) => [t] -> [[t]]
risersXu [] = []
risersXu [x] = [[x]]
risersXu (x:y:etc) =
    let ss = risersXu (y : etc)
    in case x <= y of
         True -> (x : (head ss)) : (tail ss)
         False -> ([x]) : ss

\end{code}

Xu uses head and tail. These are safe here by a proof created by ESC/Haskell.

Here is the risers function according to Mitchell:

\begin{code}

risersMitchell :: Ord a => [a] -> [[a]]
risersMitchell [] = []
risersMitchell [x] = [[x]]
risersMitchell (x:y:etc) = if x <= y
                           then (x:s):ss
                           else [x]:(s:ss)
    where (s:ss) = risersMitchell (y:etc)

\end{code}

The unsafe part here is the pattern in the where clause. Mitchell
presents a tool to prove this safe.

These unsafe operations depend on the second property of the function:
non-null inputs generate non-null outputs. We could write a type for
this functions using a trusted library with phantom types for branding
(http://okmij.org/ftp/Computation/lightweight-dependent-typing.html ).
This technique (called lightweight static capabilities) can do this
and much else, as well, but clients lose all ability to pattern match
(even in case). We could also write a type signature guaranteeing this
by using GADTs.  Without either one of these, incomplete pattern
matching or calling unsafe head and tail on the result of the
recursive call seems inevitable.

Here's another way to write the function which does away with the need
for second property on the recursive call, substituting instead the
need for the first property, that no lists in the return value are
empty:

\begin{code}

risersAlt :: (Ord t) => [t] -> [[t]]
risersAlt [] = []
risersAlt (x:xs) =
    case risersAlt xs of
      [] -> [[x]]
      w@((y:ys):z) ->
          if x <= y
          then (x:y:ys):z
          else ([x]):w
      ([]:_) -> error "risersAlt"

\end{code}

The error is never reached.

Though ensuring the second property with our usual types seems tricky,
ensuring the first is not too tough:

\begin{code}

type And1 a = (a,[a])

risersAlt' :: Ord a => [a] -> [And1 a]
risersAlt' [] = []
risersAlt' (x:xs) =
    case risersAlt' xs of
      [] -> [(x,[])]
      w@(yys:z) ->
          if x <= fst yys
          then (x,fst yys : snd yys):z
          else (x,[]):w

\end{code}

It is now much easier to see that risers is safe: There is one pattern
match and one case, and each is simple. No unsafe functions like head
or tail are called.

It does have two disadvantages. First, the second property is still
true, but the function type does not enforce it. This means that any
other callers of risers may have to use incomplete pattern matching or
unsafe functions, since they may not be so easy to transform. It is my
intuition that it is not frequently the case that these functions are
tricky to transform, but perhaps Neil Mitchell disagrees. We could fix
this by writing another risers function with type And1 a -> And1 (And1
a), but this brings us to the second problem: And1 a is not a subtype
of [a]. This means that callers of our hypothetical other risers
function (as well as consumers of the output from risersAlt') must
explicitly coerce the results back to lists.

Let's write more expressive signatures using the principle from my
original question.

\begin{code}

data List a n = Nil n
              | forall r . a :| (List a r)

\end{code}

[] is the only value of type forall a . [a], assuming no values of
type forall a . a. Similarly, the only values of type forall a . List
a n use the Nil constructor, and the only values of type forall n .
List a n use the :| constructor.

\begin{code}

infixr :|

box x = x :| Nil ()

type NonEmpty a = forall n . List a n

onebox :: NonEmpty Int
onebox = box 1

onebox' :: List Int Dummy
onebox' = onebox

data Dummy

-- This doesn't compile
-- empty :: NonEmpty a
-- empty = Nil ()

\end{code}

NonEmpty a is a subtype of List a x for all types x.

Now we get to the disappointing part of the show in which I pine for
first-class existentials and pattern matching on polymorphic values.

\begin{code}

data Some f a = forall n . Some (f a n)

safeHead :: NonEmpty a -> a
safeHead x = unsafeHead x where
    unsafeHead (x :| _) = x

safeTail :: NonEmpty a -> Some List a
safeTail x = unsafeTail x where
    unsafeTail (_ :| xs) = Some xs

\end{code}

Unfortunately, we'll be forced to Some and un-Some some values, and it
takes some thinking to see that safeHead and safeTail are actually
safe.

\begin{code}

risersMitchell' :: Ord a => List a n -> List (NonEmpty a) n
risersMitchell' (Nil x) = (Nil x)
risersMitchell' (x :| Nil _) = box (box x)
risersMitchell' ((x {- :: a -}) :| y :| etc) =
    case risersMitchell' (y :| etc) {- :: NonEmpty (NonEmpty a) -} of
      Nil _ -> error "risersMitchell'"
      s :| ss -> if x <= y
                 then (x :| s) :| ss
                 else (box x) :| s :| ss

\end{code}

Since we can't put the recursive call in a where clause, we must use a
case with some dead code. The type annotations are commented out here
to show they are not needed, but uncommenting them shows that the
recursive call really does return a non-empty lists, and so the Nil
case really is dead code.

This type signature ensures both of the properties listed when
introducing risers. The key to the
non-empty-arguments-produce-non-empty-results property is that the
variable n in the signature is used twice. That means applying
risersMitchell' to a list with a fixed (or existential) type as its
second parameter can't produce a NonEmpty list.

\begin{code}

risersXu' :: Ord a => List a r -> List (NonEmpty a) r
risersXu' (Nil x) = Nil x
risersXu' (x :| Nil _) = box (box x)
risersXu' (x :| y :| etc) =
    let ss = risersXu' (y :| etc)
    in case x <= y of
         True -> case safeTail ss of
                   Some v -> (x :| (safeHead ss)) :| v
         False -> (box x) :| ss

\end{code}

Here we see that the type annotation isn't necessary to infer that
risers applied to a non-empty list returns a non-empty list. The value
ss isn't given a type signature, but we can apply safeHead and
safeTail. The case matching on safeTail is the pain of boxing up
existentials.

This is the first version of risers with a type signature that gives
us the original invariant Xu and Mitchell can infer, as well as
calling no unsafe functions and containing no incomplete case or let
matching. It also returns a list of lists, just like the original
function.

With first-class existentials, this would look just like Xu's risers
(modulo built-in syntax for lists). With pattern matching for
polymorphic values, risersMitchell' would look just like Mitchell's
original risers, but be safe by construction. Pattern matching for
polymorphic values would also allow non-trusted implementations of
safeTail and safeHead to be actually safe.

Since GADTs and lightweight static capabilities are already available,
I do not know how compelling this example is. I think there are much
more compelling examples for first-class existentials.

In any case, here's one more example, this one from Freeman's thesis
on refinement types. We create a type of boolean expressions with
variables. We can evaluate terms without any free variables, so we
distinguish them using the trick above. Unfortunately, this time the
constructor restriction isn't just at the top level, but all the way
down. We must, therefore, parameterize the recursion in the data type.

\begin{code}

data BoolExp' a r = And r r
                  | Or r r
                  | Not r
                  | Tru
                  | Fals
                  | Var a String

newtype Mu' f = Mu' (forall a . f a (Mu' f))

type Ground = Mu' BoolExp'

data Mu f = forall a . Mu (f a (Mu f))

type BoolExp = Mu BoolExp'

type Base = forall a r . BoolExp' a r

eval :: Ground -> Base
eval (Mu' x) =
    case x of
      And y z ->
          case (eval y,eval z) of
            (Tru,x) -> x
            _ -> Fals
      Or y z ->
          case (eval y,eval z) of
            (Fals,x) -> x
            _ -> Tru
      Not y ->
          case eval y of
            Tru -> Fals
            _ -> Tru
      Tru -> Tru
      Fals -> Fals
      _ -> error "eval"

\end{code}

The error in eval is never reached.

Unfortunately, Ground is not a subtype of BoolExp, so this seems like
a less compelling example.