[Haskell] Replacing and improving pattern guards with PMC syntax

Sun Oct 1 20:33:07 EDT 2006

One of the problems of pattern matching in Haskell is
that it might well be regarded as not very declarative:
function definition lines cease being generalisable equations,
and the definition of the rewriting strategy involved
in Haskell pattern matching extends over about three pages,
no matter whether you look in the Haskell report,
or for a definition of the ``functional rewriting strategy''
in the theoretical literature.

<shameless plug>
To fix exactly this problem was my original goal
when I started developing a family of Pattern Matching Calculi,
see  http://www.cas.mcmaster.ca/~kahl/PMC/ .
</shameless plug>

(Other pattern matching calculi in the literature have different goals.)

The original calculus, which I shall call ``PMC'' for the remainder of
this message, is a confluent rewriting system with a normalising strategy
that incorporates Haskell pattern matching ---
the confluence proof has been mechanised using Isabelle.

So far I never considered it important to devise a concrete syntax for PMC,
but triggered by the current pattern guards thread on haskell-prime,
I now try to give a PMC syntax that blends well with Haskell.

PMC ``automatically'' includes
the ``expression pattern'' of pattern guards,
but organised differently, see below.

I am now presenting a series of syntactic extensions to Haskell.
without assuming that the reader is familiar with PMC,
but I will also explain how these extensions relate with PMC.

One of my gripes with Haskell pattern matching is
that the anonymous variant, namely case expressions,
has an application to an argument built in,
and therefore does not by itself produce a function.

However, lambda-abstraction already allows single-alternative
pattern matching, so I propose to relax this to multi-alternative
pattern matching:

> length = \
>   [] -> 0
>   x : xs -> 1 + length xs

(Lists of alternatives correspond to the ``matchings'' of PMC,
 and concatenation of those lists (using ``;'' or layout) corresponds
 to matching alternative in PMC.
 ``\ m'' then corresponds to ``{| m |}'' in PMC.
)

For multi-argument pattern matching,
I propose to allow nested matchings:

> zip = \
>   x : xs -> y : ys -> (x,y) : zip xs ys
>   _      -> _      -> []

> take' = \
>   0 ->   _      -> []
>   n -> { []     -> []
>        ; x : xs -> x : take (pred n) xs
>        }

This means that the alternative ``->''
now can have either an expression or a list of alternatives
as its right argument.
(Having an expression there corresponds to
 ``lifting'' $\leftharpoonup \_ \rightharpoonup$ in PMC.)

Pattern guards now can be seen as a special case of
alternatives-level argument application
(``argument supply'' $\righttriangle$ in PMC).
I find it useful to write the argument first, as in PMC.

I see two options of writing argument supply in Haskell:
either using new keywords, giving rise to the syntax production

alt -> match exp with alts

or using a keyword infix operator:

alt -> exp |> alts

(Using this kind of argument supply instead of pattern guards
 also has the advantage that multiple alternatives become possible.)

I start with Simon Peyton Jones' standard example:

> clunky env v1 v2 |  Just r1 <- lookup env v1
>                   , Just r2 <- lookup env v2  = r1 + r2
>                  | otherwise                  = v1 + v2

This could now be written using either ``match _ with _'':

> clunky env v1 v2 = \
>   match lookup env v1 with
>     Just r1 ->  match lookup env v2 with
>       Just r2 ->  r1 + r2
>   v1 + v2

or infix ``_ |> _'':

> clunky env v1 v2 = \
>   lookup env v1   |> Just r1 ->
>     lookup env v2 |> Just r2 ->  r1 + r2
>   v1 + v2

Note that the syntactic structure is different from pattern guards;
here is the fully parenthesised version:

> clunky env v1 v2 = \
>   { lookup env v1   |> { Just r1 ->
>       { lookup env v2 |> { Just r2 ->  r1 + r2 }}}
>   ; v1 + v2
>   }

Boolean guards are matches against True:

> take = \
>   0 ->   _      -> []
>   n -> match n > 0 with
>      True ->
>        { []     -> []
>        ; x : xs -> x : take (pred n) xs
>        }
>      _ -> error "take: negative argument"

But Boolean guards can also be kept as syntactic sugar:

> take = \
>   0 ->   _      -> []
>   n | n > 0 ->
>        { []     -> []
>        ; x : xs -> x : take (pred n) xs
>        }
>     | otherwise -> error "take: negative argument"

Now we have all the syntax in place ---
PMC's ``failure'' is of course the empty alternative
(allowed in Haskell 98).

For the grammar, the relevant productions in Haskell 98 are:

exp10   ->      \ apat1 ... apatn -> exp        (lambda abstraction, n>=1)
        |       "case" exp "of" { alts }        (case expression)
alts    ->      alt1 ; ... ; altn               (n>=1)
alt     ->      pat "->" exp [where decls]
        |       pat gdpat [where decls]
        |                                       (empty alternative)
gdpat   ->      gd "->" exp [ gdpat ]
gd      ->      | exp

This would change to the following, where I propose to rename
``alt'' to ``match'' for ``matching'' as in PMC,
since this is now a full-fledged syntactic category of importance
similar to that of expressions (PMC works essentially because of this
separation of syntactic categories):

exp10   ->      \  { matchs }                   (lambda abstraction)
        |       "case" exp "of" { matchs }      (case expression)
matchs  ->      match1 ; ... ; matchn           (alternative, n>=1)
match   ->      expr                            (result)
        |       pat "->" { matchs } [where decls] (pattern matching)
        |       pat gdpat [where decls]         (pat. mat. w. Boolean guards) 
        |                                       (empty alternative)
        |       "match" exp "with" { matchs }       (pattern argument supply)
gdpat   ->      gd "->" { matchs } [ gdpat ]
gd      ->      | exp

(case expressions would than be syntactic sugar:

> case exp of {matchs} = \ match exp with {matchs}
)

These productions imply that we would loose multiple abstraction:

\ x y z -> 3

becomes

\ x -> y -> z -> 3
.

For reasons of declarativity, I am also tempted to argue
that ``='' should only be allowed for one-line definitions,
so we would write:

> f 0 -> 1
> f n -> n * f (n - 1)

An easy-to-remember motto for this could be
  ``if it can fall through, use ->''.

This would of course break every single textbook and tutorial,
so I won't insist...

No matter whether with ``->'' or with ``='', the above would of course
be syntactic sugar for:

> f = \
>   0 -> 1
>   n -> n * f (n - 1)

LAWS
====

The most important nice thing is that we can now use equational reasoning
for expressions involving pattern matching.

The PMC laws (see the papers for more explanation)
translate into the syntax presented above as follows
(note that most of these equations are between matchings,
 i.e., between (lists of) alternatives,
 and only some are between expressions):

---------------------------------------------------------------------

Failure as unit:

> {- empty alternative -} ; alts = alts

Failure lifting:

> \ { } = undefined

Result lifting:

> \ { exp } = exp

Argument propagation into matching:

> (\ { alts }) exp = \ { match exp with { alts } }

Argument propagation into expression:

> match arg with { exp } = exp arg

First match wins (no backtracking after first success):

> exp ; alts = exp

Failure-strictness of argument supply:

> match arg with { } = {- empty alternative -}

Distributivity of argument supply over alternative:

> match arg with { alts1 ; alts2 } =
>                   match arg with { alts1 } ; match arg with { alts2 }

beta (v is a variable):

> match arg with { v -> { alts } } = alts[v \ arg]      -- substitution

Match when matching the same constructor:

> match c(e1,...,en) with {c(p1,...,pn) -> {alts}}
>  = match e1 with {p1 -> ... -> match en with pn -> {alts}}

Mismatch when matching different constructors:

> match c(e1,...,em) with {d(p1,...,pn) -> {alts}}
>  =  {- empty alternative -}

Strictness of matching against constructors:

> match undefined with {c(p1,...,pn) -> {alts}} = undefined

---------------------------------------------------------------------

Together with

 Strictness of function application in the function:

> undefined arg = undefined  -- already in Haskell

, these laws, when oriented as rules, produce a confluent rewriting system,
and the standard definition of strong head normal form
induces a normalising strategy.

Besides extending syntactic convenience even beyond pattern guards,
adopting the abstract syntax of PMC would therefore also simplify
the definition of the semantics of pattern matching in Haskell,
and extend the scope of equational reasoning to pattern matching.

For the concrete syntax I have no strong feelings,
but hope that I provided a starting point for discussion.

Wolfram