compilation of pattern-matching?

Thu Mar 26 13:39:14 EDT 2009

Sorry to be the odd man out - perhaps an example will help to
clarify my reading of the language definition.

>I find this "reordering" discussion somewhat nonsensical.
>Haskell specifies top-to-botton, left-to-right matching.
>This specifies exactly which tests that have to be made and in what order,
>and ghc does exactly those and in the correct order.
>
>One can have a perception that when there are multiple arms in a case
>decided by a single test, then the first arm should somehow be reached 
>quicker than the second one etc But that is something that the Haskell 
>standard has never promised, nor has any compiler ever promised this.

When you say "test", which can decide multiple arms in a case, do you
mean that, say 'null e' being 'True' implies that matching '[]' against 'e' 
will succeed while matching '_:_' against 'e' will fail? Because that kind
of test is not what the Haskell'98 report talks about. It talks about 
individual matches of  expressions against alternatives, and it does
specify precisely in what order these are to be performed, hence
which pattern is reached first:

    A case expression is evaluated by pattern matching the expression e 
    against the individual alternatives. The alternatives are tried sequentially, 
    from top to bottom.

    Patterns are matched against values. Attempting to match a pattern can 
    have one of three results: it may fail; it may succeed, returning a binding 
    for each variable in the pattern; or it may diverge (i.e. return _|_). 
    Pattern matching proceeds from left to right, ..

Nothing abstract about that. So for a function application 'f e', where

    f [] = True
    f (_:_) = False

the Haskell'98 report specifies that 'e' is first matched against '[]', then
(if that fails) against (_:_). So the first pattern is reached/tried before
the second. Of course, we can make use of the fact that these two
matches are complementary, so we only need one "test" to decide,
and if there are further '[]' patterns after the first, we don't have to
match them again, but that is all in the realm of optimization, not 
language definition. 

The definition explicitly provides room for such optimization, but 
it still requires conformance to the rules set out in the definition, 
which include:

    case e of {p1->e1;p2->e2} = 
    case e of {p1->e1;_->case e of {p2->e2;_->error "No match"}}

GHC violates that rule, as we can demonstrate:

    newtype N = N Int deriving (Show,Eq)

    instance Num N where
      fromInteger 0 = error "0"
      fromInteger 1 = N 0
      fromInteger _ = N 1

    f x = case x of
            1 -> False
            0 -> True

    g x = case x of
            1 -> False
            _ -> case x of
                  0 -> True
                  _ -> error "No match"

    main = do
      print $ g (N 0)
      print $ f (N 0)

    -- ghc
    $ ghc -w -e main PMOrderSpec.hs
    False
    <interactive>: 0

    -- hugs
    Main> main
    False
    False

One can presumably construct similar examples using 'Data.String.IsString',
or using pattern guards, so just fixing the special case of 'Num' is not going
to help, but this example seems firmly within Haskell'98.

>And to me such a perception is counter-intuitive; Haskell is about
>specifying functions abstractly so order should only matter when it's
>a matter of semantics.

Any semantics should conform to the language definition, right?

>On the other hand, adding some kind of pragma that indicates the
>likelyhood of a branch seems quite sensible to me.

Which doesn't mean that I wouldn't try such a pragma, if it existed.
I'm just having difficulties understanding this universal resistance to
what seems one of the few unambiguously defined parts of Haskell'98;-)

Claus