[Haskell-beginners] pattern match vs. pure function

Daniel Fischer daniel.is.fischer at web.de
Tue Oct 19 20:23:16 EDT 2010


On Wednesday 20 October 2010 01:23:49, Bastian Erdnüß wrote:
> Suppose I would want to write a function rmap that acts like
>
>   rmap [f,g,h] x == [f x, g x, h x]
>
> Do I gain any advantage or disadvantage (beside readability) from using
>
>   rmap = flip $ map . flip id
>
> over
>
>   rmap []     _ = []
>   rmap (f:fs) x = f x : rmap fs x
>
> ?
>
> I could imagine Haskell can optimize the first version better since it
> refers to the built in map function.  But beside that, does Haskell
> struggle with the combinatory stuff in the first version?  Or does it
> get optimized away?

Well, you can ask GHC what it does with the definitions.
Let's compile

module RMap where

rmap :: [a -> b] -> a -> [b]
rmap = flip $ map . flip id

recmap :: [a -> b] -> a -> [b]
recmap [] _     = []
recmap (f:fs) x = f x : recmap fs x

and look at the generated Core (obtained via -ddump-simpl).
The @ x_yz things are type applications (the types are passed explicitly as 
arguments in Core); otherwise Core is pretty close to Haskell. It takes a bit 
to get used to reading Core, but it's not that difficult (as long as the 
functions are short).
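
(To reproduce this at home, something along these lines should do; the exact 
flags may vary a little between GHC versions:

    ghc -c -ddump-simpl RMap.hs
    ghc -c -O2 -ddump-simpl RMap.hs
)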

First, without optimisations:

==================== Tidy Core ====================
Rec {
RMap.recmap :: forall a_adg b_adh.
               [a_adg -> b_adh] -> a_adg -> [b_adh]
GblId
[Arity 2
 NoCafRefs]
RMap.recmap =
  \ (@ a_adA)
    (@ b_adB)
    (ds_dee :: [a_adA -> b_adB])
    (ds1_def :: a_adA) ->
    case ds_dee of _ {
      [] -> GHC.Types.[] @ b_adB;
      : f_adk fs_adl ->
        GHC.Types.:
          @ b_adB
          (f_adk ds1_def)
          (RMap.recmap @ a_adA @ b_adB fs_adl ds1_def)
    }
end Rec }

RMap.rmap :: forall a_adi b_adj.
             [a_adi -> b_adj] -> a_adi -> [b_adj]
GblId
[]
RMap.rmap =
  \ (@ a_adE) (@ b_adF) ->
    GHC.Base.$
      @ (a_adE -> [a_adE -> b_adF] -> [b_adF])
      @ ([a_adE -> b_adF] -> a_adE -> [b_adF])
      (GHC.Base.flip @ a_adE @ [a_adE -> b_adF] @ [b_adF])
      (GHC.Base..
         @ ((a_adE -> b_adF) -> b_adF)
         @ ([a_adE -> b_adF] -> [b_adF])
         @ a_adE
         (GHC.Base.map @ (a_adE -> b_adF) @ b_adF)
         (GHC.Base.flip
            @ (a_adE -> b_adF)
            @ a_adE
            @ b_adF
            (GHC.Base.id @ (a_adE -> b_adF))))

Both are pretty much exactly the code as written in Haskell.
The explicit recursion recmap is small and efficient: it checks whether there 
are any functions left in the list and, if so, applies the first of them to 
the value and recurses. No cruft here.
The other one, flip $ map . flip id, is also small, but it's horrible to look 
at (all those @s). Every time you invoke that function, you call flip twice, 
($), map, (.) and id. If you use the function on short lists, those calls 
generate significant overhead; for long lists it doesn't matter.
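
You can see where all those calls go by expanding the definitions by hand:

  rmap fs x
    = flip (map . flip id) fs x   -- ($) just applies its arguments
    = (map . flip id) x fs        -- flip swaps the two arguments
    = map (flip id x) fs          -- definition of (.)
    = map (\f -> id f x) fs       -- definition of flip
    = map (\f -> f x) fs          -- id f = f

(That's only equational reasoning on paper, of course; without optimisations 
GHC doesn't do this unfolding for you, it really makes all those calls at 
runtime.)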

Now with optimisations. It doesn't matter whether you use -O1 or -O2; both 
produce exactly the same Core for both functions.
And for recmap, they produce exactly the same Core as -O0 did.
So recmap is already pretty good code; GHC doesn't see anything to improve.
rmap on the other hand changed:


RMap.rmap :: forall a_adi b_adj.
             [a_adi -> b_adj] -> a_adi -> [b_adj]
GblId
[Arity 2
 NoCafRefs
 Str: DmdType SL]
RMap.rmap =
  \ (@ a_adE)
    (@ b_adF)
    (x_aeq :: [a_adE -> b_adF])
    (y_aer :: a_adE) ->
    GHC.Base.map
      @ (a_adE -> b_adF)
      @ b_adF
      (\ (y1_XeE :: a_adE -> b_adF) -> y1_XeE y_aer)
      x_aeq


There, that looks much better than before (and for short lists it also 
performs much better; for long lists the difference should be negligible).
flip, ($) and (.) have been inlined and eliminated; what we get is

rmap fs x = map (\f -> f x) fs

which is really nice. In fact, it's even nicer than what we got for recmap, 
because the compiler knows map: there are rewrite rules for it which can 
produce much better code where the function is used. For example, it's 
possible that the list of functions is eliminated entirely and a use site is 
rewritten into a direct loop instead of allocating a list cell for each 
function in the list. And it can profit from the rule

map f . map g = map (f . g)

which is much harder to detect for the direct recursion.
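
Spelled out as a user-level RULES pragma, such a rule would look roughly like 
the sketch below (GHC itself gets the same effect through its foldr/build 
fusion rules in GHC.Base rather than a literal map/map rule, but the idea is 
the same):

{-# RULES
"map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}

With a rule like that in scope, something like map show (rmap fs x) can 
collapse into a single pass over the list of functions, with no intermediate 
list.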

> And how "expensive" are pattern matches on the other side, anyway?

Depends on the pattern, of course. Matching "oh" against [] is much cheaper 
than matching "This is an expensive pattern match" against

'T':'h':'i':'s':' ':'i':'s':' ':'a':'n':' ':'e':'x':'p':'e':'n':'s':'i':'v':'e':
' ':'p':'a':'t':'t':'e':'r':'n':' ':'m':'a':'t':'c':'h':_
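
The reason the long pattern costs more is simply that it desugars into one 
case expression (a constructor check plus a character comparison) per listed 
character, roughly like this hand-written sketch for a three-character prefix 
(startsThi is a hypothetical helper, just to show the shape):

startsThi :: String -> Bool
startsThi s =
  case s of
    'T' : rest1 ->                 -- is the list non-empty and headed by 'T'?
      case rest1 of
        'h' : rest2 ->             -- next character 'h'?
          case rest2 of
            'i' : _ -> True        -- next character 'i'? then the match succeeds
            _       -> False
        _ -> False
    _ -> False

So the cost grows with the length of the pattern (up to the first mismatch), 
not with the length of the string as a whole.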

But generally, pattern matching is quite cheap. The (well, one) advantage of 
higher-order functions like map is that they're easier to optimise at their 
use sites, given that somebody has taught the compiler how.

>
> Thanks for reading,
> Bastian
