[Haskell-cafe] >>= definition for list monad in ghc

austin seipp as at hacks.yi.org
Mon May 16 20:49:35 CEST 2011


Looking at the Core for an utterly trivial example (test x = concatMap
k x where k i = [i..i*2]), the foldr definition seems to cause a
little extra optimization rules to fire, but the result seems pretty
big. The definition using concatMap results in core like this:

main_go2 =
  \ (ds_aqV :: [Int]) ->
    case ds_aqV of _ {
      [] -> [] @ Int;
      : y_ar0 ys_ar1 ->
        case y_ar0 of _ { I# x_arj ->
        let {
          y1_ase [Dmd=Just L] :: Int#

          y1_ase = *# x_arj 2 } in
        let {
          n_sRv :: [Int]

          n_sRv = main_go2 ys_ar1 } in
        case ># x_arj y1_ase of _ {
          False ->
            letrec {
              go_sRx [Occ=LoopBreaker] :: Int# -> [Int]

              go_sRx =
                \ (x1_asi :: Int#) ->
                  :
                    @ Int
                    (I# x1_asi)
                    (case ==# x1_asi y1_ase of _ {
                       False -> go_sRx (+# x1_asi 1);
                       True -> n_sRv
                     }); } in
            go_sRx x_arj;
          True -> n_sRv
        }
        }
    }

But with the foldr definition, we get:

Main.main_go [Occ=LoopBreaker] :: GHC.Prim.Int# -> [GHC.Types.Int]
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType L]
Main.main_go =
  \ (x_asu :: GHC.Prim.Int#) ->
    GHC.Base.++
      @ GHC.Types.Int
      (GHC.Enum.eftInt x_asu (GHC.Prim.*# x_asu 2))
      (case x_asu of wild_B1 {
         __DEFAULT -> Main.main_go (GHC.Prim.+# wild_B1 1);
         10 -> GHC.Types.[] @ GHC.Types.Int
       })
end Rec }

There seems to be a binding for my 'test' example, but it seems to be
abandoned in the final core for some reason (there are no references
too it, but it's not eliminated as an unused binding?) Main simply
calls the simplified/inlined version above.

As you can see, with the foldr definition, GHC is able to fuse the
iteration of the input list with the generation of the result - note
the 'GHC.Base.++' call with the first argument being a list from
[x..x*2], and the second list to append being a recursive call. So it
simplifies the code quite a bit - it doesn't really end up traversing
a list, but instead generating a list only in this case, as it stores
current iteration in a Int# and has the upper limit inlined into the
case statement.

I don't know why GHC doesn't have this rule by default, though. We can
at least rig it with a RULES pragma, however:

$ cat concatmap.hs
module Main where

{-# RULES
"concatMap/foldr" forall x k. concatMap k x = foldr ((++) . k) [] x
  #-}

test :: [Int] -> [Int]
--test x = foldr ((++) . k) [] x
test x = concatMap k x
  where k i = [i..i*2]

main :: IO ()
main = do
  print $ test [1..10]

$ ghc -fforce-recomp -O2 -ddump-simpl concatmap.hs

                                                   1 ↵
[1 of 1] Compiling Main             ( concatmap.hs, concatmap.o )

==================== Tidy Core ====================
Rec {
Main.main_go [Occ=LoopBreaker] :: GHC.Prim.Int# -> [GHC.Types.Int]
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType L]
Main.main_go =
  \ (x_ato :: GHC.Prim.Int#) ->
    GHC.Base.++
      @ GHC.Types.Int
      (GHC.Enum.eftInt x_ato (GHC.Prim.*# x_ato 2))
      (case x_ato of wild_B1 {
         __DEFAULT -> Main.main_go (GHC.Prim.+# wild_B1 1);
         10 -> GHC.Types.[] @ GHC.Types.Int
       })
end Rec }
...
...
...
------ Local rules for imported ids --------
"concatMap/foldr" [ALWAYS]
    forall {@ b_aq7 @ a_aq8 x_abH :: [a_aq8] k_abI :: a_aq8 -> [b_aq7]}
      GHC.List.concatMap @ a_aq8 @ b_aq7 k_abI x_abH
      = GHC.Base.foldr
          @ a_aq8
          @ [b_aq7]
          (GHC.Base..
             @ [b_aq7]
             @ ([b_aq7] -> [b_aq7])
             @ a_aq8
             (GHC.Base.++ @ b_aq7)
             k_abI)
          (GHC.Types.[] @ b_aq7)
          x_abH


Linking concatmap ...
$

Maybe it should be added to the base libraries?

On Mon, May 16, 2011 at 1:03 PM, Andrew Coppin
<andrewcoppin at btinternet.com> wrote:
> On 16/05/2011 10:07 AM, Michael Vanier wrote:
>>
>> Usually in monad tutorials, the >>= operator for the list monad is
>> defined as:
>>
>> m >>= k = concat (map k m) -- or concatMap k m
>>
>> but in the GHC sources it's defined as:
>>
>> m >>= k = foldr ((++) . k) [] m
>>
>> As far as I can tell, this definition is equivalent to the previous one
>> (correct me if I'm wrong), so I was wondering why this definition was
>> chosen instead of the other one. Does anybody know?
>
> Any time you see a more convoluted definition which ought to be equivilent
> to a simpler one, the answer is usually "because this way makes some
> important compiler optimisation fire". It's even possible that the
> optimisation in question would fire anyway now, but way back when the code
> was written, the compiler wasn't as smart.
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>



-- 
Regards,
Austin



More information about the Haskell-Cafe mailing list