Question about inliner behaviour with a small function

Michael Walker mike at
Wed Jul 22 13:46:14 UTC 2015


I managed to shrink a bizarre memory issue down to this (probably minimal)

    module Main where
    f :: [Int] -> Int
    f xs = length is where
      is = [ i | (_, i) <- pairs ys ] :: [(Int,Int)]
      ys = zip [0..] xs :: [(Int, Int)]
      {-# INLINE pairs #-}
      pairs xs = zip xs $ tail xs
    main :: IO ()
    main = print $ f xs where
      xs = replicate 1000000 0

With the INLINE pragma, this allocates 264,049,584 bytes (compiled with `ghc -O2
-prof -fprof-auto inline.hs), without the pragma it allocates 336,049,512 bytes.

Dropping the `main` definition (and renaming the module to "Foo") and examining
the core, the key difference seems to be in how the list comprehension is

With the INLINE pragma:

    Rec {
    Foo.f_go [Occ=LoopBreaker]
      :: [(Int, Int)] -> [(Int, Int)] -> [(Int, Int)]
    [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <S,1*U><L,1*U>]
    Foo.f_go =
      \ (ds_a14a :: [(Int, Int)]) (_ys_a14b :: [(Int, Int)]) ->
        case ds_a14a of _ [Occ=Dead] {
          [] -> GHC.Types.[] @ (Int, Int);
          : ipv_a14g ipv1_a14h ->
            case _ys_a14b of _ [Occ=Dead] {
              [] -> GHC.Types.[] @ (Int, Int);
              : ipv2_a14n ipv3_a14o ->
                GHC.Types.: @ (Int, Int) ipv2_a14n (Foo.f_go ipv1_a14h ipv3_a14o)
    end Rec }

And without:

    Rec {
    Foo.f_go [Occ=LoopBreaker]
      :: [((Int, Int), (Int, Int))] -> [(Int, Int)]
    [GblId, Arity=1, Caf=NoCafRefs, Str=DmdType <S,1*U>]
    Foo.f_go =
      \ (ds_a13o :: [((Int, Int), (Int, Int))]) ->
        case ds_a13o of _ [Occ=Dead] {
          [] -> GHC.Types.[] @ (Int, Int);
          : y_a13t ys_a13u ->
            case y_a13t of _ [Occ=Dead] { (ds1_d12V, i_an2) ->
            GHC.Types.: @ (Int, Int) i_an2 (Foo.f_go ys_a13u)
    end Rec }

Full core is available at

In the no-INLINE case, the list comprehension is compiled much like I would
expect, it's walking down a zipped list of pairs and producing the result. The
INLINE case is rather different, it looks like the `zip` has been inlined and
deforestation has happened.

This explains the difference in memory usage, a whole intermediary list has been

I assume that GHC's analysis is determining `pairs` is too expensive to inline
early enough to allow the further optimisation without the pragma, but the
pragma forces it to happen earlier by marking it as really cheap. But `pairs` is
a really small definition, syntactically. Why does the analysis consider it
expensive? It is simply because it uses its argument multiple times in its body?

And if it's not some sort of cost analysis, what's really going on?

Thank for your time.

Michael Walker (

More information about the ghc-devs mailing list