Question about inliner behaviour with a small function
Simon Peyton Jones
simonpj at microsoft.com
Wed Jul 29 14:51:12 UTC 2015
Hmm. With HEAD, and without profiling, the program allocates the same 104M, both with and without the INLINE. The same deforestation happens in both cases.
It's quite possible that profiling interferes with deforestation.
Simon
c:/code/HEAD/inplace/bin/ghc-stage1 Michael.hs -O -o Michael-no-inline
./Michael-no-inline.exe +RTS -s
999999
104,045,052 bytes allocated in the heap
171,752 bytes copied during GC
41,756 bytes maximum residency (2 sample(s))
36,800 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Michael
| Walker
| Sent: 22 July 2015 14:46
| To: ghc-devs at haskell.org
| Subject: Question about inliner behaviour with a small function
|
| Hello,
|
| I managed to shrink a bizarre memory issue down to this (probably
| minimal)
| example:
|
| module Main where
|
| f :: [Int] -> Int
| f xs = length is where
| is = [ i | (_, i) <- pairs ys ] :: [(Int,Int)]
|
| ys = zip [0..] xs :: [(Int, Int)]
|
| {-# INLINE pairs #-}
| pairs xs = zip xs $ tail xs
|
| main :: IO ()
| main = print $ f xs where
| xs = replicate 1000000 0
|
| With the INLINE pragma, this allocates 264,049,584 bytes (compiled with
| `ghc -O2
| -prof -fprof-auto inline.hs), without the pragma it allocates 336,049,512
| bytes.
|
| Dropping the `main` definition (and renaming the module to "Foo") and
| examining
| the core, the key difference seems to be in how the list comprehension is
| compiled.
|
| With the INLINE pragma:
|
| Rec {
| Foo.f_go [Occ=LoopBreaker]
| :: [(Int, Int)] -> [(Int, Int)] -> [(Int, Int)]
| [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <S,1*U><L,1*U>]
| Foo.f_go =
| \ (ds_a14a :: [(Int, Int)]) (_ys_a14b :: [(Int, Int)]) ->
| case ds_a14a of _ [Occ=Dead] {
| [] -> GHC.Types.[] @ (Int, Int);
| : ipv_a14g ipv1_a14h ->
| case _ys_a14b of _ [Occ=Dead] {
| [] -> GHC.Types.[] @ (Int, Int);
| : ipv2_a14n ipv3_a14o ->
| GHC.Types.: @ (Int, Int) ipv2_a14n (Foo.f_go ipv1_a14h
| ipv3_a14o)
| }
| }
| end Rec }
|
| And without:
|
| Rec {
| Foo.f_go [Occ=LoopBreaker]
| :: [((Int, Int), (Int, Int))] -> [(Int, Int)]
| [GblId, Arity=1, Caf=NoCafRefs, Str=DmdType <S,1*U>]
| Foo.f_go =
| \ (ds_a13o :: [((Int, Int), (Int, Int))]) ->
| case ds_a13o of _ [Occ=Dead] {
| [] -> GHC.Types.[] @ (Int, Int);
| : y_a13t ys_a13u ->
| case y_a13t of _ [Occ=Dead] { (ds1_d12V, i_an2) ->
| GHC.Types.: @ (Int, Int) i_an2 (Foo.f_go ys_a13u)
| }
| }
| end Rec }
|
| Full core is available at
| https://gist.github.com/barrucadu/a59df62cd16074559e35
|
| In the no-INLINE case, the list comprehension is compiled much like I
| would
| expect, it's walking down a zipped list of pairs and producing the
| result. The
| INLINE case is rather different, it looks like the `zip` has been inlined
| and
| deforestation has happened.
|
| This explains the difference in memory usage, a whole intermediary list
| has been
| skipped!
|
| I assume that GHC's analysis is determining `pairs` is too expensive to
| inline
| early enough to allow the further optimisation without the pragma, but
| the
| pragma forces it to happen earlier by marking it as really cheap. But
| `pairs` is
| a really small definition, syntactically. Why does the analysis consider
| it
| expensive? It is simply because it uses its argument multiple times in
| its body?
|
| And if it's not some sort of cost analysis, what's really going on?
|
| Thank for your time.
|
| --
| Michael Walker (http://www.barrucadu.co.uk)
| _______________________________________________
| ghc-devs mailing list
| ghc-devs at haskell.org
| http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
More information about the ghc-devs
mailing list