[Haskell-cafe] Sub-optimal [code]

Daniel Fischer daniel.is.fischer at googlemail.com
Tue Feb 15 21:35:01 CET 2011


On Tuesday 15 February 2011 20:15:54, Andrew Coppin wrote:
> >> I tried -O2 -fno-cse. No difference.
> >>
> >> I also tried -O2 -fno-full-laziness. BIG DIFFERENCE.
> >
> > See also the very old GHC ticket at
> > http://hackage.haskell.org/trac/ghc/ticket/917
>
> I don't know if that's the problem or not, but it might plausibly be.
>
> Here's the smallest version of the program that I could come up with
> [which still misbehaves]:
>
> module Main (main) where
>
> import System.IO
> import System.Random
>
> main = do
>    file_batch "01-Uniform" random_byte_uniform
>
> random_byte_uniform :: IO Int
> random_byte_uniform = randomRIO (0x00, 0xFF)
>
> random_file :: String -> Int -> IO Int -> IO ()
> random_file f n rnd = do
>    putStrLn $ "Save: " ++ f ++ " [" ++ show n ++ " bytes]"
>    h <- openFile f WriteMode
>    hSetBinaryMode h True
>    mapM_ (\ _ -> rnd >>= hPutChar h . toEnum) [1..n]
>    hClose h
>
> file_batch :: String -> IO Int -> IO ()
> file_batch f rnd =
>    mapM_
>      (\ k ->
>        mapM_
>          (\ n ->
>            random_file
>              (f ++ "-" ++ show k ++ "x-" ++ [n])
>              (10 * 1024 * 1024 * k)
>              rnd
>          )
>          "ABCD"
>      )
>      [1..4]
>
> If main calls random_file directly, the program seems to work OK, so the
> problem seems to be file_batch. Maybe.

Or, one could say, the problem is the export list :)
If you remove the export list, so that random_file is exported, the leak 
disappears (at least with 7.0.1, didn't test 6.12).
If nothing but main is exported, GHC can be much more aggressive with 
inlining, and it is.

The result is that the list

[1 .. 10*1024*1024*k]

from the penultimate line of random_file is shared between the four 
iterations of the inner loop in file_batch (for k = 1 .. 4). Oops.

If random_file is exported, it is too big to inline it (at least if you 
don't specifically ask for it), so you get no sharing (even better, GHC 
rewrites

mapM_ (\ _ -> rnd >>= hPutChar h . toEnum) [1..n]

to a nice loop, the list isn't constructed at all).

> I don't really know. I had a go
> at playing with -ddump-simpl, but that just generated a 8 KB file which
> is utterly incomprehensible. (Well, the -O0 variant is just about
> comprehensible. The -O2 variant isn't.

You have to look for interesting stuff (in this case the list [1 .. n]) and 
note its identifier (yes, coping with the identifiers in core is hard, 
especially when they are entirely compiler-generated and don't start with a 
source-code name), then see how it is used.

> But it appears that *everything*
> gets inlined into main...)

That's kind of the point of

module Main (main) where

Sometimes that's good, other times not.

>
> If anybody can figure out what's happening here, I'd be interested to
> know.

Due to the extensive inlining, GHC sees that some values are reused, so it 
decides to share those values instead of recomputing them.
Unfortunately, those values are long lists.
Making GHC look at smaller chunks of the code prevents that, as does 
turning off full-laziness (in both cases the let-binding of the list 
doesn't get floated out of random_file, that floating [more precisely, the 
resulting sharing] is what causes the leak).

Which makes me wonder: unwanted sharing of lists [1 .. n] or similar is a 
frequent cause of space leaks, so would it be possible to teach GHC to not 
share such lists (unless they're bound to a name to indicate sharing is 
wanted)?
In particular for enumerations [a .. b] of type [Int], [Integer] or 
similar, I'm pretty sure that the cost of recomputation is far outweighed 
by the memory consumption of sharing in almost all cases.




More information about the Haskell-Cafe mailing list