memory slop (was: Using the GHC heap profiler)

Johan Tibell johan.tibell at gmail.com
Tue Mar 22 13:38:38 CET 2011


On Wed, Mar 23, 2011 at 9:32 AM, Tim Docker <twd2 at dockerz.net> wrote:
>  Productivity  78.2% of total user, 76.5% of total elapsed

As a rule of thumb GC time should be less than 10%.

> This seems strange. The maximum residency of 12MB sounds about correct for
> my data. But what's with the 59MB of "slop"? According to the ghc docs:
>
> | The "bytes maximum slop" tells you the most space that is ever wasted
> | due to the way GHC allocates memory in blocks. Slop is memory at the
> | end of a block that was wasted. There's no way to control this; we
> | just like to see how much memory is being lost this way.

GHC requests memory from the OS in large blocks. This makes GC more
efficient. The program might not end up using all the allocated memory
in the end.

> type MyMap = Map.Map BS.ByteString BS.ByteString

Try using HashMap from the unordered-collections package. It's
typically 2-3x faster than Map for ByteString/Text keys.

> foldLines :: (a -> String -> a) -> a -> Handle -> IO a
> foldLines f a h = do
>    eof <- hIsEOF h
>    if eof
>      then (return a)
>      else do
>         l <- hGetLine h
>         let a' = f a l
>         a' `seq` foldLines f a' h

Your foldLines is not strict enough. Consider what happens if you call

    foldLines someF undefined someHandle

when the file is empty. If foldLines was strict in the accumulator
you'd expect the program to crash (from evaluating undefined), but it
doesn't as 'return a' doesn't force 'a'.

In addition, you'd like GHC to inline foldLines so the indirect
function call to 'f' can be turned to a call to a known function.
Here's a better definition:

foldLines :: (a -> String -> a) -> a -> Handle -> IO a
foldLines f a0 !h = go a0
  where
    go !a = do
      eof <- hIsEOF h
      if eof
        then (return a)
        else do
          l <- hGetLine h
          go (f a l)
{-# INLINE foldLines #-}

Also, as others have mentioned, String is no good. Use ByteString and
Text. Both come with functions to read lines (if I recall correctly).

Johan



More information about the Glasgow-haskell-users mailing list