memory slop (was: Using the GHC heap profiler)

Tim Docker twd2 at dockerz.net
Wed Mar 23 03:32:16 CET 2011


On Mon, Mar 21, 2011 at 9:59 AM, I wrote:
>
> My question on the ghc heap profiler on stack overflow:
>
> http://stackoverflow.com/questions/5306717/how-should-i-interpret-the-output-of-the-ghc-heap-profiler
>
> remains unanswered :-( Perhaps that's not the best forum. Is there someone
> here prepared to explain how the memory usage in the heap profiler relates
> to the  "Live Bytes" count shown in the garbage collection statistics?

I've made a little progress on this. I've simplified my program down to 
a simple executable that loads a bunch of data into an in-memory map, 
and then writes it out again. I've added calls to `seq` to ensure that 
laziness is not causing excessive memory consumption. When I run this on 
my sample data set, it takes ~7 CPU seconds and uses ~120 MB of VM. An 
equivalent Python script takes ~2 secs and ~19 MB of VM :-(.

The code is below. I'm mostly concerned with the memory usage rather 
than the performance at this stage. What is interesting is that when I 
turn on garbage collection statistics (+RTS -s), I see this:

   10,089,324,996 bytes allocated in the heap
      201,018,116 bytes copied during GC
       12,153,592 bytes maximum residency (8 sample(s))
       59,325,408 bytes maximum slop
              114 MB total memory in use (1 MB lost due to fragmentation)

   Generation 0: 19226 collections,     0 parallel,  1.59s,  1.64s elapsed
   Generation 1:     8 collections,     0 parallel,  0.04s,  0.04s elapsed

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time    5.84s  (  5.96s elapsed)
   GC    time    1.63s  (  1.68s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time    7.47s  (  7.64s elapsed)

   %GC time      21.8%  (22.0% elapsed)

   Alloc rate    1,726,702,840 bytes per MUT second

   Productivity  78.2% of total user, 76.5% of total elapsed

This seems strange. The maximum residency of 12 MB sounds about correct 
for my data. But what's with the 59 MB of "slop"? According to the GHC docs:

| The "bytes maximum slop" tells you the most space that is ever wasted
| due to the way GHC allocates memory in blocks. Slop is memory at the
| end of a block that was wasted. There's no way to control this; we
| just like to see how much memory is being lost this way.

There's this page also:

http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Slop

but it doesn't really make things clearer for me.

Is the slop number above likely to be a significant contribution to net 
memory usage? Are there any obvious reasons why the code below could be 
generating so much of it? The data file in question has 61k lines and is 
< 6 MB in total.
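
For reference, each line of the data file is a show-n (String,String) 
pair, parsed back with `read`; a typical line looks something like this 
(key and value invented for illustration):

    ("some key","some value")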

Thanks,

Tim

-------- Map2.hs --------------------------------------------

module Main where

import qualified Data.Map as Map
import qualified Data.ByteString.Char8 as BS
import System.Environment
import System.IO

type MyMap = Map.Map BS.ByteString BS.ByteString

-- Strictly fold a function over the lines of a handle, forcing the
-- accumulator at each step.
foldLines :: (a -> String -> a) -> a -> Handle -> IO a
foldLines f a h = do
     eof <- hIsEOF h
     if eof
       then return a
       else do
          l <- hGetLine h
          let a' = f a l
          a' `seq` foldLines f a' h

-- Read a dumped file back into a Map, forcing each key and value
-- before inserting so that no parsing thunks are retained.
undumpFile :: FilePath -> IO MyMap
undumpFile path = do
     h <- openFile path ReadMode
     m <- foldLines addv Map.empty h
     hClose h
     return m
   where
     addv m "" = m
     addv m s = let (k,v) = readKV s
                in k `seq` v `seq` Map.insert k v m

     -- Each non-empty line is a show-n (String,String) pair.
     readKV s = let (ks,vs) = read s in (BS.pack ks, BS.pack vs)

-- Write the map entries back out, one show-n pair per line.
dump :: [(BS.ByteString,BS.ByteString)] -> IO ()
dump vs = mapM_ putV vs
   where
     putV (k,v) = putStrLn (show (BS.unpack k, BS.unpack v))

main :: IO ()
main = do
     -- Expects a single argument: the path of the dump file.
     args <- getArgs
     case args of
       [path] -> do
           v <- undumpFile path
           dump (Map.toList v)
           return ()
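
As an aside, in case the intermediate Strings produced by hGetLine are 
part of the problem, here is an untested sketch of a foldLines variant 
that reads each line directly as a ByteString via BS.hGetLine (readKV 
would need adjusting to parse a ByteString, so this is only the idea, 
not a drop-in replacement):

-- Untested sketch: fold directly over ByteString lines, avoiding the
-- intermediate String allocated by hGetLine.
foldLinesBS :: (a -> BS.ByteString -> a) -> a -> Handle -> IO a
foldLinesBS f a h = do
     eof <- hIsEOF h
     if eof
       then return a
       else do
          l <- BS.hGetLine h
          let a' = f a l
          a' `seq` foldLinesBS f a' h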
