[Haskell-cafe] Re: Can't find space leak in external sort with tournament trees (test case attached)

Denis Bueno dbueno at gmail.com
Wed Apr 22 09:38:36 EDT 2009


Hello again,

Here are several heap profiles for sorting 1 million entries (each
entry is 48 bytes).  Each was created by passing +RTS <flag> -L200 to
leak.hs, where <flag> is one of -hc, -hd, -hy, -hr; then compiled with
"ghc -O --make -prof -auto-all" under ghc 6.10.2, under OS X ppc.

* The cost centre profile (leak-hc.pdf) shows that
"get_a4hN/sort/performSort/main" is at fault for most of the space.
How do I map "get_a4hN" to a Data.Binary instance?
* I'm not sure how to interpret the retainer profiling in leak-hr.pdf
-- but it mentions my merging function (tourMerge) and tournamentRound
and deleteMinF from Data.TournamentTree.  (leak-hr.prof was produced
during retainer profiling.)

Any suggestions on how to debug this further?
                              Denis



On Tue, Apr 21, 2009 at 21:12, Denis Bueno <dbueno at gmail.com> wrote:
> Hello haskell-cafe,
>
> I'm running into what I think is a space leak.  I've modified
> external-sort-0.2 [0] to use tournament trees for merging, and
> although the block sorting works in constant space, merging seems to
> produce a space leak.
>
> The external sort works by lazily consuming an input list, chopping it
> up into fixed size blocks, sorting each block in memory and writing
> each sorted block out to disk.  At this point we have a big, on-disk
> array of k sorted blocks.  To produce the final, sorted output list,
> we lazily read back each block into a tournament tree (a heap of
> sorts), which should only compare the first element of block to figure
> out the tree root.  The merging algorithm in full is in `tourMerge'
> [1].  I've confirmed by profiling (with -hc) that the merge step
> gobbles up memory.
>
> I've attached a self-contained program (two haskell source files).  It
> requires binary, mersenne-random*, and ArrayRef from hackage.
>
>    $ tar xvzf leak.tar.gz
>    leak.hs
>    Data/TournamentTree.hs
>    $ ghc -O --make leak.hs
>    $ ./leak 5000000 # generate 5 million entries in ./entries.tmp,
> sort them, and output to ./leak.out
>    Writing 5000000 entries to './entries.tmp'
>    Done.  Henceforth, if you omit the number argument to leak it will
> sort './entries.tmp'.
>    Sorting 5000000 entries from './entries.tmp'
>    External block sort using 1000 blocks:
>      -- 5000 entries per block
>      -- 240000 bytes per block
>    1000--blocks sorted.
>    Merging blocks and writing sorted entries to './leak.out'
>
> Depending on how much RAM you have, you may need to tweak the argument
> to leak and/or numBlocks (at the top of leak.hs) -- a block needs to
> be able to fit in RAM.  If the argument to leak is too low, the sort
> will succeed and the leak won't be apparent.  Otherwise, the OS will
> kill the process because it allocated too much.
>
> Anyone have an idea how I can fix the leak?
>                              Denis
>
>
>
> *Only required to generate the initial input file to sort; it's much
> faster than System.Random.
>
> [0] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/external-sort-0.2
>
> [1]
> tourMerge is in leak.hs, but I've also inlined it here for
> convenience.  TT is the Data.TournamentTree module, included in the
> tarball I've attached.
>
> tourMerge :: (Ord a) => [[a]] -> [a]
> tourMerge []   = []
> tourMerge blks = let t = {-SCC "TMfromList" #-}
>                         TT.fromList blks
>                 in tM (TT.minElem t) (TT.deleteMinF t)
>  where
>  tM :: (Ord a) => [a] -> TT.Forest [a] -> [a]
>  tM l f
>     | null f = l
>     | otherwise =
>         let t      = TT.tournament f
>             next   = TT.minElem t
>             (x, xs) = {-# SCC "TTspan" #-} span (<= head next) l
>         in
>         x ++ (tM next
>                  (if null xs then TT.deleteMinF t
>                   else (TT.singleton xs : TT.deleteMinF t)))
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hc.pdf
Type: application/pdf
Size: 21228 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hc-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hd.pdf
Type: application/pdf
Size: 29585 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hd-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hy.pdf
Type: application/pdf
Size: 29747 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hy-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hr.pdf
Type: application/pdf
Size: 25290 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hr-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hr.prof
Type: application/octet-stream
Size: 38991 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hr-0001.obj


More information about the Haskell-Cafe mailing list