[Haskell-cafe] Re: Can't find space leak in external sort with
tournament trees (test case attached)
Denis Bueno
dbueno at gmail.com
Wed Apr 22 09:38:36 EDT 2009
Hello again,
Here are several heap profiles for sorting 1 million entries (each
entry is 48 bytes). Each was created by passing +RTS <flag> -L200 to
leak.hs, where <flag> is one of -hc, -hd, -hy, -hr; then compiled with
"ghc -O --make -prof -auto-all" under ghc 6.10.2, under OS X ppc.
* The cost centre profile (leak-hc.pdf) shows that
"get_a4hN/sort/performSort/main" is at fault for most of the space.
How do I map "get_a4hN" to a Data.Binary instance?
* I'm not sure how to interpret the retainer profiling in leak-hr.pdf
-- but it mentions my merging function (tourMerge) and tournamentRound
and deleteMinF from Data.TournamentTree. (leak-hr.prof was produced
during retainer profiling.)
Any suggestions on how to debug this further?
Denis
On Tue, Apr 21, 2009 at 21:12, Denis Bueno <dbueno at gmail.com> wrote:
> Hello haskell-cafe,
>
> I'm running into what I think is a space leak. I've modified
> external-sort-0.2 [0] to use tournament trees for merging, and
> although the block sorting works in constant space, merging seems to
> produce a space leak.
>
> The external sort works by lazily consuming an input list, chopping it
> up into fixed size blocks, sorting each block in memory and writing
> each sorted block out to disk. At this point we have a big, on-disk
> array of k sorted blocks. To produce the final, sorted output list,
> we lazily read back each block into a tournament tree (a heap of
> sorts), which should only compare the first element of block to figure
> out the tree root. The merging algorithm in full is in `tourMerge'
> [1]. I've confirmed by profiling (with -hc) that the merge step
> gobbles up memory.
>
> I've attached a self-contained program (two haskell source files). It
> requires binary, mersenne-random*, and ArrayRef from hackage.
>
> $ tar xvzf leak.tar.gz
> leak.hs
> Data/TournamentTree.hs
> $ ghc -O --make leak.hs
> $ ./leak 5000000 # generate 5 million entries in ./entries.tmp,
> sort them, and output to ./leak.out
> Writing 5000000 entries to './entries.tmp'
> Done. Henceforth, if you omit the number argument to leak it will
> sort './entries.tmp'.
> Sorting 5000000 entries from './entries.tmp'
> External block sort using 1000 blocks:
> -- 5000 entries per block
> -- 240000 bytes per block
> 1000--blocks sorted.
> Merging blocks and writing sorted entries to './leak.out'
>
> Depending on how much RAM you have, you may need to tweak the argument
> to leak and/or numBlocks (at the top of leak.hs) -- a block needs to
> be able to fit in RAM. If the argument to leak is too low, the sort
> will succeed and the leak won't be apparent. Otherwise, the OS will
> kill the process because it allocated too much.
>
> Anyone have an idea how I can fix the leak?
> Denis
>
>
>
> *Only required to generate the initial input file to sort; it's much
> faster than System.Random.
>
> [0] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/external-sort-0.2
>
> [1]
> tourMerge is in leak.hs, but I've also inlined it here for
> convenience. TT is the Data.TournamentTree module, included in the
> tarball I've attached.
>
> tourMerge :: (Ord a) => [[a]] -> [a]
> tourMerge [] = []
> tourMerge blks = let t = {-SCC "TMfromList" #-}
> TT.fromList blks
> in tM (TT.minElem t) (TT.deleteMinF t)
> where
> tM :: (Ord a) => [a] -> TT.Forest [a] -> [a]
> tM l f
> | null f = l
> | otherwise =
> let t = TT.tournament f
> next = TT.minElem t
> (x, xs) = {-# SCC "TTspan" #-} span (<= head next) l
> in
> x ++ (tM next
> (if null xs then TT.deleteMinF t
> else (TT.singleton xs : TT.deleteMinF t)))
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hc.pdf
Type: application/pdf
Size: 21228 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hc-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hd.pdf
Type: application/pdf
Size: 29585 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hd-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hy.pdf
Type: application/pdf
Size: 29747 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hy-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hr.pdf
Type: application/pdf
Size: 25290 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hr-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak-hr.prof
Type: application/octet-stream
Size: 38991 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090422/7b0dde90/leak-hr-0001.obj
More information about the Haskell-Cafe
mailing list