[Haskell-cafe] Re: Can't find space leak in external sort with tournament trees (test case attached)

Denis Bueno dbueno at gmail.com
Thu Apr 23 11:21:56 EDT 2009


Hi again,

The problem here turned out to be a too-lazy Data.Binary instance for
Entry.  Using seq to force each number before wrapping it in an Entry
made those retainers go away.

I have yet another leaking problem, but I need to figure out which
question to ask of the mailing list first.
                              Denis



On Wed, Apr 22, 2009 at 07:38, Denis Bueno <dbueno at gmail.com> wrote:
> Hello again,
>
> Here are several heap profiles for sorting 1 million entries (each
> entry is 48 bytes).  Each was created by passing +RTS <flag> -L200 to
> leak.hs, where <flag> is one of -hc, -hd, -hy, -hr; then compiled with
> "ghc -O --make -prof -auto-all" under ghc 6.10.2, under OS X ppc.
>
> * The cost centre profile (leak-hc.pdf) shows that
> "get_a4hN/sort/performSort/main" is at fault for most of the space.
> How do I map "get_a4hN" to a Data.Binary instance?
> * I'm not sure how to interpret the retainer profiling in leak-hr.pdf
> -- but it mentions my merging function (tourMerge) and tournamentRound
> and deleteMinF from Data.TournamentTree.  (leak-hr.prof was produced
> during retainer profiling.)
>
> Any suggestions on how to debug this further?
>                              Denis
>
>
>
> On Tue, Apr 21, 2009 at 21:12, Denis Bueno <dbueno at gmail.com> wrote:
>> Hello haskell-cafe,
>>
>> I'm running into what I think is a space leak.  I've modified
>> external-sort-0.2 [0] to use tournament trees for merging, and
>> although the block sorting works in constant space, merging seems to
>> produce a space leak.
>>
>> The external sort works by lazily consuming an input list, chopping it
>> up into fixed size blocks, sorting each block in memory and writing
>> each sorted block out to disk.  At this point we have a big, on-disk
>> array of k sorted blocks.  To produce the final, sorted output list,
>> we lazily read back each block into a tournament tree (a heap of
>> sorts), which should only compare the first element of block to figure
>> out the tree root.  The merging algorithm in full is in `tourMerge'
>> [1].  I've confirmed by profiling (with -hc) that the merge step
>> gobbles up memory.
>>
>> I've attached a self-contained program (two haskell source files).  It
>> requires binary, mersenne-random*, and ArrayRef from hackage.
>>
>>    $ tar xvzf leak.tar.gz
>>    leak.hs
>>    Data/TournamentTree.hs
>>    $ ghc -O --make leak.hs
>>    $ ./leak 5000000 # generate 5 million entries in ./entries.tmp,
>> sort them, and output to ./leak.out
>>    Writing 5000000 entries to './entries.tmp'
>>    Done.  Henceforth, if you omit the number argument to leak it will
>> sort './entries.tmp'.
>>    Sorting 5000000 entries from './entries.tmp'
>>    External block sort using 1000 blocks:
>>      -- 5000 entries per block
>>      -- 240000 bytes per block
>>    1000--blocks sorted.
>>    Merging blocks and writing sorted entries to './leak.out'
>>
>> Depending on how much RAM you have, you may need to tweak the argument
>> to leak and/or numBlocks (at the top of leak.hs) -- a block needs to
>> be able to fit in RAM.  If the argument to leak is too low, the sort
>> will succeed and the leak won't be apparent.  Otherwise, the OS will
>> kill the process because it allocated too much.
>>
>> Anyone have an idea how I can fix the leak?
>>                              Denis
>>
>>
>>
>> *Only required to generate the initial input file to sort; it's much
>> faster than System.Random.
>>
>> [0] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/external-sort-0.2
>>
>> [1]
>> tourMerge is in leak.hs, but I've also inlined it here for
>> convenience.  TT is the Data.TournamentTree module, included in the
>> tarball I've attached.
>>
>> tourMerge :: (Ord a) => [[a]] -> [a]
>> tourMerge []   = []
>> tourMerge blks = let t = {-SCC "TMfromList" #-}
>>                         TT.fromList blks
>>                 in tM (TT.minElem t) (TT.deleteMinF t)
>>  where
>>  tM :: (Ord a) => [a] -> TT.Forest [a] -> [a]
>>  tM l f
>>     | null f = l
>>     | otherwise =
>>         let t      = TT.tournament f
>>             next   = TT.minElem t
>>             (x, xs) = {-# SCC "TTspan" #-} span (<= head next) l
>>         in
>>         x ++ (tM next
>>                  (if null xs then TT.deleteMinF t
>>                   else (TT.singleton xs : TT.deleteMinF t)))
>>
>


More information about the Haskell-Cafe mailing list