[Haskell-cafe] Understanding GC time

Sat Mar 10 21:23:52 CET 2012

I'm afraid without uniqueness or linear typing it would be hard to avoid.

2012/3/10 Thiago Negri <evohunz at gmail.com>

> I see. Thanks for the answers.
>
> Any data structure or source annotation that would prevent that?
>
> For example, if I try the same program to run on a
> [1..9999999999999999] list, I'll get an out of memory error for the
> single-threaded version. Any way to prevent it without declaring two
> different versions of "list"?
>
>
> 2012/3/10 Anthony Cowley <acowley at gmail.com>:
> > From that profiling data, I think you're just seeing a decrease in
> sharing. With one thread, you create the list structure in memory: the
> first fold could consume it in-place, but the second fold is still waiting
> for its turn.  The list is built on the heap so the two folds can both
> refer to the same list.
> >
> > With two threads, GHC is being clever and inlining the definition you
> give for list, which is then optimized into two parallel loops. No list on
> the heap means there's not much for the GC to do.
> >
> > Sharing of index lists like this is a common source of problems. In
> particular, nested loops can make it even trickier to prevent sharing as
> there may not be an opportunity for parallel evaluation.
> >
> > Anthony
> >
> > On Mar 10, 2012, at 10:21 AM, Thiago Negri <evohunz at gmail.com> wrote:
> >
> >> Hi all.
> >>
> >> I wrote a very simple program to try out parallel Haskel and check how
> >> it would look like to make use of more than one core in this language.
> >>
> >> When I tried the program with RTS option -N1, total time shows it took
> >> 2.48 seconds to complete and around 65% of that time was taken by GC.
> >>
> >> Then I tried the same program with RTS options -N2 and total time
> >> decreased to 1.15 seconds as I expected a gain here. But what I didn't
> >> expect is the GC time to drop to 0%.
> >>
> >> I guess I'm having trouble to understand the output of the RTS option
> -s.
> >> Can you enlighten me?
> >>
> >>
> >> The source for the testing program:
> >>
> >>> module Main where
> >>>
> >>> import Data.List (foldl1')
> >>> import Control.Parallel (par, pseq)
> >>> import Control.Arrow ((&&&))
> >>>
> >>> f `parApp` (a, b) = a `par` (b `pseq` (f a b))
> >>> seqApp = uncurry
> >>>
> >>> main = print result
> >>>  where result = (+) `parApp` minMax list
> >>>        minMax = minlist &&& maxlist
> >>>        minlist = foldl1' min
> >>>        maxlist = foldl1' max
> >>>        list = [1..19999999]
> >>
> >>
> >> The results on a Windows 7 64bits with an Intel Core 2 Duo, compiled
> >> with GHC from Haskell Platform:
> >>
> >> c:\tmp\hs>par +RTS -s -N1
> >> par +RTS -s -N1
> >> 20000000
> >>     803,186,152 bytes allocated in the heap
> >>     859,916,960 bytes copied during GC
> >>     233,465,740 bytes maximum residency (10 sample(s))
> >>      30,065,860 bytes maximum slop
> >>             483 MB total memory in use (0 MB lost due to fragmentation)
> >>
> >>  Generation 0:  1523 collections,     0 parallel,  0.80s,  0.75s elapsed
> >>  Generation 1:    10 collections,     0 parallel,  0.83s,  0.99s elapsed
> >>
> >>  Parallel GC work balance: nan (0 / 0, ideal 1)
> >>
> >>                        MUT time (elapsed)       GC time  (elapsed)
> >>  Task  0 (worker) :    0.00s    (  0.90s)       0.00s    (  0.06s)
> >>  Task  1 (worker) :    0.00s    (  0.90s)       0.00s    (  0.00s)
> >>  Task  2 (bound)  :    0.86s    (  0.90s)       1.62s    (  1.69s)
> >>
> >>  SPARKS: 1 (0 converted, 0 pruned)
> >>
> >>  INIT  time    0.00s  (  0.00s elapsed)
> >>  MUT   time    0.86s  (  0.90s elapsed)
> >>  GC    time    1.62s  (  1.74s elapsed)
> >>  EXIT  time    0.00s  (  0.00s elapsed)
> >>  Total time    2.48s  (  2.65s elapsed)
> >>
> >>  %GC time      65.4%  (65.9% elapsed)
> >>
> >>  Alloc rate    936,110,032 bytes per MUT second
> >>
> >>  Productivity  34.6% of total user, 32.4% of total elapsed
> >>
> >> gc_alloc_block_sync: 0
> >> whitehole_spin: 0
> >> gen[0].sync_large_objects: 0
> >> gen[1].sync_large_objects: 0
> >>
> >>
> >> c:\tmp\hs>par +RTS -s -N2
> >> par +RTS -s -N2
> >> 20000000
> >>   1,606,279,644 bytes allocated in the heap
> >>          74,924 bytes copied during GC
> >>          28,340 bytes maximum residency (1 sample(s))
> >>          29,004 bytes maximum slop
> >>               2 MB total memory in use (0 MB lost due to fragmentation)
> >>
> >>  Generation 0:  1566 collections,  1565 parallel,  0.00s,  0.01s elapsed
> >>  Generation 1:     1 collections,     1 parallel,  0.00s,  0.00s elapsed
> >>
> >>  Parallel GC work balance: 1.78 (15495 / 8703, ideal 2)
> >>
> >>                        MUT time (elapsed)       GC time  (elapsed)
> >>  Task  0 (worker) :    0.00s    (  0.59s)       0.00s    (  0.00s)
> >>  Task  1 (worker) :    0.58s    (  0.59s)       0.00s    (  0.01s)
> >>  Task  2 (bound)  :    0.58s    (  0.59s)       0.00s    (  0.00s)
> >>  Task  3 (worker) :    0.00s    (  0.59s)       0.00s    (  0.00s)
> >>
> >>  SPARKS: 1 (1 converted, 0 pruned)
> >>
> >>  INIT  time    0.00s  (  0.00s elapsed)
> >>  MUT   time    1.15s  (  0.59s elapsed)
> >>  GC    time    0.00s  (  0.01s elapsed)
> >>  EXIT  time    0.00s  (  0.00s elapsed)
> >>  Total time    1.15s  (  0.61s elapsed)
> >>
> >>  %GC time       0.0%  (2.4% elapsed)
> >>
> >>  Alloc rate    1,391,432,695 bytes per MUT second
> >>
> >>  Productivity 100.0% of total user, 190.3% of total elapsed
> >>
> >> gc_alloc_block_sync: 90
> >> whitehole_spin: 0
> >> gen[0].sync_large_objects: 0
> >> gen[1].sync_large_objects: 0
> >>
> >> _______________________________________________
> >> Haskell-Cafe mailing list
> >> Haskell-Cafe at haskell.org
> >> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120310/6c0aa7aa/attachment.htm>