[Haskell-cafe] Understanding GC time
Chaddaï Fouché
chaddai.fouche at gmail.com
Sat Mar 10 17:00:17 CET 2012
On Sat, Mar 10, 2012 at 4:21 PM, Thiago Negri <evohunz at gmail.com> wrote:
> c:\tmp\hs>par +RTS -s -N1
> par +RTS -s -N1
> 20000000
> 803,186,152 bytes allocated in the heap
> 859,916,960 bytes copied during GC
> 233,465,740 bytes maximum residency (10 sample(s))
> 30,065,860 bytes maximum slop
> 483 MB total memory in use (0 MB lost due to fragmentation)
>
> Generation 0: 1523 collections, 0 parallel, 0.80s, 0.75s elapsed
> Generation 1: 10 collections, 0 parallel, 0.83s, 0.99s elapsed
>
> Parallel GC work balance: nan (0 / 0, ideal 1)
> c:\tmp\hs>par +RTS -s -N2
> par +RTS -s -N2
> 20000000
> 1,606,279,644 bytes allocated in the heap
> 74,924 bytes copied during GC
> 28,340 bytes maximum residency (1 sample(s))
> 29,004 bytes maximum slop
> 2 MB total memory in use (0 MB lost due to fragmentation)
>
> Generation 0: 1566 collections, 1565 parallel, 0.00s, 0.01s elapsed
> Generation 1: 1 collections, 1 parallel, 0.00s, 0.00s elapsed
>
> Parallel GC work balance: 1.78 (15495 / 8703, ideal 2)
An important part of what happened is explained by these two lines:
-N1
> 483 MB total memory in use (0 MB lost due to fragmentation)
-N2
> 2 MB total memory in use (0 MB lost due to fragmentation)
In the first run (-N1), the whole list had to be present in memory:
there were two traversals, and the head of the list was retained
during the first one so that the second could work on the same list.
That is what the 233,465,740 bytes of maximum residency reflect. In
the run where both traversals were done in parallel (-N2), the list
was produced and consumed in constant memory, since both folds could
progress simultaneously and the cells they had both passed became
garbage. With almost no live data left to copy (74,924 bytes copied
during GC, against 859,916,960), each collection was much cheaper,
which must explain in part why the collections were so much faster
(there was still 0.01s elapsed for the generation 0 collections).
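
To make the difference concrete, here is a minimal sketch of the two
shapes of program I have in mind. It is not your actual code (which is
not quoted in this message): the function names and the choice of the
two folds (a sum and a length) are my own assumptions, picked only to
show how sharing one list between two traversals affects residency.
I import par and pseq from GHC.Conc so that the sketch needs nothing
beyond base; Control.Parallel from the parallel package exports the
same two functions.

```haskell
import Data.List (foldl')
import GHC.Conc (par, pseq)

-- Hypothetical example: two folds over one shared list, run one after
-- the other. While the first fold runs, 'xs' is still needed by the
-- second one, so every cell the first fold has already passed stays
-- reachable and the whole list ends up resident in memory.
sequentialTwoFolds :: Int -> (Int, Int)
sequentialTwoFolds k =
  let xs = [1 .. k]
      a  = foldl' (+) 0 xs   -- first traversal
      b  = length xs         -- second traversal, same list
  in a `pseq` (b `pseq` (a, b))

-- Hypothetical example: the same two folds, but the first one is
-- sparked with 'par' while the current thread evaluates the second.
-- If the spark really runs on another capability (-N2), both folds
-- walk the list at about the same pace and the cells behind them can
-- be collected, so residency can stay small.
parallelTwoFolds :: Int -> (Int, Int)
parallelTwoFolds k =
  let xs = [1 .. k]
      a  = foldl' (+) 0 xs
      b  = length xs
  in a `par` (b `pseq` (a, b))

main :: IO ()
main = do
  print (sequentialTwoFolds 1000000)
  print (parallelTwoFolds 1000000)
```

Both functions return the same pair; only the memory behaviour should
differ. To compare them, build with something like
"ghc -O2 -threaded -rtsopts" and run with "+RTS -s -N1" and then
"+RTS -s -N2", as you did. Keep in mind that the residency you observe
depends on the optimisation level and on whether the spark actually
gets converted, so treat this as an illustration rather than a
guaranteed reproduction of your numbers.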
--
Jedaï