Is evacuate for StgMutArrPtrs and StgArrPtrs expensive to GC?

Tue Oct 1 08:36:42 UTC 2013

It's typical for benchmarks that allocate a large data structure to spend a
lot of time in the GC.  The data gets copied twice - once in the young
generation and then again when promoted to the old generation.  You can
make this kind of benchmark much faster by just using a bigger allocation
area.

There's nothing inherently costly about StgMutArrPtrs compared to other
objects, except that they are variable size and therefore we can't unroll
the copy loop, but I don't think that's a big effect.  The actual copying
is the major cost.

The way to improve this kind of benchmark would be to add some heuristics
for varying the nursery size based on the quantity of data retained, for
example.  I think there's a lot of room for improvement here, but someone
needs to do some careful benchmarking and experimentation. Andrew Farmer
did some work on this and allegedly got good results but we never saw the
code (hint hint!).

Cheers,
Simon

On 1 October 2013 06:43, Johan Tibell <johan.tibell at gmail.com> wrote:

> The code for 'allocate' in rts/sm/Storage.c doesn't seem that
> expensive. An extra branch compared to inline allocation and
> allocation is done in the next nursery block (risking fragmentation?).
>
> -- Johan
>
> On Mon, Sep 30, 2013 at 9:50 PM, Johan Tibell <johan.tibell at gmail.com>
> wrote:
> > Hi,
> >
> > When I benchmark Data.HashMap.insert from unordered-containers
> > (inserting the keys [0..10000]) the runtime is dominated by GC:
> >
> > $ cat Test.hs
> > module Main where
> >
> > import           Control.DeepSeq
> > import           Control.Exception
> > import           Control.Monad
> > import qualified Data.HashMap.Strict as HM
> > import           Data.List (foldl')
> >
> > main = do
> >     let ks = [0..10000] :: [Int]
> >     evaluate (rnf ks)
> >     forM_ ([0..1000] :: [Int]) $ \ x -> do
> >         evaluate $ HM.null $ foldl' (\ m k -> HM.insert k x m) HM.empty
> ks
> >
> > $ perf record -g ./Test +RTS -s
> >    6,187,678,112 bytes allocated in the heap
> >    3,309,887,128 bytes copied during GC
> >        1,299,200 bytes maximum residency (1002 sample(s))
> >          118,816 bytes maximum slop
> >                5 MB total memory in use (0 MB lost due to fragmentation)
> >
> >                                     Tot time (elapsed)  Avg pause  Max
> pause
> >   Gen  0     11089 colls,     0 par    1.31s    1.30s     0.0001s
>  0.0005s
> >   Gen  1      1002 colls,     0 par    0.49s    0.51s     0.0005s
>  0.0022s
> >
> >   INIT    time    0.00s  (  0.00s elapsed)
> >   MUT     time    1.02s  (  1.03s elapsed)
> >   GC      time    1.80s  (  1.80s elapsed)
> >   EXIT    time    0.00s  (  0.00s elapsed)
> >   Total   time    2.82s  (  2.84s elapsed)
> >
> >   %GC     time      63.7%  (63.5% elapsed)
> >
> >   Alloc rate    6,042,264,963 bytes per MUT second
> >
> >   Productivity  36.3% of total user, 36.1% of total elapsed
> >
> > $ perf report
> > 41.46%  Test  Test               [.] evacuate
> > 15.47%  Test  Test               [.] scavenge_block
> > 11.04%  Test  Test               [.] s3cN_info
> >  8.74%  Test  Test               [.] s3aZ_info
> >  3.59%  Test  Test               [.] 0x7ff5
> >  2.83%  Test  Test               [.] scavenge_mut_arr_ptrs
> >  2.69%  Test  libc-2.15.so       [.] 0x147fd9
> >  2.51%  Test  Test               [.] allocate
> >  2.00%  Test  Test               [.] s3oo_info
> >  0.91%  Test  Test               [.] todo_block_full
> >  0.87%  Test  Test               [.] hs_popcnt64
> >  0.80%  Test  Test               [.] s3en_info
> >  0.62%  Test  Test               [.] s3el_info
> >
> > Is GC:ing StgMutArrPtrs and StgArrPtrs, which I create a lot of, more
> > expensive than GC:ing normal heap objects (i.e. for standard data
> > types)?
> >
> > -- Johan
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20131001/650157fc/attachment.html>