Is evacuate for StgMutArrPtrs and StgArrPtrs expensive to GC?

Tue Oct 1 20:57:15 UTC 2013

wonderful!

also great meeting you at ICFP

On Tue, Oct 1, 2013 at 4:56 PM, Andrew Farmer <afarmer at ittc.ku.edu> wrote:

> Definitely... I'm somewhat fully occupied for the next two weeks, but
> should be able to dig it out then and organize/share it.
>  On Oct 1, 2013 3:50 PM, "Carter Schonwald" <carter.schonwald at gmail.com>
> wrote:
>
>> awesome!
>>
>> please let us know when some of the info is available publicly, perhaps
>> so other folks can help out wiht experimentation
>>
>>
>> On Tue, Oct 1, 2013 at 4:30 PM, Andrew Farmer <afarmer at ittc.ku.edu>wrote:
>>
>>> I did indeed implement dynamic nursery sizing and did some preliminary
>>> benchmarking. The headline figure: 15% speedup on the nofib/gc benchmarks,
>>> though the variance was pretty large, and there were some slowdowns.
>>>
>>> My scheme was very simple... I kept track of the size and rough
>>> collection time of the previous three collections and did a sort of crude
>>> binary search to find a minimum in the search space. I did it this way
>>> because it was simple and required constant time and memory to make a
>>> decision. Though one of the conclusions was that collection time was a bad
>>> metric, due to the way the RTS re-uses blocks. As Simon pointed out,
>>> tracking retainment or some other metric would probably be better, but I
>>> need to explore it. Another result: the default size is almost always too
>>> small (at least for the nofib programs). CPUs come with huge caches, and
>>> using the RTS flag -A to set the allocation area to be roughly the size of
>>> the L3 cache usually gave pretty decent speedups.
>>>
>>> I did this for a class project, and had to put it down to focus on other
>>> things, and just haven't picked it back up. I still have a patch laying
>>> around, and several pages of notes with ideas for improvement in both the
>>> metric and search. I'm hoping to pick it back up again in a couple months,
>>> with an eye on a workshop paper, and a real patch for 7.10.
>>>
>>>
>>> On Tue, Oct 1, 2013 at 3:36 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>>>
>>>> It's typical for benchmarks that allocate a large data structure to
>>>> spend a lot of time in the GC.  The data gets copied twice - once in the
>>>> young generation and then again when promoted to the old generation.  You
>>>> can make this kind of benchmark much faster by just using a bigger
>>>> allocation area.
>>>>
>>>> There's nothing inherently costly about StgMutArrPtrs compared to other
>>>> objects, except that they are variable size and therefore we can't unroll
>>>> the copy loop, but I don't think that's a big effect.  The actual copying
>>>> is the major cost.
>>>>
>>>> The way to improve this kind of benchmark would be to add some
>>>> heuristics for varying the nursery size based on the quantity of data
>>>> retained, for example.  I think there's a lot of room for improvement here,
>>>> but someone needs to do some careful benchmarking and experimentation.
>>>> Andrew Farmer did some work on this and allegedly got good results but we
>>>> never saw the code (hint hint!).
>>>>
>>>> Cheers,
>>>> Simon
>>>>
>>>>
>>>> On 1 October 2013 06:43, Johan Tibell <johan.tibell at gmail.com> wrote:
>>>>
>>>>> The code for 'allocate' in rts/sm/Storage.c doesn't seem that
>>>>> expensive. An extra branch compared to inline allocation and
>>>>> allocation is done in the next nursery block (risking fragmentation?).
>>>>>
>>>>> -- Johan
>>>>>
>>>>> On Mon, Sep 30, 2013 at 9:50 PM, Johan Tibell <johan.tibell at gmail.com>
>>>>> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > When I benchmark Data.HashMap.insert from unordered-containers
>>>>> > (inserting the keys [0..10000]) the runtime is dominated by GC:
>>>>> >
>>>>> > $ cat Test.hs
>>>>> > module Main where
>>>>> >
>>>>> > import           Control.DeepSeq
>>>>> > import           Control.Exception
>>>>> > import           Control.Monad
>>>>> > import qualified Data.HashMap.Strict as HM
>>>>> > import           Data.List (foldl')
>>>>> >
>>>>> > main = do
>>>>> >     let ks = [0..10000] :: [Int]
>>>>> >     evaluate (rnf ks)
>>>>> >     forM_ ([0..1000] :: [Int]) $ \ x -> do
>>>>> >         evaluate $ HM.null $ foldl' (\ m k -> HM.insert k x m)
>>>>> HM.empty ks
>>>>> >
>>>>> > $ perf record -g ./Test +RTS -s
>>>>> >    6,187,678,112 bytes allocated in the heap
>>>>> >    3,309,887,128 bytes copied during GC
>>>>> >        1,299,200 bytes maximum residency (1002 sample(s))
>>>>> >          118,816 bytes maximum slop
>>>>> >                5 MB total memory in use (0 MB lost due to
>>>>> fragmentation)
>>>>> >
>>>>> >                                     Tot time (elapsed)  Avg pause
>>>>>  Max pause
>>>>> >   Gen  0     11089 colls,     0 par    1.31s    1.30s     0.0001s
>>>>>  0.0005s
>>>>> >   Gen  1      1002 colls,     0 par    0.49s    0.51s     0.0005s
>>>>>  0.0022s
>>>>> >
>>>>> >   INIT    time    0.00s  (  0.00s elapsed)
>>>>> >   MUT     time    1.02s  (  1.03s elapsed)
>>>>> >   GC      time    1.80s  (  1.80s elapsed)
>>>>> >   EXIT    time    0.00s  (  0.00s elapsed)
>>>>> >   Total   time    2.82s  (  2.84s elapsed)
>>>>> >
>>>>> >   %GC     time      63.7%  (63.5% elapsed)
>>>>> >
>>>>> >   Alloc rate    6,042,264,963 bytes per MUT second
>>>>> >
>>>>> >   Productivity  36.3% of total user, 36.1% of total elapsed
>>>>> >
>>>>> > $ perf report
>>>>> > 41.46%  Test  Test               [.] evacuate
>>>>> > 15.47%  Test  Test               [.] scavenge_block
>>>>> > 11.04%  Test  Test               [.] s3cN_info
>>>>> >  8.74%  Test  Test               [.] s3aZ_info
>>>>> >  3.59%  Test  Test               [.] 0x7ff5
>>>>> >  2.83%  Test  Test               [.] scavenge_mut_arr_ptrs
>>>>> >  2.69%  Test  libc-2.15.so       [.] 0x147fd9
>>>>> >  2.51%  Test  Test               [.] allocate
>>>>> >  2.00%  Test  Test               [.] s3oo_info
>>>>> >  0.91%  Test  Test               [.] todo_block_full
>>>>> >  0.87%  Test  Test               [.] hs_popcnt64
>>>>> >  0.80%  Test  Test               [.] s3en_info
>>>>> >  0.62%  Test  Test               [.] s3el_info
>>>>> >
>>>>> > Is GC:ing StgMutArrPtrs and StgArrPtrs, which I create a lot of, more
>>>>> > expensive than GC:ing normal heap objects (i.e. for standard data
>>>>> > types)?
>>>>> >
>>>>> > -- Johan
>>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> ghc-devs mailing list
>>> ghc-devs at haskell.org
>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20131001/fa0b8085/attachment-0001.html>