Confused about PAP object layout

Simon Marlow marlowsd at gmail.com
Thu Feb 27 08:37:31 UTC 2020


On Wed, 26 Feb 2020 at 18:48, Ömer Sinan Ağacan <omeragacan at gmail.com>
wrote:

> So the key points from this thread are:
>
> - PAP payloads are scavenged using the function's bitmap. Because a PAPs
> payload
>   will have less number of closures than the function's arity the bitmap
> will
>   always have enough bits.
>
> - A bit in a function bitmap is NOT for liveness (e.g. does not indicate
> whether
>   an argument used or not), but for pointers vs. non-pointers. Function
> bitmaps
>   are called "liveness bits" in the code generator which is misleading.
>

I think of all bitmaps as representing "liveness" (or equivalently
"pointerhood") for the purposes of GC. There's no difference from the GC's
perspective between a non-pointer and a pointer that it doesn't need to
follow.

In fact there's nothing to prevent us using the function bitmap to indicate
dead arguments too - it would require zero changes in the RTS, the compiler
would only need to mark unused pointer arguments as non-pointers in the
bitmap. Probably wouldn't be worth very much overall, but I do recall one
space leak that would have been cured by this.

- In a function bitmap (small or large), 0 means pointer, 1 means
> non-pointer.
>

This is true of bitmaps generally I think, not just function bitmaps.

  This is really what confused me in my last email above. For some reason I
>   intuitively expected 1 to mean pointer, not 0. Simon M also got this
> wrong
>

Oops :)

I think there may originally have been a good reason to have it this way
around: before eval/apply, we used bitmaps to describe stack frames, but we
didn't need to encode a size in the bitmap because the default was for the
stack contents to be pointers unless there was something to tell us
otherwise. So a zero suffix of a bitmap just meant "the rest is just normal
stack". This changed with eval/apply, but we kept the convention that zero
meant pointer in a bitmap.


>   ("So a 0 in the bitmap always means non-pointer.") so maybe this is
> confusing
>   to others too.
>
> - For functions with known argument patterns we don't use the function's
> bitmap.
>   These function's type are greater than ARG_BCO (2), and for those we use
> the
>   stg_arg_bitmaps array to get the bitmap.
>
>   For example, the bitmap for ARG_PPP (function with 3 pointer arguments)
> is at
>   index 23 in this array, which is 0b11. For ARG_PNN it's 0b110000011. The
> least
>   significant 6 bits are for the size (3), the remaining 0b110 means the
> first
>   argument is a pointer, rest of the two are non-pointers.
>

Actually I think documentation on this is missing in the wiki, I guess I
never got around to updating it when we implemented eval/apply. This page
should really describe function info tables:
https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage/heap-objects#info-tables

If you want to add documentation that would be a good place.

Cheers
Simon

I still don't understand why this assertion
>
>     ASSERT(BITMAP_SIZE(bitmap) >= size);
>
> I added to scavenge_small_bitmap in !2727 is failing though.
>
> Ömer
>
> Simon Peyton Jones <simonpj at microsoft.com>, 24 Şub 2020 Pzt, 13:45
> tarihinde şunu yazdı:
> >
> > I’m not following this in detail, but do please make sure that the
> results of this discussion end up in a suitable Note.  Obviously it’s not
> transparently clear as-is, and I can see clarity emerging
> >
> >
> >
> > Thanks!
> >
> >
> > Simon
> >
> >
> >
> > From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Simon Marlow
> > Sent: 24 February 2020 08:22
> > To: Ömer Sinan Ağacan <omeragacan at gmail.com>
> > Cc: ghc-devs <ghc-devs at haskell.org>
> > Subject: Re: Confused about PAP object layout
> >
> >
> >
> > On Thu, 20 Feb 2020 at 09:21, Ömer Sinan Ağacan <omeragacan at gmail.com>
> wrote:
> >
> > > I'm not sure what you mean by "garbage". The bitmap merely determines
> whether
> > > a field is a pointer,
> >
> > I think the bitmap is for liveness, not for whether a field is pointer
> or not.
> > Relevant code for building an info table for a function:
> >
> >     mk_pieces (Fun arity (ArgGen arg_bits)) srt_label
> >       = do { (liveness_lit, liveness_data) <- mkLivenessBits dflags
> arg_bits
> >            ; let fun_type | null liveness_data = aRG_GEN
> >                           | otherwise          = aRG_GEN_BIG
> >                  extra_bits = [ packIntsCLit dflags fun_type arity ]
> >                            ++ (if inlineSRT dflags then [] else [
> srt_lit ])
> >                            ++ [ liveness_lit, slow_entry ]
> >            ; return (Nothing, Nothing, extra_bits, liveness_data) }
> >
> > This uses the word "liveness" rather than "pointers".
> >
> > However I just realized that the word "garbage" is still not the best
> way to
> > describe what I'm trying to say. In the example
> >
> >     [pap_info, x, y, z]
> >
> > If the function's bitmap is [1, 0, 1], then `y` may be a dead (an unused
> > argument, or "garbage" as I describe in my previous email) OR it may be a
> > non-pointer, but used (i.e. not a garbage).
> >
> >
> >
> > I don't think we ever put a zero in the bitmap for a
> pointer-but-not-used argument. We don't do liveness analysis for function
> arguments, as far as I'm aware. So a 0 in the bitmap always means
> "non-pointer".
> >
> >
> >
> > The only reaosn the code uses the terminology "liveness" here is that
> it's sharing code with the code that handles bitmaps for stack frames,
> which do deal with liveness.
> >
> >
> >
> > So maybe "liveness" is also not the best way to describe this bitmap, as
> 0 does
> > not mean dead but rather "don't follow in GC".
> >
> >
> > On my quest to understand and document this code better I have one more
> > question. When generating info tables for functions with know argument
> patterns
> > (ArgSpec) we initialize the bitmap as 0. Relevant code:
> >
> >     mk_pieces (Fun arity (ArgSpec fun_type)) srt_label
> >       = do { let extra_bits = packIntsCLit dflags fun_type arity :
> srt_label
> >            ; return (Nothing, Nothing,  extra_bits, []) }
> >
> > Here the last return value is for the liveness data. I don't understand
> how can
> > this be correct, because when we use this function in a PAP this will
> cause NOT
> > scavenging the PAP payload. Relevant code (simplified):
> >
> >     STATIC_INLINE GNUC_ATTR_HOT StgPtr
> >     scavenge_PAP_payload (StgClosure *fun, StgClosure **payload, StgWord
> size)
> >     {
> >         const StgFunInfoTable *fun_info =
> >             get_fun_itbl(UNTAG_CONST_CLOSURE(fun));
> >
> >         StgPtr p = (StgPtr)payload;
> >
> >         StgWord bitmap;
> >         switch (fun_info->f.fun_type) {
> >         ...
> >
> >         default:
> >             bitmap = BITMAP_BITS(stg_arg_bitmaps[fun_info->f.fun_type]);
> >         small_bitmap:
> >             p = scavenge_small_bitmap(p, size, bitmap);
> >             break;
> >         }
> >         return p;
> >     }
> >
> >
> > Here if I have a function with three pointer args (ARG_PPP) the shown
> branch
> > that will be taken, but because the bitmap is 0 (as shown in the
> mk_pieces code
> > above) nothing in the PAPs payload will be scavenged.
> >
> >
> >
> > It gets the bitmap from stg_arg_bitmaps[fun_info->f.fun_type], not from
> the info table.  Hope this helps.
> >
> >
> >
> > Cheers
> >
> > Simon
> >
> >
> >
> >
> >
> >
> > Here's an example from a debugging session:
> >
> >     >>> print pap
> >     $10 = (StgPAP *) 0x42001fe030
> >
> >     >>> print *pap
> >     $11 = {
> >       header = {
> >         info = 0x7fbdd1f06640 <stg_PAP_info>
> >       },
> >       arity = 2,
> >       n_args = 1,
> >       fun = 0x7fbdd2d23ffb,
> >       payload = 0x42001fe048
> >     }
> >
> > So this PAP is applied one argument, which is a boxed object (a FUN_2_0):
> >
> >     >>> print *get_itbl(UNTAG_CLOSURE(pap->payload[0]))
> >     $20 = {
> >       layout = {
> >         payload = {
> >           ptrs = 2,
> >           nptrs = 0
> >         },
> >         bitmap = 2,
> >         large_bitmap_offset = 2,
> >         __pad_large_bitmap_offset = 2,
> >         selector_offset = 2
> >       },
> >       type = 11,
> >       srt = 1914488,
> >       code = 0x7fbdd2b509c0 "H\215E\370L9\370r[I\203\304 M;\245X\003"
> >     }
> >
> > However if I look at the function of this PAP:
> >
> >     >>> print *get_fun_itbl(UNTAG_CLOSURE(pap->fun))
> >     $21 = {
> >       f = {
> >         slow_apply_offset = 16,
> >         __pad_slow_apply_offset = 3135120895,
> >         b = {
> >           bitmap = 74900193017889,
> >           bitmap_offset = 258342945,
> >           __pad_bitmap_offset = 258342945
> >         },
> >         fun_type = 23,
> >         arity = 3
> >       },
> >       i = {
> >         layout = {
> >           payload = {
> >             ptrs = 0,
> >             nptrs = 0
> >           },
> >           bitmap = 0,
> >           large_bitmap_offset = 0,
> >           __pad_large_bitmap_offset = 0,
> >           selector_offset = 0
> >         },
> >         type = 14,
> >         srt = 1916288,
> >         code = 0x7fbdd2b50260 <base_GHCziRead_list3_info>
> > "I\203\304(M;\245X\003"
> >       }
> >     }
> >
> > It has arity 3. Since the first argument is a boxed object and this
> function has
> > arity 3, if the argument is actually live in the function (i.e. not an
> unused
> > argument), then the bitmap should have a 1 for this. But because the
> argument
> > pattern is known (ARG_PPP) we initialized the bitmap as 0! Not sure how
> this
> > can work.
> >
> > What am I missing?
> >
> > Thanks,
> >
> > Ömer
> >
> > Ben Gamari <ben at smart-cactus.org>, 14 Şub 2020 Cum, 20:25 tarihinde
> şunu yazdı:
> > >
> > > Ömer Sinan Ağacan <omeragacan at gmail.com> writes:
> > >
> > > > I think that makes sense, with the invariant that n_args <=
> bitmap_size. We
> > > > evacuate the arguments used by the function but not others. Thanks.
> > > >
> > > > It's somewhat weird to see an object with useful stuff, then
> garbage, then
> > > > useful stuff again in the heap, but that's not an issue by itself.
> For example
> > > > if I have something like
> > > >
> > > >     [pap_info, x, y, z]
> > > >
> > > > and according to the function `y` is dead, then after evacuating I
> get
> > > >
> > > >     [pap_info, x, <garbage>, z]
> > > >
> > > > This "garbage" is evacuated again and again every time we evacuate
> this PAP.
> > > >
> > > I'm not sure what you mean by "garbage". The bitmap merely determines
> > > whether a field is a pointer, not whether it is copied during
> > > evacuation. A field's bitmap bit not being set merely means that we
> won't
> > > evacuate the value of that field during scavenging.
> > >
> > > Nevertheless, this all deserves a comment in scavenge_PAP.
> > >
> > > Cheers,
> > >
> > > - Ben
> > >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20200227/e2d2bb73/attachment.html>


More information about the ghc-devs mailing list