Confused about PAP object layout
Simon Marlow
marlowsd at gmail.com
Thu Feb 27 08:37:31 UTC 2020
On Wed, 26 Feb 2020 at 18:48, Ömer Sinan Ağacan <omeragacan at gmail.com>
wrote:
> So the key points from this thread are:
>
> - PAP payloads are scavenged using the function's bitmap. Because a PAPs
> payload
> will have less number of closures than the function's arity the bitmap
> will
> always have enough bits.
>
> - A bit in a function bitmap is NOT for liveness (e.g. does not indicate
> whether
> an argument used or not), but for pointers vs. non-pointers. Function
> bitmaps
> are called "liveness bits" in the code generator which is misleading.
>
I think of all bitmaps as representing "liveness" (or equivalently
"pointerhood") for the purposes of GC. There's no difference from the GC's
perspective between a non-pointer and a pointer that it doesn't need to
follow.
In fact there's nothing to prevent us using the function bitmap to indicate
dead arguments too - it would require zero changes in the RTS, the compiler
would only need to mark unused pointer arguments as non-pointers in the
bitmap. Probably wouldn't be worth very much overall, but I do recall one
space leak that would have been cured by this.
- In a function bitmap (small or large), 0 means pointer, 1 means
> non-pointer.
>
This is true of bitmaps generally I think, not just function bitmaps.
This is really what confused me in my last email above. For some reason I
> intuitively expected 1 to mean pointer, not 0. Simon M also got this
> wrong
>
Oops :)
I think there may originally have been a good reason to have it this way
around: before eval/apply, we used bitmaps to describe stack frames, but we
didn't need to encode a size in the bitmap because the default was for the
stack contents to be pointers unless there was something to tell us
otherwise. So a zero suffix of a bitmap just meant "the rest is just normal
stack". This changed with eval/apply, but we kept the convention that zero
meant pointer in a bitmap.
> ("So a 0 in the bitmap always means non-pointer.") so maybe this is
> confusing
> to others too.
>
> - For functions with known argument patterns we don't use the function's
> bitmap.
> These function's type are greater than ARG_BCO (2), and for those we use
> the
> stg_arg_bitmaps array to get the bitmap.
>
> For example, the bitmap for ARG_PPP (function with 3 pointer arguments)
> is at
> index 23 in this array, which is 0b11. For ARG_PNN it's 0b110000011. The
> least
> significant 6 bits are for the size (3), the remaining 0b110 means the
> first
> argument is a pointer, rest of the two are non-pointers.
>
Actually I think documentation on this is missing in the wiki, I guess I
never got around to updating it when we implemented eval/apply. This page
should really describe function info tables:
https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage/heap-objects#info-tables
If you want to add documentation that would be a good place.
Cheers
Simon
I still don't understand why this assertion
>
> ASSERT(BITMAP_SIZE(bitmap) >= size);
>
> I added to scavenge_small_bitmap in !2727 is failing though.
>
> Ömer
>
> Simon Peyton Jones <simonpj at microsoft.com>, 24 Şub 2020 Pzt, 13:45
> tarihinde şunu yazdı:
> >
> > I’m not following this in detail, but do please make sure that the
> results of this discussion end up in a suitable Note. Obviously it’s not
> transparently clear as-is, and I can see clarity emerging
> >
> >
> >
> > Thanks!
> >
> >
> > Simon
> >
> >
> >
> > From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Simon Marlow
> > Sent: 24 February 2020 08:22
> > To: Ömer Sinan Ağacan <omeragacan at gmail.com>
> > Cc: ghc-devs <ghc-devs at haskell.org>
> > Subject: Re: Confused about PAP object layout
> >
> >
> >
> > On Thu, 20 Feb 2020 at 09:21, Ömer Sinan Ağacan <omeragacan at gmail.com>
> wrote:
> >
> > > I'm not sure what you mean by "garbage". The bitmap merely determines
> whether
> > > a field is a pointer,
> >
> > I think the bitmap is for liveness, not for whether a field is pointer
> or not.
> > Relevant code for building an info table for a function:
> >
> > mk_pieces (Fun arity (ArgGen arg_bits)) srt_label
> > = do { (liveness_lit, liveness_data) <- mkLivenessBits dflags
> arg_bits
> > ; let fun_type | null liveness_data = aRG_GEN
> > | otherwise = aRG_GEN_BIG
> > extra_bits = [ packIntsCLit dflags fun_type arity ]
> > ++ (if inlineSRT dflags then [] else [
> srt_lit ])
> > ++ [ liveness_lit, slow_entry ]
> > ; return (Nothing, Nothing, extra_bits, liveness_data) }
> >
> > This uses the word "liveness" rather than "pointers".
> >
> > However I just realized that the word "garbage" is still not the best
> way to
> > describe what I'm trying to say. In the example
> >
> > [pap_info, x, y, z]
> >
> > If the function's bitmap is [1, 0, 1], then `y` may be a dead (an unused
> > argument, or "garbage" as I describe in my previous email) OR it may be a
> > non-pointer, but used (i.e. not a garbage).
> >
> >
> >
> > I don't think we ever put a zero in the bitmap for a
> pointer-but-not-used argument. We don't do liveness analysis for function
> arguments, as far as I'm aware. So a 0 in the bitmap always means
> "non-pointer".
> >
> >
> >
> > The only reaosn the code uses the terminology "liveness" here is that
> it's sharing code with the code that handles bitmaps for stack frames,
> which do deal with liveness.
> >
> >
> >
> > So maybe "liveness" is also not the best way to describe this bitmap, as
> 0 does
> > not mean dead but rather "don't follow in GC".
> >
> >
> > On my quest to understand and document this code better I have one more
> > question. When generating info tables for functions with know argument
> patterns
> > (ArgSpec) we initialize the bitmap as 0. Relevant code:
> >
> > mk_pieces (Fun arity (ArgSpec fun_type)) srt_label
> > = do { let extra_bits = packIntsCLit dflags fun_type arity :
> srt_label
> > ; return (Nothing, Nothing, extra_bits, []) }
> >
> > Here the last return value is for the liveness data. I don't understand
> how can
> > this be correct, because when we use this function in a PAP this will
> cause NOT
> > scavenging the PAP payload. Relevant code (simplified):
> >
> > STATIC_INLINE GNUC_ATTR_HOT StgPtr
> > scavenge_PAP_payload (StgClosure *fun, StgClosure **payload, StgWord
> size)
> > {
> > const StgFunInfoTable *fun_info =
> > get_fun_itbl(UNTAG_CONST_CLOSURE(fun));
> >
> > StgPtr p = (StgPtr)payload;
> >
> > StgWord bitmap;
> > switch (fun_info->f.fun_type) {
> > ...
> >
> > default:
> > bitmap = BITMAP_BITS(stg_arg_bitmaps[fun_info->f.fun_type]);
> > small_bitmap:
> > p = scavenge_small_bitmap(p, size, bitmap);
> > break;
> > }
> > return p;
> > }
> >
> >
> > Here if I have a function with three pointer args (ARG_PPP) the shown
> branch
> > that will be taken, but because the bitmap is 0 (as shown in the
> mk_pieces code
> > above) nothing in the PAPs payload will be scavenged.
> >
> >
> >
> > It gets the bitmap from stg_arg_bitmaps[fun_info->f.fun_type], not from
> the info table. Hope this helps.
> >
> >
> >
> > Cheers
> >
> > Simon
> >
> >
> >
> >
> >
> >
> > Here's an example from a debugging session:
> >
> > >>> print pap
> > $10 = (StgPAP *) 0x42001fe030
> >
> > >>> print *pap
> > $11 = {
> > header = {
> > info = 0x7fbdd1f06640 <stg_PAP_info>
> > },
> > arity = 2,
> > n_args = 1,
> > fun = 0x7fbdd2d23ffb,
> > payload = 0x42001fe048
> > }
> >
> > So this PAP is applied one argument, which is a boxed object (a FUN_2_0):
> >
> > >>> print *get_itbl(UNTAG_CLOSURE(pap->payload[0]))
> > $20 = {
> > layout = {
> > payload = {
> > ptrs = 2,
> > nptrs = 0
> > },
> > bitmap = 2,
> > large_bitmap_offset = 2,
> > __pad_large_bitmap_offset = 2,
> > selector_offset = 2
> > },
> > type = 11,
> > srt = 1914488,
> > code = 0x7fbdd2b509c0 "H\215E\370L9\370r[I\203\304 M;\245X\003"
> > }
> >
> > However if I look at the function of this PAP:
> >
> > >>> print *get_fun_itbl(UNTAG_CLOSURE(pap->fun))
> > $21 = {
> > f = {
> > slow_apply_offset = 16,
> > __pad_slow_apply_offset = 3135120895,
> > b = {
> > bitmap = 74900193017889,
> > bitmap_offset = 258342945,
> > __pad_bitmap_offset = 258342945
> > },
> > fun_type = 23,
> > arity = 3
> > },
> > i = {
> > layout = {
> > payload = {
> > ptrs = 0,
> > nptrs = 0
> > },
> > bitmap = 0,
> > large_bitmap_offset = 0,
> > __pad_large_bitmap_offset = 0,
> > selector_offset = 0
> > },
> > type = 14,
> > srt = 1916288,
> > code = 0x7fbdd2b50260 <base_GHCziRead_list3_info>
> > "I\203\304(M;\245X\003"
> > }
> > }
> >
> > It has arity 3. Since the first argument is a boxed object and this
> function has
> > arity 3, if the argument is actually live in the function (i.e. not an
> unused
> > argument), then the bitmap should have a 1 for this. But because the
> argument
> > pattern is known (ARG_PPP) we initialized the bitmap as 0! Not sure how
> this
> > can work.
> >
> > What am I missing?
> >
> > Thanks,
> >
> > Ömer
> >
> > Ben Gamari <ben at smart-cactus.org>, 14 Şub 2020 Cum, 20:25 tarihinde
> şunu yazdı:
> > >
> > > Ömer Sinan Ağacan <omeragacan at gmail.com> writes:
> > >
> > > > I think that makes sense, with the invariant that n_args <=
> bitmap_size. We
> > > > evacuate the arguments used by the function but not others. Thanks.
> > > >
> > > > It's somewhat weird to see an object with useful stuff, then
> garbage, then
> > > > useful stuff again in the heap, but that's not an issue by itself.
> For example
> > > > if I have something like
> > > >
> > > > [pap_info, x, y, z]
> > > >
> > > > and according to the function `y` is dead, then after evacuating I
> get
> > > >
> > > > [pap_info, x, <garbage>, z]
> > > >
> > > > This "garbage" is evacuated again and again every time we evacuate
> this PAP.
> > > >
> > > I'm not sure what you mean by "garbage". The bitmap merely determines
> > > whether a field is a pointer, not whether it is copied during
> > > evacuation. A field's bitmap bit not being set merely means that we
> won't
> > > evacuate the value of that field during scavenging.
> > >
> > > Nevertheless, this all deserves a comment in scavenge_PAP.
> > >
> > > Cheers,
> > >
> > > - Ben
> > >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20200227/e2d2bb73/attachment.html>
More information about the ghc-devs
mailing list