Nested constructed product results?

Tue Dec 15 08:20:11 UTC 2020

Hi Alexis,

that's a very interesting example you have there!

So far, what we referred to as Nested CPR concerned unboxing for returned
nested *records*, e.g., the `annotation` field in your example. That's what
I try to exploit in !1866
<https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1866>, which after a
rebase that I'll hopefully be doing this week, some more sleuthing and then
documenting what I did will finally make it into master.

CPR'ing the Lambda, i.e., what is returned for `parser`, on the other hand,
is a surprising new opportunity for what Nested CPR could do beyond
unboxing records! And it's pretty simple, too: Because it's a function, we
don't care about subtleties such as whether all callers actually evaluate
the pair that deep (actually, that's wrong, as I realise below). I think
it's entirely within the reach of !1866 today. So we could transform
(provided that `(,) <$> a <*> b` inlines `<$>` and `<*>` and then will
actually have the CPR property)

    AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser
      { annotation = Seq ann1 ann2
      , parser = \s1 ->
          let !(a, s2) = f s1
              !(b, s3) = g s2
          in ((,) <$> a <*> b, s3)
      }

to

    $w<+> :: Annotation
          -> (String -> (Maybe a, String))
          -> Annotation
          -> (String -> (Maybe b, String))
          -> (# Annotation, String -> (# Maybe (a, b), String #) #)
    $w<+> ann1 f ann2 g =
      (# Seq ann1 ann2
       , \s1 -> case (\s1 -> let !(a, s2) = f s1
                    !(b, s3) = g s2
                in ((,) <$> a <*> b) s1 of (p, q) -> (#p, q#), s3) #)

    <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
    <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =
      case $w<+> ann1 f ann2 g of
        (# a, b #) -> AnnotatedParser (\s1 -> case a s1 of (# p, q#) -> (p,
q)) b
    {-# INLINE <+> #-}

Actually writing out the transformation tells me that this isn't always a
win: We now have to allocate a lambda in the wrapper. That is only a win if
that lambda cancels away at call sites! So we have to make sure that all
call sites of the wrapper actually call the `parser`, so that the lambda
simplifies away. If it doesn't, we have a situation akin to reboxing. So I
was wrong above when I said "we don't care about subtleties such as whether
all callers actually evaluate the pair that deep": We very much need to
know whether all call sites call the lambda. Luckily, I implemented just
that <https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4493> for
exploitation by Nested CPR! That's the reason why I need to rebase !1866
now. I'll ḱeep you posted.

---

You might wonder why CPR today doesn't care for lambdas. Well, they only
make sense in nested scenarios (otherwise the function wasn't eta-expanded
that far, for good reasons) and CPR currently doesn't bother unboxing
records nestedly, which is what #18174
<https://gitlab.haskell.org/ghc/ghc/-/issues/18174> discusses and what
!1866 tries to fix.

Cheers,
Sebastian

Am Di., 15. Dez. 2020 um 06:52 Uhr schrieb Alexis King <
lexi.lambda at gmail.com>:

> Hi all,
>
> I spent some time today looking into the performance of a program
> involving a parser type that looks something like this:
>
>     data AnnotatedParser a = AnnotatedParser
>       { annotation :: Annotation
>       , parser :: String -> (Maybe a, String)
>       }
>
> The `Annotation` records metadata about the structure of an
> `AnnotatedParser` that can be accessed statically (that is, without
> having to run the parser on some input). `AnnotatedParser`s are built
> from various primitive constructors and composed using various
> combinators. These combinators end up looking something like this:
>
>     (<+>) :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a,
> b)
>     AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser
>       { annotation = Seq ann1 ann2
>       , parser = \s1 ->
>           let !(a, s2) = f s1
>               !(b, s3) = g s2
>           in ((,) <$> a <*> b, s3)
>       }
>
> Use of these combinators leads to the construction and subsequent case
> analysis of numerous `AnnotatedParser` closures. Happily, constructed
> product result[1] analysis kicks in and rewrites such combinators to cut
> down on the needless boxing, leading to worker/wrapper splits like this:
>
>     $w<+> :: Annotation
>           -> (String -> (Maybe a, String))
>           -> Annotation
>           -> (String -> (Maybe b, String))
>           -> (# Annotation, String -> (Maybe (a, b), String) #)
>     $w<+> ann1 f ann2 g =
>       (# Seq ann1 ann2
>        , \s1 -> let !(a, s2) = f s1
>                     !(b, s3) = g s2
>                 in ((,) <$> a <*> b, s3) #)
>
>     <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
>     <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =
>       case $w<+> ann1 f ann2 g of
>         (# a, b #) -> AnnotatedParser a b
>     {-# INLINE <+> #-}
>
> This is great, and it cuts down on allocation significantly, but there
> is still something unsatisfying about it: the `parser` function inside
> the record is not affected by CPR! This is a shame, because
> essentially all use sites immediately deconstruct the pair, making it
> a prime candidate for unboxing. Ideally, we’d like to get this, instead:
>
>     $w<+> :: Annotation
>           -> (String -> (Maybe a, String))
>           -> Annotation
>           -> (String -> (Maybe b, String))
>           -> (# Annotation, String -> (# Maybe (a, b), String #) #)
>     $w<+> ann1 f ann2 g =
>       (# Seq ann1 ann2
>        , \s1 -> let !(a, s2) = f s1
>                     !(b, s3) = g s2
>                 in (# (,) <$> a <*> b, s3 #) #)
>
> In practice, little combinators like `$w<+>` are marked INLINE, so `f`
> and `g` are usually known rather than unknown calls. This nested CPR
> transformation would allow the tuple construction to fuse with the
> tuple deconstruction, eliminating quite a lot of unnecessary
> boxing/unboxing.
>
> Unfortunately, it seems as though GHC’s implementation of CPR is
> entirely first-order: although function arguments are given rich
> demand signatures, results are only described one level deep. But as
> the above example hopefully illustrates, that’s leaving significant
> optimization opportunities on the table! Hence, my questions:
>
>     1. Has this notion of “nested CPR” been explored at all before?
>     2. Does such an extension to CPR sound worth its weight?
>
> I peeked a little at GHC.Types.Cpr and GHC.Core.Opt.CprAnal, and it
> seems quite manageable to me, but I haven’t actually looked into an
> implementation attempt just yet. I’m mostly interested in whether
> others have thought about something like this and/or run into similar
> issues in the past, or if this is really an unusual construction.
>
> Thanks,
> Alexis
>
> [1]:
> https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/cpr.pdf
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20201215/0c8494f0/attachment.html>