<div dir="ltr"><div>Hi Alexis,</div><div><br></div><div>that's a very interesting example you have there!</div><div><br></div><div>So far, what we referred to as Nested CPR concerned unboxing for returned nested *records*, e.g., the `annotation` field in your example. That's what I try to exploit in <a href="https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1866">!1866</a>, which after a rebase that I'll hopefully be doing this week, some more sleuthing and then documenting what I did will finally make it into master.</div><div><br></div><div>CPR'ing the Lambda, i.e., what is returned for `parser`, on the other hand, is a surprising new opportunity for what Nested CPR could do beyond unboxing records! And it's pretty simple, too: Because it's a function, we don't care about subtleties such as whether all callers actually evaluate the pair that deep (actually, that's wrong, as I realise below). I think it's entirely within the reach of !1866 today. So we could transform (provided that `(,) <$> a <*> b` inlines `<$>` and `<*>` and then will actually have the CPR property) <br></div><div><br></div><div>     AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser<br>

      { annotation = Seq ann1 ann2<br>

      , parser = \s1 -><br>

          let !(a, s2) = f s1<br>

              !(b, s3) = g s2<br>

          in ((,) <$> a <*> b, s3)<br>

      }</div><div><br></div><div>to</div><div><br></div><div>    $w<+> :: Annotation<br>

          -> (String -> (Maybe a, String))<br>

          -> Annotation<br>

          -> (String -> (Maybe b, String))<br>

          -> (# Annotation, String -> (# Maybe (a, b), String #) #)<br>

    $w<+> ann1 f ann2 g =<br>

      (# Seq ann1 ann2<br>

       , \s1 -> case (\s1 -> let !(a, s2) = f s1<br>

                    !(b, s3) = g s2<br>

                in ((,) <$> a <*> b) s1 of (p, q) -> (#p, q#), s3) #)</div><div><br></div><div>    <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)<br>

    <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =<br>

      case $w<+> ann1 f ann2 g of<br>

        (# a, b #) -> AnnotatedParser (\s1 -> case a s1 of (# p, q#) -> (p, q)) b<br>

    {-# INLINE <+> #-}</div><div><br></div><div>Actually writing out the transformation tells me that this isn't always a win: We now have to allocate a lambda in the wrapper. That is only a win if that lambda cancels away at call sites! So we have to make sure that all call sites of the wrapper actually call the `parser`, so that the lambda simplifies away. If it doesn't, we have a situation akin to reboxing. So I was wrong above when I said "we don't care about subtleties such as whether all callers actually evaluate the pair that deep": We very much need to know whether all call sites call the lambda. Luckily, I implemented <a href="https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4493">just that</a> for exploitation by Nested CPR! That's the reason why I need to rebase !1866 now. I'll ḱeep you posted.<br></div><div><br></div><div>---<br></div><div><br></div><div>You might wonder why CPR today doesn't care for lambdas. Well, they only make sense in nested scenarios (otherwise the function wasn't eta-expanded that far, for good reasons) and CPR currently doesn't bother unboxing records nestedly, which is what <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/18174">#18174</a> discusses and what !1866 tries to fix.<br></div><div><br></div><div>Cheers,<br></div><div>Sebastian<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Am Di., 15. Dez. 2020 um 06:52 Uhr schrieb Alexis King <<a href="mailto:lexi.lambda@gmail.com">lexi.lambda@gmail.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>

<br>

I spent some time today looking into the performance of a program<br>

involving a parser type that looks something like this:<br>

<br>

    data AnnotatedParser a = AnnotatedParser<br>

      { annotation :: Annotation<br>

      , parser :: String -> (Maybe a, String)<br>

      }<br>

<br>

The `Annotation` records metadata about the structure of an<br>

`AnnotatedParser` that can be accessed statically (that is, without<br>

having to run the parser on some input). `AnnotatedParser`s are built<br>

from various primitive constructors and composed using various<br>

combinators. These combinators end up looking something like this:<br>

<br>

    (<+>) :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)<br>

    AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser<br>

      { annotation = Seq ann1 ann2<br>

      , parser = \s1 -><br>

          let !(a, s2) = f s1<br>

              !(b, s3) = g s2<br>

          in ((,) <$> a <*> b, s3)<br>

      }<br>

<br>

Use of these combinators leads to the construction and subsequent case<br>

analysis of numerous `AnnotatedParser` closures. Happily, constructed<br>

product result[1] analysis kicks in and rewrites such combinators to cut<br>

down on the needless boxing, leading to worker/wrapper splits like this:<br>

<br>

    $w<+> :: Annotation<br>

          -> (String -> (Maybe a, String))<br>

          -> Annotation<br>

          -> (String -> (Maybe b, String))<br>

          -> (# Annotation, String -> (Maybe (a, b), String) #)<br>

    $w<+> ann1 f ann2 g =<br>

      (# Seq ann1 ann2<br>

       , \s1 -> let !(a, s2) = f s1<br>

                    !(b, s3) = g s2<br>

                in ((,) <$> a <*> b, s3) #)<br>

<br>

    <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)<br>

    <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =<br>

      case $w<+> ann1 f ann2 g of<br>

        (# a, b #) -> AnnotatedParser a b<br>

    {-# INLINE <+> #-}<br>

<br>

This is great, and it cuts down on allocation significantly, but there<br>

is still something unsatisfying about it: the `parser` function inside<br>

the record is not affected by CPR! This is a shame, because<br>

essentially all use sites immediately deconstruct the pair, making it<br>

a prime candidate for unboxing. Ideally, we’d like to get this, instead:<br>

<br>

    $w<+> :: Annotation<br>

          -> (String -> (Maybe a, String))<br>

          -> Annotation<br>

          -> (String -> (Maybe b, String))<br>

          -> (# Annotation, String -> (# Maybe (a, b), String #) #)<br>

    $w<+> ann1 f ann2 g =<br>

      (# Seq ann1 ann2<br>

       , \s1 -> let !(a, s2) = f s1<br>

                    !(b, s3) = g s2<br>

                in (# (,) <$> a <*> b, s3 #) #)<br>

<br>

In practice, little combinators like `$w<+>` are marked INLINE, so `f`<br>

and `g` are usually known rather than unknown calls. This nested CPR<br>

transformation would allow the tuple construction to fuse with the<br>

tuple deconstruction, eliminating quite a lot of unnecessary<br>

boxing/unboxing.<br>

<br>

Unfortunately, it seems as though GHC’s implementation of CPR is<br>

entirely first-order: although function arguments are given rich<br>

demand signatures, results are only described one level deep. But as<br>

the above example hopefully illustrates, that’s leaving significant<br>

optimization opportunities on the table! Hence, my questions:<br>

<br>

    1. Has this notion of “nested CPR” been explored at all before?<br>

    2. Does such an extension to CPR sound worth its weight?<br>

<br>

I peeked a little at GHC.Types.Cpr and GHC.Core.Opt.CprAnal, and it<br>

seems quite manageable to me, but I haven’t actually looked into an<br>

implementation attempt just yet. I’m mostly interested in whether<br>

others have thought about something like this and/or run into similar<br>

issues in the past, or if this is really an unusual construction.<br>

<br>

Thanks,<br>

Alexis<br>

<br>

[1]: <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/cpr.pdf" rel="noreferrer" target="_blank">https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/cpr.pdf</a><br>

<br>

_______________________________________________<br>

ghc-devs mailing list<br>

<a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a><br>

<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a><br>

</blockquote></div>