# Why isn't (\$) inlining when I want?

John Lato jwlato at gmail.com
Wed Aug 27 23:16:33 UTC 2014

```I sometimes think the solution is to make let-floating apply in fewer
cases.  I'm not sure we ever want to float out intermediate lists, the cost
of creating them is very small relative to the memory consumption if they
do happen to get shared.

My approach is typically to mark loop INLINE.  This very often results in
the code I want (with vector, which I use more than lists), but it is a big
hammer to apply.

John

On Thu, Aug 28, 2014 at 5:56 AM, Dan Doel <dan.doel at gmail.com> wrote:

> I think talking about inlining of \$ may not be addressing the crux of the
> problem here.
>
> The issue seems to be about functions like the one in the first message.
> For instance:
>
>     loop :: (Int -> Int) -> Int
>     loop g = sum . map g \$ [1..1000000]
>
> Suppose for argument that we have a fusion framework that would handle
> this. The problem is that this does not actually turn into a loop over
> integers, because the constant [1..1000000] gets floated out. It instead
> builds a list/vector/whatever.
>
> By contrast, if we write:
>
>     loop' :: Int
>     loop' = sum . map (+1) \$ [1..1000000]
>
> this does turn into a loop over integers, with no intermediate list.
> Presumably this is due to there being no work to be saved ever by floating
> the list out. These are the examples people usually test fusion with.
>
> And if loop is small enough to inline, it turns out that the actual code
> that gets run will be the same as loop', because everything will get
> inlined and fused. But it is also possible to make loop big enough to not
> inline, and then the floating will pessimize the overall code.
>
> So the core issue is that constant floating blocks some fusion
> opportunities. It is trying to save the work of building the structure more
> than once, but fusion can cause the structure to not be built at all. And
> the floating happens before fusion can reasonably be expected to work.
>
>
> I've verified that this kind of situation also affects vector. And it
> seems to be an issue even if loop is written:
>
>     loop g = sum (map g [1..1000000])
>
> -- Dan
>
>
> On Wed, Aug 27, 2014 at 3:38 PM, Simon Peyton Jones <simonpj at microsoft.com
> > wrote:
>
>> You'll have to do more detective work! In your dump I see "Inactive
>> unfolding \$".  So that's why it's not being inlined.  That message comes
>> from CoreUnfold, line 941 or so.  The Boolean active_unfolding is passed in
>> to callSiteInline from Simplify, line 1408 or so.  It is generated by the
>> function activeUnfolding, defined in SimplUtils.
>>
>> But you have probably change the "CompilerPhase" data type, so I can't
>> guess what is happening.  But if you just follow it through I'm sure you'll
>> find it.
>>
>> Simon
>>
>> | -----Original Message-----
>> | From: David Feuer [mailto:david.feuer at gmail.com]
>> | Sent: 27 August 2014 17:22
>> | To: Simon Peyton Jones
>> | Cc: ghc-devs
>> | Subject: Re: Why isn't (\$) inlining when I want?
>> |
>> | I just ran that (results attached), and as far as I can tell, it
>> | doesn't even *consider* inlining (\$) until phase 2.
>> |
>> | On Wed, Aug 27, 2014 at 4:03 AM, Simon Peyton Jones
>> | <simonpj at microsoft.com> wrote:
>> | > It's hard to tell since you are using a modified compiler.  Try
>> running
>> | with -ddump-occur-anal -dverbose-core2core -ddump-inlinings.  That will
>> | show you every inlining, whether failed or successful. You can see the
>> | attempt to inline (\$) and there is some info with the output that may
>> | help to explain why it did or did not work.
>> | >
>> | > Try that
>> | >
>> | > Simon
>> | >
>> | > | -----Original Message-----
>> | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
>> | David
>> | > | Feuer
>> | > | Sent: 27 August 2014 04:50
>> | > | To: ghc-devs; Carter Schonwald
>> | > | Subject: Why isn't (\$) inlining when I want?
>> | > |
>> | > | tl;dr  I added a simplifier run with inlining enabled between
>> | > | specialization and floating out. It seems incapable of inlining
>> | > | saturated applications of (\$), and I can't figure out why. These are
>> | > | inlined later, when phase 2 runs. Am I running the simplifier wrong
>> | or
>> | > | something?
>> | > |
>> | > |
>> | > | I'm working on this simple little fusion pipeline:
>> | > |
>> | > | {-# INLINE takeWhile #-}
>> | > | takeWhile p xs = build builder
>> | > |   where
>> | > |     builder c n = foldr go n xs
>> | > |       where
>> | > |         go x r = if p x then x `c` r else n
>> | > |
>> | > | foo c n x = foldr c n . takeWhile (/= (1::Int)) \$ [-9..10]
>> | > |
>> | > | There are some issues with the enumFrom definition that break
>> things.
>> | > | If I use a fusible unfoldr that produces some numbers instead, that
>> | > | issue goes away. Part of that problem (but not all of it) is that
>> the
>> | > | simplifier doesn't run to apply rules between specialization and
>> full
>> | > | laziness, so there's no opportunity for the specialization of
>> | > | enumFromTo to Int to trigger the rewrite to a build form and fusion
>> | > | with foldr before full laziness tears things apart. The other
>> problem
>> | > | is that inlining doesn't happen at all before full laziness, so
>> | things
>> | > | defined using foldr and/or build aren't actually exposed as such
>> | until
>> | > | afterwards. Therefore I decided to try adding a simplifier run with
>> | > | inlining between specialization and floating out:
>> | > |
>> | > |         runWhen do_specialise CoreDoSpecialising,
>> | > |
>> | > |         runWhen full_laziness \$ CoreDoSimplify max_iter
>> | > |                        (base_mode { sm_phase = InitialPhase
>> | > |                                   , sm_names = ["PostGentle"]
>> | > |                                   , sm_rules = rules_on
>> | > |                                   , sm_inline = True
>> | > |                                   , sm_case_case = False }),
>> | > |
>> | > |         runWhen full_laziness \$
>> | > |            CoreDoFloatOutwards FloatOutSwitches {
>> | > |                                  floatOutLambdas   = Just 0,
>> | > |                                  floatOutConstants = True,
>> | > |                                  floatOutPartialApplications = False
>> | },
>> | > |
>> | > | The weird thing is that for some reason this doesn't inline (\$),
>> even
>> | > | though it appears to be saturated. Using the modified thing with (my
>> | > | version of) unfoldr:
>> | > |
>> | > | foo c n x = (foldr c n . takeWhile (/= (1::Int))) \$ unfoldr (potato
>> | 10)
>> | > | (-9)
>> | > |
>> | > | potato :: Int -> Int -> Maybe (Int, Int)
>> | > | potato n m | m <= n = Just (m, m)
>> | > |            | otherwise = Nothing
>> | > |
>> | > |
>> | > | I get this out of the specializer:
>> | > |
>> | > | foo
>> | > | foo =
>> | > |   \ @ t_a1Za @ c_a1Zb c_a1HT n_a1HU _ ->
>> | > |     \$ (. (foldr c_a1HT n_a1HU)
>> | > |          (takeWhile
>> | > |             (let {
>> | > |                ds_s21z
>> | > |                ds_s21z = I# 1 } in
>> | > |              \ ds_d1Zw -> neInt ds_d1Zw ds_s21z)))
>> | > |       (let {
>> | > |          n_s21x
>> | > |          n_s21x = I# 10 } in
>> | > |        unfoldr
>> | > |          (\ m_a1U7 ->
>> | > |             case leInt m_a1U7 n_s21x of _ {
>> | > |               False -> Nothing;
>> | > |               True -> Just (m_a1U7, m_a1U7)
>> | > |             })
>> | > |          (\$fNumInt_\$cnegate (I# 9)))
>> | > |
>> | > |
>> | > | and then I get this out of my extra simplifier run:
>> | > |
>> | > | foo
>> | > | foo =
>> | > |   \ @ t_a1Za @ c_a1Zb c_a1HT n_a1HU _ ->
>> | > |     \$ (\ x_a20f ->
>> | > |          foldr
>> | > |            (\ x_a1HR r_a1HS ->
>> | > |               case case x_a1HR of _ { I# x_a20R ->
>> | > |                    tagToEnum#
>> | > |                      (case x_a20R of _ {
>> | > |                         __DEFAULT -> 1;
>> | > |                         1 -> 0
>> | > |                       })
>> | > |                    }
>> | > |               of _ {
>> | > |                 False -> n_a1HU;
>> | > |                 True -> c_a1HT x_a1HR r_a1HS
>> | > |               })
>> | > |            n_a1HU
>> | > |            x_a20f)
>> | > |       (let {
>> | > |          b'_a1ZS
>> | > |          b'_a1ZS = \$fNumInt_\$cnegate (I# 9) } in
>> | > |        \$ (build)
>> | > |          (\ @ b1_a1ZU c_a1ZV n_a1ZW ->
>> | > |             letrec {
>> | > |               go_a1ZX
>> | > |               go_a1ZX =
>> | > |                 \ b2_a1ZY ->
>> | > |                   case case case b2_a1ZY of _ { I# x_a218 ->
>> | > |                             tagToEnum# (<=# x_a218 10)
>> | > |                             }
>> | > |                        of _ {
>> | > |                          False -> Nothing;
>> | > |                          True -> Just (b2_a1ZY, b2_a1ZY)
>> | > |                        }
>> | > |                   of _ {
>> | > |                     Nothing -> n_a1ZW;
>> | > |                     Just ds_a203 ->
>> | > |                       case ds_a203 of _ { (a1_a207, new_b_a208) ->
>> | > |                       c_a1ZV a1_a207 (go_a1ZX new_b_a208)
>> | > |                       }
>> | > |                   }; } in
>> | > |             go_a1ZX b'_a1ZS))
>> | > |
>> | > |
>> | > | That is, neither the \$ in the code nor the \$ that was inserted when
>> | > | inlining unfoldr got inlined themselves, even though both appear to
>> | be
>> | > | saturated. As a result, foldr/build doesn't fire, and full laziness
>> | > | tears things apart. Later on, in simplifier phase 2, \$ gets inlined.
>> | > | What's preventing this from happening in the PostGentle phase I
>> | > |
>> | > | David Feuer
>> | > | _______________________________________________
>> | > | ghc-devs mailing list
>> | > | ghc-devs at haskell.org
>> _______________________________________________
>> ghc-devs mailing list
>>
>
>
> _______________________________________________
> ghc-devs mailing list