How to prevent GHC (API) from breaking large constants into multiple top-level bindings

Thu Mar 24 16:13:23 UTC 2016

Ok, that makes sense, indeed, we don't want to duplicate that (b:c:[]).
Thank you for explaining the reason behind the flattening.

And yes, the asymptotic complexity of undoing the flattening is linear.
It is just, due to the internal structure of my compiler pipeline, the 
constants of doing that are high, because I perform the substitution 
during my normal inlining-pass, which does a lot of extra checks.

Now that I know why GHC does the things it does, I'll figure out a 
different way to make sure that my compiler doesn't slow down on large 
constants.

Christiaan

On 03/24/2016 04:49 PM, Simon Peyton Jones wrote:
> There's a pretty strong reason for flattening data structures, whether nested or at top level.  Consider
>
> let x = a : (b : (c : [])))
> in
> ....(case x of (p:q) -> e1)...(case x or (r:s) -> e2) ....
>
> We want to cancel out those case expressions.  In the nested form we'd be stuck with
>
> let x = a : (b : (c : [])))
> in
> ....(let { p = a; q = b:c:[] } in e1) ....
>      (let {r = a; s = b:c:[] } in e2)...
>
> But now we have wastefully duplicated that (b:c:[]).
>
> Instead GHC flattens thus:
>
> let x2 = c : []
>      x1 = b : x2
>      x = a : x1
> in
> ....(case x of (p:q) -> e1)...(case x or (r:s) -> e2) ....
>
> And now we can do nice simple case-cancellation:
>
> let x2 = c : []
>      x1 = b : x2
>      x = a : x1
> in
> ....(let { p = a; q = x1 } in e1) ....
>      (let {r = a; s = x2 } in e2)...
>
>
> Bottom line: no, there is no flag to stop this happening.
>
> But surely it should be a linear-time substitution to undo it?
>
> Simon
>
> |  -----Original Message-----
> |  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
> |  Christiaan Baaij
> |  Sent: 24 March 2016 13:39
> |  To: ghc-devs at haskell.org
> |  Subject: How to prevent GHC (API) from breaking large constants into
> |  multiple top-level bindings
> |
> |  My situation is the following, given the code:
> |
> |   > {-# LANGUAGE GADTs, DataKinds, TypeOperators, KindSignatures #-}  >
> |  module GConst where  >  > import GHC.TypeLits  >  > data Vec :: Nat ->
> |  * -> *
> |   >   where
> |   >     Nil  :: Vec 0 a
> |   >     Cons :: a -> Vec n a -> Vec (n+1) a
> |   >
> |   > infixr `Cons`
> |   >
> |   > c :: Vec 5 Int
> |   > c = 1 `Cons` 2 `Cons` 3 `Cons` 4 `Cons` 5 `Cons` Nil
> |
> |  The output of the desugarer, 'ghc -O -fforce-recomp -fno-full-laziness
> |  -ddump-ds -dsuppress-all GConst.hs', is:
> |
> |   > c =
> |   >   ($WCons
> |   >      (I# 1)
> |   >      (($WCons
> |   >          (I# 2)
> |   >          (($WCons
> |   >              (I# 3)
> |   >              (($WCons (I# 4) (($WCons (I# 5) ($WNil)) `cast` ...))
> |  `cast` ...))
> |   >           `cast` ...))
> |   >       `cast` ...))
> |   >   `cast` ...
> |
> |  Where the constant 'c' is a single large constant. However, when I look
> |  at the output of the simplifier, 'ghc -O -fforce-recomp -fno-full-
> |  laziness -ddump-simpl -dsuppress-all GConst.hs', I see this:
> |
> |   > c10
> |   > c10 = I# 1
> |   >
> |   > c9
> |   > c9 = I# 2
> |   >
> |   > c8
> |   > c8 = I# 3
> |   >
> |   > c7
> |   > c7 = I# 4
> |   >
> |   > c6
> |   > c6 = I# 5
> |   >
> |   > c5
> |   > c5 = Cons @~ <0 + 1>_N c6 ($WNil)
> |   >
> |   > c4
> |   > c4 = Cons @~ <1 + 1>_N c7 (c5 `cast` ...)  >  > c3  > c3 = Cons @~
> |  <2 + 1>_N c8 (c4 `cast` ...)  >  > c2  > c2 = Cons @~ <3 + 1>_N c9 (c3
> |  `cast` ...)  >  > c1  > c1 = Cons @~ <4 + 1>_N c10 (c2 `cast` ...)  >
> |  > c  > c = c1 `cast` ...
> |
> |  The single constant is completely taken apart into multiple top-level
> |  bindings.
> |
> |  I haven't given it too much thought, but I assume there are good
> |  reasons to take large constants aparts, and break them into individual
> |  top-level bindings. At least when your target is a normal CPU.
> |
> |  Now, I'm a GHC API user, and I convert Haskell programs to digital
> |  circuits. For my use case, breaking up constants into smaller top-level
> |  bindings has completely no performance benefits at all. Actually, my
> |  compiler _inlines_ all those top-level bindings again to create a
> |  single large constant.
> |  When working with large constants, my compiler is actually taking an
> |  disproportionately large amount of time of doing the inverse of what
> |  the GHC simplifier did.
> |  I want to keep using the GHC simplifier, because it contains many
> |  optimisations that are usefull for my specific use-case.
> |
> |  So my question is: how can I stop the GHC simplifier from breaking up
> |  large constants into smaller top-level bindings?
> |  As you could see from the example, this "breaking-apart-constants" is
> |  not due to the FullLaziness transform, as I explicitly disabled it.
> |  If this "breaking-apart-constants" part is not (currently) controllable
> |  by a flag, would it be possible to add such a flag? I'm happy to work
> |  on a patch myself if someone could tell where about in the simplifier I
> |  would have to make some changes.
> |
> |  Thanks,
> |
> |  Christiaan
> |  _______________________________________________
> |  ghc-devs mailing list
> |  ghc-devs at haskell.org
> |  https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fmail.ha
> |  skell.org%2fcgi-bin%2fmailman%2flistinfo%2fghc-
> |  devs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cf244fbbdf6d946ae
> |  2b9408d353e9946c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=v1daJzyAp
> |  Slqrw8MUAy57Z%2bIDpNGO1wT50X54%2fmTx38%3d
>