[GHC] #8763: forM_ [1..N] does not get fused (10 times slower than go function)
GHC
ghc-devs at haskell.org
Thu Mar 29 16:32:16 UTC 2018
#8763: forM_ [1..N] does not get fused (10 times slower than go function)
-------------------------------------+-------------------------------------
Reporter: nh2 | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone: 8.6.1
Component: Compiler | Version: 7.6.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #7206 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by sgraf):
It seems that for `IO`, GHC decides that it's OK to inline `c` from the
[https://hackage.haskell.org/package/base-4.11.0.0/docs/src/GHC.Enum.html#efdtIntUpFB
fusion helper of enumFromThenTo], but not so for `ST s`.
For our case, `c` is the `<huge>` computation (see the worker `$wc` in
comment:44) performed for each outer list element and would be duplicated
by inlining: it's mentioned thrice in the definition of `efdtIntUpFB`.
Consequently, `c` almost always has `Guidance=NEVER`, except in the `IO`
case, where it miraculously gets `Guidance=IF_ARGS [20 420 0] 674 0` just
when it is inlined. I'm not sure what this decision is based on.
The inlining decision for `eftIntFB` is much easier: `c`
[https://hackage.haskell.org/package/base-4.11.0.0/docs/src/GHC.Enum.html#eftIntFB
only occurs once there].
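To make the call-site counting concrete, here is a minimal sketch with boxed `Int`s. It is a simplification for illustration, not the actual `base` source: the names `upToFB` and `upByFB` are hypothetical, and the stepped version assumes an ascending step (`x2 > x1`) and no `Int` overflow.

```haskell
-- Simplified, boxed-Int analogues of the fusion helpers.
-- The names upToFB/upByFB are hypothetical, not part of base.

-- Step-1 enumeration: the consumer `c` occurs at a single call site,
-- so inlining `c` does not duplicate its body.
upToFB :: (Int -> r -> r) -> r -> Int -> Int -> r
upToFB c n x0 y
  | x0 > y    = n
  | otherwise = go x0
  where
    go x = x `c` (if x == y then n else go (x + 1))

-- Stepped, ascending enumeration (assumes x2 > x1 and no Int overflow):
-- `c` occurs at several call sites, so inlining a large `c` would copy
-- its body once per site.
upByFB :: (Int -> r -> r) -> r -> Int -> Int -> Int -> r
upByFB c n x1 x2 y
  | y < x2    = if y < x1 then n else x1 `c` n
  | otherwise = x1 `c` goUp x2
  where
    delta = x2 - x1          -- step size
    y'    = y - delta        -- last x for which x + delta stays <= y
    goUp x
      | x > y'    = x `c` n
      | otherwise = x `c` goUp (x + delta)
```

Instantiating `c` with `(:)` and `n` with `[]` recovers ordinary lists, so `upByFB (:) [] 1 3 10` should agree with `[1, 3 .. 10]`.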
I'm not sure if `IO` gets special treatment by the inliner, but I see a
few ways out:
* Do the same hacks for `ST`, if there are any which apply (ugly)
* Reduce the number of calls to `c` in the implementation of
`efdtIntUpFB`, probably at the cost of worse branch prediction
* Figure out why the floated-out expression `\x -> (nop x *>)`, which occurs
in `forM_ xs nop = mapM_ nop xs = foldr ((>>) . nop) (return ()) xs`,
doesn't get eta-expanded in the `ST` case, whereas the relevant `IO` code
does. I hope that by fixing this, the `c` expression inlines again.
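The `foldr` formulation from the last bullet can be written out at source level. This is only a sketch: `forViaFoldr`, `sumST`, and `sumIO` are hypothetical names, and the loop body is a stand-in, not the benchmark from the ticket. It shows that the `ST` and `IO` loops start from the same `(>>) . nop` consumer.

```haskell
import Control.Monad.ST (runST)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- forM_ spelled out via foldr (hypothetical name, same shape as mapM_).
forViaFoldr :: Monad m => [a] -> (a -> m b) -> m ()
forViaFoldr xs nop = foldr ((>>) . nop) (return ()) xs

-- A stepped loop in ST: sums the elements of [1, 3 .. n].
sumST :: Int -> Int
sumST n = runST $ do
  ref <- newSTRef 0
  forViaFoldr [1, 3 .. n] $ \i -> modifySTRef' ref (+ i)
  readSTRef ref

-- The same loop in IO.
sumIO :: Int -> IO Int
sumIO n = do
  ref <- newIORef 0
  forViaFoldr [1, 3 .. n] $ \i -> modifyIORef' ref (+ i)
  readIORef ref
```

Compiling both with `-O2 -ddump-simpl` is one way to compare how the two consumers end up after simplification.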
Here's how it inlines for `IO`:
{{{
(>>) . nop
= \x -> (nop x >>)
= \x -> (nop x *>) -- notice how it's no different than ST up until here
= \x -> (thenIO (nop x))
}}}
The inliner probably stops here, but because of eta-expansion modulo
coercions to `\x k s -> thenIO (nop x) k s`, we can inline
[https://hackage.haskell.org/package/base-4.11.0.0/docs/src/GHC.Base.html#thenIO
thenIO]:
{{{
\x k s -> thenIO (nop x) k s
= \x k s -> case nop x s of (# new_s, _ #) -> k new_s
}}}
which is much better and probably more readily inlined than `\x -> (nop x
*>)` in the `ST` case. What makes GHC eta-expand one, but not the other?
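The difference between the two shapes can be illustrated with a toy state-token monad. Everything here is an assumption made for illustration: `S`, `thenS`, `stepPap`, `stepEta`, `tick`, and `runLoop` are hypothetical stand-ins using boxed tuples, not GHC's actual `IO`/`ST` definitions or its Core output.

```haskell
-- Toy model of a state-token monad (hypothetical, boxed tuples instead of
-- unboxed ones).
newtype S s a = S { unS :: s -> (s, a) }

-- Analogue of thenIO: run the first action, drop its result, continue.
thenS :: S s a -> S s b -> S s b
thenS (S m) k = S (\s -> case m s of (new_s, _) -> unS k new_s)

-- Un-expanded shape: a lambda returning a partial application of thenS.
stepPap :: (x -> S s a) -> x -> S s b -> S s b
stepPap nop = \x -> thenS (nop x)

-- Eta-expanded shape with thenS inlined: the continuation and the state
-- are bound up front and the state is threaded directly.
stepEta :: (x -> S s a) -> x -> S s b -> S s b
stepEta nop = \x k -> S (\s -> case unS (nop x) s of (new_s, _) -> unS k new_s)

-- Example action: add x to an Int state.
tick :: Int -> S Int ()
tick x = S (\s -> (s + x, ()))

-- Fold a step function over a list and return the final state, from 0.
runLoop :: (Int -> S Int () -> S Int ()) -> [Int] -> Int
runLoop step xs = fst (unS (foldr step (S (\s -> (s, ()))) xs) 0)
```

Both `runLoop (stepPap tick)` and `runLoop (stepEta tick)` compute the same result; only the second exposes the `case` on the state to the surrounding code.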
This is just a wild guess and the only real difference I could make out in
diffs. Maybe someone with actual insights into the simplifier can comment
on this claim (that the inliner gives up on `c` due to the missed eta-
expansion and inlining)?
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:45>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler