Fun with GHC's optimiser
Simon Peyton-Jones
simonpj@microsoft.com
Thu, 7 Dec 2000 09:17:44 -0800
Manuel
OK, I've improved the eta expander so it does a better
job of the code you found.
Try now. On the HEAD. Which should compile OK now, incidentally.
How's your parallel Haskell stuff coming?
Simon
| -----Original Message-----
| From: Manuel M. T. Chakravarty [mailto:chak@cse.unsw.edu.au]
| Sent: 02 November 2000 13:54
| To: Simon Peyton-Jones
| Cc: glasgow-haskell-users@haskell.org
| Subject: RE: Fun with GHC's optimiser
|
|
| Simon Peyton-Jones <simonpj@microsoft.com> wrote,
|
| > I can never resist messages like these, even when I'm meant
| > to be doing other things.
|
| That's good to know ;-)
|
| > It's very helpful when people offer
| > fairly precise performance-bug reports. Thanks!
|
| Thanks for the prompt reply! To SimonM, too.
|
| > | I am wondering whether there is a particular reason why the
| > | optimiser doesn't pull the
| > |
| > | (1) a = NO_CCS PArray! [wild1 mba#];
| >
| > This one is a definite bug. It turns out that the head of the
| > before-ghci-branch doesn't have this bug, so I'm disinclined
| > to investigate it further.
|
| No problem, I can always build a new ghc from CVS.
|
| > | (2) case w of wild3 {
| > | I# e# ->
| > |
| > | As for (2), the loop would be nice and straight if that
| > | unboxing where outside of the loop - as it is, we break the
| > | pipeline once per iteration it seems
| >
| > This one is a bit harder. Basically we want to make a wrapper
| > for a recursive function if it's sure to evaluate its free
| variables.
| >
| > In fact the 'liberate-case' pass (which isn't switched on in 4.08)
| > is meant to do just this. It's in simplCore/LiberateCase.lhs,
| > and it's not very complicated. I've just tried it and it
| doesn't seem
| > to have the desired effect, but I'm sure that's for a boring reason.
| > If anyone would like to fix it, go ahead!
|
| Ok - something for a rainy weekend, I guess...
|
| > Incidentally, you'll find that -ddump-simpl gives you a dump that
| > is pretty close to STG and usually much more readable. Most
| > performance bugs show up there. -dverbose-simpl gives you more
| > clues about what is happening where.
|
| Good to know. I just wasn't sure whether I wouldn't miss
| out on some optimisations.
|
| > | Also if somebody is looking at the attached source, I was
| > | wondering why, when I use the commented out code in
| > | `newPArray', I get a lot worse code (the STG code is in a
| > | comment at the end of the file). In particular, the lambda
| > | abstraction is not inlined, whereas `fill' gets inlined into
| > | the code of which the dump is above. Is it because the
| > | compiler has a lot harder time with explicit recursion than
| > | with fold/build? If so, the right RULES magic should allow
| > | me to do the same for my own recursively defined
| > | combinators, shouldn't it?
| >
| > I couldn't figure out exactly what you meant. The only commented
| > out code is STG code. Maybe send a module with the actual
| > source you are bothered about.
|
| Appended the version of the module where the code is
| activated (it was commented out with --) - you'll find the
| problematic code by searching for (**) in the code. It
| generates
|
| $w$snewPArray
| = \ ww :: Int# w :: Int ->
| case newIntArray# @ RealWorld ww realWorld#
| of wild { (# s2#, mba# #) ->
| case $wsimpleGen
| ww
| (\ i :: Int ->
| case i of wild1 { I# i# ->
| case w of wild2 { I# e# ->
| __coerce (ST RealWorld ())
| (\ s# :: (State# RealWorld) ->
| case writeIntArray# @ RealWorld mba# i# e# s#
| of s2#1 { __DEFAULT ->
| (# s2#1, () #)
| })
| }
| })
| s2#
| of wild1 { (# new_s, r #) ->
| let {
| a :: Int
| a = $wI# ww
| } in (# a, (__coerce ByteArray# mba#) #)
| }
| }
|
| One thing that is not nice is in the lambda abstraction
| based to $wsimpleGen. There is one lambda, then two
| unboxing operations and another lambda for the state
| variables s#. These two nested lambda's become in STG
|
| stg_c1Op =
| NO_CCS[] \r[i]
| case i of wild1 {
| I# i# ->
| case w of wild2 {
| I# e# ->
| let {
| stg_c1Sv =
| NO_CCS[] \r[s#]
| case
| writeIntArray# [mba# i# e# s#] of s2#1 {
| DEFAULT ->
| (#,#) [s2#1 ()]
| };
| } in stg_c1Sv;
| };
| };
|
| So, stg_c1Sv is allocated and immediately entered. I would
| have hoped that the two lambda abstractions are merged into
| something like this:
|
| (\ i :: Int s# :: (State# RealWorld) ->
| case i of wild1 { I# i# ->
| case w of wild2 { I# e# ->
| case writeIntArray# @ RealWorld mba# i# e# s#
| of s2#1 { __DEFAULT -> (# s2#1, () #)
| })
| }
| })
|
| Maybe it is the __coerce, which prevents this from
| happening?
|
| Also, I am wondering why in the case of the
|
| foldr (fill mpa e) (return $ unsafeFreezeMPArray mpa) [0..n-1])
|
| the function `fill' gets inlined into the foldr loop, but in
| the above, the lambda abstraction gets not inlined into
| $wsimpleGen. Maybe it is because $wsimpleGen itself gets
| not inlined...
|
| Cheers,
| Manuel
|