Compiler optimizations questions for ghc 6.10...

Wed Feb 18 12:42:02 EST 2009

On February 18, 2009 04:29:42 Max Bolingbroke wrote:
> > The part of the code under the first lambda in digit is as follows (I
> > didn't keep the original dump, so the uniques have changed here).  It's
> > the second part of the Int overflow bounds check (i.e., y <=
> > (maxBound-x)`div`10), and, indeed, something you don't want to compute
> > unless the easy check fails.
>
> Yes - GHC wants to share the work of (maxBound-x)`div`10 between
> several partial applications of "digit". This is usually a good idea,
> but in this case it sucks because it's resulted in a massively
> increased arity. IMHO GHC should fix this by:
> * Marking divInt# INLINE in the base library. This would result in
> your code would just containing uses of quotInt#
> * Making some operations cheap even if they may fail
> (PrimOp.primpOpIsCheap should change). Though this might mean that we
> turn non-terminating programs into terminating ones (such operations
> get pushed inside lambdas) but this is consistent with our treatment
> of lambdas generally.

Reading your response got me thinking, "quot and div, I thought I had changed 
everything to quot because earlier I found that GHC was leaving evaluation of 
constant expressions like "maxBound-9`div`10" until runtime if I used div."

Turns out I had missed that one in the second bounds check, and changing it 
from "y <= (maxBound-x)`div`10" to "y <= (maxBound-x)`qout`10" resulted in GHC 
"just doing the right thing".  I presume this is because quot must be either 
marked INLINE and/or primOpIsCheap like you said div should be.

I am guess this is because quot maps directly onto the x86 idiv instruction 
due to both of them truncating towards zero, while div, with its truncation 
towards negative infinity, does not.  Running with -ddump-asm seems to back 
this up as quotInt# compiles down to an idiv instruction and divInt# to a call 
through base_GHCziBase_divIntzh_info.  Unfortunately for me, I always seem to 
instinctively go with div and mod ahead of quot and rem.

> Actually, your divInt# call wouldn't even usually be floated out to
> between two lambdas, but at the time FloatOut runs there is something
> in between the \x lambda and the lambdas from the state monad - the
> monadic bind operator! So FloatOut feels free to move the computation
> for "x" up even though that >>= will go away as soon as we run the
> simplifier. What a disaster!

I would like to try combining the GHC optimizer with a genetic algorithm so 
you could set it to pound away on your core loops for an hour or so to find 
the right sequence of ghc optimization steps to generate the tightest code.

Maybe it could then write out an optimization hint file that regular GHC could 
optionally take in for use alongside the standard rules of thumb to produce 
great code.  All I need is more time and more knowledge.  Maybe someday.  : )

> For me, making digit INLINE fixes all this, but that's probably
> because the context of my code is not the same as yours (I had to
> invent parts of a program to bind the bs, q and n variables).

Sorry about that.  I've put the entire routine at

http://www.sharcnet.ca/~tyson/haskell/Example1.hs

I should have done this in the first place as it allows me to provide the full 
code while also stripping it down in the emails to make them more readable.

> For your
> immediate problem, I would suggest this bit of GHC witchdoctory:
>
>                  where digit :: Int -> ParseInt Int Int
>                        digit !x = do !y <- lift get
>                                      ( if y <= (maxBound-9)`quot`10 ||
> y <= ghc_hacks
>                                        then let !y' = y*10+x in (lift
> $ put y') >> s2
>                                        else throwError "integer overflow" )
>                           where {-# INLINE ghc_hacks #-}
>                                 ghc_hacks = (maxBound-x)`div`10
>
> With luck, this should make the loop-invariant cheap in GHC's eyes,
> preventing it from trying to share it. It works for me - but like I
> say things may differ in your program.

I tried that as well, and am pleased to report that it works great.  I think I 
actually understand what is going on here and will hopefully be able to wield 
it in the future to my advantage.

Generally I've had very little luck with the INLINE hammer when the function 
is at all complex.  GHC always seems to finds a way to thwart my pathetic 
attempts and punish me with even worse code (like then not inlining the monad 
bind operation in the code resulting from the inlining).  : ) 

> If you are interested in trying my script, it's available via Git at
> http://github.com/batterseapower/scripts/blob/da1f24ba16c27e3994aa66f9db352
>ec1102c39d2/ghc-dump-split and is called as "ghc-dump-split ghc-dump-file",
> where ghc-dump-file results from redirection of GHC stdout+stderr to a
> file.

I'll grab a copy of that.  : )

> Hope all that helps,

It sure did.  Thanks very much for all your time!

I'm really impressed with GHC and the people hacking on it.  I've actually 
managed to get it to produce some truly tight assembler.  It's also great, 
unlike a lot of projects, having papers that you can read on many details.

Thanks again!  -Tyson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part.
Url : http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20090218/8c172691/attachment.bin