Fun with GHC's optimiser

Fri, 8 Dec 2000 05:57:55 -0800

PArrays.$w$snewPArray
  = \ ww :: PrelGHC.Int# w :: PrelBase.Int ->
	case PrelGHC.newIntArray# @ PrelGHC.RealWorld ww PrelGHC.realWorld#
	of wild { (# s2#, mba# #) ->
	let {
	  eta :: PrelBase.Int
	  __A 0 __C
	  eta
	    = PrelBase.$wI# ww
	} in 
	  case PrelGHC.-# ww 1 of sat { __DEFAULT ->
	  case PrelGHC.># 0 sat of wild1 {
	      PrelBase.True ->
		  PArrays.$wPArray
		      @ PrelBase.Int eta (__coerce PrelGHC.ByteArray# mba#);
	      PrelBase.False ->
		  case w of w1 { PrelBase.I# ww1 ->
		  case PrelGHC.writeIntArray# @ PrelGHC.RealWorld mba# 0 ww1
s2#
		  of s2#1 { __DEFAULT ->
		  case PrelGHC.-# ww 1 of wild2 {
		      0 ->
			  PArrays.$wPArray
			      @ PrelBase.Int eta (__coerce
PrelGHC.ByteArray# mba#);
		      __DEFAULT ->
			  let {
			    a4 :: (PArrays.PArray PrelBase.Int)
			    __A 0 __C
			    a4 =
				PArrays.$wPArray
				    @ PrelBase.Int eta (__coerce
PrelGHC.ByteArray# mba#) } in
			  __letrec {
			    $wgo :: (PrelGHC.Int#
				     -> PrelGHC.State# PrelGHC.RealWorld
					-> (PrelGHC.State#
PrelGHC.RealWorld,
					    PArrays.PArray PrelBase.Int))
			    __A 2 __C
			    $wgo
			      = \ w2 :: PrelGHC.Int# w3 :: (PrelGHC.State#
PrelGHC.RealWorld) ->
				    case PrelGHC.writeIntArray# @
PrelGHC.RealWorld mba# w2 ww1 w3
				    of s2#2 { __DEFAULT ->
				    case PrelGHC.-# ww 1 of sat1 { __DEFAULT
->
				    case PrelGHC.==# w2 sat1 of wild11 {
					PrelBase.True -> (# s2#2, a4 #);
					PrelBase.False ->
					    case PrelGHC.+# w2 1 of sat2 {
__DEFAULT ->
					    $wgo sat2 s2#2
					    }
				    }
				    }
				    };
			  } in  case $wgo 1 s2#1 of wild3 { (# ds, r #) -> r
}
		  }
		  }
		  }
	  }
	  }
	}

| However, we came across one problem (read, lack of
| optimisation on GHC's part), which leads to tedious
| duplication of a lot of code in our array library.
| Basically, GHC does not recognise for tail recursive
| functions when certain arguments (accumulators maintained in
| a loop) can be unboxed.  This leads to massive overheads in
| our code.  Currently, we circumvent the inefficiency by
| having manually specialised versions of the loops for
| different accumulator types and using RULES to select them
| where appropriate (based on the type information).  I will
| send you some example code illustrating the problem soon.

Yes please.

Meanwhile, in the head, if you specify -O2 you'll get better
code for some of your loops, namely the ones that repeatedly
evaluate a free variable every time round the loop.  The 
loop is duplicated and everything is nice.  Here's a tiny example:

g :: Int -> Int
g x = let h 0 = 0
	  h y = if x==0 then y else h (y+1)
      in
      h 88 

This now compiles to:

LC.g
  = \ x :: PrelBase.Int ->
	case x of wild { PrelBase.I# x1 ->
	case x1 of wild1 {
	    0 -> PrelBase.$wI# 88;
	    __DEFAULT ->
		__letrec {
		  $wh :: (PrelGHC.Int# -> PrelBase.Int)
		  __A 1 __C
		  $wh
		    = \ ww :: PrelGHC.Int# ->
			  case ww of wild2 {
			      0 -> LC.lit;
			      __DEFAULT ->
				  case PrelGHC.+# wild2 1 of sat { __DEFAULT
-> $wh sat }
			  };
		} in  $wh 89
	}
	}

You mentioned something like this before.

The amount of code duplication is controlled by -fliberate-case-threshold20
(say)
The default is set pretty low.  And it all only happens with -O2.

All in the HEAD.  The pass is call simplCore/LiberateCase.

Simon