Fun with GHC's optimiser

Manuel M. T. Chakravarty chak@cse.unsw.edu.au
Fri, 15 Dec 2000 18:24:04 +1100


Simon Peyton-Jones <simonpj@microsoft.com> wrote,

> | However, we came across one problem (read, lack of
> | optimisation on GHC's part), which leads to tedious
> | duplication of a lot of code in our array library.
> | Basically, GHC does not recognise for tail recursive
> | functions when certain arguments (accumulators maintained in
> | a loop) can be unboxed.  This leads to massive overheads in
> | our code.  Currently, we circumvent the inefficiency by
> | having manually specialised versions of the loops for
> | different accumulator types and using RULES to select them
> | where appropriate (based on the type information).  I will
> | send you some example code illustrating the problem soon.
> 
> Yes please.

I have found a way of rephrasing the definition so that it
is properly optimised by GHC.  However, I think, it should
be possible to do this automatically and it is maybe not
unlike the optimisation done by simplCore/LiberateCase.

The code I would like to write is, for example, the
following

  import PrelGHC
  import PrelBase
  import PrelST

  fill :: MutableByteArray# s
       -> (acc -> Int)
       -> (acc -> acc)
       -> Int
       -> acc
       -> ST s acc
  {-# INLINE fill #-}
  fill mba# f g (I# n#) start = fill0 0# start
    where
      fill0 i# acc | i# ==# n# = return acc
		   | otherwise = do
				   writeIntArray mba# (I# i#) (f acc)
				   fill0 (i# +# 1#) (g acc)

  writeIntArray :: MutableByteArray# s -> Int -> Int -> ST s ()
  {-# INLINE writeIntArray #-}
  writeIntArray mba# (I# i#) (I# e#) = ST $ \s# ->
    case writeIntArray# mba# i# e# s#  of {s2#   ->
    (# s2#, () #)}

  foo mba# n = fill mba# id (+1) 1000 0

The interesting part is the handling of the accumulator.
After inlining `fill' into `foo', it becomes obvious that
the accumulator can be maintained as an unboxed integer.
Unfortunately, it is not obvious to GHC, which generates the
following (this is just the inlined `fill0' loop):

	__letrec {
	  $wfill0 :: (PrelGHC.Int#
		      -> PrelBase.Int
			 -> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int))
	  __A 3 __C
	  $wfill0
	    = \ w2 :: PrelGHC.Int#
		w3 :: PrelBase.Int
		w4 :: (PrelGHC.State# s) ->
		  case w2 of wild {
		      1000 -> (# w4, w3 #);
		      __DEFAULT ->
			  case w3 of wild1 { PrelBase.I# e# ->
			  case PrelGHC.writeIntArray# @ s w wild e# w4 of s2# { __DEFAULT ->
			  case PrelGHC.+# e# 1 of a { __DEFAULT ->
			  let {
			    sat :: PrelBase.Int
			    __A 0 __C
			    sat
			      = PrelBase.$wI# a
			  } in 
			    case PrelGHC.+# wild 1 of sat1 { __DEFAULT ->
			    $wfill0 sat1 sat s2#
			    }
			  }
			  }
			  }
		  };
	} in  $wfill0 0 Test.lit w1

The accumulator (w3) is unboxed immediately before the
writeIntArray# and its next value put into a box (sat) -
only to be unboxed immediately again in the next loop
iteration.

This would make perfect sense when the definition of `foo'
were

  foo mba# n = fill mba# id plus 1000 0
    where
      plus 0 = error "Die horribly"
      plus x = x + 1

I also appreciate that, if the loop is executed zero times,
the initial value of `acc' is not demanded.  But this is not
much different to the case handled by simplCore/LiberateCase.

And indeed with a little help, GHC generates much better
code.  In the following, I rewrote `fill' to explicitly test
for input values that make the loop execute zero times:

  fill mba# f g (I# 0#) start = return start
  fill mba# f g (I# n#) start = fill0 0# start
    where
      fill0 i# acc = do
		       writeIntArray mba# (I# i#) (f acc)
		       let i'#  = i# +# 1#
			   acc' = g acc
		       if i'# ==# n# then return acc' else fill0 i'# acc'

Now, `acc' is guaranteed to be used in each invocation of
`fill0' and GHC generates:

	__letrec {
	  $wfill0 :: (PrelGHC.Int#
		      -> PrelGHC.Int#
			 -> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int))
	  __A 3 __C
	  $wfill0
	    = \ w2 :: PrelGHC.Int#
		ww :: PrelGHC.Int#
		w3 :: (PrelGHC.State# s) ->
		  case PrelGHC.writeIntArray# @ s w w2 ww w3 of s2# { __DEFAULT ->
		  case PrelGHC.+# w2 1 of wild {
		      1000 ->
			  case PrelGHC.+# ww 1 of a { __DEFAULT ->
			  let {
			    a1 :: PrelBase.Int
			    __A 0 __C
			    a1
			      = PrelBase.$wI# a
			  } in  (# s2#, a1 #)
			  };
		      __DEFAULT ->
			  case PrelGHC.+# ww 1 of sat { __DEFAULT -> $wfill0 wild sat s2# }
		  }
		  };
	} in  $wfill0 0 0 w1

A nice tight loop.

However, the initial version of `fill' is the more natural
one to write.  I think, it should be possible to derive the
second version (are at least a similar version)
automatically from the initial code.  The derivation might
go roughly as follows:

    fill mba# f g (I# n#) start = fill0 0# start
      where
	fill0 i# acc = case i# ==# n# of
		         True  -> return acc
		         False -> do
				    writeIntArray mba# (I# i#) (f acc)
				    fill0 (i# +# 1#) (g acc)

  === {pull case out of fill0 (ie, partial unfolding)}

    fill mba# f g (I# n#) start = 
      case 0# ==# n# of
	True  -> return start
	False -> fill0 0# start
      where
	fill0 i# acc = do
		         writeIntArray mba# (I# i#) (f acc)
			 case (i# +# 1#) ==# n# of
			   True  -> return (g acc)
			   False -> fill0 (i# +# 1#) (g acc)

This is essentially the recursive variant of a well known
law for while loops:

    while p do q;
  ===
    if p then do q while p;

Wouldn't this actually subsume the liberate case rule?

    f = \ t -> case v of
		   V a b -> a : f t

  === {pull out the case}

    f = case v of
          V a b -> f = \ t -> a : case v of
			            V a b -> f t

  === {simplification}

    f = case v of
          V a b -> f = \ t -> a : f t

This might be more complicated to implement, as we only
partially unfold the recursive function, but it also has
more scope.

What do you think?

Cheers,
Manuel

PS: All Core code was generated with the HEAD from two days ago.