Fun with GHC's optimiser
Manuel M. T. Chakravarty
chak@cse.unsw.edu.au
Fri, 15 Dec 2000 18:24:04 +1100
Simon Peyton-Jones <simonpj@microsoft.com> wrote,
> | However, we came across one problem (read, lack of
> | optimisation on GHC's part), which leads to tedious
> | duplication of a lot of code in our array library.
> | Basically, GHC does not recognise for tail recursive
> | functions when certain arguments (accumulators maintained in
> | a loop) can be unboxed. This leads to massive overheads in
> | our code. Currently, we circumvent the inefficiency by
> | having manually specialised versions of the loops for
> | different accumulator types and using RULES to select them
> | where appropriate (based on the type information). I will
> | send you some example code illustrating the problem soon.
>
> Yes please.
I have found a way of rephrasing the definition so that it
is properly optimised by GHC. However, I think, it should
be possible to do this automatically and it is maybe not
unlike the optimisation done by simplCore/LiberateCase.
The code I would like to write is, for example, the
following
import PrelGHC
import PrelBase
import PrelST
fill :: MutableByteArray# s
-> (acc -> Int)
-> (acc -> acc)
-> Int
-> acc
-> ST s acc
{-# INLINE fill #-}
fill mba# f g (I# n#) start = fill0 0# start
where
fill0 i# acc | i# ==# n# = return acc
| otherwise = do
writeIntArray mba# (I# i#) (f acc)
fill0 (i# +# 1#) (g acc)
writeIntArray :: MutableByteArray# s -> Int -> Int -> ST s ()
{-# INLINE writeIntArray #-}
writeIntArray mba# (I# i#) (I# e#) = ST $ \s# ->
case writeIntArray# mba# i# e# s# of {s2# ->
(# s2#, () #)}
foo mba# n = fill mba# id (+1) 1000 0
The interesting part is the handling of the accumulator.
After inlining `fill' into `foo', it becomes obvious that
the accumulator can be maintained as an unboxed integer.
Unfortunately, it is not obvious to GHC, which generates the
following (this is just the inlined `fill0' loop):
__letrec {
$wfill0 :: (PrelGHC.Int#
-> PrelBase.Int
-> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int))
__A 3 __C
$wfill0
= \ w2 :: PrelGHC.Int#
w3 :: PrelBase.Int
w4 :: (PrelGHC.State# s) ->
case w2 of wild {
1000 -> (# w4, w3 #);
__DEFAULT ->
case w3 of wild1 { PrelBase.I# e# ->
case PrelGHC.writeIntArray# @ s w wild e# w4 of s2# { __DEFAULT ->
case PrelGHC.+# e# 1 of a { __DEFAULT ->
let {
sat :: PrelBase.Int
__A 0 __C
sat
= PrelBase.$wI# a
} in
case PrelGHC.+# wild 1 of sat1 { __DEFAULT ->
$wfill0 sat1 sat s2#
}
}
}
}
};
} in $wfill0 0 Test.lit w1
The accumulator (w3) is unboxed immediately before the
writeIntArray# and its next value put into a box (sat) -
only to be unboxed immediately again in the next loop
iteration.
This would make perfect sense when the definition of `foo'
were
foo mba# n = fill mba# id plus 1000 0
where
plus 0 = error "Die horribly"
plus x = x + 1
I also appreciate that, if the loop is executed zero times,
the initial value of `acc' is not demanded. But this is not
much different to the case handled by simplCore/LiberateCase.
And indeed with a little help, GHC generates much better
code. In the following, I rewrote `fill' to explicitly test
for input values that make the loop execute zero times:
fill mba# f g (I# 0#) start = return start
fill mba# f g (I# n#) start = fill0 0# start
where
fill0 i# acc = do
writeIntArray mba# (I# i#) (f acc)
let i'# = i# +# 1#
acc' = g acc
if i'# ==# n# then return acc' else fill0 i'# acc'
Now, `acc' is guaranteed to be used in each invocation of
`fill0' and GHC generates:
__letrec {
$wfill0 :: (PrelGHC.Int#
-> PrelGHC.Int#
-> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int))
__A 3 __C
$wfill0
= \ w2 :: PrelGHC.Int#
ww :: PrelGHC.Int#
w3 :: (PrelGHC.State# s) ->
case PrelGHC.writeIntArray# @ s w w2 ww w3 of s2# { __DEFAULT ->
case PrelGHC.+# w2 1 of wild {
1000 ->
case PrelGHC.+# ww 1 of a { __DEFAULT ->
let {
a1 :: PrelBase.Int
__A 0 __C
a1
= PrelBase.$wI# a
} in (# s2#, a1 #)
};
__DEFAULT ->
case PrelGHC.+# ww 1 of sat { __DEFAULT -> $wfill0 wild sat s2# }
}
};
} in $wfill0 0 0 w1
A nice tight loop.
However, the initial version of `fill' is the more natural
one to write. I think, it should be possible to derive the
second version (are at least a similar version)
automatically from the initial code. The derivation might
go roughly as follows:
fill mba# f g (I# n#) start = fill0 0# start
where
fill0 i# acc = case i# ==# n# of
True -> return acc
False -> do
writeIntArray mba# (I# i#) (f acc)
fill0 (i# +# 1#) (g acc)
=== {pull case out of fill0 (ie, partial unfolding)}
fill mba# f g (I# n#) start =
case 0# ==# n# of
True -> return start
False -> fill0 0# start
where
fill0 i# acc = do
writeIntArray mba# (I# i#) (f acc)
case (i# +# 1#) ==# n# of
True -> return (g acc)
False -> fill0 (i# +# 1#) (g acc)
This is essentially the recursive variant of a well known
law for while loops:
while p do q;
===
if p then do q while p;
Wouldn't this actually subsume the liberate case rule?
f = \ t -> case v of
V a b -> a : f t
=== {pull out the case}
f = case v of
V a b -> f = \ t -> a : case v of
V a b -> f t
=== {simplification}
f = case v of
V a b -> f = \ t -> a : f t
This might be more complicated to implement, as we only
partially unfold the recursive function, but it also has
more scope.
What do you think?
Cheers,
Manuel
PS: All Core code was generated with the HEAD from two days ago.