Recursive functions and constant parameter closures
(inlining/strictness analyzer question)
Dan Doel
dan.doel at gmail.com
Fri Jun 20 17:04:46 EDT 2008
On Friday 30 May 2008, Duncan Coutts wrote:
> This is for two reasons. One is because your second foldl' is directly
> recursive so does not get inlined. The static argument transformation it
> what you're doing manually to turn the latter into the former. The SAT
> is implemented in ghc 6.9 (though it seems to be having integration
> problems).
Apologies for replying to this thread after around a month, but I've been
looking at performance of code with and without this transformation a bit
lately, and I'm rather curious what is being planned to go on in GHC 6.10
with regard to it.
Specifically, will GHC simply always perform the static argument transform, or
will it have some kind of heuristic to decide when it's useful? It seems,
according to some tests I've done, that it's not always a win, and is
sometimes a loss (and not just with regard to code duplication). I've been
hard pressed to come up with a definite rule for when it's a benefit, other
than simply testing and seeing what works.
For instance, consider some code from a recent benchmark I posted (I'm running
this all under 6.8.2):
bench :: Int -> Int -> ST s ()
bench (I# k) (I# n) = ST go
where
go s = case sizeOf (0 :: Int) of { I# w ->
case newByteArray# (n *# w) s of { (# s, arr #) ->
case fill arr n s of { s ->
go' arr k s } } }
go' arr 0# s = (# s, () #)
go' arr k s = case reverse arr 0# (n -# 1#) s of { s ->
go' arr (k -# 1#) s }
reverse :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s
reverse arr i j s
| i <# j = case readIntArray# arr i s of { (# s, ei #) ->
case readIntArray# arr j s of { (# s, ej #) ->
case writeIntArray# arr j ei s of { s ->
case writeIntArray# arr i ej s of { s ->
reverse arr (i +# 1#) (j -# 1#) s } } } }
| otherwise = s
The above is without the SAT on 'arr', obviously. This code runs fast. Doing
SAT on reverse, but not on go' causes lots of heap allocation (for high
iteration counts), slowing things way down. Doing it on go' but not reverse
yields results about the same as not doing it at all. Doing it on both go'
*and* reverse eliminates the heap allocation of the first case, *but* it
results in overall slower code (which may, in fact, invalidate my bug about
MBA# performance; I'll have to check. Just goes to show, even the most
straightforward benchmark may not be measuring what you intend to measure).
Another example is this function from the MBA# version of the fannkuch
benchmark:
shift :: MutableByteArray# s -> Int# -> State# s -> State# s
shift arr r s = case readIntArray# arr 0# s of { (# s, p0 #) ->
case go arr 0# r s of { s ->
writeIntArray# arr r p0 s } }
where
go arr i r s
| i <# r = case readIntArray# arr (i +# 1#) s of { (# s, e #) ->
case writeIntArray# arr i e s of { s ->
go arr (i +# 1#) r s } }
| otherwise = s
{-# INLINE shift #-}
Now, obviously, arr and r are static arguments for go (already in scope, no
less), so one would likely write go without them (it's certainly what I did
initially). However, eliminating either one of them causes heap allocation
and, thus, slowdowns (eliminating both is even worse).
So, I suppose my question is: will GHC be better at this than I am? :) Will it
know that performing the transform on shift above would cause extra heap
allocation/slowdowns? Will it know that transforming reverse (and go') is
slower than not doing so (I suppose it may be suspicious that this is all
low-level code, but I've tried fiddling with higher-level code that I know
gets compiled to this sort of code, and the above results seemed to hold)?
I'm also not yet skilled enough at reading assembly to figure out what exactly
is causing all this. Are values getting kicked out of registers or some such?
Cheers,
-- Dan
More information about the Glasgow-haskell-users
mailing list