Recursive functions and constant parameter closures
(inlining/strictness analyzer question)
Dan Doel
dan.doel at gmail.com
Mon Jun 23 14:29:00 EDT 2008
On Monday 23 June 2008, Isaac Dupree wrote:
> there's no chance for the lower-level near code generation to
> reverse-SAT to eliminate the heap usage? (which would obviously be a
> different optimization that might be useful in other ways too, if it
> could be made to work) (did someone say that -fvia-C with gcc -O3
> managed to do that sometimes?)
I said something similar, but I don't actually know the details. What I have
is a SAT and non-SAT version of a benchmark:
---- SAT ----
bench :: Int -> Int -> ST s ()
bench (I# k) (I# n) = ST go
where
go s = case sizeOf (0 :: Int) of { I# w ->
case newByteArray# (n *# w) s of { (# s, arr #) ->
case fill arr n s of { s ->
go' arr k s } } }
go' arr = go''
where
go'' 0# s = (# s, () #)
go'' k s = case reverse arr 0# (n -# 1#) s of { s ->
go'' (k -# 1#) s }
reverse :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s
reverse arr = go
where
go i j s
| i <# j = case readIntArray# arr i s of { (# s, ei #) ->
case readIntArray# arr j s of { (# s, ej #) ->
case writeIntArray# arr j ei s of { s ->
case writeIntArray# arr i ej s of { s ->
go (i +# 1#) (j -# 1#) s } } } }
| otherwise = s
---- non-SAT ----
bench :: Int -> Int -> ST s ()
bench (I# k) (I# n) = ST go
where
go s = case sizeOf (0 :: Int) of { I# w ->
case newByteArray# (n *# w) s of { (# s, arr #) ->
case fill arr n s of { s ->
go' arr k s } } }
go' arr 0# s = (# s, () #)
go' arr k s = case reverse arr 0# (n -# 1#) s of { s ->
go' arr (k -# 1#) s }
reverse :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s
reverse arr i j s
| i <# j = case readIntArray# arr i s of { (# s, ei #) ->
case readIntArray# arr j s of { (# s, ej #) ->
case writeIntArray# arr j ei s of { s ->
case writeIntArray# arr i ej s of { s ->
reverse arr (i +# 1#) (j -# 1#) s } } } }
| otherwise = s
In the SAT version, the native code generator is noticeably slower
than -fvia-c -optc-O3, so the latter is doing something right. In the non-SAT
version, -fvia-c is somewhat faster than the SAT version, but the NCG is just
as fast. I don't know what's going on on the C end in the first case that
makes it better, though (heap allocation is the same (low) in all cases;
via-c doesn't fix large heap allocation when it happens in my experience).
As an aside, I finally got around to compiling 6.9 from a day or so ago, and
noticed that one of the functions I'd mentioned earlier does get SATed to the
bad version (which causes lots of heap allocation in the program) because it
has two static arguments. However, marking it NOINLINE seemed to keep it from
having the negative effects. Does that pragma keep SAT from doing its thing,
and if so, will that continue to be viable in cases where we decide we know
better than GHC?
Cheers,
-- Dan
More information about the Glasgow-haskell-users
mailing list