Recursive functions and constant parameter closures (inlining/strictness analyzer question)

Mon Jun 23 14:29:00 EDT 2008

On Monday 23 June 2008, Isaac Dupree wrote:
> there's no chance for the lower-level near code generation to
> reverse-SAT to eliminate the heap usage? (which would obviously be a
> different optimization that might be useful in other ways too, if it
> could be made to work) (did someone say that -fvia-C with gcc -O3
> managed to do that sometimes?)

I said something similar, but I don't actually know the details. What I have 
is a SAT and non-SAT version of a benchmark:

---- SAT ----

bench :: Int -> Int -> ST s ()
bench (I# k) (I# n) = ST go
 where
 go s = case sizeOf (0 :: Int)        of { I# w ->
        case newByteArray# (n *# w) s of { (# s, arr #) ->
        case fill arr n s             of { s ->
        go' arr k s } } }
 go' arr = go''
  where
  go'' 0# s = (# s, () #)
  go'' k  s = case reverse arr 0# (n -# 1#) s of { s ->
              go'' (k -# 1#) s }

reverse :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s
reverse arr = go
 where
 go i j s
   | i <# j    = case readIntArray#  arr i    s of { (# s, ei #) ->
                 case readIntArray#  arr j    s of { (# s, ej #) ->
                 case writeIntArray# arr j ei s of { s ->
                 case writeIntArray# arr i ej s of { s ->
                 go (i +# 1#) (j -# 1#) s } } } }
   | otherwise = s

---- non-SAT ----

bench :: Int -> Int -> ST s ()
bench (I# k) (I# n) = ST go
 where
 go s = case sizeOf (0 :: Int)        of { I# w ->
        case newByteArray# (n *# w) s of { (# s, arr #) ->
        case fill arr n s             of { s ->
        go' arr k s } } }
 go' arr 0# s = (# s, () #)
 go' arr k  s = case reverse arr 0# (n -# 1#) s of { s ->
                go' arr (k -# 1#) s }

reverse :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s
reverse arr i j s
  | i <# j    = case readIntArray#  arr i    s of { (# s, ei #) ->
                case readIntArray#  arr j    s of { (# s, ej #) ->
                case writeIntArray# arr j ei s of { s ->
                case writeIntArray# arr i ej s of { s ->
                reverse arr (i +# 1#) (j -# 1#) s } } } }
  | otherwise = s

In the SAT version, the native code generator is noticeably slower 
than -fvia-c -optc-O3, so the latter is doing something right. In the non-SAT 
version, -fvia-c is somewhat faster than the SAT version, but the NCG is just 
as fast. I don't know what's going on on the C end in the first case that 
makes it better, though (heap allocation is the same (low) in all cases; 
via-c doesn't fix large heap allocation when it happens in my experience).

As an aside, I finally got around to compiling 6.9 from a day or so ago, and 
noticed that one of the functions I'd mentioned earlier does get SATed to the 
bad version (which causes lots of heap allocation in the program) because it 
has two static arguments. However, marking it NOINLINE seemed to keep it from 
having the negative effects. Does that pragma keep SAT from doing its thing, 
and if so, will that continue to be viable in cases where we decide we know 
better than GHC?

Cheers,
-- Dan