Avoiding CAFs

Fri May 18 09:38:34 EDT 2007

Hi Ian and Simon,

> Ian said:
> Does the boxing not get optimised out?
> Is the FFI imported function exported from the module?

http://hpaste.org/1882 (replicated at the end of this message in case
the hpaste is not around forever, but clearly without the layout and
syntax colouring)

That's the main branch, which is the bit I want to make go faster, if
at all possible. The FFI call is not exported; I have module
Main(main) at the top. From what I can see, the function is being
called, then:

    case Main.$wccall GHC.Prim.realWorld# of wild_X28 { (# ds_d2ad, ds1_d2ac #) ->

i.e. it has had an artificial box put around the answer. It may be
impossible to eliminate this, but if it is possible, I'd like to try.
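
For reference, the `(# ds_d2ad, ds1_d2ac #)` pattern in the Core above
has the shape of GHC's representation of IO, where an action is a
function from a state token to an unboxed pair of a new token and the
result. The following is only a minimal sketch of that shape, not the
code from the hpaste; `mapResult` is a hypothetical helper invented for
illustration, and it assumes the `IO` constructor is imported from
GHC.Base.

```haskell
{-# LANGUAGE UnboxedTuples #-}
module Main (main) where

import GHC.Base (IO (..))

-- Hypothetical helper: apply a pure function to the result of an IO
-- action, written directly against the state-token representation.
mapResult :: (a -> b) -> IO a -> IO b
mapResult f (IO step) = IO (\s0 ->
  case step s0 of
    -- s1 is the new state token, x is the result of the action
    (# s1, x #) -> (# s1, f x #))

main :: IO ()
main = do
  n <- mapResult (+ 1) (return (41 :: Int))
  print n
```

The case expression here has the same form as the one on `$wccall`: the
first component is the state token and the second is the value returned
by the call.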

The motivation for all this is:
http://neilmitchell.blogspot.com/2007/05/13-faster-than-ghc.html

> Simon said:
> That is indeed scary.  Would you like to give a small example of such a program?

From the above example, you can note that the first argument to
Main.$sprelude_942_ll107 is an Int (v2_aVr), which is entirely ignored
on the recursive branch, and then on the terminating branch is case'd
in a pointless way (this case comes from a seq). If this parameter
could be removed, I suspect a speedup would result.

The reason this parameter is introduced comes from the code:

overlay_get_char h   = inlinePerformIO (getCharIO h)

foreign import ccall unsafe "stdio.h getchar" getchar :: IO CInt

{-# NOINLINE getCharIO #-}
getCharIO h = do
   c <- getchar
   return $ if c == (-1) then h `seq` (-1) else fromIntegral c

I have artificially threaded h through getCharIO, and deliberately
added a pointless seq, to ensure that the definition inside is not
floated up. If I remove the h `seq`, GHC removes the argument from
overlay_get_char, which turns it into a CAF, which in turn breaks the
required semantics.
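
The difference between the CAF and the argument-taking version can be
sketched in isolation. This is a hedged illustration rather than the
benchmark code: it swaps inlinePerformIO for unsafePerformIO from
System.IO.Unsafe, replaces getchar with a global counter so the run is
deterministic, and the names `counter`, `nextValue`, `readAsCaf` and
`readWithArg` are all hypothetical.

```haskell
module Main (main) where

import Data.IORef (IORef, atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical stand-in for stdin: a global counter.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- Stand-in for getchar: yields 1, 2, 3, ... on successive runs.
nextValue :: IO Int
nextValue = atomicModifyIORef' counter (\n -> (n + 1, n + 1))

-- No argument, so this is a CAF: the action runs at most once and the
-- result is shared by every use.
{-# NOINLINE readAsCaf #-}
readAsCaf :: Int
readAsCaf = unsafePerformIO nextValue

-- A dummy argument is threaded into the action with seq, mirroring
-- getCharIO above, so the body depends on h and cannot become a CAF.
{-# NOINLINE readWithArg #-}
readWithArg :: Int -> Int
readWithArg h = unsafePerformIO (h `seq` nextValue)

main :: IO ()
main = do
  -- Both components are the same value, because the CAF is shared.
  print (readAsCaf, readAsCaf)
  -- These may differ; the exact result depends on GHC version and flags.
  print (readWithArg 0, readWithArg 1)
```

Whether the second pair actually contains two distinct values is not
guaranteed, which is the same caveat that applies to the trick in the
message itself.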

I realise all of this trickery is against the spirit of a pure
functional language, and is making assumptions that are not required
to remain true. Right now I just want the fastest possible benchmarks
though.

Thanks

Neil