debugging why we end up calling the wrapper rather than the worker

Mon Jun 4 09:50:12 EDT 2007

On Mon, 2007-06-04 at 14:01 +0100, Simon Peyton-Jones wrote:
> | But that allows it to be inlined in phase 0, and that's exactly what I
> | don't want. I really do not want this function inlined, I want it to be
> | a join point.
> 
> Remind me why you really don't want it inlined, ever?  Even if it's small etc.

I'm partitioning a fast path and a slow path. I don't want an extra copy
of the code hanging around just because of the slow path. So I want both
the fast and slow paths to share a single copy of the object code for
doing the writing to memory. Then the fast path is just a jump to this
code, and the slow path calls a bunch of other out of line functions to
fix things up before jumping.

So there is no advantage to inlining here, except call overhead, but
that should be low too. In fact it's not impossible to imagine that
the jump to the function could be combined with the conditional test &
jump, rather than it being a conditional test & jump followed by an
unconditional jump in the fast path.

The code was:

write :: Int -> (Ptr Word8 -> IO ()) -> Put ()
write !n body = Put $ \c buf@(Buffer fp o u l) ->
  if n <= l
    then write' c fp o u l  --fast path
    else write' (flushOld c n fp o u) (newBuffer c n) 0 0 0

  where {-# NOINLINE write' #-}
        write' c !fp !o !u !l =
          -- warning: this is a tad hardcore
          B.inlinePerformIO
            (withForeignPtr fp
              (\p -> body $! (p `plusPtr` (o+u))))
          `seq` c () (Buffer fp o (u+n) (l-n))

where 'body' is an IO function that writes half a dozen bytes into a
memory block.
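
For anyone who wants to play with the shape of this outside the binary
library, here is a minimal self-contained sketch of the same pattern. It is
not the real code: Buf, chunkSize, reserve and commit are made-up names, the
buffer is reduced to two counters, and the "flush" is just a counter bump.
The point is only that both branches end in a call to one local function
marked NOINLINE. Whether GHC then compiles that call to a plain jump is up
to the optimiser, so check the Core (-ddump-simpl) rather than assuming it.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Hypothetical stand-in for the real Buffer: bytes used and bytes left.
data Buf = Buf { used :: !Int, left :: !Int } deriving (Eq, Show)

-- Hypothetical size of a fresh buffer.
chunkSize :: Int
chunkSize = 16

-- | Reserve n bytes, threading a count of how many flushes happened.
-- Both branches finish by calling the same local function 'commit'.
reserve :: Int -> (Int, Buf) -> (Int, Buf)
reserve !n (flushes, buf)
  | n <= left buf = commit flushes buf                  -- fast path
  | otherwise     = commit (flushes + 1)                -- slow path: "flush",
                           (Buf 0 (max n chunkSize))    -- then use a new buffer
  where
    -- Ask GHC to keep a single shared copy instead of one per branch.
    {-# NOINLINE commit #-}
    commit !f (Buf u l) = (f, Buf (u + n) (l - n))

main :: IO ()
main = print (foldl (flip reserve) (0, Buf 0 chunkSize) [6, 6, 6])
```

With three 6-byte reservations into a 16-byte buffer, the first two take the
fast path and the third takes the slow path, so the flush count ends at 1.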

Duncan