[Haskell-cafe] Optimising UTF8-CString -> String marshaling, plus comments on withCStringLen/peekCStringLen

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Mon Jun 4 08:33:45 EDT 2007


On Mon, 2007-06-04 at 13:12 +0100, Alistair Bayley wrote:

> > > BTW, what's the difference between the indexXxxxOffAddr# and
> > > readXxxxOffAddr# functions in GHC.Prim?
> >
> > Right. So it'd only be safe to use the index ones on immutable arrays
> > because there's no way to enforce sequencing with respect to array
> > writes when using the index version.
> 
> In this case I'm reading from a CString buffer, which is (hopefully)
> not changing during the function invocation, and never written to by
> my code. So presumably it'd be pretty safe to use the index-
> functions.

Yes.
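
To make the distinction concrete, here is a minimal sketch of the two
flavours side by side. The names pureByteAt and ioByteAt are made up for
illustration and are not from Alistair's code. The index version is a pure
function, so nothing orders it relative to writes; the read version threads
the State# token, so it is sequenced like any other IO action.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Addr#, Int (I#), Ptr (Ptr), indexWord8OffAddr#, readWord8OffAddr#)
import GHC.IO (IO (IO))
import GHC.Word (Word8 (W8#))

-- Pure read: no state token is involved, so GHC may float or reorder it.
-- Only safe when the memory behind the address never changes.
-- (Hypothetical helper name.)
pureByteAt :: Addr# -> Int -> Word8
pureByteAt addr (I# i) = W8# (indexWord8OffAddr# addr i)

-- Sequenced read: takes and returns the State# token, so it is ordered
-- with respect to any writes performed in the same IO computation.
-- (Hypothetical helper name.)
ioByteAt :: Ptr Word8 -> Int -> IO Word8
ioByteAt (Ptr addr) (I# i) = IO (\s ->
  case readWord8OffAddr# addr i s of
    (# s', w #) -> (# s', W8# w #))
```

With a static literal such as "AB"# (immutable by construction) both give
the same bytes; the difference only matters once something can write to
the buffer.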

> > >  - Ptrs don't get unboxed. Why is this? Some IO monad thing?
> >
> > Got any more detail?
> 
> OK. readUTF8Char's transformation starts with this:
> 
> $wreadUTF8Char_r3de =
>   \ (ww_s33v :: GHC.Prim.Int#) (w_s33x :: GHC.Ptr.Ptr GHC.Word.Word8) ->
> 
> If we expect it to unbox, I'd expect the Ptr to become Addr#. Later,
> this (w_s33x) gets unboxed just before it's used:
> 
>       case w_s33x of wild6_a2JM { GHC.Ptr.Ptr a_a2JO ->
>       case GHC.Prim.readWord8OffAddr# @ GHC.Prim.RealWorld a_a2JO 1 s_a2Jf
> 
> readUTF8Char is called by fromUTF8Ptr, where there's a little Ptr
> arithmetic. The Ptr argument to fromUTF8Ptr is unboxed, offset is
> added, and the result is reboxed so that it can be consumed by
> readUTF8Char. All a bit unnecessary, I think e.g.

Are you sure fromUTF8Ptr is strict in its ptr arg? Try with a ! pattern
on that arg. You'll need -fbang-patterns. That translates into the seq
False trick that you're already using elsewhere. Experimenting by
adding ! patterns is much quicker and easier, however. Once you've got
the right set of strictness annotations you can go back to using the
more portable, but ugly, seq False trick.
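
For reference, a small sketch of the two styles next to each other. I don't
have the real fromUTF8Ptr in front of me, so this uses a made-up byte-summing
loop (sumBytesBang, sumBytesSeq and checkBoth are all hypothetical names);
the point is only the shape of the strictness annotations. The BangPatterns
pragma is the per-module spelling of -fbang-patterns in later GHCs.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peek)

-- Bang-pattern version: every argument is forced on entry, so the
-- strictness analyser can see the Ptr and the counters are strict.
sumBytesBang :: Ptr Word8 -> Int -> IO Int
sumBytesBang !p !n = go p n 0
  where
    go :: Ptr Word8 -> Int -> Int -> IO Int
    go !q !k !acc
      | k <= 0    = return acc
      | otherwise = do
          b <- peek q
          go (q `plusPtr` 1) (k - 1) (acc + fromIntegral b)

-- Portable version: the first clause never matches (the guard is always
-- False), but evaluating the guard forces the arguments via seq.
sumBytesSeq :: Ptr Word8 -> Int -> IO Int
sumBytesSeq p n | p `seq` n `seq` False = undefined
sumBytesSeq p n = go p n 0
  where
    go :: Ptr Word8 -> Int -> Int -> IO Int
    go q k acc | q `seq` k `seq` acc `seq` False = undefined
    go q k acc
      | k <= 0    = return acc
      | otherwise = do
          b <- peek q
          go (q `plusPtr` 1) (k - 1) (acc + fromIntegral b)

-- Marshal a list into a temporary buffer and run both variants over it.
checkBoth :: [Word8] -> IO (Int, Int)
checkBoth bytes = withArrayLen bytes $ \len p -> do
  a <- sumBytesBang p len
  b <- sumBytesSeq p len
  return (a, b)
```

Both variants should compute the same result; whether the Ptr actually ends
up unboxed in the worker is something to confirm in the Core output.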

You can also get ghc to tell you what strictness it inferred for your
functions. It's shown in the .hi file. Use ghc --show-iface UTF8.hi. I
think the "UL" syntax for describing the strictness is described in the
GHC manual somewhere (or perhaps it's on the GHC wiki).

Duncan


