[Haskell-cafe] Optimising UTF8-CString -> String marshaling, plus comments on withCStringLen/peekCStringLen

Stefan O'Rear stefanor at cox.net
Mon Jul 23 01:05:55 EDT 2007


On Mon, Jun 04, 2007 at 09:43:32AM +0100, Alistair Bayley wrote:
> (The docs tell me that using GHC.Exts is the "approved" way of
> accessing GHC-specific extensions, but all of the useful stuff seems
> to be in GHC.Prim.)

All of the useful stuff *is* exported from GHC.Exts; it even says so in the Haddock:

   Synopsis
   ...
   module GHC.Prim

That is, GHC.Exts re-exports everything GHC.Prim does, using the standard
H98 re-export syntax.  Besides, user code can't import GHC.Prim at all in
GHC HEADs from the last couple of months (arguably a bug, but it only
breaks bad code, so...)
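
As a rough sketch of what that looks like in practice, here is the
unchecked shift you asked about, reached through GHC.Exts alone.  The
wrapper name shiftLUnchecked is made up for illustration; only
uncheckedShiftL# and the W#/I# constructors are real GHC.Exts exports.
Like the primop itself, the wrapper is undefined for shift amounts
outside 0 .. word size - 1.

```haskell
{-# LANGUAGE MagicHash #-}

-- Everything comes from GHC.Exts; GHC.Prim is never imported directly.
import GHC.Exts (Int (I#), Word (W#), uncheckedShiftL#)

-- Hypothetical wrapper (not a library function): a left shift with no
-- range check on the shift amount.  The result is undefined if n is
-- negative or at least the word size.
shiftLUnchecked :: Word -> Int -> Word
shiftLUnchecked (W# w) (I# n) = W# (w `uncheckedShiftL#` n)

main :: IO ()
main = print (shiftLUnchecked 1 4)
```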

> Some things I've noticed in the simplifier output:
> - the shiftL call hasn't unboxed or inlined into a call to
>   uncheckedShiftL#, which I would prefer.
>   Would this be possible if we added unchecked versions of
>   the shiftL/R functions to Data.Bits?
> - Ptrs don't get unboxed. Why is this? Some IO monad thing?

fromUTF8Ptr unboxes fine for me with HEAD and 6.6.1.

> - the chr function tests that its Int argument is less than 1114111,
>   before constructing the Char. It'd be nice to avoid this test.

You want unsafeChr from the (undocumented) GHC.Base module.
http://darcs.haskell.org/ghc-6.6/packages/base/GHC/Base.lhs for
reference (but don't copy the file, it's already an importable module).
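
A minimal sketch of the difference, assuming your GHC.Base still exports
unsafeChr (it is an internal module, so that can change between
releases).  The helper decodeAscii is invented for the example; it is
only safe because the caller guarantees the value is a valid code point.

```haskell
import Data.Char (chr)
import Data.Word (Word8)
import GHC.Base (unsafeChr)

-- Hypothetical helper: every Word8 is below 0x110000, so the range
-- check performed by chr is redundant here and unsafeChr can skip it.
decodeAscii :: Word8 -> Char
decodeAscii = unsafeChr . fromIntegral

main :: IO ()
main = do
  -- For in-range arguments the two functions agree.
  print (unsafeChr 0x41 == chr 0x41)
  print (decodeAscii 0x7A)
```

Feeding unsafeChr a value that is not a valid code point produces a
bogus Char with no error, so the check has to be done (or be
unnecessary) somewhere else.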

> - why does this code:
>
>      | x <= 0xF7 = remaining 3 (bAND x 0x07) xs
>      | otherwise = err x
>
>   turn into this
>   i.e. the <= turns into two identical case-branches, using eqword#
>   and ltword#, rather than one case-branch using leword# ?
>
>  case GHC.Prim.eqWord# a11_a2PJ __word 247 of wild25_X2SU {
>    GHC.Base.False ->
>      case GHC.Prim.ltWord# a11_a2PJ __word 247 of wild6_Xcw {
>        GHC.Base.False -> <error call>
>        GHC.Base.True ->
>          $wremaining_r3dD
>            3
>            (__scc {fromUTF8 main:Foreign.C.UTF8 !}
>             GHC.Base.I# (GHC.Prim.word2Int# (GHC.Prim.and# a11_a2PJ __word 7)))
>            xs_aVm
>      };
>    GHC.Base.True ->
>      $wremaining_r3dD
>        3
>        (__scc {fromUTF8 main:Foreign.C.UTF8 !}
>         GHC.Base.I# (GHC.Prim.word2Int# (GHC.Prim.and# a11_a2PJ __word 7)))
>        xs_aVm
>  };

ISTR seeing a bug report about this a while back; we know it's dumb.
You could probably use x < 0xF8 instead, which means the same thing for
an integral x.
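
Something along these lines, as a sketch only.  I don't have your full
fromUTF8, so leadByte below is a made-up stand-in for the guard chain:
it returns the number of continuation bytes and the payload bits
instead of calling your remaining/err.  Whether the strict comparison
really produces a single ltWord# test is something to confirm in the
simplifier output.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical stand-in for the lead-byte dispatch in fromUTF8.
-- Each guard uses a strict < against the next boundary, so
-- x < 0xF8 plays the role of the original x <= 0xF7.
-- Like the original guard, this accepts lead bytes up to 0xF7 and does
-- not reject overlong or out-of-range encodings.
leadByte :: Word8 -> Maybe (Int, Word8)
leadByte x
  | x < 0x80  = Just (0, x)              -- single-byte (ASCII)
  | x < 0xC0  = Nothing                  -- stray continuation byte
  | x < 0xE0  = Just (1, x .&. 0x1F)     -- two-byte sequence
  | x < 0xF0  = Just (2, x .&. 0x0F)     -- three-byte sequence
  | x < 0xF8  = Just (3, x .&. 0x07)     -- four-byte sequence
  | otherwise = Nothing                  -- not a valid lead byte

main :: IO ()
main = mapM_ (print . leadByte) [0x41, 0xC3, 0xE2, 0xF7, 0xF8]
```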

> BTW, what's the difference between the indexXxxxOffAddr# and
> readXxxxOffAddr# functions in GHC.Prim? AFAICT they are equivalent,
> except that the read* functions take an extra State# s parameter.
> Presumably this is to thread the IO monad's RealWorld value through,
> to create some sort of data dependency between the functions (and so
> to ensure ordered evaluation?)

Exactly.  readFoo is sequenced by the state token and won't be
reordered; indexFoo is treated as pure and may be, which matters when
doing reads and writes at addresses that might alias.
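
A small sketch of the two flavours side by side, using the Char
variants.  The names pureCharAt, ioCharAt and demoPure are invented for
the example.  The address here is a string literal, which never
changes, so the pure version is fine; for a buffer that something else
may write to, you want the read version.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Addr#, Char (C#), Int (I#), indexCharOffAddr#, readCharOffAddr#)
import GHC.IO (IO (IO))

-- Pure read: no state token, so GHC may float or reorder it freely.
-- Only appropriate when the memory is known not to change.
pureCharAt :: Addr# -> Int -> Char
pureCharAt addr (I# i) = C# (indexCharOffAddr# addr i)

-- Sequenced read: threads the State# token, so it stays ordered
-- relative to the other IO actions around it.
ioCharAt :: Addr# -> Int -> IO Char
ioCharAt addr (I# i) = IO (\s ->
  case readCharOffAddr# addr i s of
    (# s', c #) -> (# s', C# c #))

-- Example value read from an immutable string literal.
demoPure :: Char
demoPure = pureCharAt "abc"# 1

main :: IO ()
main = do
  print demoPure
  c <- ioCharAt "abc"# 2
  print c
```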

Stefan