[Haskell-cafe] Re: Optimising UTF8-CString -> String marshaling, plus comments on withCStringLen/peekCStringLen

Simon Marlow simonmarhaskell at gmail.com
Tue Jul 24 04:26:09 EDT 2007


Bayley, Alistair wrote:
>> From: haskell-cafe-bounces at haskell.org 
>> [mailto:haskell-cafe-bounces at haskell.org] On Behalf Of Stefan O'Rear
>>
>> fromUTF8Ptr unboxes fine for me with HEAD and 6.6.1.
>>
>>> - the chr function tests that its Int argument is less than 1114111,
>>>   before constructing the Char. It'd be nice to avoid this test.
>> You want unsafeChr from the (undocumented) GHC.Base module.
>> http://darcs.haskell.org/ghc-6.6/packages/base/GHC/Base.lhs for
>> reference (but don't copy the file, it's already an 
>> importable module).
> 
>> <odd duplicated simplifier output>
>> ISTR seeing a bug report about this a while back, we know it's dumb.
>> You could probably use x < 0xF8 instead.
> 
> FWIW,
> 
> I've optimised this to a point where I'm happy with it, and you can see
> the results here:
>   http://darcs.haskell.org/takusen/Foreign/C/UTF8.hs

In that code you have:

       | x <= 0x0010FFFF   -- should be 0x001FFFFF

I wasn't aware that the largest Unicode code point had changed.  Do you
have a reference?  Should we change the range of Char in GHC?
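
To make the two numbers concrete, here is a minimal sketch. It assumes the 0x001FFFFF figure comes from what a 4-byte UTF-8 sequence can hold bit-wise (21 payload bits), whereas Unicode itself stops at 0x10FFFF, which is also maxBound :: Char in GHC. The names maxFourByteValue, maxCodePoint and safeChr are made up for illustration and are not taken from the Takusen module.

```haskell
import Data.Char (chr, ord)

-- Largest value a 4-byte UTF-8 sequence can hold bit-wise:
-- 3 payload bits in the lead byte plus 3 * 6 continuation bits = 21 bits.
-- (Illustrative name; an assumption about where 0x001FFFFF comes from.)
maxFourByteValue :: Int
maxFourByteValue = 0x1FFFFF

-- Largest Unicode code point, read off GHC's own Char range.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- Hypothetical helper: build a Char only when the value is a valid
-- code point, so a decoded value above 0x10FFFF is rejected rather
-- than passed to chr (which would raise an error).
safeChr :: Int -> Maybe Char
safeChr n
  | n >= 0 && n <= maxCodePoint = Just (chr n)
  | otherwise                   = Nothing
```

Under that reading, the 4-byte form can represent values that are not code points at all, so a decoder guarding with x <= 0x0010FFFF is checking the Unicode limit rather than the encoding's bit capacity.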

Cheers,
	Simon

