[Haskell-cafe] RE: Optimising UTF8-CString -> String marshaling, plus comments on withCStringLen/peekCStringLen

Tue Jul 24 04:39:32 EDT 2007

> From: Simon Marlow [mailto:simonmarhaskell at gmail.com] 
> >   http://darcs.haskell.org/takusen/Foreign/C/UTF8.hs
> 
> In that code you have:
> 
>        | x <= 0x0010FFFF   -- should be 0x001FFFFF
> 
> I wasn't aware that the largest unicode code point had changed.  Do you
> have a reference?  Should we change the range of Char in GHC?

No, that's merely a bad comment. I think it was meant to refer to the
fact that the UTF8 encoding will permit codepoints up to 0x001FFFFF with
4 bytes, so if the decoder were to handle the full UTF8 range (up to 6
bytes) then this test would read:

>        | x <= 0x001FFFFF
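
To make the boundaries concrete, here is a minimal sketch of the length
classification, assuming the original 6-byte UTF8 layout. The names
utf8Length and utf8LengthChar are hypothetical and are not taken from
Foreign/C/UTF8.hs; surrogates and other validity checks are ignored.

```haskell
import Data.Char (ord)

-- Bytes needed to encode a value under the original 6-byte UTF8 scheme.
-- Hypothetical helper for illustration only.
utf8Length :: Int -> Maybe Int
utf8Length x
  | x < 0           = Nothing
  | x <= 0x0000007F = Just 1   -- 7 payload bits
  | x <= 0x000007FF = Just 2   -- 11 payload bits
  | x <= 0x0000FFFF = Just 3   -- 16 payload bits
  | x <= 0x001FFFFF = Just 4   -- 21 payload bits; Unicode stops at 0x0010FFFF
  | x <= 0x03FFFFFF = Just 5   -- 26 payload bits
  | x <= 0x7FFFFFFF = Just 6   -- 31 payload bits
  | otherwise       = Nothing

-- The same classification restricted to Char. maxBound :: Char is
-- '\x10FFFF', so no Char ever needs more than 4 bytes.
utf8LengthChar :: Char -> Int
utf8LengthChar c
  | x <= 0x7F   = 1
  | x <= 0x7FF  = 2
  | x <= 0xFFFF = 3
  | otherwise   = 4
  where
    x = ord c
```

Since Char tops out at 0x0010FFFF, a decoder that produces a String can
stop at the 4-byte case and test against 0x0010FFFF, which is what the
existing guard does.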

Alistair