UTF-8 library

Axel Simon A.Simon@ukc.ac.uk
Tue, 6 Aug 2002 17:51:23 +0100


On Tue, Aug 06, 2002 at 06:11:04PM +0200, George Russell wrote:
[snip]
> Converting CStrings to [Word8] is probably a bad idea anyway, since there is
> absolutely no reason to assume a C character will be only 8 bits long, and
> under some implementations it isn't. 
But the interface should be practical. I do not really want to write
Haskell programs for architectures where the smallest addressable memory
entity (i.e. C's char) is something other than 8 bits.

> A better suggestion would be to provide ALTERNATIVE functions which
> go from CString/CStringLen and friends to [CChar], and make your UTF8
> converters go between [CChar] and String.  However we should not be forced
> to do this every time we want to construct a CString from a String (a very
> common need when calling C functions) so the existing functions should remain
> with their existing semantics.
But converting CChar to Char means you are assuming that the C string is
ISO-8859-1, i.e. the first 256 code points of Unicode. I guess this
should be made explicit during conversion.
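
A minimal sketch of what "explicit" could look like, assuming an 8-bit C
char. The names latin1ToString and stringToLatin1 are hypothetical and not
part of any existing library; the point is only that the ISO-8859-1
assumption shows up in the function name, and that the reverse direction
can fail for characters outside that range.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import Foreign.C.Types (CChar)

-- Hypothetical name: read each C char as one ISO-8859-1 code point.
-- Going through Word8 maps a possibly signed CChar into 0..255
-- (assumes an 8-bit C char).
latin1ToString :: [CChar] -> String
latin1ToString = map (chr . fromIntegral . toByte)
  where
    toByte :: CChar -> Word8
    toByte = fromIntegral

-- Hypothetical inverse: returns Nothing if any character lies outside
-- ISO-8859-1, instead of silently truncating it.
stringToLatin1 :: String -> Maybe [CChar]
stringToLatin1 = mapM encode
  where
    encode c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise   = Nothing
```

With this shape a caller cannot convert a String containing, say, a Euro
sign without noticing that ISO-8859-1 has no representation for it.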

Axel.