UTF-8 library

George Russell ger@tzi.de
Wed, 07 Aug 2002 15:29:33 +0200


Ashley Yakeley wrote:
[snip]
> Text encoded with ISO 8859-1 or UTF-8 is octets. If you want to use 
> CChars, you should then subsequently convert the Word8s into CChars.
We were talking about converting CStrings, which are necessarily sequences
of CChars.  I have to say I do not relish the prospect of replacing the
current peekCString interface with three functions which
(1) translate a CString/CStringLen into [CChar]
(2) translate a [CChar] into a [Word8]
(3) translate a [Word8] into a String
(and of course the inverse functions to go in the other direction.)
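
To make the three steps concrete, here is a rough sketch of what reading a
CString that way might look like. The names cStringToCChars, cCharsToOctets,
octetsToString and peekCStringInSteps are hypothetical, not part of any
proposed or existing library, and step (3) simply assumes ISO 8859-1 (one
octet per character); a UTF-8 decoder would replace it.

```haskell
import Data.Char (chr)
import Data.Word (Word8)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray0)

-- (1) CString -> [CChar]: read elements up to (not including) the NUL.
cStringToCChars :: CString -> IO [CChar]
cStringToCChars = peekArray0 0

-- (2) [CChar] -> [Word8]: reinterpret each C char as an octet.
cCharsToOctets :: [CChar] -> [Word8]
cCharsToOctets = map fromIntegral

-- (3) [Word8] -> String: ASSUMPTION: ISO 8859-1, so each octet is
-- exactly one code point. A UTF-8 decoder would go here instead.
octetsToString :: [Word8] -> String
octetsToString = map (chr . fromIntegral)

-- All three steps chained together (hypothetical helper).
peekCStringInSteps :: CString -> IO String
peekCStringInSteps p =
  fmap (octetsToString . cCharsToOctets) (cStringToCChars p)
```

With these, withCString "abc" peekCStringInSteps does the job that a single
peekCString call does today, and the inverse direction would need three
more functions of the same kind.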

If we want to do this, we need to keep the existing functions, since I
certainly don't want to have to pass a String through three separate
transformations just to make it suitable for C.