UTF-8 library

Axel Simon A.Simon@ukc.ac.uk
Wed, 7 Aug 2002 17:18:06 +0100


On Wed, Aug 07, 2002 at 03:29:33PM +0200, George Russell wrote:
> Ashley Yakeley wrote:
> [snip]
> > Text encoded with ISO 8859-1 or UTF-8 is octets. If you want to use 
> > CChars, you should then subsequently convert the Word8s into CChars.
> We were talking about converting CStrings, which are necessarily sequences of CChars.  
> I have to say I do not relish the prospect of replacing the current peekCString
> interface by three functions which 
> (1) translate a CString/CStringLen into [CChar]
> (2) translate a [CChar] into a [Word8]
> (3) translate a [Word8] into a String
> (and of course the inverse functions to go in the other direction.)
I guess you can avoid (2) or (3) in practice.
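
To illustrate, here is a rough sketch of how steps (1) and (2) might be
fused by reading the C string directly as octets, leaving only a decoding
step. The names peekOctets, decodeUtf8, peekUtf8 and demo are made up for
this example and are not part of any existing library; the decoder is
deliberately minimal (it does not reject overlong forms or surrogates).

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Array (peekArray0, withArray0)
import Foreign.Ptr (castPtr)

-- Hypothetical helper: read a NUL-terminated C string as raw octets.
-- Casting the pointer fuses steps (1) and (2) into a single read.
peekOctets :: CString -> IO [Word8]
peekOctets cstr = peekArray0 0 (castPtr cstr)

-- Hypothetical helper for step (3): a minimal UTF-8 decoder.
-- Malformed input yields U+FFFD; overlong forms are NOT rejected.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)
  | otherwise             = '\xFFFD' : decodeUtf8 bs
  where
    isCont c = c .&. 0xC0 == 0x80
    -- Combine the lead bits with n continuation octets (6 bits each).
    multi n lead =
      let (cont, rest) = splitAt n bs
          step acc c   = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
          cp           = foldl step (fromIntegral lead) cont :: Int
      in if length cont == n && all isCont cont
           then (if cp > 0x10FFFF then '\xFFFD' else chr cp) : decodeUtf8 rest
           else '\xFFFD' : decodeUtf8 bs

-- The fused conversion: CString -> String in one call.
peekUtf8 :: CString -> IO String
peekUtf8 cstr = fmap decodeUtf8 (peekOctets cstr)

-- Usage: decode the UTF-8 octets for 'h', U+00E9 and U+20AC.
demo :: IO String
demo = withArray0 0 ([0x68, 0xC3, 0xA9, 0xE2, 0x82, 0xAC] :: [Word8])
                  (peekUtf8 . castPtr)
```

With something along these lines the caller sees a single function and
never handles the intermediate [CChar] list.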

> If we want to do this, we certainly need to keep the existing functions since I
> certainly don't want to have to pass a String through three separate transformations
> just to make it suitable for C.
I don't see a problem with supplying a backward compatible withCString 
function, even if it might use the current locale to do the conversion.
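
As a sketch only, such a wrapper could look like the following, assuming a
base library that provides GHC.Foreign (whose withCString and peekCString
take an explicit TextEncoding) and GHC.IO.Encoding.getForeignEncoding. The
names withCStringLocale, withCStringUtf8 and roundTripUtf8 are invented
for this example.

```haskell
import Foreign.C.String (CString)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getForeignEncoding)
import System.IO (utf8)

-- Hypothetical backward-compatible variant: same shape as the familiar
-- withCString, but the conversion follows the current foreign encoding.
withCStringLocale :: String -> (CString -> IO a) -> IO a
withCStringLocale s act = do
  enc <- getForeignEncoding
  GHC.withCString enc s act

-- Hypothetical explicit variant: always marshal as UTF-8,
-- regardless of the locale.
withCStringUtf8 :: String -> (CString -> IO a) -> IO a
withCStringUtf8 = GHC.withCString utf8

-- Usage: marshal a String to C as UTF-8 and read it straight back.
roundTripUtf8 :: String -> IO String
roundTripUtf8 s = withCStringUtf8 s (GHC.peekCString utf8)
```

Existing callers would keep the one-call interface, and only code that
cares about the encoding would need to pick the explicit variant.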

Let's just wait till someone actually has a ready-to-criticise library for 
ghc at hand.

Axel.