UTF-8 library

George Russell ger@tzi.de
Tue, 06 Aug 2002 18:11:04 +0200


Axel wrote
[snip]
> I guess that is a good point, but due to backwards compatibility this is 
> probably not acceptable: The C interface of the FFI has the string 
> functions:
> peekCString :: CString -> IO String
> newCString :: String -> IO CString
> 
> which should really be
> 
> peekCString :: CString -> IO [Word8]
> newCString :: [Word8] -> IO CString
> 
> Unless that changes, there is really no point in giving the encode and 
> decode functions that type.
[snip]
Such a change would be annoying, since I have already used peekCString and
newCString quite a lot.  (They are a great improvement on what we had before!)

Converting CStrings to [Word8] is probably a bad idea anyway, since there is
absolutely no reason to assume a C character will be only 8 bits long, and
under some implementations it isn't. 

A better suggestion would be to provide ALTERNATIVE functions which
go from CString/CStringLen and friends to [CChar], and to make your UTF-8
converters go between [CChar] and String.  However, we should not be forced
to do this every time we want to construct a CString from a String (a very
common need when calling C functions), so the existing functions should keep
their existing semantics.
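A minimal sketch of what such alternatives might look like. All of the names here (peekCChars, peekCCharsLen, newCChars, withCChars, encodeUTF8, decodeUTF8) are hypothetical and not part of any existing library. The marshalling functions are thin wrappers over peekArray0, peekArray, newArray0 and withArray0 from Foreign.Marshal.Array. The converters assume one UTF-8 octet is stored per CChar, which only needs a C char of at least 8 bits, not exactly 8.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Foreign.C.String (CString, CStringLen)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (newArray0, peekArray, peekArray0, withArray0)

-- Hypothetical [CChar] counterparts of peekCString, peekCStringLen,
-- newCString and withCString. No decoding happens here.
peekCChars :: CString -> IO [CChar]
peekCChars = peekArray0 0              -- characters up to (not incl.) the NUL

peekCCharsLen :: CStringLen -> IO [CChar]
peekCCharsLen (p, n) = peekArray n p   -- exactly n characters, NULs allowed

-- The result is malloc'd; release it with Foreign.Marshal.Alloc.free.
newCChars :: [CChar] -> IO CString
newCChars = newArray0 0                -- copy and NUL-terminate

withCChars :: [CChar] -> (CString -> IO a) -> IO a
withCChars = withArray0 0              -- temporary NUL-terminated buffer

-- Hypothetical UTF-8 converters between String and [CChar], storing one
-- octet per CChar. Surrogate code points are not rejected.
encodeUTF8 :: String -> [CChar]
encodeUTF8 = map fromIntegral . concatMap (encodeChar . ord)
  where
    encodeChar :: Int -> [Int]
    encodeChar c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. shiftR c 6, cont c]
      | c < 0x10000 = [0xE0 .|. shiftR c 12, cont (shiftR c 6), cont c]
      | otherwise   = [ 0xF0 .|. shiftR c 18, cont (shiftR c 12)
                      , cont (shiftR c 6), cont c ]
    -- continuation octet: 10xxxxxx carrying the low six bits
    cont x = 0x80 .|. (x .&. 0x3F)

-- Masking with 0xFF gives the octet value whether CChar is signed or not.
-- Malformed input is not validated (chr fails on out-of-range values).
decodeUTF8 :: [CChar] -> String
decodeUTF8 = go . map (\c -> fromIntegral c .&. 0xFF)
  where
    go :: [Int] -> String
    go [] = []
    go (b : bs)
      | b < 0x80  = chr b : go bs
      | b < 0xE0  = multi 1 (b .&. 0x1F) bs
      | b < 0xF0  = multi 2 (b .&. 0x0F) bs
      | otherwise = multi 3 (b .&. 0x07) bs
    -- fold the next n continuation octets into the lead octet's bits
    multi n acc bs =
      let (conts, rest) = splitAt n bs
      in chr (foldl (\a x -> shiftL a 6 .|. (x .&. 0x3F)) acc conts) : go rest
```

Calling C with a UTF-8 string would then be `withCChars (encodeUTF8 s) someCFunction`, and reading one back would be `decodeUTF8 <$> peekCChars p`. Meanwhile peekCString and newCString keep their current behaviour for the common case.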