UTF-8 library
Ashley Yakeley
ashley@semantic.org
Tue, 6 Aug 2002 16:53:42 -0700
At 2002-08-06 05:38, John Meacham wrote:
>One major nit I have with this is the type signature of
>decodeUTF8 and encodeUTF8
>a String should always represent a string of characters, not a byte
>stream, the signatures should be
>
>decodeUTF8 :: String -> [Word8]
>encodeUTF8 :: [Word8] -> String
I think you mean

  encodeUTF8 :: String -> [Word8]
  decodeUTF8 :: [Word8] -> String

...or even

  decodeUTF8 :: [Word8] -> Maybe String

It might also be useful to have stream functions. Decoding UTF-8 octets is
a kind of parsing, after all.
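
A minimal sketch of what the corrected signatures could look like, assuming a reasonably modern GHC (for `<$>` in the Prelude). This is not the code of the library under discussion. The helper names `encodeChar`, `cont`, `multi` and `isCont` are made up for illustration. The decoder treats the octets as input to be parsed: it returns Nothing on stray or missing continuation octets, overlong forms, surrogates, and values above U+10FFFF.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Encode Unicode code points as UTF-8 octets.
-- Sketch only: surrogate code points are not rejected on this side.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap (map fromIntegral . encodeChar . ord)
  where
    encodeChar :: Int -> [Int]
    encodeChar c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation octet: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

-- | Decode UTF-8 octets; Nothing signals malformed input.
decodeUTF8 :: [Word8] -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (b : bs)
  | b < 0x80  = (chr (fromIntegral b) :) <$> decodeUTF8 bs
  | b < 0xC0  = Nothing                                   -- stray continuation octet
  | b < 0xE0  = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b < 0xF0  = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b < 0xF8  = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise = Nothing                                   -- not a valid lead octet
  where
    -- n continuation octets, payload bits of the lead octet, and the
    -- smallest code point that legitimately needs this length.
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minVal = do
      let (cs, rest) = splitAt n bs
      if length cs == n && all isCont cs then Just () else Nothing
      let c = foldl (\acc x -> (acc `shiftL` 6) .|. (fromIntegral x .&. 0x3F)) lead cs
      if c >= minVal && c <= 0x10FFFF && not (c >= 0xD800 && c <= 0xDFFF)
        then Just ()
        else Nothing
      (chr c :) <$> decodeUTF8 rest
    isCont x = x .&. 0xC0 == 0x80
```

With these definitions, `decodeUTF8 (encodeUTF8 s)` gives `Just s` for any string without surrogates, while an overlong sequence such as `[0xC0, 0x80]` gives `Nothing`. A stream variant would return the decoded prefix together with the unconsumed octets instead of failing outright.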
But yes, you're right. A Char is a Unicode codepoint, nothing else, and
certainly not a C 'char'. A C char is _usually_ a Word8 or an Int8, but
not necessarily IIRC. I've always thought it a bit odd that the
well-specified types Word8, Int8 etc. are hidden away in a package while
the machine-dependent Int type, which I avoid in all my code, is in the
Prelude.
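
To make the contrast concrete, here is a small sketch, assuming the `Data.Word` and `Data.Int` modules as shipped with current GHC. The names `word8Range`, `int8Range` and `intRange` are made up for illustration. The first two values are the same everywhere; the third depends on the implementation, since the Haskell Report only guarantees that Int covers at least -2^29 to 2^29-1.

```haskell
import Data.Int (Int8)
import Data.Word (Word8)

-- Fixed-width types: identical bounds on every platform.
word8Range :: (Integer, Integer)
word8Range = (toInteger (minBound :: Word8), toInteger (maxBound :: Word8))

int8Range :: (Integer, Integer)
int8Range = (toInteger (minBound :: Int8), toInteger (maxBound :: Int8))

-- Int: bounds vary with the implementation and machine word size.
intRange :: (Integer, Integer)
intRange = (toInteger (minBound :: Int), toInteger (maxBound :: Int))
```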
--
Ashley Yakeley, Seattle WA