UTF-8 encode/decode libraries.
Sven.Panne at aedion.de
Mon Apr 26 21:33:38 EDT 2004
Duncan Coutts wrote:
> On Mon, 2004-04-26 at 18:49, David Brown wrote: [...]
> toUTF :: String -> String
Hmmm, "String -> [Word8]" would be nicer...
> fromUTF :: String -> String
... and here: "[Word8] -> String" or "[Word8] -> Maybe String".
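A minimal sketch of the suggested signatures, assuming strict UTF-8 decoding is what callers want. The names `encodeUTF8` and `decodeUTF8` are hypothetical and do not come from an existing library. The encoder turns each `Char` into 1 to 4 bytes. The decoder returns `Nothing` for input that is truncated, overlong, encodes a surrogate, or lies above U+10FFFF.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode a String as UTF-8 bytes; every Char becomes 1 to 4 bytes.
-- Note: lone surrogate code points (U+D800..U+DFFF) are not rejected here.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap (map fromIntegral . encodeChar . ord)
  where
    encodeChar :: Int -> [Int]
    encodeChar c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. shiftR c 6, cont c]
      | c < 0x10000 = [0xE0 .|. shiftR c 12, cont (shiftR c 6), cont c]
      | otherwise   = [0xF0 .|. shiftR c 18, cont (shiftR c 12), cont (shiftR c 6), cont c]
    -- Continuation byte 10xxxxxx carrying the low 6 bits of its argument.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Decode strict UTF-8; Nothing on truncated, overlong, surrogate or out-of-range input.
decodeUTF8 :: [Word8] -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (b0 : rest)
  | b < 0x80              = (chr b :) <$> decodeUTF8 rest
  | b >= 0xC2 && b < 0xE0 = multi 1 (b .&. 0x1F) 0x80
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F) 0x800
  | b >= 0xF0 && b < 0xF5 = multi 3 (b .&. 0x07) 0x10000
  | otherwise             = Nothing  -- stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
  where
    b = fromIntegral b0 :: Int
    -- n continuation bytes follow a lead byte whose payload bits are 'lead';
    -- minVal is the smallest code point that legitimately needs this many bytes.
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minVal
      | length conts /= n                    = Nothing  -- truncated sequence
      | any (\x -> x .&. 0xC0 /= 0x80) conts = Nothing  -- not a continuation byte
      | c < minVal || c > 0x10FFFF           = Nothing  -- overlong or beyond Unicode
      | c >= 0xD800 && c <= 0xDFFF           = Nothing  -- UTF-16 surrogate
      | otherwise                            = (chr c :) <$> decodeUTF8 rest'
      where
        (conts, rest') = splitAt n rest
        c = foldl (\acc x -> shiftL acc 6 .|. (fromIntegral x .&. 0x3F)) lead conts
```

Returning `Maybe String` rather than `String` forces the caller to decide what to do with malformed input instead of silently substituting characters.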
Furthermore, UTF-8 is not restricted to a maximum of 3 bytes per character; here is
an excerpt from "man utf8" on my SuSE Linux:
* UTF-8 encoded UCS characters may be up to six bytes
long, however the Unicode standard specifies no characters
above 0x10ffff, so Unicode characters can only be up to
four bytes long in UTF-8.
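To make the byte counts in that excerpt concrete, here is a small sketch (the function names are hypothetical). `ucsUTF8Length` follows the original UTF-8 scheme for 31-bit UCS values up to 0x7FFFFFFF, which needs up to six bytes. `utf8Length` applies it to a Haskell `Char`, whose range ends at U+10FFFF, so the result never exceeds four.

```haskell
import Data.Char (ord)

-- Bytes needed by the original UTF-8 scheme for a UCS value in 0..0x7FFFFFFF.
ucsUTF8Length :: Int -> Int
ucsUTF8Length c
  | c < 0x80      = 1   -- 7 bits
  | c < 0x800     = 2   -- 11 bits
  | c < 0x10000   = 3   -- 16 bits
  | c < 0x200000  = 4   -- 21 bits
  | c < 0x4000000 = 5   -- 26 bits
  | otherwise     = 6   -- 31 bits

-- For Unicode characters (at most U+10FFFF) this is always 1..4.
utf8Length :: Char -> Int
utf8Length = ucsUTF8Length . ord
```

This is why a UTF-8 codec limited to 3 bytes cannot represent characters outside the Basic Multilingual Plane.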
IIRC we discussed encoders/decoders quite some time ago on the libraries
mailing list, but nothing really happened, which is a pity. We should
strive for something more general than UTF-8 <-> UCS/Unicode; there are
quite a few other widely used encodings, e.g. GSM 03.38. Any takers?
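One possible shape for such a general interface, sketched under the assumption that each encoding is a pair of partial functions. The `Codec` record, its fields, `latin1`, `ascii` and `transcode` are all hypothetical names chosen for illustration. Only ISO 8859-1 and US-ASCII are shown because their mappings are trivial; a GSM 03.38 codec would plug in the same way as a table-driven `Codec` value, but its table is not reproduced here.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical interface for a family of character encodings.
data Codec = Codec
  { codecName :: String
  , encode    :: String -> Maybe [Word8]  -- Nothing if a character is unrepresentable
  , decode    :: [Word8] -> Maybe String  -- Nothing on bytes invalid in this encoding
  }

-- ISO 8859-1: byte values map directly to code points U+0000..U+00FF.
latin1 :: Codec
latin1 = Codec
  { codecName = "ISO-8859-1"
  , encode    = mapM encodeChar
  , decode    = Just . map (chr . fromIntegral)
  }
  where
    encodeChar ch
      | ord ch < 0x100 = Just (fromIntegral (ord ch))
      | otherwise      = Nothing

-- US-ASCII: only the 7-bit range is valid in either direction.
ascii :: Codec
ascii = Codec
  { codecName = "US-ASCII"
  , encode    = mapM (\ch -> if ord ch < 0x80 then Just (fromIntegral (ord ch)) else Nothing)
  , decode    = mapM (\b -> if b < 0x80 then Just (chr (fromIntegral b)) else Nothing)
  }

-- Convert bytes between encodings by going through String (i.e. Unicode).
transcode :: Codec -> Codec -> [Word8] -> Maybe [Word8]
transcode from to bytes = decode from bytes >>= encode to
```

Routing every conversion through `String` means n encodings need only n codecs rather than n * n pairwise converters.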