UTF8 libraries
Alistair Bayley
alistair at abayley.org
Fri Feb 2 07:01:04 EST 2007
What is the state of UTF-8 support in Haskell libraries (base or
user-contributed)? I needed a UTF-8 encoder and decoder for Takusen,
and after looking around I couldn't find anything particularly
satisfactory, so I ended up writing (yet another) one.
I'm interested mainly in marshalling to/from CStrings, so what I'm
looking for is functions like peekUTF8String, newUTF8String,
withUTF8String, etc. I realise that one can use one of the pure decoders
after a peekCString, but that means building an intermediate list, which
isn't strictly necessary.
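
As a rough illustration of what decoding straight from the C buffer could
look like, here is a minimal sketch. It is not the Takusen code: the name
peekUTF8String is borrowed from the paragraph above, and the helper names
(step, cont, emit, toChar) and the error policy (substituting U+FFFD for
malformed input) are assumptions made for this example. It handles sequences
of up to 4 bytes, does not reject overlong encodings or surrogates, and is
not tail-recursive, so it is unsuitable for very long strings as written.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (peek)

-- | Decode a NUL-terminated UTF-8 C string by reading bytes directly
-- from memory, without first building a list of bytes.
-- Sketch only: malformed input becomes U+FFFD (an assumed policy).
peekUTF8String :: CString -> IO String
peekUTF8String = go . castPtr
  where
    go :: Ptr Word8 -> IO String
    go p = peek p >>= step p

    -- Dispatch on the lead byte.
    step :: Ptr Word8 -> Word8 -> IO String
    step p b
      | b == 0    = return []                       -- NUL terminator
      | b < 0x80  = emit (fromIntegral b) next      -- ASCII
      | b < 0xC0  = emit 0xFFFD next                -- stray continuation byte
      | b < 0xE0  = cont next 1 (fromIntegral (b .&. 0x1F))
      | b < 0xF0  = cont next 2 (fromIntegral (b .&. 0x0F))
      | b < 0xF8  = cont next 3 (fromIntegral (b .&. 0x07))
      | otherwise = emit 0xFFFD next                -- 5/6-byte leads rejected
      where
        next :: Ptr Word8
        next = p `plusPtr` 1

    -- Read k continuation bytes, accumulating 6 bits from each.
    -- Stops at the first non-continuation byte, so it never reads
    -- past the NUL terminator.
    cont :: Ptr Word8 -> Int -> Int -> IO String
    cont q 0 acc = emit acc q
    cont q k acc = do
      c <- peek q
      if c .&. 0xC0 == 0x80
        then cont (q `plusPtr` 1) (k - 1)
                  ((acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
        else emit 0xFFFD q

    emit :: Int -> Ptr Word8 -> IO String
    emit cp q = do
      rest <- go q
      return (toChar cp : rest)

    -- chr fails above U+10FFFF, so guard against that.
    toChar :: Int -> Char
    toChar cp
      | cp > 0x10FFFF = '\xFFFD'
      | otherwise     = chr cp

main :: IO ()
main = do
  -- "a" followed by U+20AC (EURO SIGN), encoded as E2 82 AC.
  s <- withArray0 0 ([0x61, 0xE2, 0x82, 0xAC] :: [Word8])
                  (peekUTF8String . castPtr)
  print (s == "a\x20AC")
```

The encoding direction (newUTF8String, withUTF8String) would need the output
size computed up front, or a buffer that can grow, which is why only the
decoder is sketched here.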
So far I've found the following:
- John Meacham's UTF8 lib:
http://repetae.net/repos/jhc/UTF8.hs
(only handles codepoints < 65536, pure String <-> [Word8] so no
direct CString marshalling)
- HXT's Text.XML.HXT.DOM.Unicode:
http://www.fh-wedel.de/~si/HXmlToolbox/
(full Unicode range - up to 6 bytes per char, pure String <-> String)
- George Russell's:
http://www.haskell.org/pipermail/glasgow-haskell-users/2004-April/006564.html
(buggy - won't roundtrip chars > 127, pure String <-> String)
The one I wrote, which is largely based on John Meacham's and HXT's
code, can be seen here:
http://darcs.haskell.org/takusen/Foreign/C/UTF8.hs
Alistair