UTF8 libraries

Alistair Bayley alistair at abayley.org
Fri Feb 2 07:01:04 EST 2007


What is the state of UTF8 support in Haskell libraries (base or
user-contributed)? I needed a UTF8 encoder and decoder for Takusen,
and after looking around I couldn't find anything particularly
satisfactory, so I ended up writing (yet another) one.

I'm interested mainly in marshalling to/from CStrings, so I'm looking
for functions like peekUTF8String, newUTF8String, withUTF8String, etc.
I realise that one can run one of the pure decoders after a
peekCString, but that means building an intermediate list, which
isn't strictly necessary.
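
To make the idea concrete, here is a minimal sketch of what such
marshalling functions could look like. It is not the Takusen code or
any existing library's API: the names simply follow the ones mentioned
above, and encodeChar and decodeAt are hypothetical helpers. The
decoder reads bytes straight from the pointer, so no intermediate
[Word8] or String is built. It assumes a NUL-terminated buffer, maps
malformed bytes to U+FFFD, and does not reject overlong encodings or
surrogates.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (peek, peekByteOff)

-- Encode one code point as 1 to 4 UTF-8 bytes (hypothetical helper).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    top s  = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Marshal a String into a temporary NUL-terminated UTF-8 buffer.
-- A '\0' inside the String will truncate what the C side sees.
withUTF8String :: String -> (CString -> IO a) -> IO a
withUTF8String s act = withArray0 0 (concatMap encodeChar s) (act . castPtr)

-- Decode a NUL-terminated UTF-8 buffer, reading bytes directly
-- from memory rather than going through peekCString first.
peekUTF8String :: CString -> IO String
peekUTF8String = go . castPtr
  where
    go :: Ptr Word8 -> IO String
    go p = do
      b <- peek p
      if b == 0
        then return []
        else do
          (c, len) <- decodeAt p b
          rest <- go (p `plusPtr` len)
          return (c : rest)

-- Decode the sequence starting at p, whose lead byte is b.
-- Returns the character and the number of bytes consumed.
decodeAt :: Ptr Word8 -> Word8 -> IO (Char, Int)
decodeAt p b
  | b < 0x80           = return (chr (fromIntegral b), 1)
  | b .&. 0xE0 == 0xC0 = continue 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = continue 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = continue 3 (b .&. 0x07)
  | otherwise          = return bad
  where
    -- Malformed input: emit U+FFFD and skip a single byte.
    bad = ('\xFFFD', 1)

    continue :: Int -> Word8 -> IO (Char, Int)
    continue k lead = loop 1 (fromIntegral lead)
      where
        loop :: Int -> Int -> IO (Char, Int)
        loop i acc
          | i > k     = return (if acc > 0x10FFFF then '\xFFFD' else chr acc, k + 1)
          | otherwise = do
              x <- peekByteOff p i :: IO Word8
              -- A non-continuation byte (including the terminating
              -- NUL) ends the sequence early, so we never read past
              -- the end of the buffer.
              if x .&. 0xC0 /= 0x80
                then return bad
                else loop (i + 1) ((acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F))
```

A round trip such as withUTF8String s peekUTF8String should give back
s for any String without embedded NULs or surrogates. A
newUTF8String could be written the same way on top of newArray0.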

So far I've found the following:

 - John Meacham's UTF8 lib:
   http://repetae.net/repos/jhc/UTF8.hs
   (only handles codepoints < 65536; pure String <-> [Word8], so no
   direct CString marshalling)

 - HXT's Text.XML.HXT.DOM.Unicode:
   http://www.fh-wedel.de/~si/HXmlToolbox/
   (full Unicode range - up to 6 bytes per char, pure String <-> String)

 - George Russell's:
   http://www.haskell.org/pipermail/glasgow-haskell-users/2004-April/006564.html
   (buggy - won't roundtrip chars > 127, pure String <-> String)
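
For reference, the range limits in the notes above line up with UTF-8
sequence lengths as in the sketch below. utf8Length is a hypothetical
name, not taken from any of the listed libraries. It follows the
original 31-bit scheme of RFC 2279, which allowed up to 6 bytes;
RFC 3629 later capped UTF-8 at U+10FFFF, so only the 1 to 4 byte cases
are valid today. A codec limited to codepoints < 65536 stops at the
3-byte case.

```haskell
-- Bytes needed to encode a code point value under the original
-- 31-bit UTF-8 scheme (hypothetical helper for illustration).
utf8Length :: Int -> Maybe Int
utf8Length n
  | n < 0           = Nothing
  | n < 0x80        = Just 1   -- ASCII
  | n < 0x800       = Just 2
  | n < 0x10000     = Just 3   -- rest of the 16-bit range
  | n < 0x200000    = Just 4   -- covers everything up to U+10FFFF
  | n < 0x4000000   = Just 5   -- no longer valid under RFC 3629
  | n <= 0x7FFFFFFF = Just 6   -- no longer valid under RFC 3629
  | otherwise       = Nothing
```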


The one I wrote, which is largely based on John Meacham's and HXT's
code, can be seen here:
  http://darcs.haskell.org/takusen/Foreign/C/UTF8.hs

Alistair

