[Haskell-cafe] Unicode normalization
wren ng thornton
wren at freegeek.org
Wed Apr 20 04:02:09 CEST 2011
Hello all,
I'm in need of a Unicode normalization function, in particular NFC for
UTF-8 encoded ByteStrings. From some quick Googling around it looks like
the only available option is to use ICU in some form. The text-icu package
has a nice binding to it, but unfortunately that means a lot of redundant
conversions (UTF-8 ByteString -> Text; Text -> UTF-8 ByteString) and an
additional, rather large, non-Haskell dependency[1].
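For concreteness, the round trip described above might be sketched as
follows. This is only a minimal sketch, assuming the text and text-icu
packages are installed; the name nfcUtf8 is hypothetical, and decodeUtf8
throws an exception on malformed UTF-8 input.

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import Data.Text.ICU (NormalizationMode (NFC), normalize)

-- Hypothetical helper: NFC-normalize a UTF-8 encoded ByteString by
-- going through Text. Both conversions copy the data, which is the
-- redundancy complained about above.
nfcUtf8 :: B.ByteString -> B.ByteString
nfcUtf8 = encodeUtf8 . normalize NFC . decodeUtf8
```

The two extra passes (decode, then re-encode) are exactly the overhead a
native ByteString implementation would avoid.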
Is ICU really the only available implementation of normalization? TR15
doesn't really give a complete algorithm; it only hints at the
"numerous opportunities for optimization" implicit in the complexity of
the spec.
[1] Which is especially annoying on OS X: the OS does ship with libicu
in a public location, but without header files, and apparently the
shipped copy is incomplete somehow, meaning you'd have to install your
own copy for text-icu to use (and hilarity ensues when your copy gets
out of sync with the OS's).
--
Live well,
~wren