[Haskell-cafe] Unicode normalization

Colin Adams colinpauladams at googlemail.com
Wed Apr 20 08:22:58 CEST 2011


ICU may well be the only highly optimised implementation with source code
available.

I needed to implement this some years ago in Eiffel. I did it in a fairly
straightforward way, without worrying too much about optimal code, as my
requirement was merely to conform. The nice thing about implementing
normalization is that the test file provided on unicode.org
(NormalizationTest.txt) means you can automate generation of a comprehensive
test suite. So you can implement a naive algorithm (as I did), and then
progressively optimize (which I never got round to doing), re-running the
suite after each change to confirm that you're not introducing bugs.
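
To make the test-driven approach concrete, here is a rough sketch in Haskell
of how such a suite could be driven from NormalizationTest.txt. It assumes the
file's documented layout (five semicolon-separated columns of space-separated
hex code points, with '#' comments and '@Part' headers) and checks the NFC
invariants listed in the file's own header. The names TestCase, parseLine,
checkNFC and failures are made up for illustration, and toNFC stands for
whatever implementation you are testing; none of this comes from an existing
library.

```haskell
import Data.Char (chr)
import Numeric (readHex)

-- | One data line of NormalizationTest.txt: the five columns c1..c5.
-- (Hypothetical type, introduced only for this sketch.)
data TestCase = TestCase
  { c1, c2, c3, c4, c5 :: String }
  deriving (Show, Eq)

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Parse a field such as "0044 0307" into the string it denotes.
parseField :: String -> Maybe String
parseField = mapM hexChar . words
  where
    hexChar w = case readHex w of
      [(n, "")] | n <= 0x10FFFF -> Just (chr n)
      _                         -> Nothing

-- | Parse one line of the file. Comment lines, @Part headers, blank
-- lines and malformed lines all yield Nothing.
parseLine :: String -> Maybe TestCase
parseLine raw =
  case splitOn ';' (takeWhile (/= '#') raw) of
    (f1 : f2 : f3 : f4 : f5 : _) ->
      TestCase <$> parseField f1 <*> parseField f2 <*> parseField f3
               <*> parseField f4 <*> parseField f5
    _ -> Nothing

-- | NFC invariants as stated in the test file's header:
--     c2 == toNFC(c1) == toNFC(c2) == toNFC(c3)
--     c4 == toNFC(c4) == toNFC(c5)
-- The first argument is the implementation under test (an assumption:
-- it works on String here purely to keep the sketch dependency-free).
checkNFC :: (String -> String) -> TestCase -> Bool
checkNFC toNFC t =
  all (\x -> toNFC x == c2 t) [c1 t, c2 t, c3 t]
    && all (\x -> toNFC x == c4 t) [c4 t, c5 t]

-- | Line numbers and test cases that the candidate implementation fails.
failures :: (String -> String) -> String -> [(Int, TestCase)]
failures toNFC contents =
  [ (n, t)
  | (n, Just t) <- zip [1 ..] (map parseLine (lines contents))
  , not (checkNFC toNFC t)
  ]
```

Loaded into GHCi, something like `failures myNFC <$> readFile
"NormalizationTest.txt"` (with myNFC being your own function) would list the
lines still failing; an empty list after each optimization step is the signal
to carry on. The NFD, NFKC and NFKD invariants from the same header could be
added in the same style.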

On 20 April 2011 03:02, wren ng thornton <wren at freegeek.org> wrote:

> Hello all,
>
> I'm in need of a Unicode normalization function, Utf8 NFC for ByteString in
> particular. From some quick Googling around it looks like the only available
> option is to use ICU in some form. The text-icu package has a nice binding
> to it, but unfortunately that means a lot of redundant conversions (Utf8
> ByteString -> Text; Text -> Utf8 ByteString) and an additional rather large
> non-Haskell dependency[1].
>
> Is ICU really the only available implementation of normalization? The TR15
> doesn't really give a complete algorithm and only hints at the "numerous
> opportunities for optimization" implicit in the complexity of the spec.
>
>
> [1] Which is especially annoying on OSX since OSX does ship with libicu in
> a public location, but it doesn't provide header files and apparently it's
> incomplete somehow, meaning you'd have to reinstall it for text-icu to use
> (and hilarity ensues when your copy gets out of sync with the OS's).
>
> --
> Live well,
> ~wren



-- 
Colin Adams
Preston, Lancashire, ENGLAND
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments