[Haskell-cafe] PROPOSAL: New efficient Unicode string library.
Deborah Goldsmith
dgoldsmith at mac.com
Tue Sep 25 22:47:07 EDT 2007
I'll look over the proposal more carefully when I get time, but the
most important issue is to not let the storage type leak into the
interface.
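As a rough illustration of keeping the storage type out of the interface, here is a minimal sketch. It is not taken from the proposal: the names UString, pack, unpack and ulength are hypothetical, and a plain list of Char stands in for whatever packed encoding an implementation would actually use.

```haskell
-- Hypothetical sketch; UString, pack, unpack and ulength are illustrative
-- names, not part of the proposal.
--
-- In a library this would live in a module whose export list omits the
-- constructor, for example:
--   module UString (UString, pack, unpack, ulength) where

-- Internal representation. A list of Char stands in here for whatever
-- packed encoding (UTF-8, UTF-16, ...) the implementation chooses.
newtype UString = UString [Char]

-- Build a UString from an ordinary Haskell String.
pack :: String -> UString
pack = UString

-- Convert back to an ordinary Haskell String.
unpack :: UString -> String
unpack (UString cs) = cs

-- Length in code points, independent of how the data is stored.
ulength :: UString -> Int
ulength (UString cs) = length cs
```

Because callers only ever see pack, unpack and ulength, the representation could be swapped later without changing client code.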
From an implementation point of view, UTF-16 is the most efficient
representation for processing Unicode. It's the native Unicode
representation for Windows, Mac OS X, and the ICU open source i18n
library. UTF-8 is compact only for mostly-ASCII text such as English;
most other scripts need two or three bytes per character. Its most
valuable property is compatibility with software that treats character
strings as byte arrays, and in fact that's why it was invented.
UTF-32 is conceptually cleaner, but characters outside the BMP (Basic
Multilingual Plane) are rare in actual text, so UTF-16 turns out to
be the best combination of space and time efficiency.
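To make the space trade-off concrete, here is a small sketch that counts the bytes each encoding needs for a string. The function names are made up for illustration, and it assumes the input contains only valid code points (no lone surrogates).

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1   -- ASCII
  | n < 0x800   = 2   -- e.g. Latin-1 supplement, Greek, Cyrillic
  | n < 0x10000 = 3   -- rest of the BMP, e.g. CJK
  | otherwise   = 4   -- outside the BMP
  where n = ord c

-- Bytes needed in UTF-16: one 16-bit unit inside the BMP,
-- a surrogate pair (two units) outside it.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- UTF-32 always uses four bytes per code point.
utf32Bytes :: Char -> Int
utf32Bytes _ = 4

-- Total encoded size of a string as (UTF-8, UTF-16, UTF-32) byte counts.
encodedSizes :: String -> (Int, Int, Int)
encodedSizes s =
  ( sum (map utf8Bytes s)
  , sum (map utf16Bytes s)
  , sum (map utf32Bytes s) )
```

For "hello", encodedSizes gives (5,10,20); for the three-character Japanese string "\x65E5\x672C\x8A9E" it gives (9,6,12). UTF-8 wins on ASCII, UTF-16 wins on BMP text outside the Latin range, and UTF-32 is never smaller than UTF-16.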
Deborah
On Sep 24, 2007, at 3:52 PM, Johan Tibell wrote:
> Dear haskell-cafe,
>
> I would like to propose a new, ByteString-like Unicode string library
> which can be used where both efficiency (currently offered by
> ByteString) and i18n support (currently offered by vanilla Strings)
> are needed. I wrote a skeleton draft today but I'm a bit tired so I
> didn't get all the details. Nevertheless I think it's fleshed out enough
> for some initial feedback. If I can get the important parts nailed
> down before the Hackathon I could hack on it there.
>
> Apologies for not getting everything we discussed on #haskell down in
> the first draft. It'll get in there eventually.
>
> Bring out your Unicode kung-fu!
>
> http://haskell.org/haskellwiki/UnicodeByteString
>
> Cheers,
>
> Johan Tibell