[Haskell-cafe] PROPOSAL: New efficient Unicode string library.

Deborah Goldsmith dgoldsmith at mac.com
Tue Sep 25 22:47:07 EDT 2007


I'll look over the proposal more carefully when I get time, but the  
most important issue is to not let the storage type leak into the  
interface.
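
To make that concrete, here is a minimal sketch of what hiding the storage type could look like. Everything in it is hypothetical and not taken from the proposal: the names UnicodeString, pack, unpack and lengthChars are made up for illustration, and a list of UTF-16 code units stands in for whatever packed array a real library would use. The point is only that clients see Char-based functions and never the constructor.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word16)

-- In a real library the module header would export the type WITHOUT its
-- constructor, so callers cannot depend on the representation:
--   module UnicodeString (UnicodeString, pack, unpack, lengthChars) where

-- Hypothetical storage choice: UTF-16 code units (a list, for brevity).
newtype UnicodeString = UnicodeString [Word16]

-- Encode each Char as one code unit, or as a surrogate pair above U+FFFF.
pack :: String -> UnicodeString
pack = UnicodeString . concatMap encode
  where
    encode c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + m `div` 0x400)
                      , fromIntegral (0xDC00 + m `mod` 0x400) ]
      where
        n = ord c
        m = n - 0x10000

-- Decode back to Chars, recombining surrogate pairs.
unpack :: UnicodeString -> String
unpack (UnicodeString ws) = go ws
  where
    go (hi : lo : rest)
      | isHigh hi && isLow lo =
          chr (0x10000 + (fromIntegral hi - 0xD800) * 0x400
                       + (fromIntegral lo - 0xDC00)) : go rest
    go (w : rest) = chr (fromIntegral w) : go rest
    go []         = []
    isHigh w = w >= 0xD800 && w <= 0xDBFF
    isLow  w = w >= 0xDC00 && w <= 0xDFFF

-- Length is reported in characters (code points), not in code units.
lengthChars :: UnicodeString -> Int
lengthChars = length . unpack
```

With an interface like this, the representation could later be switched to UTF-8 or UTF-32 without touching any client code.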

From an implementation point of view, UTF-16 is the most efficient
representation for processing Unicode. It's the native Unicode
representation for Windows, Mac OS X, and the ICU open source i18n
library. UTF-8 is compact only for text that is mostly ASCII, such as
English. Its most valuable property is compatibility with software that
treats character strings as byte arrays, and in fact that's why it was
invented.

UTF-32 is conceptually cleaner, but characters outside the BMP (Basic  
Multilingual Plane) are rare in actual text, so UTF-16 turns out to  
be the best combination of space and time efficiency.
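
As a rough illustration of the space side of that trade-off, the sketch below counts how many bytes a string needs in each encoding form. The function names are made up for this example, and it measures storage only, not processing time.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1   -- ASCII
  | n < 0x800   = 2   -- e.g. Greek, Cyrillic, Hebrew, Arabic
  | n < 0x10000 = 3   -- rest of the BMP, including CJK
  | otherwise   = 4   -- outside the BMP
  where
    n = ord c

-- Bytes needed in UTF-16: one 16-bit unit inside the BMP,
-- a surrogate pair (two units) outside it.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- UTF-32 always uses one 32-bit unit per code point.
utf32Bytes :: Char -> Int
utf32Bytes _ = 4

-- Total size in bytes as (UTF-8, UTF-16, UTF-32).
encodedSizes :: String -> (Int, Int, Int)
encodedSizes s =
  ( sum (map utf8Bytes s)
  , sum (map utf16Bytes s)
  , sum (map utf32Bytes s) )
```

For example, encodedSizes "hello" is (5,10,20), encodedSizes "日本語" is (9,6,12), and the single non-BMP character U+1D11E comes out as (4,4,4).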

Deborah

On Sep 24, 2007, at 3:52 PM, Johan Tibell wrote:

> Dear haskell-cafe,
>
> I would like to propose a new, ByteString-like, Unicode string library
> which can be used where both efficiency (currently offered by
> ByteString) and i18n support (currently offered by vanilla Strings)
> are needed. I wrote a skeleton draft today, but I'm a bit tired so I
> didn't get all the details down. Nevertheless, I think it is fleshed out
> enough for some initial feedback. If I can get the important parts nailed
> down before Hackathon I could hack on it there.
>
> Apologies for not getting everything we discussed on #haskell down in
> the first draft. It'll get in there eventually.
>
> Bring out your Unicode kung-fu!
>
> http://haskell.org/haskellwiki/UnicodeByteString
>
> Cheers,
>
> Johan Tibell