[Haskell-cafe] PROPOSAL: New efficient Unicode string library.

Wed Sep 26 12:44:04 EDT 2007

On Wed, 2007-09-26 at 09:05 +0200, Johan Tibell wrote:
> > I'll look over the proposal more carefully when I get time, but the
> > most important issue is to not let the storage type leak into the
> > interface.
> 
> Agreed,
> 
> > From an implementation point of view, UTF-16 is the most efficient
> > representation for processing Unicode. It's the native Unicode
> > representation for Windows, Mac OS X, and the ICU open source i18n
> > library. UTF-8 is not very efficient for anything except English. Its
> > most valuable property is compatibility with software that thinks of
> > character strings as byte arrays, and in fact that's why it was
> > invented.
> 
> If UTF-16 is what's used by everyone else (how about Java? Python?) I
> think that's a strong reason to use it. I don't know Unicode well
> enough to say otherwise.

I disagree.  I realize I'm a dissenter in this regard, but my position
is: excellent Unix support first, portability second, excellent support
for Win32/MacOS a distant third.  That seems to be the opposite of every
other language's position.  Unix absolutely needs UTF-8 for backward
compatibility.
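
For anyone who wants to put numbers on the size argument, here is a rough
sketch that counts how many bytes a String would occupy in each encoding.
It uses only base, and the function names (utf8Len, utf16Len, utf8Size,
utf16Size) are made up for illustration; they are not part of the proposed
library. It assumes well-formed input and counts code units without
producing any actual bytes.

```haskell
import Data.Char (ord)

-- Bytes needed to encode a single code point in UTF-8.
utf8Len :: Char -> Int
utf8Len c
  | n < 0x80    = 1  -- ASCII: byte-for-byte identical to the ASCII encoding
  | n < 0x800   = 2  -- Latin supplements, Greek, Cyrillic, Hebrew, Arabic, ...
  | n < 0x10000 = 3  -- rest of the Basic Multilingual Plane, including CJK
  | otherwise   = 4  -- supplementary planes
  where
    n = ord c

-- Bytes needed to encode a single code point in UTF-16.
utf16Len :: Char -> Int
utf16Len c
  | ord c < 0x10000 = 2  -- one 16-bit code unit
  | otherwise       = 4  -- surrogate pair

-- Total encoded size of a String in each encoding (no BOM, no terminator).
utf8Size, utf16Size :: String -> Int
utf8Size  = sum . map utf8Len
utf16Size = sum . map utf16Len
```

With these definitions, utf8Size "hello" is 5 while utf16Size "hello" is 10;
the six-letter Cyrillic word "\1087\1088\1080\1074\1077\1090" comes out at 12
bytes either way; and the three-character CJK string "\26085\26412\35486" is
9 bytes in UTF-8 against 6 in UTF-16.  The first guard of utf8Len is the
backward-compatibility point: pure ASCII text is already valid UTF-8.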

jcc