[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

Tue Sep 25 23:10:44 EDT 2007

On 2007-09-26, Deborah Goldsmith <dgoldsmith at mac.com> wrote:
>  From an implementation point of view, UTF-16 is the most efficient  
> representation for processing Unicode.

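This depends on the characteristics of the text being processed.
Space-wise, English stays at 1 byte/char in UTF-8.  Most European
languages go up to at most 2, and on average only a bit above 1.  Greek
and Cyrillic are 2 bytes/char, as are Hebrew and Arabic, so they break
even with UTF-16.  It's really only the scripts encoded at U+0800 and
above in the BMP (CJK, the Indic scripts, Thai, Ethiopic, etc.) that
lose space-wise, at 3 bytes/char against 2 in UTF-16.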
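
To make the space comparison concrete, here is a minimal sketch that
counts encoded bytes per code point.  The names utf8Bytes, utf16Bytes
and encodedSizes are made up for illustration and are not part of any
proposed library; it also assumes the String holds no lone surrogates.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Bytes needed in UTF-16: 2 inside the BMP, 4 for a surrogate pair.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- (UTF-8 size, UTF-16 size) of a whole string, in bytes.
encodedSizes :: String -> (Int, Int)
encodedSizes s = (sum (map utf8Bytes s), sum (map utf16Bytes s))
```

For example, encodedSizes "hello" gives (5,10), three Greek letters
"\x3b1\x3b2\x3b3" give (6,6), and three CJK characters
"\x65e5\x672c\x8a9e" give (9,6).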

It's true that time-wise there are definite issues in finding character
boundaries in UTF-8, since reaching the nth character means scanning
from the start.
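
As a rough illustration of that cost, the sketch below locates
character boundaries in UTF-8 by skipping continuation bytes.  It
assumes well-formed input held as a plain [Word8], which is only for
exposition; isContinuation, charCount and charOffset are hypothetical
names, not an existing API.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- A UTF-8 continuation byte has the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Number of code points in well-formed UTF-8: count the lead bytes.
charCount :: [Word8] -> Int
charCount = length . filter (not . isContinuation)

-- Byte offset at which the n-th code point (0-based) starts, found by
-- a linear scan; Nothing if the input has fewer than n+1 code points.
charOffset :: Int -> [Word8] -> Maybe Int
charOffset n = go n 0
  where
    go _ _ [] = Nothing
    go k off (b:bs)
      | isContinuation b = go k (off + 1) bs
      | k == 0           = Just off
      | otherwise        = go (k - 1) (off + 1) bs
```

Both functions walk the bytes from the front, so indexing by character
is linear in the offset rather than constant time.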

-- 
Aaron Denney
-><-