[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.
Aaron Denney
wnoise at ofb.net
Tue Sep 25 23:10:44 EDT 2007
On 2007-09-26, Deborah Goldsmith <dgoldsmith at mac.com> wrote:
> From an implementation point of view, UTF-16 is the most efficient
> representation for processing Unicode.
This depends on the characteristics of the text being processed.
Space-wise, English stays at 1 byte/char in UTF-8. Most European languages
go up to at most 2, and on average only a bit above 1. Greek and
Cyrillic are 2 bytes/char. It's really only the Asian, African, Arabic,
etc., that lose space-wise.
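
To make the space comparison concrete, here is a rough sketch that counts
encoded bytes per code point under each encoding. The names utf8Len,
utf16Len and sizes are made up for illustration and are not from any
proposed library; it works on plain String and assumes the input contains
no surrogate code points.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Len :: Char -> Int
utf8Len c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Bytes needed to encode one code point in UTF-16:
-- one 16-bit unit inside the BMP, a surrogate pair outside it.
utf16Len :: Char -> Int
utf16Len c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- (UTF-8 size, UTF-16 size) in bytes for a whole string.
-- Hypothetical helper, for illustration only.
sizes :: String -> (Int, Int)
sizes s = (sum (map utf8Len s), sum (map utf16Len s))
```

Under these definitions, ASCII text comes out at half the size in UTF-8,
Greek and Cyrillic come out even, and code points from U+0800 up to the end
of the BMP take 3 bytes in UTF-8 against 2 in UTF-16.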
It's true that time-wise there are definite issues in finding character
boundaries.
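
As a sketch of what the boundary issue looks like in UTF-8: a byte of the
form 10xxxxxx is a continuation byte, and any other byte starts a code
point, so locating the n-th character means scanning from the front. The
names below (isContinuation, countChars, charOffset) are hypothetical, the
input is assumed to be valid UTF-8, n is assumed non-negative, and a list
of Word8 stands in for whatever packed representation a real library
would use.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Number of code points: every non-continuation byte starts one.
countChars :: [Word8] -> Int
countChars = length . filter (not . isContinuation)

-- Byte offset at which the n-th code point (0-based) starts.
-- This is a linear scan; there is no constant-time indexing.
charOffset :: Int -> [Word8] -> Maybe Int
charOffset n bytes =
  case drop n [i | (i, b) <- zip [0 ..] bytes, not (isContinuation b)] of
    (i : _) -> Just i
    []      -> Nothing
```

UTF-16 has the same problem in a milder form: a similar scan is needed as
soon as surrogate pairs may be present.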
--
Aaron Denney
-><-