[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

Aaron Denney wnoise at ofb.net
Thu Sep 27 02:39:24 EDT 2007


On 2007-09-27, Deborah Goldsmith <dgoldsmith at mac.com> wrote:
> On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
>>> UTF-16 has no advantage over UTF-8 in this respect, because of  
>>> surrogate
>>> pairs and combining characters.
>>
>> Good point.
>
> Well, not so much. As Duncan mentioned, it's a matter of what the most  
> common case is. UTF-16 is effectively fixed-width for the majority of  
> text in the majority of languages. Combining sequences and surrogate  
> pairs are relatively infrequent.

Infrequent, but they exist, which means you can't seek 2x bytes ahead
to seek x characters ahead.  All such seeking must be linear for both
UTF-16 *and* UTF-8.
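
To make that concrete, here is a minimal sketch, assuming the text is
held as a plain list of UTF-16 code units.  The names isHighSurrogate,
isLowSurrogate and offsetOfCodePoint are made up for illustration and do
not come from any proposed library.  It only counts code points;
combining sequences would need yet another pass on top of this.

```haskell
import Data.Word (Word16)

-- High (leading) surrogates occupy U+D800..U+DBFF.
isHighSurrogate :: Word16 -> Bool
isHighSurrogate w = w >= 0xD800 && w <= 0xDBFF

-- Low (trailing) surrogates occupy U+DC00..U+DFFF.
isLowSurrogate :: Word16 -> Bool
isLowSurrogate w = w >= 0xDC00 && w <= 0xDFFF

-- Offset, in 16-bit code units, just past the first n code points.
-- Returns Nothing if the input holds fewer than n code points.
-- Every code unit before the target has to be inspected, so this is
-- O(n) rather than a constant-time jump.
offsetOfCodePoint :: Int -> [Word16] -> Maybe Int
offsetOfCodePoint = go 0
  where
    go off n _ | n <= 0 = Just off
    go _ _ [] = Nothing
    -- A well-formed surrogate pair is one code point in two code units.
    go off n (hi:lo:rest)
      | isHighSurrogate hi && isLowSurrogate lo = go (off + 2) (n - 1) rest
    -- Anything else (including an unpaired surrogate) counts as one.
    go off n (_:rest) = go (off + 1) (n - 1) rest
```

For the code units [0x0061, 0xD834, 0xDD1E, 0x0062] ("a", U+1D11E, "b"),
offsetOfCodePoint 2 gives Just 3, where the fixed-width assumption would
have guessed 2.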

-- 
Aaron Denney
-><-


