[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

Thu Sep 27 16:30:30 EDT 2007

On 2007-09-27, Duncan Coutts <duncan.coutts at worc.ox.ac.uk> wrote:
> In message <slrnffmk0s.ic5.wnoise at ofb.net> wnoise at ofb.net writes:
>> On 2007-09-27, Deborah Goldsmith <dgoldsmith at mac.com> wrote:
>> > On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
>> >>> UTF-16 has no advantage over UTF-8 in this respect, because of
>> >>> surrogate pairs and combining characters.
>> >>
>> >> Good point.
>> >
>> > Well, not so much. As Duncan mentioned, it's a matter of what the most  
>> > common case is. UTF-16 is effectively fixed-width for the majority of  
>> > text in the majority of languages. Combining sequences and surrogate  
>> > pairs are relatively infrequent.
>> 
>> Infrequent, but they exist, which means you can't seek x/2 bytes ahead
>> to seek x characters ahead.  All such seeking must be linear for both
>> UTF-16 *and* UTF-8.
>
> And in [Char] for all these years, yet I don't hear people complaining. Most
> string processing is linear and does not need random access to characters.

Yeah.  I'm saying the differences between the encodings are going to be
in the constant factors, and that these constant factors will differ
between workloads.
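
To make the seeking point concrete, here is a minimal sketch of finding
the n-th code point in UTF-16 data.  The names unitLen and seekCodePoint
are made up for illustration and are not part of any proposed library;
the input is assumed to be well-formed UTF-16 held as a plain list of
code units.  Combining sequences are not handled at all, and they would
force the same kind of walk on top of this one.

```haskell
import Data.Word (Word16)

-- Hypothetical helper: how many 16-bit code units the code point
-- starting at this unit occupies (2 for a high surrogate, else 1).
unitLen :: Word16 -> Int
unitLen w
  | w >= 0xD800 && w <= 0xDBFF = 2
  | otherwise                  = 1

-- Offset, in code units, of the n-th code point (0-based).
-- There is no shortcut: every earlier unit must be inspected,
-- so the cost is linear in n, just as it is for UTF-8.
seekCodePoint :: Int -> [Word16] -> Maybe Int
seekCodePoint n = go n 0
  where
    go 0 off (_:_)    = Just off
    go _ _   []       = Nothing
    go k off ws@(w:_) =
      let s = unitLen w
      in  go (k - 1) (off + s) (drop s ws)
```

For "a", U+1D11E, "b" encoded as [0x0061, 0xD834, 0xDD1E, 0x0062],
seekCodePoint 2 gives Just 3, whereas assuming one unit per character
would land on offset 2, the middle of the surrogate pair.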

-- 
Aaron Denney
-><-