[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Thu Sep 27 06:34:39 EDT 2007


In message <slrnffmk0s.ic5.wnoise at ofb.net> wnoise at ofb.net writes:
> On 2007-09-27, Deborah Goldsmith <dgoldsmith at mac.com> wrote:
> > On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
> >>> UTF-16 has no advantage over UTF-8 in this respect, because of  
> >>> surrogate
> >>> pairs and combining characters.
> >>
> >> Good point.
> >
> > Well, not so much. As Duncan mentioned, it's a matter of what the most  
> > common case is. UTF-16 is effectively fixed-width for the majority of  
> > text in the majority of languages. Combining sequences and surrogate  
> > pairs are relatively infrequent.
> 
> Infrequent, but they exist, which means you can't seek x/2 bytes ahead
> to seek x characters ahead.  All such seeking must be linear for both
> UTF-16 *and* UTF-8.

And seeking has been linear in [Char] for all these years, yet I don't hear
people complaining. Most string processing is linear and does not need random
access to characters.

Duncan
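
To illustrate the point about seeking, here is a minimal sketch of why skipping
x characters in UTF-16 cannot be done by jumping x*2 bytes ahead. It uses only
base and models the encoded text as a plain list of code units. The names
encodeUtf16, isHighSurrogate and dropCodePoints are made up for this example
and do not come from any proposed library. It assumes well-formed input and
counts code points only, so combining sequences would need further work on top.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode a String as UTF-16 code units. Code points above U+FFFF
-- become a surrogate pair (two code units).
-- Hypothetical helper for illustration; assumes no lone surrogates.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap unit
  where
    unit c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

-- True for a high (leading) surrogate, i.e. the first half of a pair.
isHighSurrogate :: Word16 -> Bool
isHighSurrogate w = w >= 0xD800 && w <= 0xDBFF

-- Skip n code points. Every code unit on the way has to be inspected
-- to see whether it starts a pair, so this is linear in n.
dropCodePoints :: Int -> [Word16] -> [Word16]
dropCodePoints n ws
  | n <= 0 = ws
dropCodePoints _ [] = []
dropCodePoints n (w : ws)
  | isHighSurrogate w = dropCodePoints (n - 1) (drop 1 ws)
  | otherwise         = dropCodePoints (n - 1) ws
```

With the text ['a', '\x1D11E', 'b'], a naive drop 2 on the code units lands in
the middle of the surrogate pair, while dropCodePoints 2 lands on 'b'. The same
walk over [Char] with plain drop is linear as well, which is the comparison
being made above.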

