[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.
Brandon S. Allbery KF8NH
allbery at ece.cmu.edu
Tue Oct 2 21:45:59 EDT 2007
On Oct 2, 2007, at 21:12 , Isaac Dupree wrote:
> Stefan O'Rear wrote:
>> On Tue, Oct 02, 2007 at 11:05:38PM +0200, Johan Tibell wrote:
>>>> I do not believe that anyone was seriously advocating multiple
>>>> blessed encodings. The main question is *which* encoding to bless.
>>>> 99+% of the text I encounter is US-ASCII, so I would favor UTF-8.
>>>> Why is UTF-16 better for me?
>>> All the software I write professionally has to support 40 languages
>>> (including CJK ones), so I would prefer UTF-16 in case I can use
>>> Haskell at work some day in the future. I don't know that "who uses
>>> which encoding the most" is good grounds for picking an encoding,
>>> though. Ease of implementation and speed on some representative
>>> sample set of text may be.
>> UTF-8 supports CJK languages too. The only question is efficiency.
>
> Due to the additional complexity of handling UTF-8 -- EVEN IF the
> actual text processed happens to be all US-ASCII -- will UTF-8
> perhaps be less efficient than UTF-16, or only as fast?
UTF-8 will be very slightly faster in the all-ASCII case, but quickly
blows chunks if you have *any* characters that require multiple bytes.
Given the way UTF-8 encoding works, this includes even non-ASCII
Latin-1, never mind CJK. (I think people have been missing that
point: UTF-8 is only cheap for 00-7F, *nothing else*.)
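To make the byte-width trade-off concrete, here is a small sketch (not from the original thread, and the function names `utf8Bytes`/`utf16Bytes` are my own) that computes how many bytes one code point occupies in each encoding, using only the standard encoding rules: UTF-8 uses 1 byte below U+0080, 2 below U+0800, 3 below U+10000, and 4 above; UTF-16 uses 2 bytes in the BMP and 4 (a surrogate pair) above it.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1  -- ASCII only
  | n < 0x800   = 2  -- includes non-ASCII Latin-1, e.g. 'é'
  | n < 0x10000 = 3  -- most CJK, e.g. '日'
  | otherwise   = 4  -- supplementary planes
  where n = ord c

-- Bytes needed to encode one code point in UTF-16.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2  -- Basic Multilingual Plane
  | otherwise       = 4  -- surrogate pair

main :: IO ()
main = mapM_ report ['A', 'é', '日', '\x1D11E']
  where
    report c = putStrLn $ [c] ++ ": UTF-8 " ++ show (utf8Bytes c)
                  ++ " bytes, UTF-16 " ++ show (utf16Bytes c) ++ " bytes"
```

This shows the shape of the argument above: ASCII is 1 byte in UTF-8 vs 2 in UTF-16, but already at Latin-1 the two encodings tie, and most CJK text is 3 bytes in UTF-8 vs 2 in UTF-16.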
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery at kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery at ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH