[Haskell-cafe] Re: String vs ByteString

Tako Schotanus tako at codejive.org
Tue Aug 17 04:46:35 EDT 2010


On Tue, Aug 17, 2010 at 10:34, Bulat Ziganshin <bulat.ziganshin at gmail.com> wrote:

> Hello Johan,
>
> Tuesday, August 17, 2010, 12:20:37 PM, you wrote:
>
> >  I agree, Data.Text is great.  Unfortunately, its internal use of UTF-16
> >  makes it inefficient for many purposes.
>
> > It's not clear to me that using UTF-16 internally does make
> > Data.Text noticeably slower.
>
> not slower but require 2x more memory. speed is the same since
> Unicode contains 2^20 codepoints
>
>
This is not entirely correct, because it all depends on your data.
For Western languages it normally holds true that UTF-16 occupies twice the
memory of UTF-8, but for other languages a code point can take up to 3 bytes
in UTF-8, and code points outside the Basic Multilingual Plane take 4 bytes
in both encodings (see http://en.wikipedia.org/wiki/UTF-8). For such text
UTF-8 can be as large as UTF-16, or even larger.
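
To make the size comparison concrete, here is a minimal sketch that computes how
many bytes a String would occupy in each encoding. The function names (utf8Bytes,
utf16Bytes, utf8Size, utf16Size) are made up for illustration and are not part of
Data.Text or any other library; the sketch also assumes the input contains no
lone surrogate code points.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1   -- ASCII
  | n < 0x800   = 2   -- e.g. accented Latin, Greek, Cyrillic
  | n < 0x10000 = 3   -- rest of the Basic Multilingual Plane
  | otherwise   = 4   -- supplementary planes
  where n = ord c

-- Bytes needed to encode one code point in UTF-16.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2   -- one 16-bit code unit
  | otherwise       = 4   -- surrogate pair

-- Total encoded size of a String in each encoding (hypothetical helpers).
utf8Size, utf16Size :: String -> Int
utf8Size  = sum . map utf8Bytes
utf16Size = sum . map utf16Bytes
```

For example, utf8Size "hello" is 5 while utf16Size "hello" is 10, but for the
three-character string "\x65E5\x672C\x8A9E" (Japanese text) utf8Size gives 9 and
utf16Size gives 6.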

That Wikipedia page is a nice read anyway; it mentions some of the
advantages and disadvantages of the different encodings.
(For example, the complexity of the code that determines the length of a
UTF string depends on the encoding.)
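
As a rough illustration of that last point, the sketch below counts code points
in already-encoded data. Both functions have to walk the whole input, and only
the test for "this unit starts a new code point" differs. The names utf8Length
and utf16Length are hypothetical, the input is represented as plain lists for
simplicity, and well-formed input is assumed (no validation is done).

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8, Word16)

-- UTF-8: every byte that is not a continuation byte (bit pattern 10xxxxxx)
-- starts a new code point.
utf8Length :: [Word8] -> Int
utf8Length = length . filter (\b -> b .&. 0xC0 /= 0x80)

-- UTF-16: every code unit that is not a low (trailing) surrogate
-- (0xDC00..0xDFFF) starts a new code point.
utf16Length :: [Word16] -> Int
utf16Length = length . filter (\u -> u .&. 0xFC00 /= 0xDC00)
```

For example, the UTF-8 bytes [0x61, 0xE2, 0x82, 0xAC] ("a" followed by the euro
sign) and the UTF-16 units [0x0061, 0xD83D, 0xDE00] ("a" followed by U+1F600)
both have length 2 in code points.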

Cheers,
 -Tako