[Haskell-cafe] Re: String vs ByteString

Tako Schotanus tako at codejive.org
Tue Aug 17 07:55:05 EDT 2010


On Tue, Aug 17, 2010 at 13:40, Ketil Malde <ketil at malde.org> wrote:

> Michael Snoyman <michael at snoyman.com> writes:
>
> > As far as space usage, you are correct that CJK data will take up more
> > memory in UTF-8 than UTF-16.
>
> With the danger of sounding ... alphabetist? as well as belaboring a
> point I agree is irrelevant (the storage format):
>
> I'd point out that it seems at least as unfair to optimize for CJK at
> the cost of Western languages.
>

The thing is, you're only talking about size optimizations here. For
somebody who has to handle a lot of international text (and I'm not
necessarily talking about Chinese or Japanese), what matters is that the
text is processed as efficiently as possible: in the end you store and
retrieve the data only once each, while you may do a lot of processing in
between. And the on-disk or over-the-wire format may very well differ from
the in-memory format. Each can be chosen for what it's best at.
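
To put rough numbers on the size side of this, here is a minimal sketch
that counts how many bytes a string would need in UTF-8 and in UTF-16. It
uses only base and works from code point ranges; the names utf8Len,
utf16Len and encodedSizes are made up for this illustration and are not
part of any Text package. It says nothing about processing speed, only
about encoded size.

```haskell
import Data.Char (ord)

-- Bytes needed to encode a single code point in UTF-8.
utf8Len :: Char -> Int
utf8Len c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Bytes needed to encode a single code point in UTF-16
-- (one 16-bit unit inside the BMP, a surrogate pair outside it).
utf16Len :: Char -> Int
utf16Len c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- (UTF-8 size, UTF-16 size) in bytes for a whole string.
encodedSizes :: String -> (Int, Int)
encodedSizes s = (sum (map utf8Len s), sum (map utf16Len s))

main :: IO ()
main = do
  print (encodedSizes "hello")    -- (5,10)
  print (encodedSizes "日本語")    -- (9,6)
```

So ASCII text doubles in UTF-16, while BMP CJK text grows by half in UTF-8.
Either way this only measures the stored size, which can be picked
independently of whatever the in-memory representation is.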

I'll repeat here that in my opinion a Text package should be good at
handling text, human text, from whatever country. If I need to handle large
streams of ASCII I'll use something else.

:)

Cheers,
 -Tako