[Haskell-cafe] Re: String vs ByteString
Tako Schotanus
tako at codejive.org
Tue Aug 17 07:55:05 EDT 2010
On Tue, Aug 17, 2010 at 13:40, Ketil Malde <ketil at malde.org> wrote:
> Michael Snoyman <michael at snoyman.com> writes:
>
> > As far as space usage, you are correct that CJK data will take up more
> > memory in UTF-8 than UTF-16.
>
> With the danger of sounding ... alphabetist? as well as belaboring a
> point I agree is irrelevant (the storage format):
>
> I'd point out that it seems at least as unfair to optimize for CJK at
> the cost of Western languages.
>
The thing is, you're only talking about size optimizations here. For
somebody who has to handle a lot of international text (and I'm not
necessarily talking about Chinese or Japanese), what matters is that the
processing is as efficient as possible, because in the end you store and
retrieve the data only once each, while possibly doing a lot of processing
in between. And the on-disk storage or over-the-wire format might very well
be different from the in-memory format. Each can be selected for what it's
best at.
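
As a rough illustration of the size trade-off being discussed, here is a
minimal sketch, assuming the text and bytestring packages are installed.
The helper name `sizes` and the sample strings are made up for this
example. It only measures the encoded (storage or wire) sizes; the
in-memory representation that Data.Text uses is an implementation detail
of the library and is not what is being measured here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as E

-- | Hypothetical helper: (character count, UTF-8 byte count,
-- UTF-16 byte count) for a Text value.
sizes :: T.Text -> (Int, Int, Int)
sizes t =
  ( T.length t
  , B.length (E.encodeUtf8 t)
  , B.length (E.encodeUtf16LE t)
  )

main :: IO ()
main = mapM_ report samples
  where
    -- Sample strings chosen for illustration only.
    samples :: [(String, T.Text)]
    samples = [("ASCII", "hello"), ("Japanese", "日本語")]
    report (label, t) = putStrLn (label ++ ": " ++ show (sizes t))
```

This prints `ASCII: (5,5,10)` and `Japanese: (3,9,6)`: the ASCII sample
doubles in size under UTF-16, while the Japanese sample is half again as
large under UTF-8. Either encoding can be chosen at the I/O boundary
without changing how the program works with the Text value in between.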
I'll repeat here that, in my opinion, a Text package should be good at
handling text, human text, from whatever country. If I need to handle large
streams of ASCII, I'll use something else.
:)
Cheers,
-Tako