[Haskell-cafe] Re: String vs ByteString

Wed Aug 18 10:04:45 EDT 2010

On Wed, Aug 18, 2010 at 2:39 PM, Johan Tibell <johan.tibell at gmail.com> wrote:

> On Wed, Aug 18, 2010 at 2:12 AM, John Meacham <john at repetae.net> wrote:
>
>> <ranty thing to follow>
>> That said, there is never a reason to use UTF-16; it is a vestigial
>> remnant from the brief period when it was thought 16 bits would be
>> enough for the Unicode standard. Any defense of it nowadays is after the
>> fact justification for having accidentally standardized on it back in
>> the day.
>
>
> This is false. Text uses UTF-16 internally because early benchmarks indicated
> that it was faster. See Tom Harper's response in the other thread that Ketil
> spawned off this one.
>
> Text continues to be UTF-16 today because
>
>     * no one has written a benchmark that shows that UTF-8 would be faster
> *for use in Data.Text*, and
>     * no one has written a patch that converts Text to use UTF-8
> internally.
>
> I'm quite frustrated by this whole discussion; there's lots of talking, no
> coding, and only a little benchmarking (of web sites, not code). This will
> get us nowhere.
>
> Here's my response to the two points:

* I haven't written a patch showing that Data.Text would be faster using
UTF-8 because that would require fulfilling the second point (which I'll get
to in a second). I *have* shown where there are huge performance differences
between text and ByteString/String. Unfortunately, the response has been
"don't use bytestring, it's the wrong datatype, text will get fixed," which
is quite underwhelming.

* Since the prevailing attitude so far has been to disregard the facts that
have been shown, it seems that the effort required to learn the internals of
the text package and attempt a patch would be wasted. In the meantime, Jasper
has released blaze-builder, which does an amazing job of producing UTF-8
encoded data, and that is my main need for the moment. Though I'll likely be
chastised by the community for it, I'll stick with this approach for now.
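Since blaze-builder's appeal here is producing UTF-8 encoded output, it may help to see what that encoding step actually involves. The following is a minimal, base-only sketch of UTF-8 encoding a String into bytes; it is not blaze-builder's API or implementation, and the names `encodeCharUtf8` and `encodeUtf8` are hypothetical (the latter is unrelated to the text package's function of the same name).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as 1 to 4 UTF-8 bytes.
-- Assumption: lone surrogates (U+D800..U+DFFF), which a Char can hold,
-- are encoded as-is; a strict encoder would reject or replace them.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]                                 -- 0xxxxxxx
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                          -- 110xxxxx 10xxxxxx
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]                 -- 1110xxxx + 2 continuation bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]        -- 11110xxx + 3 continuation bytes
  where
    n = ord c
    -- The leading payload bits: everything above bit position s.
    hi s = fromIntegral (n `shiftR` s)
    -- A continuation byte 10xxxxxx carrying the 6 bits starting at position s.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Encode a whole String by concatenating each character's bytes.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8
```

For example, in GHCi `encodeUtf8 "\233"` (é) evaluates to `[195,169]`, while each ASCII character maps to a single byte.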

Now, if you tell me that text would consider applying a UTF-8 patch, that
would be a different story. But I don't have the time to maintain a separate
UTF-8 version of text. For me, the whole point of this discussion was to
determine whether we should attempt porting to UTF-8, which, as I understand
it, would be a rather large undertaking.
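Whether UTF-8 or UTF-16 is the better internal representation depends partly on the text being stored. As a rough illustration of sizes only (not speed, and not a claim about Data.Text's internals), the sketch below counts the bytes each encoding needs per code point; the function names are hypothetical. It also shows the point behind the UTF-16 criticism quoted above: code points above U+FFFF do not fit in 16 bits and need a surrogate pair.

```haskell
import Data.Char (ord)

-- Bytes needed for one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Bytes needed in UTF-16: one 16-bit unit inside the Basic Multilingual
-- Plane, a surrogate pair (two units) above U+FFFF.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- (UTF-8 size, UTF-16 size) in bytes for a whole string.
encodedSizes :: String -> (Int, Int)
encodedSizes s = (sum (map utf8Bytes s), sum (map utf16Bytes s))
```

Under these rules `encodedSizes "hello"` is `(5,10)` while `encodedSizes "日本語"` is `(9,6)`, so ASCII-heavy data such as HTML markup favors UTF-8 and CJK-heavy text favors UTF-16 on size; whether either matters for Data.Text's speed is the benchmark question raised in this thread.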

Michael