[Haskell-cafe] Re: String vs ByteString

Bryan O'Sullivan bos at serpentine.com
Sat Aug 14 19:38:58 EDT 2010


On Sat, Aug 14, 2010 at 3:46 PM, Sean Leather <leather at cs.uu.nl> wrote:

>
> So then, what is the standard?
>

There isn't one. There are many national standards:

   - China: GB-2312, GBK and GB18030
   - Taiwan: Big5
   - Japan: JIS and Shift-JIS (0208 and 0213 variants) and EUC-JP
   - Korea: KS-X-2001, EUC-KR, and ISO-2022-KR

In general, Unicode uptake is increasing rapidly:
http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html

> Being not familiar with this area, I googled a bit, and I don't see a
> consensus. But I also noticeably don't see UTF-16. So, if this is the case,
> then a similar question still arises for CJK text: What format/library to
> use for it (assuming one doesn't want a performance penalty for translating
> between Data.Text's internal format and the target format)?
>

In my opinion, this "performance penalty" hand-wringing is mostly silly.
We're talking about a pretty small performance difference in most of
these cases. Even the biggest difference, between ByteString and String, is
usually much less than a factor of 100.
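For UTF-8, at least, the "translation" the quoted question worries about is one function call in each direction. Below is a minimal sketch that assumes only the text and bytestring packages. toWire and fromWire are hypothetical names used for illustration. encodeUtf8 and decodeUtf8' come from Data.Text.Encoding.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: encode Text as UTF-8 bytes for files, sockets, etc.
toWire :: T.Text -> B.ByteString
toWire = TE.encodeUtf8

-- Hypothetical helper: decode UTF-8 bytes back into Text.
-- decodeUtf8' returns Left on malformed input instead of throwing.
fromWire :: B.ByteString -> Either String T.Text
fromWire bs = case TE.decodeUtf8' bs of
  Left err -> Left (show err)
  Right t  -> Right t

main :: IO ()
main = do
  let original = T.pack "日本語のテキスト"
      bytes    = toWire original
  -- 8 characters, each encoded as 3 bytes in UTF-8
  print (T.length original, B.length bytes)  -- (8,24)
  print (fromWire bytes == Right original)   -- True
```

This sketch does not measure anything. If the conversion turns out to matter in a real program, profile that program and report it, as the advice below suggests.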

Your absolute first concern should be correctness, for which you should (a)
use text and (b) assume that any performance issues are being actively
worked on, especially if you report concrete problems and how to reproduce
them. In the unlikely event that you need to support non-Unicode encodings,
they are readily available via text-icu.
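Here is a concrete version of the correctness point. It is a minimal sketch that assumes only base, text and bytestring. It compares Data.ByteString.Char8, which keeps only the low 8 bits of each Char, with a round trip through Text and explicit UTF-8. viaChar8 and viaText are hypothetical helper names.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Data.ByteString.Char8.pack truncates each Char to its low 8 bits,
-- so any character above U+00FF is silently corrupted.
viaChar8 :: String -> String
viaChar8 = BC.unpack . BC.pack

-- Going through Text and explicit UTF-8 encoding preserves every code point.
viaText :: String -> String
viaText = T.unpack . TE.decodeUtf8 . TE.encodeUtf8 . T.pack

main :: IO ()
main = do
  let s = "漢字"
  print (viaChar8 s == s)  -- False: the code points were truncated
  print (viaText s == s)   -- True
```

The sketch covers only the Unicode side. For the non-Unicode encodings listed above, text-icu's Data.Text.ICU.Convert module lets you open a converter by encoding name and then convert with toUnicode and fromUnicode. Check the package documentation for the exact signatures before relying on this.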

The only significant change to the text API that lies ahead is an
introduction of locale support in a few critical places, so that we can do
the right thing for languages like Turkish.
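Turkish is a good illustration of why locale support is needed. Data.Text's toUpper and toLower apply the default, locale-independent Unicode mappings, so 'i' becomes 'I' rather than the Turkish 'İ' (U+0130). The sketch below assumes only the text package. turkishUpper and turkishLower are hypothetical helpers, not part of text or text-icu. They patch only the dotted/dotless i pairs and ignore every other locale rule, so they stand in for real locale support rather than replace it.

```haskell
module Main (main) where

import qualified Data.Text as T

-- Hypothetical helper: map 'i' to 'İ' (U+0130) first, then apply the
-- default mapping. The default toUpper already sends 'ı' (U+0131) to 'I'.
turkishUpper :: T.Text -> T.Text
turkishUpper = T.toUpper . T.replace (T.pack "i") (T.pack "\x130")

-- Hypothetical helper: map 'İ' to 'i' (the default toLower would give
-- "i" plus combining dot U+0307), then 'I' to 'ı', then apply toLower.
turkishLower :: T.Text -> T.Text
turkishLower =
  T.toLower
    . T.replace (T.pack "I") (T.pack "\x131")
    . T.replace (T.pack "\x130") (T.pack "i")

main :: IO ()
main = do
  let word = T.pack "istanbul"
  print (T.toUpper word == T.pack "ISTANBUL")         -- True: default mapping
  print (turkishUpper word == T.pack "\x130STANBUL")  -- True: Turkish rules
```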