[Haskell-cafe] Re: String vs ByteString
Bryan O'Sullivan
bos at serpentine.com
Sat Aug 14 19:38:58 EDT 2010
On Sat, Aug 14, 2010 at 3:46 PM, Sean Leather <leather at cs.uu.nl> wrote:
>
> So then, what is the standard?
>
There isn't one. There are many national standards:
- China: GB-2312, GBK and GB18030
- Taiwan: Big5
- Japan: JIS and Shift-JIS (0208 and 0213 variants) and EUC-JP
- Korea: KS-X-2001, EUC-KR, and ISO-2022-KR
In general, Unicode uptake is increasing rapidly:
http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html
> Being not familiar with this area, I googled a bit, and I don't see a
> consensus. But I also noticeably don't see UTF-16. So, if this is the case,
> then a similar question still arises for CJK text: What format/library to
> use for it (assuming one doesn't want a performance penalty for translating
> between Data.Text's internal format and the target format)?
>
In my opinion, this "performance penalty" hand-wringing is mostly silly.
In most of these cases we're talking about a fairly small constant factor of
performance difference. Even the biggest difference, between ByteString and
String, is usually much less than a factor of 100.
Your absolute first concern should be correctness, for which you should (a)
use the text package and (b) assume that any performance issues are being
actively worked on, especially if you report concrete problems and how to
reproduce them. In the unlikely event that you need to support non-Unicode
encodings, they are readily available via text-icu.
The only significant change to the text API that lies ahead is the
introduction of locale support in a few critical places, so that we can do
the right thing for languages like Turkish.
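For anyone wondering what "the right thing for Turkish" refers to, here is a small sketch of the dotted/dotless i problem. Data.Text's case functions apply Unicode's default, locale-independent mappings, so they cannot know that Turkish pairs i with İ (U+0130) and ı (U+0131) with I. The turkishToUpper function below is a hypothetical, deliberately simplified workaround, not part of the text API.

```haskell
import qualified Data.Text as T

-- Default mapping: 'i' upper-cases to 'I' (Turkish expects U+0130).
upperI :: T.Text
upperI = T.toUpper (T.pack "i")

-- Default mapping: 'I' lower-cases to 'i' (Turkish expects U+0131).
lowerI :: T.Text
lowerI = T.toLower (T.pack "I")

-- Default full mapping: U+0130 lower-cases to 'i' + combining dot above.
lowerDottedI :: T.Text
lowerDottedI = T.toLower (T.pack "\x130")

-- Hypothetical, simplified Turkish upper-casing: map 'i' to U+0130 by hand,
-- then let the default mapping handle everything else (U+0131 already
-- upper-cases to 'I' by default, and U+0130 stays unchanged).
turkishToUpper :: T.Text -> T.Text
turkishToUpper = T.toUpper . T.map fixI
  where
    fixI 'i' = '\x130'
    fixI c   = c

main :: IO ()
main = do
  print (upperI == T.pack "I")
  print (lowerI == T.pack "i")
  print (lowerDottedI == T.pack "i\x307")
  print (turkishToUpper (T.pack "istanbul") == T.pack "\x130STANBUL")
```

A hand-rolled special case like this doesn't scale to other locale-sensitive rules, which is why locale support belongs in the library itself.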