[Haskell-cafe] Re: String vs ByteString

wren ng thornton wren at freegeek.org
Wed Aug 18 04:38:40 EDT 2010

Jinjing Wang wrote:
>> John Millikin wrote:
>>> The reason many Japanese and Chinese users reject UTF-8 isn't due to
>>> space constraints (UTF-8 and UTF-16 are roughly equal), it's because
>>> they reject Unicode itself.
>> +1.
>> This is the thing Unicode advocates don't want to admit. Until Unicode has
>> code points for _all_ Chinese and Japanese characters, there will be active
>> resistance to adoption.
> [...] 
> However, many of the popular websites started during Web 2.0 are adopting UTF-8,
> for example:
> * renren.com (the largest Chinese Facebook clone)
> * www.kaixin001.com (the second-largest Chinese Facebook clone)
> * t.sina.com.cn (an example of a Twitter clone)
> These websites adopted UTF-8 because (I think) most web development
> tools have already standardized on UTF-8, and there's little reason
> to change it.

Interesting. I don't know much about the politics of Chinese encodings, 
other than that the GB formats are/were dominant.

As for the politics of Japanese encodings, last time I did web work 
(just at the beginning of web2.0, before they started calling it that) 
there was still a lot of active resistance among the Japanese. Given 
some of the characters folks were complaining about, I think it's more 
an issue of principle than practicality. Then again, the Japanese do 
love their language games, so obscure and archaic characters are used 
far more often than would be expected... Whether web2.0 has caused the 
Japanese to change too, I can't say. I got out of that line of work ^_^

> I'm not aware of any (at least common) Chinese characters that can be
> represented in gb2312 but not in Unicode: the range of gb2312 is
> a subset of the range of gbk, which is a subset of the range of
> gb18030, and gb18030 is just another encoding of Unicode.

All the specific characters I've seen folks complain about were very 
uncommon or even archaic. All the common characters are there for 
Japanese too. The only time I've run into issues it was for an archaic 
character used in a manga title. I was working on a library catalog, and 
was too pedantic to spell it "wrong".
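
The nesting described in the quoted paragraph (gb2312 inside gbk inside gb18030, with gb18030 able to encode any Unicode character) can be spot-checked with the codecs that ship with CPython. This is only a rough sketch: the helper names `encodable` and `coverage` and the three sample characters are my own choices for illustration, not anything from the thread, and three characters prove nothing about full coverage.

```python
# Illustrative check using only the codecs bundled with CPython.
# Helper names and sample characters are assumptions made for this sketch.

def encodable(text, codec):
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

CODECS = ("gb2312", "gbk", "gb18030")

samples = {
    "common simplified": "\u4e2d",    # a character present in gb2312
    "traditional form": "\u9ad4",     # not in gb2312, but covered by gbk
    "CJK Extension B": "\U00020000",  # outside the BMP, needs gb18030
}

def coverage(char):
    """Report, per codec in CODECS order, whether char is encodable."""
    return tuple(encodable(char, codec) for codec in CODECS)

for label, char in samples.items():
    print(label, dict(zip(CODECS, coverage(char))))
```

Each sample should be accepted by every codec to the right of the first one that accepts it, which is what the subset claim predicts. It says nothing about the archaic characters people complain about, since those are the ones that may have no code point at all.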

Live well,
