[Haskell-cafe] Re: String vs ByteString

Wed Aug 18 01:42:27 EDT 2010

> John Millikin wrote:
>>
>> The reason many Japanese and Chinese users reject UTF-8 isn't due to
>> space constraints (UTF-8 and UTF-16 are roughly equal), it's because
>> they reject Unicode itself.
>
> +1.
>
> This is the thing Unicode advocates don't want to admit. Until Unicode has
> code points for _all_ Chinese and Japanese characters, there will be active
> resistance to adoption.
>
> --
> Live well,
> ~wren

For mainland Chinese websites:

Most of those that became popular during web 1.0 (5-10 years ago) are
using a utf-8 incompatible encoding, e.g. gb2312.

for example:

* www.sina.com.cn
* www.sohu.com

They probably didn't switch to utf-8 simply because they never had to.

However, many of the popular websites that started during web 2.0 are
adopting utf-8.

for example:

* renren.com (the largest Chinese Facebook clone)
* www.kaixin001.com (the second largest Chinese Facebook clone)
* t.sina.com.cn (an example of a Twitter clone)

These websites adopted utf-8 because (I think) most web development
tools have already standardized on utf-8, and there's little reason
to change it.

I'm not aware of any (at least common) Chinese characters that can be
represented in gb2312 but not in unicode: the range of gb2312 is a
subset of the range of gbk, which is a subset of the range of gb18030,
and gb18030 is just another encoding of unicode.
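
A rough way to check the subset claim yourself, sketched in Python using
only the standard library codecs ("gb2312", "gbk", "gb18030"). The helper
names can_encode and decodes_identically are made up for this example, and
the sample strings are my own picks, not an exhaustive test of the
character sets.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def decodes_identically(raw: bytes,
                        codecs=("gb2312", "gbk", "gb18030")) -> bool:
    """Return True if raw decodes to the same unicode text under every codec."""
    return len({raw.decode(c) for c in codecs}) == 1


# Common simplified characters: present in gb2312, so also in gbk and gb18030.
simplified = "中文编码"
raw = simplified.encode("gb2312")

# A traditional character that gbk added on top of gb2312 (sample choice).
traditional = "體"

# A code point outside the Basic Multilingual Plane (CJK Extension B).
supplementary = "\U00020000"

if __name__ == "__main__":
    print("same text under all three codecs:", decodes_identically(raw))
    print("gb2312 -> unicode -> utf-8 round trip:",
          raw.decode("gb2312").encode("utf-8").decode("utf-8") == simplified)
    print("traditional char in gb2312:", can_encode(traditional, "gb2312"))
    print("traditional char in gbk:", can_encode(traditional, "gbk"))
    print("supplementary char in gb18030:", can_encode(supplementary, "gb18030"))
```

Since gb2312 bytes for ordinary hanzi decode to the same unicode text under
all three codecs, converting an old gb2312 page to utf-8 should be a plain
decode followed by an encode.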

ref:

* http://en.wikipedia.org/wiki/GB_18030

-- 
jinjing