[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

Wed Oct 3 02:27:02 EDT 2007

On Tue, 2007-10-02 at 14:32 -0700, Stefan O'Rear wrote:

> UTF-8 supports CJK languages too.  The only question is efficiency, and
> I believe CJK is still a relatively uncommon case compared to English
> and other Latin-alphabet languages.  (That said, I live in a country all
> of whose dominant languages use the Latin alphabet)

As for space efficiency, I guess the argument could be made that since
an ideogram typically conveys a whole word, it is reasonable to spend
more bits on it.
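
To put rough numbers on that, here is a minimal sketch that counts the
bytes UTF-8 needs per code point.  The names utf8Len and utf8Size are
made up for illustration and belong to no library; the thresholds are
the standard UTF-8 length boundaries.

```haskell
import Data.Char (ord)

-- Number of bytes UTF-8 uses to encode a single code point.
-- (Hypothetical helper, not part of any library.)
utf8Len :: Char -> Int
utf8Len c
  | n < 0x80    = 1   -- ASCII
  | n < 0x800   = 2   -- e.g. Latin1 letters with diacritics
  | n < 0x10000 = 3   -- rest of the BMP, including most CJK ideograms
  | otherwise   = 4   -- supplementary planes
  where n = ord c

-- Total UTF-8 size of a String, in bytes.
utf8Size :: String -> Int
utf8Size = sum . map utf8Len

main :: IO ()
main = do
  print (utf8Size "mountain")  -- eight ASCII letters
  print (utf8Size "\x5C71")    -- U+5C71, the single ideogram for "mountain"
```

In this one example the English word costs 8 bytes and the ideogram 3,
so per word the CJK text is not obviously worse off.  A single pair
proves nothing in general, of course.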

Anyway, I am unsure if I should take part in this discussion, as I'm not
really dealing with text as such in multiple languages.  Most of my data
is in ASCII, and when it is not, I'm happy to treat it ("treat" here
meaning "mostly ignore") as Latin1 bytes (current ByteString) or UTF-8.
The only thing I miss is the ability to use String syntactic sugar --
but IIUC, that's coming?

However, increased space usage is not acceptable, and I also don't want
any conversion layer which could conceivably modify my data (e.g. by
normalizing or error handling).
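
To illustrate the kind of modification I mean, here is a sketch that
assumes a decoding layer along the lines of the text package
(Data.Text.Encoding and Data.Text.Encoding.Error); roundTrip and latin1
are names made up for the example.  Latin1 bytes that happen not to be
valid UTF-8 are decoded leniently and then re-encoded:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- "cafés" in Latin1: 0xE9 is e-acute, which is not valid UTF-8 here.
latin1 :: B.ByteString
latin1 = B.pack [0x63, 0x61, 0x66, 0xE9, 0x73]

-- Decode as UTF-8, substituting U+FFFD for invalid input, then re-encode.
-- (Hypothetical helper; only the TE.* and lenientDecode names come from
-- the assumed library.)
roundTrip :: B.ByteString -> B.ByteString
roundTrip = TE.encodeUtf8 . TE.decodeUtf8With lenientDecode

main :: IO ()
main = do
  print (B.unpack latin1)
  print (B.unpack (roundTrip latin1))
  print (roundTrip latin1 == latin1)  -- False: the data was changed
```

The bytes that come out are not the bytes that went in, and nothing
signals the change.  With a plain ByteString no such step exists.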

-k