To show or not to show french accents
Marcin Benke
marcin at cs.chalmers.se
Fri Dec 19 17:52:37 EST 2003
MR K P SCHUPKE wrote:
>>The problem is that if you are reading single bytes, 233 is
>>not necessarily é.
>
>Erm, Internationalisation is not my thing as such... but I can't help
>commenting that from a systems point of view this is an utterly bad
>situation to be in... I thought Haskell used Unicode? I thought in Unicode
>the id of a character was fixed irrespective of language. Where is
>Unicode support lacking?
>
> Regards,
> Keean Schupke.
>
>
Quoting from the latest version of the Unicode Standard:
"The Unicode Standard specifies a numeric value (code point) and a name
for each of its characters.[...]
Unicode provides for three encoding forms: a 32-bit form (UTF-32), a
16-bit form (UTF-16), and an 8-bit form (UTF-8)."
Hence in Unicode proper, characters are identified by numbers (or actually
"code points"), not bytes. The byte-oriented encoding form is UTF-8.
In UTF-8, however, the byte 233 (0xE9) does not represent any character on
its own; it can only occur as the first byte of a 3-byte sequence. The
character é, whose code point is 233 (U+00E9), is encoded in UTF-8 as the
two bytes 195 169 (0xC3 0xA9). OTOH, UTF-8 encodes characters in the ASCII
range in the same way as ASCII.
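The byte-level claims above can be checked with a small Haskell sketch.
The names encodeUtf8Char, ByteRole and byteRole are made up for
illustration and do not come from any library; the encoder does not
reject surrogate code points, so treat it as a sketch rather than a
complete UTF-8 implementation.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a single character as its UTF-8 byte sequence.
-- Hypothetical helper; surrogates (U+D800..U+DFFF) are not rejected.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [fromIntegral n]                     -- ASCII range, one byte
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]              -- two bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]     -- three bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading payload bits, placed in the first byte.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | What a single byte can mean inside a UTF-8 stream.
data ByteRole = Ascii | Continuation | Lead Int | Invalid
  deriving (Eq, Show)

byteRole :: Word8 -> ByteRole
byteRole b
  | b < 0x80  = Ascii
  | b < 0xC0  = Continuation
  | b < 0xC2  = Invalid   -- 0xC0 and 0xC1 could only start overlong forms
  | b < 0xE0  = Lead 2
  | b < 0xF0  = Lead 3
  | b < 0xF5  = Lead 4
  | otherwise = Invalid
```

In GHCi, encodeUtf8Char '\233' gives [195,169], byteRole 233 gives
Lead 3, and encodeUtf8Char 'A' gives [65]. So a lone byte 233 is only
the start of a 3-byte sequence, while é itself takes two bytes.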
Regards,
Marcin Benke
More information about the Glasgow-haskell-users
mailing list