[Haskell-i18n] Surrogate pairs?
Ashley Yakeley
ashley@semantic.org
Wed, 21 Aug 2002 00:35:01 -0700
At 2002-08-21 00:17, Ketil Z. Malde wrote:
>> #00E1 [LATIN SMALL LETTER A WITH ACUTE]
>
>> or
>
>> #0061 [LATIN SMALL LETTER A] + #0301 [COMBINING ACUTE ACCENT]
>
>I guess they must be treated the same, too? That is, the length of
>the strings should be the same, they should compare equal, etc etc.
In my opinion, no. Since String is simply [Char], it should be treated as
a plain list of codepoints with no further interpretation. So 'length' and
the Eq instance should behave exactly as they do for any other list.
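For illustration, a minimal Haskell sketch of what "just a list of codepoints" means for the two spellings quoted above. The names `precomposed`, `decomposed`, `codepointCounts` and `sameString` are hypothetical, chosen only for this example; everything else is ordinary Prelude list behaviour.

```haskell
-- Two spellings of the same user-perceived letter:
-- one precomposed codepoint versus a base letter plus a combining accent.
precomposed :: String
precomposed = "\x00E1"        -- LATIN SMALL LETTER A WITH ACUTE

decomposed :: String
decomposed = "a\x0301"        -- LATIN SMALL LETTER A + COMBINING ACUTE ACCENT

-- Because String is just [Char], the ordinary list functions
-- count and compare codepoints, not user-perceived characters.
codepointCounts :: (Int, Int)
codepointCounts = (length precomposed, length decomposed)   -- (1, 2)

sameString :: Bool
sameString = precomposed == decomposed                       -- False
```

Under this view the two strings simply differ in length and compare unequal, with no special cases in the list functions.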
>Or is it an alternative to just ignore the issue, and simply think of
>the latter as two characters?
Treat the latter as two codepoints, and don't worry about characters.
Operations such as decomposition and equivalence testing should be
provided as separate functions.
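As a rough sketch of such separate functions, assuming a tiny hand-written decomposition table: `decompositionTable`, `decomposeToy` and `equivalentToy` are hypothetical names, not an existing library API. A real implementation would use the full decomposition mappings from the Unicode Character Database and would also put combining marks into canonical order, which this toy version does not attempt.

```haskell
-- Hypothetical toy decomposition table (assumption: only three entries).
-- A real implementation would load the Unicode Character Database mappings.
decompositionTable :: [(Char, String)]
decompositionTable =
  [ ('\x00E1', "a\x0301")   -- a with acute -> a + COMBINING ACUTE ACCENT
  , ('\x00E9', "e\x0301")   -- e with acute -> e + COMBINING ACUTE ACCENT
  , ('\x00F1', "n\x0303")   -- n with tilde -> n + COMBINING TILDE
  ]

-- Replace each codepoint found in the table by its decomposition;
-- every other codepoint passes through unchanged.
decomposeToy :: String -> String
decomposeToy = concatMap step
  where
    step c = maybe [c] id (lookup c decompositionTable)

-- Equivalence lives in its own function; the list's (==) is left alone.
equivalentToy :: String -> String -> Bool
equivalentToy s t = decomposeToy s == decomposeToy t
```

With this split, `equivalentToy "\x00E1" "a\x0301"` is True while `"\x00E1" == "a\x0301"` stays False, and `length` still reports 1 and 2 respectively.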
--
Ashley Yakeley, Seattle WA