[Haskell-i18n] Surrogate pairs?

Ashley Yakeley ashley@semantic.org
Wed, 21 Aug 2002 00:35:01 -0700


At 2002-08-21 00:17, Ketil Z. Malde wrote:

>>   \#00E1 [LATIN SMALL LETTER A WITH ACUTE]
>
>> or
>
>>   \#0061 [LATIN SMALL LETTER A] + \#0301 [COMBINING ACUTE ACCENT]
>
>I guess they must be treated the same, too?  That is, the length of
>the strings should be the same, they should compare equal, etc etc.

In my opinion, no. Since String is simply [Char], it should be treated 
as a list of codepoints without further interpretation. So 'length' and 
the Eq instance should behave exactly as they do for any other list.
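
To make this concrete, here is a small sketch of the two spellings from the quoted example under the plain list-of-codepoints reading. The names precomposed, decomposed, lengths and sameList are made up for illustration; only 'length' and (==) from the Prelude are used.

```haskell
-- Two spellings of the same accented letter, written with numeric escapes.
-- All top-level names here are illustrative, not from any library.
precomposed :: String
precomposed = "\x00E1"        -- LATIN SMALL LETTER A WITH ACUTE

decomposed :: String
decomposed = "\x0061\x0301"   -- LATIN SMALL LETTER A + COMBINING ACUTE ACCENT

-- 'length' counts list elements, i.e. codepoints: one versus two.
lengths :: (Int, Int)
lengths = (length precomposed, length decomposed)

-- The derived list equality compares codepoint by codepoint,
-- so the two spellings are different Strings.
sameList :: Bool
sameList = precomposed == decomposed
```

Here lengths is (1, 2) and sameList is False, which is the behaviour one gets for any other list of one element versus two.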

>Or is it an alternative to just ignore the issue, and simply think of
>the latter as two characters?

Consider the latter as two codepoints, and don't worry about characters. 
There should be separate functions for doing such things as decomposition 
and equivalence.
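
As a rough sketch of what such separate functions might look like: the names decomposeChar, decompose and equivalent are hypothetical, and the one-entry table below stands in for real Unicode decomposition data. A real version would also need to reorder combining marks canonically, which this toy does not attempt.

```haskell
-- Hypothetical toy decomposition: a single hard-coded entry for illustration.
-- A real implementation would be driven by the Unicode character database.
decomposeChar :: Char -> String
decomposeChar '\x00E1' = "\x0061\x0301"  -- a-acute -> a + combining acute
decomposeChar c        = [c]             -- everything else is left alone

-- Decompose a whole String, codepoint by codepoint.
decompose :: String -> String
decompose = concatMap decomposeChar

-- Equivalence is a separate, explicit question; (==) on String is untouched.
equivalent :: String -> String -> Bool
equivalent s t = decompose s == decompose t
```

With this, equivalent "\x00E1" "\x0061\x0301" is True, while "\x00E1" == "\x0061\x0301" remains False, so String keeps its ordinary list semantics and the Unicode-aware comparison is opt-in.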

-- 
Ashley Yakeley, Seattle WA