Unicode

Ashley Yakeley ashley@semantic.org
Thu, 24 May 2001 14:41:21 -0700


At 2001-05-24 05:57, Julian Seward (Intl Vendor) wrote:

>   - Initial Unicode support - the Char type is now 31 bits.

It might be appropriate to have two types for Unicode: a UCS-2 type (16 
bits) and a UCS-4 type (31 bits). For instance, something like:

--
newtype UCS2CodePoint = MkUCS2CodePoint Word16
newtype UCS4CodePoint = MkUCS4CodePoint Word31
type Char = UCS4CodePoint

toUCS4 :: UCS2CodePoint -> UCS4CodePoint
fromUCS4 :: UCS4CodePoint -> Maybe UCS2CodePoint
encodeUTF16 :: [UCS4CodePoint] -> Maybe [UCS2CodePoint]
decodeUTF16 :: [UCS2CodePoint] -> Maybe [UCS4CodePoint]
--
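
As a rough sketch of how those signatures might be filled in: the version 
below assumes Word32 as a stand-in for Word31 (which the standard libraries 
do not provide), assumes fromUCS4 fails only for values above 0xFFFF, and 
assumes the UTF-16 functions reject lone surrogates and values above 
0x10FFFF. The deriving clauses and the local helper encodeOne are my own 
additions for illustration, and the "type Char" synonym is left out so as 
not to clash with the Prelude.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Word (Word16, Word32)

newtype UCS2CodePoint = MkUCS2CodePoint Word16 deriving (Eq, Show)
-- Assumption: Word32 stands in for the hypothetical Word31.
newtype UCS4CodePoint = MkUCS4CodePoint Word32 deriving (Eq, Show)

-- Widening always succeeds.
toUCS4 :: UCS2CodePoint -> UCS4CodePoint
toUCS4 (MkUCS2CodePoint w) = MkUCS4CodePoint (fromIntegral w)

-- Narrowing fails for anything that does not fit in 16 bits.
fromUCS4 :: UCS4CodePoint -> Maybe UCS2CodePoint
fromUCS4 (MkUCS4CodePoint c)
  | c <= 0xFFFF = Just (MkUCS2CodePoint (fromIntegral c))
  | otherwise   = Nothing

-- Encode each code point as one unit, or as a surrogate pair when it
-- lies above 0xFFFF. Surrogate values and values above 0x10FFFF have
-- no UTF-16 form, so the whole encoding fails.
encodeUTF16 :: [UCS4CodePoint] -> Maybe [UCS2CodePoint]
encodeUTF16 = fmap concat . mapM encodeOne
  where
    encodeOne (MkUCS4CodePoint c)
      | c >= 0xD800 && c <= 0xDFFF = Nothing
      | c <= 0xFFFF = Just [MkUCS2CodePoint (fromIntegral c)]
      | c <= 0x10FFFF =
          let c' = c - 0x10000
              hi = 0xD800 + (c' `shiftR` 10)   -- top 10 bits
              lo = 0xDC00 + (c' .&. 0x3FF)     -- bottom 10 bits
          in Just [ MkUCS2CodePoint (fromIntegral hi)
                  , MkUCS2CodePoint (fromIntegral lo) ]
      | otherwise = Nothing

-- Decode, combining each high/low surrogate pair into one code point.
-- An unpaired surrogate makes the whole decoding fail.
decodeUTF16 :: [UCS2CodePoint] -> Maybe [UCS4CodePoint]
decodeUTF16 [] = Just []
decodeUTF16 (MkUCS2CodePoint w : rest)
  | w >= 0xD800 && w <= 0xDBFF =
      case rest of
        (MkUCS2CodePoint l : rest')
          | l >= 0xDC00 && l <= 0xDFFF ->
              let c = 0x10000
                      + ((fromIntegral w - 0xD800) `shiftL` 10)
                      + (fromIntegral l - 0xDC00)
              in (MkUCS4CodePoint c :) <$> decodeUTF16 rest'
        _ -> Nothing
  | w >= 0xDC00 && w <= 0xDFFF = Nothing
  | otherwise = (MkUCS4CodePoint (fromIntegral w) :) <$> decodeUTF16 rest
```

With these definitions, encodeUTF16 [MkUCS4CodePoint 0x1D11E] gives 
Just [MkUCS2CodePoint 0xD834, MkUCS2CodePoint 0xDD1E], and decodeUTF16 
maps that pair back to the original code point.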


-- 
Ashley Yakeley, Seattle WA