UTF8 (was Re: Hexdump)

Malcolm Wallace Malcolm.Wallace at cs.york.ac.uk
Tue Mar 21 10:34:42 EST 2006


Oops, I wrote:

fromUTF8 (w:ws)
    | w <  0x80  {- 0xxxxxxx -} = toEnum (fromEnum w) : fromUTF8 ws
    | w >= 0xc0  {- 1111110x -} = bytes 5 (fromEnum (w`mask`0x01)) ws
    | w >= 0xe0  {- 111110xx -} = bytes 4 (fromEnum (w`mask`0x03)) ws
    | w >= 0xf0  {- 11110xxx -} = bytes 3 (fromEnum (w`mask`0x07)) ws
    | w >= 0xf8  {- 1110xxxx -} = bytes 2 (fromEnum (w`mask`0x0f)) ws
    | w >= 0xfc  {- 110xxxxx -} = bytes 1 (fromEnum (w`mask`0x1f)) ws

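which should of course have been as follows. Guards are tried top to
bottom and the first match wins, so with `w >= 0xc0` tested first every
lead byte took the `bytes 5` branch; the largest threshold has to come
first: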

fromUTF8 (w:ws)
    | w <  0x80  {- 0xxxxxxx -} = toEnum (fromEnum w) : fromUTF8 ws
    | w >= 0xfc  {- 1111110x -} = bytes 5 (fromEnum (w`mask`0x01)) ws
    | w >= 0xf8  {- 111110xx -} = bytes 4 (fromEnum (w`mask`0x03)) ws
    | w >= 0xf0  {- 11110xxx -} = bytes 3 (fromEnum (w`mask`0x07)) ws
    | w >= 0xe0  {- 1110xxxx -} = bytes 2 (fromEnum (w`mask`0x0f)) ws
    | w >= 0xc0  {- 110xxxxx -} = bytes 1 (fromEnum (w`mask`0x1f)) ws
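
For anyone who wants to try the corrected guards in isolation, here is a
minimal self-contained sketch. The definitions of `mask` and `bytes` below
are my assumptions about what those helpers do (they are not shown in this
message), and the empty-list case, the `otherwise` branch and the U+FFFD
replacement for malformed input are additions for illustration only.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Word (Word8)

-- Assumed helper: keep only the payload bits selected by m.
mask :: Word8 -> Word8 -> Word8
mask w m = w .&. m

-- Assumed helper: fold n continuation bytes (10xxxxxx) into the
-- accumulator, six bits at a time, then resume decoding.
bytes :: Int -> Int -> [Word8] -> String
bytes 0 acc ws
    | acc <= 0x10ffff = toEnum acc : fromUTF8 ws
    | otherwise       = '\xFFFD' : fromUTF8 ws  -- beyond the range of Char
bytes n acc (w:ws)
    | w `mask` 0xc0 == 0x80 =
        bytes (n - 1) ((acc `shiftL` 6) .|. fromEnum (w `mask` 0x3f)) ws
bytes _ _ ws = '\xFFFD' : fromUTF8 ws           -- truncated or malformed

-- The corrected guard order: largest threshold first.
fromUTF8 :: [Word8] -> String
fromUTF8 [] = []
fromUTF8 (w:ws)
    | w <  0x80  {- 0xxxxxxx -} = toEnum (fromEnum w) : fromUTF8 ws
    | w >= 0xfc  {- 1111110x -} = bytes 5 (fromEnum (w `mask` 0x01)) ws
    | w >= 0xf8  {- 111110xx -} = bytes 4 (fromEnum (w `mask` 0x03)) ws
    | w >= 0xf0  {- 11110xxx -} = bytes 3 (fromEnum (w `mask` 0x07)) ws
    | w >= 0xe0  {- 1110xxxx -} = bytes 2 (fromEnum (w `mask` 0x0f)) ws
    | w >= 0xc0  {- 110xxxxx -} = bytes 1 (fromEnum (w `mask` 0x1f)) ws
    | otherwise  {- 10xxxxxx -} = '\xFFFD' : fromUTF8 ws  -- stray continuation
```

With these assumed helpers, `fromUTF8 [0xe2, 0x82, 0xac]` yields the single
character U+20AC. The 5- and 6-byte forms are kept only to mirror the guards
above; current UTF-8 (RFC 3629) stops at four bytes.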

Regards,
    Malcolm

