UTF-8 encode/decode libraries.

Duncan Coutts duncan.coutts at worcester.oxford.ac.uk
Mon Apr 26 20:05:02 EDT 2004


On Mon, 2004-04-26 at 18:49, David Brown wrote:
> Is anyone aware of any Haskell libraries for doing UTF-8 decoding and
> encoding?  If not, I'll write something simple.

The gtk2hs library uses the following functions internally.
Credit goes to Axel Simon, I believe, unless he swiped them from somewhere too.

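import Data.Bits (shift, (.&.), (.|.))
import Data.Char (chr, ord)

-- Convert Unicode characters to UTF-8.
-- Each Char of the result holds one byte (0x00..0xFF).
--
toUTF :: String -> String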
toUTF [] = []
toUTF (x:xs) | ord x<=0x007F = x:toUTF xs
             | ord x<=0x07FF = chr (0xC0 .|. ((ord x `shift` (-6)) .&. 0x1F)):
                               chr (0x80 .|. (ord x .&. 0x3F)):
                               toUTF xs
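             | ord x<=0xFFFF = chr (0xE0 .|. ((ord x `shift` (-12)) .&. 0x0F)):
                               chr (0x80 .|. ((ord x `shift` (-6)) .&. 0x3F)):
                               chr (0x80 .|. (ord x .&. 0x3F)):
                               toUTF xs
             -- Code points above U+FFFF need a four byte sequence.
             | otherwise     = chr (0xF0 .|. ((ord x `shift` (-18)) .&. 0x07)):
                               chr (0x80 .|. ((ord x `shift` (-12)) .&. 0x3F)):
                               chr (0x80 .|. ((ord x `shift` (-6)) .&. 0x3F)):
                               chr (0x80 .|. (ord x .&. 0x3F)):
                               toUTF xs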

-- Convert UTF-8 to Unicode.
--
fromUTF :: String -> String
fromUTF [] = []
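fromUTF (str@(x:xs)) | ord x<=0x7F = x:fromUTF xs
                     | ord x<=0xBF = err          -- stray continuation byte
                     | ord x<=0xDF = twoBytes str
                     | ord x<=0xEF = threeBytes str
                     | ord x<=0xF7 = fourBytes str
                     | otherwise   = err
  where
    -- Continuation bytes are masked but not checked for the 10xxxxxx form.
    twoBytes (x1:x2:xs) = chr (((ord x1 .&. 0x1F) `shift` 6) .|.
                               (ord x2 .&. 0x3F)):fromUTF xs
    twoBytes _ = error "fromUTF: illegal two byte sequence"

    threeBytes (x1:x2:x3:xs) = chr (((ord x1 .&. 0x0F) `shift` 12) .|.
                                    ((ord x2 .&. 0x3F) `shift` 6) .|.
                                    (ord x3 .&. 0x3F)):fromUTF xs
    threeBytes _ = error "fromUTF: illegal three byte sequence"

    fourBytes (x1:x2:x3:x4:xs) = chr (((ord x1 .&. 0x07) `shift` 18) .|.
                                      ((ord x2 .&. 0x3F) `shift` 12) .|.
                                      ((ord x3 .&. 0x3F) `shift` 6) .|.
                                      (ord x4 .&. 0x3F)):fromUTF xs
    fourBytes _ = error "fromUTF: illegal four byte sequence"

    err = error "fromUTF: illegal UTF-8 character"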
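
To make the bit arithmetic concrete, here is a minimal sketch that works one
three-byte case by hand: the euro sign, U+20AC. The helper names threeByteForm
and hexBytes are made up for this illustration and are not part of gtk2hs. The
sketch assumes its input lies in U+0800..U+FFFF and does not check that.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Numeric (showHex)

-- Hypothetical helper (not from gtk2hs): the three UTF-8 byte values for a
-- code point assumed to be in U+0800..U+FFFF, returned as plain Ints.
threeByteForm :: Char -> [Int]
threeByteForm c =
  [ 0xE0 .|. ((n `shiftR` 12) .&. 0x0F)  -- 1110xxxx: top 4 bits
  , 0x80 .|. ((n `shiftR` 6)  .&. 0x3F)  -- 10xxxxxx: middle 6 bits
  , 0x80 .|. (n .&. 0x3F)                -- 10xxxxxx: low 6 bits
  ]
  where n = ord c

-- Hypothetical helper: render byte values as space-separated lowercase hex.
hexBytes :: [Int] -> String
hexBytes = unwords . map (`showHex` "")
```

In GHCi, hexBytes (threeByteForm '\x20AC') gives "e2 82 ac". These are the
same three byte values that the three-byte branch of toUTF above produces as
Chars.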

Duncan



More information about the Glasgow-haskell-users mailing list