UTF-8 encode/decode libraries.

George Russell ger at informatik.uni-bremen.de
Tue May 4 12:16:22 EDT 2004


Sven Panne wrote:

 > Hmmm, "String -> [Word8]" would be nicer...

My UTF8 encoder is
    toUTF8 :: String -> String
but an obvious alternative would be
    toUTF8 :: Enum codedChar => String -> [codedChar]
and I could implement this quite easily, by globally replacing
chr with toEnum.  It would then be appropriate to SPECIALIZE it
to the types String -> String and String -> [Word8], satisfying
both the purists and those who actually want to write the
output to a file.
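
A minimal sketch of the Enum-polymorphic encoder described above. It assumes nothing about the actual implementation beyond the proposed type: each code point is turned into UTF-8 byte values as Ints, which are then mapped through toEnum, so the caller picks the element type. The helper name encodeCodePoint is hypothetical, and this sketch does not reject lone surrogates.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a String as UTF-8. The result element type is any Enum:
-- each byte value (0..255) goes through 'toEnum', so the caller
-- chooses String (one Char per byte) or [Word8].
toUTF8 :: Enum codedChar => String -> [codedChar]
{-# SPECIALIZE toUTF8 :: String -> String #-}
{-# SPECIALIZE toUTF8 :: String -> [Word8] #-}
toUTF8 = map toEnum . concatMap (encodeCodePoint . ord)

-- | Hypothetical helper: the UTF-8 byte values of one code point.
-- Lone surrogates (U+D800..U+DFFF) are encoded, not rejected.
encodeCodePoint :: Int -> [Int]
encodeCodePoint n
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    -- Continuation byte 10xxxxxx carrying the low six bits of m.
    cont m = 0x80 .|. (m .&. 0x3F)
```

For example, toUTF8 "\x20AC" :: [Word8] gives the three bytes 0xE2, 0x82, 0xAC. The [Word8] result can be written out with Data.ByteString.writeFile after Data.ByteString.pack, while the String instantiation keeps the older one-Char-per-byte convention.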

 > ... and here: "[Word8] -> String" or "[Word8] -> Maybe String"

My UTF8 decoder has type

    fromUTF8WE :: Monad m => String -> m String

Errors are reported by "fail".  If, for example, you import
Control.Monad.Error, m can be Either String, which gives you a
function returning either an error message or the converted string:

    fromUTF8WE :: String -> Either String String

Of course, for Word8 you would change the type of the decoder to

    fromUTF8WE :: (Monad m, Enum codedChar) => [codedChar] -> m String
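
A minimal sketch of the decoder in its most general form, written for current GHC (8.8 or later). There, fail belongs to the MonadFail class (exported from the Prelude) rather than to Monad, so the constraint becomes MonadFail m instead of Monad m. Because base gives Either String no MonadFail instance, a hypothetical Result newtype recovers the Either String behaviour described above. The internals (go, multi, Result) are illustrative assumptions, not the code from this post.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Decode UTF-8 code units given as any Enum whose 'fromEnum' is the
-- byte value (Char or Word8), reporting malformed input via 'fail'.
fromUTF8WE :: (MonadFail m, Enum codedChar) => [codedChar] -> m String
{-# SPECIALIZE fromUTF8WE :: [Word8] -> Maybe String #-}
fromUTF8WE = go . map fromEnum
  where
    go [] = return []
    go (b : bs)
      | b < 0x80              = (chr b :) <$> go bs
      | b >= 0xC2 && b < 0xE0 = multi 1 0x80    (b .&. 0x1F) bs
      | b >= 0xE0 && b < 0xF0 = multi 2 0x800   (b .&. 0x0F) bs
      | b >= 0xF0 && b < 0xF5 = multi 3 0x10000 (b .&. 0x07) bs
      | otherwise             = fail ("invalid lead byte " ++ show b)
    -- Take n continuation bytes (each 0x80..0xBF), assemble the code
    -- point, then reject overlong forms, surrogates and values > U+10FFFF.
    multi n minCp acc bs =
      case splitAt n bs of
        (cont, rest)
          | length cont < n -> fail "truncated sequence"
          | any (\c -> c < 0x80 || c > 0xBF) cont -> fail "invalid continuation byte"
          | cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) ->
              fail ("invalid code point " ++ show cp)
          | otherwise -> (chr cp :) <$> go rest
          where
            cp = foldl (\a c -> (a `shiftL` 6) .|. (c .&. 0x3F)) acc cont

-- | Hypothetical wrapper (not part of base): routes 'fail' to Left,
-- since base has no MonadFail instance for Either String.
newtype Result a = Result { runResult :: Either String a }

instance Functor Result where
  fmap f (Result e) = Result (fmap f e)

instance Applicative Result where
  pure = Result . Right
  Result f <*> Result x = Result (f <*> x)

instance Monad Result where
  Result e >>= k = Result (e >>= runResult . k)

instance MonadFail Result where
  fail = Result . Left
```

Choosing m = Maybe turns malformed input into Nothing; runResult (fromUTF8WE [0xC3 :: Word8]) gives Left "truncated sequence"; and in IO, fail raises an IOError.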

Incidentally, I am *hoping* I shall be able to say that my UTF8 code
is LGPL, but you know what university administrators are like ...

