UTF-8 encode/decode libraries.
Wolfgang Jeltsch
wolfgang at jeltsch.net
Tue May 4 19:28:50 EDT 2004
On Tuesday, 4 May 2004 11:16, George Russell wrote:
> Sven Panne wrote:
> > Hmmm, "String -> [Word8]" would be nicer...
>
> My UTF8 encoder is
> toUTF8 :: String -> String
> but an obvious alternative would be
> toUTF8 :: Enum codedChar => String -> [codedChar]
> and I could implement this quite easily, by globally-exchanging
> chr with toEnum. It would then be appropriate to SPECIALIZE
> to types String -> String and String -> [Word8], satisfying
> both the purists and those who actually want to write the
> output to a file.
Writing UTF-8 to a file should be done using binary output anyway, since UTF-8
is a sequence of octets. So Word8 would also be the way to go for the "file
writers".
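
To make the octet-based variant concrete, here is a minimal sketch of an encoder
with the generalised type discussed above. It is not George's actual code: the
helper names encodeChar and writeUTF8File are made up for illustration, and the
file output assumes the bytestring package is available. Surrogate code points
are not rejected here; a stricter encoder would have to check for them.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import qualified Data.ByteString as B  -- assumption: bytestring is installed

-- Hypothetical helper: the UTF-8 octets of one character, as Ints in 0..255.
encodeChar :: Char -> [Int]
encodeChar c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- The generalised encoder: toEnum picks the output element type.
toUTF8 :: Enum codedChar => String -> [codedChar]
toUTF8 = map toEnum . concatMap encodeChar
{-# SPECIALIZE toUTF8 :: String -> String #-}
{-# SPECIALIZE toUTF8 :: String -> [Word8] #-}

-- Hypothetical helper: write the octets without any text-mode translation.
writeUTF8File :: FilePath -> String -> IO ()
writeUTF8File path = B.writeFile path . B.pack . toUTF8
```

With this, (toUTF8 "\233" :: [Word8]) yields [0xC3, 0xA9], and the same function
at type String -> String yields the two characters '\195' and '\169'.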
> > ... and here: "[Word8] -> String" or "[Word8] -> Maybe String"
>
> and my UTF8 decoder has type
>
> fromUTF8WE :: Monad m => String -> m String
>
> Errors are reported by "fail". If for example you import
> Control.Monad.Error that means you have a function returning
> either an error message or the converted string
>
> fromUTF8WE :: String -> Either String String
I like this "error handling via monads" and use it myself a lot.
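
For illustration, a decoder in this style might look roughly like the sketch
below. It is only my guess at the shape, not George's implementation, and it
works on [Word8] rather than String. One assumption to flag: in current GHC
(base 4.13 and later) fail is a method of MonadFail rather than Monad, so the
constraint is written as MonadFail here. Maybe and IO are instances; Either
String is not an instance in base, so getting the error message back as a value
would need a different carrier monad.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Decode UTF-8 octets, reporting malformed input through fail.
fromUTF8WE :: MonadFail m => [Word8] -> m String
fromUTF8WE [] = return []
fromUTF8WE (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> fromUTF8WE bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) 0x80
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) 0x800
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) 0x10000
  | otherwise          = fail "fromUTF8WE: invalid lead byte"
  where
    -- n continuation bytes follow; minVal is the smallest code point
    -- that legitimately needs a sequence of this length.
    multi n lead minVal = do
      let (cs, rest) = splitAt n bs
      if length cs /= n || any (\c -> c .&. 0xC0 /= 0x80) cs
        then fail "fromUTF8WE: truncated or malformed continuation bytes"
        else do
          let code = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                           (fromIntegral lead) cs :: Int
          if code < minVal || code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF)
            then fail "fromUTF8WE: overlong encoding or invalid code point"
            else (chr code :) <$> fromUTF8WE rest
```

At type [Word8] -> Maybe String, fromUTF8WE [0xC3, 0xA9] gives Just "\233",
while the overlong sequence [0xC0, 0x80] gives Nothing. The caller picks the
error-handling strategy simply by choosing the monad.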
> Of course for Word8, you would change the type of the decoder to
>
> fromUTF8WE :: (Monad m,Enum codedChar) => [codedChar] -> m String
>
> Incidentally I am *hoping* I shall be able to say that my UTF8 code
> is LGPL but you know what University administrators are like ...
Wolfgang