UTF8 (was Re: Hexdump)

Bulat Ziganshin bulat.ziganshin at gmail.com
Wed Mar 29 09:38:11 EST 2006


Hello Malcolm,

Tuesday, March 21, 2006, 7:07:53 PM, you wrote:

> I was also thinking it would be nice to have pure Haskell
> implementations of the various Unicode encodings.  Here is my attempt at
> the UTF-8 codec.

UTF-8 codecs keep migrating from app to app; you can find such code in
ghc, jhc, darcs... All of these codecs use a ([Char] <-> [Word8])
conversion, which is both slow (because it goes through lazy lists) and
unusable in a non-list environment (how, for example, can we read just
enough bytes to decode a single Char?). In my Streams library, I used
higher-order monadic functions to implement encodings.

In my model, an encoder is just a higher-order function that takes a
function (putByte :: (Monad m) => Word8 -> m ()) as a parameter and uses
it to implement the (putChar :: (Monad m) => Char -> m ()) operation, so
each encoder has the type:
utf8Encode :: (Monad m) => (Word8 -> m ()) -> Char -> m ()
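
To make the shape concrete, here is a minimal sketch of an encoder with this
type. It is not the code from the attached module, only an illustration of the
idea; the helper names (top, cont, encodeToList) are made up for the example,
and it does no validation (surrogate code points are encoded as-is).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Illustrative sketch only; the attached CharEncoding.hs may differ.
-- The caller supplies putByte, so the encoder never knows where bytes go.
utf8Encode :: Monad m => (Word8 -> m ()) -> Char -> m ()
utf8Encode putByte c
  | n < 0x80    = putByte (fromIntegral n)          -- 1 byte (ASCII)
  | n < 0x800   = do putByte (0xC0 .|. top 6)       -- 2 bytes
                     cont 0
  | n < 0x10000 = do putByte (0xE0 .|. top 12)      -- 3 bytes
                     cont 6
                     cont 0
  | otherwise   = do putByte (0xF0 .|. top 18)      -- 4 bytes
                     cont 12
                     cont 6
                     cont 0
  where
    n = ord c
    -- Leading bits of the code point, placed in the first byte.
    top :: Int -> Word8
    top s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont :: Int -> m' ()
    cont s = putByte (0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F))
      where _ = s
    m' = undefined
```

The local signatures above are only for readability and can be dropped; a
version without them, which is what I would actually compile, is:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Illustrative sketch only; the attached CharEncoding.hs may differ.
utf8Encode :: Monad m => (Word8 -> m ()) -> Char -> m ()
utf8Encode putByte c
  | n < 0x80    = putByte (fromIntegral n)
  | n < 0x800   = putByte (0xC0 .|. top 6)  >> cont 0
  | n < 0x10000 = putByte (0xE0 .|. top 12) >> cont 6 >> cont 0
  | otherwise   = putByte (0xF0 .|. top 18) >> cont 12 >> cont 6 >> cont 0
  where
    n      = ord c
    top s  = fromIntegral (n `shiftR` s)
    cont s = putByte (0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F))

-- Hypothetical helper: run the encoder in the pair (writer-like) monad
-- from base, where "putting" a byte just appends it to a list.
encodeToList :: String -> [Word8]
encodeToList s = fst (mapM_ (utf8Encode (\b -> ([b], ()))) s)

main :: IO ()
main = print (encodeToList "A\xE9\x20AC")
```

The same utf8Encode could be handed a putByte that writes to a Handle or a
buffer instead; the list version is used here only because it is easy to
inspect.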

In the same fashion, each decoder takes a parameter of functional type
(getByte :: (Monad m) => m Word8) and uses it to implement the
(getChar :: (Monad m) => m Char) operation, so the whole decoder has
the type:
utf8Decode :: (Monad m) => m Word8 -> m Char
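
A matching sketch of a decoder with this type follows. Again, this is an
illustration and not the attached module: the helpers (dispatch, more,
decodeOne) are hypothetical names, and malformed input is not checked, so
invalid byte sequences may yield wrong characters or a runtime error. The
point it shows is that the decoder pulls exactly as many bytes as one Char
needs and leaves the rest of the input untouched.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Illustrative sketch only; the attached CharEncoding.hs may differ.
-- No validation of continuation bytes, overlong forms or range.
utf8Decode :: Monad m => m Word8 -> m Char
utf8Decode getByte = do
  b <- getByte
  dispatch (fromIntegral b :: Int)
  where
    -- The lead byte tells us how many continuation bytes follow.
    dispatch lead
      | lead < 0x80 = return (chr lead)
      | lead < 0xE0 = more 1 (lead .&. 0x1F)
      | lead < 0xF0 = more 2 (lead .&. 0x0F)
      | otherwise   = more 3 (lead .&. 0x07)
    -- Read k more bytes, shifting six payload bits in each time.
    more 0 acc = return (chr acc)
    more k acc = do
      b <- getByte
      more (k - 1) ((acc `shiftL` 6) .|. (fromIntegral b .&. 0x3F))

-- Hypothetical byte source: an IORef holding the unread input.
-- Returns the decoded Char together with the bytes left unread.
decodeOne :: [Word8] -> IO (Char, [Word8])
decodeOne bytes = do
  ref <- newIORef bytes
  let getByte = do
        xs <- readIORef ref
        case xs of
          (x:rest) -> writeIORef ref rest >> return x
          []       -> ioError (userError "unexpected end of input")
  c <- utf8Decode getByte
  rest <- readIORef ref
  return (c, rest)

main :: IO ()
main = do
  (c, rest) <- decodeOne [0xE2, 0x82, 0xAC, 0x41]
  print (fromEnum c, rest)  -- prints (8364,[65])
```

Because getByte is just a monadic action, the same decoder should work
unchanged over a file handle, a socket or a memory buffer.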

Using these higher-order functions allows me to implement both UTF-8
(and any other) encoding for text streams and UTF-8 encoding for
serializing strings/chars in the binary I/O module. I have attached
this module.

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CharEncoding.hs
Type: application/octet-stream
Size: 4047 bytes
Desc: not available
Url : http://www.haskell.org//pipermail/libraries/attachments/20060329/240fa8b3/CharEncoding.obj

