Text in Haskell: a second proposal

Ashley Yakeley ashley@semantic.org
Sat, 10 Aug 2002 00:59:43 -0700


At 2002-08-09 03:26, Simon Marlow wrote:

>Why combine I/O and {en,de}coding?  Firstly, efficiency. 

Hmm... surely the encoding functions can be defined efficiently?

    decodeISO88591 :: [Word8] -> [Char];
    encodeISO88591 :: [Char] -> [Word8]; -- uses low octet of codepoint

You could surely define them as native functions very efficiently, if 
necessary.
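
For what it's worth, the pure list versions are one-liners. This is 
only a sketch, assuming the ordinary Data.Char and Data.Word modules; 
a native implementation could of course still be faster:

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- each octet is exactly the codepoint
    decodeISO88591 :: [Word8] -> [Char]
    decodeISO88591 = map (chr . fromIntegral)

    -- fromIntegral truncates, keeping only the low octet of the codepoint
    encodeISO88591 :: [Char] -> [Word8]
    encodeISO88591 = map (fromIntegral . ord)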

> Secondly,
>because it's convenient: if we were to express encodings as stream
>transformers, eg:
>
>	decodeUTF8 :: [Word8] -> [Char]
>
>Then we would have to do all our I/O using lazy streams.  You can't
>write hGetChar in terms of hGetWord8 using this: you need the non-stream
>version which in general looks something like
>
>	decode :: Word8 -> DecodingState 
>		-> (Maybe [Char], DecodingState)
>
>for UTF-8 you can get away with something simpler,

A monadic stream-transformer:

   decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char;

   hGetChar h = decodeStreamUTF8 (hGetWord8 h);

This works provided each Char corresponds to a contiguous block of 
Word8s, with no decoder state carried over between characters. I think 
that covers all the standard character encoding schemes.
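
For concreteness, here is roughly what such a decoder might look like. 
This is only a sketch, not a proposal for the real thing: it handles 
one- to three-byte sequences, does no validation, and assumes the 
ordinary Data.Bits, Data.Char and Data.Word modules:

    import Data.Bits ((.&.), (.|.), shiftL)
    import Data.Char (chr)
    import Data.Word (Word8)

    decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char
    decodeStreamUTF8 getWord8 = do
        b0 <- getWord8
        if b0 < 0x80
          -- one byte: plain ASCII
          then return (chr (fromIntegral b0))
          else if b0 < 0xE0
            -- two bytes: 5 bits from the lead byte, 6 from the trailer
            then do
              b1 <- getWord8
              return (chr
                (((fromIntegral b0 .&. 0x1F) `shiftL` 6)
                 .|. (fromIntegral b1 .&. 0x3F)))
            -- three bytes: 4 + 6 + 6 bits (four-byte sequences omitted)
            else do
              b1 <- getWord8
              b2 <- getWord8
              return (chr
                (((fromIntegral b0 .&. 0x0F) `shiftL` 12)
                 .|. ((fromIntegral b1 .&. 0x3F) `shiftL` 6)
                 .|. (fromIntegral b2 .&. 0x3F)))

Each call consumes exactly the octets belonging to one character and 
carries nothing over to the next call, which is what makes the 
m Word8 -> m Char shape sufficient.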

> but AFAIK that's not
>true in general.  You might want to use compression as an encoding, for
>example. 

Wait a minute... you're not proposing that the existing Char-based IO 
functions be extended to do compression, are you?

If not, I don't see what your point is.


-- 
Ashley Yakeley, Seattle WA