Text in Haskell: a second proposal
Ashley Yakeley
ashley@semantic.org
Sat, 10 Aug 2002 00:59:43 -0700
At 2002-08-09 03:26, Simon Marlow wrote:
>Why combine I/O and {en,de}coding? Firstly, efficiency.
Hmm... surely the encoding functions can be defined efficiently?
decodeISO88591 :: [Word8] -> [Char];
encodeISO88591 :: [Char] -> [Word8]; -- uses the low octet of each codepoint
You could surely define them as native functions very efficiently, if
necessary.
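For what it's worth, here's a sketch of those two in plain Haskell, just as
an illustration (a native implementation could do the same with less
overhead). ISO-8859-1 is simply the first 256 codepoints, so both directions
are plain maps:

import Data.Word (Word8)
import Data.Char (chr, ord)

decodeISO88591 :: [Word8] -> [Char]
decodeISO88591 = map (chr . fromIntegral)

encodeISO88591 :: [Char] -> [Word8]
encodeISO88591 = map (fromIntegral . ord) -- narrowing keeps the low octet

The fromIntegral on the way out is exactly the "low octet of each codepoint"
truncation noted above.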
> Secondly,
>because it's convenient: if we were to express encodings as stream
>transformers, eg:
>
> decodeUTF8 :: [Word8] -> [Char]
>
>Then we would have to do all our I/O using lazy streams. You can't
>write hGetChar in terms of hGetWord8 using this: you need the non-stream
>version which in general looks something like
>
> decode :: Word8 -> DecodingState
> -> (Maybe [Char], DecodingState)
>
>for UTF-8 you can get away with something simpler,
A monadic stream-transformer:
decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char;
hGetChar h = decodeStreamUTF8 (hGetWord8 h);
This works provided each Char corresponds to a contiguous block of
Word8s, with no decoder state carried over from one character to the
next. I think that covers all the standard character encoding schemes.
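Here's a rough sketch of such a decodeStreamUTF8, assuming well-formed
input (no handling of malformed sequences) and hGetWord8 as above; it
simply runs the supplied octet action as many times as the lead octet
demands:

import Data.Word (Word8)
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)

decodeStreamUTF8 :: (Monad m) => m Word8 -> m Char
decodeStreamUTF8 getWord8 = getWord8 >>= start
  where
    -- dispatch on the lead octet (malformed input not handled)
    start b0
      | b0 < 0x80 = return (chr (fromIntegral b0)) -- one octet
      | b0 < 0xE0 = continue 1 (b0 .&. 0x1F)       -- two octets
      | b0 < 0xF0 = continue 2 (b0 .&. 0x0F)       -- three octets
      | otherwise = continue 3 (b0 .&. 0x07)       -- four octets
    -- fetch n continuation octets, taking six payload bits from each
    continue n lead = loop n (fromIntegral lead)
      where
        loop 0 acc = return (chr acc)
        loop k acc = do
          b <- getWord8
          loop (k - 1) ((acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F))

All the decoder state lives inside the single-character decode, so nothing
needs to be threaded between calls.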
> but AFAIK that's not
>true in general. You might want to use compression as an encoding, for
>example.
Wait a minute... you're not proposing that the existing Char-based IO
functions be extended to do compression, are you?
If not, I don't see what your point is.
--
Ashley Yakeley, Seattle WA