Text in Haskell: a second proposal
Thu, 15 Aug 2002 10:36:21 +0100
[ moving to firstname.lastname@example.org ]
> For ISO-8859-1 each Char is exactly one Word8, so surely it
> would work fine with partial reads?
> decodeCharISO88591 :: Word8 -> Char;
> encodeCharISO88591 :: Char -> Word8;
> decodeISO88591 :: [Word8] -> [Char];
> decodeISO88591 = fmap decodeCharISO88591;
> encodeISO88591 :: [Char] -> [Word8];
> encodeISO88591 = fmap encodeCharISO88591;
Sorry, I thought you were just using ISO8859-1 as an example.
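For what it's worth, the quoted signatures can be filled in almost directly, since ISO-8859-1 byte values coincide with the first 256 Unicode code points. A sketch (note that `encodeCharISO88591` here silently truncates code points above 255 rather than reporting an error):

```haskell
import Data.Word (Word8)
import Data.Char (chr, ord)

-- ISO-8859-1 is the identity mapping between byte values and the
-- first 256 code points, so the codec is pure arithmetic.
decodeCharISO88591 :: Word8 -> Char
decodeCharISO88591 = chr . fromIntegral

encodeCharISO88591 :: Char -> Word8
encodeCharISO88591 = fromIntegral . ord  -- truncates code points > 255

decodeISO88591 :: [Word8] -> [Char]
decodeISO88591 = map decodeCharISO88591

encodeISO88591 :: [Char] -> [Word8]
encodeISO88591 = map encodeCharISO88591

main :: IO ()
main = putStrLn (decodeISO88591 (encodeISO88591 "Haskell"))
```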
> >This is better: it doesn't force you to use lazy I/O, and when
> >specialised to the IO monad it might get decent performance. The
> >problem is that in general I don't think you can assume the lack of
> >state. For example: UTF-7 has a state which needs to be retained
> >between characters, and UTF-16 and UTF-32 have an endianness
> >state which can be changed by a special sequence at the beginning
> >of the file. Some other encodings have states too.
> But it is possible to do this in Haskell...
> The rule for many functions in the standard libraries seems to be
> "implement as much in Haskell as possible". Why is it any
> different with the file APIs?
I think we've lost track of the discussion here, so I'll try to
summarise my position.
I think character encoding/decoding should be built-in to the I/O
system. I also think there should be a low-level I/O interface that
doesn't do any encoding, and high-level interfaces to the various
encodings.
Now, you can by all means specify the high-level I/O in terms of the
low-level I/O + encodings, but I strongly suspect that implementing it
that way will be expensive. Character I/O in Haskell is *already* very
slow (see Doug Bagley's language shootout for evidence), and I don't
want to add another factor of 2 or more to that. The point is that by
building encoding into the I/O interface the implementor gets the
opportunity to optimise.
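To make the statefulness point concrete, here is a toy UTF-16 decoder (BMP only, no surrogate pairs, whole-input rather than incremental). The endianness selected by the byte-order mark is exactly the state that a plain `Word8 -> Char` function cannot carry across calls:

```haskell
import Data.Word (Word8)
import Data.Char (chr)
import Data.Bits (shiftL, (.|.))

data Endian = BE | LE

-- The BOM at the start of the stream fixes the endianness for the
-- rest of the input; every later code unit depends on that state.
decodeUTF16 :: [Word8] -> [Char]
decodeUTF16 (0xFE:0xFF:rest) = go BE rest
decodeUTF16 (0xFF:0xFE:rest) = go LE rest
decodeUTF16 bytes            = go BE bytes  -- spec default: big-endian

go :: Endian -> [Word8] -> [Char]
go e (x:y:rest) = chr cp : go e rest
  where
    (hi, lo) = case e of BE -> (x, y); LE -> (y, x)
    cp = fromIntegral hi `shiftL` 8 .|. fromIntegral lo
go _ _ = []  -- drop a trailing odd byte

main :: IO ()
main = putStrLn (decodeUTF16 [0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00])
```

A real codec would also have to handle surrogate pairs (which themselves require remembering a pending high surrogate), so the per-character state is even richer than the sketch suggests.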