Text in Haskell: a second proposal

Thu, 15 Aug 2002 10:36:21 +0100

[ moving to haskell-i18n@haskell.org ]

> For ISO-8859-1 each Char is exactly one Word8, so surely it
> would work fine with partial reads?
>
>      decodeCharISO88591 :: Word8 -> Char;
>
>      encodeCharISO88591 :: Char -> Word8;
>
>      decodeISO88591 :: [Word8] -> [Char];
>      decodeISO88591 = fmap decodeCharISO88591;
>
>      encodeISO88591 :: [Char] -> [Word8];
>      encodeISO88591 = fmap encodeCharISO88591;

Sorry, I thought you were just using ISO-8859-1 as an example.
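
For reference, the two per-character functions that the quoted code leaves
undefined could be filled in roughly as below. This is only a sketch: it
relies on ISO-8859-1 coinciding with the first 256 Unicode code points, and
the choice to map unrepresentable characters to '?' is my assumption, not
something the proposal specifies.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Every ISO-8859-1 byte is the Unicode code point with the same value.
decodeCharISO88591 :: Word8 -> Char
decodeCharISO88591 = chr . fromIntegral

-- Assumption: characters above U+00FF have no ISO-8859-1 form,
-- so they are replaced by '?' rather than raising an error.
encodeCharISO88591 :: Char -> Word8
encodeCharISO88591 c
  | ord c <= 0xFF = fromIntegral (ord c)
  | otherwise     = fromIntegral (ord '?')

decodeISO88591 :: [Word8] -> [Char]
decodeISO88591 = map decodeCharISO88591

encodeISO88591 :: [Char] -> [Word8]
encodeISO88591 = map encodeCharISO88591
```

Since each byte is decoded independently, a partial read can stop at any
byte boundary without losing information.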

> >This is better: it doesn't force you to use lazy I/O, and when
> >specialised to the IO monad it might get decent performance.  The
> >problem is that in general I don't think you can assume the lack of
> >state.  For example: UTF-7 has a state which needs to be retained
> >between characters, and UTF-16 and UTF-32 have an endianness state
> >which can be changed by a special sequence at the beginning of the
> >file.  Some other encodings have states too.
>
> But it is possible to do this in Haskell...
>
> The rule for the many functions in the standard libraries seems to be
> "implement as much in Haskell as possible". Why is it any
> different with the file APIs?
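
To make the state issue concrete, here is a minimal sketch of a UTF-16
decoder written in plain Haskell, with the state passed explicitly so that
it survives partial reads. All the names (UTF16State, decodeChunk and so on)
are hypothetical, and two choices are my assumptions: input without a byte
order mark is treated as big-endian, and unpaired surrogates become U+FFFD.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (chr)
import Data.Word (Word8)

data Endian = BigEndian | LittleEndian deriving (Eq, Show)

-- State carried from one chunk to the next: the byte order (once
-- known) and any trailing bytes that did not yet form a character.
data UTF16State = UTF16State
  { byteOrder :: Maybe Endian
  , pending   :: [Word8]
  } deriving (Eq, Show)

initialState :: UTF16State
initialState = UTF16State Nothing []

-- Decode one chunk, returning the characters and the new state.
decodeChunk :: UTF16State -> [Word8] -> ([Char], UTF16State)
decodeChunk st bytes = go (byteOrder st) (pending st ++ bytes)
  where
    -- Byte order not yet known: look for a byte order mark.
    go Nothing (a : b : rest)
      | (a, b) == (0xFE, 0xFF) = go (Just BigEndian) rest
      | (a, b) == (0xFF, 0xFE) = go (Just LittleEndian) rest
      | otherwise              = go (Just BigEndian) (a : b : rest)
    go (Just e) (a : b : rest)
      | isLow u        = emit '\xFFFD' (go (Just e) rest)
      | not (isHigh u) = emit (chr u) (go (Just e) rest)
      | (c : d : rest') <- rest =
          let u2 = unit e c d
          in if isLow u2
               then emit (pair u u2) (go (Just e) rest')
               else emit '\xFFFD' (go (Just e) rest)
      where u = unit e a b
    -- Not enough input left: keep it for the next chunk.
    go e rest = ([], UTF16State e rest)

emit :: Char -> ([Char], s) -> ([Char], s)
emit c ~(cs, s) = (c : cs, s)

-- Combine two bytes into a 16-bit code unit.
unit :: Endian -> Word8 -> Word8 -> Int
unit BigEndian hi lo    = fromIntegral hi `shiftL` 8 .|. fromIntegral lo
unit LittleEndian lo hi = unit BigEndian hi lo

isHigh, isLow :: Int -> Bool
isHigh u = u >= 0xD800 && u <= 0xDBFF
isLow u  = u >= 0xDC00 && u <= 0xDFFF

-- Combine a surrogate pair into one character.
pair :: Int -> Int -> Char
pair hi lo = chr (0x10000 + ((hi - 0xD800) `shiftL` 10) + (lo - 0xDC00))
```

So the state can certainly be handled in Haskell; the caller just has to
thread the UTF16State from one read to the next.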

I think we've lost track of the discussion here... I'll try to
summarise.

I think character encoding/decoding should be built into the I/O
system.  I also think there should be a low-level I/O interface that
doesn't do any encoding, and high-level interfaces to the various
encodings.
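
As a rough illustration of that split, here is a sketch that assumes a GHC
whose System.IO provides hSetEncoding and the TextEncoding values latin1 and
utf8, plus the bytestring package for the byte-level side. The function
names readRaw and readTextWith are hypothetical.

```haskell
import qualified Data.ByteString as B
import System.IO

-- Low-level interface: raw bytes, no encoding applied.
readRaw :: FilePath -> IO B.ByteString
readRaw = B.readFile

-- High-level interface: characters, decoded by the I/O system
-- according to the encoding set on the handle.
readTextWith :: TextEncoding -> FilePath -> IO String
readTextWith enc path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    s <- hGetContents h
    -- Force the whole lazy read before withFile closes the handle.
    length s `seq` return s
```

For example, readTextWith utf8 and readTextWith latin1 give different
strings for the same file, while readRaw returns the same bytes either way.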

Now, you can by all means specify the high-level I/O in terms of the
low-level I/O + encodings, but I strongly suspect that implementing it
that way will be expensive.  Character I/O in Haskell is *already* very
slow (see Doug Bagley's language shootout for evidence), and I don't
want to add another factor of 2 or more to that.  The point is that by
building encoding into the I/O interface the implementor gets the
opportunity to optimise.

Cheers,
	Simon