[Haskell-cafe] Has character changed in GHC 6.8?

Wed Jan 23 07:12:55 EST 2008

> > > > What *does* matter to the programmer is what encodings putStr and
> > > > getLine use. AFAIK, they use "lower 8 bits of unicode code point" which
> > > > is almost functionally equivalent to latin-1.
> > >
> > > Which is terrible! You should have to be explicit about what encoding
> > > you expect. Python 3000 does it right.
> >
> > Presumably there wasn't a sufficiently good answer available in time for
> > Haskell 98.
>
> Will there be one for Haskell Prime?

The I/O library needs an overhaul, but I'm not sure how to do this in a
backwards-compatible manner, which would probably be required for
inclusion in Haskell'. One could, like Python 3000, break backwards
compatibility, but I'm not sure about the implications of doing so.
Maybe introducing a new System.IO.Unicode module would be an option.

If one wants to keep the interface but change the semantics slightly,
one could define, e.g., getChar as:

getChar :: IO Char
getChar = getWord8 >>= decodeChar latin1

This assumes Latin-1 is what is used now.
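
As a rough model of the current behavior described in the quoted text,
both directions can be written as pure functions. The names decodeLatin1
and encodeLower8 are hypothetical (they are not in base), and the sketch
assumes output really keeps only the lower 8 bits of each code point:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Decode one byte as Latin-1 (ISO 8859-1): byte n maps to code point U+00nn.
-- Every byte value is a valid Latin-1 character, so this cannot fail.
decodeLatin1 :: Word8 -> Char
decodeLatin1 = chr . fromIntegral

-- Hypothetical model of the output side: keep only the lower 8 bits of
-- the code point (fromIntegral to Word8 truncates modulo 256).
encodeLower8 :: Char -> Word8
encodeLower8 = fromIntegral . ord

main :: IO ()
main = do
  -- U+00E9 (e acute) survives the round trip unchanged.
  print (decodeLatin1 (encodeLower8 '\xE9'))
  -- U+03BB (Greek lambda) becomes byte 0xBB, which reads back as U+00BB.
  print (decodeLatin1 (encodeLower8 '\x3BB'))
```

The second line shows the silent corruption: a character outside Latin-1
comes back as a different, valid-looking Char.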

The benefit would be that if the input is not in Latin-1, an exception
could be thrown rather than returning a Char representing the wrong
Unicode code point.
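
Since every byte value is a valid Latin-1 character, the failure case is
easier to see with an encoding that has invalid bytes. A minimal sketch,
using strict 7-bit ASCII as that encoding; DecodeError, decodeAscii and
decodeCharIO are hypothetical names, not an existing API:

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical error for a byte that is not valid in the chosen encoding.
newtype DecodeError = InvalidByte Word8
  deriving (Eq, Show)

instance Exception DecodeError

-- Strict ASCII: bytes 0x80..0xFF are rejected instead of being
-- silently mapped to some other code point.
decodeAscii :: Word8 -> Either DecodeError Char
decodeAscii b
  | b < 0x80  = Right (chr (fromIntegral b))
  | otherwise = Left (InvalidByte b)

-- Turn a decoding failure into an IO exception, as a getChar built on
-- a byte-level read could do.
decodeCharIO :: (Word8 -> Either DecodeError Char) -> Word8 -> IO Char
decodeCharIO dec = either throwIO return . dec

main :: IO ()
main = do
  ok <- decodeCharIO decodeAscii 0x41
  print ok
  r <- try (decodeCharIO decodeAscii 0xE9) :: IO (Either DecodeError Char)
  print r
```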

I recommend reading about the I/O system overhaul for Python 3000, which
is outlined in PEP 3116:
http://www.python.org/dev/peps/pep-3116/

My proposal is for I/O functions to specify the encoding they use if
they accept or return Chars (and Strings). If they deal in bytes
(e.g. socket functions), they should accept and return Word8s.
Optionally, text I/O functions could default to the system locale
setting.
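
A sketch of the shape such an API could take, assuming a hypothetical
Encoding type and decode function (this is not an existing library):
text decoding names its encoding explicitly, while byte-oriented code
keeps working with [Word8]. The UTF-8 decoder is deliberately minimal and
reports malformed input as a Left instead of producing a wrong Char.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical encoding type; a real library would support many more.
data Encoding = Latin1 | UTF8
  deriving (Eq, Show)

-- Text-level decoding takes the encoding explicitly.
decode :: Encoding -> [Word8] -> Either String String
decode Latin1 bs = Right (map (chr . fromIntegral) bs)
decode UTF8   bs = utf8 bs

-- Minimal UTF-8 decoder: rejects bad lead bytes, missing or invalid
-- continuation bytes, overlong forms, surrogates and values above U+10FFFF.
utf8 :: [Word8] -> Either String String
utf8 [] = Right []
utf8 (b : rest)
  | b < 0x80            = (chr (fromIntegral b) :) <$> utf8 rest
  | b >= 0xC2, b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0, b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0, b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise           = Left ("invalid lead byte " ++ show b)
  where
    -- n continuation bytes, initial bits from the lead byte, smallest
    -- code point allowed for this length (to reject overlong forms).
    multi :: Int -> Int -> Int -> Either String String
    multi n acc0 minCp =
      let (cont, rest') = splitAt n rest
      in if length cont /= n || any (\c -> c .&. 0xC0 /= 0x80) cont
           then Left "truncated or invalid continuation byte"
           else
             let step a c = (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
                 cp = foldl step acc0 cont
             in if cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)
                  then Left ("invalid code point " ++ show cp)
                  else (chr cp :) <$> utf8 rest'

main :: IO ()
main = do
  -- The bytes C3 A9 are one character in UTF-8 but two in Latin-1.
  print (decode UTF8   [0xC3, 0xA9])
  print (decode Latin1 [0xC3, 0xA9])
  -- A lone continuation byte is an error rather than a wrong Char.
  print (decode UTF8 [0xA9])
```

Defaulting to the system locale is not modelled here; in this sketch it
would amount to choosing the Encoding value from the environment.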

-- Johan