H98 Text IO

Chris Kuklewicz haskell at list.mightyreason.com
Tue Feb 26 12:42:25 EST 2008

The H98 spec has the inside half of the story nailed down: Char is
Unicode, and Handles are text I/O that deal in [Char].  The outside
half of the story, the binary encoding of the [Char], is unspecified
and left to the implementation.
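The inside/outside split can be made concrete with a small sketch. It assumes the `TextEncoding` API that GHC's `base` library gained after this message was written (`utf8` and `latin1` from `System.IO`, `withCStringLen` from `GHC.Foreign`), none of which is H98; `encodedLength` and `compareEncodings` are hypothetical names. The same [Char] turns into a different number of bytes depending on the encoding chosen on the outside.

```haskell
import qualified GHC.Foreign as GF
import System.IO (TextEncoding, latin1, utf8)

-- Hypothetical helper: how many bytes does this String occupy
-- once it is encoded with the given TextEncoding?
encodedLength :: TextEncoding -> String -> IO Int
encodedLength enc s = GF.withCStringLen enc s (\(_ptr, len) -> return len)

-- "na\239ve" is "naïve"; '\239' (U+00EF) is outside ASCII.
-- UTF-8 needs two bytes for it, Latin-1 only one.
compareEncodings :: IO (Int, Int)
compareEncodings = do
  u <- encodedLength utf8 "na\239ve"    -- 6 bytes
  l <- encodedLength latin1 "na\239ve"  -- 5 bytes
  return (u, l)
```

Pure ASCII text such as "abc" comes out the same size (3 bytes) under both encodings, which is why the difference often goes unnoticed until non-ASCII characters appear.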

This implementation dependence allows GHC to provide a
"setHandleEncoding" (or "withHandleEncoding") operation. [I do not
want to get bogged down in syntax.] Like all details of encoding,
this is outside the H98 spec.  In addition, there may be some
command-line parameters to GHC.
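For illustration only, a "withHandleEncoding" can be sketched on top of `hGetEncoding` and `hSetEncoding` from `System.IO`, which later GHC releases provide. The helper name and argument order are assumptions, not a settled API. It switches one handle to another encoding for the duration of an action and then restores the previous setting.

```haskell
import Control.Exception (bracket)
import System.IO

-- Hypothetical: run an action with a handle temporarily switched to
-- another encoding, restoring the previous setting afterwards (even if
-- the action throws).  hGetEncoding returns Nothing for a handle in
-- binary mode, so in that case binary mode is restored instead.
withHandleEncoding :: Handle -> TextEncoding -> IO a -> IO a
withHandleEncoding h enc action =
  bracket (hGetEncoding h) restore (\_ -> hSetEncoding h enc >> action)
  where
    restore (Just old) = hSetEncoding h old
    restore Nothing    = hSetBinaryMode h True
```

A call such as `withHandleEncoding stdout latin1 (putStrLn "...")` affects only that one output action. Using `bracket` keeps the handle from being left in the temporary encoding if the action fails.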

Imagine that GHC 6.10.1 is released with encoding support.  If the
user runs ghc with no options or setup changes, then the new defaults
will apply.

The goal is that more complicated situations are reflected in
more complicated "ghc" or "main" invocations.  The least complicated
usage should default to behaving identically on every platform,
whether or not I/O goes to a terminal.

I think the best default would be UTF8 for all text handles.  This can
be easily documented, it is easy to understand, and it will produce
the fewest surprises.

I imagine that in this proposed ghc-6.10.1:

* GHC's handles now carry an encoding parameter.

** There is a way to create a new handle from an old one that differs
   only in the encoding.
   (perhaps 'hNew <- cloneHandleWithEncoding "Latin1" hOld')

* GHC has mutable global variables that control the encoding
   parameter of new handles.

** Unless influenced by command-line switches, these default to UTF8.

** There are IO commands to read & write these global variables.

** There are different defaults for new terminal I/O handles and other
   I/O handles, so they could be given different encodings.
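The handle model listed above can be approximated with a sketch built on later `base` APIs. The global defaults, `setTerminalDefault`/`setOtherDefault`, `applyDefaultEncoding`, and this version of `cloneHandleWithEncoding` (which takes a `TextEncoding` rather than the name string used above) are all hypothetical; `hDuplicate` from `GHC.IO.Handle`, `hIsTerminalDevice` and `hSetEncoding` are real. Unlike the proposal, nothing here applies the defaults automatically when a handle is created.

```haskell
import Data.IORef
import GHC.IO.Handle (hDuplicate)
import System.IO
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global defaults, one for terminal handles and one for
-- all other handles, both starting at UTF-8 as proposed.
{-# NOINLINE terminalDefault #-}
terminalDefault :: IORef TextEncoding
terminalDefault = unsafePerformIO (newIORef utf8)

{-# NOINLINE otherDefault #-}
otherDefault :: IORef TextEncoding
otherDefault = unsafePerformIO (newIORef utf8)

-- IO commands to read and write the defaults.
setTerminalDefault, setOtherDefault :: TextEncoding -> IO ()
setTerminalDefault = writeIORef terminalDefault
setOtherDefault    = writeIORef otherDefault

getTerminalDefault, getOtherDefault :: IO TextEncoding
getTerminalDefault = readIORef terminalDefault
getOtherDefault    = readIORef otherDefault

-- Give a handle the default that matches its kind.  In the proposal this
-- would happen when the handle is created; here it must be called by hand.
applyDefaultEncoding :: Handle -> IO ()
applyDefaultEncoding h = do
  isTerm <- hIsTerminalDevice h
  enc <- if isTerm then getTerminalDefault else getOtherDefault
  hSetEncoding h enc

-- A new handle that differs from the old one only in its encoding.
-- hDuplicate gives the copy its own buffer, but both handles still
-- share the underlying file position.
cloneHandleWithEncoding :: TextEncoding -> Handle -> IO Handle
cloneHandleWithEncoding enc hOld = do
  hNew <- hDuplicate hOld
  hSetEncoding hNew enc
  return hNew
```

Changing the encoding of the clone leaves the original handle untouched, which is the point of cloning instead of mutating a handle other code may also be using.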

If you want to use the "local" or native encoding, then compile with
"ghc --local-encoding" or start the program with something like
"main = handlesUseLocalEncoding >> do ..."

If you want to use "Latin1", then use either
"ghc --encoding Latin1" or
"main = handlesUseEncoding "Latin1" >> do ..."

To compile older programs, one could use "ghc --compat 6.8" or
"ghc --encoding Latin1" to access the old defaults.
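The two "main" forms above can be approximated with a sketch. It assumes the later `setLocaleEncoding` from `GHC.IO.Encoding` (which changes the default encoding for text handles opened afterwards) plus `hSetEncoding` for the standard handles, which may already exist. `useEncodingEverywhere`, `handlesUseLocalEncoding` and `handlesUseEncoding` are hypothetical names, and the "ghc --local-encoding", "ghc --encoding" and "ghc --compat" switches have no counterpart here.

```haskell
import Data.Char (toUpper)
import GHC.IO.Encoding (setLocaleEncoding)
import System.IO

-- Hypothetical: switch the standard handles, and the default used for
-- handles opened afterwards, to a single encoding.
useEncodingEverywhere :: TextEncoding -> IO ()
useEncodingEverywhere enc = do
  setLocaleEncoding enc                        -- default for later openFile etc.
  mapM_ (`hSetEncoding` enc) [stdin, stdout, stderr]

-- Hypothetical: "main = handlesUseLocalEncoding >> do ..."
-- localeEncoding is the encoding of the locale the program started in.
handlesUseLocalEncoding :: IO ()
handlesUseLocalEncoding = useEncodingEverywhere localeEncoding

-- Hypothetical: "main = handlesUseEncoding "Latin1" >> do ..."
-- UTF8 and Latin1 are mapped directly (ignoring case and dashes); any
-- other name is passed to mkTextEncoding, which may not know it on
-- every platform.
handlesUseEncoding :: String -> IO ()
handlesUseEncoding name = lookupEncoding >>= useEncodingEverywhere
  where
    key = map toUpper (filter (/= '-') name)
    lookupEncoding = case key of
      "UTF8"   -> return utf8
      "LATIN1" -> return latin1
      _        -> mkTextEncoding name
```

With these definitions, `main = handlesUseEncoding "Latin1" >> do ...` reads exactly as in the proposal. The call should come first in `main`, before any text has been read or written.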

One might even add a runtime option such as "+RTS --encoding Latin1 -RTS"
to set the initial encoding, though I think this is unlikely to be
useful in practice.

I think that having terminal I/O be special is great for command-line
applications.  But the convenient behavior of applications such as "ls"
must not determine what the GHC runtime does by default.
