H98 Text IO

Chris Kuklewicz haskell at list.mightyreason.com
Tue Feb 26 16:27:31 EST 2008

Reinier Lamers wrote:
> On 26 Feb 2008, at 18:42, Chris Kuklewicz wrote:
>> The goal is that more complicated situations are reflected in
>> more complicated "ghc" or "main" invocations.  The least complicated
>> usage defaults to being identical cross-platform and regardless of
>> terminal I/O.
>> I think the best default would be UTF8 for all text handles.  This can
>> be easily documented, it can be easily understood, and will produce
>> the fewest surprises.
>  > (...)
>> ** Unless influenced by command-line switches, these default to UTF8.
> I think that making the behavior of programs change, depending on 
> compiler options, will produce a lot of surprises. I think that being 
> only able to set the default encoding from within the program is a 
> better idea, because it keeps the specification of the behavior of the 
> program inside the source.
> Reinier

I thought about that.  I started by realizing that *all* code written for GHC
is written knowing that Handles only return Word8-sized Latin1 characters.

So there are several ways one might proceed, some of which are:

   1) No command line switches, default to Latin1.  To get unicode you call
      a special 'turnOnUnicodeHandleGoodness' IO operation.  This is good since
      it does not break old code.

   2) No command line switches, default to something new.  This requires all old
      code to be conditionally retrofitted with a 'turnOffUnicode' IO operation.
      This breaks much of the code that has been written, and is thus bad.

   3) Add a "ghc --turn-on-unicode" command line switch.  This makes all old code
      build just fine, since it lacks the switch to activate the new behavior.

   4) Add a "ghc --turn-off-unicode" command line switch.  This is nice since
      it lets new code use the new Handle encoding by default, but not nice in
      requiring that old code built using ghc-6.10 use an additional option.
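
As a rough illustration of what the in-program opt-in of option (1) could look
like: the sketch below assumes the hSetEncoding call and the utf8 encoding that
System.IO gained in later GHC releases (6.12 onwards), which did not exist when
this was written.  The helper name is only the placeholder used above, not a
real library function.

```haskell
import System.IO

-- Hypothetical helper, named after the placeholder in option (1).
-- It switches the three standard handles to UTF-8.
turnOnUnicodeHandleGoodness :: IO ()
turnOnUnicodeHandleGoodness =
  mapM_ (`hSetEncoding` utf8) [stdin, stdout, stderr]

main :: IO ()
main = do
  -- Under option (1) every new program would need this line of boilerplate.
  turnOnUnicodeHandleGoodness
  putStrLn "lambda: \955"
```

Option (2) would be the mirror image: the same kind of call, but selecting
latin1, added to every old program.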

I also think the following are likely to be true:

   *) Cabal is already controlling the ghc compiler switches for most code.

   *) The experience of the ghc-6.6 to ghc-6.8 transition involved updating most
      cabal files to allow old code to work with the new compiler.

   *) Other changes, unrelated to the unicode handles, will require most
      old packages to update their cabal files to work with ghc-6.10.

   *) The additional work to update the cabal file to add the
      "--turn-off-unicode" command line switch to ghc would be 1 word to 1 line.

So I think that making ghc default to option (3) above saves nearly zero work
when updating old cabal files compared to option (4).  The benefit of option (4)
compared to (3) is that no boilerplate will be needed to obtain the new handle
encoding.

And I simply prefer that the better handle encoding be the default; move the 
implementation forward.

Now if GHC does not have a command line switch then either with (2) you have to 
conditionally (perhaps with #ifdef) update almost every bit of code on hackage 
or with (1) you have all future programs burdened with boilerplate, which some 
people may forget.
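
To make the "#ifdef" retrofit of option (2) concrete, here is a minimal sketch
of what old code might have to carry.  It assumes the CPP extension and the
hSetEncoding/latin1 names from later versions of System.IO; the 612 version
cut-off and the forceLatin1 helper are illustrative assumptions, not anything
decided in this thread.

```haskell
{-# LANGUAGE CPP #-}
import System.IO

-- Hypothetical retrofit: pin a handle back to Latin1 on compilers that
-- have the newer encoding API, and do nothing on older ones.
forceLatin1 :: Handle -> IO ()
#if __GLASGOW_HASKELL__ >= 612
forceLatin1 h = hSetEncoding h latin1
#else
forceLatin1 _ = return ()
#endif

main :: IO ()
main = do
  forceLatin1 stdout
  -- '\233' is e-acute; under Latin1 it is written as a single byte.
  putStrLn "caf\233"
```

Repeating something like this across most of hackage is the cost I would like
to avoid.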

So I will enjoy having switches as well as the IO commands.
