UTF-8 decoding error

Simon Marlow simonmarhaskell at gmail.com
Fri Feb 3 10:29:20 EST 2006

Christian Maeder wrote:

>>  So - do you need Latin-1, or could you use UTF-8?
> I'm not keen on changing the encoding of many Haskell source files 
> (particularly those that are not mine).

Fair enough, but there will have to be some way to specify the encoding, 
either via a pragma, a command-line option, or the locale.  I'm really not 
sure what the best choice is here.  Perhaps all three, with the locale being 
the default, overridden by pragmas and command-line options.
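
To make the "all three" idea concrete, here is a rough sketch of the 
lookup.  The Sources record and its field names are hypothetical, and two 
things are assumptions rather than decisions: that the command line wins 
over a pragma, and that we fall back to UTF-8 when the locale says nothing.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- | The two encodings named in this thread; a real list would be longer.
data Encoding = UTF8 | Latin1
  deriving (Eq, Show)

-- | Hypothetical record of the places an encoding could be requested.
data Sources = Sources
  { fromCommandLine :: Maybe Encoding  -- a compiler option (hypothetical)
  , fromPragma      :: Maybe Encoding  -- a pragma in the source file (hypothetical)
  , fromLocale      :: Maybe Encoding  -- derived from the user's locale
  }

-- | Take the first source that specifies an encoding.
-- ASSUMPTION: command line beats pragma beats locale, with UTF-8 as the
-- last resort.  The relative order of the first two is not settled here.
chooseEncoding :: Sources -> Encoding
chooseEncoding s =
  fromMaybe UTF8 (fromCommandLine s <|> fromPragma s <|> fromLocale s)
```

With this ordering, chooseEncoding (Sources Nothing (Just Latin1) (Just UTF8)) 
picks Latin1: the pragma overrides the locale.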

The easiest way for us to handle encodings other than UTF-8 is to add 
a new preprocessing step that runs 'iconv'.  (But what do we do on 
Windows?  Bundle iconv?  Ew.)
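
For the Latin-1 case specifically, the recoding step could also be written 
without calling out to iconv at all.  A minimal sketch, assuming a base 
library whose System.IO exports hSetEncoding, latin1 and utf8; the function 
name recodeLatin1ToUtf8 is made up for illustration, and this is not a 
description of what GHC does.

```haskell
import System.IO

-- | Read a file as Latin-1 and write the same text out as UTF-8.
-- The source and destination paths must be different files.
recodeLatin1ToUtf8 :: FilePath -> FilePath -> IO ()
recodeLatin1ToUtf8 src dst =
  withFile src ReadMode $ \hIn ->
    withFile dst WriteMode $ \hOut -> do
      hSetEncoding hIn latin1   -- decode each input byte as one character
      hSetEncoding hOut utf8    -- encode characters as UTF-8 on output
      -- Leave line endings alone so only the character encoding changes.
      hSetNewlineMode hIn noNewlineTranslation
      hSetNewlineMode hOut noNewlineTranslation
      contents <- hGetContents hIn
      hPutStr hOut contents     -- forces the whole input before hIn closes
```

This only covers one source encoding, so it does not replace a general 
iconv step; it just sidesteps the bundling question for Latin-1.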

John - what do you plan to do here?

> These files can then no longer be compiled by earlier ghcs (though I 
> don't understand how ghc-6.4.1 recognises the lexical error).
> I'm tempted to replace "ä" with "\228" in literals. What does haddock do 
> with utf-8 in comments? Will DrIFT -- using read- and writeFile -- still 
> work correctly?
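
Escaping the literals does keep the source pure ASCII, which reads the same 
as Latin-1 and as UTF-8.  A rough sketch of the rewrite, applied to the text 
between the quotes of a literal; escapeNonAscii is a made-up name, and it 
only handles literals, not comments or identifiers.

```haskell
import Data.Char (isDigit, ord)

-- | Replace every non-ASCII character in the body of a string literal
-- with a decimal escape such as \228.  ASCII text, including escapes
-- that are already there, is passed through unchanged.
escapeNonAscii :: String -> String
escapeNonAscii [] = []
escapeNonAscii (c : rest)
  | ord c < 128 = c : escapeNonAscii rest
  | otherwise   = '\\' : show (ord c) ++ gap ++ escapeNonAscii rest
  where
    -- "\228" directly followed by a digit would lex as a longer escape,
    -- so the empty escape "\&" is inserted to separate the two.
    gap = case rest of
      (d : _) | isDigit d -> "\\&"
      _                   -> ""
```

The "\&" case matters for text like "ä1", which has to become "\228\&1" 
rather than "\2281".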

Haddock needs to be updated too.  But if GHC implements recoding via 
iconv, you can use GHC as a preprocessor to recode back to Latin-1; since 
you have to use GHC as a preprocessor with Haddock anyway, this 
shouldn't be much harder (of course, this fails if you use non-Latin-1 
characters).  Eventually, when Haddock runs on top of GHC, the issue 
will go away :)
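
The failure case is easy to pin down: Latin-1 only covers code points 0 to 
255, so recoding back works exactly when every character fits in one byte.  
A small sketch of that check (encodeLatin1 is a name invented here, not 
something GHC or iconv provides):

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode text as Latin-1 bytes, or return the first character that
-- has no Latin-1 representation.
encodeLatin1 :: String -> Either Char [Word8]
encodeLatin1 = traverse toByte
  where
    toByte c
      | ord c <= 0xFF = Right (fromIntegral (ord c))  -- code point is the byte
      | otherwise     = Left c                        -- outside Latin-1
```

So a file containing only "ä" and friends round-trips, while one with, say, 
a Greek lambda in a comment is rejected at that character.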

I don't know about DrIFT.


