patch applied (cabal): First pass at parsing .cabal files as UTF8

Don Stewart dons at galois.com
Mon Feb 25 16:26:52 EST 2008


duncan.coutts:
> 
> On Mon, 2008-02-25 at 11:53 +0000, Ross Paterson wrote:
> > On Sun, Feb 24, 2008 at 05:46:35PM +0000, Duncan Coutts wrote:
> > > I've added readTextFile and writeTextFile to the Utils module and
> > > checked all other uses of readFile and writeFile.
> > > 
> > > I've also switched the rawSystemStdout to assume UTF8 output format.
> > 
> > The read and write functions ought to open their files in binary mode.
> > It's just wrong to read Unicode characters (which is what a plain text
> > Handle promises you) and treat them as bytes. There's a similar problem
> > with using toUTF on stdout and stderr.  Haskell 98 is very clear that
> > putChar on those Handles takes Unicode characters, though it does not
> > specify how these are encoded in the environment.  GHC has historically
> > assumed an ISO-8859-1 encoding, truncating larger characters, but other
> > implementations could map them to the current locale (as Hugs does).
> > Perhaps a future GHC will map them to UTF.  I think you should just
> > hand the characters to putChar and leave their presentation to the
> > implementation, flawed though GHC's currently is.
> 
> It is a mess.
> 
> It's no use pretending that readFile returns Unicode, it just doesn't
> (except on Hugs which does it properly). GHC is not going to catch up on
> this any time soon.

Why don't we use the existing, portable UTF8 IO package?

    http://hackage.haskell.org/packages/archive/utf8-string/0.2/doc/html/System-IO-UTF8.html
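
For comparison, here is a minimal sketch of the binary-mode approach Ross
describes, using only base. It opens the file with openBinaryFile, so each
Char holds one byte, and then decodes those bytes as UTF-8 by hand. The names
decodeUTF8 and readUTF8File are hypothetical, chosen for illustration; they
are not the API of Cabal's Utils module or of utf8-string. The decoder is
simplified: malformed input becomes U+FFFD, and overlong forms and surrogates
are not rejected.

```haskell
module UTF8Sketch (decodeUTF8, readUTF8File) where

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO (IOMode (ReadMode), hGetContents, openBinaryFile)

-- | Decode a String in which every Char holds one byte (0..255) as UTF-8.
-- Hypothetical helper: malformed sequences are replaced by U+FFFD.
decodeUTF8 :: String -> String
decodeUTF8 [] = []
decodeUTF8 (c : cs)
  | b < 0x80 = c : decodeUTF8 cs                    -- plain ASCII
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)    -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)    -- 3-byte sequence
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)    -- 4-byte sequence
  | otherwise = '\xFFFD' : decodeUTF8 cs            -- stray or invalid byte
  where
    b = ord c
    -- Consume n continuation bytes and combine them with the lead bits.
    multi n lead =
      case splitAt n cs of
        (conts, rest)
          | length conts == n && all isCont conts ->
              safeChr (foldl step lead conts) : decodeUTF8 rest
        _ -> '\xFFFD' : decodeUTF8 cs
    isCont x = ord x .&. 0xC0 == 0x80
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
    -- chr fails above U+10FFFF, so guard against out-of-range values.
    safeChr n
      | n > 0x10FFFF = '\xFFFD'
      | otherwise = chr n

-- | Read a file as bytes in binary mode, then decode it as UTF-8.
-- Uses lazy IO: the handle is closed once the contents are fully consumed.
readUTF8File :: FilePath -> IO String
readUTF8File path = do
  h <- openBinaryFile path ReadMode
  fmap decodeUTF8 (hGetContents h)
```

The point of the binary handle is that no text-mode or locale translation
happens before decoding, so the result should not depend on which
implementation (GHC, Hugs) runs the code.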

-- Don


