[Haskell-cafe] Re: getting crazy with character encoding

Ketil Malde ketil at ii.uib.no
Thu Sep 13 06:06:15 EDT 2007


On Wed, 2007-09-12 at 17:40 -0700, Stefan O'Rear wrote:
> On Thu, Sep 13, 2007 at 12:23:33AM +0000, Aaron Denney wrote:
> > Unfortunately, at this point it is a well entrenched bug, and changing
> > the behaviour will undoubtedly break programs.
> ...
> > There should be another system for getting the exact bytes in and 
> > out (as Word8s, say, rather than Chars), 

> I'm pretty sure Hugs does the right thing.

...which makes me wonder what the right thing actually is.

Since IO on Unix (or at least on Linux) consists of bytes, I don't see
how a Unicode-only interface is ever going to do the 'right thing' for
all people.
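
To make the byte-versus-character point concrete, here is a minimal sketch using only base. The function names decodeLatin1 and decodeUtf8Small are made up for this illustration, and the UTF-8 decoder is deliberately partial (it accepts only 1- and 2-byte sequences). It shows that the same bytes yield different text, or no text at all, depending on which encoding the reader assumes.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: treat every byte as one Latin-1 code point.
-- This never fails, because every byte value is a valid Latin-1 character.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Hypothetical helper: a deliberately partial UTF-8 decoder.
-- It accepts only 1-byte (ASCII) and 2-byte sequences and returns
-- Nothing for anything else, including malformed input.
decodeUtf8Small :: [Word8] -> Maybe String
decodeUtf8Small [] = Just []
decodeUtf8Small (b : rest)
  | b < 0x80 = (chr (fromIntegral b) :) <$> decodeUtf8Small rest
decodeUtf8Small (b1 : b2 : rest)
  | b1 .&. 0xE0 == 0xC0          -- lead byte of a 2-byte sequence
  , b1 >= 0xC2                   -- reject overlong encodings
  , b2 .&. 0xC0 == 0x80 =        -- continuation byte
      let cp = ((fromIntegral b1 .&. 0x1F) `shiftL` 6)
                 .|. (fromIntegral b2 .&. 0x3F) :: Int
      in (chr cp :) <$> decodeUtf8Small rest
decodeUtf8Small _ = Nothing
```

With these definitions, the bytes [0xC3, 0xA9] decode to a single U+00E9 under the UTF-8 reading but to two characters under the Latin-1 reading, while the lone byte [0xE9] is valid Latin-1 and rejected as UTF-8. A Char-only interface has to pick one of these readings on the caller's behalf.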

One possible solution might be to have IO functions deal with [Word8]
instead of [Char]. If string and character constants were polymorphic,
Char and String were made aliases for byte-based types, and a new type
were introduced for Unicode characters, it might even be possible to fix
this without breaking absolutely all legacy code.
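
One way to picture polymorphic string constants is GHC's OverloadedStrings extension together with the IsString class from Data.String. This is only a sketch of the idea, not the proposal itself: the types ByteStr and UniStr are hypothetical names invented here, and the truncating conversion for code points above 255 is an assumption made to keep the example short.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (ord)
import Data.String (IsString (..))
import Data.Word (Word8)

-- Hypothetical byte-based string type, standing in for a byte-oriented String.
newtype ByteStr = ByteStr [Word8] deriving (Eq, Show)

-- Hypothetical type for Unicode text, kept separate from raw bytes.
newtype UniStr = UniStr [Char] deriving (Eq, Show)

instance IsString ByteStr where
  -- Assumption: each character is reduced to its low 8 bits, so
  -- code points above 255 are silently truncated in this sketch.
  fromString = ByteStr . map (fromIntegral . ord)

instance IsString UniStr where
  fromString = UniStr

-- The same literal takes whichever type the context asks for.
greetingBytes :: ByteStr
greetingBytes = "hello"

greetingText :: UniStr
greetingText = "hello"
```

Here the literal "hello" becomes five Word8 values in one binding and five Chars in the other, chosen purely by the type signature. Existing code that uses plain String keeps compiling, since [Char] also has an IsString instance.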

But even this would probably only fix the Unix side of things.

-k


