[Haskell-cafe] invalid character encoding

Ross Paterson ross at soi.city.ac.uk
Mon Mar 21 05:35:19 EST 2005


On Sun, Mar 20, 2005 at 04:34:12AM +0000, Ian Lynagh wrote:
> On Sun, Mar 20, 2005 at 01:33:44AM +0000, ross at soi.city.ac.uk wrote:
> > On Sat, Mar 19, 2005 at 07:14:25PM +0000, Ian Lynagh wrote:
> > > Is there anything LC_CTYPE can be set to that will act like C/POSIX but
> > > accept 8-bit bytes as chars too?
> > 
> > en_GB.iso88591 (or indeed any .iso88591 locale) will match the old
> > behaviour (and the GHC behaviour).
> 
> This works for me with en_GB.iso88591 (or en_GB), but not en_US.iso88591
> (or en_US). My /etc/locale.gen contains:
> 
> en_GB ISO-8859-1
> en_GB.ISO-8859-15 ISO-8859-15
> en_GB.UTF-8 UTF-8
> 
> So is there anything that /always/ works?

Since systems may have no locale other than C/POSIX, no.
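
Because no locale name is guaranteed to be installed, a program can only probe for one at run time. A minimal sketch, assuming a caller-supplied candidate list; the helper name pick_ctype_locale is made up for illustration and is not part of any library:

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try each candidate name for LC_CTYPE in order.
 * Returns the index of the first name setlocale() accepts, or -1 if
 * none of them is available.  On success the process's LC_CTYPE
 * locale has been switched to that candidate. */
int pick_ctype_locale(const char *const *names, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* setlocale() returns NULL when the named locale is unavailable */
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return (int)i;
    }
    return -1;
}
```

A caller might pass { "en_GB.iso88591", "en_US.iso88591", "C" }, keeping "C" last as the one name that is always accepted.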

> > Yes, I don't see how to avoid this when using mbtowc() to do the
> > conversion: it makes no distinction between a bad byte sequence and an
> > incomplete one.
> 
> Perhaps you could use mbrtowc instead?

Indeed.  Thanks for pointing it out.
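
For reference, mbrtowc() reports the two cases with different return values: (size_t)-2 for an incomplete but so far valid sequence, and (size_t)-1 (with errno set to EILSEQ) for an invalid one. A minimal sketch of how a decoder might use that; the names decode_status and decode_one are hypothetical, and this is not the code used in any particular implementation:

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result type for one decoding attempt. */
typedef enum { DECODE_OK, DECODE_INCOMPLETE, DECODE_INVALID } decode_status;

/* Try to decode one character from the first n bytes of buf, using the
 * current LC_CTYPE locale and a fresh conversion state.
 * On DECODE_OK, *out holds the character and *used the bytes consumed. */
decode_status decode_one(const char *buf, size_t n, wchar_t *out, size_t *used)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial state */

    size_t r = mbrtowc(out, buf, n, &st);
    *used = 0;
    if (r == (size_t)-2)
        return DECODE_INCOMPLETE;       /* need more bytes: read on and retry */
    if (r == (size_t)-1)
        return DECODE_INVALID;          /* bad sequence: errno is EILSEQ */

    /* r == 0 means a null character was decoded from one byte */
    *used = (r == 0) ? 1 : r;
    return DECODE_OK;
}
```

On DECODE_INCOMPLETE a reader would fetch more input and try again with the longer buffer, whereas DECODE_INVALID can be reported as an error immediately.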

