[Haskell-cafe] invalid character encoding

Ross Paterson ross at soi.city.ac.uk
Wed Mar 16 06:55:18 EST 2005


On Wed, Mar 16, 2005 at 03:54:19AM +0000, Ian Lynagh wrote:
> Do you have a list of functions which behave differently in the new
> release to how they did in the previous release?
> (I'm not interested in changes that will affect only whether something
> compiles, not how it behaves given it compiles both before and after).

I got lost in the negatives here.  The change affects all Haskell 98
primitives that do character I/O, or that exchange C strings with the
C library.

It doesn't affect functions added by the hierarchical libraries, so
those functions are still safe only with the ASCII subset.  (There is a
vague plan to make Foreign.C.String conform to the FFI spec, which
mandates locale-based encoding, and thus would change all those, but
it's still up in the air.)

> Finally, the hugs behaviour seems a little odd to me. The below shows 4
> cases where iconv complains when asked to convert utf8 to utf8, but hugs
> only gives an error in one of them. In the others it just truncates the
> input. Is this really correct? It also seems to behave the same for me
> regardless of whether I export LC_CTYPE to en_GB.UTF-8 or C.

It's a bug: an unrecognized encoding at the end of the input was being
ignored instead of triggering the exception.  Now fixed in CVS
(rev. 1.14 of src/char.c if anyone's backporting).  It was an accident
of this example that the behaviour in all locales was the same.
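
To illustrate this class of bug, here is a minimal sketch in Python (not
the actual C code in src/char.c, which is not reproduced here).  The names
decode_lenient and decode_strict are hypothetical.  It assumes the problem
input ends partway through a multibyte UTF-8 sequence: a decoder that is
never told the input has ended quietly drops the incomplete tail, while
one that is told reports an error.

```python
import codecs


def decode_lenient(data: bytes) -> str:
    """Mimic the buggy behaviour: an incomplete sequence at the very
    end of the input is silently dropped (the output is truncated)."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    # final=False says "more input may follow", so the decoder buffers
    # the incomplete tail and never complains about it.
    return decoder.decode(data, final=False)


def decode_strict(data: bytes) -> str:
    """Mimic the fixed behaviour: an incomplete sequence at the end of
    the input raises an exception."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    # final=True marks the end of input, so a dangling lead byte is an
    # error (UnicodeDecodeError) rather than being ignored.
    return decoder.decode(data, final=True)


# "caf\u00e9" encodes to b"caf\xc3\xa9"; dropping the last byte leaves
# a lead byte with no continuation byte.
truncated = "caf\u00e9".encode("utf-8")[:-1]

if __name__ == "__main__":
    print(repr(decode_lenient(truncated)))
    try:
        decode_strict(truncated)
    except UnicodeDecodeError as err:
        print("error:", err.reason)
```

Both functions give the same result on well-formed input; they differ
only when the input stops in the middle of a character.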

