[Haskell-cafe] invalid character encoding

Wed Mar 16 08:09:02 EST 2005

On Wed, 2005-03-16 at 11:55 +0000, Ross Paterson wrote:
> On Wed, Mar 16, 2005 at 03:54:19AM +0000, Ian Lynagh wrote:
> > Do you have a list of functions which behave differently in the new
> > release to how they did in the previous release?
> > (I'm not interested in changes that will affect only whether something
> > compiles, not how it behaves given it compiles both before and after).
> 
> I got lost in the negatives here.  It affects all Haskell 98 primitives
> that do character I/O, or that exchange C strings with the C library.
> 
> It doesn't affect functions added by the hierarchical libraries, i.e.
> those functions are safe only with the ASCII subset.  (There is a vague
> plan to make Foreign.C.String conform to the FFI spec, which mandates
> locale-based encoding, and thus would change all those, but it's still
> up in the air.)

Hmm. I'm not convinced that automatically converting to the current
locale is the ideal behaviour (it'd certainly break all my programs!).
Certainly a function for converting into the encoding of the current
locale would be useful for many users, but it's important to be able to
know the encoding with certainty. For example, some libraries (e.g. Gtk+)
take all strings in UTF-8 irrespective of the current locale (Gtk+ does
locale-dependent conversions on I/O etc., but the internal representation
is always UTF-8). We do the conversion to UTF-8 on the Haskell side and so
produce a byte string which we marshal using the FFI CString functions.
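
To make the "conversion on the Haskell side" concrete, here is a minimal
sketch of such an encoder. The name encodeUtf8 is made up for illustration
and is not the code we actually ship; it also assumes the input contains
no surrogate code points, which it does not check for.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a String (a list of Unicode code points) as UTF-8 bytes.
-- Hypothetical helper for illustration; surrogates are not rejected.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (map fromIntegral . encodeChar . ord)
  where
    -- One to four bytes per code point, depending on its magnitude.
    encodeChar c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

main :: IO ()
main = print (encodeUtf8 "h\233llo")  -- prints [104,195,169,108,108,111]
```

The result is independent of the locale, which is the property that
matters when the C library on the other side expects UTF-8 regardless.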

If the implementations get fixed to conform to the FFI spec, I suppose
we could roll our own version of withCString that marshals [Word8] ->
char*.
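
Something along these lines might do, as a rough sketch only. The name
withByteCString is hypothetical; it is built on withArray0 from
Foreign.Marshal.Array, and it assumes the byte list contains no 0 byte,
since that would terminate the C string early.

```haskell
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Array (peekArray0, withArray0)

-- | Marshal raw bytes as a NUL-terminated char*, with no encoding
-- conversion at all. Hypothetical name; assumes no 0 byte in the input.
withByteCString :: [Word8] -> (CString -> IO a) -> IO a
withByteCString bytes = withArray0 0 (map fromIntegral bytes)

main :: IO ()
main = do
  -- Read the temporary buffer back to check the bytes arrive unchanged.
  cs <- withByteCString [0x68, 0xC3, 0xA9] (peekArray0 0)
  print (map fromIntegral cs :: [Word8])  -- prints [104,195,169]
```

As with withCString, the buffer is only valid inside the callback, so the
pointer must not be kept after the action returns.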

Duncan