[Haskell-cafe] Same compiled program behaving differently when called from ghci and shell

Manlio Perillo manlio_perillo at libero.it
Sun Nov 21 15:51:16 EST 2010


On 21/11/2010 19:06, Bruno Damour wrote:
> On 21/11/10 17:21, Manlio Perillo wrote:
>> On 21/11/2010 06:49, Bruno Damour wrote:
>>> Hello,
>>> I have a very strange (for me) problem that I managed to reduce to this:
>>> I have a small program that reads a file containing only one character (è = 0xe8).
>>> The program is ftest2.hs:
>>>
> [...]
>> Now, as chance would have it (Python console):
>>>>> '\xe8'.decode('cp1252').encode('cp850')
>> '\x8a'
>>>>> '\xde'.decode('cp1252').encode('cp850')
>> '\xe8'
>>
> [...]
>
> Yes, I was beginning to figure that I/O might use an environment setting.

Did you try running the program again, with the console codepage set
to 1252?

> That sounds a bit weird to me (newbie), since it would affect the result
> of a program depending on where it is launched... it's the same binary
> anyway, isn't it?

This is only a guess, but recent versions of the GHC I/O library perform
low-level decoding when reading a file in text mode.

This is the correct approach, since a Char is supposed to be a Unicode
character.

I assume that, when reading a text file, the I/O library just checks the
system encoding and uses it.
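One way to test this guess is a minimal sketch that prints the encoding a text-mode handle picks up. It assumes a GHC whose base library exports `localeEncoding`, `hGetEncoding` and `hSetBinaryMode` from `System.IO`; `describeEncoding` is a hypothetical helper. Launching the compiled program from the two environments (ghci and the shell) should show whether the reported encoding changes; exactly which Windows code page GHC consults may vary between versions, so treat that part as an assumption.

```haskell
import System.IO

-- Hypothetical helper: render the result of hGetEncoding.
-- Nothing means the handle is in binary mode, so bytes arrive undecoded.
describeEncoding :: Maybe TextEncoding -> String
describeEncoding = maybe "binary (no decoding)" show

main :: IO ()
main = do
  -- The initial encoding GHC derived from the environment at startup.
  putStrLn ("locale encoding: " ++ show localeEncoding)
  -- Standard handles start out in text mode with a text encoding attached.
  enc <- hGetEncoding stdin
  putStrLn ("stdin (text mode): " ++ describeEncoding enc)
  -- Switching to binary mode removes the decoding step entirely.
  hSetBinaryMode stdin True
  bin <- hGetEncoding stdin
  putStrLn ("stdin (binary mode): " ++ describeEncoding bin)
```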

In your case, you have a text file encoded with codepage 1252, but GHC
is trying to read it using codepage 850 instead.

So, as in the example I posted, you have (using, again, Python syntax):
- the character u'è' - Unicode code point 0xe8
- the byte 0xe8 in the file; this is the result of
  u'è'.encode('cp1252')
- a Haskell Char '\xde'; this is the result of
  '\xe8'.decode('cp850')
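The three representations above can be reproduced with a small pure sketch. The tables `cp1252` and `cp850` are hypothetical and deliberately partial: they hold only 'è' and 'Þ', with byte values matching the Python session quoted earlier, whereas real code pages map all 256 byte values. This only illustrates the mismatch; it is not how GHC's I/O library implements decoding.

```haskell
import Data.Tuple (swap)
import Data.Word (Word8)

-- Hypothetical, partial code page tables: only the two characters
-- discussed in this thread ('\xe8' is 'è', '\xde' is 'Þ').
cp1252, cp850 :: [(Char, Word8)]
cp1252 = [('\xe8', 0xe8), ('\xde', 0xde)]
cp850  = [('\xe8', 0x8a), ('\xde', 0xe8)]

-- Character -> byte, using the given table.
encodeWith :: [(Char, Word8)] -> Char -> Maybe Word8
encodeWith table c = lookup c table

-- Byte -> character, using the same table in reverse.
decodeWith :: [(Char, Word8)] -> Word8 -> Maybe Char
decodeWith table b = lookup b (map swap table)

main :: IO ()
main = do
  let written  = encodeWith cp1252 '\xe8'      -- 'è' saved as CP1252
      readBack = written >>= decodeWith cp850  -- same byte read as CP850
  print written   -- Just 232
  print readBack  -- Just '\222', i.e. 'Þ'
```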


There are 3 solutions:
1) open the file in binary mode
2) set the console codepage to 1252.

   I do this by changing the "Command Prompt" shortcut destination to:
     `%SystemRoot%\system32\cmd.exe /k chcp 1252`
3) explicitly set the encoding when reading the file in text mode

   Unfortunately, this is currently a rather low-level and GHC-specific
   operation:

http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Handle.html

   The Python API is, by the way:
   http://docs.python.org/dev/py3k/library/functions.html#open

   GHC's API is quite different (if I understand it correctly): you can
   set the encoding only after the file has been opened, and you can
   change it again after having read some data (in Python, by contrast,
   the file encoding is immutable).
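Solutions 1 and 3 can be sketched in one small program. The file name `sample-cp1252.txt` and the helper `readWith` are hypothetical. The sketch assumes `hSetEncoding` and `mkTextEncoding` are reachable from `System.IO` (recent base versions re-export them there, in addition to the GHC.IO.Handle module linked above; treat this as an assumption for older versions) and that the platform recognises the encoding name "CP1252". As described above, the encoding is set right after the file is opened, before any data is read.

```haskell
import System.IO

-- Hypothetical helper: open a file, let the caller configure the handle
-- (binary mode or an explicit encoding), then read it strictly so the
-- data is available before withFile closes the handle.
readWith :: (Handle -> IO ()) -> FilePath -> IO String
readWith configure path = withFile path ReadMode $ \h -> do
  configure h
  s <- hGetContents h
  length s `seq` return s

main :: IO ()
main = do
  -- Hypothetical file holding the single byte 0xE8 ('è' in codepage 1252).
  let path = "sample-cp1252.txt"
  withBinaryFile path WriteMode (\h -> hPutStr h "\xe8")

  -- Solution 1: binary mode. No decoding; each Char holds one raw byte.
  raw <- readWith (\h -> hSetBinaryMode h True) path
  print (map fromEnum raw)

  -- Solution 3: text mode with an explicit encoding, set after opening.
  -- Assumes the platform accepts the name "CP1252".
  enc <- mkTextEncoding "CP1252"
  decoded <- readWith (\h -> hSetEncoding h enc) path
  putStrLn (if decoded == "\xe8"
              then "decoded as U+00E8"
              else "unexpected: " ++ show decoded)
```

The binary read should print `[232]`. The decoded character is reported by code point rather than printed directly, because printing 'è' would pass through the console's encoding again, which is the very thing being debugged.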



Regards, Manlio

