[Haskell-cafe] Same compiled program behaving differently when
called from ghci and shell
Manlio Perillo
manlio_perillo at libero.it
Sun Nov 21 15:51:16 EST 2010
On 21/11/2010 19:06, Bruno Damour wrote:
> On 21/11/10 17:21, Manlio Perillo wrote:
>> On 21/11/2010 06:49, Bruno Damour wrote:
>>> Hello,
>>> I have a very strange (for me) problem that I managed to reduce to this:
>>> I have a small program that reads a file containing only 1 character (è = e8).
>>> The program is ftest2.hs:
>>>
> [...]
>> Now, "fate" is that (Python console):
>> >>> '\xe8'.decode('cp1252').encode('cp850')
>> '\x8a'
>> >>> '\xde'.decode('cp1252').encode('cp850')
>> '\xe8'
>>
> [...]
>
> Yes, I had begun to suspect that IO might use an environment setting.
Did you try running the program again with the console codepage set
to 1252?
> That sounds a bit weird to me (newbie), that it should affect the result of a
> program depending on where it is launched... it's the same binary anyway,
> isn't it?
This is only a guess, but recent versions of the GHC I/O library decode
the bytes of a file at a low level when it is read in text mode.
This is the correct approach, since a Char is supposed to be a Unicode
character.
I assume that when reading a text file, the I/O library just checks the
system encoding and uses it.
In your case, you have a text file encoded with codepage 1252, but
GHC is trying to read it using codepage 850 instead.
So, as in the example I posted, you have (again using Python syntax):
- the character u'è', Unicode code point 0xe8
- a single byte in the file, 0xe8; this is the result of
  u'è'.encode('cp1252')
- a Haskell Char '\xde'; this is the result of
  '\xe8'.decode('cp850')
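The same chain can be checked in Python 3, where byte strings are written b'...'. This is only a sketch of the mismatch, not of what GHC does internally, and the variable names are mine:

```python
# Python 3 sketch of the codepage mismatch.
# Variable names are illustrative; cp1252 and cp850 are standard Python codecs.

# LATIN SMALL LETTER E WITH GRAVE, code point 0xe8.
original = "\u00e8"

# What was written to disk: the character encoded with codepage 1252.
file_bytes = original.encode("cp1252")   # a single byte, 0xe8

# What a reader that assumes codepage 850 makes of that byte.
misread = file_bytes.decode("cp850")     # code point 0xde

# What a reader that assumes codepage 1252 gets: the original character.
correct = file_bytes.decode("cp1252")

print(file_bytes, hex(ord(misread)), hex(ord(correct)))
```

The byte on disk never changes; only the codepage assumed by the reader decides which Char comes out.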
There are 3 solutions:
1) open the file in binary mode
2) set the console codepage to 1252.
I do this by changing the "Command Prompt" shortcut destination to:
`%SystemRoot%\system32\cmd.exe /k chcp 1252`
3) explicitly set the encoding when reading the file in text mode.
   Unfortunately, this is currently a rather low-level and GHC-specific
   operation:
http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Handle.html
The Python API is, by the way:
http://docs.python.org/dev/py3k/library/functions.html#open
The GHC API is quite different (if I understand it correctly):
you can change the encoding only after the file has been opened, and
you can change it again after having read some data (in Python,
by contrast, the file encoding is immutable).
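To illustrate solutions 1 and 3 with the Python open() API linked above, here is a small sketch in Python 3. It is not GHC code, and the temporary file and its name are made up for the example:

```python
import os
import tempfile

# Create a one-byte file like the one in the report: e-grave encoded as cp1252.
# The directory and file name are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"\xe8")

# Solution 1: binary mode. No decoding happens; you get the raw bytes.
with open(path, "rb") as f:
    raw = f.read()

# Solution 3: text mode with an explicit encoding, so the result no longer
# depends on the console codepage or the locale.
with open(path, "r", encoding="cp1252") as f:
    text = f.read()

print(raw, hex(ord(text)))
```

As far as I know, the GHC counterparts in System.IO are openBinaryFile (or hSetBinaryMode) for solution 1 and hSetEncoding, together with mkTextEncoding, for solution 3.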
Regards Manlio