[GHC] #10762: On Windows, out-of-codepage characters can cause GHC build to fail
GHC
ghc-devs at haskell.org
Sun Aug 9 10:58:01 UTC 2015
-----------------------------------------+---------------------------------
Reporter:               snoyberg         | Owner:
Type:                   bug              | Status:           new
Priority:               normal           | Milestone:
Component:              Compiler         | Version:          7.10.2
Keywords:                                | Operating System: Windows
Architecture:           x86_64 (amd64)   | Type of failure:  None/Unknown
Test Case:                               | Blocked By:
Blocking:                                | Related Tickets:
Differential Revisions:                  |
-----------------------------------------+---------------------------------
This hit us recently in stack; see issues
[https://github.com/commercialhaskell/stack/issues/738 738] and
[https://github.com/commercialhaskell/stack/issues/734 734]. To
demonstrate, I'm attaching a UTF-8 encoded Haskell program that uses a
Hebrew identifier and triggers a warning under -Wall. The contents of that
file are:
{{{#!hs
module Main
    ( main
    , שלום
    ) where

main :: IO ()
main = putStrLn שלום

שלום = "shalom"
}}}
If I first set my codepage to 65001 (UTF-8), everything works as expected:
{{{
C:\Users\Michael\Desktop>chcp 65001
Active code page: 65001
C:\Users\Michael\Desktop>ghc -fforce-recomp -Wall -ddump-hi -ddump-to-file shalom.hs
[1 of 1] Compiling Main ( shalom.hs, shalom.o )
shalom.hs:9:1: Warning:
Top-level binding with no type signature: שלום :: [Char]
Linking shalom.exe ...
}}}
However, if I set my codepage to 437 (US), GHC exits prematurely both when
writing the warning to the console and when writing the .hi dump file:
{{{
C:\Users\Michael\Desktop>chcp 437
Active code page: 437
C:\Users\Michael\Desktop>ghc -fforce-recomp -Wall shalom.hs
[1 of 1] Compiling Main ( shalom.hs, shalom.o )
shalom.hs:9:1: Warning:
Top-level binding with no type signature: <stderr>: commitBuffer: invalid argument (invalid character)
}}}
{{{
C:\Users\Michael\Desktop>chcp 437
Active code page: 437
C:\Users\Michael\Desktop>ghc -fforce-recomp -ddump-hi -ddump-to-file shalom.hs
[1 of 1] Compiling Main ( shalom.hs, shalom.o )
shalom.dump-hi: commitBuffer: invalid argument (invalid character)
}}}
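The same class of failure can probably be reproduced without GHC itself or a Windows console. The sketch below is an illustration under assumptions, not the code path GHC takes: it uses `latin1` from `System.IO` as a stand-in for code page 437, since both are encodings that cannot represent Hebrew letters. The helper `writeWith` and the file names are hypothetical.

```haskell
import Control.Exception (IOException, try)
import System.IO

-- | Hypothetical helper: write a string to a file through a handle that
-- uses the given text encoding, capturing any I/O error instead of dying.
writeWith :: TextEncoding -> FilePath -> String -> IO (Either IOException ())
writeWith enc path str = try $ withFile path WriteMode $ \h -> do
  hSetEncoding h enc
  hPutStr h str

main :: IO ()
main = do
  -- The identifier from shalom.hs, written with numeric escapes so that
  -- this source file stays ASCII-only.
  let ident = "\1513\1500\1493\1501"
  -- latin1 stands in for CP437 here (an assumption): neither encoding can
  -- represent Hebrew letters, so the write is expected to fail.
  narrow <- writeWith latin1 "demo-latin1.txt" ident
  -- utf8 can represent every Unicode code point, so this should succeed.
  wide <- writeWith utf8 "demo-utf8.txt" ident
  putStrLn ("latin1: " ++ either show (const "ok") narrow)
  putStrLn ("utf8:   " ++ either show (const "ok") wide)
```

The first write is expected to report an "invalid argument (invalid character)" style error, while the second succeeds; the exact wording of the message may differ between GHC versions.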
At the very least, I would argue that -ddump-to-file should always dump to
the output files as UTF-8, as this is the most useful for tooling. Beyond
that, there are a few options here:
* Have all output, including output to the console, go out as UTF-8. This
  may not play nicely with consoles unless the output codepage is set to
  match.
* Provide a command line option or environment variable to specify "output
  as UTF-8."
* More radical: change the default behavior of all Handles so that UTF-8
  is the default encoding, instead of consulting code pages and
  environment variables. Honestly, this is my preference, but it's a
  bigger discussion than this one bug.
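For reference, this is roughly what "always UTF-8" looks like at the level of an ordinary Haskell program. It is a minimal sketch using `hSetEncoding` and `utf8` from `System.IO`; the helper `writeFileUtf8` and the file name are hypothetical, and nothing here is taken from GHC's own dump-file code.

```haskell
import System.IO

-- | Hypothetical helper: write text to a file as UTF-8 regardless of the
-- locale or active code page.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path str = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h str

main :: IO ()
main = do
  -- First option from the list: send console output as UTF-8 as well.
  -- A console that is not in code page 65001 may render this as mojibake,
  -- but the write itself should no longer fail.
  hSetEncoding stdout utf8
  hSetEncoding stderr utf8
  putStrLn "שלום :: [Char]"
  -- Dump-file case: the file content is UTF-8 whatever the console uses.
  writeFileUtf8 "demo.dump-hi" "שלום :: [Char]\n"
```

Setting the encoding per handle like this affects only the current process, which is the main difference from changing the console codepage.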
The workaround we've implemented in stack for now is setting the codepage
to 65001 for the console while running stack. This is not ideal, since
this is essentially a global setting for the entire console.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10762>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler