[commit: ghc] master: Use latin1 code page on Windows for response files. (f63c8ef)

git at git.haskell.org git at git.haskell.org
Sun Jan 15 13:00:27 UTC 2017


Repository : ssh://git@git.haskell.org/ghc

On branch  : master
Link       : http://ghc.haskell.org/trac/ghc/changeset/f63c8ef33ec9666688163abe4ccf2d6c0428a7e7/ghc

>---------------------------------------------------------------

commit f63c8ef33ec9666688163abe4ccf2d6c0428a7e7
Author: Tamar Christina <tamar at zhox.com>
Date:   Sun Jan 15 12:52:14 2017 +0000

    Use latin1 code page on Windows for response files.
    
    Summary:
    D2917 added a change that will make paths on Windows response files
    use DOS 8.3 shortnames to get around the fact that `libiberty` assumes
    a one byte per character encoding.
    
    This is actually not the problem; the actual problem is that GCC on
    Windows doesn't seem to support Unicode at all.
    
    This comes down to how Unicode characters are handled on POSIX versus
    Windows. On Windows, Unicode is only supported through a wide-character
    type such as `wchar_t`, with calls to the appropriate wide versions of the
    APIs (names suffixed with the `W` character). On POSIX I believe the standard
    `char` is used and, based on its value, decoded to the correct string.
    
    GCC doesn't seem to make calls to the wide versions of the Windows APIs,
    and even if it did, its character representation would be wrong. So I
    believe GCC just does not support UTF-8 paths on Windows.
    
    So the hack in D2917 is the only way to get Unicode support. The problem is
    however that `GCC` is not the only tool with this issue and we don't use response
    files for every invocation of the tools. Most of the tools probably don't support it.
    
    Furthermore, DOS 8.3 shortnames only exist when the path or file physically exists on
    disk. We pass lots of paths to GCC that don't exist yet, like the output file.
    D2917 works around this by splitting the path from the file and trying to shorten that.
    
    But this may not always work.
    
    In short, even if we do Unicode correctly (which we don't at the moment; the GCC driver
    we build uses `char` instead of `wchar_t`), we won't be able to compile using Unicode
    paths that need to be passed to `GCC`. So I am not sure about the point of D2917.
    
    What we can do is support the most common non-ascii characters by writing the response
    files out using the `latin1` code page.
    
    Test Plan: compile + make test TEST=T12971
    
    Reviewers: austin, bgamari, erikd
    
    Reviewed By: bgamari
    
    Subscribers: thomie, #ghc_windows_task_force
    
    Differential Revision: https://phabricator.haskell.org/D2942
    
    GHC Trac Issues: #12971


>---------------------------------------------------------------

f63c8ef33ec9666688163abe4ccf2d6c0428a7e7
 compiler/main/SysTools.hs       | 4 ++++
 docs/users_guide/bugs.rst       | 3 +++
 testsuite/tests/driver/Makefile | 2 +-
 testsuite/tests/driver/all.T    | 2 +-
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/compiler/main/SysTools.hs b/compiler/main/SysTools.hs
index 38d866e..ea3c461 100644
--- a/compiler/main/SysTools.hs
+++ b/compiler/main/SysTools.hs
@@ -1240,7 +1240,11 @@ runSomethingResponseFile dflags filter_fn phase_name pgm args mb_env =
     getResponseFile args = do
       fp <- newTempName dflags "rsp"
       withFile fp WriteMode $ \h -> do
+#if defined(mingw32_HOST_OS)
+          hSetEncoding h latin1
+#else
           hSetEncoding h utf8
+#endif
           hPutStr h $ unlines $ map escape args
       return fp
 
diff --git a/docs/users_guide/bugs.rst b/docs/users_guide/bugs.rst
index 875820b..c1527f1 100644
--- a/docs/users_guide/bugs.rst
+++ b/docs/users_guide/bugs.rst
@@ -540,6 +540,9 @@ Bugs in GHC
    in the compiler's internal representation and can be unified producing
    unexpected results. See :ghc-ticket:`11715` for one example.
 
+-  Because of a toolchain limitation we are unable to support full Unicode paths
+   on WIndows. On Windows we support up to Latin-1. See :ghc-ticket:`12971` for more.
+
 .. _bugs-ghci:
 
 Bugs in GHCi (the interactive GHC)
diff --git a/testsuite/tests/driver/Makefile b/testsuite/tests/driver/Makefile
index d3f78ef..ffb924a 100644
--- a/testsuite/tests/driver/Makefile
+++ b/testsuite/tests/driver/Makefile
@@ -645,4 +645,4 @@ T12955:
 .PHONY: T12971
 T12971:
 	mkdir -p ä
-	! TMP=ä "$(TEST_HC)" $(TEST_HC_OPTS) --make T12971
+	TMP=ä "$(TEST_HC)" $(TEST_HC_OPTS) --make T12971
diff --git a/testsuite/tests/driver/all.T b/testsuite/tests/driver/all.T
index d327ac5..380f288 100644
--- a/testsuite/tests/driver/all.T
+++ b/testsuite/tests/driver/all.T
@@ -502,4 +502,4 @@ test('T12752pass', normal, compile, ['-DSHOULD_PASS=1 -Wcpp-undef'])
 
 test('T12955', normal, run_command, ['$MAKE -s --no-print-directory T12955'])
 
-test('T12971', expect_broken(12971), run_command, ['$MAKE -s --no-print-directory T12971'])
\ No newline at end of file
+test('T12971', ignore_stdout, run_command, ['$MAKE -s --no-print-directory T12971'])
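
The effect of the encoding switch in `getResponseFile` can be illustrated in isolation. The following is a minimal sketch, not the code from the commit: the helper names (`fitsLatin1`, `writeArgsWith`, `fileBytes`), the demo file names, and the sample arguments are assumptions made for illustration, and the `escape` step used by GHC is left out. It writes the same argument list once as Latin-1 and once as UTF-8 and compares the resulting file sizes.

```haskell
import System.IO

-- | True when every character fits in a single Latin-1 byte
-- (code point <= 0xFF). Hypothetical helper, not part of GHC.
fitsLatin1 :: String -> Bool
fitsLatin1 = all (<= '\xFF')

-- | Write the arguments one per line using the given encoding.
-- Unlike the real 'getResponseFile', no escaping is applied here.
writeArgsWith :: TextEncoding -> FilePath -> [String] -> IO ()
writeArgsWith enc fp args =
  withFile fp WriteMode $ \h -> do
    hSetEncoding h enc
    hPutStr h (unlines args)

-- | Size of a file in bytes.
fileBytes :: FilePath -> IO Integer
fileBytes fp = withFile fp ReadMode hFileSize

main :: IO ()
main = do
  -- "\228" is 'ä' (U+00E4), the directory name used by test T12971.
  let args = ["-o", "\228/T12971.o"]
  print (fitsLatin1 (concat args))
  writeArgsWith latin1 "demo-latin1.rsp" args
  writeArgsWith utf8   "demo-utf8.rsp"   args
  l <- fileBytes "demo-latin1.rsp"
  u <- fileBytes "demo-utf8.rsp"
  -- 'ä' takes one byte in Latin-1 and two bytes in UTF-8,
  -- so the UTF-8 file is exactly one byte longer.
  print (u - l)
```

A tool that reads its response file one byte per character sees the Latin-1 file as intended, while the UTF-8 file would present `ä` as two unrelated bytes. Characters above U+00FF have no Latin-1 representation, which is the limit recorded in the `bugs.rst` entry.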


