[GHC] #12971: Paths are encoded incorrectly when invoking GCC

GHC ghc-devs at haskell.org
Sun Jan 15 13:00:21 UTC 2017


#12971: Paths are encoded incorrectly when invoking GCC
-------------------------------------+-------------------------------------
        Reporter:  erikprantare      |                Owner:  Phyx
            Type:  bug               |               Status:  patch
        Priority:  highest           |            Milestone:  8.2.1
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Windows           |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D2917
       Wiki Page:                    |  Phab:/D2942
-------------------------------------+-------------------------------------

Comment (by Tamar Christina <tamar@…>):

 In [changeset:"f63c8ef33ec9666688163abe4ccf2d6c0428a7e7/ghc" f63c8ef/ghc]:
 {{{
 #!CommitTicketReference repository="ghc"
 revision="f63c8ef33ec9666688163abe4ccf2d6c0428a7e7"
 Use latin1 code page on Windows for response files.

 Summary:
 D2917 added a change that makes paths in Windows response files
 use DOS 8.3 shortnames to get around the fact that `libiberty` assumes
 a one byte per character encoding.

 This is not the real problem. The real problem is that GCC on
 Windows doesn't seem to support Unicode at all.

 This comes down to how Unicode characters are handled on POSIX versus
 Windows. On Windows, Unicode is only supported through a wide character
 type such as `wchar_t`, with calls to the appropriate wide versions of the
 APIs (names suffixed with the `W` character). On POSIX I believe the
 standard `char` is used, and the string is decoded based on its byte
 values.

 GCC doesn't seem to make calls to the wide versions of the Windows APIs,
 and even if it did, its character representation would be wrong. So I
 believe GCC just does not support UTF-8 paths on Windows.

 So the hack in D2917 is the only way to get Unicode support. The problem,
 however, is that `GCC` is not the only tool with this issue, and we don't
 use response files for every invocation of the tools. Most of the tools
 probably don't support it.

 Furthermore, DOS 8.3 shortnames only exist when the path or file
 physically exists on disk. We pass lots of paths to GCC that don't exist
 yet, like the output file. D2917 works around this by splitting the path
 from the file and trying to shorten the path part.

 But this may not always work.

 In short, even if we do Unicode correctly (which we don't at the moment:
 the GCC driver we build uses `char` instead of `wchar_t`), we won't be
 able to compile using Unicode paths that need to be passed to `GCC`. So I
 am not sure about the point of D2917.

 What we can do is support the most common non-ASCII characters by writing
 the response files out using the `latin1` code page.

 Test Plan: compile + make test TEST=T12971

 Reviewers: austin, bgamari, erikd

 Reviewed By: bgamari

 Subscribers: thomie, #ghc_windows_task_force

 Differential Revision: https://phabricator.haskell.org/D2942

 GHC Trac Issues: #12971
 }}}
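
 The encoding argument in the commit message above can be illustrated with
 a small sketch. It is written in Python purely for illustration and is not
 the code from D2942; the helper names `bytes_per_char` and
 `write_response_file` are hypothetical, and the quoting and escaping that a
 real response file needs are left out. It shows that latin1 keeps one byte
 per character for code points up to U+00FF, whereas UTF-8 does not, and
 that characters beyond that range cannot be written at all.

```python
import os
import tempfile


def bytes_per_char(text, encoding):
    """Return the encoded length, in bytes, of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


def write_response_file(path, args):
    """Write one argument per line, encoded as latin-1 (hypothetical helper).

    Raises UnicodeEncodeError if an argument contains a character above
    U+00FF, because latin-1 cannot represent it. Quoting is omitted here.
    """
    with open(path, "w", encoding="latin-1", newline="\n") as f:
        for arg in args:
            f.write(arg + "\n")


if __name__ == "__main__":
    name = "caf\u00e9.o"  # contains U+00E9, which latin-1 can represent
    print(bytes_per_char(name, "utf-8"))    # U+00E9 takes two bytes
    print(bytes_per_char(name, "latin-1"))  # every character takes one byte

    rsp = os.path.join(tempfile.mkdtemp(), "args.rsp")
    write_response_file(rsp, ["-o", name])
    try:
        write_response_file(rsp, ["\u65e5\u672c.o"])  # outside latin-1
    except UnicodeEncodeError as err:
        print("not representable in latin-1:", err.reason)
```

 A reader that assumes one byte per character sees the latin1 file intact,
 which is why this covers the most common non-ASCII paths but not all of
 Unicode.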

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12971#comment:12>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

