[GHC] #10907: GHC fails to read file with byte-order mark when LANG=C

Wed Sep 23 07:41:48 UTC 2015

#10907: GHC fails to read file with byte-order mark when LANG=C
-------------------------------------+-------------------------------------
        Reporter:  RyanGlScott       |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Compiler          |                 Version:  7.10.2
  (Parser)                           |
      Resolution:                    |                Keywords:
Operating System:  Linux             |            Architecture:  x86_64
 Type of failure:  GHC doesn't work  |  (amd64)
  at all                             |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:  #6016, #6037      |  Differential Revisions:
-------------------------------------+-------------------------------------

Comment (by nomeata):

 The problem seems to be `skipBOM` in `StringUtils.hs`, which switches to
 text mode so that `hLookAhead` is able to consume the whole BOM, instead
 of just the first byte. But in text mode decoding depends on the locale,
 so with LANG=C the BOM is not decoded as a single character.

 At first I thought it would make sense to stay in binary mode, but then
 `hLookAhead` returns just one byte, which is not enough to detect a BOM.
 Using `hGetChar` twice would help, but if there is no BOM, we’d have to
 rewind. Are we sure we can `hSeek` on all handles that we need to?
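
 A minimal sketch of the binary-mode variant, not GHC's actual code. It
 assumes the handle is already in binary mode and positioned at the start
 of the input, and it looks only for the UTF-8 BOM (bytes EF BB BF). The
 name `skipBOMBinary` is hypothetical. `hIsSeekable` is checked first, so
 nothing is consumed from a handle that cannot be rewound.

```haskell
import Control.Monad (unless)
import System.IO

-- | Hypothetical helper: consume a leading UTF-8 BOM from a binary-mode
-- handle. Returns True if a BOM was skipped. If the handle cannot seek,
-- nothing is read and False is returned.
skipBOMBinary :: Handle -> IO Bool
skipBOMBinary h = do
  seekable <- hIsSeekable h
  if not seekable
    then return False
    else do
      start  <- hTell h
      prefix <- readUpTo 3
      let found = prefix == "\xEF\xBB\xBF"
      -- No BOM: rewind so the caller sees the input unchanged.
      unless found (hSeek h AbsoluteSeek start)
      return found
  where
    -- Read at most n bytes; in binary mode each Char is one byte.
    readUpTo :: Int -> IO String
    readUpTo 0 = return []
    readUpTo n = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          c  <- hGetChar h
          cs <- readUpTo (n - 1)
          return (c : cs)
```

 The drawback is the one raised above: for a non-seekable handle this
 sketch gives up instead of detecting the BOM.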

 A `Word16` encoding would help. Or maybe it works well enough to force
 UTF-8 for this single `hLookAhead`.
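
 A minimal sketch of the second idea, assuming the handle is in binary
 mode and at the start of the input. The name `skipLeadingBOM` is
 hypothetical and this is not the code in GHC. The handle is switched to
 UTF-8 only around the lookahead, so the result no longer depends on LANG,
 and it is put back into binary mode afterwards.

```haskell
import Control.Exception (bracket_)
import System.IO

-- | Hypothetical helper: skip a leading BOM, decoding as UTF-8 for the
-- lookahead only. Returns True if a BOM was consumed.
skipLeadingBOM :: Handle -> IO Bool
skipLeadingBOM h =
  -- Force UTF-8 regardless of the locale, then restore binary mode.
  bracket_ (hSetEncoding h utf8) (hSetBinaryMode h True) $ do
    eof <- hIsEOF h
    if eof
      then return False
      else do
        -- Under UTF-8 the BOM decodes to the single character U+FEFF.
        c <- hLookAhead h
        if c == '\xfeff'
          then hGetChar h >> return True
          else return False
```

 One caveat, stated as an assumption: if the input starts with bytes that
 are not valid UTF-8, the lookahead may raise a decoding error, so a real
 fix would have to decide how to handle that case.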

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10907#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler