[GHC] #13486: inconsistency in handling the BOM Byte-order-mark in reading and putStrLn

GHC ghc-devs at haskell.org
Sun Mar 26 10:04:06 UTC 2017


#13486: inconsistency in handling the BOM Byte-order-mark in reading and putStrLn
-------------------------------------+-------------------------------------
        Reporter:  andrewufrank      |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.0.2
      Resolution:                    |             Keywords:
Operating System:  Linux             |         Architecture:  Unknown/Multiple
 Type of failure:  Poor/confusing    |            Test Case:
  error message                      |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by andrewufrank:

Old description:

> this is a very annoying issue and has been discussed already (e.g. #1744)
> and https://mail.haskell.org/pipermail/haskell-cafe/2011-January/088021.html.
>
> i think it is ok that the BOM character is not automatically removed when
> reading a file, but it is INCONSISTENT then to not show the BOM character
> when printing the file content.
>
> a minimal test:
>
>     v <- readFile "fileWithBOM"
>     putStrLn "the file content"
>     putStrLn v
>     putStrLn (show v)
>
>     return ()
>
> the first line does not indicate that there is a BOM character in the
> input and not removed from the result - only the second putStrLn (with
> the incorrect show on the result string) demonstrates the presence of the
> BOM character:
>
> "\65279\r\n.sprache English\r\n\.....
>
> consistency here is important to warn the programmer early on (after
> reading and checking file content) because other tools (e.g. parsec) see
> the BOM character and fail.
>
> i recommend that the BOM character is read but shown in printStrLn - i
> guess this is preferably over automatic (silent) removal. reading in and
> not showing, however, leads to misguided searches for strange errors
> caused by the BOM.

New description:

 This is a very annoying issue and has been discussed already (e.g. #1744)
 and https://mail.haskell.org/pipermail/haskell-cafe/2011-January/088021.html.

 I think it is OK that the BOM character is not automatically removed when
 reading a file, but it is then INCONSISTENT not to show the BOM character
 when printing the file content.

 A minimal test:


 {{{
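 main :: IO ()
 main = do
     v <- readFile "fileWithBOM"
     putStrLn "the file content"
     putStrLn v          -- prints the content; the BOM does not show up
     putStrLn (show v)   -- show escapes the BOM as \65279

     return ()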
 }}}


 The output of the first putStrLn on the result (putStrLn v) does not
 indicate that there is a BOM character in the input that was not removed
 from the result. Only the second one (which applies show to the result
 string) demonstrates the presence of the BOM character:

 "\65279\r\n.sprache English\r\n\.....
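The escape \65279 is the decimal form of U+FEFF, the byte-order mark. As a minimal sketch of what "showing" the BOM could look like without going through show, the helper below replaces each BOM with a visible marker. The name markBOM and the marker text "<BOM>" are assumptions made for illustration, not anything GHC provides.

```haskell
-- U+FEFF is the byte-order mark; 0xFEFF == 65279, hence \65279 in the show output.
bom :: Char
bom = '\xFEFF'

-- Hypothetical helper: replace every BOM character with the visible marker
-- "<BOM>" and leave all other characters untouched.
markBOM :: String -> String
markBOM = concatMap visible
  where
    visible c
      | c == bom  = "<BOM>"
      | otherwise = [c]
```

With this, putStrLn (markBOM v) would print the marker in front of the file content instead of an invisible character.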

 Consistency here is important so that the programmer is warned early on
 (after reading and checking the file content), because other tools (e.g.
 parsec) see the BOM character and fail.
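Until the behaviour is made consistent, a program can check for the BOM itself before handing the string to a parser. The following is a minimal sketch under that assumption; hasBOM and stripBOM are hypothetical helper names, and only a BOM in the first position is handled.

```haskell
-- U+FEFF, the byte-order mark (decimal 65279).
bom :: Char
bom = '\xFEFF'

-- True if the string starts with a BOM character.
hasBOM :: String -> Bool
hasBOM (c:_) = c == bom
hasBOM []    = False

-- Drop a single leading BOM character, if present; otherwise return the
-- string unchanged.
stripBOM :: String -> String
stripBOM (c:rest) | c == bom = rest
stripBOM s = s
```

For example, stripBOM <$> readFile "fileWithBOM" yields the content without the leading BOM, and hasBOM can be used to emit a warning first instead of removing it silently.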

 I recommend that the BOM character is read but also shown by putStrLn; I
 guess this is preferable to automatic (silent) removal. Reading it in and
 not showing it, however, leads to misguided searches for strange errors
 caused by the BOM.
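For cases where silent removal is acceptable after all, System.IO provides the utf8_bom encoding, which is documented to ignore a BOM at the start of the input. A minimal sketch, assuming the file is UTF-8 encoded; the function name readFileDroppingBOM is hypothetical.

```haskell
import System.IO

-- Hypothetical helper: read a UTF-8 file, letting the utf8_bom encoding
-- skip a byte-order mark at the start of the stream if there is one.
readFileDroppingBOM :: FilePath -> IO String
readFileDroppingBOM path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8_bom
    contents <- hGetContents h
    -- Force the lazy contents before withFile closes the handle.
    length contents `seq` return contents
```

This is the opposite trade-off to the recommendation above: the program never sees the BOM, so nothing warns the programmer that the file contained one.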

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13486#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

