[GHC] #8524: GHC is inconsistent with the Haskell Report on which Unicode characters are allowed in string and character literals

GHC ghc-devs at haskell.org
Tue Nov 12 03:39:03 UTC 2013


#8524: GHC is inconsistent with the Haskell Report on which Unicode characters are
allowed in string and character literals
----------------------------+----------------------------------------------
       Reporter:  oerjan    |             Owner:
           Type:  bug       |            Status:  new
       Priority:  normal    |         Milestone:
      Component:  Compiler  |           Version:  7.6.3
       Keywords:            |  Operating System:  Unknown/Multiple
   Architecture:            |   Type of failure:  GHC rejects valid program
  Unknown/Multiple          |         Test Case:
     Difficulty:  Unknown   |          Blocking:
     Blocked By:            |
Related Tickets:            |
----------------------------+----------------------------------------------
 GHC is inconsistent with the Haskell Report on which Unicode characters
 are allowed in string and character literals.  (And I don't like either
 behavior: why leave any characters out of strings unnecessarily?)

 Examples from ghci 7.6.3 (also tested in lambdabot on IRC):
 {{{
 Prelude> "​" -- Unicode char \8203, Format class.

 <interactive>:10:2:
     lexical error in string/character literal at character '\8203'
 Prelude> " " -- Unicode char \8202, Space class.
 "\8202"
 Prelude> "t\ \est" -- Unicode char \8202 in a string gap.

 <interactive>:14:4:
     lexical error in string/character literal at character '\8202'
 }}}
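
 For anyone who wants to check the classes named in the comments above
 without pasting invisible characters, here is a minimal sketch using
 numeric escapes and generalCategory from Data.Char.  The names
 zeroWidthSpace, hairSpace and classes are mine, chosen for illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Numeric escapes are plain ASCII in the source file, so they avoid the
-- lexer behaviour shown in the session above.
zeroWidthSpace :: Char
zeroWidthSpace = '\8203' -- U+200B

hairSpace :: Char
hairSpace = '\8202' -- U+200A

-- Pair each character with the general category that base reports for it.
classes :: [(Char, GeneralCategory)]
classes = [(c, generalCategory c) | c <- [zeroWidthSpace, hairSpace]]
```

 Evaluating classes in ghci should report Format for the first character
 and Space for the second, assuming the Unicode tables in your base agree
 with the ones I tested against.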

 My reading of
 http://www.haskell.org/onlinereport/haskell2010/haskellch2.html
 (sections 2.2 and 2.6):
 * The report BNF token "graphic", which can be used in literals,
 indirectly includes many Unicode classes, but uniWhite is not one of
 them.  Thus the only whitespace character allowed to represent itself
 in literals is ASCII space.
 * Unicode formatting characters are not mentioned anywhere in the BNF,
 as far as I can see, so they are not allowed in literals.
 * String gaps are made out of the report BNF token whitechar, which
 ''does'' include uniWhite.
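
 The reading above can be sketched as predicates over Char.  This is only
 an approximation under stated assumptions: I map the report's uniSmall,
 uniLarge, uniDigit and uniSymbol onto Unicode general categories, and I
 use Data.Char.isSpace as a stand-in for whitechar.  The function names
 are hypothetical and are not taken from GHC's lexer.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isSpace)

-- Approximates the report's "graphic" production (section 2.2) by
-- general category. Assumption: letters, decimal digits, punctuation
-- and symbols are exactly the categories the report has in mind.
isReportGraphic :: Char -> Bool
isReportGraphic c =
  generalCategory c `elem`
    [ LowercaseLetter, UppercaseLetter, TitlecaseLetter
    , DecimalNumber
    , ConnectorPunctuation, DashPunctuation, OpenPunctuation
    , ClosePunctuation, InitialQuote, FinalQuote, OtherPunctuation
    , MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol
    ]

-- Section 2.6: a character may stand for itself in a string literal if
-- it is graphic (other than '"' and '\\') or the ASCII space.
allowedRawInString :: Char -> Bool
allowedRawInString c =
  c == ' ' || (isReportGraphic c && c `notElem` "\"\\")

-- A string gap is built from whitechar, which includes uniWhite.
-- Assumption: isSpace is close enough to whitechar for this sketch.
allowedInGap :: Char -> Bool
allowedInGap = isSpace
```

 Under these assumptions, allowedRawInString rejects both '\8203' and
 '\8202', while allowedInGap accepts '\8202', which is what I read the
 report as saying.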

 Who wants what:
 ||                              ||= GHC =||= Report =||= Me           =||
 || Format in string             ||  No   ||  No      ||  Yes           ||
 || Space/uniWhite in string     ||  Yes  ||  No      ||  Yes           ||
 || Space/uniWhite in string gap ||  No   ||  Yes     ||  Dunno         ||

 In short, GHC's behavior is buggy and/or annoying in two opposite ways:
 * It rejects some Unicode characters (the Format class) in string and
 character literals, presumably because the report says so.
 * It allows some characters the report says it ''shouldn't'' (Unicode
 spaces written directly in a string), and refuses some characters the
 report says it ''should'' (Unicode spaces in a string gap).

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8524>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

