[GHC] #8524: GHC is inconsistent with the Haskell Report on which Unicode characters are allowed in string and character literals

GHC ghc-devs at haskell.org
Fri Sep 11 15:12:47 UTC 2015


#8524: GHC is inconsistent with the Haskell Report on which Unicode characters are
allowed in string and character literals
-------------------------------------+-------------------------------------
        Reporter:  oerjan            |                   Owner:
                                     |  RyanGlScott
            Type:  bug               |                  Status:  new
        Priority:  low               |               Milestone:
       Component:  Compiler          |                 Version:  7.6.3
  (Parser)                           |
      Resolution:                    |                Keywords:  newcomer
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  GHC rejects       |  Unknown/Multiple
  valid program                      |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:  Phab:D1235
-------------------------------------+-------------------------------------
Changes (by thomie):

 * cc: hvr (added)


Comment:

 @RyanGlScott: sorry about that. I put the newcomer keyword on this ticket
 prematurely.

 Some code:
 * Whitespace characters that the report excludes from strings:
 {{{
 > :m + Data.Char Data.List
 > delete '\SP' $ filter isSpace ['\0'..]
 "\t\n\v\f\r\160\5760\8192\8193\8194\8195\8196\8197\8198\8199\8200\8201\8202\8239\8287\12288"
 }}}
 * Whitespace characters that GHC excludes from strings:
 {{{
 > filter (\c -> generalCategory c == Control && isSpace c) ['\0'..]
 "\t\n\v\f\r"
 }}}

 * General categories (results of `generalCategory`) that both the report
   and GHC exclude from strings:
 {{{
 > nub $ map generalCategory $ filter (not . isPrint) ['\0'..]
 [Control,Format,NotAssigned,LineSeparator,ParagraphSeparator,Surrogate,PrivateUse]
 }}}
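
For anyone who wants to reproduce the queries above outside GHCi, here is a minimal standalone sketch. The top-level names (`reportExcludedSpaces`, `ghcExcludedSpaces`, `acceptedOnlyByGhc`, `nonPrintableCategories`) are made up for illustration, and `acceptedOnlyByGhc` is an extra derived list that is not in the comment: it assumes the two queries above accurately describe the report's and GHC's behaviour. The exact characters printed depend on the Unicode tables shipped with your `base`.

```haskell
module Main (main) where

import Data.Char (GeneralCategory (..), generalCategory, isPrint, isSpace)
import Data.List (delete, nub)

-- Whitespace other than the plain space character. According to the
-- comment above, the report excludes these from string literals.
reportExcludedSpaces :: String
reportExcludedSpaces = delete ' ' (filter isSpace [minBound ..])

-- Whitespace whose general category is Control. According to the
-- comment above, this is what GHC excludes from string literals.
ghcExcludedSpaces :: String
ghcExcludedSpaces =
  filter (\c -> generalCategory c == Control && isSpace c) [minBound ..]

-- Derived list (an assumption, not from the ticket): whitespace that
-- the first query flags but the second does not.
acceptedOnlyByGhc :: String
acceptedOnlyByGhc = filter (`notElem` ghcExcludedSpaces) reportExcludedSpaces

-- General categories of every character for which isPrint is False.
nonPrintableCategories :: [GeneralCategory]
nonPrintableCategories =
  nub (map generalCategory (filter (not . isPrint) [minBound ..]))

main :: IO ()
main = do
  print reportExcludedSpaces
  print ghcExcludedSpaces
  print acceptedOnlyByGhc
  print nonPrintableCategories
```

Under those assumptions, the third line of output is the set of whitespace characters on which GHC and the report disagree.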

 If we're going to be "as inclusive as possible", why not allow all of
 these? Are there any downsides to this? Perhaps this could go under a new
 flag `FullUnicodeStrings`, enabled by default and disabled in Haskell98
 and Haskell2010 mode.

 I'm also ok with just mentioning the current deviation from the report in
 https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/bugs-and-infelicities.html.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8524#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

