[GHC] #8524: GHC is inconsistent with the Haskell Report on which Unicode characters are allowed in string and character literals
GHC
ghc-devs at haskell.org
Tue Nov 12 03:39:03 UTC 2013
#8524: GHC is inconsistent with the Haskell Report on which Unicode characters are
allowed in string and character literals
-------------------------------+-------------------------------------------
Reporter: oerjan               | Owner:
Type: bug                      | Status: new
Priority: normal               | Milestone:
Component: Compiler            | Version: 7.6.3
Keywords:                      | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: GHC rejects valid program
Difficulty: Unknown            | Test Case:
Blocked By:                    | Blocking:
Related Tickets:               |
-------------------------------+-------------------------------------------
GHC is inconsistent with the Haskell Report on which Unicode characters
are allowed in string and character literals. (And I don't like either
option: why leave any characters out of strings unnecessarily?)
Examples from GHCi 7.6.3 (also tested in lambdabot on IRC):
{{{
Prelude> "" -- Unicode char \8203, Format class.
<interactive>:10:2:
lexical error in string/character literal at character '\8203'
Prelude> " " -- Unicode char \8202, Space class.
"\8202"
Prelude> "t\ \est" -- Unicode char \8202 in a string gap.
<interactive>:14:4:
lexical error in string/character literal at character '\8202'
}}}
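The classes named in the comments above can be checked from within Haskell. This is a minimal sketch using `Data.Char` from base; the names `zeroWidthSpace`, `hairSpace` and `classify` are made up for illustration, and the characters are written as numeric escapes so the file itself stays ASCII-only.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isSpace)

-- The two characters from the examples, written as numeric escapes.
zeroWidthSpace, hairSpace :: Char
zeroWidthSpace = '\8203' -- U+200B ZERO WIDTH SPACE
hairSpace      = '\8202' -- U+200A HAIR SPACE

-- Pair a character's Unicode general category with the result of isSpace.
classify :: Char -> (GeneralCategory, Bool)
classify c = (generalCategory c, isSpace c)
```

In GHCi, `classify zeroWidthSpace` should give `(Format,False)` and `classify hairSpace` should give `(Space,True)`, matching the "Format class" and "Space class" labels in the session above.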
My reading of
http://www.haskell.org/onlinereport/haskell2010/haskellch2.html
(sections 2.2 and 2.6):
* The report BNF token "graphic", which can be used in literals, indirectly
includes many Unicode classes, but uniWhite is not one of them. Thus
the only whitespace character allowed to represent itself in literals is
ASCII space.
* Unicode formatting characters are not mentioned in the BNF that I can
see, so are not allowed in literals.
* String gaps are made out of the report BNF token whitechar, which
''does'' include uniWhite.
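The reading above can be written down as predicates over `Char`. This is only a sketch under stated assumptions: the function names are hypothetical, the report's letter, digit, symbol and punctuation classes are approximated with Unicode general categories from `Data.Char`, and "any Unicode character defined as whitespace" is approximated with `isSpace`. It models the report's grammar as I read it, not what GHC's lexer actually does.

```haskell
import Data.Char
  ( GeneralCategory (..), generalCategory, isPunctuation, isSpace, isSymbol )

-- Approximation of the report's "graphic":
-- small | large | symbol | digit | special | " | '
-- The ASCII specials, underscore and both quote characters are all
-- punctuation or symbols, so the two predicates below cover them.
isReportGraphic :: Char -> Bool
isReportGraphic c = isPunctuation c || isSymbol c || letterOrDigit
  where
    letterOrDigit =
      generalCategory c
        `elem` [LowercaseLetter, UppercaseLetter, TitlecaseLetter, DecimalNumber]

-- Characters that may stand for themselves inside a string literal:
-- graphic (minus the double quote and the backslash) or ASCII space.
reportAllowsInString :: Char -> Bool
reportAllowsInString c =
  c == ' ' || (isReportGraphic c && c `notElem` "\"\\")

-- Characters that may appear inside a string gap (assumption: isSpace
-- is close enough to the report's whitechar, including uniWhite).
reportAllowsInGap :: Char -> Bool
reportAllowsInGap = isSpace
```

Under these assumptions, `reportAllowsInString` rejects both `'\8203'` (Format) and `'\8202'` (Space), while `reportAllowsInGap` accepts `'\8202'`.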
Who wants what:
|| ||= GHC =||= Report =||= Me =||
|| Format in string || No || No || Yes ||
|| Space/uniWhite in string || Yes || No || Yes ||
|| Space/uniWhite in string gap || No || Yes || Dunno ||
In short, GHC's behavior is buggy and/or annoying in two opposite ways:
* It leaves out some Unicode characters as allowable in strings and
character literals, presumably because the report says so.
* It allows some characters the report says it ''shouldn't'' (uniWhite in
strings), and refuses some characters the report says it ''should''
(uniWhite in string gaps).
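For anyone hitting this in practice, a source file can sidestep the disagreement by using numeric escapes and ASCII-only string gaps, which both GHC and the report accept. A small sketch (the binding names are made up for illustration):

```haskell
-- U+200B written as a numeric escape instead of a raw Format character.
zeroWidthSpaceString :: String
zeroWidthSpaceString = "\8203"

-- U+200A written as a numeric escape instead of a raw Space character.
hairSpaceString :: String
hairSpaceString = "\8202"

-- A string gap made only of ASCII spaces; the gap contributes nothing
-- to the resulting string.
gapped :: String
gapped = "t\   \est"
```

Here `gapped` is just `"test"`, and each of the other two strings contains exactly one character.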
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8524>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler