[GHC] #10583: Chaos in Lexeme.hs

Sat Jun 27 14:28:01 UTC 2015

#10583: Chaos in Lexeme.hs
-------------------------------------+-------------------------------------
              Reporter:  goldfire    |             Owner:  goldfire
                  Type:  bug         |            Status:  new
              Priority:  normal      |         Milestone:
             Component:  Compiler    |           Version:  7.10.1
              Keywords:              |  Operating System:  Unknown/Multiple
          Architecture:              |   Type of failure:  None/Unknown
  Unknown/Multiple                   |        Blocked By:
             Test Case:              |   Related Tickets:
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------
 I've been looking at the `Lexeme` module (in `basicTypes`), where -- as
 far as I can tell -- utter chaos reigns. (Full disclosure: I wrote this
 module some time ago, inheriting its code from various places. But I
 clearly did a poor job of it.) Here is a sampling of the chaos:

 * `isLexConSym` claims to recognize type and data constructor infix
 symbols. But it requires symbols to start with a `:` (or be `->`). This is
 out-of-date with respect to the change in type constructor infix symbols
 in 7.6(?), which now do not need to start with a `:`.

 * `isVarSymChar` and `okSymChar` both purport to recognize characters that
 are valid parts of symbolic identifiers. But they have entirely different,
 unrelated implementations. These should be the '''same''' function, I
 believe.

 * The `notFollowedBySymbol` function defined in `parser/Lexer.x` overlaps
 with the functions above. But it has a '''third''' implementation,
 different from either of the other two.

 * The `isLexXXX` functions all just look at first characters, except for
 `isLexVarSym`, which looks at all characters. There is a reason for this
 -- that GHC-generated names start with a `$` but should be printed prefix
 -- but I'm not sure I buy it. Is it sufficient to look at the first two
 characters instead of the first one?
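
 To make the "same function" idea concrete, here is a minimal sketch of a
 single shared symbol-character predicate. It follows the `symbol` grammar
 in the Haskell 2010 Report rather than any of the three existing GHC
 implementations, so it may well disagree with what the lexer accepts today.
 The names `isAscSymbol`, `isSymChar`, `isVarSymSketch` and `isConSymSketch`
 are hypothetical and do not exist in `Lexeme`. Reserved operators such as
 `->` and `=` are deliberately not excluded.

```haskell
import Data.Char (isAscii, isPunctuation, isSymbol)

-- | ASCII symbol characters, as listed under `ascSymbol` in the
-- Haskell 2010 Report (section 2.2).
isAscSymbol :: Char -> Bool
isAscSymbol c = c `elem` "!#$%&*+./<=>?@\\^|-~:"

-- | Hypothetical single source of truth for "may appear in a symbolic
-- identifier". The Report's exclusions (special characters, '_', '"'
-- and '\'') are all ASCII, so the ASCII branch already rules them out.
isSymChar :: Char -> Bool
isSymChar c
  | isAscii c = isAscSymbol c
  | otherwise = isSymbol c || isPunctuation c

-- | Sketch of a variable-symbol check that inspects every character,
-- not only the first one. Reserved operators are NOT filtered out.
isVarSymSketch :: String -> Bool
isVarSymSketch []        = False
isVarSymSketch s@(c : _) = c /= ':' && all isSymChar s

-- | Sketch of a Report-style constructor-symbol check: a leading ':'
-- followed only by symbol characters.
isConSymSketch :: String -> Bool
isConSymSketch (':' : cs) = all isSymChar cs
isConSymSketch _          = False
```

 Under this sketch a GHC-generated name such as `$fFoo` is rejected by
 `isVarSymSketch` because `f` is not a symbol character, whereas a check of
 the first character alone would accept it. Whether looking at just the
 first two characters is enough remains the open question above.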

 I'm happy to make the code changes around this, but I need some advice
 from someone who has more knowledge about both Haskell's lexical structure
 and quite possibly Unicode.

 Happily, the functions in `Lexeme` are not used much. But it would be
 awfully nice if they did the right thing when they are used.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10583>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler