RFC: Unicode primes and super/subscript characters in GHC
Mikhail Vorozhtsov
mikhail.vorozhtsov at gmail.com
Sat Jun 14 14:48:16 UTC 2014
Hello lists,
As some of you may know, GHC's support for Unicode characters in lexemes
is rather crude, so Unicode characters are often handled inconsistently
with their ASCII counterparts. For example, APOSTROPHE is treated
differently from PRIME:
λ> data a +' b = Plus a b
<interactive>:3:9:
Unexpected type ‘b’
In the data declaration for ‘+’
A data declaration should have form
data + a b c = ...
λ> data a +′ b = Plus a b
λ> let a' = 1
λ> let a′ = 1
<interactive>:10:8: parse error on input ‘=’
Also, some rather bizarre-looking names are accepted:
λ> let ᵤxᵤy = 1
In the spirit of improving things little by little, I would like to propose:
1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
APOSTROPHE, meaning the following alterations to the lexer:
primes -> U+2032 | U+2033 | U+2034 | U+2057
symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
graphic -> small | large | symbol | digit | special | " | ' | primes
varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
conid -> large { small | large | digit | ' | primes }
2. Introduce a new lexer nonterminal "subsup" that would include the
Unicode sub/superscript[1] versions of digits, "-", "+", "=", "(", ")",
and Latin and Greek letters, and allow these characters to be used in
names and operators:
symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
digit -> ascDigit | uniDigit (EXCEPT subsup)
small -> ascSmall | uniSmall (EXCEPT subsup) | _
large -> ascLarge | uniLarge (EXCEPT subsup)
graphic -> small | large | symbol | digit | special | " | ' | primes | subsup
varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT reservedid)
conid -> large { small | large | digit | ' | primes | subsup }
varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
consym -> (: {symbol | subsup}) (EXCEPT reservedop)
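To make the two changes above more concrete, here is a rough sketch of the
proposed character classes as plain Haskell predicates. It is not the GHC
lexer (which is generated by Alex) and not the patch itself. The names
isPrimeChar, isSubSup, isSmall, isLarge, isUniDigit, isIdTail and
isVarIdSketch are made up for illustration. The "primes" set is exactly the
four code points listed in item 1. The "subsup" set is an assumption: it
only covers U+00B2, U+00B3, U+00B9 and the assigned part of the
Superscripts and Subscripts block, so it is narrower than what item 2
describes. The reservedid exclusion is ignored.

```haskell
import Data.Char (GeneralCategory (DecimalNumber), generalCategory, isLower, isUpper)

-- The four PRIME characters from item 1:
-- PRIME, DOUBLE PRIME, TRIPLE PRIME, QUADRUPLE PRIME.
isPrimeChar :: Char -> Bool
isPrimeChar c = c `elem` ['\x2032', '\x2033', '\x2034', '\x2057']

-- ASSUMPTION: an illustrative subset of "subsup". The proposal does not
-- list exact code points; sub/superscript letters outside these ranges
-- are not covered here.
isSubSup :: Char -> Bool
isSubSup c =
  c `elem` ['\x00B2', '\x00B3', '\x00B9'] -- superscript two, three, one
    || between '\x2070' '\x2071'          -- superscript zero, superscript i
    || between '\x2074' '\x208E'          -- remaining superscripts, subscript digits and signs
    || between '\x2090' '\x209C'          -- subscript Latin letters
  where
    between lo hi = lo <= c && c <= hi

-- small -> ascSmall | uniSmall (EXCEPT subsup) | _
isSmall :: Char -> Bool
isSmall c = c == '_' || (isLower c && not (isSubSup c))

-- large -> ascLarge | uniLarge (EXCEPT subsup)
isLarge :: Char -> Bool
isLarge c = isUpper c && not (isSubSup c)

-- digit -> ascDigit | uniDigit (EXCEPT subsup)
isUniDigit :: Char -> Bool
isUniDigit c = generalCategory c == DecimalNumber && not (isSubSup c)

-- Characters allowed after the first character of a varid or conid.
isIdTail :: Char -> Bool
isIdTail c =
  isSmall c || isLarge c || isUniDigit c || c == '\'' || isPrimeChar c || isSubSup c

-- varid -> small { small | large | digit | ' | primes | subsup }
-- (the reservedid exclusion is deliberately left out of this sketch)
isVarIdSketch :: String -> Bool
isVarIdSketch []       = False
isVarIdSketch (c : cs) = isSmall c && all isIdTail cs
```

Under these definitions, isVarIdSketch accepts "a'" and "a\x2032" alike,
and also "x\x2081" (x with a subscript one), while "\x2093x" is rejected
because a subsup character is not allowed to start a name.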
If this proposal is received favorably, I'll write a patch for GHC based
on my previous stab at the problem[2].
P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion
to the GHC users list.
[1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
[2] https://ghc.haskell.org/trac/ghc/ticket/5108