RFC: Unicode primes and super/subscript characters in GHC

John Meacham john at repetae.net
Sat Jun 14 23:58:14 UTC 2014


I have this feature in jhc: a 'trailing' character class whose members
can appear at the end of both symbols and identifiers.

Currently it consists of:

 $trailing = [₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋]
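A minimal Haskell sketch of the same class as a character predicate, plus a helper that splits a lexeme into its base and its trailing suffix. `isTrailing` and `splitTrailing` are hypothetical names used only for illustration. This is not jhc's code, which defines the class as the lexer macro above:

```haskell
-- Hypothetical mirror of the $trailing class above (not taken from jhc).
trailingChars :: String
trailingChars = "₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋"

isTrailing :: Char -> Bool
isTrailing c = c `elem` trailingChars

-- Split a lexeme into its base and the run of trailing characters at its end.
-- Works the same for identifiers ("x₁₂") and operators ("+₊").
splitTrailing :: String -> (String, String)
splitTrailing s = (reverse rbase, reverse rsuffix)
  where
    -- Scan from the end: the longest run of trailing characters is the suffix.
    (rsuffix, rbase) = span isTrailing (reverse s)
```

With this sketch, `splitTrailing "x₁₂"` gives the pair `("x", "₁₂")` and `splitTrailing "+₊"` gives `("+", "₊")`. The same suffixes attach to both kinds of lexeme.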

 John

On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
<mikhail.vorozhtsov at gmail.com> wrote:
> Hello lists,
>
> As some of you may know, GHC's support for Unicode characters in lexemes is
> rather crude and hence prone to inconsistencies compared with their ASCII
> counterparts. For example, APOSTROPHE is treated differently from
> PRIME:
>
> λ> data a +' b = Plus a b
> <interactive>:3:9:
>     Unexpected type ‘b’
>     In the data declaration for ‘+’
>     A data declaration should have form
>       data + a b c = ...
> λ> data a +′ b = Plus a b
>
> λ> let a' = 1
> λ> let a′ = 1
> <interactive>:10:8: parse error on input ‘=’
>
> Also, some rather bizarre-looking things are accepted:
>
> λ> let ᵤxᵤy = 1
>
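The Unicode general categories reported by `Data.Char.generalCategory` from base show part of why this happens. APOSTROPHE and PRIME fall in the same category, so a lexer that sorts non-ASCII characters purely by category cannot give PRIME the special treatment the Report gives the ASCII apostrophe. That a category-based lexer is involved is an assumption for illustration, not something stated in this thread. A small sketch:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Characters from the examples above, written as escapes to avoid ambiguity.
apostrophe, prime, subscriptU :: Char
apostrophe = '\''       -- U+0027 APOSTROPHE
prime      = '\x2032'   -- U+2032 PRIME
subscriptU = '\x1D64'   -- U+1D64 LATIN SUBSCRIPT SMALL LETTER U

-- Pair each character's code point with its Unicode general category.
categories :: [(Int, GeneralCategory)]
categories =
  [ (fromEnum c, generalCategory c) | c <- [apostrophe, prime, subscriptU] ]
```

Loaded into GHCi, `categories` evaluates to `[(39,OtherPunctuation),(8242,OtherPunctuation),(7524,ModifierLetter)]`. The subscript ᵤ is classified as a letter rather than a symbol, so a category-based lexer may let it into identifiers.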
> In the spirit of improving things little by little I would like to propose:
>
> 1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
> APOSTROPHE, meaning the following alterations to the lexer:
>
> primes -> U+2032 | U+2033 | U+2034 | U+2057
> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
> graphic -> small | large | symbol | digit | special | " | ' | primes
> varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
> conid -> large { small | large | digit | ' | primes }
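A minimal sketch of proposal 1 as character predicates. It assumes the Haskell 2010 definitions of small, large, and digit, which classify Unicode letters and digits by general category. The names `isPrimeChar`, `isVarid`, and `isConid` are hypothetical and are not taken from GHC's lexer:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- primes -> U+2032 | U+2033 | U+2034 | U+2057
isPrimeChar :: Char -> Bool
isPrimeChar c = c `elem` "\x2032\x2033\x2034\x2057"

-- small, large and digit as in the Haskell 2010 Report.
isSmall, isLarge, isDigitChar :: Char -> Bool
isSmall c = c == '_' || generalCategory c == LowercaseLetter
isLarge c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]
isDigitChar c = generalCategory c == DecimalNumber

-- Characters allowed after the first one: { small | large | digit | ' | primes }
isIdTail :: Char -> Bool
isIdTail c = isSmall c || isLarge c || isDigitChar c || c == '\'' || isPrimeChar c

-- reservedid from the Haskell 2010 Report.
reservedIds :: [String]
reservedIds =
  [ "case", "class", "data", "default", "deriving", "do", "else", "foreign"
  , "if", "import", "in", "infix", "infixl", "infixr", "instance", "let"
  , "module", "newtype", "of", "then", "type", "where", "_" ]

-- varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
isVarid :: String -> Bool
isVarid s@(c : cs) = isSmall c && all isIdTail cs && s `notElem` reservedIds
isVarid [] = False

-- conid -> large { small | large | digit | ' | primes }
isConid :: String -> Bool
isConid (c : cs) = isLarge c && all isIdTail cs
isConid [] = False
```

Under these rules `a′` and `a'` are both varids, while a leading PRIME (`′a`) is rejected, just as a leading apostrophe is.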
>
> 2. Introduce a new lexer nonterminal "subsup" that would include the Unicode
> sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")", and Latin and
> Greek letters, and allow these characters to be used in names and operators:
>
> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
> digit -> ascDigit | uniDigit (EXCEPT subsup)
> small -> ascSmall | uniSmall (EXCEPT subsup) | _
> large -> ascLarge | uniLarge (EXCEPT subsup)
> graphic -> small | large | symbol | digit | special | " | ' | primes | subsup
> varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT reservedid)
> conid -> large { small | large | digit | ' | primes | subsup }
> varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
> consym -> (: {symbol | subsup}) (EXCEPT reservedop)
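A similar sketch for proposal 2. The `subsupChars` set is a hypothetical, partial stand-in covering only the superscript and subscript digits, plus, minus, equals, and parentheses; the Latin and Greek letters are omitted. Because subsup is excluded from symbol, small, large, and digit, these characters can only continue a name or operator, never start one:

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isPunctuation, isSymbol)

-- primes, as in proposal 1.
isPrimeChar :: Char -> Bool
isPrimeChar c = c `elem` "\x2032\x2033\x2034\x2057"

-- Hypothetical partial subsup class: super/subscript digits, + - = ( ).
subsupChars :: String
subsupChars =
  ['\x2070', '\x00B9', '\x00B2', '\x00B3'] ++ ['\x2074' .. '\x207E']
    ++ ['\x2080' .. '\x208E']

isSubsup :: Char -> Bool
isSubsup c = c `elem` subsupChars

-- symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
isSymbolChar :: Char -> Bool
isSymbolChar c =
  (isPunctuation c || isSymbol c)
    && c `notElem` "(),;[]`{}_\"'"
    && not (isPrimeChar c)
    && not (isSubsup c)

-- small, large and digit, each EXCEPT subsup.
isSmall, isLarge, isDigitChar :: Char -> Bool
isSmall c = (c == '_' || generalCategory c == LowercaseLetter) && not (isSubsup c)
isLarge c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter] && not (isSubsup c)
isDigitChar c = generalCategory c == DecimalNumber && not (isSubsup c)

-- reservedid and reservedop from the Haskell 2010 Report.
reservedIds, reservedOps :: [String]
reservedIds =
  [ "case", "class", "data", "default", "deriving", "do", "else", "foreign"
  , "if", "import", "in", "infix", "infixl", "infixr", "instance", "let"
  , "module", "newtype", "of", "then", "type", "where", "_" ]
reservedOps = ["..", ":", "::", "=", "\\", "|", "<-", "->", "@", "~", "=>"]

-- varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT reservedid)
isVarid :: String -> Bool
isVarid s@(c : cs) = isSmall c && all idTail cs && s `notElem` reservedIds
  where
    idTail x = isSmall x || isLarge x || isDigitChar x
      || x == '\'' || isPrimeChar x || isSubsup x
isVarid [] = False

-- Characters allowed after the first one in varsym and consym.
symTail :: Char -> Bool
symTail x = isSymbolChar x || isSubsup x

-- dashes -> -- {-}
isDashes :: String -> Bool
isDashes s = length s >= 2 && all (== '-') s

-- varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
isVarsym :: String -> Bool
isVarsym s@(c : cs) =
  isSymbolChar c && c /= ':' && all symTail cs
    && s `notElem` reservedOps && not (isDashes s)
isVarsym [] = False

-- consym -> (: {symbol | subsup}) (EXCEPT reservedop)
isConsym :: String -> Bool
isConsym s@(':' : cs) = all symTail cs && s `notElem` reservedOps
isConsym _ = False
```

For example, `x₁` and `+²` are accepted, while `₁x` and a lone `⁺` are not. `+′` is no longer a single operator, because proposal 1 removes PRIME from symbol.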
>
> If this proposal is received favorably, I'll write a patch for GHC based on
> my previous stab at the problem[2].
>
> P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion to
> the GHC users list.
>
> [1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
> [2] https://ghc.haskell.org/trac/ghc/ticket/5108
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



-- 
John Meacham - http://notanumber.net/

