RFC: Unicode primes and super/subscript characters in GHC

John Meacham john at repetae.net
Fri Jun 27 19:53:16 UTC 2014


Yeah, I specifically excluded the ASCII prime (') from special handling in
jhc because its meaning in Haskell is already overloaded. I only added the
subscript/superscript characters to the 'trailing' character class.
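
For illustration, here is a minimal Haskell sketch of how such a 'trailing'
class could be checked for variable identifiers. It is not jhc's actual
lexer: the names trailingChars, isTrailing and isVarIdLike are made up for
this example, reserved words are ignored, and only ids (not symbols) are
covered. The character set is the $trailing definition quoted further down.

```haskell
import Data.Char (isAlphaNum, isLower)

-- The characters of the 'trailing' class as quoted in this thread.
trailingChars :: String
trailingChars = "₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋"

-- True for characters that may only appear at the end of a name.
isTrailing :: Char -> Bool
isTrailing c = c `elem` trailingChars

-- Hypothetical, simplified check for a variable identifier:
-- a lowercase letter or '_', then ordinary identifier characters
-- (including the ASCII prime), then zero or more trailing characters.
isVarIdLike :: String -> Bool
isVarIdLike []     = False
isVarIdLike (c:cs) = isStart c && all isTrailing suffix
  where
    (_body, suffix) = span isBody cs
    isStart x = isLower x || x == '_'
    -- isAlphaNum also accepts sub/superscript digits, so the trailing
    -- characters have to be excluded from the body explicitly.
    isBody x  = not (isTrailing x) && (isAlphaNum x || x == '_' || x == '\'')

main :: IO ()
main = mapM_ report ["a₁", "x²", "a'", "a₁'", "x₁y"]
  where
    report s = putStrLn (s ++ ": " ++ show (isVarIdLike s))
```

Under these assumptions `a₁`, `x²` and `a'` are accepted, while `a₁'` and
`x₁y` are rejected, because nothing from outside the trailing class may
follow a trailing character.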

    John

On Wed, Jun 25, 2014 at 12:54 PM, Mikhail Vorozhtsov
<mikhail.vorozhtsov at gmail.com> wrote:
> Isn't it weird that you can't write `a₁'`? I was considering proposing
>
> varid -> (small { small | large | digit | ' | primes } { subsup | primes }) (EXCEPT reservedid)
>
> but felt that it would be odd to allow primes in the middle of an identifier
> but not super/subscripts. I wish we could just abandon things like `a'bc'd`
> altogether...
>
>
> On 06/15/2014 03:58 AM, John Meacham wrote:
>>
>> I have this feature in jhc, where I have a 'trailing' character class
>> that can appear at the end of both symbols and ids.
>>
>> Currently it consists of
>>
>>   $trailing = [₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋]
>>
>>   John
>>
>> On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
>> <mikhail.vorozhtsov at gmail.com> wrote:
>>>
>>> Hello lists,
>>>
>>> As some of you may know, GHC's support for Unicode characters in lexemes
>>> is rather crude and hence prone to inconsistencies in their handling
>>> versus the ASCII counterparts. For example, APOSTROPHE is treated
>>> differently from PRIME:
>>>
>>> λ> data a +' b = Plus a b
>>> <interactive>:3:9:
>>>      Unexpected type ‘b’
>>>      In the data declaration for ‘+’
>>>      A data declaration should have form
>>>        data + a b c = ...
>>> λ> data a +′ b = Plus a b
>>>
>>> λ> let a' = 1
>>> λ> let a′ = 1
>>> <interactive>:10:8: parse error on input ‘=’
>>>
>>> Also some rather bizarre looking things are accepted:
>>>
>>> λ> let ᵤxᵤy = 1
>>>
>>> In the spirit of improving things little by little I would like to
>>> propose:
>>>
>>> 1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
>>> APOSTROPHE, meaning the following alterations to the lexer:
>>>
>>> primes -> U+2032 | U+2033 | U+2034 | U+2057
>>> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
>>> graphic -> small | large | symbol | digit | special | " | ' | primes
>>> varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
>>> conid -> large { small | large | digit | ' | primes }
>>>
>>> 2. Introduce a new lexer nonterminal "subsup" that would include the
>>> Unicode sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")",
>>> Latin and Greek letters, and allow these characters to be used in names
>>> and operators:
>>>
>>> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
>>> digit -> ascDigit | uniDigit (EXCEPT subsup)
>>> small -> ascSmall | uniSmall (EXCEPT subsup) | _
>>> large -> ascLarge | uniLarge (EXCEPT subsup)
>>> graphic -> small | large | symbol | digit | special | " | ' | primes | subsup
>>> varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT reservedid)
>>> conid -> large { small | large | digit | ' | primes | subsup }
>>> varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
>>> consym -> (: {symbol | subsup}) (EXCEPT reservedop)
>>>
>>> If this proposal is received favorably, I'll write a patch for GHC based
>>> on my previous stab at the problem[2].
>>>
>>> P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion
>>> to the GHC users list.
>>>
>>> [1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
>>> [2] https://ghc.haskell.org/trac/ghc/ticket/5108
>>> _______________________________________________
>>> Glasgow-haskell-users mailing list
>>> Glasgow-haskell-users at haskell.org
>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



-- 
John Meacham - http://notanumber.net/

