RFC: Unicode primes and super/subscript characters in GHC

John Meacham john at repetae.net
Tue Jun 17 14:19:16 UTC 2014


Don't forget that for every line of Haskell code on Hackage there are
dozens of lines used internally within organizations, where
compatibility beyond their own internal tools may not be a concern.
An organization deciding on a policy of allowing primes or whatnot
seems quite plausible, and that doesn't entail the CPP concerns.

    John

On Sun, Jun 15, 2014 at 5:26 PM, Mateusz Kowalczyk
<fuuzetsu at fuuzetsu.co.uk> wrote:
> On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote:
>> Hello lists,
>>
>> As some of you may know, GHC's support for Unicode characters in lexemes
>> is rather crude and hence prone to inconsistencies in their handling
>> versus the ASCII counterparts. For example, APOSTROPHE is treated
>> differently from PRIME:
>>
>> λ> data a +' b = Plus a b
>> <interactive>:3:9:
>>      Unexpected type ‘b’
>>      In the data declaration for ‘+’
>>      A data declaration should have form
>>        data + a b c = ...
>> λ> data a +′ b = Plus a b
>>
>> λ> let a' = 1
>> λ> let a′ = 1
>> <interactive>:10:8: parse error on input ‘=’
>>
>> Also, some rather bizarre-looking things are accepted:
>>
>> λ> let ᵤxᵤy = 1
>>
>> In the spirit of improving things little by little I would like to propose:
>>
>> 1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
>> APOSTROPHE, meaning the following alterations to the lexer:
>>
>> primes -> U+2032 | U+2033 | U+2034 | U+2057
>> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
>> graphic -> small | large | symbol | digit | special | " | ' | primes
>> varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
>> conid -> large { small | large | digit | ' | primes }
>>
>> 2. Introduce a new lexer nonterminal "subsup" that would include the
>> Unicode sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")",
>> and Latin and Greek letters, and allow these characters to be used in
>> names and operators:
>>
>> symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes |
>> subsup )
>> digit -> ascDigit | uniDigit (EXCEPT subsup)
>> small -> ascSmall | uniSmall (EXCEPT subsup) | _
>> large -> ascLarge | uniLarge (EXCEPT subsup)
>> graphic -> small | large | symbol | digit | special | " | ' | primes |
>> subsup
>> varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT
>> reservedid)
>> conid -> large { small | large | digit | ' | primes | subsup }
>> varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
>> consym -> (: {symbol | subsup}) (EXCEPT reservedop)
>>
>> If this proposal is received favorably, I'll write a patch for GHC based
>> on my previous stab at the problem[2].
>>
>> P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion
>> to the GHC users list.
>>
>> [1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
>> [2] https://ghc.haskell.org/trac/ghc/ticket/5108
>> _______________________________________________
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users at haskell.org
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>
>
> While personally I like the proposal (I have wanted primes and
> sub/superscripts way too many times), I worry about what this means for
> compatibility: suddenly we'll have code that fails to build on 7.8 and
> before because someone using 7.9/7.10+ used ′ somewhere. Even using CPP
> based on the version of the compiler is not too great in this scenario,
> because it doesn't bring a significant practical advantage to justify
> the CPP clutter in the code. If the choice is either extra lines due to
> CPP or using ‘'’ instead of ‘′’, I know which I'll go for.
>
> I also worry (although not based on anything particular you said)
> about whether this will change the meaning of any existing programs.
> Does it only accept programs that were previously rejected?
>
> Will it be enabled by a pragma?
>
> I simply worry about how practical it will be to use for actual programs
> and libraries that will go out on Hackage and to the wider world, even
> if it is accepted.
>
> --
> Mateusz K.

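For anyone who wants to play with item 1 of the quoted proposal, here
is a rough sketch of the proposed varid rule as a plain Haskell
predicate. It is not GHC's lexer: isPrimeChar, isSmall, isIdTail and
isVarIdLike are names made up for this example, Data.Char is used as an
approximation of the uniSmall/uniLarge/uniDigit classes, and the
reservedid exception is ignored.

```haskell
module Main where

import Data.Char (GeneralCategory (DecimalNumber), generalCategory, isLower, isUpper)

-- The four primes listed in the proposal: U+2032, U+2033, U+2034, U+2057.
isPrimeChar :: Char -> Bool
isPrimeChar c = c `elem` ['\x2032', '\x2033', '\x2034', '\x2057']

-- Approximation of "small": a lowercase letter or underscore.
isSmall :: Char -> Bool
isSmall c = isLower c || c == '_'

-- Characters allowed after the first one under the proposed varid rule:
-- small | large | digit | ' | primes (Data.Char stands in for the
-- Unicode classes; this is an assumption of the sketch).
isIdTail :: Char -> Bool
isIdTail c =
  isSmall c
    || isUpper c
    || generalCategory c == DecimalNumber
    || c == '\''
    || isPrimeChar c

-- Proposed varid shape: small { small | large | digit | ' | primes },
-- without the reservedid check.
isVarIdLike :: String -> Bool
isVarIdLike [] = False
isVarIdLike (c : cs) = isSmall c && all isIdTail cs

main :: IO ()
main = mapM_ (print . isVarIdLike) ["a'", "a′", "′a"]
```

Under this sketch both a' and a′ count as identifiers, while a name
that starts with a prime does not.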

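To make the CPP clutter mentioned in the quoted message concrete, here
is a small sketch of a module that carries both spellings.
UNICODE_PRIMES is a hypothetical macro that a project would define
itself (for example with -DUNICODE_PRIMES); it is not something GHC
provides, and no particular GHC release is assumed to accept the prime
spelling. The names step' and next are made up for the example.

```haskell
{-# LANGUAGE CPP #-}
module Main where

-- UNICODE_PRIMES is a hypothetical, user-defined macro; leave it
-- undefined to build the ASCII branch on any compiler.
#ifdef UNICODE_PRIMES
-- Only meaningful with a lexer that treats U+2032 like an apostrophe.
step′ :: Int -> Int
step′ n = n + 1

next :: Int -> Int
next = step′
#else
-- Portable ASCII spelling.
step' :: Int -> Int
step' n = n + 1

next :: Int -> Int
next = step'
#endif

main :: IO ()
main = print (next 41)
```

With the macro undefined this is ordinary Haskell plus the CPP
extension, and it shows the duplication: every definition that uses a
prime has to be written twice.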

-- 
John Meacham - http://notanumber.net/

