[Haskell-cafe] RFC: Unicode primes and super/subscript characters in GHC

Andrew Gibiansky andrew.gibiansky at gmail.com
Sun Jun 15 06:16:05 UTC 2014


I personally like this idea. Mathematica allows all sorts of bizarre names
and it'd be cool for Haskell to be similar, so that mathematical Haskell
scripts and IHaskell notebooks can be just as fancy and incomprehensible as
dense Mathematica code!

Since GHC already accepts *some* Unicode, I think it'd be a great idea to
extend it in this way.
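
To make the first part of the quoted proposal concrete, here is a minimal
sketch of an identifier check that treats the four Unicode primes like the
ASCII apostrophe. It is not GHC's lexer: the names isPrimeChar, isIdentTail
and isVarIdLike are made up for illustration, Data.Char's isLower and
isAlphaNum only approximate the small/large/digit classes, and reserved
words are not excluded.

```haskell
import Data.Char (isAlphaNum, isLower)

-- The four primes listed in the proposal: U+2032 PRIME, U+2033 DOUBLE PRIME,
-- U+2034 TRIPLE PRIME, U+2057 QUADRUPLE PRIME.
isPrimeChar :: Char -> Bool
isPrimeChar c = c `elem` ['\x2032', '\x2033', '\x2034', '\x2057']

-- Characters allowed after the first character of an identifier.
-- Assumption: isAlphaNum stands in for small | large | digit.
isIdentTail :: Char -> Bool
isIdentTail c = isAlphaNum c || c == '_' || c == '\'' || isPrimeChar c

-- Rough stand-in for the proposed varid rule (hypothetical helper;
-- it does not apply the EXCEPT reservedid restriction).
isVarIdLike :: String -> Bool
isVarIdLike []       = False
isVarIdLike (c : cs) = (isLower c || c == '_') && all isIdentTail cs

main :: IO ()
main = mapM_ (print . isVarIdLike) ["a'", "a\x2032", "\x2032\&a"]
-- prints True, True, False (one per line)
```

Under these assumptions a' and a′ are accepted alike, while a name that
starts with a prime is rejected, which is the behaviour the proposal asks
for in varid.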


On Sat, Jun 14, 2014 at 4:58 PM, John Meacham <john at repetae.net> wrote:

> I have this feature in jhc, where I have a 'trailing' character class
> that can appear at the end of both symbols and ids.
>
> currently it consists of
>
>  $trailing = [₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋]
>
>  John
>
> On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
> <mikhail.vorozhtsov at gmail.com> wrote:
> > Hello lists,
> >
> > As some of you may know, GHC's support for Unicode characters in lexemes
> > is rather crude and hence prone to inconsistencies in their handling
> > versus the ASCII counterparts. For example, APOSTROPHE is treated
> > differently from PRIME:
> >
> > λ> data a +' b = Plus a b
> > <interactive>:3:9:
> >     Unexpected type ‘b’
> >     In the data declaration for ‘+’
> >     A data declaration should have form
> >       data + a b c = ...
> > λ> data a +′ b = Plus a b
> >
> > λ> let a' = 1
> > λ> let a′ = 1
> > <interactive>:10:8: parse error on input ‘=’
> >
> > Also some rather bizarre looking things are accepted:
> >
> > λ> let ᵤxᵤy = 1
> >
> > In the spirit of improving things little by little I would like to
> > propose:
> >
> > 1. Handle single/double/triple/quadruple Unicode PRIMEs the same way as
> > APOSTROPHE, meaning the following alterations to the lexer:
> >
> > primes -> U+2032 | U+2033 | U+2034 | U+2057
> > symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes)
> > graphic -> small | large | symbol | digit | special | " | ' | primes
> > varid -> (small { small | large | digit | ' | primes }) (EXCEPT reservedid)
> > conid -> large { small | large | digit | ' | primes }
> >
> > 2. Introduce a new lexer nonterminal "subsup" that would include the
> > Unicode sub/superscript[1] versions of numbers, "-", "+", "=", "(", ")",
> > Latin and Greek letters. And allow these characters to be used in names
> > and operators:
> >
> > symbol -> ascSymbol | uniSymbol (EXCEPT special | _ | " | ' | primes | subsup)
> > digit -> ascDigit | uniDigit (EXCEPT subsup)
> > small -> ascSmall | uniSmall (EXCEPT subsup) | _
> > large -> ascLarge | uniLarge (EXCEPT subsup)
> > graphic -> small | large | symbol | digit | special | " | ' | primes | subsup
> > varid -> (small { small | large | digit | ' | primes | subsup }) (EXCEPT reservedid)
> > conid -> large { small | large | digit | ' | primes | subsup }
> > varsym -> (symbol (EXCEPT :) {symbol | subsup}) (EXCEPT reservedop | dashes)
> > consym -> (: {symbol | subsup}) (EXCEPT reservedop)
> >
> > If this proposal is received favorably, I'll write a patch for GHC based
> > on my previous stab at the problem[2].
> >
> > P.S. I'm CC-ing Cafe for extra attention, but please keep the discussion
> > to the GHC users list.
> >
> > [1] https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
> > [2] https://ghc.haskell.org/trac/ghc/ticket/5108
>
>
>
> --
> John Meacham - http://notanumber.net/