Proposal: Treat OtherLetter as lower case in the grammar

Thu Aug 14 15:53:39 UTC 2014

I'd like to fix more of our Unicode mess while we're at it.

For example, Mn (non-spacing combining marks) should be allowed in
varid_{cont}, so that we no longer get this:

  h> let é=() in é
  ()

  h> let x́=() in x́
  <interactive>:6:6: lexical error at character '\769'

(That's because x́ is decomposed: it is actually two code points, 'x'
followed by the combining acute accent U+0301.)
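
To make the difference concrete, here is a small sketch using only
Data.Char from base. The names precomposed, decomposed and categories
are mine, purely for illustration:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Precomposed é: the single code point U+00E9 (decimal 233).
precomposed :: String
precomposed = "\233"

-- Decomposed x́: 'x' followed by U+0301 (decimal 769), the character
-- the lexer complains about above.
decomposed :: String
decomposed = "x\769"

-- General category of every code point in a string.
categories :: String -> [GeneralCategory]
categories = map generalCategory
```

In GHCi, categories precomposed gives [LowercaseLetter], while
categories decomposed gives [LowercaseLetter,NonSpacingMark]; the
second element is the Mn character that varid currently rejects.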

[:Mc:] (spacing combining marks) probably should be too. We could also
include the Unicode primes ′ ″ ‴ ⁗ (there's already a proposal for
this, IIRC).
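
Roughly, the continuation rule I have in mind could be sketched as a
predicate like the one below. This is only an illustration, not
proposed report wording: isVarIdCont is a made-up name, and putting
OtherLetter in the same bucket assumes the change discussed in the
quoted message is adopted as well.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical predicate: may this character continue a varid?
-- A sketch of the suggestion above, not the Haskell 2010 rule.
isVarIdCont :: Char -> Bool
isVarIdCont c =
  c == '_' || c == '\'' || c `elem` unicodePrimes || allowedCategory
  where
    -- U+2032, U+2033, U+2034 and U+2057: the primes listed above.
    unicodePrimes = "\8242\8243\8244\8279"
    allowedCategory = case generalCategory c of
      LowercaseLetter      -> True
      UppercaseLetter      -> True
      TitlecaseLetter      -> True
      OtherLetter          -> True -- assumption: treated like a letter
      DecimalNumber        -> True
      NonSpacingMark       -> True -- Mn, the new part
      SpacingCombiningMark -> True -- Mc, "probably too"
      _                    -> False
```

With this, all isVarIdCont "x\769" holds, so the decomposed x́ from
the example above would lex as a single identifier.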

We have some prior work to look at: at least the Java Language
Specification and UAX #31. One problem is that doing it perfectly will
require normalization (but there's always the Java way, which is to
just ignore it).
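
To show why normalization matters, here is a minimal sketch (the names
are made up for illustration). The two strings render the same way but
differ as code point sequences, so a lexer that ignores normalization
would see two different identifiers:

```haskell
-- Two spellings of é that render identically.
composedName, decomposedName :: String
composedName   = "\233"   -- U+00E9, precomposed
decomposedName = "e\769"  -- 'e' followed by U+0301, decomposed

-- Plain String equality compares code points, with no normalization,
-- so this evaluates to False.
sameCodePoints :: Bool
sameCodePoints = composedName == decomposedName
```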

(I'm willing to formulate everything if there's some agreement to fix
this.)

Edward Kmett <ekmett at gmail.com> writes:

> Back in 2008 or so, GHC changed the behavior of unicode characters in
> the parser that parse as OtherLetter to make them parse as lower case
> so that languages like Japanese that lack case could be used in
> identifier names:
>
> https://ghc.haskell.org/trac/ghc/ticket/1103
>
> In a recent thread on reddit Lennart Augustsson pointed out that this
> change was never backported to Haskell'.
>
> http://www.reddit.com/r/haskell/comments/2dce3d/%E0%B2%A0_%E0%B2%A0_string_a/cjo68ij
>
> Would it make sense to adopt this change in the language standard?
>
> Marlow when he made the change to GHC noted he was considering
> bringing it up to Haskell' but here we are 6 years later.