RFC re #10196 & GHC 7.10.2: Allow Unicode Lm category from the 2nd character on in identifiers

Herbert Valerio Riedel hvriedel at gmail.com
Wed May 6 08:22:32 UTC 2015


Hello *,

As you may be aware, GHC 7.10.1 updated its Unicode catalog to version
7.0. This changed the character properties of some subscript symbols,
so GHC 7.10.1 now rejects some Unicode subscript characters that
GHC 7.8.4 accepted.

See https://ghc.haskell.org/trac/ghc/ticket/10196 for specific examples.

To address this regression, one suggestion is to allow characters in
the 'Letter, Modifier' (Lm) category[1] from the 2nd position on in an
identifier. This, however, may do more than just fix the regression:
judging from [1], it would allow many identifiers that were not
previously possible.
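
To make the suggestion concrete, here is a minimal sketch of the rule
as a predicate on strings. It is a simplified model and not GHC's
actual lexer: the names isIdentStart, isIdentRest and isVarIdent are
hypothetical, the set of categories accepted without the change is an
assumption, and the Bool argument merely switches the proposed Lm
extension on or off.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Simplified stand-in for the rule governing the first character of a
-- variable identifier (an assumption, not GHC's real definition).
isIdentStart :: Char -> Bool
isIdentStart c = c == '_' || generalCategory c == LowercaseLetter

-- Characters accepted from the 2nd position on. The Bool switches the
-- proposed 'Letter, Modifier' (Lm) extension on or off.
isIdentRest :: Bool -> Char -> Bool
isIdentRest allowLm c =
  c == '_' || c == '\'' || case generalCategory c of
    UppercaseLetter -> True
    LowercaseLetter -> True
    TitlecaseLetter -> True
    DecimalNumber   -> True
    ModifierLetter  -> allowLm
    _               -> False

-- An identifier is one valid start character followed by any number of
-- valid continuation characters.
isVarIdent :: Bool -> String -> Bool
isVarIdent _ [] = False
isVarIdent allowLm (c : cs) = isIdentStart c && all (isIdentRest allowLm) cs
```

Taking U+02B0 (MODIFIER LETTER SMALL H, category Lm) as an example,
isVarIdent False ['x', '\x02B0'] is False while
isVarIdent True ['x', '\x02B0'] is True. A name that starts with the
same character is rejected either way, since only the 2nd+ positions
are affected.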

So, is allowing 'Lm'-chars as the 2nd+ characters in an identifier a
sensible change for addressing #10196 or not? If it's a bad idea, are
there other suggestions worth considering?

 [1]: http://www.fileformat.info/info/unicode/category/Lm/list.htm

Cheers,
  hvr

