[Haskell-cafe] Unicode symbols in operators

Mikhail Glushenkov the.dead.shall.rise at gmail.com
Thu Oct 16 06:22:16 UTC 2014


Hi,

On 15 October 2014 21:23, Niklas Hambüchen <mail at nh2.me> wrote:
> (I'm trying to improve Sublime Text's Haskell lexer.)
>
> https://www.haskell.org/onlinereport/haskell2010/haskellch10.html says
>   uniSymbol     →       any Unicode symbol or punctuation
>
> What is meant here, is "Unicode symbol" literally \p{Symbol} in regex,
> or more?
>
> So uniSymbol = \p{Symbol} | \p{Punctuation}

Looking at the source of GHC's lexer [1], the relevant part seems to be:

                case generalCategory c of
                  [...]
                  ConnectorPunctuation  -> symbol
                  DashPunctuation       -> symbol
                  [...]
                  OtherPunctuation      -> symbol
                  MathSymbol            -> symbol
                  CurrencySymbol        -> symbol
                  ModifierSymbol        -> symbol
                  OtherSymbol           -> symbol
                  [...]

[1] https://github.com/ghc/ghc/blob/master/compiler/parser/Lexer.x
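
To compare this with the regex reading from the question, here is a rough
sketch of my own (not code from GHC) that uses only Data.Char from base. The
names inExcerpt and regexReading are made up for illustration. inExcerpt
covers just the seven categories visible in the excerpt above; the branches
elided with [...] are not reproduced, so a False result says nothing about
what GHC does for those categories.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isPunctuation, isSymbol)

-- Hypothetical helper: True only for the general categories that are
-- visibly mapped to `symbol` in the excerpt from Lexer.x.
inExcerpt :: Char -> Bool
inExcerpt c = case generalCategory c of
  ConnectorPunctuation -> True
  DashPunctuation      -> True
  OtherPunctuation     -> True
  MathSymbol           -> True
  CurrencySymbol       -> True
  ModifierSymbol       -> True
  OtherSymbol          -> True
  -- Assumption: everything else is reported as False here, including the
  -- categories hidden behind [...] in the excerpt.
  _                    -> False

-- The reading proposed in the question: \p{Symbol} | \p{Punctuation}.
-- isSymbol and isPunctuation test the Unicode S* and P* categories.
regexReading :: Char -> Bool
regexReading c = isSymbol c || isPunctuation c

main :: IO ()
main = mapM_ report "→∀€_-(«a"
  where
    -- `show` escapes non-ASCII characters, so the output does not depend
    -- on the terminal encoding.
    report c = print (c, generalCategory c, inExcerpt c, regexReading c)
```

The two predicates disagree on '(' and '«' only because their categories
(OpenPunctuation and InitialQuote) are not among the ones listed in the
excerpt; Lexer.x itself is the place to check how GHC treats them.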

