Alex unicode trick

Mateusz Kowalczyk fuuzetsu at fuuzetsu.co.uk
Tue Jan 7 07:25:28 UTC 2014


Greetings,

While reading the GHC lexer (Lexer.x), I came across the following:

> $unispace    = \x05 -- Trick Alex into handling Unicode. See alexGetChar.
> $whitechar   = [\ \n\r\f\v $unispace]
> $white_no_nl = $whitechar # \n
> $tab         = \t

Scrolling down to alexGetChar and alexGetChar', we see the comments:


> -- backwards compatibility for Alex 2.x
> alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
>
> -- This version does not squash unicode characters, it is used when
> -- lexing strings.
> alexGetChar' :: AlexInput -> Maybe (Char,AlexInput)

What's the reason for these? I was under the impression that Alex has
natively supported Unicode since 3.0. Is this just dead code? Could
all the hex-valued $uni* character set macros be removed? If not, why not?
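
To make the question concrete, here is a minimal sketch of what I
understand "squashing" to mean: every non-ASCII character is replaced
by a low placeholder code standing for its Unicode class, so that a
lexer working on small character codes can still match it. This is not
the code from Lexer.x. Only the \x05 value for whitespace comes from
the $unispace line quoted above; the other placeholder values, the
String-based input type, and all the function names are assumptions
made up for illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical placeholder codes, one per character class.
-- Only '\x05' (whitespace) mirrors the quoted $unispace definition;
-- the remaining values are invented for this sketch.
uniOther, uniUpper, uniLower, uniSpace :: Char
uniOther = '\x00'
uniUpper = '\x01'
uniLower = '\x02'
uniSpace = '\x05'

-- | Replace a character by the one the generated lexer would see.
squash :: Char -> Char
squash c
  | c < '\x08'  = uniOther  -- raw control chars must not collide with placeholders
  | c <= '\x7f' = c         -- remaining ASCII passes through unchanged
  | otherwise   = case generalCategory c of
      UppercaseLetter -> uniUpper
      LowercaseLetter -> uniLower
      Space           -> uniSpace
      _               -> uniOther

-- Simplified stand-in for AlexInput: just the remaining characters.
type Input = String

-- | Squashing variant, in the spirit of alexGetChar.
getCharSquashed :: Input -> Maybe (Char, Input)
getCharSquashed []         = Nothing
getCharSquashed (c : rest) = Just (squash c, rest)

-- | Non-squashing variant, in the spirit of alexGetChar': the real
-- character is kept, as one would want inside string literals.
getCharRaw :: Input -> Maybe (Char, Input)
getCharRaw []         = Nothing
getCharRaw (c : rest) = Just (c, rest)
```

With these definitions, a no-break space ('\xa0') is squashed to
'\x05' and so would fall into a $whitechar-style set, while
getCharRaw hands back the original character untouched.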

--
Mateusz K.


More information about the ghc-devs mailing list