Alex unicode trick
Mateusz Kowalczyk
fuuzetsu at fuuzetsu.co.uk
Tue Jan 7 07:25:28 UTC 2014
Greetings,
While looking at the GHC lexer (Lexer.x), I came across the following:
> $unispace = \x05 -- Trick Alex into handling Unicode. See alexGetChar.
> $whitechar = [\ \n\r\f\v $unispace]
> $white_no_nl = $whitechar # \n
> $tab = \t
Scrolling down to alexGetChar and alexGetChar', we see the comments:
> -- backwards compatibility for Alex 2.x
> alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
>
> -- This version does not squash unicode characters, it is used when
> -- lexing strings.
> alexGetChar' :: AlexInput -> Maybe (Char,AlexInput)
What's the reason for these? I was under the impression that Alex has
supported Unicode natively since 3.0. Is this just dead code? Could
all the hex-valued $uni* macros be removed? If not, why not?
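
To make the question concrete, here is a minimal sketch of what I understand "squashing" to mean. It is not the code from Lexer.x: the names uniSpace and squash are hypothetical, only the \x05 value comes from the $unispace macro quoted above, and remapping the low control characters to \x00 is my own assumption about how collisions with the stand-in values would be avoided.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical name. The value \x05 is the one $unispace is defined as
-- in the quoted Lexer.x excerpt.
uniSpace :: Char
uniSpace = '\x05'

-- Hypothetical helper, not GHC's implementation: map a character to the
-- one the Alex-generated automaton would actually be shown.
squash :: Char -> Char
squash c
  -- Assumption: genuine low control characters are moved out of the way
  -- so they cannot be mistaken for a stand-in such as uniSpace.
  | c <= '\x06' = '\x00'
  -- Plain ASCII is passed through unchanged.
  | c <= '\x7f' = c
  | otherwise = case generalCategory c of
      -- Any non-ASCII whitespace collapses to the single stand-in.
      Space -> uniSpace
      -- Other categories would presumably get stand-ins of their own
      -- (the remaining $uni* macros); left untouched in this sketch.
      _ -> c
```

With this, squash applied to U+2003 (EM SPACE) gives \x05, so a rule written against $whitechar would match it even though the automaton never sees the real code point. A non-squashing variant such as alexGetChar' would simply return c, which is what a string literal needs.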
--
Mateusz K.