[C2hs] CHS lexer goes into infinite loop on chars > 255
Manuel M T Chakravarty
chak at cse.unsw.edu.au
Sun Dec 13 23:03:57 EST 2009
> Found another bug that surfaces when we compile c2hs with ghc-6.12.
> By default text files are now read in the locale encoding rather than
> just ASCII. This means we can (and do) get characters over 255. The
> behaviour is that c2hs goes into an infinite loop and consumes all the
> memory on your machine (in particular this happens with some files in
> Unfortunately the 255 assumption is pretty strongly wired into the c2hs
> lexer. From Lexer.hs:
> -- * Unicode poses a problem as the character domain becomes too big
> -- for using arrays to represent transition tables and even sparse
> -- structures will pose a significant overhead when character ranges
> -- are naively represented. So, it might be time for finite maps again.
> The short-term solution is to set the text mode to ASCII. In the
> longer term we might want to replace the .chs lexer and parser, as we
> already did for the C parser.
Yes, that makes sense. At the time, Unicode support in GHC was still far away.
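The finite-map representation hinted at in the lexer comment could be sketched as follows; the `Trans`, `step`, and `run` names are illustrative, not c2hs's actual lexer internals:

```haskell
import Control.Monad (foldM)
import Data.Map (Map)
import qualified Data.Map as Map

type State = Int

-- Transition table: the outer map is keyed by DFA state, the inner
-- map by input character.  A Map stays sparse over the full Unicode
-- character domain, unlike an array indexed by character code that
-- assumes codes 0..255.
type Trans = Map State (Map Char State)

-- Single transition; Nothing means the character has no edge from
-- this state, so the lexer can fail cleanly instead of looping.
step :: Trans -> State -> Char -> Maybe State
step trans s c = Map.lookup s trans >>= Map.lookup c

-- Drive the automaton over a whole input string from a start state.
run :: Trans -> State -> String -> Maybe State
run trans = foldM (step trans)
```

For example, with `trans = Map.fromList [(0, Map.fromList [('a',1)]), (1, Map.fromList [('b',2)])]`, `run trans 0 "ab"` yields `Just 2`, while any character without an edge, including one above code point 255, yields `Nothing` instead of misbehaving.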