[C2hs] CHS lexer goes into infinite loop on chars > 255

Thu Dec 10 11:53:24 EST 2009

Found another bug that surfaces when we compile c2hs with ghc-6.12.

By default, text files are now read in the locale encoding rather than
one byte per character (effectively Latin-1). This means we can (and
do) get characters over 255. When the lexer meets one, c2hs goes into an
infinite loop and consumes all the memory on your machine (in particular,
this happens with some files in gtk2hs).

Unfortunately, the 255 assumption is wired pretty deeply into the c2hs
lexer. From Lexer.hs:

-- * Unicode posses a problem as the character domain becomes too big 
-- for using arrays to represent transition tables and even sparse 
-- structures will posse a significant overhead when character ranges
-- are naively represented. So, it might be time for finite maps again.
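
To make the finite-map idea in that comment concrete, here is a minimal
sketch. It is not c2hs code: `State`, `Transitions`, `step` and
`identStart` are hypothetical names. Each state's transitions are stored
as disjoint character ranges keyed by their lower bound, so the table
grows with the number of ranges rather than with the size of the
character domain, and a lookup is one `Data.Map.lookupLE` plus a bound
check:

```haskell
import qualified Data.Map as Map

-- Hypothetical state identifier for a DFA-driven lexer.
type State = Int

-- Transitions out of one state. Each range [lo, hi] is stored once,
-- keyed by lo, instead of one array slot per character. The ranges
-- are assumed to be disjoint.
type Transitions = Map.Map Char (Char, State)

fromRanges :: [((Char, Char), State)] -> Transitions
fromRanges rs = Map.fromList [ (lo, (hi, s)) | ((lo, hi), s) <- rs ]

-- Find the range with the greatest lower bound <= c, then check that
-- c does not lie past that range's upper bound.
step :: Transitions -> Char -> Maybe State
step t c = case Map.lookupLE c t of
  Just (_, (hi, s)) | c <= hi -> Just s
  _                           -> Nothing

-- Hypothetical identifier-start class that also accepts the Greek and
-- Coptic block (U+0370..U+03FF), well above '\255'.
identStart :: Transitions
identStart = fromRanges
  [ (('A', 'Z'), 1)
  , (('_', '_'), 1)
  , (('a', 'z'), 1)
  , (('\x0370', '\x03FF'), 1)
  ]

main :: IO ()
main = do
  print (step identStart 'q')       -- Just 1
  print (step identStart '\x03BB')  -- Just 1
  print (step identStart '0')       -- Nothing
  print (step identStart '\x4E00')  -- Nothing
```

A character outside every range simply yields Nothing, so a lexer built
this way has an explicit "no transition" case for any Char instead of
indexing past the end of a 256-entry table.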

The short-term solution is to set the encoding of the .chs input handle
to ASCII. In the longer term we might want to replace the .chs lexer and
parser, as we already did for the C parser.
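
A minimal sketch of that short-term fix, assuming GHC 6.12 or later
(where `hSetEncoding`, `char8` and `utf8` are exported from System.IO).
`readChsFile`, `fitsInByteRange` and the file name `lambda.chs` are
hypothetical, and this is not the actual c2hs patch. It uses `char8`,
which maps each byte to one Char the way older GHCs did, instead of an
ASCII decoder, which would typically report an error on bytes above 127;
either way the lexer never sees a Char above '\255':

```haskell
import System.IO

-- Hypothetical helper: read a .chs file one Char per byte, so every
-- character is in the range '\0'..'\255'.
readChsFile :: FilePath -> IO String
readChsFile path = withFile path ReadMode $ \h -> do
  hSetEncoding h char8
  s <- hGetContents h
  length s `seq` return s   -- force the lazy read before withFile closes h

-- The invariant the table-driven lexer relies on.
fitsInByteRange :: String -> Bool
fitsInByteRange = all (<= '\255')

main :: IO ()
main = do
  -- Write the two UTF-8 bytes of U+03BB (Greek small lambda) as raw bytes.
  withFile "lambda.chs" WriteMode $ \h -> do
    hSetEncoding h char8
    hPutStr h "\xCE\xBB"
  -- Decoded as UTF-8 (as a UTF-8 locale would), it is one Char above '\255'.
  asUtf8 <- withFile "lambda.chs" ReadMode $ \h -> do
    hSetEncoding h utf8
    s <- hGetContents h
    length s `seq` return s
  print (map fromEnum asUtf8, fitsInByteRange asUtf8)
  -- Read byte-per-Char, it is two Chars, both below 256.
  asBytes <- readChsFile "lambda.chs"
  print (map fromEnum asBytes, fitsInByteRange asBytes)
```

Running it should print ([955],False) followed by ([206,187],True). The
byte-per-Char reading only hides the problem: non-ASCII text in a .chs
file is lexed as its raw bytes, which is why replacing the lexer remains
the longer-term fix.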

Duncan