[Haskell-cafe] RFC: Unicode support in Alex

Jean-Philippe Bernardy jeanphilippe.bernardy at gmail.com
Wed Jul 29 08:47:04 EDT 2009


I have modified the Alex lexer generator to support Unicode.

The general idea is that the state machine works on the UTF-8
representation of the text. I am submitting my work here for review
in order to take as much load as possible off the maintainer
(Simon Marlow).
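
To illustrate the idea, here is a minimal sketch; it is not code from
the patch. It assumes a hand-written encoder (utf8Encode) and a toy
byte-level automaton (step, allGreekLower), all hypothetical names.
The point is that a character range such as U+03B1..U+03C9 turns into
a few byte-range transitions, so the automaton itself never needs to
handle anything larger than a byte.

```haskell
import Control.Monad (foldM)
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode one character as its UTF-8 byte sequence (1 to 4 bytes).
-- Surrogate code points are not rejected; this is only a sketch.
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading-byte payload: the bits above position s.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits starting at s.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Toy byte-level transition function (hypothetical, not Alex's
-- generated tables). It recognises the range U+03B1..U+03C9, which
-- in UTF-8 is CE B1..CE BF followed by CF 80..CF 89.
-- State 0 is the start state; states 1 and 2 are "seen CE" and
-- "seen CF". Nothing means the automaton is stuck.
step :: Int -> Word8 -> Maybe Int
step 0 0xCE = Just 1
step 0 0xCF = Just 2
step 1 b | b >= 0xB1 && b <= 0xBF = Just 0
step 2 b | b >= 0x80 && b <= 0x89 = Just 0
step _ _ = Nothing

-- | True if every character of the input lies in U+03B1..U+03C9.
-- The empty string is accepted.
allGreekLower :: String -> Bool
allGreekLower s = foldM step 0 (concatMap utf8Encode s) == Just 0
```

A real generator would derive such byte-range transitions from the
character sets in the lexer specification rather than write them by
hand as above.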

The prototype is available on GitHub:


Be sure to:
 * check out the "utf8" branch (so that "git diff master" shows the changes);
 * do a 2-stage bootstrap before testing.

Known limitations:
 * the generated code depends on some UTF-8 packages;
 * there is no attempt to fix the bytestring-based wrappers;
 * left-context recognition is no longer table-based;
 * debug code is still present.

Bug reports, comments, and especially patches are welcome :)

-- JP

