[Haskell-cafe] Regular Expression Simplification

Fri Mar 21 05:00:08 UTC 2014

On Thu, 20 Mar 2014 12:17:58 +0000, "Berg, Matias Juho" <m.berg.10 at aberdeen.ac.uk> wrote:
> ?Hi all,
> 
> I am a final year undergraduate student at a university and I am doing my final honours project on natural language generation from regular expressions. For this to work efficiently I need to simplify the regular expressions before I translate them. It seems that there is some previous work done on this in Haskell but I have only been able to find this code (http://hackage.haskell.org/package/HaLeX-1.1/docs/src/Language-HaLex-RegExp.html?) which does some elementary simplification.
> 
> Does anyone have any suggestions on where to look for more examples so I can see what kinds of attempts people have used to try and solve this problem? Also if someone has worked on this kind of problem was Kleene algebra a good starting point?
> 
> Best regards,
> 
> Matias

My first thought was: Maybe it would be possible to convert the regular
expression to something like an NFA, and then use one of the existing
known algorithms to generate a minimal NFA from that, and then turn that
back into a regular expression somehow?

Although I'm slightly worried that the overhead of converting to an NFA
is far too great for complicated regular expressions (finite doesn't
mean reasonable), and that the conversion from NFA to regular expression
will generate very suboptimal regular expressions despite the NFA being
minimal.

Maybe a similar analog could be used for another type of regular
machine? Or a regular grammar, perhaps? Those are also fairly easy to
bring into a normalized form and then convert back to a regex.