[Haskell-cafe] Regular Expressions without GADTs

Tue Oct 18 12:51:13 EDT 2005

Hi folks

>There is nothing ingenious with this solution --- I basically translated
>Oleg's and Conor's code.
>

Neither Oleg nor Bruno translated my code; they threw away my 
structurally recursive on-the-fly automaton and wrote combinator parsers 
instead. That's why there's no existential, etc. The suggestion that 
removing the GADT simplifies the code would be better substantiated if 
like was compared with like. By the way, the combinator parsers, as 
presented, don't quite work: try

  asMayBe $ parse (Star One) "I really must get to the bottom of this..."

with Oleg's code, or the analogous (star one) with Bruno's.

But it's almost as easy a mistake to fix (I hope) as it is to make, so 
the point that the GADT isn't strictly necessary does stand. I wouldn't 
swear my code's correct either, but I don't think it loops on finite input.

Oleg simulates GADTs with Ye Olde Type Class Trick, getting around the 
fact that type classes aren't closed by adding each necessary piece of 
functionality to the methods. By contrast, Bruno goes straight for the 
Church encoding, as conveniently packaged by type classes: even if you 
can't write down the (morally) *initial* RegExp algebra, you can write 
down what RegExp algebras are in general. These two approaches aren't 
mutually exclusive: you can use Bruno's algebras to capture the 
canonical functionality which should be supported for all types in 
Oleg's class. Then you're pretty much writing down that a GADT is an 
inductive relation, which perhaps you had guessed already...

Where we came in, if you recall, was the use of a datatype to define 
regular expressions. I used a GADT to equip this type with some useful 
semantic information (namely a type of parsed objects), and I wrote a 
program (divide) which exploited this representation to compute the 
regexp the tail of the input must match from the regexp the whole input 
must match. This may be a bit of a weird way to write a parser, but it's 
a useful sort of thing to be able to do. There are plenty of other 
examples of 'auxiliary' types computed from types to which they relate 
in some interesting way, the Zipper being the a favourite. It's useful 
to have a syntax (Generic Haskell, GADTs etc) which makes that look as 
straightforward and concrete as possible.

I'm sure the program I actually wrote can be expressed with the type 
class trick, just by cutting up my functions and pasting the bits into 
individual instances; the use of the existential is still available. I 
don't immediately see how to code this up in Bruno's style, but that 
doesn't mean it can't be done. Still, it might be worth comparing like 
with like. I suspect that once you start producing values with the 
relevant properties (as I do) rather than just consuming them (as Oleg 
and Bruno do), the GADT method might work out a little neater.

And that's the point: not only does the GADT have all the disadvantages 
of a closed datatype, it has all the advantages too! It's easy to write 
new functions which both consume and produce data, without having to go 
back and change the definition. I didn't have to define Division or 
divide in order to say what a regular expression was. If I remember 
rightly, regular languages are closed under a whole bunch of useful 
stuff (intersection, complementation, ...) so someone who can remember 
better than me might implement the corresponding algorithms together 
with their related operations; they don't need to change the definition 
of RegExp to do that. Might we actually decide semantic subtyping 
(yielding a coercion) that way?

So what's the point? GADTs are a very convenient way to provide data 
which describe types. Their encoding via type classes, or
whatever, may be possible, but this encoding is not necessarily as 
convenient as the GADT. Whether or not the various encodings of GADTs 
deliver more convenience in this example is not clear, because the 
programs do not correspond. Moreover, once you have to start messing 
about with equality constraints, the coding to eliminate GADTs becomes a 
lot hairier. I implemented it back in '95 (when all these things had 
different names) and I was very glad not to have to do it by hand any more.

But your mileage may vary

Conor