[Haskell-cafe] Regular Expressions without GADTs
Conor McBride
ctm at cs.nott.ac.uk
Tue Oct 18 12:51:13 EDT 2005
Hi folks
>There is nothing ingenious with this solution --- I basically translated
>Oleg's and Conor's code.
>
Neither Oleg nor Bruno translated my code; they threw away my
structurally recursive on-the-fly automaton and wrote combinator parsers
instead. That's why there's no existential, etc. The suggestion that
removing the GADT simplifies the code would be better substantiated if
like was compared with like. By the way, the combinator parsers, as
presented, don't quite work: try
asMayBe $ parse (Star One) "I really must get to the bottom of this..."
with Oleg's code, or the analogous (star one) with Bruno's.
But it's almost as easy a mistake to fix (I hope) as it is to make, so
the point that the GADT isn't strictly necessary does stand. I wouldn't
swear my code's correct either, but I don't think it loops on finite input.
Oleg simulates GADTs with Ye Olde Type Class Trick, getting around the
fact that type classes aren't closed by adding each necessary piece of
functionality to the methods. By contrast, Bruno goes straight for the
Church encoding, as conveniently packaged by type classes: even if you
can't write down the (morally) *initial* RegExp algebra, you can write
down what RegExp algebras are in general. These two approaches aren't
mutually exclusive: you can use Bruno's algebras to capture the
canonical functionality which should be supported for all types in
Oleg's class. Then you're pretty much writing down that a GADT is an
inductive relation, which perhaps you had guessed already...
Where we came in, if you recall, was the use of a datatype to define
regular expressions. I used a GADT to equip this type with some useful
semantic information (namely a type of parsed objects), and I wrote a
program (divide) which exploited this representation to compute the
regexp the tail of the input must match from the regexp the whole input
must match. This may be a bit of a weird way to write a parser, but it's
a useful sort of thing to be able to do. There are plenty of other
examples of 'auxiliary' types computed from types to which they relate
in some interesting way, the Zipper being the a favourite. It's useful
to have a syntax (Generic Haskell, GADTs etc) which makes that look as
straightforward and concrete as possible.
I'm sure the program I actually wrote can be expressed with the type
class trick, just by cutting up my functions and pasting the bits into
individual instances; the use of the existential is still available. I
don't immediately see how to code this up in Bruno's style, but that
doesn't mean it can't be done. Still, it might be worth comparing like
with like. I suspect that once you start producing values with the
relevant properties (as I do) rather than just consuming them (as Oleg
and Bruno do), the GADT method might work out a little neater.
And that's the point: not only does the GADT have all the disadvantages
of a closed datatype, it has all the advantages too! It's easy to write
new functions which both consume and produce data, without having to go
back and change the definition. I didn't have to define Division or
divide in order to say what a regular expression was. If I remember
rightly, regular languages are closed under a whole bunch of useful
stuff (intersection, complementation, ...) so someone who can remember
better than me might implement the corresponding algorithms together
with their related operations; they don't need to change the definition
of RegExp to do that. Might we actually decide semantic subtyping
(yielding a coercion) that way?
So what's the point? GADTs are a very convenient way to provide data
which describe types. Their encoding via type classes, or
whatever, may be possible, but this encoding is not necessarily as
convenient as the GADT. Whether or not the various encodings of GADTs
deliver more convenience in this example is not clear, because the
programs do not correspond. Moreover, once you have to start messing
about with equality constraints, the coding to eliminate GADTs becomes a
lot hairier. I implemented it back in '95 (when all these things had
different names) and I was very glad not to have to do it by hand any more.
But your mileage may vary
Conor
More information about the Haskell-Cafe
mailing list