[Haskell-cafe] Regular Expressions without GADTs
Conor McBride
ctm at cs.nott.ac.uk
Wed Oct 19 10:17:41 EDT 2005
oleg at pobox.com wrote:
>>I suspect that once you start producing values with the
>>relevant properties (as I do) rather than just consuming them (as Oleg
>>and Bruno do), the GADT method might work out a little neater.
>>
>>
>
>Actually, the code is pretty much your original code, with downcased
>identifiers. It faithfully implements that parser division
>approach. Still, there are no existentials.
>
That's because this isn't quite my program either, although I'll grant
you it's a lot closer. Give me back the uppercase identifiers and you've
pretty much done it. That's to say, I want a datatype of regular
expressions which happens (at least) to support parsing, rather than a
parser library which supports (at least) regular languages.
We can leave it to others to decide which is neater, but I'd remark that...
> I wouldn't say that GADT
>code is so much different. Perhaps the code below is a bit neater due
>to the absence of existentials, `case' statements, and local type
>annotations.
>
>
...I'm also irritated by the case statements, the local type
annotations, and so on. I was rather surprised to find that I needed
them. I bet it's not an essential or permanent problem, and it's
certainly a problem we have planned not to have in Epigram.
The existential also bothers me, but for different reasons. In the
dependently typed version, RegExp isn't a GADT: it's just an ordinary
datatype, and there's a perfectly ordinary function, Sem :: RegExp -> *,
which computes the type corresponding to a regular expression. The
syntactic part of division is thus completely separated from the
semantic part (glueing the head back on). The GADT version here replaces
Sem by an index in the datatype, so sometimes I'm forced to write an
existential in order to collect what would have been the result of
applying Sem. You've avoided the existential by throwing away the syntax
and keeping only the specific empty-and-divide semantics required for
this particular example. In effect
* I define regular expressions and show they possess at least the
empty-and-divide semantics;
* You define the empty-and-divide semantics and show that it's closed
under at least the regular expression formers (nicely expressed via
Bruno's class)
Let's see...
Here's the class of regular expression algebras:
>-- Bruno.Oliveira's type class
>class RegExp g where
> zero :: g tok Zero
> one :: g tok ()
> check :: (tok -> Bool) -> g tok tok
> plus :: g tok a -> g tok b -> g tok (Either a b)
> mult :: g tok a -> g tok b -> g tok (a,b)
> star :: g tok a -> g tok [a]
>
>
Here's the specification of the intended semantics:
>data Parse tok t = Parse { empty :: Maybe t
> , divide :: tok -> Parse tok t}
>
>
Here's where you explain how stuff other than regular expressions also
has the semantics. This is how the existential is made to disappear: you
use liftP to fuse the head-glueing function into the parsers themselves.
It's something like adding
Map :: (s -> t) -> RegExp tok s -> RegExp tok t
to the syntax of regular expressions, isn't it?
>liftP f p = Parse{empty = liftM f (empty p),
> divide = \tok -> liftP f (divide p tok)}
>
>
And then you show that indeed the desired semantics is closed under
RegExp formation.
>instance RegExp Parse where
>
>
It's a classic instance of the Expression Problem. If you fix the
elimination behaviour, it's easy to add new introduction behaviour; if
you fix the introduction behaviour, it's easy to add new elimination
behaviour. I think it's important to support both modes of working: type
classes are good at the former, datatypes the latter, so let's be open
to both. There's always a danger with small examples that there's not so
much to choose between the two approaches---except that they scale up in
completely different ways. It's important not to get into a sterile
argument 'Oh, but I can extend the grammar', 'Oh, but I can extend the
functionality', as if only one of these is ever important. I will
happily cheer anyone who can help us to be lighter on our feet in
jumping between the two.
I'm not saying there's no value to 'poor man's' alternatives to GADTs. I
certainly agree with Bruno
(a) that it's important to know how to work within the standard; my
experiences implementing the first version of Epigram have convinced me
to be much more cautious about which Haskell extensions I'm willing to
use in the second
(b) that the pattern 'show type constructor f has such-an-such an
algebra' is useful and worth abstracting.
I was just about to send, when up popped Ross's program, which does give
me a datatype, including just enough extra stuff to support functoriality.
>> data RegExp tok a
>> = Zero
>> | One a
>> | Check (tok -> a) (tok -> Bool)
>> | Plus (RegExp tok a) (RegExp tok a)
>> | forall b. Mult (RegExp tok (b -> a)) (RegExp tok b)
>> | forall e. Star ([e] -> a) (RegExp tok e)
>
>
This combines several dodges rather neatly. You can see the traces of
the translation from GADT to nested-datatype-with-equations, but instead
of saying forall e. Star (Eq [e] a) (RegExp tok e), Ross has kept just
the half of the iso he needs in order to extract the parsed value.
Throwing away the other half more-or-less allows him to hide the
head-glueing functions inside the grammar of the regular expressions. In
effect, Map has been distributed through the syntax. It's nice that the
syntax of regular expressions survives, but it has been somewhat spun in
favour of the functionality required in the example. Of the fakes so
far, I like this the best, because it is *data*.
I would say that the availability of 'this workaround for this example,
that workaround for that example', is evidence in *favour* of adopting
the general tools which are designed to support many examples. If a
job's worth doing, it's worth doing well. Is there a *uniform*
workaround which delivers all the GADT functionality cheaply and
cleanly? Kind of subjective, I suppose, but I haven't seen anything
cheap enough for me.
All the best
Conor
More information about the Haskell-Cafe
mailing list