[Haskell-cafe] Re: What *is* a DSL?

Ben Franksen ben.franksen at online.de
Sun Oct 11 18:00:54 EDT 2009


Ben Franksen wrote:
> Ben Franksen wrote:
>> Next thing I'll try is to transform such a grammar into an actual
>> parser...
> 
> Which I also managed to get working.

First, before all this talking to myself here is boring you to death, please
shout and I'll go away. Anyway, at least one person has privately expressed
interest, so I'll post my code for the translation.(*)

> {-# LANGUAGE ExistentialQuantification, GADTs, Rank2Types #-}
> {-# LANGUAGE TypeSynonymInstances, MultiParamTypeClasses,
ImpredicativeTypes #-}
> import qualified Text.ParserCombinators.Parsec as P

Note that I have parameterized everything on the token (terminal) type. Here
are the data types, adapting the rest of the code is completely mechanical.

> data Production nt t a
>     =           Stop        a
>     |           Terminal    t      (Production nt t a)
>     | forall b. NonTerminal (nt b) (Production nt t (b -> a))

> newtype Rule nt t a = Rule [Production nt t a]

> type RuleSet nt t = forall a. nt a -> Rule nt t a

> type Grammar nt t b = (nt b, RuleSet nt t)

I should probably turn this into a proper data type, which would BTW also
make the ImpredicativeTypes extension unnecessary.

Translation to Parsec
---------------------

We restrict ourselves to Char as terminals for simplicity. The
generalization to arbitrary token types would need another three arguments:
showTok :: (tok -> String), nextPos :: (SourcePos -> tok -> [tok] ->
SourcePos), and testTok :: (tok -> Maybe a), which are needed by
P.tokenPrim.

> parseGrammar :: Print nt => Grammar nt Char a -> P.Parser a
> parseGrammar (start,rules) = parseNonTerminal rules start

> parseNonTerminal :: Print nt => RuleSet nt Char -> nt a -> P.Parser a
> parseNonTerminal rs nt = parseRule rs (rs nt) P.<?> pr nt

> parseRule :: Print nt => RuleSet nt Char -> Rule nt Char a -> P.Parser a
> parseRule rs (Rule ps) = P.choice (map ({- P.try . -} parseProduction rs)
ps)

> parseProduction :: Print nt => RuleSet nt Char -> Production nt Char a ->
P.Parser a
> parseProduction _  (Stop x) = return x
> parseProduction rs (Terminal c p) = P.char c >> parseProduction rs p
> parseProduction rs (NonTerminal nt p) = do
>   vnt <- parseNonTerminal rs nt
>   vp <- parseProduction rs p
>   return (vp vnt)

This is really not difficult, once you understand how the list-like
Production type works. The trick is that a NonTerminal forces the "rest" of
the production to return a function type, so you can apply its result to
the result of parsing the nonterminal. Whereas the result of parsing
terminals gets ignored by the "rest" of the production. You might wonder
how the code manages to return the correct integer values inside an ENum.
Well, I did, at least. I don't yet understand it completely but I think the
answer is in in the Functor and Applicative instances: all the code that
interprets syntactic elements (up to the abstract syntax) inside the myGrm
function gets pushed down through the elements of a production until it
ends up at a Stop, where we can finally pull it out (see the first clause
of parseProduction).

Note also the (commented-out) use of P.try in function parseRule. Let's try
it:

*Main> putStrLn (printGrammar myGrm)
*Start ::= Sum
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Number | '(' Sum ')'
Number ::= Digit Number | Digit
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
*Main> P.parseTest (parseGrammar myGrm) "2*(2+52)"
parse error at (line 1, column 2):
unexpected "*"
expecting Number

After re-inserting the P.try call, I can actually parse expressions (yay!):

*Main> :r
[1 of 1] Compiling Main             ( Grammar.lhs, interpreted )
Ok, modules loaded: Main.
*Main> P.parseTest (parseGrammar myGrm) "2*(2+52)"
EProduct (ENum 2) (ESum (ENum 2) (ENum 52))

BTW, does anyone know a source (books, papers, blogs, whatever) about
algorithms for automatic left-factoring? I searched with google and found
some interesting papers on eliminating left recursion but nothing so far on
left-factoring. Have these problems all been solved before the internet
age?

Cheers
Ben

(*) One of these days I really should get my hands dirty and set up a
weblog; suggestions for how to proceed are appreciated. I would especially
like something where I can just upload a literate Haskell file and it gets
formatted automagically. Bonus points for beautifying operator symbols a la
lhs2tex ;-)



More information about the Haskell-Cafe mailing list