[Haskell-cafe] Re: What *is* a DSL?

Ben Franksen ben.franksen at online.de
Sat Oct 10 14:12:03 EDT 2009


Robert Atkey wrote:
> A deep embedding of a parsing DSL (really a context-sensitive grammar
> DSL) would look something like the following. I think I saw something
> like this in the Agda2 code somewhere, but I stumbled across it when I
> was trying to work out what "free" applicative functors were.
>
> [snip code & explanation]

This is extremely cool. I tried to understand in my head how this all works
but it just didn't click. It all seemed like magic.

Then I went ahead and tried to write a printer for your example grammar and
now everything is much clearer. Although I had to fight the type checker
quite a bit. This is the generic part:

> class Print f where
>     pr :: f a -> String

> instance Print nt => Print (Production nt) where
>     pr = printProduction

> printProduction :: Print nt => Production nt a -> String
> printProduction (Stop _) = ""
> printProduction (Terminal t (Stop _)) = show t
> printProduction (Terminal t p) = show t ++ " " ++ printProduction p
> printProduction (NonTerminal nt (Stop _)) = pr nt
> printProduction (NonTerminal nt p) = pr nt ++ " " ++ printProduction p

> instance Print nt => Print (Rule nt) where
>     pr (Rule ps) = printPs ps where
>       printPs []     = ""
>       printPs [p]    = printProduction p
>       printPs (p:ps) = printProduction p ++ " | " ++ printPs ps

> data Any f = forall a. Any (f a)

> class Enumerable f where
>     enumeration :: [Any f]

> printRule :: Print nt => (nt a -> Rule nt a) -> nt a -> String
> printRule g nt = pr nt ++ " ::= " ++ pr (g nt)

> printGrammar :: (Print nt, Enumerable nt) => Grammar nt -> String
> printGrammar g = foldr (++) "" (intersperse "\n" rules) where
>     rules = map printAnyRule enumeration
>     printAnyRule (Any nt) = printRule g nt

We must also provide instances for the concrete types:

> instance Enumerable NT where
>     enumeration = [Any Sum, Any Product, Any Value]

> instance Print NT where
>     pr Value   = "Value"
>     pr Product = "Product"
>     pr Sum     = "Sum"

So far so good. This even works... almost ;-)

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Interrupted. -- had to hit Ctrl-C here

When I replace 'posInt' with 'digit' in the rule for Value

> myGrm Value   =     ENum     <$> digit
>                 <|> id       <$  char '(' <*> nt Sum <* char ')'

then the printer terminates just fine:

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '('
Sum ')'

I found that the problem is the use of function 'some' from
Control.Applicative in

> posInt :: Rule nt Int
> posInt = fix 1 . reverse <$> some digit where
>   fix n []     = 0
>   fix n (d:ds) = d*n + fix (n*10) ds

Since 'some' is defined recursively, this creates an infinite production for
numbers that you can neither print nor otherwise analyse in finite time.

I can see at least two solutions: One is to parameterize everything over the
type of terminals, too. A type suitable for the example would be

> data T = TNum Int | TPlus | TMult | TOParen | TCParen

and leave token recognition to a separate scanner.

The second solution (which I followed) is to break the recursion by adding
another nonterminal to the NT type:

> data NT a where
>     Sum     :: NT Expr
>     Product :: NT Expr
>     Value   :: NT Expr
>     Number  :: NT [Int]
>     Digit   :: NT Int

> instance Enumerable NT where
>     enumeration = [Any Sum, Any Product, Any Value, Any Number, Any Digit]

> instance Print NT where
>     pr Sum     = "Sum"
>     pr Product = "Product"
>     pr Value   = "Value"
>     pr Number  = "Number"
>     pr Digit   = "Digit"

(Adding Digit /and/ Number is not strictly necessary, but it makes for a
nicer presentation.)

> myGrm :: Grammar NT
> myGrm Sum     =     ESum     <$> nt Product <* char '+' <*> nt Sum
>                 <|> id       <$> nt Product
> 
> myGrm Product =     EProduct <$> nt Value <* char '*' <*> nt Product
>                 <|> id       <$> nt Value
> 
> myGrm Value   =     (ENum . toNat) <$> nt Number
>                 <|> id       <$  char '(' <*> nt Sum <* char ')'
> 
> myGrm Number  =     extend   <$> nt Digit <*> optional (nt Number)
> 
> myGrm Digit   =     digit

> extend d Nothing = [d]
> extend d (Just ds) = d:ds

> toNat :: [Int] -> Int
> toNat = fix 1 . reverse where
>   fix n []     = 0
>   fix n (d:ds) = d*n + fix (n*10) ds

With this I get

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Number | '(' Sum ')'
Number ::= Digit Number | Digit
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

Morale: Be careful with recursive functions when constructing a data
representation (e.g. for a deep DSL). You might get an infinite
representation which isn't what you want in this case.

Oh, and another point: there should be a distinguished "start" nonterminal,
otherwise this is not really a grammar. This suggests something like

> type Grammar nt a = (nt a, forall b. nt b -> Rule nt b)

Next thing I'll try is to transform such a grammar into an actual parser...

Cheers
Ben



More information about the Haskell-Cafe mailing list