[Haskell-cafe] Re: What *is* a DSL?
Ben Franksen
ben.franksen at online.de
Sat Oct 10 14:12:03 EDT 2009
Robert Atkey wrote:
> A deep embedding of a parsing DSL (really a context-sensitive grammar
> DSL) would look something like the following. I think I saw something
> like this in the Agda2 code somewhere, but I stumbled across it when I
> was trying to work out what "free" applicative functors were.
>
> [snip code & explanation]
This is extremely cool. I tried to understand in my head how this all works
but it just didn't click. It all seemed like magic.
Then I went ahead and tried to write a printer for your example grammar and
now everything is much clearer. Although I had to fight the type checker
quite a bit. This is the generic part:
> class Print f where
> pr :: f a -> String
> instance Print nt => Print (Production nt) where
> pr = printProduction
> printProduction :: Print nt => Production nt a -> String
> printProduction (Stop _) = ""
> printProduction (Terminal t (Stop _)) = show t
> printProduction (Terminal t p) = show t ++ " " ++ printProduction p
> printProduction (NonTerminal nt (Stop _)) = pr nt
> printProduction (NonTerminal nt p) = pr nt ++ " " ++ printProduction p
> instance Print nt => Print (Rule nt) where
> pr (Rule ps) = printPs ps where
> printPs [] = ""
> printPs [p] = printProduction p
> printPs (p:ps) = printProduction p ++ " | " ++ printPs ps
> data Any f = forall a. Any (f a)
> class Enumerable f where
> enumeration :: [Any f]
> printRule :: Print nt => (nt a -> Rule nt a) -> nt a -> String
> printRule g nt = pr nt ++ " ::= " ++ pr (g nt)
> printGrammar :: (Print nt, Enumerable nt) => Grammar nt -> String
> printGrammar g = foldr (++) "" (intersperse "\n" rules) where
> rules = map printAnyRule enumeration
> printAnyRule (Any nt) = printRule g nt
We must also provide instances for the concrete types:
> instance Enumerable NT where
> enumeration = [Any Sum, Any Product, Any Value]
> instance Print NT where
> pr Value = "Value"
> pr Product = "Product"
> pr Sum = "Sum"
So far so good. This even works... almost ;-)
*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Interrupted. -- had to hit Ctrl-C here
When I replace 'posInt' with 'digit' in the rule for Value
> myGrm Value = ENum <$> digit
> <|> id <$ char '(' <*> nt Sum <* char ')'
then the printer terminates just fine:
*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '('
Sum ')'
I found that the problem is the use of function 'some' from
Control.Applicative in
> posInt :: Rule nt Int
> posInt = fix 1 . reverse <$> some digit where
> fix n [] = 0
> fix n (d:ds) = d*n + fix (n*10) ds
Since 'some' is defined recursively, this creates an infinite production for
numbers that you can neither print nor otherwise analyse in finite time.
I can see at least two solutions: One is to parameterize everything over the
type of terminals, too. A type suitable for the example would be
> data T = TNum Int | TPlus | TMult | TOParen | TCParen
and leave token recognition to a separate scanner.
The second solution (which I followed) is to break the recursion by adding
another nonterminal to the NT type:
> data NT a where
> Sum :: NT Expr
> Product :: NT Expr
> Value :: NT Expr
> Number :: NT [Int]
> Digit :: NT Int
> instance Enumerable NT where
> enumeration = [Any Sum, Any Product, Any Value, Any Number, Any Digit]
> instance Print NT where
> pr Sum = "Sum"
> pr Product = "Product"
> pr Value = "Value"
> pr Number = "Number"
> pr Digit = "Digit"
(Adding Digit /and/ Number is not strictly necessary, but it makes for a
nicer presentation.)
> myGrm :: Grammar NT
> myGrm Sum = ESum <$> nt Product <* char '+' <*> nt Sum
> <|> id <$> nt Product
>
> myGrm Product = EProduct <$> nt Value <* char '*' <*> nt Product
> <|> id <$> nt Value
>
> myGrm Value = (ENum . toNat) <$> nt Number
> <|> id <$ char '(' <*> nt Sum <* char ')'
>
> myGrm Number = extend <$> nt Digit <*> optional (nt Number)
>
> myGrm Digit = digit
> extend d Nothing = [d]
> extend d (Just ds) = d:ds
> toNat :: [Int] -> Int
> toNat = fix 1 . reverse where
> fix n [] = 0
> fix n (d:ds) = d*n + fix (n*10) ds
With this I get
*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Number | '(' Sum ')'
Number ::= Digit Number | Digit
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Morale: Be careful with recursive functions when constructing a data
representation (e.g. for a deep DSL). You might get an infinite
representation which isn't what you want in this case.
Oh, and another point: there should be a distinguished "start" nonterminal,
otherwise this is not really a grammar. This suggests something like
> type Grammar nt a = (nt a, forall b. nt b -> Rule nt b)
Next thing I'll try is to transform such a grammar into an actual parser...
Cheers
Ben
More information about the Haskell-Cafe
mailing list