[Haskell-cafe] monadic DSL for compile-time parser generator, not possible?

oleg at okmij.org oleg at okmij.org
Wed Mar 13 08:05:09 CET 2013


Jeremy Shaw wrote:
> It would be pretty damn cool if you could create a data type for
> generically describing a monadic parser, and then use template haskell
> to generate a concrete parser from that data type. That would allow
> you to create your specification in a generic way and then target
> different parsers like parsec, attoparsec, etc. There is some code
> coming up in a few paragraphs that should describe this idea more
> clearly.

After rather mild and practical restrictions, the problem is
solvable. (BTW, even the problem of lifting arbitrary functional
values, let alone handles and other stuff, is solvable in realistic
settings -- or even completely, although less practically.) 

Rather than starting from the top -- implementing monadic or
applicative parser, let's start from the bottom and figure out what we
really need. It seems that many real-life parsers aren't using the
full power of Applicative, let alone monad. So why to pay, a whole
lot, for what we don't use.

Any parser combinator library has to be able to combine parsers. It
seems the applicative rule 
        <*> :: Parser (a->b) -> Parser a -> Parser b
is very popular. It is indeed very useful -- although not the only
thing possible. One can come up with a set of combinators that are
used for realistic parsing. For example,
        *> :: Parser a -> Parser b -> Parser b
for sequential composition, although expressible via <*>, could be
defined as primitive. Many other such combinators can be defined as
primitives. 

In other words: the great advantage of Applicative parser combinators
is letting the user supply semantic actions, and executing those
actions as parsing progresses. There is also a traditional approach:
the parser produces an AST or a stream of parsing events, which the
user consumes and semantically processes any way they wish.  Think of
XML parsing: often people parse XML and get a DOM tree, and process it
afterwards. An XML parser can be incremental: SAX.  Parsers that
produce AST need only a small fixed set of combinators. We never need
to lift arbitrary functions since those parsers don't accept arbitrary
semantic actions from the user. For that reason, these parsers are
also much easy to analyze.

Let's take the high road however, applicative parsers. The <*> rule
is not problematic: it neatly maps to code. Consider

        newtype Code a = Code Exp
which is the type-annotated TH Code. We can easily define

        app_code :: Code (a->b) -> Code a -> Code b
        app_code (Code f) (Code x) = Code $ AppE f x

So, Code is almost applicative. Almost -- because we only have a
restricted pure:
        pureR :: Lift a => a -> Code a
with a Lift constraint. Alas, this is not sufficient for realistic
parsers, because often we have to lift functions, as in the example of
parsing a pair of characters:

        pure (\x y -> (x,y)) <*> anyChar <*> anyChar

But aren't functions really unliftable? They are unliftable by value,
but we can also lift by reference. 

Here is an example, using tagless final framework, since it is
extensible. We define the basic minor Applicative

> class Sym repr where
>     pureR :: Lift a => a -> repr a
>     app   :: repr (a->b) -> repr a -> repr b
>
> infixl 4 `app`


And a primitive parser, with only one primitive parser.

> class Sym repr => Parser repr where
>     anychar :: repr Char

For our example, parsing two characters and returning them as a pair,
we need pairs. So, we extend our parser with three higher-order
_constants_.

> class Sym repr => Pair repr where
>     pair :: repr (a -> b -> (a,b))
>     prj1 :: repr ((a,b) -> a)
>     prj2 :: repr ((a,b) -> b)


And here is the example.

> test1 = pair `app` anychar `app` anychar

One interpretation of Sym is to generate code (another one could
analyze the parsers)

> data C a = C{unC :: Q Exp} 

Most interesting is the instance of pairs. Actually, it is not that
interesting: we just lift functions by reference.

> pair0 x y = (x,y)
> 
> instance Pair C where
>     pair = C [e| pure pair0 |]
>     prj1 = C [e| pure fst |]
>     prj2 = C [e| pure snd |]

Because tagless-final is so extensible, any time we need a new
functional constant, we can easily introduce it and define its code,
either by building a code expression or by referring to a global name
that is bound to the desired value. The latter is `lift by reference'
(which is what dynamic linking does).


The obvious limitation of this approach is that all functions to
lift must be named -- because we lift by reference. We can also build
anonymous functions, if we just add lambda to our language. If we go
this way we obtain something like

        http://okmij.org/ftp/meta-programming/index.html#meta-haskell

(which has lam, let, arrays, loops, etc.)

Sample code, for reference

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

module P where

import Language.Haskell.TH
import Language.Haskell.TH.Syntax
import Language.Haskell.TH.Ppr

import Control.Applicative

import Text.ParserCombinators.ReadP


class Sym repr where
    pureR :: Lift a => a -> repr a
    app   :: repr (a->b) -> repr a -> repr b

infixl 4 `app`

class Sym repr => Parser repr where
    anychar :: repr Char

-- Higher-order constants
class Sym repr => Pair repr where
    pair :: repr (a -> b -> (a,b))
    prj1 :: repr ((a,b) -> a)
    prj2 :: repr ((a,b) -> b)


-- parse two characters and return them as a pair
test1 = pair `app` anychar `app` anychar



-- Implementations

-- we don't need Q monad actually, neither here
-- nor anywhere!
-- It's a bummer that lift has the signature t -> Q Exp
-- rather than t -> Exp!                         

data C a = C{unC :: Q Exp} 

instance Sym C where
    pureR   = C . lift
    app f x = C $ appE (appE (varE '(Control.Applicative.<*>)) (unC f)) (unC x)

instance Parser C where
    anychar = C . varE $ 'get


pair0 x y = (x,y)
 
instance Pair C where
    pair = C [e| pure pair0 |]
    prj1 = C [e| pure fst |]
    prj2 = C [e| pure snd |]

printC :: C a -> IO String
printC m = runQ (fmap pprint $ unC m )

test1C = printC test1
{-
"(Control.Applicative.<*>) ((Control.Applicative.<*>) 
   (Control.Applicative.pure P.pair0) 
     Text.ParserCombinators.ReadP.get) Text.ParserCombinators.ReadP.get"
-}




More information about the Haskell-Cafe mailing list