Edward Kmett ekmett at gmail.com
Tue Aug 4 15:30:03 EDT 2009

```There are a number of ways to fix this of various complexity, depending on
how many kinds of statements you have in your language and how adverse you
are to code duplication.

One option is to remove the recursion from your statement type and to make a

data Stmt a = Slf Test [a] [a] | ...

and then make that recursive by using a newtype to make the folding
explicit.

newtype Mu f = In { out :: f (Mu f) }

Now you can work on Mu Stmt which is a fixed point of your statement data
type and get your original meaning, or work with Stmt Int and have a
statement type that indexes statements by # and can look them up in a
control flow graph or some such.

You can even attach annotations at every level by using a different ADT to
wrap yourself up. (For the category-theory inclined, this gives rise to

newtype Ann f e = Ann e (f (Ann f e))

So now you can have something like:

Ann 2 (SIf ... (Ann 1 (Var "X")) (Ann 1 (Var "Y"))

if you wanted to count the number of variable references in a subtree for
instance.

On the other hand, rarely does a programming language consist solely of
statements. You often have expressions and other types floating around and
tying with an explicit Mu can sometimes get in the way of that.

Forgetting those definitions for a moment, we can try to fix the one
statement type problem.

You can use some interesting GADT based solutions to fix that, but another
approach that I've been using recently is to use explicit recursion in a
slightly different place.

type (v :> f) = f (v f)
data Var (f :: * -> *) = V String
data Exp f
= App (Exp :> f) (Exp :> f)
| Lam (Var :> f) (Exp :> f)
| Var (Var :> f)
data Stmt f
= If (Exp :> f) [Stmt :> f] [Stmt :> f]
| ...

Now we can have a lot of different kinds of expressions based on what we
substitute in for f.

data Ann a e = Ann a e
newtype Mu e = Mu e
data Free a e = Return a | Free e
newtype Base a e = Base a

now:

Stmt (Base Int) -- is a statement wrapped around integers
Stmt (Ann Int) -- is a statement wrapped around subtrees of various types
annotated with integers
Stmt Mu -- is your old statement type with newtype Mu wrappers on its
children.
Stmt (Free Int) is your old statement data type, with occasional integer
place holders for unexpanded portions of the tree, they can act as standins
for Exps, Vars, etc.

You can then borrow a trick from a recent post of mine:

with some minor modifications to extract data incrementally or return
results as you grow the syntax tree.

The design space is large and there are a lot of options to explore around
here, so don't take any of this as the one and only way to implement a

-Edward Kmett

On Sun, Aug 2, 2009 at 1:25 AM, Michal D. <michal.dobrogost at gmail.com>wrote:

> I'm in the process of writing a toy compiler but I'm having some
> trouble trying to make my datatypes general. For example, using parsec
> I parse statements as:
>
> data Stmt = SIf Test [Stmt] [Stmt]   |   ...
>
> However, when it's time to create a control flow graph it would be
> nice to represent statements as (the Int's signify the node id's for
> either case of the if statement):
>
> data Stmt = SIf Test Int Int   |   ...
>
> So, in a eureka moment I decided that this should be allowable with
> the following declaration:
>
>
> Ofcourse, the problem is trying to declare the resulting type for
> parsing: "parse -> Stmt [Stmt [Stmt ....]]". Any hints on whether
> there is a way to accomplish what I'm trying to do or do I have to
> bite the bullet and declare two seperate datatypes? I tried being
> clever and declaring a 'helper' type as "type StmtRec = Stmt [StmtRec]"
> but to no avail... GHC won't let it slide: "Cycle in type synonym
> declarations"!
>
> Cheers,
>
> Michal
> _______________________________________________