[Haskell-cafe] Problem with Python AST

Roel van Dijk vandijk.roel at gmail.com
Wed Feb 20 11:12:20 EST 2008


Hello everyone,

I am trying to create an AST for Python. My approach is to create a
data type for each syntactic construct. But I am stuck trying to
statically enforce some constraints over my statements. A very short
example to illustrate my problem:


newtype Ident = Id String

data BinOp = Add
           | Sub

data Exp = IntLit Integer
         | BinOpExp BinOp Exp Exp

data NormalCtx
data LoopCtx

data Statement ctx where
  Compound :: Compound -> Statement ctx
  Pass     :: Statement ctx
  Break    :: Statement LoopCtx

newtype Global = Global [Ident]

data Suite ctx = Suite [Global] [Statement ctx]

type Else = Suite NormalCtx

data Compound = If [(Exp, Suite NormalCtx)] (Maybe Else)
              | While Exp (Suite LoopCtx) (Maybe Else)

newtype Program = Program [Statement NormalCtx]


The "global" statement makes an identifier visible in the local scope.
It holds for the entire current code block. So it also works
backwards, which is why I didn't make it a statement but part of a
suite (= block of statements).

Some statements may occur in any context, such as the "pass"
statement. But others are only allowed in certain situations, such as
the "break" statement. This is why I defined the Statement as a GADT.
I just supply the context in which the statement may be used and the
typechecker magically does the rest.

Feeling very content with this solution I tried a slightly more
complex program and discovered that my AST can not represent this
Python program:

for i in range(10):
  if i == 6:
    break

The compound if statement is perfectly valid nested in the loop
because the Compound constructor of Statement allows any context. But
the suites inside the clauses of the if statement only allow normal
contexts. Since Break has a LoopCtx the typechecker complains.

Is there some other way to statically enforce that break statements
can only occur _nested_ inside a loop? There is a similar problem with
return statements that may only occur in functions. These nested
statements should somehow 'inherit' a context, if that makes any sense
:-)

I know I can simply create separate data types statements that can
occur inside loops and function bodies. But that would make the AST a
lot more complex, something I try to avoid. Python's syntax is already
complex enough!

Most of these constraints are not in the EBNF grammar which can be
found in the language reference, but they are specified in the
accompanying text. The cpython interpreter will generate SyntaxError's
when you violate these constraints.

See also Python's language reference:
http://docs.python.org/ref/ref.html (see sections 6 and 7)


More information about the Haskell-Cafe mailing list