[Haskell-cafe] Tiger compiler: variable escaping analysis phase
José Romildo Malaquias
j.romildo at gmail.com
Thu Jun 24 17:18:30 EDT 2010
On Tue, Jun 22, 2010 at 04:44:09PM +0200, Vo Minh Thu wrote:
> 2010/6/22 José Romildo Malaquias <j.romildo at gmail.com>:
> > On Tue, Jun 22, 2010 at 02:54:08PM +0200, Vo Minh Thu wrote:
> >> 2010/6/22 José Romildo Malaquias <j.romildo at gmail.com>:
> >> > Hello.
> >> >
> >> > I have been teaching an introductory course on compiler construction to
> >> > our undergraduates students using Appel's "Modern Compiler
> >> > Implementation in Java". There are also versions of the book in ML and
> >> > C. The books explain how to write a compiler for the Tiger programming
> >> > language.
> >> >
> >> > Now I want to implement a Tiger compiler in Haskell.
> >> >
> >> > The lexer and parser (built with the help of alex and happy) and the
> >> > type checker are already working.
> >> >
> >> > Next step is implementing a variable escape analysis phase, needed for
> >> > the intermediate code generator. The objective of this phase is to find
> >> > out if each variable escapes or not. In general an escaping variable
> >> > should be allocated in the stack frame (in memory). Non escaping
> >> > variables may be allocated in the stack frame or in a register.
> >> >
> >> > Generally, a variable escapes if it is passed by reference, its address
> >> > is taken (using C's & operator, for instance), or it is accessed from a
> >> > nested function. Only the last is possible with Tiger.
> >> >
> >> > The approach adopted by Appel in his books is easy: a muttable field in
> >> > the abstract syntax of variable declarations, of for expressions, and of
> >> > function formal parameters, which introduce new variables, is used to
> >> > collect the escaping information of the variable. In the ML version of
> >> > the book this is a reference to boolean (bool ref). Initially, in a
> >> > conservative approach, the reference is initialized to true.
> >> >
> >> > In the variable escaping analysis phase of the compiler, a function
> >> > findEscape looks for escaping variables and record this information in
> >> > the escape fields of the abstract syntax. To do this the entire abstract
> >> > syntax tree is traversed looking for escaping uses of every variable,
> >> > and, when found, the field is set to true.
> >> >
> >> > findEscape uses a symbol table to accomplish its work, binding variable
> >> > names to a reference to boolean (the same reference used in the abstract
> >> > syntax tree). When processing a variable declaraction, for instance, it
> >> > inserts the variable name into the symbol table, binding it to its
> >> > escaping field. When processing an expression which is a simple
> >> > variable, if the variable occurs in a nested function, its binding in
> >> > the symbol table is set to true. This reflects directly in the abstract
> >> > syntax tree of the variable declaration, as the escape field in the
> >> > variable declaration and the binding in the symbol table are the same
> >> > reference to bool.
> >> >
> >> > I am look for good ways to implement the variable escaping analysis
> >> > phase in Haskell, which is a pure language. I see two alternatives:
> >> >
> >> > 1) adopt the same approach as Appel used in his ML version of the
> >> > compiler: use a mutable reference in the IO monad (Data.IORef) to
> >> > hold the variable escaping information, and write everything inside
> >> > the IO monad
> >> >
> >> > 2) build a new abstract syntax tree with updated information regarding
> >> > variable escaping
> >> >
> >> > The second option is more elegant in my point of view, but would be much
> >> > less efficient than the first one.
> >> >
> >> > So I want to know what advices Haskell programmers has to me about
> >> > implementing this phase of the Tiger compiler in Haskell.
> >> Hi,
> >> I think there is a third way to do what you describe (if I understood
> >> everything). You can use a Writer monad (basically a state monad where
> >> the state is never read, only written to).
> >> Essentially you walk the tree and record the information you want (a
> >> mapping from variable name to a boolean 'does-escape'). That
> >> information is threaded through the tree-walking functions.
> >> The information you record is the underlying monoid of the Writer monad.
> > The 'does-escape' information should be available for each variable at
> > the point the variable is introduced (a variable declaration or a formal
> > parameter in a function declaration, for instance). If this information
> > is collected in another data structure that is not the abstract syntax
> > tree, it may be difficult to access the 'does-escape' information when
> > needed.
> I was thinking 'accessing when needed' was just a lookup into that
> other datastructure (i.e. a map). As someone said, and related to your
> second solution, this analysis and the resulting mapping can be put
> pack into the tree.
> I.e. instead of
> data Tree = ...
> you have
> data Tree a = ...
> and the 'a' can be whatever you want, including the 'does-escape'
> boolean value.
> > In this line I thought about using a mapping from a pair consisting of
> > the variable name and its position in the source code, to the
> > 'does-escape' boolean. The findEscape function would construct this
> > mapping while traversing the abstract syntax tree.
> Yes, that was what I had in mind. But don't use (variable name,
> position) to index the map. Instead have a renaming pass that ensure
> that every name is unique.
The Tiger compiler would not need a variable renaming pass
This renaming pass (renameVars :: Exp -> Exp) would build a new AST.
For each variable declaration and for each parameter in every function
declaration, a new (unique) name should be invented, in every use of the
old name should be replaced by the new name. A state monad to thread an
integer counter would be helpful here. The counter would be incremented
each time a new variable name is needed, and it would be used to form
But would all this extra work be worth just for using the unique names
to index the 'does-escape' map (or set)? The overhead of using the pair
(variable_name,position) is that bad?
> > The function that generates the intermediate representation of the
> > program would be given the abstract syntax tree and the escaping mapping
> > as arguments. When traversing the abstract syntax tree to generate code,
> > it would consult the escaping mapping to determine if a varable escapes
> > or not, when allocating a variable. The variable name and its position
> > are available from the abstract syntax tree.
> Yes. So it appears you don't need the 'data Tree a' if you don't plan
> to manipulate it beyond what you state here, just the pair
> (Tree,Mapping) will do it.
> > Are you suggesting that the Writer monad would be used to construct a
> > data structure like the escaping mapping I mentioned above?
> Yes. The data structure is just a mapping name -> bool (or you can do
> it with just a set, if a name belongs to the set, it escapes).
> Constructing it seems possible with a Writer monad.
> You would use the 'tell' method of the Writer monad everytime you want
> to record a new name.
> Alternatively you can explicitely thread the datastructure if it seems
How would that be?
More information about the Haskell-Cafe