[Haskell-cafe] Tiger compiler: variable escaping analysis phase

Thu Jun 24 17:44:17 EDT 2010

2010/6/24 José Romildo Malaquias <j.romildo at gmail.com>:
> On Tue, Jun 22, 2010 at 04:44:09PM +0200, Vo Minh Thu wrote:
>> 2010/6/22 José Romildo Malaquias <j.romildo at gmail.com>:
>> > On Tue, Jun 22, 2010 at 02:54:08PM +0200, Vo Minh Thu wrote:
>> >> 2010/6/22 José Romildo Malaquias <j.romildo at gmail.com>:
>> >> > Hello.
>> >> >
>> >> > I have been teaching an introductory course on compiler construction to
>> >> > our undergraduates students using Appel's "Modern Compiler
>> >> > Implementation in Java". There are also versions of the book in ML and
>> >> > C. The books explain how to write a compiler for the Tiger programming
>> >> > language.
>> >> >
>> >> > Now I want to implement a Tiger compiler in Haskell.
>> >> >
>> >> > The lexer and parser (built with the help of alex and happy) and the
>> >> > type checker are already working.
>> >> >
>> >> > Next step is implementing a variable escape analysis phase, needed for
>> >> > the intermediate code generator. The objective of this phase is to find
>> >> > out if each variable escapes or not. In general an escaping variable
>> >> > should be allocated in the stack frame (in memory). Non escaping
>> >> > variables may be allocated in the stack frame or in a register.
>> >> >
>> >> > Generally, a variable escapes if it is passed by reference, its address
>> >> > is taken (using C's & operator, for instance), or it is accessed from a
>> >> > nested function. Only the last is possible with Tiger.
>> >> >
>> >> > The approach adopted by Appel in his books is easy: a muttable field in
>> >> > the abstract syntax of variable declarations, of for expressions, and of
>> >> > function formal parameters, which introduce new variables, is used to
>> >> > collect the escaping information of the variable. In the ML version of
>> >> > the book this is a reference to boolean (bool ref). Initially, in a
>> >> > conservative approach, the reference is initialized to true.
>> >> >
>> >> > In the variable escaping analysis phase of the compiler, a function
>> >> > findEscape looks for escaping variables and record this information in
>> >> > the escape fields of the abstract syntax. To do this the entire abstract
>> >> > syntax tree is traversed looking for escaping uses of every variable,
>> >> > and, when found, the field is set to true.
>> >> >
>> >> > findEscape uses a symbol table to accomplish its work, binding variable
>> >> > names to a reference to boolean (the same reference used in the abstract
>> >> > syntax tree). When processing a variable declaraction, for instance, it
>> >> > inserts the variable name into the symbol table, binding it to its
>> >> > escaping field. When processing an expression which is a simple
>> >> > variable, if the variable occurs in a nested function, its binding in
>> >> > the symbol table is set to true. This reflects directly in the abstract
>> >> > syntax tree of the variable declaration, as the escape field in the
>> >> > variable declaration and the binding in the symbol table are the same
>> >> > reference to bool.
>> >> >
>> >> > I am look for good ways to implement the variable escaping analysis
>> >> > phase in Haskell, which is a pure language. I see two alternatives:
>> >> >
>> >> > 1) adopt the same approach as Appel used in his ML version of the
>> >> >   compiler: use a mutable reference in the IO monad (Data.IORef) to
>> >> >   hold the variable escaping information, and write everything inside
>> >> >   the IO monad
>> >> >
>> >> > 2) build a new abstract syntax tree with updated information regarding
>> >> >   variable escaping
>> >> >
>> >> > The second option is more elegant in my point of view, but would be much
>> >> > less efficient than the first one.
>> >> >
>> >> > So I want to know what advices Haskell programmers has to me about
>> >> > implementing this phase of the Tiger compiler in Haskell.
>> >>
>> >> Hi,
>> >>
>> >> I think there is a third way to do what you describe (if I understood
>> >> everything). You can use a Writer monad (basically a state monad where
>> >> the state is never read, only written to).
>> >>
>> >> Essentially you walk the tree and record the information you want (a
>> >> mapping from variable name to a boolean 'does-escape'). That
>> >> information is threaded through the tree-walking functions.
>> >>
>> >> The information you record is the underlying monoid of the Writer monad.
>> >
>> > The 'does-escape' information should be available for each variable at
>> > the point the variable is introduced (a variable declaration or a formal
>> > parameter in a function declaration, for instance). If this information
>> > is collected in another data structure that is not the abstract syntax
>> > tree, it may be difficult to access the 'does-escape' information when
>> > needed.
>>
>> I was thinking 'accessing when needed' was just a lookup into that
>> other datastructure (i.e. a map). As someone said, and related to your
>> second solution, this analysis and the resulting mapping can be put
>> pack into the tree.
>>
>> I.e. instead of
>> data Tree = ...
>>
>> you have
>> data Tree a = ...
>>
>> and the 'a' can be whatever you want, including the 'does-escape'
>> boolean value.
>>
>> > In this line I thought about using a mapping from a pair consisting of
>> > the variable name and its position in the source code, to the
>> > 'does-escape' boolean. The findEscape function would construct this
>> > mapping while traversing the abstract syntax tree.
>>
>> Yes, that was what I had in mind. But don't use (variable name,
>> position) to index the map. Instead have a renaming pass that ensure
>> that every name is unique.
>
> The Tiger compiler would not need a variable renaming pass
> otherwise.

Ok. I thought you might be interested to have such unique identifiers
for other tasks. And I'm not sure (name,position) would be enough to
resolve every possible reuse of a name. Anyway, at this point I can't
help much as I don't know Tiger but it seems you have plenty of
information to make your decision.

> This renaming pass (renameVars :: Exp -> Exp) would build a new AST.
> For each variable declaration and for each parameter in every function
> declaration, a new (unique) name should be invented, in every use of the
> old name should be replaced by the new name. A state monad to thread an
> integer counter would be helpful here. The counter would be incremented
> each time a new variable name is needed, and it would be used to form
> the name.
>
> But would all this extra work be worth just for using the unique names
> to index the 'does-escape' map (or set)? The overhead of using the pair
> (variable_name,position) is that bad?
>
>> > The function that generates the intermediate representation of the
>> > program would be given the abstract syntax tree and the escaping mapping
>> > as arguments. When traversing the abstract syntax tree to generate code,
>> > it would consult the escaping mapping to determine if a varable escapes
>> > or not, when allocating a variable. The variable name and its position
>> > are available from the abstract syntax tree.
>>
>> Yes. So it appears you don't need the 'data Tree a' if you don't plan
>> to manipulate it beyond what you state here, just the pair
>> (Tree,Mapping) will do it.
>
> Exactly.
>
>> > Are you suggesting that the Writer monad would be used to construct a
>> > data structure like the escaping mapping I mentioned above?
>>
>> Yes. The data structure is just a mapping name -> bool (or you can do
>> it with just a set, if a name belongs to the set, it escapes).
>> Constructing it seems possible with a Writer monad.
>> You would use the 'tell' method of the Writer monad everytime you want
>> to record a new name.
>
> Right.
>
>> Alternatively you can explicitely thread the datastructure if it seems
>> fine.
>
> How would that be?

Using a monad has two great advantages: the syntactic one, where you
don't have to explicitely thread the state, and the more important
one, where it makes clear what you want to do (in this case,
accumulate/write some information that can be abstracted as a monoid).

If your code presentation/taste/understanding (as I understand you
want to use your implementation for didactic purpose) dictate
otherwise, and if it is not too cumbersome to explicitely thread the
state, you might want to do so.

Cheers,
Thu

p.s. don't forget (if you can of course) to put your code on hackage.