[Haskell-cafe] Tiger compiler: variable escaping analysis phase

Tue Jun 22 09:01:37 EDT 2010

On Tue, Jun 22, 2010 at 09:33:22AM -0300, José Romildo Malaquias wrote:
> In the variable escaping analysis phase of the compiler, a function
> findEscape looks for escaping variables and record this information in
> the escape fields of the abstract syntax. To do this the entire abstract
> syntax tree is traversed looking for escaping uses of every variable,
> and, when found, the field is set to true.
>
> findEscape uses a symbol table to accomplish its work, binding variable
> names to a reference to boolean (the same reference used in the abstract
> syntax tree). When processing a variable declaraction, for instance, it
> inserts the variable name into the symbol table, binding it to its
> escaping field. When processing an expression which is a simple
> variable, if the variable occurs in a nested function, its binding in
> the symbol table is set to true. This reflects directly in the abstract
> syntax tree of the variable declaration, as the escape field in the
> variable declaration and the binding in the symbol table are the same
> reference to bool.

Idea of pure algorithm:

 1) Update the symbol table itself, that is: instead of using

       (a) Map Symbol (IORef Bool)

    use

       (b) Map Symbol Bool.

    This doesn't change the complexity of the algorithm as
    searching and updating have the same complexity for many data
    structures (maps and hashtables, for example).

    In reality, you don't need to use a Map at all.  Just use

       (c) Set Symbol

    and symbols not in the set do not escape.  Using (a) gives
    you O(n log k) for this step, where n is the size of the AST
    and k is the number of symbols.  On the other hand, (c) gives
    you O(n log e), where e is the number of escaping symbols.

 2) After doing the whole analysis, use a function
    'addEscapeInfo' to process the whole pure AST again adding
    information about escaped symbols.  This is O(n log e) as
    well.

> The second option is more elegant in my point of view, but would be much
> less efficient than the first one.

While O(n log e) is better than O(n log k), probably the
constants in the pure algorithm are higher than their impure
counterparts.  I guess you could also try to write:

  1) Take an 'AST e' into 'AST (STRef s Bool, e)' in O(n).

  2) Use the impure algorithm inside ST monad in O(n log k).

  3) Take 'AST (STRef s Bool, e)' into 'AST (Bool, e)' in O(n).

  4) 'runST' on the above steps to get a pure function from
     'AST e' into 'AST (Bool, e)'.

The ST monad has the same runtime overhead as IO.  Steps 1) and
3) are used to hide the ST monad from the rest of the compiler.

Cheers,

--
Felipe.