[Haskell-cafe] STG optimization

Tue Jul 20 17:57:41 EDT 2004

Hi. I have been reading recently the STG machine paper by Simon Peyton-Jones. 
It is relatively easy to follow the explanation, though the formal description 
of operational semantics is hard to understand. In order to fully understand 
it, I have started to implement a C-based mini STG machine, implementing the 
obvious compilation schemes according to the STG paper. Of course, I have not 
implemented most of optimizations: environment management is very rudimentary, 
it has three stacks (pointer args, primitive args and continuations) for 
simplicity, etc. At the moment I have a beatiful tree-reduction STG machine.

So far so good. Nonetheless, some doubts have arisen when I have started to try 
to add updating. In the paper, implementation for "update in place" is 
extensively discussed, although it is said it could greatly increase the memory 
space usage. ¿Has this risk been measured? I think deforestation can minimize 
the risk; however, as far as I know, deforestation is only applied to lists.

But the STG paper ignores an optimization readyly available to me: whenever a 
constructor is allocated (constructor's register allocation aside, since the 
push/enter - eval/apply paper says GHC no longer supports it), the STG machine 
(the one described by the paper) takes heap space for it, although an update 
(indirection or copying) is going to be done most of times. This leads to 
potentially bad heap space usage and more garbage collections.

I think update frames may be detected by the constructor allocation code, so it 
could determine whether to allocate a new constructor or build it over the 
closure to be updated. This could be achieved by marking continuations and 
update frames in the stack, so constructor allocation code could distinguish 
between them. This is inconvenient because common continuations should remove 
the mark too, so I do not like it.

Another (my favourite) similar solution should be introduce a new stack only 
for update frames, but still pushing code labels for updating in the 
continuation stack (which could still be merged with the primitive args stack): 
this way, constructor's code can still simply jump to the next "continuation" 
(the update code should be responsible of popping the update stack), and 
constructor allocation code can simply test if the "continuation" on top of the 
stack is a real continuation, so it can safely take newly allocated heap space 
for the constructor, or it is an update (perhaps by testing for equality with 
the top of the update stack), so the code should test if the constructor 
closure can be overwritten over the closure to be updated.

To ilustrate my thoughts:

\begin{code}

some_func <some_args> = let b = <some_expression> in
  SomeConstructor { b, some_stuff }

some_other_func <some_other_args> = let a = some_func { some_other_stuff }
in case a of <some_alternatives>

\end{code}

In the above piece of core STG language, when the "a" closure is entered and 
finally "SomeConstructor" must be allocated, the code of "some_func" should 
check if the continuation leads to an update (in this case, an update of "a") 
or an continuation. In this case it leads to an update, so then the code should 
check if the update closure ("a") is large enough, and, in that case, build the 
constructor closure over it.

Maybe this technique is irrelevant for performance, too expensive to implement, 
or having a separated update stack is too inconvenient, or these effects are 
achieved by other ways?

Thanks in advance

Jose David

---------------------------------------------
Este mensaje lo ha enviado un Alumno de la Universidad de Malaga.
http://www.alumnos.uma.es/