[Haskell-cafe] STG optimization

Wed Jul 21 07:17:41 EDT 2004

On 20 July 2004 22:58, 7253146 at informatica.alumnos.uma.es wrote:

> So far so good. Nonetheless, some doubts have arisen when I have
> started to try to add updating. In the paper, implementation for
> "update in place" is extensively discussed, although it is said it
> could greatly increase the memory space usage. ¿Has this risk been
> measured? I think deforestation can minimize the risk; however, as
> far as I know, deforestation is only applied to lists. 

Nowadays GHC doesn't do update-in-place.  We removed update-in-place when we re-implemented the code generator and RTS back in version 4.00.  There were two reasons:

 - update-in-place (and its friend, return-in-registers) turned out to
   be hugely complex when combined with all the other optimisations
   that GHC does.

 - we achieve much of the same benefit by adding unboxed tuples to
   the language and using the "Constructed Product Result"
   optimisation.  CPR is now done by the strictness analyser.

Nevertheless, for a while we still did update-in-place when the code genreator could statically determine that there was an update frame on the top of the stack when building a constructor application.  But this could only be done when the constructor application contained no pointers - because otherwise the update-in-place code has to deal with the complexities of mutation in a generational garbage collector.  

But when combined with concurrency, even this kind of update-in-place is difficult, because it might be that another thread is waiting on the result of the thunk being evaluated, and the update code has to wake up that thread.  So now, all updates go through one carefully optimised piece of code in the runtime.

> But the STG paper ignores an optimization readyly available to me:
> whenever a constructor is allocated (constructor's register
> allocation aside, since the push/enter - eval/apply paper says GHC no
> longer supports it), the STG machine (the one described by the paper)
> takes heap space for it, although an update (indirection or copying)
> is going to be done most of times. This leads to potentially bad heap
> space usage and more garbage collections. 
> 
> I think update frames may be detected by the constructor allocation
> code, so it could determine whether to allocate a new constructor or
> build it over the closure to be updated. This could be achieved by
> marking continuations and update frames in the stack, so constructor
> allocation code could distinguish between them. This is inconvenient
> because common continuations should remove the mark too, so I do not
> like it. 

Yes you could do this.  Don't forget to take into account the complications I mentioned above, though (much easier if you don't have generational collection and concurrency).  Also you have to know whether the thunk is big enough to be updated with the constructor application.

Cheers,
	Simon