[Haskell-cafe] Estimating the time to garbage collect
Neil Davies
semanticphilosopher at googlemail.com
Fri May 1 04:14:13 EDT 2009
Hi
With the discussion on threads and priority in mind, and given that the
run time system already collects (in Stats.c) lots of useful pieces of
information, some of which is already visible (like the total amount of
memory mutated) and the rest of which would be easy to expose, a
question has occurred to me:
Given that you have access to that information (the stuff that +RTS -S
prints for each collection and summarises at the end of a run), is it
possible to estimate the time a GC will take before asking for one?
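As a rough sketch of the kind of access I mean (this assumes a GHC whose
GHC.Stats module exposes the collected figures via getRTSStats, and the
program has to be run with +RTS -T for them to be gathered):

import GHC.Stats (getRTSStats, RTSStats(..), GCDetails(..))
import System.Mem (performMajorGC)

main :: IO ()
main = do
  performMajorGC                -- force a collection so the figures are fresh
  s <- getRTSStats
  let d = gc s                  -- details of the most recent collection
  putStrLn $ "GCs so far:               " ++ show (gcs s)
  putStrLn $ "live bytes after last GC: " ++ show (gcdetails_live_bytes d)
  putStrLn $ "elapsed ns of last GC:    " ++ show (gcdetails_elapsed_ns d)

Each such observation gives one (live data, pause time) point that a
predictor could be fed with.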
Ignoring, at least for the moment, all the issues of paging, processor
cache occupancy, etc., what are the complexity drivers for the time to GC?
I realise that it is going to depend on things like the volume of data
mutated, the count of objects mutated, what fraction of them are live,
and so on. Even if these things turn out to be very program specific, I
have a follow-on question: what properties do you need from your
program to be able to construct a viable estimate of GC time from a
past history of such garbage collections?
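Purely as an illustration of that follow-on question (the names here are
made up, and the linearity assumption - that pause time grows roughly
linearly with live data for a copying collector - is exactly the kind of
program-specific property I am asking about):

type Sample = (Double, Double)   -- (live bytes, observed pause in seconds)

-- Least-squares fit over past collections; predicts the next pause from
-- the amount of live data now.  Needs at least two distinct samples.
predictPause :: [Sample] -> Double -> Double
predictPause history liveNow = a + b * liveNow
  where
    n        = fromIntegral (length history)
    (xs, ys) = unzip history
    mx       = sum xs / n
    my       = sum ys / n
    b        = sum [ (x - mx) * (y - my) | (x, y) <- history ]
             / sum [ (x - mx) ^ (2 :: Int) | x <- xs ]
    a        = my - b * mx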
Why am I interested? There are all manner of 'real time' requirements in
systems, and there is a vast class where a statistical bound (i.e. some
sort of 'time to complete' CDF) is more than adequate for production
use. If such estimation is possible then it opens up areas where all the
lovely properties of Haskell can be exploited, provided you have
confidence in the timing behaviour.
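To make the 'statistical bound' concrete, a sketch (again with made-up
names) that takes a history of observed pauses and reads off an
empirical quantile, say the 99th percentile, as the 'time to complete'
bound:

import Data.List (sort)

-- Empirical quantile of past GC pauses; q in [0,1), history non-empty.
pauseQuantile :: Double -> [Double] -> Double
pauseQuantile q pauses = sorted !! idx
  where
    sorted = sort pauses
    idx    = min (length sorted - 1)
                 (floor (q * fromIntegral (length sorted)))

e.g. pauseQuantile 0.99 observedPauses would be the figure such a system
plans around.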
Cheers
Neil