Behavior of the -H RTS option, possible doc/impl mismatch

Thu Feb 17 13:36:31 CET 2011

On 16/02/2011 08:24, Akio Takano wrote:
> Hi,
>
> I have questions regarding the -H RTS option. I use GHC 7.0.1 on
> Linux x86-64.
>
> The User's Guide says:
>
> -Hsize  [Default: 0] This option provides a “suggested heap size” for
> the garbage collector. The garbage collector will use about this much
> memory until the program residency grows and the heap size needs to be
> expanded to retain reasonable performance.
>
> However the actual behavior seems to be quite different. As an
> example, for a particular program:
>
> ./a.out +RTS -N7 -A256M -H2G uses around 7 GBytes of memory
> ./a.out +RTS -N7 -A256M -H6G uses around 13 GBytes of memory
>
> If the User's Guide is correct, changing -H2G to -H6G should not
> increase the heap usage beyond 6 GBytes.
>
> In the rts source, I see the parameter value
> (RtsFlags.GcFlags.heapSizeSuggestion) is used only to adjust the size
> of the allocation areas, not the entire heap.
>
> How is the -H option supposed to behave? How does it behave currently?

It works by estimating how much memory will be required by the next GC, 
subtracting that from the -H value, and dividing up the remainder 
between the allocation areas (that's why it only affects the allocation 
area sizes).
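
As a rough illustration of that calculation (a simplified sketch, not
the actual RTS code), the per-capability allocation area size could be
modelled as below. The function name, its parameters, and the min_area
floor (standing in for the -A setting) are assumptions made for the
example; all quantities are in the same unit, e.g. megabytes.

```python
def allocation_area_size(heap_suggestion, estimated_gc_need,
                         n_capabilities, min_area=0):
    """Model of how -H sizes each allocation area (hypothetical helper).

    heap_suggestion   -- the -H value
    estimated_gc_need -- estimate of memory the next GC will require
    n_capabilities    -- number of allocation areas (the -N value)
    min_area          -- assumed lower bound per area (e.g. the -A value)
    """
    # Whatever the next GC is not expected to need goes to the nurseries.
    remainder = heap_suggestion - estimated_gc_need
    # The remainder is divided up between the allocation areas.
    per_area = remainder // n_capabilities
    # Assumption: an area never shrinks below the configured minimum.
    return max(per_area, min_area)


# Example with made-up numbers: -H2G, 648 MB estimated for the next GC, -N7.
print(allocation_area_size(2048, 648, 7))                # 200
print(allocation_area_size(2048, 648, 7, min_area=256))  # 256
```

In this model -H never bounds the total heap directly: it only feeds
into the size chosen for the allocation areas.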

You could easily exceed the -H size by allocating huge arrays, for example.

There is room for error in the "estimating" part. In the worst case the
next GC could need to copy the entire heap, but that never happens in
practice, so we estimate how much of the heap will be copied.  If the
estimate is too low, then we end up exceeding the -H size.  If the
estimate were more conservative, then we would end up using less than
the -H size in most cases.  I did actually try a more conservative
estimate: it gave strange results, e.g. when specifying -H64m on the
command line the RTS would use only 40m or so, and run slower than the
current -H algorithm.
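
To make the effect of a bad estimate concrete, here is a toy accounting
model (an assumption-laden sketch, not how the RTS measures memory). It
treats the heap at the next GC as the allocation areas plus whatever the
GC actually copies; the function name and all numbers are made up for
illustration.

```python
def heap_at_next_gc(heap_suggestion, estimated_copy, actual_copy):
    """Toy model: total heap in use at the next GC (hypothetical helper).

    The allocation areas get whatever -H leaves over after the
    estimate; the GC then needs room for what it really copies.
    """
    allocation_areas = max(heap_suggestion - estimated_copy, 0)
    return allocation_areas + actual_copy


# Estimate exactly right: we land on the -H value.
print(heap_at_next_gc(64, 24, 24))  # 64
# Estimate too low: we exceed the -H value.
print(heap_at_next_gc(64, 24, 40))  # 80
# Estimate too conservative: we stay well below the -H value.
print(heap_at_next_gc(64, 40, 16))  # 40
```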

Anyway, with -N2 and above I don't recommend using -H; I've generally
found that it results in lower performance.  -A1m might be good if your
CPUs have larger L2 caches.  I have some local patches that implement an
option like -H, but one that applies to the old generation sizing rather
than the nursery, which tends to work better with -N2 and above.

Cheers,
	Simon