Questions about "time and space profiling for non-strict langs" paper and cost centre impl. of GHC

Thu May 15 15:34:54 UTC 2014

Ömer Sinan Ağacan wrote:
> To me it looks like that there should be two costs attributed in
> application rules. First cost is "evaluation of the function"
> part(which should be attributed in `app1` rule) and second is
> "substitution" part, which should be attributed in `app2` rule but
> according to the paper we only have the former cost, and the latter is
> free(e.g. no costs are attributed in second rule).

Well, sort of. "app1" and "app2" are just the two stages of applying a
function: First the function expression gets evaluated, then the actual
function gets called. In this case "A" stands for both costs, even if
they don't necessarily happen right after each other. As we are only
interested in the cost sum, this is the easiest way of doing it.

> It would be appreciated if someone could help me clarifying this. I'm
> also looking for more recent papers about GHC's profiler
> implementation. I'm especially interested profiling in multi-threaded
> RTS.

There have been no new papers that I know of, but we had a talk by Simon
Marlow[1] about improvements to cost-centre stacks, as well as a more
precise description of the modern semantics by Edward Z. Yang[2].

[1] https://www.youtube.com/watch?v=J0c4L-AURDQ
[2] http://blog.ezyang.com/2013/09/cost-semantics-for-stg-in-modern-ghc

> Lastly, it'd be appreciated if someone could explain me two Bool
> fields in `StgSCC` constructor of STG syntax.

Cost centres have two somewhat separate profiling mechanisms:
1. Cost gets attributed
2. Entry counts are counted

Sometimes it can be beneficial for the optimiser to separate the two.
For example, if we have something like

  case e of
    D -> scc<...> let f = e1 in e2

and we want to float the "f" binding outwards we would do:

  let f = scc<...> e1 in
    case e of
      D -> scc<...> e2

However, if we just implemented it like this, we would see the
entry-count to the cost-centre increase quite a bit, because now we are
also counting every entry to "f". This is why the compiler can mark the
duplicated tick as "non-counting".

> UPDATE: I was reading the paper one more time before sending my
> question and I found one more confusing part. In 4.1 it says "Lexical
> scoping will subsume all the costs of applying `and1` to its call
> sites..." but in 2.4 it says "Lexical scoping: Attribute the cost to
> "fun", the cost centre which lexically encloses the abstraction." so
> this two definitions of same thing is actually different. To me it
> looks like the definition in 4.1 is wrong,, it should be "definition
> site", not "call site". What am I missing here?

This is a special case introduced in 2.3.2: Top-level functions get the
cost-centre "SUB", which always makes them take the cost-centre of the
caller. You are right in that this doesn't quite match the "literal"
interpretation of lexical scoping.

Greetings,
  Peter Wortmann