[GHC] #10613: Mechanism for checking that we only enter single-entry thunks once

Tue Dec 22 10:17:08 UTC 2015

#10613: Mechanism for checking that we only enter single-entry thunks once
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.10.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  Other             |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by nomeata):

 This is me thinking out aloud about this feature. Ignore me if you want,
 but if you are curious, feel free to jump in.

 Widening the scope of this ticket a bit (or shifting it?): Wouldn’t it be
 nice if we not only link the single-entry thunks, but also find out about
 missed opportunities for single-entry thunks? Such a counting would
 subsume the linting above.

 I’m considering to build on [Debugging/TickyTicky ticky-ticky debugging],
 as it already provides a lot of the necessary infrastructure. In
 particular, it counts entries to thunks, when `-ticky-dyn-thunk` is
 enabled. The number is not quite what we want, though: It seems that if a
 certain static thunk is created ''n'' times, then it will report ''n''
 entries; whereas in the scope of this ticket we want to know that each
 allocated thunk was entered once.

 Also, currently a dynamic (i.e. an instance of a static) thunk that is not
 marked as single-entry will always be reported to be entered 0 or 1 times,
 as the result is shared. How do we get better numbers here? One way might
 be a special heap object, let’s call it a “counting indirection”. It
 behaves mostly like a regular indirection, i.e. it’s entry code would
 count a tick and then jump to the entry code of the indirectee, but the
 crucial difference would be that the garbage collector will 'not' erase it
 when it evacuates memory. It will, however, evacuate the indirectee as
 usual. This way, the sharing semantics are preserved, but every entry via
 the thunk will be counted, not just the first.

 These counting indirections would be created by the code that creates a
 thunk. This way, this counting can be done independent of `-ticky-dyn-
 thunk`. These heap objects can also carry additional data used in
 counting, e.g. the name of the thunk (in `CorePrep` names) and whether GHC
 created this thunk as single-entry or not (which would not affect the
 behaviour of the counting indirection, but be useful for the reporting).

 I’m not sure yet what’s the best data structure to do the counting, if we
 indeed want to count uses of each dynamic instance of the thunk (or
 rather, of the counting indirection) separately. Maybe statically allocate
 'm' (maybe 1024) counters per thunk, and an index, and the first 'm'
 instances of the thunk use the these counter (each, upon first entry,
 incrementing the index to allocate its slot), and if there are more than
 1024 entries, then simply stop counting.

 I kind-of like this design, but I’m sure I’m missing some rather ugly
 details.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10613#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler