[GHC] #15333: Weird cachegrind results for binary-trees

GHC ghc-devs at haskell.org
Thu Aug 30 15:45:45 UTC 2018


#15333: Weird cachegrind results for binary-trees
-------------------------------------+-------------------------------------
        Reporter:  sgraf             |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  Research
       Component:  NoFib benchmark   |  needed
  suite                              |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 I think I figured this out. This is due to nursery size, specifically the
 points in time after which the GC kicks in. For benchmarks with a working
 set of varying size, GC Gen 0 collection performance heavily depends on
 the size of the working set within the nursery ''at the time collection is
 triggered''. There obviously is no way to predict this reliably.

 In this case, the default nursery size seemed to be particularly
 unfortunate, as `+RTS -S` reveals:

 {{{
     Alloc    Copied     Live     GC     GC      TOT      TOT  Page Flts
     bytes     bytes     bytes   user   elap     user     elap
   1097344    490864    548288  0.001  0.001    0.002    0.002    0    0
 (Gen:  0)
   1044752    645640    706848  0.001  0.001    0.003    0.003    0    0
 (Gen:  0)
   1046272    592608    650032  0.001  0.001    0.004    0.004    0    0
 (Gen:  1)
   1079040     17808    699952  0.000  0.000    0.005    0.005    0    0
 (Gen:  0)
   1046272     17808    717104  0.000  0.000    0.006    0.006    0    0
 (Gen:  0)
   1046272     17808    734256  0.000  0.000    0.006    0.006    0    0
 (Gen:  0)
   1079040     17808    784176  0.000  0.000    0.007    0.007    0    0
 (Gen:  0)
   1046272     17808    801328  0.000  0.000    0.007    0.007    0    0
 (Gen:  0)
   1078160    111840    912488  0.000  0.000    0.007    0.007    0    0
 (Gen:  0)
   1036312     96056    911464  0.000  0.000    0.008    0.008    0    0
 (Gen:  0)
   1036288      6608    915560  0.000  0.000    0.008    0.008    0    0
 (Gen:  0)
   1036288      6608    919656  0.000  0.000    0.009    0.009    0    0
 (Gen:  0)
   1036288      6608    923752  0.000  0.000    0.009    0.009    0    0
 (Gen:  0)
   1036288      6608    927848  0.000  0.000    0.009    0.009    0    0
 (Gen:  0)
   1036288      6608    931944  0.000  0.000    0.010    0.010    0    0
 (Gen:  0)
   1036288      6608    936040  0.000  0.000    0.010    0.010    0    0
 (Gen:  0)
   1036288      6608    940136  0.000  0.000    0.011    0.011    0    0
 (Gen:  0)
   1044920     29264    966864  0.000  0.000    0.011    0.011    0    0
 (Gen:  0)
   1045504     25496    963536  0.000  0.000    0.011    0.011    0    0
 (Gen:  0)
   1045504      6160    964560  0.000  0.000    0.012    0.012    0    0
 (Gen:  0)
   1045504      6160    965584  0.000  0.000    0.012    0.012    0    0
 (Gen:  0)
   1045504      6160    966608  0.000  0.000    0.013    0.013    0    0
 (Gen:  0)
   1045504      6160    967632  0.000  0.000    0.013    0.013    0    0
 (Gen:  0)
   1045504      6160    968656  0.000  0.000    0.014    0.014    0    0
 (Gen:  0)
   1045504      6160    969680  0.000  0.000    0.014    0.014    0    0
 (Gen:  0)
   1047464     59208   1023728  0.000  0.000    0.015    0.015    0    0
 (Gen:  0)
   1047808     57968   1022896  0.000  0.000    0.015    0.015    0    0
 (Gen:  0)
   1047808     52912   1023120  0.000  0.000    0.016    0.016    0    0
 (Gen:  0)
   1047808     52944   1023376  0.000  0.000    0.016    0.016    0    0
 (Gen:  0)
   1047808     52944   1023632  0.000  0.000    0.017    0.017    0    0
 (Gen:  0)
   1047808     52944   1023888  0.000  0.000    0.017    0.017    0    0
 (Gen:  0)
   1047808     52944   1024144  0.000  0.000    0.018    0.018    0    0
 (Gen:  0)
   1047808     52944   1024400  0.000  0.000    0.018    0.018    0    0
 (Gen:  0)
   1048296    176376   1148064  0.000  0.000    0.018    0.018    0    0
 (Gen:  0)
   1048384    175792   1147856  0.000  0.000    0.019    0.019    0    0
 (Gen:  0)
   1048384    174288   1147856  0.000  0.000    0.019    0.019    0    0
 (Gen:  0)
   1048384    174288   1147856  0.000  0.000    0.020    0.020    0    0
 (Gen:  0)
   1048384    174288   1147856  0.000  0.000    0.020    0.020    0    0
 (Gen:  0)
   1048384    174288   1147856  0.000  0.000    0.021    0.021    0    0
 (Gen:  0)
   1048384    174288   1147856  0.000  0.000    0.021    0.021    0    0
 (Gen:  0)
   1048384    174288   1147856  0.000  0.000    0.022    0.022    0    0
 (Gen:  0)
    116160      3488     44576  0.000  0.000    0.022    0.022    0    0
 (Gen:  1)
         0                      0.000  0.000
 }}}

 This is the log for `bt1` (e.g. without the call to `trace`). Note the
 higher numbers in the `Copied bytes` column compared to `bt2` (with the
 redundant `trace` call), in particular the whole second half:

 {{{
     Alloc    Copied     Live     GC     GC      TOT      TOT  Page Flts
     bytes     bytes     bytes   user   elap     user     elap

   1097448    488888    546312  0.001  0.001    0.002    0.002    0    0
 (Gen:  0)
   1044664    645424    706840  0.001  0.001    0.004    0.004    0    0
 (Gen:  0)
   1046272    591480    648904  0.001  0.001    0.005    0.005    0    0
 (Gen:  1)
   1079040     17488    698824  0.000  0.000    0.006    0.006    0    0
 (Gen:  0)
   1046272     17488    715976  0.000  0.000    0.007    0.007    0    0
 (Gen:  0)
   1046272     17488    733128  0.000  0.000    0.008    0.008    0    0
 (Gen:  0)
   1079040     17488    783048  0.000  0.000    0.008    0.008    0    0
 (Gen:  0)
   1046272     17488    800200  0.000  0.000    0.009    0.009    0    0
 (Gen:  0)
   1078376    108336    908176  0.000  0.000    0.010    0.010    0    0
 (Gen:  0)
   1036312     96600    911232  0.000  0.000    0.011    0.011    0    0
 (Gen:  0)
   1036288      7088    915248  0.000  0.000    0.012    0.012    0    0
 (Gen:  0)
   1036288      7120    919344  0.000  0.000    0.012    0.012    0    0
 (Gen:  0)
   1036288      7120    923440  0.000  0.000    0.013    0.013    0    0
 (Gen:  0)
   1036288      7120    927536  0.000  0.000    0.014    0.014    0    0
 (Gen:  0)
   1036288      7120    931632  0.000  0.000    0.014    0.014    0    0
 (Gen:  0)
   1036288      7120    935728  0.000  0.000    0.015    0.015    0    0
 (Gen:  0)
   1036288      7120    939824  0.000  0.000    0.015    0.015    0    0
 (Gen:  0)
   1045264     24608    961384  0.000  0.000    0.015    0.015    0    0
 (Gen:  0)
   1045504     20824    958056  0.000  0.000    0.016    0.016    0    0
 (Gen:  0)
   1045504      1488    959080  0.000  0.000    0.016    0.016    0    0
 (Gen:  0)
   1045504      1488    960104  0.000  0.000    0.016    0.016    0    0
 (Gen:  0)
   1045504      1488    961128  0.000  0.000    0.017    0.017    0    0
 (Gen:  0)
   1045504      1488    962152  0.000  0.000    0.017    0.017    0    0
 (Gen:  0)
   1045504      1488    963176  0.000  0.000    0.017    0.017    0    0
 (Gen:  0)
   1045504      1488    964200  0.000  0.000    0.018    0.018    0    0
 (Gen:  0)
   1047600     53192   1016904  0.000  0.000    0.018    0.018    0    0
 (Gen:  0)
   1047808     51952   1016072  0.000  0.000    0.019    0.019    0    0
 (Gen:  0)
   1047808     46896   1016296  0.000  0.000    0.019    0.019    0    0
 (Gen:  0)
   1047808     46928   1016552  0.000  0.000    0.019    0.019    0    0
 (Gen:  0)
   1047808     46928   1016808  0.000  0.000    0.020    0.020    0    0
 (Gen:  0)
   1047808     46928   1017064  0.000  0.000    0.020    0.020    0    0
 (Gen:  0)
   1047808     46928   1017320  0.000  0.000    0.020    0.020    0    0
 (Gen:  0)
   1047808     46928   1017576  0.000  0.000    0.021    0.021    0    0
 (Gen:  0)
   1048176    168760   1139640  0.000  0.000    0.021    0.021    0    0
 (Gen:  0)
   1048384    168176   1139432  0.000  0.000    0.022    0.022    0    0
 (Gen:  0)
   1048384    166672   1139432  0.000  0.000    0.022    0.022    0    0
 (Gen:  0)
   1048384    166672   1139432  0.000  0.000    0.023    0.023    0    0
 (Gen:  0)
   1048384    166672   1139432  0.000  0.000    0.023    0.023    0    0
 (Gen:  0)
   1048384    166672   1139432  0.000  0.000    0.023    0.023    0    0
 (Gen:  0)
   1048384    166672   1139432  0.000  0.000    0.024    0.024    0    0
 (Gen:  0)
   1048384    166672   1139432  0.000  0.000    0.024    0.024    0    0
 (Gen:  0)
    123776      2504     43592  0.000  0.000    0.024    0.025    0    0
 (Gen:  1)
         0                      0.000  0.000
 }}}

 I see no easy way around this other than 'deactivating' the GC by choosing
 a big enough nursery, such as `-A11M` in this case. Taking the GC out of
 the equation in this way may be somewhat misrepresentative, but at least
 we get cachegrind numbers that can be trusted.

 There are other cases I hit this, like `bspt`. I improved allocation a
 bit, triggering GC at later points when the working set was bigger. Result
 was an alleged regression of 6% in counted instructions, even though the
 relevant optimized snippet in isolation got faster.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15333#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list