[GHC] #15333: Weird cachegrind results for binary-trees
GHC
ghc-devs at haskell.org
Thu Aug 30 15:45:45 UTC 2018
#15333: Weird cachegrind results for binary-trees
-------------------------------------+-------------------------------------
Reporter: sgraf | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone: Research
Component: NoFib benchmark | needed
suite | Version: 8.5
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by sgraf):
I think I figured this out. This is due to nursery size, specifically the
points in time after which the GC kicks in. For benchmarks with a working
set of varying size, GC Gen 0 collection performance heavily depends on
the size of the working set within the nursery ''at the time collection is
triggered''. There obviously is no way to predict this reliably.
In this case, the default nursery size seemed to be particularly
unfortunate, as `+RTS -S` reveals:
{{{
Alloc Copied Live GC GC TOT TOT Page Flts
bytes bytes bytes user elap user elap
1097344 490864 548288 0.001 0.001 0.002 0.002 0 0
(Gen: 0)
1044752 645640 706848 0.001 0.001 0.003 0.003 0 0
(Gen: 0)
1046272 592608 650032 0.001 0.001 0.004 0.004 0 0
(Gen: 1)
1079040 17808 699952 0.000 0.000 0.005 0.005 0 0
(Gen: 0)
1046272 17808 717104 0.000 0.000 0.006 0.006 0 0
(Gen: 0)
1046272 17808 734256 0.000 0.000 0.006 0.006 0 0
(Gen: 0)
1079040 17808 784176 0.000 0.000 0.007 0.007 0 0
(Gen: 0)
1046272 17808 801328 0.000 0.000 0.007 0.007 0 0
(Gen: 0)
1078160 111840 912488 0.000 0.000 0.007 0.007 0 0
(Gen: 0)
1036312 96056 911464 0.000 0.000 0.008 0.008 0 0
(Gen: 0)
1036288 6608 915560 0.000 0.000 0.008 0.008 0 0
(Gen: 0)
1036288 6608 919656 0.000 0.000 0.009 0.009 0 0
(Gen: 0)
1036288 6608 923752 0.000 0.000 0.009 0.009 0 0
(Gen: 0)
1036288 6608 927848 0.000 0.000 0.009 0.009 0 0
(Gen: 0)
1036288 6608 931944 0.000 0.000 0.010 0.010 0 0
(Gen: 0)
1036288 6608 936040 0.000 0.000 0.010 0.010 0 0
(Gen: 0)
1036288 6608 940136 0.000 0.000 0.011 0.011 0 0
(Gen: 0)
1044920 29264 966864 0.000 0.000 0.011 0.011 0 0
(Gen: 0)
1045504 25496 963536 0.000 0.000 0.011 0.011 0 0
(Gen: 0)
1045504 6160 964560 0.000 0.000 0.012 0.012 0 0
(Gen: 0)
1045504 6160 965584 0.000 0.000 0.012 0.012 0 0
(Gen: 0)
1045504 6160 966608 0.000 0.000 0.013 0.013 0 0
(Gen: 0)
1045504 6160 967632 0.000 0.000 0.013 0.013 0 0
(Gen: 0)
1045504 6160 968656 0.000 0.000 0.014 0.014 0 0
(Gen: 0)
1045504 6160 969680 0.000 0.000 0.014 0.014 0 0
(Gen: 0)
1047464 59208 1023728 0.000 0.000 0.015 0.015 0 0
(Gen: 0)
1047808 57968 1022896 0.000 0.000 0.015 0.015 0 0
(Gen: 0)
1047808 52912 1023120 0.000 0.000 0.016 0.016 0 0
(Gen: 0)
1047808 52944 1023376 0.000 0.000 0.016 0.016 0 0
(Gen: 0)
1047808 52944 1023632 0.000 0.000 0.017 0.017 0 0
(Gen: 0)
1047808 52944 1023888 0.000 0.000 0.017 0.017 0 0
(Gen: 0)
1047808 52944 1024144 0.000 0.000 0.018 0.018 0 0
(Gen: 0)
1047808 52944 1024400 0.000 0.000 0.018 0.018 0 0
(Gen: 0)
1048296 176376 1148064 0.000 0.000 0.018 0.018 0 0
(Gen: 0)
1048384 175792 1147856 0.000 0.000 0.019 0.019 0 0
(Gen: 0)
1048384 174288 1147856 0.000 0.000 0.019 0.019 0 0
(Gen: 0)
1048384 174288 1147856 0.000 0.000 0.020 0.020 0 0
(Gen: 0)
1048384 174288 1147856 0.000 0.000 0.020 0.020 0 0
(Gen: 0)
1048384 174288 1147856 0.000 0.000 0.021 0.021 0 0
(Gen: 0)
1048384 174288 1147856 0.000 0.000 0.021 0.021 0 0
(Gen: 0)
1048384 174288 1147856 0.000 0.000 0.022 0.022 0 0
(Gen: 0)
116160 3488 44576 0.000 0.000 0.022 0.022 0 0
(Gen: 1)
0 0.000 0.000
}}}
This is the log for `bt1` (e.g. without the call to `trace`). Note the
higher numbers in the `Copied bytes` column compared to `bt2` (with the
redundant `trace` call), in particular the whole second half:
{{{
Alloc Copied Live GC GC TOT TOT Page Flts
bytes bytes bytes user elap user elap
1097448 488888 546312 0.001 0.001 0.002 0.002 0 0
(Gen: 0)
1044664 645424 706840 0.001 0.001 0.004 0.004 0 0
(Gen: 0)
1046272 591480 648904 0.001 0.001 0.005 0.005 0 0
(Gen: 1)
1079040 17488 698824 0.000 0.000 0.006 0.006 0 0
(Gen: 0)
1046272 17488 715976 0.000 0.000 0.007 0.007 0 0
(Gen: 0)
1046272 17488 733128 0.000 0.000 0.008 0.008 0 0
(Gen: 0)
1079040 17488 783048 0.000 0.000 0.008 0.008 0 0
(Gen: 0)
1046272 17488 800200 0.000 0.000 0.009 0.009 0 0
(Gen: 0)
1078376 108336 908176 0.000 0.000 0.010 0.010 0 0
(Gen: 0)
1036312 96600 911232 0.000 0.000 0.011 0.011 0 0
(Gen: 0)
1036288 7088 915248 0.000 0.000 0.012 0.012 0 0
(Gen: 0)
1036288 7120 919344 0.000 0.000 0.012 0.012 0 0
(Gen: 0)
1036288 7120 923440 0.000 0.000 0.013 0.013 0 0
(Gen: 0)
1036288 7120 927536 0.000 0.000 0.014 0.014 0 0
(Gen: 0)
1036288 7120 931632 0.000 0.000 0.014 0.014 0 0
(Gen: 0)
1036288 7120 935728 0.000 0.000 0.015 0.015 0 0
(Gen: 0)
1036288 7120 939824 0.000 0.000 0.015 0.015 0 0
(Gen: 0)
1045264 24608 961384 0.000 0.000 0.015 0.015 0 0
(Gen: 0)
1045504 20824 958056 0.000 0.000 0.016 0.016 0 0
(Gen: 0)
1045504 1488 959080 0.000 0.000 0.016 0.016 0 0
(Gen: 0)
1045504 1488 960104 0.000 0.000 0.016 0.016 0 0
(Gen: 0)
1045504 1488 961128 0.000 0.000 0.017 0.017 0 0
(Gen: 0)
1045504 1488 962152 0.000 0.000 0.017 0.017 0 0
(Gen: 0)
1045504 1488 963176 0.000 0.000 0.017 0.017 0 0
(Gen: 0)
1045504 1488 964200 0.000 0.000 0.018 0.018 0 0
(Gen: 0)
1047600 53192 1016904 0.000 0.000 0.018 0.018 0 0
(Gen: 0)
1047808 51952 1016072 0.000 0.000 0.019 0.019 0 0
(Gen: 0)
1047808 46896 1016296 0.000 0.000 0.019 0.019 0 0
(Gen: 0)
1047808 46928 1016552 0.000 0.000 0.019 0.019 0 0
(Gen: 0)
1047808 46928 1016808 0.000 0.000 0.020 0.020 0 0
(Gen: 0)
1047808 46928 1017064 0.000 0.000 0.020 0.020 0 0
(Gen: 0)
1047808 46928 1017320 0.000 0.000 0.020 0.020 0 0
(Gen: 0)
1047808 46928 1017576 0.000 0.000 0.021 0.021 0 0
(Gen: 0)
1048176 168760 1139640 0.000 0.000 0.021 0.021 0 0
(Gen: 0)
1048384 168176 1139432 0.000 0.000 0.022 0.022 0 0
(Gen: 0)
1048384 166672 1139432 0.000 0.000 0.022 0.022 0 0
(Gen: 0)
1048384 166672 1139432 0.000 0.000 0.023 0.023 0 0
(Gen: 0)
1048384 166672 1139432 0.000 0.000 0.023 0.023 0 0
(Gen: 0)
1048384 166672 1139432 0.000 0.000 0.023 0.023 0 0
(Gen: 0)
1048384 166672 1139432 0.000 0.000 0.024 0.024 0 0
(Gen: 0)
1048384 166672 1139432 0.000 0.000 0.024 0.024 0 0
(Gen: 0)
123776 2504 43592 0.000 0.000 0.024 0.025 0 0
(Gen: 1)
0 0.000 0.000
}}}
I see no easy way around this other than 'deactivating' the GC by choosing
a big enough nursery, such as `-A11M` in this case. Taking the GC out of
the equation in this way may be somewhat misrepresentative, but at least
we get cachegrind numbers that can be trusted.
There are other cases I hit this, like `bspt`. I improved allocation a
bit, triggering GC at later points when the working set was bigger. Result
was an alleged regression of 6% in counted instructions, even though the
relevant optimized snippet in isolation got faster.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15333#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list