[GHC] #15333: Weird cachegrind results for binary-trees
GHC
ghc-devs at haskell.org
Mon Jul 2 15:14:33 UTC 2018
#15333: Weird cachegrind results for binary-trees
-------------------------------------+-------------------------------------
Reporter: sgraf | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone: Research
Component: NoFib benchmark | needed
suite | Version: 8.5
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Description changed by sgraf:
Old description:
> I'm currently investigating an alleged regression in my branch of the
> late lambda lift and hit a confusing data point. Note that I'm very much
> relying on cachegrinds counted instructions/memory accesses for my
> findings.
>
> Check out the most recent version of `nofib` and run the following
> script:
>
> {{{#!sh
> #! /usr/bin/env zsh
> sed -i 's/import Debug.Trace//g' Main.hs # Make the following line
> idempotent
> echo "import Debug.Trace" | cat - Main.hs > Main.tmp && mv Main.tmp
> Main.hs # add the import for trace
> sed -i 's/trace "" \$ bit/bit/g' Main.hs # strip `trace $ ` prefixes in
> the call to `bit`
> ghc -O2 -XBangPatterns Main.hs -o bt1
> sed -i 's/bit/trace "" $ bit/g' Main.hs # prepend `trace $ ` to the call
> to `bit`
> ghc -O2 -XBangPatterns Main.hs -o bt2
> valgrind --tool=cachegrind ./bt1 12 2>&1 > /dev/null # without trace
> valgrind --tool=cachegrind ./bt2 12 2>&1 > /dev/null # with trace
> }}}
>
> This will compile two versions of `binary-trees`, one with an extra
> `trace "" $` call before the only call to the `bit` function. One would
> expect the version with the `trace` call (`bt1`) to allocate more than
> the version without (`bt2`). Indeed, the output of `+RTS -s` suggests
> that:
>
> {{{
> $ ./bt1 12 +RTS -s
> ...
> 43,107,560 bytes allocated in the heap
> ...
> $ ./bt2 12 +RTS -s
> ...
> 43,116,888 bytes allocated in the heap
> ...
> }}}
>
> That's fine. A few benchmark runs by hand also suggested the tracing
> version is a little slower (probably due to IO).
>
> Compare that to the output of the above cachegrind calls:
>
> {{{
> $ valgrind --tool=cachegrind ./bt1 12 > /dev/null
> ...
> I refs: 118,697,583
> ...
> D refs: 43,475,212
> ...
> $ valgrind --tool=cachegrind ./bt2 12 > /dev/null
> ...
> I refs: 116,340,710
> ...
> D refs: 42,523,369
> ...
> }}}
>
> It's the other way round here! How's that possible?
>
> Even if this isn't strictly a bug in GHC or NoFib, it's relevant
> nonetheless, as our benchmark infrastructure currently relies on
> instruction counts. I couldn't reproduce this by writing my own no-op
> `trace _ a = a; {-# NOINLINE trace #-}`, btw.
>
> I checked this on GHC 8.2.2 and a semi-recent HEAD commit
> (bb539cfe335eeec7989332b859b1f3162c5e105a).
New description:
I'm currently investigating an alleged regression in my branch of the late
lambda lift and hit a confusing data point. Note that I'm very much
relying on cachegrinds counted instructions/memory accesses for my
findings.
Check out the most recent version of `nofib` and run the following script:
{{{#!sh
#! /bin/sh
sed -i 's/import Debug.Trace//g' Main.hs # Make the following line
idempotent
echo "import Debug.Trace" | cat - Main.hs > Main.tmp && mv Main.tmp
Main.hs # add the import for trace
sed -i 's/trace "" \$ bit/bit/g' Main.hs # strip `trace $ ` prefixes in
the call to `bit`
ghc -O2 -XBangPatterns Main.hs -o bt1
sed -i 's/bit/trace "" $ bit/g' Main.hs # prepend `trace $ ` to the call
to `bit`
ghc -O2 -XBangPatterns Main.hs -o bt2
valgrind --tool=cachegrind ./bt1 12 2>&1 > /dev/null # without trace
valgrind --tool=cachegrind ./bt2 12 2>&1 > /dev/null # with trace
}}}
This will compile two versions of `binary-trees`, one with an extra `trace
"" $` call before the only call to the `bit` function. One would expect
the version with the `trace` call (`bt1`) to allocate more than the
version without (`bt2`). Indeed, the output of `+RTS -s` suggests that:
{{{
$ ./bt1 12 +RTS -s
...
43,107,560 bytes allocated in the heap
...
$ ./bt2 12 +RTS -s
...
43,116,888 bytes allocated in the heap
...
}}}
That's fine. A few benchmark runs by hand also suggested the tracing
version is a little slower (probably due to IO).
Compare that to the output of the above cachegrind calls:
{{{
$ valgrind --tool=cachegrind ./bt1 12 > /dev/null
...
I refs: 118,697,583
...
D refs: 43,475,212
...
$ valgrind --tool=cachegrind ./bt2 12 > /dev/null
...
I refs: 116,340,710
...
D refs: 42,523,369
...
}}}
It's the other way round here! How's that possible?
Even if this isn't strictly a bug in GHC or NoFib, it's relevant
nonetheless, as our benchmark infrastructure currently relies on
instruction counts. I couldn't reproduce this by writing my own no-op
`trace _ a = a; {-# NOINLINE trace #-}`, btw.
I checked this on GHC 8.2.2 and a semi-recent HEAD commit
(bb539cfe335eeec7989332b859b1f3162c5e105a).
--
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15333#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list