[GHC] #15999: Stabilise nofib runtime measurements
GHC
ghc-devs at haskell.org
Thu Dec 13 05:26:46 UTC 2018
#15999: Stabilise nofib runtime measurements
-------------------------------------+-------------------------------------
Reporter: sgraf | Owner: (none)
Type: task | Status: new
Priority: normal | Milestone: ⊥
Component: NoFib benchmark suite | Version: 8.6.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: #5793 #9476 #15333 #15357 | Differential Rev(s): Phab:D5438
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by osa1):
* cc: osa1 (added)
* differential: => Phab:D5438
Comment:
Thanks for doing this!
I think running the benchmarks multiple times is a good idea. That's what
`criterion` does, and it provides quite reliable results, even for very fast
programs.
That said, looking at the patch and your `paraffins` example, I have some
questions:
- I wonder if it'd be better to run the process multiple times, instead of
  running the `main` function multiple times in the program. Why? That way we
  know GHC won't fuse or otherwise optimize the `replicateM_ 100` call in the
  program, and we properly reset all the resources/global state (both the
  program's and the runtime system's, e.g. weak pointers, threads, stable
  names). It just seems more reliable.
  - Of course this would make the analysis harder, as each run will print GC
    stats which we need to parse and somehow combine ...
  - That said, I wonder if GC numbers are important for the purposes of nofib.
    In nofib we care about allocations and runtimes; as long as these numbers
    are stable it should be fine. So perhaps it's not too hard to repeat the
    process run instead of the `main` function.
- You say "GC wibbles", but I'm not sure if these are actually GC wibbles. I
  just checked paraffins: it doesn't do any IO (other than printing the
  results), and it's not even threaded (it does not use the threaded runtime
  and does not do `forkIO`). So I think it should be quite deterministic, and
  I think any wibbles are due to the OS side of things. In other words, if we
  could have an OS that only runs `paraffins` and nothing else, I think the
  results would be quite deterministic.
  Of course this doesn't change the fact that we're getting non-deterministic
  results and we should do something about it; I'm just trying to understand
  the root cause here.
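To make the "parse and somehow combine" step above a bit more concrete, here
is a minimal sketch. It assumes each run's stats have already been parsed into
name/value pairs; the `RunStats` type, the `combine` and `spread` helpers, and
the stat names in the sample data are all hypothetical and not part of nofib.
The idea is only to report, per stat, how far the values move between runs.

```haskell
import Data.List (nub)
import Data.Maybe (mapMaybe)

-- | Stats of one process run as name/value pairs (parsing is assumed done).
type RunStats = [(String, Double)]

-- | For every stat name seen in any run, its minimum and maximum across runs.
combine :: [RunStats] -> [(String, (Double, Double))]
combine runs =
  [ (key, (minimum vals, maximum vals))
  | key <- nub (concatMap (map fst) runs)
  , let vals = mapMaybe (lookup key) runs
  ]

-- | Relative spread (max - min) / max; 0 means the stat did not move at all.
spread :: (Double, Double) -> Double
spread (lo, hi)
  | hi == 0 = 0
  | otherwise = (hi - lo) / hi

main :: IO ()
main = mapM_ report (combine sampleRuns)
  where
    report (key, range) =
      putStrLn (key ++ ": " ++ show range ++ ", spread " ++ show (spread range))
    -- Made-up numbers for illustration only, not measurements.
    sampleRuns =
      [ [("bytes allocated", 1000), ("elapsed seconds", 0.52)]
      , [("bytes allocated", 1000), ("elapsed seconds", 0.55)]
      ]
```

A stat whose spread stays at 0 across runs (as one would hope for
allocations) needs no further treatment; only the noisy ones need a summary
such as the minimum or median.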
On my first point: if there is a solution for benchmarking "processes"
(instead of "functions") using criterion-style iteration (by which I mean
"provides stable results"), I think it may be worth trying. A few years back
we used `hsbencher` for this purpose at IU, but IIRC it's a bit too heavy
(lots of dependencies), and it seems unmaintained now. I vaguely recall
another program for this purpose but I can't remember the name...
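As a rough illustration of repeating the process run rather than `main`, the
sketch below starts a fresh process for every sample and summarises the
wall-clock times. It is not what `criterion` or `hsbencher` do internally; the
`timeOnce` and `median` helpers and the command-line shape are assumptions
made for this example. Note that the measured time includes process startup,
so very short benchmarks would be dominated by it.

```haskell
import Control.Monad (replicateM)
import Data.List (sort)
import GHC.Clock (getMonotonicTime)
import System.Environment (getArgs)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Wall-clock seconds for one fresh process run of the given executable.
timeOnce :: FilePath -> [String] -> IO Double
timeOnce exe args = do
  start <- getMonotonicTime
  (code, _out, err) <- readProcessWithExitCode exe args ""
  end <- getMonotonicTime
  case code of
    ExitSuccess -> pure (end - start)
    ExitFailure n ->
      ioError (userError (exe ++ " exited with " ++ show n ++ ": " ++ err))

-- | Median of a non-empty sample; less sensitive to outliers than the mean.
median :: [Double] -> Double
median xs
  | null xs = error "median: empty sample"
  | odd n = sorted !! mid
  | otherwise = (sorted !! (mid - 1) + sorted !! mid) / 2
  where
    sorted = sort xs
    n = length xs
    mid = n `div` 2

main :: IO ()
main = do
  args <- getArgs
  case args of
    (runs : exe : exeArgs) -> do
      -- Each sample is a separate process, so all RTS state starts fresh.
      times <- replicateM (read runs) (timeOnce exe exeArgs)
      putStrLn ("runs:   " ++ show (length times))
      putStrLn ("min:    " ++ show (minimum times))
      putStrLn ("median: " ++ show (median times))
      putStrLn ("max:    " ++ show (maximum times))
    _ -> putStrLn "usage: repeat-run RUNS EXE [ARGS...]"
```

Compiled under the hypothetical name `repeat-run`, it would be invoked as
`repeat-run 20 ./paraffins`, followed by whatever arguments the benchmark
expects.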
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15999#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler