[ANNOUNCE] GHC 8.2.1 release candidate 1

Tue Apr 11 21:39:57 UTC 2017

> However, I do see there might be room for a project on the statistical
> profiler itself or its associated tooling. We just need to come to a
> conclusion on which direction is most appropriate for GHC.

You mean, given a choice between somehow reusing perf and going down
the route of fully custom tooling?

> For this having some concrete use-cases would be quite helpful. How do
> you envision using statistical profiling on Haskell projects? What is
> the minimal set of features that would make for a useful profiler?

That sounds like a good way to approach this. Here goes...

I'd really prefer to treat a Haskell program as a black box that I can
profile using the same tools as C programs or native code generated
from any other language. It shouldn't matter that the source is
Haskell. In my ideal workflow, I have a *vanilla* Haskell program
compiled with debug symbols by a *vanilla* GHC (no special ./configure
options as prerequisites) that I can hook perf up to, e.g.

$ perf record -g ./mybinary

Then I should be able to analyze the results with perf report, or
indeed feed them into existing pipelines to obtain other
visualizations (flame graphs, etc.).

I'm not particularly interested in integration with the event log,
though others might have a need for that.

I'm also interested in hotspot analysis, à la perf annotate.

As Brendan Gregg says, "perf isn't some random tool: it's part of the
Linux kernel, and is actively developed and enhanced."

I need accurate and informative stack samples (no STG internal details
in the output that I can't connect back to source locations) for
programs that include all manner of FFI calls. Better still if time
spent in the GC doesn't pollute my stack samples. That would be even
better.

The tricky part is that for flame graphs you need to sample stacks,
and to do that you need to teach perf how to collect them, since the
C stack isn't used for Haskell activation frames and, in any case, we
have a call-by-need evaluation strategy. But the slow option you
mention on the status page sounds okay to me, and using eBPF to
perform stack sampling entirely in the kernel looks like a promising
direction.