[Git][ghc/ghc][wip/T22965] 6 commits: Refresh profiling docs
Ben Gamari (@bgamari)
gitlab at gitlab.haskell.org
Mon Feb 13 19:51:29 UTC 2023
Ben Gamari pushed to branch wip/T22965 at Glasgow Haskell Compiler / GHC
Commits:
1e9eac1c by Matthew Pickering at 2023-02-13T11:36:41+01:00
Refresh profiling docs
I went through the whole of the profiling docs and tried to amend them
to reflect current best practices and tooling. In particular I removed
some old references to tools such as hp2any and replaced them with
references to eventlog2html.
- - - - -
da208b9a by Matthew Pickering at 2023-02-13T11:36:41+01:00
docs: Add section about profiling and foreign calls
Previously there was no documentation for how foreign calls interacted
with the profiler. This can be quite confusing for users so getting it
into the user guide is the first step to a potentially better solution.
See the ticket for more insightful discussion.
Fixes #21764
- - - - -
081640f1 by Bodigrim at 2023-02-13T12:51:52-05:00
Document that -fproc-alignment was introduced only in GHC 8.6
- - - - -
b585225b by Ben Gamari at 2023-02-13T14:51:20-05:00
rts: Statically assert alignment of Capability
In #22965 we noticed that changes in the size of `Capability` can result
in unsound behavior due to the `align` pragma claiming an alignment
which we don't in practice observe. Avoid this by statically asserting
that the size is a multiple of the alignment.
- - - - -
04336d2f by Ben Gamari at 2023-02-13T14:51:20-05:00
rts: Introduce stgMallocAlignedBytes
- - - - -
4af27fea by Ben Gamari at 2023-02-13T14:51:20-05:00
rts: Correctly align Capability allocations
Previously we failed to tell the C allocator that `Capability`s needed
to be aligned, resulting in #22965.
- - - - -
7 changed files:
- docs/users_guide/debugging.rst
- + docs/users_guide/images/eventlog_profile.png
- docs/users_guide/profiling.rst
- rts/Capability.c
- rts/Capability.h
- rts/RtsUtils.c
- rts/RtsUtils.h
Changes:
=====================================
docs/users_guide/debugging.rst
=====================================
@@ -1046,6 +1046,8 @@ Checking for consistency
:shortdesc: Align functions at given boundary.
:type: dynamic
+ :since: 8.6.1
+
Align functions to multiples of the given value. Only valid values are powers
of two.
=====================================
docs/users_guide/images/eventlog_profile.png
=====================================
Binary files /dev/null and b/docs/users_guide/images/eventlog_profile.png differ
=====================================
docs/users_guide/profiling.rst
=====================================
@@ -10,17 +10,13 @@ Profiling
GHC comes with a time and space profiling system, so that you can answer
questions like "why is my program so slow?", or "why is my program using
-so much memory?".
+so much memory?". We'll start by describing how to do time profiling.
-Profiling a program is a three-step process:
+Time profiling a program is a three-step process:
1. Re-compile your program for profiling with the :ghc-flag:`-prof` option, and
probably one of the options for adding automatic annotations:
- :ghc-flag:`-fprof-auto` is the most common [1]_.
-
- If you are using external packages with :command:`cabal`, you may need to
- reinstall these packages with profiling support; typically this is
- done with ``cabal install -p package --reinstall``.
+ :ghc-flag:`-fprof-late` is the recommended option.
2. Having compiled the program for profiling, you now need to run it to
generate the profile. For example, a simple time profile can be
@@ -37,6 +33,9 @@ Profiling a program is a three-step process:
3. Examine the generated profiling information, use the information to
optimise your program, and repeat as necessary.
+The time profiler measures the CPU time taken by the Haskell code in your application.
+In particular time taken by safe foreign calls is not tracked by the profiler (see :ref:`prof-foreign-calls`).
+
.. _cost-centres:
Cost centres and cost-centre stacks
@@ -197,7 +196,10 @@ Inserting cost centres by hand
Cost centres are just program annotations. When you say ``-fprof-auto``
to the compiler, it automatically inserts a cost centre annotation
around every binding not marked INLINE in your program, but you are
-entirely free to add cost centre annotations yourself.
+entirely free to add cost centre annotations yourself. Be careful adding too many
+cost-centre annotations as the optimiser is careful to not move them around or
+remove them, which can severly affect how your program is optimised and hence the
+runtime performance!
The syntax of a cost centre annotation for expressions is ::
@@ -311,6 +313,39 @@ and become CAFs. You will probably need to consult the Core
.. index::
single: -fprof-cafs
+.. _prof-foreign-calls:
+
+Profiling and foreign calls
+---------------------------
+
+Simply put, the profiler includes time spent in unsafe foreign
+calls but ignores time taken in safe foreign calls. For example, time spent blocked on IO
+operations (e.g. ``getLine``) is not accounted for in the profile as ``getLine`` is implemented
+using a safe foreign call.
+
+The profiler estimates CPU time, for Haskell threads within the program only.
+In particular, time "taken" by the program in blocking safe foreign calls
+is not accounted for in time profiles. The runtime has the notion of a virtual
+processor which is known as a "capability". Haskell threads are run on capabilities,
+and the profiler samples the capabilities in order to determine what is being
+executed at a certain time. When a safe foreign call is executed, it's run outside
+the context of a capability; hence the sampling does not account for the time
+taken. Whilst the safe call is executed, other
+Haskell threads are free to run on the capability, and their cost will be attributed
+to the profiler. When the safe call is finished, the blocked, descheduled thread can
+be resumed and rescheduled.
+
+However, the time taken by blocking on unsafe foreign calls is accounted for in the profile.
+This happens because unsafe foreign calls are executed by the same capability
+their calling Haskell thread is running on. Therefore, an unsafe foreign call will
+block the entire capability whilst it is running, and any time the capability is
+sampled the "cost" of the foreign call will be attributed to the calling cost-centre stack.
+
+However, do note that you are not supposed to use unsafe foreign calls for any
+operations which do block! Do not be tempted to replace your safe foreign calls
+with unsafe calls just so they appear in the profile. This prevents GC from
+happening until the foreign call returns, which can be catastrophic for performance.
+
.. _prof-compiler-options:
Compiler options for profiling
@@ -356,7 +391,9 @@ Automatically placing cost-centres
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GHC has a number of flags for automatically inserting cost-centres into the
-compiled program.
+compiled program. Use these options carefully because inserting too many cost-centres
+in the wrong places will mean the optimiser will be less effective and the runtime behaviour
+of your profiled program will be different to that of the unprofiled one.
.. ghc-flag:: -fprof-callers=⟨name⟩
:shortdesc: Auto-add ``SCC``\\ s to all call-sites of the named function.
@@ -618,8 +655,10 @@ enclosed between ``+RTS ... -RTS`` as usual):
JSON profile format
~~~~~~~~~~~~~~~~~~~
-When invoked with the :rts-flag:`-pj` flag the runtime will emit the cost-centre
-profile in a machine-readable JSON format. The top-level object of this format
+profile in a machine-readable JSON format. The JSON file can be directly loaded
+into `speedscope.app <https://www.speedscope.app/>`_ to interactively view the profile.
+
+The top-level object of this format
has the following properties,
``program`` (string)
@@ -758,8 +797,12 @@ For instance, a simple profile might look like this,
}
}
+Eventlog profile format
+~~~~~~~~~~~~~~~~~~~~~~~
-
+In addition to the ``.prof`` and ``.json`` formats the cost centre definitions
+and samples are also emitted to the :ref:`eventlog <rts-eventlog>`. The format
+of the events is specified in the :ref:`eventlog encodings <eventlog-encodings>` section.
.. _prof-heap:
@@ -774,18 +817,35 @@ program holds on to more memory at run-time that it needs to. Space
leaks lead to slower execution due to heavy garbage collector activity,
and may even cause the program to run out of memory altogether.
+Heap profiling differs from time profiling in the fact that is not always
+necessary to use the profiling runtime to generate a heap profile. There
+are two heap profiling modes (:rts-flag:`-hT` and :rts-flag:`-hi` [1]_) which are always
+available.
+
To generate a heap profile from your program:
-1. Compile the program for profiling (:ref:`prof-compiler-options`).
+1. Assuming you need the profiling runtime, compile the program for profiling (:ref:`prof-compiler-options`).
2. Run it with one of the heap profiling options described below (eg.
- :rts-flag:`-hc` for a basic producer profile). This generates the file
- :file:`{prog}.hp`.
+ :rts-flag:`-hc` for a basic producer profile) and enable the eventlog using :rts-flag:`-l <-l ⟨flags⟩>`.
- If the :ref:`event log <rts-eventlog>` is enabled (with the :rts-flag:`-l ⟨flags⟩`
- runtime system flag) heap samples will additionally be emitted to the GHC
+ Heap samples will be emitted to the GHC
event log (see :ref:`heap-profiler-events` for details about event format).
+3. Render the heap profile using `eventlog2html <https://hackage.haskell.org/package/eventlog2html>`_.
+ This produces an HTML file which contains the visualised profile.
+
+4. Open the rendered interactive profile in your web browser.
+
+For example, here is a heap profile produced of using eventlog profiling on GHC
+compiling the Cabal library. You can read a lot more about eventlog2html on the website.
+
+.. image:: images/eventlog_profile.*
+
+Note that there is the legacy :file:`{prog}.hp` format which has been deprecated
+in favour of eventlog based profiling. In order to render the legacy format, the
+steps are as follows.
+
3. Run :command:`hp2ps` to produce a Postscript file, :file:`{prog}.ps`. The
:command:`hp2ps` utility is described in detail in :ref:`hp2ps`.
@@ -797,10 +857,6 @@ from GHC's ``nofib`` benchmark suite,
.. image:: images/prof_scc.*
-You might also want to take a look at
-`hp2any <https://www.haskell.org/haskellwiki/Hp2any>`__, a more advanced
-suite of tools (not distributed with GHC) for displaying heap profiles.
-
Note that there might be a big difference between the OS reported memory usage
of your program and the amount of live data as reported by heap profiling.
The reasons for the difference are explained in :ref:`hints-os-memory`.
@@ -817,20 +873,14 @@ following RTS options select which break-down to use:
.. rts-flag:: -hT
- Breaks down the graph by heap closure type.
+ Breaks down the graph by heap closure type. This does not require the profiling
+ runtime.
.. rts-flag:: -hc
- -h
*Requires* :ghc-flag:`-prof`. Breaks down the graph by the cost-centre stack
which produced the data.
- .. note:: The meaning of the shortened :rts-flag:`-h` is dependent on whether
- your program was compiled for profiling. When compiled for profiling,
- :rts-flag:`-h` is equivalent to :rts-flag:`-hc`, but otherwise is
- equivalent to :rts-flag:`-hT` (see :ref:`rts-profiling`). The :rts-flag:`-h`
- is deprecated and will be removed in a future release.
-
.. rts-flag:: -hm
*Requires* :ghc-flag:`-prof`. Break down the live heap by the module
@@ -863,7 +913,7 @@ following RTS options select which break-down to use:
Break down the graph by the address of the info table of a closure. For this
to produce useful output the program must have been compiled with
- :ghc-flag:`-finfo-table-map`.
+ :ghc-flag:`-finfo-table-map` but it does not require the profiling runtime.
.. rts-flag:: -l
:noindex:
@@ -1041,6 +1091,14 @@ This trick isn't foolproof, because there might be other ``B`` closures in
the heap which aren't the retainers we are interested in, but we've
found this to be a useful technique in most cases.
+Precise Retainer Analysis
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to precisely answer questions about why a certain type of closure is
+retained then it is worthwhile using `ghc-debug <https://gitlab.haskell.org/ghc/ghc-debug>`_ which
+has a terminal interface which can be used to easily answer queries such as, what is retaining
+a certain closure.
+
.. _biography-prof:
Biographical Profiling
@@ -1120,6 +1178,9 @@ reasons for this:
allocated by foreign libraries, and data allocated by the RTS), and
``mmap()``\'d memory are not counted in the heap profile.
+For more discussion about understanding how understanding process residency see
+:ref:`hints-os-memory`.
+
.. _hp2ps:
``hp2ps`` -- Rendering heap profiles to PostScript
@@ -1242,123 +1303,6 @@ The flags are:
Print out usage information.
-.. _manipulating-hp:
-
-Manipulating the ``hp`` file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-(Notes kindly offered by Jan-Willem Maessen.)
-
-The ``FOO.hp`` file produced when you ask for the heap profile of a
-program ``FOO`` is a text file with a particularly simple structure.
-Here's a representative example, with much of the actual data omitted:
-
-.. code-block:: none
-
- JOB "FOO -hC"
- DATE "Thu Dec 26 18:17 2002"
- SAMPLE_UNIT "seconds"
- VALUE_UNIT "bytes"
- BEGIN_SAMPLE 0.00
- END_SAMPLE 0.00
- BEGIN_SAMPLE 15.07
- ... sample data ...
- END_SAMPLE 15.07
- BEGIN_SAMPLE 30.23
- ... sample data ...
- END_SAMPLE 30.23
- ... etc.
- BEGIN_SAMPLE 11695.47
- END_SAMPLE 11695.47
-
-The first four lines (``JOB``, ``DATE``, ``SAMPLE_UNIT``,
-``VALUE_UNIT``) form a header. Each block of lines starting with
-``BEGIN_SAMPLE`` and ending with ``END_SAMPLE`` forms a single sample
-(you can think of this as a vertical slice of your heap profile). The
-hp2ps utility should accept any input with a properly-formatted header
-followed by a series of *complete* samples.
-
-Zooming in on regions of your profile
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can look at particular regions of your profile simply by loading a
-copy of the ``.hp`` file into a text editor and deleting the unwanted
-samples. The resulting ``.hp`` file can be run through ``hp2ps`` and
-viewed or printed.
-
-Viewing the heap profile of a running program
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``.hp`` file is generated incrementally as your program runs. In
-principle, running :command:`hp2ps` on the incomplete file should produce a
-snapshot of your program's heap usage. However, the last sample in the
-file may be incomplete, causing :command:`hp2ps` to fail. If you are using a
-machine with UNIX utilities installed, it's not too hard to work around
-this problem (though the resulting command line looks rather Byzantine):
-
-.. code-block:: sh
-
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
-
-The command ``fgrep -n END_SAMPLE FOO.hp`` finds the end of every
-complete sample in ``FOO.hp``, and labels each sample with its ending
-line number. We then select the line number of the last complete sample
-using :command:`tail` and :command:`cut`. This is used as a parameter to :command:`head`; the
-result is as if we deleted the final incomplete sample from :file:`FOO.hp`.
-This results in a properly-formatted .hp file which we feed directly to
-:command:`hp2ps`.
-
-Viewing a heap profile in real time
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The :command:`gv` and :command:`ghostview` programs have a "watch file" option
-can be used to view an up-to-date heap profile of your program as it runs.
-Simply generate an incremental heap profile as described in the previous
-section. Run :command:`gv` on your profile:
-
-.. code-block:: sh
-
- gv -watch -orientation=seascape FOO.ps
-
-If you forget the ``-watch`` flag you can still select "Watch file" from
-the "State" menu. Now each time you generate a new profile ``FOO.ps``
-the view will update automatically.
-
-This can all be encapsulated in a little script:
-
-.. code-block:: sh
-
- #!/bin/sh
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
- gv -watch -orientation=seascape FOO.ps &
- while [ 1 ] ; do
- sleep 10 # We generate a new profile every 10 seconds.
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
- done
-
-Occasionally :command:`gv` will choke as it tries to read an incomplete copy of
-:file:`FOO.ps` (because :command:`hp2ps` is still running as an update occurs). A
-slightly more complicated script works around this problem, by using the
-fact that sending a SIGHUP to gv will cause it to re-read its input
-file:
-
-.. code-block:: sh
-
- #!/bin/sh
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
- gv FOO.ps &
- gvpsnum=$!
- while [ 1 ] ; do
- sleep 10
- head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
- | hp2ps > FOO.ps
- kill -HUP $gvpsnum
- done
-
.. _prof-threaded:
Profiling Parallel and Concurrent Programs
@@ -1968,10 +1912,9 @@ Notes about ticky profiling
in some columns. For this reason using an eventlog-based approach should be prefered if
possible.
-
.. [1]
- :ghc-flag:`-fprof-auto` was known as ``-auto-all`` prior to
- GHC 7.4.1.
+ :rts-flag:`-hi` profiling is avaible with the normal runtime but you will need to
+ compile with :ghc-flag:`-finfo-table-map` to interpret the results.
.. [2]
Note that this policy has changed slightly in GHC 7.4.1 relative to
=====================================
rts/Capability.c
=====================================
@@ -438,8 +438,9 @@ moreCapabilities (uint32_t from USED_IF_THREADS, uint32_t to USED_IF_THREADS)
{
for (uint32_t i = 0; i < to; i++) {
if (i >= from) {
- capabilities[i] = stgMallocBytes(sizeof(Capability),
- "moreCapabilities");
+ capabilities[i] = stgMallocAlignedBytes(sizeof(Capability),
+ CAPABILITY_ALIGNMENT,
+ "moreCapabilities");
initCapability(capabilities[i], i);
}
}
=====================================
rts/Capability.h
=====================================
@@ -28,6 +28,16 @@
#include "BeginPrivate.h"
+// We never want a Capability to overlap a cache line with
+// anything else, so round it up to a cache line size:
+#if defined(s390x_HOST_ARCH)
+#define CAPABILITY_ALIGNMENT 256
+#elif !defined(mingw32_HOST_OS)
+#define CAPABILITY_ALIGNMENT 64
+#else
+#define CAPABILITY_ALIGNMENT 1
+#endif
+
/* N.B. This must be consistent with CapabilityPublic in RtsAPI.h */
struct Capability_ {
// State required by the STG virtual machine when running Haskell
@@ -169,14 +179,12 @@ struct Capability_ {
StgTRecHeader *free_trec_headers;
uint32_t transaction_tokens;
} // typedef Capability is defined in RtsAPI.h
- // We never want a Capability to overlap a cache line with anything
- // else, so round it up to a cache line size:
-#if defined(s390x_HOST_ARCH)
- ATTRIBUTE_ALIGNED(256)
-#elif !defined(mingw32_HOST_OS)
- ATTRIBUTE_ALIGNED(64)
-#endif
- ;
+ ATTRIBUTE_ALIGNED(CAPABILITY_ALIGNMENT)
+;
+
+// We allocate arrays of Capabilities therefore we must ensure that the size is
+// a multiple of the claimed alignment
+GHC_STATIC_ASSERT(sizeof(struct Capability_) % CAPABILITY_ALIGNMENT == 0, "Capability size does not match cache size");
#if defined(THREADED_RTS)
#define ASSERT_TASK_ID(task) ASSERT(task->id == osThreadId())
=====================================
rts/RtsUtils.c
=====================================
@@ -56,10 +56,45 @@ extern char *ctime_r(const time_t *, char *);
void *
stgMallocBytes (size_t n, char *msg)
+{
+ void *space = malloc(n);
+
+ if (space == NULL) {
+ /* Quoting POSIX.1-2008 (which says more or less the same as ISO C99):
+ *
+ * "Upon successful completion with size not equal to 0, malloc() shall
+ * return a pointer to the allocated space. If size is 0, either a null
+ * pointer or a unique pointer that can be successfully passed to free()
+ * shall be returned. Otherwise, it shall return a null pointer and set
+ * errno to indicate the error."
+ *
+ * Consequently, a NULL pointer being returned by `malloc()` for a 0-size
+ * allocation is *not* to be considered an error.
+ */
+ if (n == 0) return NULL;
+
+ /* don't fflush(stdout); WORKAROUND bug in Linux glibc */
+ rtsConfig.mallocFailHook((W_) n, msg);
+ stg_exit(EXIT_INTERNAL_ERROR);
+ }
+ IF_DEBUG(zero_on_gc, memset(space, 0xbb, n));
+ return space;
+}
+
+void *
+stgMallocAlignedBytes (size_t n, size_t align, char *msg)
{
void *space;
- if ((space = malloc(n)) == NULL) {
+#if defined(mingw32_HOST_OS)
+ space = _aligned_malloc(n, align);
+#else
+ if (posix_memalign(&space, align, n)) {
+ space = NULL; // Allocation failed
+ }
+#endif
+
+ if (space == NULL) {
/* Quoting POSIX.1-2008 (which says more or less the same as ISO C99):
*
* "Upon successful completion with size not equal to 0, malloc() shall
=====================================
rts/RtsUtils.h
=====================================
@@ -29,6 +29,10 @@ void *stgMallocBytes(size_t n, char *msg)
* See: https://gitlab.haskell.org/ghc/ghc/-/issues/22380
*/
+void *stgMallocAlignedBytes(size_t n, size_t align, char *msg)
+ STG_MALLOC STG_MALLOC1(stgFree)
+ STG_ALLOC_SIZE1(1);
+
void *stgReallocBytes(void *p, size_t n, char *msg)
STG_MALLOC1(stgFree)
STG_ALLOC_SIZE1(2)
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/d5f765ce080f5f6877a32665a7b0a7c443805a84...4af27feabf482cf6b611951443e05ee7e53acb39
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/d5f765ce080f5f6877a32665a7b0a7c443805a84...4af27feabf482cf6b611951443e05ee7e53acb39
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230213/3b09684a/attachment-0001.html>
More information about the ghc-commits
mailing list