[Git][ghc/ghc][wip/profiling-docs-refresh] 2 commits: Refresh profiling docs

Matthew Pickering (@mpickering) gitlab at gitlab.haskell.org
Mon Jan 30 10:01:10 UTC 2023



Matthew Pickering pushed to branch wip/profiling-docs-refresh at Glasgow Haskell Compiler / GHC


Commits:
0defd10d by Matthew Pickering at 2023-01-27T14:54:12+00:00
Refresh profiling docs

I went through the whole of the profiling docs and tried to amend them
to reflect current best practices and tooling. In particular I removed
some old references to tools such as hp2any and replaced them with
references to eventlog2html.

- - - - -
e6580297 by Matthew Pickering at 2023-01-30T10:00:55+00:00
docs: Add section about profiling and foreign calls

Previously there was no documentation for how foreign calls interacted
with the profiler. This can be quite confusing for users so getting it
into the user guide is the first step to a potentially better solution.
See the ticket for more insightful discussion.

Fixes #21764

- - - - -


2 changed files:

- + docs/users_guide/images/eventlog_profile.png
- docs/users_guide/profiling.rst


Changes:

=====================================
docs/users_guide/images/eventlog_profile.png
=====================================
Binary files /dev/null and b/docs/users_guide/images/eventlog_profile.png differ


=====================================
docs/users_guide/profiling.rst
=====================================
@@ -10,17 +10,13 @@ Profiling
 
 GHC comes with a time and space profiling system, so that you can answer
 questions like "why is my program so slow?", or "why is my program using
-so much memory?".
+so much memory?". We'll start by describing how to do time profiling.
 
-Profiling a program is a three-step process:
+Time profiling a program is a three-step process:
 
 1. Re-compile your program for profiling with the :ghc-flag:`-prof` option, and
    probably one of the options for adding automatic annotations:
-   :ghc-flag:`-fprof-auto` is the most common [1]_.
-
-   If you are using external packages with :command:`cabal`, you may need to
-   reinstall these packages with profiling support; typically this is
-   done with ``cabal install -p package --reinstall``.
+   :ghc-flag:`-fprof-late` is the recommended option.
 
 2. Having compiled the program for profiling, you now need to run it to
    generate the profile. For example, a simple time profile can be
@@ -37,6 +33,9 @@ Profiling a program is a three-step process:
 3. Examine the generated profiling information, use the information to
    optimise your program, and repeat as necessary.
 
+The time profiler measures the CPU time taken by the Haskell code in your application.
+In particular time taken by some foreign calls is not tracked by the profiler (see :ref:`prof-foreign-calls`).
+
 .. _cost-centres:
 
 Cost centres and cost-centre stacks
@@ -197,7 +196,10 @@ Inserting cost centres by hand
 Cost centres are just program annotations. When you say ``-fprof-auto``
 to the compiler, it automatically inserts a cost centre annotation
 around every binding not marked INLINE in your program, but you are
-entirely free to add cost centre annotations yourself.
+entirely free to add cost centre annotations yourself. Be careful adding too many
+cost-centre annotations as the optimiser is careful to not move them around or
+remove them, which can severly affect how your program is optimised and hence the
+runtime performance!
 
 The syntax of a cost centre annotation for expressions is ::
 
@@ -311,6 +313,39 @@ and become CAFs. You will probably need to consult the Core
 .. index::
    single: -fprof-cafs
 
+.. _prof-foreign-calls:
+
+Profiling and foreign calls
+---------------------------
+
+Simply put, the profiler includes time spent in unsafe foreign
+calls but ignores time taken in safe foreign calls. For example, time spent blocked on IO
+operations (e.g. ``getLine``) is not accounted for in the profile as ``getLine`` is implemented
+using a safe foreign call.
+
+The profiler estimates CPU time and only CPU time for the Haskell portion of the
+program. In particular, time "taken" by the program by blocking foreign calls
+is not accounted for in time profiles. The runtime has the notion of a virtual
+processor which is known as a "capability". Haskell code is run on capabilities,
+and the profiler samples the capabilities in order to determine what is being
+executed at a certain time. When a safe foreign call is executed it's run outside
+the context of a capability and hence the sampling does not account for the time
+taken. Whilst the safe call is executed, other
+Haskell threads are free to run on the capability, and their cost will be attributed
+to the profiler. When the safe call is finished, the blocked, descheduled thread can
+be resumed and rescheduled.
+
+However, the time taken by blocking on unsafe foreign calls is accounted for in the profile.
+This can be explained because an unsafe foreign call is executed directly in the
+same capability as the Haskell code is being executed. Therefore, an unsafe foreign
+call will block the entire capability whilst it is running, and any time that
+capability is sampled then the "cost" of the foreign call will be attributed to the
+calling cost-centre stack. Note however, you are not supposed to use unsafe foreign
+calls for any operations which do block! Do not be tempted to replace your safe foreign
+calls with unsafe calls just so they appear in the profile.
+
+
+
 .. _prof-compiler-options:
 
 Compiler options for profiling
@@ -356,7 +391,9 @@ Automatically placing cost-centres
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 GHC has a number of flags for automatically inserting cost-centres into the
-compiled program.
+compiled program. Use these options carefully because inserting too many cost-centres
+in the wrong places will mean the optimiser will be less effective and the runtime behaviour
+of your profiled program will be different to that of the unprofiled one.
 
 .. ghc-flag:: -fprof-callers=⟨name⟩
     :shortdesc: Auto-add ``SCC``\\ s to all call-sites of the named function.
@@ -618,8 +655,10 @@ enclosed between ``+RTS ... -RTS`` as usual):
 JSON profile format
 ~~~~~~~~~~~~~~~~~~~
 
-When invoked with the :rts-flag:`-pj` flag the runtime will emit the cost-centre
-profile in a machine-readable JSON format. The top-level object of this format
+profile in a machine-readable JSON format. The JSON file can be directly loaded
+into `speedscope.app <https://www.speedscope.app/>`_ to interactively view the profile.
+
+The top-level object of this format
 has the following properties,
 
 ``program`` (string)
@@ -758,8 +797,12 @@ For instance, a simple profile might look like this,
       }
     }
 
+Eventlog profile format
+~~~~~~~~~~~~~~~~~~~~~~~
 
-
+In addition to the ``.prof`` and ``.json`` formats the cost centre definitions
+and samples are also emitted to the :ref:`eventlog <rts-eventlog>`. The format
+of the events is specified in the :ref:`eventlog encodings <eventlog-encodings>` section.
 
 
 .. _prof-heap:
@@ -774,18 +817,35 @@ program holds on to more memory at run-time that it needs to. Space
 leaks lead to slower execution due to heavy garbage collector activity,
 and may even cause the program to run out of memory altogether.
 
+Heap profiling differs from time profiling in the fact that is not always
+necessary to use the profiling runtime to generate a heap profile. There
+are two heap profiling modes (:rts-flag:`-hT` and :rts-flag:`-hi` [1]_) which are always
+available.
+
 To generate a heap profile from your program:
 
-1. Compile the program for profiling (:ref:`prof-compiler-options`).
+1. Assuming you need the profiling runtime, compile the program for profiling (:ref:`prof-compiler-options`).
 
 2. Run it with one of the heap profiling options described below (eg.
-   :rts-flag:`-hc` for a basic producer profile). This generates the file
-   :file:`{prog}.hp`.
+   :rts-flag:`-hc` for a basic producer profile) and enable the eventlog using :rts-flag:`-l <-l ⟨flags⟩>`.
 
-   If the :ref:`event log <rts-eventlog>` is enabled (with the :rts-flag:`-l ⟨flags⟩`
-   runtime system flag) heap samples will additionally be emitted to the GHC
+   Heap samples will be emitted to the GHC
    event log (see :ref:`heap-profiler-events` for details about event format).
 
+3. Render the heap profile using `eventlog2html <https://mpickering.github.io/eventlog2html/>`_.
+   This produces an HTML file which contains the visualised profile.
+
+4. Open the rendered interactive profile in your web browser.
+
+For example, here is a heap profile produced of using eventlog profiling on GHC
+compiling the Cabal library. You can read a lot more about eventlog2html on the website.
+
+.. image:: images/eventlog_profile.*
+
+Note that there is the legacy :file:`{prog}.hp` format which has been deprecated
+in favour of eventlog based profiling. In order to render the legacy format, the
+steps are as follows.
+
 3. Run :command:`hp2ps` to produce a Postscript file, :file:`{prog}.ps`. The
    :command:`hp2ps` utility is described in detail in :ref:`hp2ps`.
 
@@ -797,10 +857,6 @@ from GHC's ``nofib`` benchmark suite,
 
 .. image:: images/prof_scc.*
 
-You might also want to take a look at
-`hp2any <https://www.haskell.org/haskellwiki/Hp2any>`__, a more advanced
-suite of tools (not distributed with GHC) for displaying heap profiles.
-
 Note that there might be a big difference between the OS reported memory usage
 of your program and the amount of live data as reported by heap profiling.
 The reasons for the difference are explained in :ref:`hints-os-memory`.
@@ -817,20 +873,14 @@ following RTS options select which break-down to use:
 
 .. rts-flag:: -hT
 
-    Breaks down the graph by heap closure type.
+    Breaks down the graph by heap closure type. This does not require the profiling
+    runtime.
 
 .. rts-flag:: -hc
-              -h
 
     *Requires* :ghc-flag:`-prof`. Breaks down the graph by the cost-centre stack
     which produced the data.
 
-    .. note:: The meaning of the shortened :rts-flag:`-h` is dependent on whether
-              your program was compiled for profiling. When compiled for profiling,
-              :rts-flag:`-h` is equivalent to :rts-flag:`-hc`, but otherwise is
-              equivalent to :rts-flag:`-hT` (see :ref:`rts-profiling`). The :rts-flag:`-h`
-              is deprecated and will be removed in a future release.
-
 .. rts-flag:: -hm
 
     *Requires* :ghc-flag:`-prof`. Break down the live heap by the module
@@ -863,7 +913,7 @@ following RTS options select which break-down to use:
 
     Break down the graph by the address of the info table of a closure. For this
     to produce useful output the program must have been compiled with
-    :ghc-flag:`-finfo-table-map`.
+    :ghc-flag:`-finfo-table-map` but it does not require the profiling runtime.
 
 .. rts-flag:: -l
     :noindex:
@@ -1041,6 +1091,14 @@ This trick isn't foolproof, because there might be other ``B`` closures in
 the heap which aren't the retainers we are interested in, but we've
 found this to be a useful technique in most cases.
 
+Precise Retainer Analysis
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to precisely answer questions about why a certain type of closure is
+retained then it is worthwhile using `ghc-debug <https://gitlab.haskell.org/ghc/ghc-debug>`_ which
+has a terminal interface which can be used to easily answer queries such as, what is retaining
+a certain closure.
+
 .. _biography-prof:
 
 Biographical Profiling
@@ -1120,6 +1178,9 @@ reasons for this:
    allocated by foreign libraries, and data allocated by the RTS), and
    ``mmap()``\'d memory are not counted in the heap profile.
 
+For more discussion about understanding how understanding process residency see
+:ref:`hints-os-memory`.
+
 .. _hp2ps:
 
 ``hp2ps`` -- Rendering heap profiles to PostScript
@@ -1242,123 +1303,6 @@ The flags are:
 
     Print out usage information.
 
-.. _manipulating-hp:
-
-Manipulating the ``hp`` file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-(Notes kindly offered by Jan-Willem Maessen.)
-
-The ``FOO.hp`` file produced when you ask for the heap profile of a
-program ``FOO`` is a text file with a particularly simple structure.
-Here's a representative example, with much of the actual data omitted:
-
-.. code-block:: none
-
-    JOB "FOO -hC"
-    DATE "Thu Dec 26 18:17 2002"
-    SAMPLE_UNIT "seconds"
-    VALUE_UNIT "bytes"
-    BEGIN_SAMPLE 0.00
-    END_SAMPLE 0.00
-    BEGIN_SAMPLE 15.07
-      ... sample data ...
-    END_SAMPLE 15.07
-    BEGIN_SAMPLE 30.23
-      ... sample data ...
-    END_SAMPLE 30.23
-    ... etc.
-    BEGIN_SAMPLE 11695.47
-    END_SAMPLE 11695.47
-
-The first four lines (``JOB``, ``DATE``, ``SAMPLE_UNIT``,
-``VALUE_UNIT``) form a header. Each block of lines starting with
-``BEGIN_SAMPLE`` and ending with ``END_SAMPLE`` forms a single sample
-(you can think of this as a vertical slice of your heap profile). The
-hp2ps utility should accept any input with a properly-formatted header
-followed by a series of *complete* samples.
-
-Zooming in on regions of your profile
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can look at particular regions of your profile simply by loading a
-copy of the ``.hp`` file into a text editor and deleting the unwanted
-samples. The resulting ``.hp`` file can be run through ``hp2ps`` and
-viewed or printed.
-
-Viewing the heap profile of a running program
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``.hp`` file is generated incrementally as your program runs. In
-principle, running :command:`hp2ps` on the incomplete file should produce a
-snapshot of your program's heap usage. However, the last sample in the
-file may be incomplete, causing :command:`hp2ps` to fail. If you are using a
-machine with UNIX utilities installed, it's not too hard to work around
-this problem (though the resulting command line looks rather Byzantine):
-
-.. code-block:: sh
-
-    head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-
-The command ``fgrep -n END_SAMPLE FOO.hp`` finds the end of every
-complete sample in ``FOO.hp``, and labels each sample with its ending
-line number. We then select the line number of the last complete sample
-using :command:`tail` and :command:`cut`. This is used as a parameter to :command:`head`; the
-result is as if we deleted the final incomplete sample from :file:`FOO.hp`.
-This results in a properly-formatted .hp file which we feed directly to
-:command:`hp2ps`.
-
-Viewing a heap profile in real time
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The :command:`gv` and :command:`ghostview` programs have a "watch file" option
-can be used to view an up-to-date heap profile of your program as it runs.
-Simply generate an incremental heap profile as described in the previous
-section. Run :command:`gv` on your profile:
-
-.. code-block:: sh
-
-      gv -watch -orientation=seascape FOO.ps
-
-If you forget the ``-watch`` flag you can still select "Watch file" from
-the "State" menu. Now each time you generate a new profile ``FOO.ps``
-the view will update automatically.
-
-This can all be encapsulated in a little script:
-
-.. code-block:: sh
-
-      #!/bin/sh
-      head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-      gv -watch -orientation=seascape FOO.ps &
-      while [ 1 ] ; do
-        sleep 10 # We generate a new profile every 10 seconds.
-        head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-          | hp2ps > FOO.ps
-      done
-
-Occasionally :command:`gv` will choke as it tries to read an incomplete copy of
-:file:`FOO.ps` (because :command:`hp2ps` is still running as an update occurs). A
-slightly more complicated script works around this problem, by using the
-fact that sending a SIGHUP to gv will cause it to re-read its input
-file:
-
-.. code-block:: sh
-
-      #!/bin/sh
-      head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-      gv FOO.ps &
-      gvpsnum=$!
-      while [ 1 ] ; do
-        sleep 10
-        head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-          | hp2ps > FOO.ps
-        kill -HUP $gvpsnum
-      done
-
 .. _prof-threaded:
 
 Profiling Parallel and Concurrent Programs
@@ -1968,10 +1912,9 @@ Notes about ticky profiling
   in some columns. For this reason using an eventlog-based approach should be prefered if
   possible.
 
-
 .. [1]
-   :ghc-flag:`-fprof-auto` was known as ``-auto-all`` prior to
-   GHC 7.4.1.
+   :rts-flag:`-hi` profiling is avaible with the normal runtime but you will need to
+   compile with :ghc-flag:`-finfo-table-map` to interpret the results.
 
 .. [2]
    Note that this policy has changed slightly in GHC 7.4.1 relative to



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/278911c00f133bcfc8c0edb4a71aa6ec4404dd59...e65802971a399f0fee9a9dd79c89483dcb57f699

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/278911c00f133bcfc8c0edb4a71aa6ec4404dd59...e65802971a399f0fee9a9dd79c89483dcb57f699
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230130/a9eab823/attachment-0001.html>


More information about the ghc-commits mailing list