[Git][ghc/ghc][wip/js-rts-fixmes] 8 commits: Refresh profiling docs

Josh Meredith (@JoshMeredith) gitlab at gitlab.haskell.org
Tue Feb 14 04:54:26 UTC 2023



Josh Meredith pushed to branch wip/js-rts-fixmes at Glasgow Haskell Compiler / GHC


Commits:
1e9eac1c by Matthew Pickering at 2023-02-13T11:36:41+01:00
Refresh profiling docs

I went through the whole of the profiling docs and tried to amend them
to reflect current best practices and tooling. In particular I removed
some old references to tools such as hp2any and replaced them with
references to eventlog2html.

- - - - -
da208b9a by Matthew Pickering at 2023-02-13T11:36:41+01:00
docs: Add section about profiling and foreign calls

Previously there was no documentation for how foreign calls interacted
with the profiler. This can be quite confusing for users so getting it
into the user guide is the first step to a potentially better solution.
See the ticket for more insightful discussion.

Fixes #21764

- - - - -
081640f1 by Bodigrim at 2023-02-13T12:51:52-05:00
Document that -fproc-alignment was introduced only in GHC 8.6

- - - - -
2e2a7f5e by Josh Meredith at 2023-02-14T04:54:01+00:00
Factor JS Rts generation for h$c{_,0,1,2} into h$c{n}

- - - - -
a2319f3c by Josh Meredith at 2023-02-14T04:54:01+00:00
Cache names used commonly in JS backend RTS generation

- - - - -
0ea1d661 by Sylvain Henry at 2023-02-14T04:54:01+00:00
Apply 1 suggestion(s) to 1 file(s)

- - - - -
3c73b0d1 by Josh Meredith at 2023-02-14T04:54:01+00:00
JS/Make: reduce cache sizes

- - - - -
c33a70fc by Josh Meredith at 2023-02-14T04:54:01+00:00
JS RTS: use jsClosureCount for closureConstructors and cache sizes

- - - - -


7 changed files:

- compiler/GHC/JS/Make.hs
- compiler/GHC/StgToJS/DataCon.hs
- compiler/GHC/StgToJS/Expr.hs
- compiler/GHC/StgToJS/Rts/Rts.hs
- docs/users_guide/debugging.rst
- + docs/users_guide/images/eventlog_profile.png
- docs/users_guide/profiling.rst


Changes:

=====================================
compiler/GHC/JS/Make.hs
=====================================
@@ -129,7 +129,11 @@ module GHC.JS.Make
   -- * Miscellaneous
   -- $misc
   , allocData, allocClsA
-  , dataFieldName, dataFieldNames
+  , dataName
+  , clsName
+  , dataFieldName
+  , varName
+  , jsClosureCount
   )
 where
 
@@ -142,10 +146,8 @@ import Control.Arrow ((***))
 import Data.Array
 import qualified Data.Map as M
 
-import GHC.Utils.Outputable (Outputable (..))
 import GHC.Data.FastString
 import GHC.Utils.Monad.State.Strict
-import GHC.Utils.Panic
 import GHC.Utils.Misc
 import GHC.Types.Unique.Map
 
@@ -642,30 +644,48 @@ dataFieldCache :: Array Int FastString
 dataFieldCache = listArray (0,nFieldCache) (map (mkFastString . ('d':) . show) [(0::Int)..nFieldCache])
 
 nFieldCache :: Int
-nFieldCache  = 16384
+nFieldCache  = 255
+
+jsClosureCount :: Int
+jsClosureCount  = 24
 
 dataFieldName :: Int -> FastString
 dataFieldName i
-  | i < 1 || i > nFieldCache = panic "dataFieldName" (ppr i)
+  | i < 0 || i > nFieldCache = mkFastString ('d' : show i)
   | otherwise                = dataFieldCache ! i
 
-dataFieldNames :: [FastString]
-dataFieldNames = fmap dataFieldName [1..nFieldCache]
-
-
 -- | Cache "h$dXXX" names
 dataCache :: Array Int FastString
-dataCache = listArray (0,1024) (map (mkFastString . ("h$d"++) . show) [(0::Int)..1024])
+dataCache = listArray (0,jsClosureCount) (map (mkFastString . ("h$d"++) . show) [(0::Int)..jsClosureCount])
+
+dataName :: Int -> FastString
+dataName i
+  | i < 0 || i > jsClosureCount = mkFastString ("h$d" ++ show i)
+  | otherwise                   = dataCache ! i
 
 allocData :: Int -> JExpr
-allocData i = toJExpr (TxtI (dataCache ! i))
+allocData i = toJExpr (TxtI (dataName i))
 
 -- | Cache "h$cXXX" names
 clsCache :: Array Int FastString
-clsCache = listArray (0,1024) (map (mkFastString . ("h$c"++) . show) [(0::Int)..1024])
+clsCache = listArray (0,jsClosureCount) (map (mkFastString . ("h$c"++) . show) [(0::Int)..jsClosureCount])
+
+clsName :: Int -> FastString
+clsName i
+  | i < 0 || i > jsClosureCount = mkFastString ("h$c" ++ show i)
+  | otherwise                   = clsCache ! i
 
 allocClsA :: Int -> JExpr
-allocClsA i = toJExpr (TxtI (clsCache ! i))
+allocClsA i = toJExpr (TxtI (clsName i))
+
+-- | Cache "xXXX" names
+varCache :: Array Int Ident
+varCache = listArray (0,jsClosureCount) (map (TxtI . mkFastString . ('x':) . show) [(0::Int)..jsClosureCount])
+
+varName :: Int -> Ident
+varName i
+  | i < 0 || i > jsClosureCount = TxtI $ mkFastString ('x' : show i)
+  | otherwise                   = varCache ! i
 
 
 --------------------------------------------------------------------------------

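The bounded name cache with an on-demand fallback that this file now uses can be sketched in isolation as follows. This is only an illustration under stated assumptions: plain `String` stands in for `FastString`, and `cacheSize`, `mkName`, `nameCache` and `cachedName` are hypothetical names, not the ones in `GHC.JS.Make`.

```haskell
import Data.Array (Array, listArray, (!))

-- Hypothetical stand-in for jsClosureCount (24 in the diff above).
cacheSize :: Int
cacheSize = 24

-- Build a single name; String is an assumption, GHC uses FastString.
mkName :: Int -> String
mkName i = "h$c" ++ show i

-- Precomputed names for indices 0 .. cacheSize.
nameCache :: Array Int String
nameCache = listArray (0, cacheSize) (map mkName [0 .. cacheSize])

-- Use the cache when the index is in range; otherwise build the
-- name on demand rather than panicking or indexing out of bounds.
cachedName :: Int -> String
cachedName i
  | i < 0 || i > cacheSize = mkName i
  | otherwise              = nameCache ! i
```

The guard has to use the same bound as the array it protects; otherwise an index between the two bounds would hit an out-of-range array lookup.
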

=====================================
compiler/GHC/StgToJS/DataCon.hs
=====================================
@@ -97,7 +97,7 @@ allocDynamicE :: Bool          -- ^ csInlineAlloc from StgToJSConfig
               -> Maybe JExpr
               -> JExpr
 allocDynamicE  inline_alloc entry free cc
-  | inline_alloc || length free > 24 = newClosure $ Closure
+  | inline_alloc || length free > jsClosureCount = newClosure $ Closure
       { clEntry  = entry
       , clField1 = fillObj1
       , clField2 = fillObj2


=====================================
compiler/GHC/StgToJS/Expr.hs
=====================================
@@ -1006,7 +1006,7 @@ allocDynAll haveDecl middle cls = do
                         ]
             (ex:es)  -> mconcat
                         [ toJExpr i .^ closureField1_ |= toJExpr ex
-                        , toJExpr i .^ closureField2_ |= toJExpr (jhFromList (zip dataFieldNames es))
+                        , toJExpr i .^ closureField2_ |= toJExpr (jhFromList (zip (map dataFieldName [1..]) es))
                         ]
       | otherwise = case es of
             []      -> mempty


=====================================
compiler/GHC/StgToJS/Rts/Rts.hs
=====================================
@@ -81,36 +81,8 @@ resetResultVar r = toJExpr r |= null_
 -- JIT can optimize better.
 closureConstructors :: StgToJSConfig -> JStat
 closureConstructors s = BlockStat
-  [ declClsConstr "h$c" ["f"] $ Closure
-      { clEntry  = var "f"
-      , clField1 = null_
-      , clField2 = null_
-      , clMeta   = 0
-      , clCC     = ccVal
-      }
-  , declClsConstr "h$c0" ["f"] $ Closure
-      { clEntry  = var "f"
-      , clField1 = null_
-      , clField2 = null_
-      , clMeta   = 0
-      , clCC     = ccVal
-      }
-  , declClsConstr "h$c1" ["f", "x1"] $ Closure
-      { clEntry  = var "f"
-      , clField1 = var "x1"
-      , clField2 = null_
-      , clMeta   = 0
-      , clCC     = ccVal
-      }
-  , declClsConstr "h$c2" ["f", "x1", "x2"] $ Closure
-      { clEntry  = var "f"
-      , clField1 = var "x1"
-      , clField2 = var "x2"
-      , clMeta   = 0
-      , clCC     = ccVal
-      }
-  , mconcat (map mkClosureCon [3..24])
-  , mconcat (map mkDataFill [1..24])
+  [ mconcat (map mkClosureCon (Nothing : map Just [0..jsClosureCount]))
+  , mconcat (map mkDataFill [1..jsClosureCount])
   ]
   where
     prof = csProf s
@@ -118,19 +90,8 @@ closureConstructors s = BlockStat
       -- the cc argument happens to be named just like the cc field...
       | prof      = ([TxtI closureCC_], Just (var closureCC_))
       | otherwise = ([], Nothing)
-    addCCArg as = map TxtI as ++ ccArg
     addCCArg' as = as ++ ccArg
 
-    declClsConstr i as cl = TxtI i ||= ValExpr (JFunc (addCCArg as)
-      ( jVar $ \x ->
-          [ checkC
-          , x |= newClosure cl
-          , notifyAlloc x
-          , traceAlloc x
-          , returnS x
-          ]
-         ))
-
     traceAlloc x | csTraceRts s = appS "h$traceAlloc" [x]
                  | otherwise    = mempty
 
@@ -172,26 +133,36 @@ closureConstructors s = BlockStat
 
            | otherwise = mempty
 
-    mkClosureCon :: Int -> JStat
-    mkClosureCon n = funName ||= toJExpr fun
+    mkClosureCon :: Maybe Int -> JStat
+    mkClosureCon n0 = funName ||= toJExpr fun
       where
-        funName = TxtI $ mkFastString ("h$c" ++ show n)
+        n | Just n' <- n0 = n'
+          | Nothing <- n0 = 0
+        funName | Just n' <- n0 = TxtI $ clsName n'
+                | Nothing <- n0 = TxtI $ mkFastString "h$c"
         -- args are: f x1 x2 .. xn [cc]
-        args   = TxtI "f" : addCCArg' (map (TxtI . mkFastString . ('x':) . show) [(1::Int)..n])
+        args   = TxtI "f" : addCCArg' (map varName [1..n])
         fun    = JFunc args funBod
         -- x1 goes into closureField1. All the other args are bundled into an
         -- object in closureField2: { d1 = x2, d2 = x3, ... }
         --
-        extra_args = ValExpr . JHash . listToUniqMap $ zip
-                   (map (mkFastString . ('d':) . show) [(1::Int)..])
-                   (map (toJExpr . TxtI . mkFastString . ('x':) . show) [2..n])
+        vars   = map (toJExpr . varName) [1..n]
+
+        x1     = case vars of
+                   []  -> null_
+                   x:_ -> x
+        x2     = case vars of
+                   []     -> null_
+                   [_]    -> null_
+                   [_,x]  -> x
+                   _:x:xs -> ValExpr . JHash . listToUniqMap $ zip (map dataFieldName [1..]) (x:xs)
 
         funBod = jVar $ \x ->
             [ checkC
             , x |= newClosure Closure
                { clEntry  = var "f"
-               , clField1 = var "x1"
-               , clField2 = extra_args
+               , clField1 = x1
+               , clField2 = x2
                , clMeta   = 0
                , clCC     = ccVal
                }
@@ -203,8 +174,8 @@ closureConstructors s = BlockStat
     mkDataFill :: Int -> JStat
     mkDataFill n = funName ||= toJExpr fun
       where
-        funName    = TxtI $ mkFastString ("h$d" ++ show n)
-        ds         = map (mkFastString . ('d':) . show) [(1::Int)..n]
+        funName    = TxtI $ dataName n
+        ds         = map dataFieldName [1..n]
         extra_args = ValExpr . JHash . listToUniqMap . zip ds $ map (toJExpr . TxtI) ds
         fun        = JFunc (map TxtI ds) (checkD <> returnS extra_args)
 
@@ -215,7 +186,7 @@ stackManip = mconcat (map mkPush [1..32]) <>
   where
     mkPush :: Int -> JStat
     mkPush n = let funName = TxtI $ mkFastString ("h$p" ++ show n)
-                   as      = map (TxtI . mkFastString . ('x':) . show) [1..n]
+                   as      = map varName [1..n]
                    fun     = JFunc as ((sp |= sp + toJExpr n)
                                        <> mconcat (zipWith (\i a -> stack .! (sp - toJExpr (n-i)) |= toJExpr a)
                                                    [1..] as))
@@ -228,7 +199,7 @@ stackManip = mconcat (map mkPush [1..32]) <>
                       bits    = bitsIdx sig
                       n       = length bits
                       h       = last bits
-                      args    = map (TxtI . mkFastString . ('x':) . show) [1..n]
+                      args    = map varName [1..n]
                       fun     = JFunc args $
                         mconcat [ sp |= sp + toJExpr (h+1)
                                 , mconcat (zipWith (\b a -> stack .! (sp - toJExpr (h-b)) |= toJExpr a) bits args)
@@ -288,7 +259,7 @@ loadRegs :: JStat
 loadRegs = mconcat $ map mkLoad [1..32]
   where
     mkLoad :: Int -> JStat
-    mkLoad n = let args   = map (TxtI . mkFastString . ("x"++) . show) [1..n]
+    mkLoad n = let args   = map varName [1..n]
                    assign = zipWith (\a r -> toJExpr r |= toJExpr a)
                               args (reverse $ take n regsFromR1)
                    fname  = TxtI $ mkFastString ("h$l" ++ show n)

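As a rough illustration of how the generalised `mkClosureCon` distributes its arguments over the two closure fields, here is a standalone sketch. The `Expr` type and `closureFields` are hypothetical miniatures written for this note, not GHC's `JExpr` or any function in `GHC.StgToJS.Rts.Rts`.

```haskell
import qualified Data.Map as M

-- Hypothetical miniature of a JS expression; not GHC's JExpr.
data Expr
  = Null
  | Var String
  | Obj (M.Map String Expr)
  deriving (Eq, Show)

-- Split the n constructor arguments x1..xn into the two closure fields:
-- field1 holds x1 (or null); field2 holds null, x2 itself, or an object
-- { d1: x2, d2: x3, ... } when there are three or more arguments.
closureFields :: Int -> (Expr, Expr)
closureFields n = (field1, field2)
  where
    vars = [Var ('x' : show i) | i <- [1 .. n]]
    field1 = case vars of
      []      -> Null
      (x : _) -> x
    field2 = case vars of
      []     -> Null
      [_]    -> Null
      [_, x] -> x
      _ : xs -> Obj (M.fromList (zip dataFields xs))
    -- Infinite supply of field names d1, d2, ...; zip truncates it.
    dataFields = ['d' : show i | i <- [1 :: Int ..]]
```

For example, `closureFields 2` yields `(Var "x1", Var "x2")`, while `closureFields 3` puts `x2` and `x3` into an object under the keys `d1` and `d2`.
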

=====================================
docs/users_guide/debugging.rst
=====================================
@@ -1046,6 +1046,8 @@ Checking for consistency
     :shortdesc: Align functions at given boundary.
     :type: dynamic
 
+    :since: 8.6.1
+
     Align functions to multiples of the given value. Only valid values are powers
     of two.
 


=====================================
docs/users_guide/images/eventlog_profile.png
=====================================
Binary files /dev/null and b/docs/users_guide/images/eventlog_profile.png differ


=====================================
docs/users_guide/profiling.rst
=====================================
@@ -10,17 +10,13 @@ Profiling
 
 GHC comes with a time and space profiling system, so that you can answer
 questions like "why is my program so slow?", or "why is my program using
-so much memory?".
+so much memory?". We'll start by describing how to do time profiling.
 
-Profiling a program is a three-step process:
+Time profiling a program is a three-step process:
 
 1. Re-compile your program for profiling with the :ghc-flag:`-prof` option, and
    probably one of the options for adding automatic annotations:
-   :ghc-flag:`-fprof-auto` is the most common [1]_.
-
-   If you are using external packages with :command:`cabal`, you may need to
-   reinstall these packages with profiling support; typically this is
-   done with ``cabal install -p package --reinstall``.
+   :ghc-flag:`-fprof-late` is the recommended option.
 
 2. Having compiled the program for profiling, you now need to run it to
    generate the profile. For example, a simple time profile can be
@@ -37,6 +33,9 @@ Profiling a program is a three-step process:
 3. Examine the generated profiling information, use the information to
    optimise your program, and repeat as necessary.
 
+The time profiler measures the CPU time taken by the Haskell code in your application.
+In particular time taken by safe foreign calls is not tracked by the profiler (see :ref:`prof-foreign-calls`).
+
 .. _cost-centres:
 
 Cost centres and cost-centre stacks
@@ -197,7 +196,10 @@ Inserting cost centres by hand
 Cost centres are just program annotations. When you say ``-fprof-auto``
 to the compiler, it automatically inserts a cost centre annotation
 around every binding not marked INLINE in your program, but you are
-entirely free to add cost centre annotations yourself.
+entirely free to add cost centre annotations yourself. Be careful adding too many
+cost-centre annotations as the optimiser is careful to not move them around or
+remove them, which can severely affect how your program is optimised and hence the
+runtime performance!
 
 The syntax of a cost centre annotation for expressions is ::
 
@@ -311,6 +313,39 @@ and become CAFs. You will probably need to consult the Core
 .. index::
    single: -fprof-cafs
 
+.. _prof-foreign-calls:
+
+Profiling and foreign calls
+---------------------------
+
+Simply put, the profiler includes time spent in unsafe foreign
+calls but ignores time taken in safe foreign calls. For example, time spent blocked on IO
+operations (e.g. ``getLine``) is not accounted for in the profile as ``getLine`` is implemented
+using a safe foreign call.
+
+The profiler estimates CPU time, for Haskell threads within the program only.
+In particular, time "taken" by the program in blocking safe foreign calls
+is not accounted for in time profiles. The runtime has the notion of a virtual
+processor which is known as a "capability". Haskell threads are run on capabilities,
+and the profiler samples the capabilities in order to determine what is being
+executed at a certain time. When a safe foreign call is executed, it's run outside
+the context of a capability; hence the sampling does not account for the time
+taken. Whilst the safe call is executed, other
+Haskell threads are free to run on the capability, and their cost will be attributed
+to the profiler. When the safe call is finished, the blocked, descheduled thread can
+be resumed and rescheduled.
+
+However, the time taken by blocking on unsafe foreign calls is accounted for in the profile.
+This happens because unsafe foreign calls are executed by the same capability
+their calling Haskell thread is running on. Therefore, an unsafe foreign call will
+block the entire capability whilst it is running, and any time the capability is
+sampled the "cost" of the foreign call will be attributed to the calling cost-centre stack.
+
+However, do note that you are not supposed to use unsafe foreign calls for any
+operations which do block! Do not be tempted to replace your safe foreign calls
+with unsafe calls just so they appear in the profile. This prevents GC from
+happening until the foreign call returns, which can be catastrophic for performance.
+
 .. _prof-compiler-options:
 
 Compiler options for profiling
@@ -356,7 +391,9 @@ Automatically placing cost-centres
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 GHC has a number of flags for automatically inserting cost-centres into the
-compiled program.
+compiled program. Use these options carefully because inserting too many cost-centres
+in the wrong places will mean the optimiser will be less effective and the runtime behaviour
+of your profiled program will be different to that of the unprofiled one.
 
 .. ghc-flag:: -fprof-callers=⟨name⟩
     :shortdesc: Auto-add ``SCC``\\ s to all call-sites of the named function.
@@ -618,8 +655,10 @@ enclosed between ``+RTS ... -RTS`` as usual):
 JSON profile format
 ~~~~~~~~~~~~~~~~~~~
 
-When invoked with the :rts-flag:`-pj` flag the runtime will emit the cost-centre
-profile in a machine-readable JSON format. The top-level object of this format
+When invoked with the :rts-flag:`-pj` flag the runtime will emit the cost-centre profile
+in a machine-readable JSON format. The JSON file can be directly loaded into `speedscope.app <https://www.speedscope.app/>`_ to interactively view the profile.
+
+The top-level object of this format
 has the following properties,
 
 ``program`` (string)
@@ -758,8 +797,12 @@ For instance, a simple profile might look like this,
       }
     }
 
+Eventlog profile format
+~~~~~~~~~~~~~~~~~~~~~~~
 
-
+In addition to the ``.prof`` and ``.json`` formats the cost centre definitions
+and samples are also emitted to the :ref:`eventlog <rts-eventlog>`. The format
+of the events is specified in the :ref:`eventlog encodings <eventlog-encodings>` section.
 
 
 .. _prof-heap:
@@ -774,18 +817,35 @@ program holds on to more memory at run-time that it needs to. Space
 leaks lead to slower execution due to heavy garbage collector activity,
 and may even cause the program to run out of memory altogether.
 
+Heap profiling differs from time profiling in that it is not always
+necessary to use the profiling runtime to generate a heap profile. There
+are two heap profiling modes (:rts-flag:`-hT` and :rts-flag:`-hi` [1]_) which are always
+available.
+
 To generate a heap profile from your program:
 
-1. Compile the program for profiling (:ref:`prof-compiler-options`).
+1. Assuming you need the profiling runtime, compile the program for profiling (:ref:`prof-compiler-options`).
 
 2. Run it with one of the heap profiling options described below (eg.
-   :rts-flag:`-hc` for a basic producer profile). This generates the file
-   :file:`{prog}.hp`.
+   :rts-flag:`-hc` for a basic producer profile) and enable the eventlog using :rts-flag:`-l <-l ⟨flags⟩>`.
 
-   If the :ref:`event log <rts-eventlog>` is enabled (with the :rts-flag:`-l ⟨flags⟩`
-   runtime system flag) heap samples will additionally be emitted to the GHC
+   Heap samples will be emitted to the GHC
    event log (see :ref:`heap-profiler-events` for details about event format).
 
+3. Render the heap profile using `eventlog2html <https://hackage.haskell.org/package/eventlog2html>`_.
+   This produces an HTML file which contains the visualised profile.
+
+4. Open the rendered interactive profile in your web browser.
+
+For example, here is a heap profile produced by using eventlog profiling on GHC
+compiling the Cabal library. You can read a lot more about eventlog2html on the website.
+
+.. image:: images/eventlog_profile.*
+
+Note that there is the legacy :file:`{prog}.hp` format which has been deprecated
+in favour of eventlog based profiling. In order to render the legacy format, the
+steps are as follows.
+
 3. Run :command:`hp2ps` to produce a Postscript file, :file:`{prog}.ps`. The
    :command:`hp2ps` utility is described in detail in :ref:`hp2ps`.
 
@@ -797,10 +857,6 @@ from GHC's ``nofib`` benchmark suite,
 
 .. image:: images/prof_scc.*
 
-You might also want to take a look at
-`hp2any <https://www.haskell.org/haskellwiki/Hp2any>`__, a more advanced
-suite of tools (not distributed with GHC) for displaying heap profiles.
-
 Note that there might be a big difference between the OS reported memory usage
 of your program and the amount of live data as reported by heap profiling.
 The reasons for the difference are explained in :ref:`hints-os-memory`.
@@ -817,20 +873,14 @@ following RTS options select which break-down to use:
 
 .. rts-flag:: -hT
 
-    Breaks down the graph by heap closure type.
+    Breaks down the graph by heap closure type. This does not require the profiling
+    runtime.
 
 .. rts-flag:: -hc
-              -h
 
     *Requires* :ghc-flag:`-prof`. Breaks down the graph by the cost-centre stack
     which produced the data.
 
-    .. note:: The meaning of the shortened :rts-flag:`-h` is dependent on whether
-              your program was compiled for profiling. When compiled for profiling,
-              :rts-flag:`-h` is equivalent to :rts-flag:`-hc`, but otherwise is
-              equivalent to :rts-flag:`-hT` (see :ref:`rts-profiling`). The :rts-flag:`-h`
-              is deprecated and will be removed in a future release.
-
 .. rts-flag:: -hm
 
     *Requires* :ghc-flag:`-prof`. Break down the live heap by the module
@@ -863,7 +913,7 @@ following RTS options select which break-down to use:
 
     Break down the graph by the address of the info table of a closure. For this
     to produce useful output the program must have been compiled with
-    :ghc-flag:`-finfo-table-map`.
+    :ghc-flag:`-finfo-table-map` but it does not require the profiling runtime.
 
 .. rts-flag:: -l
     :noindex:
@@ -1041,6 +1091,14 @@ This trick isn't foolproof, because there might be other ``B`` closures in
 the heap which aren't the retainers we are interested in, but we've
 found this to be a useful technique in most cases.
 
+Precise Retainer Analysis
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to precisely answer questions about why a certain type of closure is
+retained then it is worthwhile using `ghc-debug <https://gitlab.haskell.org/ghc/ghc-debug>`_ which
+has a terminal interface which can be used to easily answer queries such as, what is retaining
+a certain closure.
+
 .. _biography-prof:
 
 Biographical Profiling
@@ -1120,6 +1178,9 @@ reasons for this:
    allocated by foreign libraries, and data allocated by the RTS), and
    ``mmap()``\'d memory are not counted in the heap profile.
 
+For more discussion about understanding process residency see
+:ref:`hints-os-memory`.
+
 .. _hp2ps:
 
 ``hp2ps`` -- Rendering heap profiles to PostScript
@@ -1242,123 +1303,6 @@ The flags are:
 
     Print out usage information.
 
-.. _manipulating-hp:
-
-Manipulating the ``hp`` file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-(Notes kindly offered by Jan-Willem Maessen.)
-
-The ``FOO.hp`` file produced when you ask for the heap profile of a
-program ``FOO`` is a text file with a particularly simple structure.
-Here's a representative example, with much of the actual data omitted:
-
-.. code-block:: none
-
-    JOB "FOO -hC"
-    DATE "Thu Dec 26 18:17 2002"
-    SAMPLE_UNIT "seconds"
-    VALUE_UNIT "bytes"
-    BEGIN_SAMPLE 0.00
-    END_SAMPLE 0.00
-    BEGIN_SAMPLE 15.07
-      ... sample data ...
-    END_SAMPLE 15.07
-    BEGIN_SAMPLE 30.23
-      ... sample data ...
-    END_SAMPLE 30.23
-    ... etc.
-    BEGIN_SAMPLE 11695.47
-    END_SAMPLE 11695.47
-
-The first four lines (``JOB``, ``DATE``, ``SAMPLE_UNIT``,
-``VALUE_UNIT``) form a header. Each block of lines starting with
-``BEGIN_SAMPLE`` and ending with ``END_SAMPLE`` forms a single sample
-(you can think of this as a vertical slice of your heap profile). The
-hp2ps utility should accept any input with a properly-formatted header
-followed by a series of *complete* samples.
-
-Zooming in on regions of your profile
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can look at particular regions of your profile simply by loading a
-copy of the ``.hp`` file into a text editor and deleting the unwanted
-samples. The resulting ``.hp`` file can be run through ``hp2ps`` and
-viewed or printed.
-
-Viewing the heap profile of a running program
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``.hp`` file is generated incrementally as your program runs. In
-principle, running :command:`hp2ps` on the incomplete file should produce a
-snapshot of your program's heap usage. However, the last sample in the
-file may be incomplete, causing :command:`hp2ps` to fail. If you are using a
-machine with UNIX utilities installed, it's not too hard to work around
-this problem (though the resulting command line looks rather Byzantine):
-
-.. code-block:: sh
-
-    head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-
-The command ``fgrep -n END_SAMPLE FOO.hp`` finds the end of every
-complete sample in ``FOO.hp``, and labels each sample with its ending
-line number. We then select the line number of the last complete sample
-using :command:`tail` and :command:`cut`. This is used as a parameter to :command:`head`; the
-result is as if we deleted the final incomplete sample from :file:`FOO.hp`.
-This results in a properly-formatted .hp file which we feed directly to
-:command:`hp2ps`.
-
-Viewing a heap profile in real time
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The :command:`gv` and :command:`ghostview` programs have a "watch file" option
-can be used to view an up-to-date heap profile of your program as it runs.
-Simply generate an incremental heap profile as described in the previous
-section. Run :command:`gv` on your profile:
-
-.. code-block:: sh
-
-      gv -watch -orientation=seascape FOO.ps
-
-If you forget the ``-watch`` flag you can still select "Watch file" from
-the "State" menu. Now each time you generate a new profile ``FOO.ps``
-the view will update automatically.
-
-This can all be encapsulated in a little script:
-
-.. code-block:: sh
-
-      #!/bin/sh
-      head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-      gv -watch -orientation=seascape FOO.ps &
-      while [ 1 ] ; do
-        sleep 10 # We generate a new profile every 10 seconds.
-        head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-          | hp2ps > FOO.ps
-      done
-
-Occasionally :command:`gv` will choke as it tries to read an incomplete copy of
-:file:`FOO.ps` (because :command:`hp2ps` is still running as an update occurs). A
-slightly more complicated script works around this problem, by using the
-fact that sending a SIGHUP to gv will cause it to re-read its input
-file:
-
-.. code-block:: sh
-
-      #!/bin/sh
-      head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-      gv FOO.ps &
-      gvpsnum=$!
-      while [ 1 ] ; do
-        sleep 10
-        head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-          | hp2ps > FOO.ps
-        kill -HUP $gvpsnum
-      done
-
 .. _prof-threaded:
 
 Profiling Parallel and Concurrent Programs
@@ -1968,10 +1912,9 @@ Notes about ticky profiling
   in some columns. For this reason using an eventlog-based approach should be preferred if
   possible.
 
-
 .. [1]
-   :ghc-flag:`-fprof-auto` was known as ``-auto-all`` prior to
-   GHC 7.4.1.
+   :rts-flag:`-hi` profiling is available with the normal runtime but you will need to
+   compile with :ghc-flag:`-finfo-table-map` to interpret the results.
 
 .. [2]
    Note that this policy has changed slightly in GHC 7.4.1 relative to


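To make the safe/unsafe distinction from the new "Profiling and foreign calls" section concrete, here is a minimal sketch. It assumes a POSIX system where `sleep` from `unistd.h` is available; the Haskell names `sleepSafe`, `sleepUnsafe`, `waitSafe` and `waitUnsafe` and the cost-centre labels are made up for illustration. The unsafe import exists only to show what the profiler would attribute, which is exactly the use the documentation warns against for blocking calls.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CUInt (..))

-- Safe call: runs outside the capability, so per the section above its
-- time is not expected to appear in a time profile.
foreign import ccall safe "unistd.h sleep"
  sleepSafe :: CUInt -> IO CUInt

-- Unsafe call: blocks the calling capability, so samples taken while it
-- runs are attributed to the caller. Illustration only; blocking
-- operations should not be imported as unsafe.
foreign import ccall unsafe "unistd.h sleep"
  sleepUnsafe :: CUInt -> IO CUInt

-- Manual cost centres (hypothetical labels); ignored without -prof.
waitSafe :: IO CUInt
waitSafe = {-# SCC "waitSafe" #-} sleepSafe 1

waitUnsafe :: IO CUInt
waitUnsafe = {-# SCC "waitUnsafe" #-} sleepUnsafe 1
```

Compiled with `-prof` and run with `+RTS -p`, a program calling both would be expected to show time under `waitUnsafe` but little or none under `waitSafe`, under the behaviour described in the documentation above.
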

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/799b247e07a81af1a54d1bf5d6e3ecd1a6f020ba...c33a70fc240e197980465cb31ec20e4853c89341
