[GHC] #7596: Opportunity to improve CSE

Thu Jan 17 14:58:11 CET 2013

#7596: Opportunity to improve CSE
---------------------------------+------------------------------------------
    Reporter:  simonpj           |       Owner:                  
        Type:  bug               |      Status:  new             
    Priority:  normal            |   Milestone:                  
   Component:  Compiler          |     Version:  7.6.1           
    Keywords:                    |          Os:  Unknown/Multiple
Architecture:  Unknown/Multiple  |     Failure:  None/Unknown    
  Difficulty:  Unknown           |    Testcase:                  
   Blockedby:                    |    Blocking:                  
     Related:                    |  
---------------------------------+------------------------------------------

Comment(by simonpj@…):

 commit 0831a12ea2fc73c33652eeec1adc79fa19700578
 {{{
 Author: Simon Peyton Jones <simonpj at microsoft.com>
 Date:   Thu Jan 17 10:54:07 2013 +0000

     Major patch to implement the new Demand Analyser

     This patch is the result of Ilya Sergey's internship at MSR.  It
     constitutes a thorough overhaul and simplification of the demand
     analyser.  It makes a solid foundation on which we can now build.
     The main changes are:

     * Instead of having one combined type for Demand, a Demand is
        now a pair (JointDmd) of
           - a StrDmd and
           - an AbsDmd.
        This allows strictness and absence to be thought about quite
        orthogonally, and greatly reduces brain melt-down.

     * Similarly in the DmdResult type, it's a pair of
          - a PureResult (indicating only divergence/non-divergence)
          - a CPRResult (which deals only with the CPR property).

     * In IdInfo, the
         strictnessInfo field contains a StrictSig, not a Maybe StrictSig
         demandInfo     field contains a Demand, not a Maybe Demand
       We don't need Nothing (to indicate no strictness/demand info)
       any more; topSig/topDmd will do.

     * Remove "boxity" analysis entirely.  This was an attempt to
       avoid "reboxing", but it added complexity, is extremely
       ad-hoc, and makes very little difference in practice.

     * Remove the "unboxing strategy" computation. This was an
       attempt to ensure that a worker didn't get zillions of
       arguments by unboxing big tuples.  But in fact removing it
       DRAMATICALLY reduces allocation in an inner loop of the
       I/O library (where the threshold argument-count had been
       set just too low).  It's exceptional to have a zillion arguments
       and I don't think it's worth the complexity, especially since
       it turned out to have a serious performance hit.

     * Remove quite a bit of ad-hoc cruft

     * Move worthSplittingFun, worthSplittingThunk from WorkWrap to
       Demand. This allows JointDmd to be fully abstract, examined
       only inside Demand.

     Everything else really follows from these changes.

     All of this is really just refactoring, so we don't expect
     big performance changes, but actually the numbers look quite
     good.  Here is a full nofib run with some highlights identified:

             Program           Size    Allocs   Runtime   Elapsed  TotalMem
 --------------------------------------------------------------------------------
              expert          -2.6%    -15.5%      0.00      0.00     +0.0%
               fluid          -2.4%     -7.1%      0.01      0.01     +0.0%
                  gg          -2.5%    -28.9%      0.02      0.02    -33.3%
           integrate          -2.6%     +3.2%     +2.6%     +2.6%     +0.0%
             mandel2          -2.6%     +4.2%      0.01      0.01     +0.0%
            nucleic2          -2.0%    -16.3%      0.11      0.11     +0.0%
                para          -2.6%    -20.0%    -11.8%    -11.7%     +0.0%
              parser          -2.5%    -17.9%      0.05      0.05     +0.0%
              prolog          -2.6%    -13.0%      0.00      0.00     +0.0%
              puzzle          -2.6%     +2.2%     +0.8%     +0.8%     +0.0%
             sorting          -2.6%    -35.9%      0.00      0.00     +0.0%
            treejoin          -2.6%    -52.2%     -9.8%     -9.9%     +0.0%
 --------------------------------------------------------------------------------
                 Min          -2.7%    -52.2%    -11.8%    -11.7%    -33.3%
                 Max          -1.8%     +4.2%    +10.5%    +10.5%     +7.7%
      Geometric Mean          -2.5%     -2.8%     -0.4%     -0.5%     -0.4%

     Things to note

     * Binary sizes are smaller. I don't know why, but it's good.

     * Allocation is sometimes a *lot* smaller. I believe that all the big
       numbers (I checked treejoin, gg, sorting) arise from one place, namely
       the function GHC.IO.Encoding.UTF8.utf8_decode, which is strict in two
       Buffers, both of which have several fields.  Previously we did not
       w/w both arguments; now we do, and that has a big effect.  So the big
       win is actually somewhat accidental, gained by removing the "unboxing
       strategy" code.

     * A couple of benchmarks allocate slightly more.  This turns out
       to be due to reboxing (integrate).  But the biggest increase is
       mandel2, and *that* turned out also to be a somewhat accidental
       loss of CSE, and pointed the way to doing better CSE: see Trac
       #7596.

     * Runtimes are never very reliable, but seem to improve very slightly.

     All in all, a good piece of work.  Thank you Ilya!

  compiler/basicTypes/Demand.lhs     | 1229 +++++++++++++++++++++++++++++-------
  compiler/basicTypes/Id.lhs         |   65 +-
  compiler/basicTypes/IdInfo.lhs     |   54 +-
  compiler/basicTypes/MkId.lhs       |   45 +-
  compiler/coreSyn/CoreArity.lhs     |    8 +-
  compiler/coreSyn/CoreLint.lhs      |    7 +-
  compiler/coreSyn/CorePrep.lhs      |   24 +-
  compiler/coreSyn/CoreTidy.lhs      |    4 +-
  compiler/coreSyn/MkCore.lhs        |   11 +-
  compiler/coreSyn/PprCore.lhs       |   12 +-
  compiler/iface/BinIface.hs         |  100 +---
  compiler/iface/IfaceSyn.lhs        |   22 +-
  compiler/iface/MkIface.lhs         |    8 +-
  compiler/iface/TcIface.lhs         |   22 +-
  compiler/main/TidyPgm.lhs          |   28 +-
  compiler/prelude/primops.txt.pp    |   10 +-
  compiler/simplCore/FloatOut.lhs    |    8 +-
  compiler/simplCore/SetLevels.lhs   |   23 +-
  compiler/simplCore/SimplCore.lhs   |   21 +-
  compiler/simplCore/Simplify.lhs    |    5 +-
  compiler/specialise/SpecConstr.lhs |   30 +-
  compiler/stranal/DmdAnal.lhs       | 1077 ++++++++++---------------------
  compiler/stranal/WorkWrap.lhs      |   58 +--
  compiler/stranal/WwLib.lhs         |  131 +++--
  24 files changed, 1658 insertions(+), 1344 deletions(-)
 }}}
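The first few bullets of the commit message describe a change of representation: a demand becomes a pair, and a result becomes a pair. The sketch below makes that concrete. It is a deliberately simplified, hypothetical model and not the definitions in compiler/basicTypes/Demand.lhs, which use richer lattices. These names are invented for illustration:

* the constructors `Lazy`, `HeadStr`, `Absent`, `Used`, `Diverges`, `MayConverge`, `NoCPR` and `HasCPR`;
* the functions `lubStr`, `lubAbs` and `lubDmd`;
* the record `InfoSketch` and the value `noInfo`.

The point it shows is that each component of the pair is joined without consulting the other, and that a "top" demand can replace `Nothing`.

```haskell
-- Hypothetical, simplified model of the demand representation described
-- in the commit message above; NOT GHC's actual Demand.lhs definitions.

-- Strictness component: is the argument certainly evaluated?
data StrDmd = Lazy | HeadStr
  deriving (Eq, Show)

-- Absence component: is the argument used at all?
data AbsDmd = Absent | Used
  deriving (Eq, Show)

-- A demand is a pair, so each component can be analysed independently.
data JointDmd = JD { strd :: StrDmd, absd :: AbsDmd }
  deriving (Eq, Show)

-- Result information is likewise split into divergence and CPR parts.
data PureResult = Diverges | MayConverge
  deriving (Eq, Show)

data CPRResult = NoCPR | HasCPR
  deriving (Eq, Show)

data DmdResult = DmdResult PureResult CPRResult
  deriving (Eq, Show)

-- The "top" demand (nothing known) stands in for the old Nothing.
topDmd :: JointDmd
topDmd = JD Lazy Used

-- Least upper bound, computed component-wise: neither component's
-- join looks at the other component.
lubStr :: StrDmd -> StrDmd -> StrDmd
lubStr HeadStr HeadStr = HeadStr
lubStr _       _       = Lazy

lubAbs :: AbsDmd -> AbsDmd -> AbsDmd
lubAbs Absent Absent = Absent
lubAbs _      _      = Used

lubDmd :: JointDmd -> JointDmd -> JointDmd
lubDmd (JD s1 a1) (JD s2 a2) = JD (lubStr s1 s2) (lubAbs a1 a2)

-- Hypothetical IdInfo-like record: the field holds a demand directly,
-- defaulting to topDmd, rather than a Maybe defaulting to Nothing.
newtype InfoSketch = InfoSketch { demandInfo :: JointDmd }

noInfo :: InfoSketch
noInfo = InfoSketch { demandInfo = topDmd }
```

Suppose one branch of a `case` ignores an argument (`JD Lazy Absent`) and the other branch evaluates it (`JD HeadStr Used`). Then `lubDmd` gives `JD Lazy Used`: the argument is neither certainly evaluated nor certainly unused.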

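Several notes in the commit message turn on worker/wrapper ("w/w") splitting: the removed "unboxing strategy" and the `utf8_decode` allocation win. The hand-written illustration below is not compiler output, and the names `sumPair`, `sumPairWorker` and `sumPairWrapper` are hypothetical. It shows what splitting a function that is strict in a pair argument means. A real GHC worker would typically receive unboxed `Int#` values; boxed `Int` is used here only to keep the sketch self-contained.

```haskell
-- Hand-written illustration of worker/wrapper splitting; the names are
-- hypothetical, and GHC would give the worker unboxed (Int#) arguments.

-- Original function: strict in its pair argument (it always takes it apart).
sumPair :: (Int, Int) -> Int -> Int
sumPair (a, b) acc = a + b + acc

-- Worker: receives the pair's components as separate arguments, so a
-- caller in a loop need not allocate a pair.
sumPairWorker :: Int -> Int -> Int -> Int
sumPairWorker a b acc = a + b + acc

-- Wrapper: same type as the original; it unpacks the pair and calls the
-- worker. Once inlined at a call site that builds the pair explicitly,
-- the pair construction and the case cancel out.
sumPairWrapper :: (Int, Int) -> Int -> Int
sumPairWrapper p acc = case p of
  (a, b) -> sumPairWorker a b acc
```

Unboxing a tuple with n components gives the worker n arguments in its place. The removed "unboxing strategy" tried to cap that count. Per the commit message, its threshold had been set just too low for an inner loop of the I/O library.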
-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/7596#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler