[GHC] #1600: Optimisation: CPR the results of IO

Tue Dec 10 12:35:02 UTC 2013

#1600: Optimisation: CPR the results of IO
-------------------------------------+-------------------------------------
        Reporter:  simonmar          |            Owner:  nomeata
            Type:  task              |           Status:  new
        Priority:  lowest            |        Milestone:  7.6.2
       Component:  Compiler          |          Version:  6.6.1
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |       Difficulty:  Moderate (less
  performance bug                    |  than a day)
       Test Case:                    |       Blocked By:
        Blocking:                    |  Related Tickets:  #8598
-------------------------------------+-------------------------------------

Comment (by nomeata):

 The numbers are a bit more interesting when I enable nested CPR inside
 ''unboxed'' tuples, i.e. in code involving IO or ST:

 {{{
 --------------------------------------------------------------------------------
         Program           Size    Allocs   Runtime   Elapsed  TotalMem
 --------------------------------------------------------------------------------
          banner          +2.1%     +0.1%      0.00      0.00     +0.0%
       compress2          +2.0%     +0.1%    +11.5%     +9.8%    +20.8%
          expert          +2.1%     +0.1%      0.00      0.00     +0.0%
        fibheaps          +2.0%     -0.3%      0.05      0.05     +0.0%
           fluid          +2.5%     +0.1%      0.01      0.01     +0.0%
          gamteb          +2.1%     -0.2%      0.06      0.06     +0.0%
            grep          +2.0%     +0.1%      0.00      0.00     +0.0%
           infer          +2.0%     -1.2%      0.09      0.09     +0.0%
    k-nucleotide          +1.5%     -6.9%     +0.1%     +0.2%     +0.0%
        maillist          +2.1%     -0.3%      0.10      0.12     +0.0%
             pic          +2.2%     -0.5%      0.01      0.01     +0.0%
            rfib          +2.1%     +0.1%      0.03      0.03     +0.0%
             scs          +2.3%     +0.2%     +1.0%     +1.4%     +0.0%
             tak          +2.1%     -0.1%      0.02      0.02     +0.0%
        treejoin          +2.1%     +0.1%     +0.0%     +0.0%     +0.0%
       wave4main          +2.1%    +11.3%     -0.5%     +0.0%     -7.1%
 --------------------------------------------------------------------------------
             Min          +1.5%     -6.9%    -13.2%    -13.3%    -33.3%
             Max          +2.5%    +11.3%    +16.5%    +16.0%    +20.8%
  Geometric Mean          +2.0%     +0.0%     +0.2%     +0.3%     -0.3%
 }}}

 There is one particularly good result (`k-nucleotide`, -6.9% allocations),
 one bad one (`wave4main`, +11.3% allocations), and otherwise a slight
 general improvement. The change in `k-nucleotide`’s Core is too large for
 me to spot the reason for the improvement.

 Diffing the `-ddump-simpl` output of `wave4main` shows only one change. It
 stems from this function:
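 {{{
 #!haskell
 import Data.Array (Array, array)

 -- Build an array indexed by (l,u) whose element at index i is f i.
 tabulate :: (Int -> x) -> (Int, Int) -> Array Int x
 tabulate f (l,u) = array (l,u) [(i, f i) | i <- [l..u]]
 }}}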
 Here, in the (inlined) `array`, a worker for `go` gets its return type
 changed from `(# GHC.Prim.State# s_aTM, GHC.Arr.Array GHC.Types.Int x_aqE #)`
 to `(# GHC.Prim.State# s_aTM, GHC.Prim.Int#, GHC.Prim.Int#, GHC.Prim.Int#, GHC.Prim.Array# x_aqE #)`.
 This looks good, but the worker is tail-recursive, and the boxing in the
 wrapper is not cancelled at the use site of `go`, so there is nothing to
 gain by moving the constructor applications from the worker to the
 wrapper: the result is still boxed once per call of `go`, only now in the
 wrapper instead of the worker.
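 To illustrate why a tail-recursive worker gains nothing here, the
 following is a hand-written sketch of the worker/wrapper split that
 nested CPR performs on such a loop. It is not GHC's actual output: `Box`
 is a hypothetical stand-in for `GHC.Arr.Array` (two fields instead of
 four), and `goOrig`, `goW` and `goWrap` are illustrative names. The worker
 `goW` returns the fields unboxed, but because it only ever returns its own
 recursive call or the final fields, the wrapper `goWrap` still allocates
 exactly one `Box` per call.

 {{{
 #!haskell
 {-# LANGUAGE MagicHash, UnboxedTuples #-}
 import GHC.Exts (Int (I#), Int#, State#, isTrue#, (+#), (>#))
 import GHC.ST (ST (..), runST)

 -- Hypothetical stand-in for GHC.Arr.Array: a box with strict fields.
 data Box = Box !Int !Int deriving (Eq, Show)

 -- Before nested CPR: a tail-recursive State#-passing loop that
 -- returns its boxed result inside an unboxed tuple.
 goOrig :: Int -> Int -> State# s -> (# State# s, Box #)
 goOrig i acc s
   | i > 10    = (# s, Box i acc #)
   | otherwise = goOrig (i + 1) (acc + i) s

 -- After nested CPR (written by hand): the worker returns the
 -- fields of the box unboxed and never allocates a Box itself.
 goW :: Int# -> Int# -> State# s -> (# State# s, Int#, Int# #)
 goW i acc s
   | isTrue# (i ># 10#) = (# s, i, acc #)
   | otherwise          = goW (i +# 1#) (acc +# i) s

 -- The wrapper re-boxes the fields. Unless the caller immediately
 -- takes the Box apart, this allocation is moved, not removed.
 goWrap :: Int -> Int -> State# s -> (# State# s, Box #)
 goWrap (I# i) (I# acc) s = case goW i acc s of
   (# s', i', acc' #) -> (# s', Box (I# i') (I# acc') #)

 main :: IO ()
 main = print (runST (ST (goOrig 0 0)), runST (ST (goWrap 0 0)))
 }}}

 Both versions compute `Box 11 55`; the only difference is where the box
 is built. The split pays off only if the simplifier can cancel the
 re-boxing against a `case` at the call site, which is what does not
 happen in `wave4main`.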

 However, some isolated testing indicates that this costs 96 bytes of
 allocation per run, so I doubt that it is the main cause of the 11%
 increase in allocation; something else might be hidden in the libraries.
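 One possible way to do this kind of isolated allocation testing (a
 sketch, not necessarily how the figure above was obtained) is to read the
 per-thread allocation counter from `System.Mem` around a single call,
 compile with `-O`, and compare the number between compilers with and
 without the change. The helper `allocatedBy` and the arguments passed to
 `tabulate` are hypothetical, and the reported byte count should be treated
 as approximate.

 {{{
 #!haskell
 import Data.Array (Array, array, elems)
 import Data.Int (Int64)
 import System.Mem (getAllocationCounter)

 tabulate :: (Int -> x) -> (Int, Int) -> Array Int x
 tabulate f (l,u) = array (l,u) [(i, f i) | i <- [l..u]]

 -- Hypothetical helper: bytes allocated by the current thread while
 -- running an action. The counter counts *down* as the thread
 -- allocates, so the amount allocated is before - after.
 allocatedBy :: IO a -> IO (a, Int64)
 allocatedBy act = do
   before <- getAllocationCounter
   r      <- act
   after  <- getAllocationCounter
   return (r, before - after)

 main :: IO ()
 main = do
   -- return $! forces the array and its sum inside the measured action.
   (total, bytes) <- allocatedBy (return $! sum (elems (tabulate (* 2) (1, 1000))))
   print total
   putStrLn ("allocated bytes (approximate): " ++ show bytes)
 }}}

 The printed sum is always 1001000; only the byte count is expected to
 differ between the two compiler builds.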

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/1600#comment:33>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler