[GHC] #15103: Speed optimizations for elimCommonBlocks

Mon Apr 30 12:25:23 UTC 2018

#15103: Speed optimizations for elimCommonBlocks
-------------------------------------+-------------------------------------
           Reporter:  AndreasK       |             Owner:  (none)
               Type:  task           |            Status:  new
           Priority:  normal         |         Milestone:  8.6.1
          Component:  Compiler       |           Version:  8.5
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):  Phab:D4597     |         Wiki Page:
-------------------------------------+-------------------------------------
 == Use toBlockList instead of revPostorder.

 Block elimination works on a given Cmm graph by:
  * Getting a list of blocks.
  * Looking for duplicates in these blocks.
  * Removing all but one instance of duplicates.

 There are two (reasonable) ways to get the list of blocks.
  * The fast way: `toBlockList`:
  This just flattens the underlying map into a list.
  * The convenient way: `revPostorder`:
  Start at the entry label, scan for reachable blocks and return only
  these. This has the advantage of removing all dead code.

 If there is dead code the later is better. Work done on unreachable blocks
 is clearly wasted work. However by the point we run the
 common block elimination pass the input graph already had all dead code
 removed. This is done during control flow optimization in CmmContFlowOpt
 which is our first Cmm pass.

 This means common block elimination is free to use toBlockList because
 revPostorder would return the same blocks. (Although in
 a different order).

 == Change the triemap used for grouping by a label list
   from `(TM.ListMap UniqDFM)` to `ListMap (GenMap IntMap)`.

 * Using GenMap offers leaf compression. Which is a trie optimization
 described
 by the Note [Compressed TrieMap] in CoreSyn/TrieMap.hs

 * Using a IntMap removes the overhead associated with UniqDFM.

 The reasoning why this is deterministic after the change:

 * IntMap is deterministic given the same keys.
 * Labels have a Int representation, so for the same Labels we get the
 same keys, hence the same result for each run.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15103>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler