[GHC] #14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees"

Mon Jan 8 22:27:41 UTC 2018

#14201: Implement ideas from "Compiling Pattern Matching to Good Decision Trees"
-------------------------------------+-------------------------------------
        Reporter:  AndreasK          |                Owner:  AndreasK
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by AndreasK):

 Thanks dfeuer. The imprecise exceptions paper

 ****
 == The bad ==

 * Compilation is slowed down too much currently. This should be fixable as
 I haven't optimized my code AT ALL yet.
 * For larger benchmarks advantages often disappear at -O2 (with
 microbencharms being faster). I'm not completly sure why.

 Possibly reasons I thought of:
 1. Performance reliant code is written with the current solution in mind
 2. The larger code size causes more cache misses
 3. There are some nasty edge cases

 1) This doesn't make this approach worse, but might be a reason why it
 makes little differences in performance sensitive libraries which I used
 for benchmarks.

 2/3) The larger code size is a given with this approach. I don't think
 it's a big issue outside of edge cases. However edge cases can blow up
 significantly.

 A good example the following which splits up the decision tree in a bad
 way.
 {{{
 f1 _ A E = 1
 f1 B _ _ = 2
 f1 C _ C = 3
 f1 A E A = 8
 f1 _ _ _ = 5
 }}}

 * We can't start at the third as we might not evaluate it, eg `f1 B B B`
 * We can't start at the first for the same reason, eg `f1 A A E`

 A perfect solution would switch between backtracking and tree based based
 on the pattern encountered.
 But maintaining both sounds like a nightmare and wouldn't be worth the
 trouble.

 There are also a few small things that I can still adjust which might
 enable a bit more CSE.
 But I'm not too optimistic that it will change much.

 == The good ==

 * Regular code at -O0/1 gets faster on average when the algorithm applies.
 * Strict code runs faster with tree matching even at -O2
 * The compiliation speed should be fixable
 * There are some small improvements still doable
   * Improve the heuristic which selects the column to match first
   * Change placement of the cleanup code which evaluates arguments not
 necessary for matching. (This is necessary to satisfy the
 exception/nontermination semantics)
 * I have enabled tree matching only for Vanilla Constructors as I had an
 issue with GADT's which didn't want to tackle at the time.

 == Where to go from here ==

 I plan to at least:
 * Try to enable tree matching on GADTs and see if any of the benchmarks I
 looked at change.
 * Play around with the clean up placement

 But I have to get my bachelor out of the way first in the next 1-2 Months.

 Then I will see how the above change the story at -O2.

 For O1/Strict code the speedup of a few % at time would already be worth
 it if I can reign the compile times in.

 But currently at O2 the speed is usually equal sometimes worse and
 slightly less often faster using tree matching.
 Assuming it stays that way changing it only makes sense if GHC ever gets
 profile guided optimization. As we then could chose the best column to
 match on based on actual data.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14201#comment:19>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler