compilation of pattern-matching?

Thu Mar 26 04:58:14 EDT 2009

Lennart Augustsson wrote:
> When you tried switching Nil and Cons, did you try it on many examples?
> For a single example a 2-3% could be easily attributed to random
> effects like different instruction cache hit patterns.  If you get it
> consistently over several programs then it seems likely to mean
> something, but I'm not sure what.

I've seen cases where simply inserting a couple of nops in a hot function 
improved performance by a significant margin (>10%, IIRC).  The only theory 
I could come up with was that there were more branches in a cache line than 
the branch prediction cache on this processor could cope with.  I don't 
think it was merely an alignment issue.

Cheers,
	Simon

> On Wed, Mar 25, 2009 at 6:01 PM, Claus Reinke <claus.reinke at talk21.com> wrote:
>>> Indeed GHC does not attempt to retain the order of alternatives, although
>>> a) it might be possible to do so by paying more attention in numerous
>>> places
>>> b) GHC may do so already, by accident, in certain cases
>> That adds even more unpredictability. One thing that I don't want whenever I
>> have to care about performance is small changes
>> having odd/unexplainable effects (I vaguely recall a case where removing an
>> unused parameter from a recursion made the program
>> slower, or eliminating returned constructors by using continuations
>> made one inner-loop function faster, another slower..).
>> Lennart is of course right: even if GHC would respect the ordering indicated
>> in my source program, I might not be able to tune that source to make good
>> use of even a single target processor (I tried defining a foldl over a
>> user-defined List type, so that I could switch
>> the order of Nil/Cons, and hence the test order used by GHC, and found the
>> Nil-before-Cons order to be 2-3% faster for folding a
>> very long list than the Cons-before-Nil order I wanted), but it is very
>> frustrating if I'm not even given the chance because GHC
>> sorts the alternatives, not even according to its own interpretation
>> of branching performance, but completely arbitrarily!-)
>>
>>> * The issue at stake is a small one: not the *number of tests* but *which
>>> tests branch, and which fall through*.
>> Right on the issue, but I'm not quite sure how small it is: the test
>> case source I attached a few messages ago consistently showed one ordering
>> to be 5% faster than the other for the extreme case
>> of one test nearly always failing. There may well be more profitable
>> optimizations remaining to be implemented first - what disturbs me is that
>> Haskell code is full of conditionals and matches, which I tend to arrange
>> according to expected frequency, and GHC simply ignores all those hints.
>>
>> With the hint about branch prediction, I also found this old ticket (which
>> seems to have been waiting for the new backend, and
>> indicates rather larger performance differences):
>>
>>   Offer control  over branch prediction
>>   http://hackage.haskell.org/trac/ghc/ticket/849
>>
>>> * Simply ordering the equations doesn't really work, because pattern-match
>>> compilation will match an entire column at once:
>>>  f (x:xs) True = ...
>>>  f []     True = ...
>>>  f []     False = ...
>>>  f (x:xs) False = ...
>>> Which "order" should the (:)/[] test go in?
>> In the order indicated in the source!? The usual pattern-match
>> optimizations should not change that, they will just skip the two
>> '[]' cases if the list isn't empty, or use the constructor tag to jump
>> directly to a sub-column. Haskell specifies left-to-right, top-down.
>>
>>> * Not only does GHC currently not attempt to retain order, but for a
>>> particular order it makes no guarantees about which falls through.  For
>>> example, given
>>>       case ... of { A -> e1; C -> e2; B -> e3 }
>>> We might test for A and then
>>> either fall though to e1
>>> or     fall through to the test for C
>> That is the part I missed, and which might give the UNLIKELY
>> pragma, as suggested in #849, more expressive power than
>> plain clause ordering. However, since Haskell specifies a match
>> order, I don't see why that couldn't be used as the basis for
>> mapping to branches as well, with the clauses listed in decreasing
>> likelyhood, and GHC generating the branch predictions and fallthroughs to
>> match this information to the target processor characteristics?
>>
>>> * When the number of constructors is larger, instead of a linear sequence
>>> of tests, GHC may generate a table-jump; or a balanced tree of tests.
>> The table-jump would make all alternatives equally costly/fast,
>> with no penalty for adding infrequent alternatives, right? The
>> balanced tree sounds like one of the pattern-match state machines, and there
>> would still be room for representing expected frequency in terms of
>> tree-path/-rotation/-representation.
>>
>>> * Which plan performs best is tremendously architecture dependent, and may
>>> well vary a lot between different chips implementing the same instruction
>>> set.  It's a losing battle to fix the strategy in source code.
>>> * More promising might be to say "this is the hot branch".  That
>>> information about frequency could in principle be used by the back end to
>>> generate better code.  However, I am unsure how
>>>       a) to express this info in source code
>>>       b) retain it throughout optimisation
>>
>> So it should be specified in the source, after all, just in a way
>> that gives programmers room to express their knowledge while
>> leaving GHC free to implement that knowledge on the target.
>>
>> Things like the UNLIKELY pragma would seem useful, if
>> attached to decisions: unless GHC can optimize the whole decision away, it
>> will remain throughout optimization, and come out as some form of branch,
>> with the hint still attached.
>>
>> But UNLIKELY only covers the most common case
>> (marking alternatives with minimal expected frequency) -
>> if clause ordering was respected, relative frequencies of
>> alternatives could be specified without pragmas, just by
>> ordering pattern-match or conditional clauses according
>> to expected frequency.
>>
>>> Claus, if you think this thread is worth capturing, then do write a
>>> Commentary page, and I'll check its veracity.
>> Given the existence of #849, I've just linked this thread
>> from there.
>>
>> Claus
>>
>> _______________________________________________
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users at haskell.org
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users