performance regressions

Wed Dec 17 15:55:57 UTC 2014

By unsubstantiated guess is that INLINEABLE would have the same effect as INLINE here, as GHC doesn't see fit to actually inline the function, even with INLINE -- the big improvement seen between (1) and (2) is actually specialization, not inlining. The jump from (2) to (3) is actual inlining. Thus, it seems that GHC's heuristics for inlining aren't working out for the best here.

I've pushed my changes, though I agree with Simon that more research may uncover even more improvements here. I didn't focus on the number of calls because that number didn't regress. Will look into this soon.

Richard

On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones <simonpj at microsoft.com> wrote:

> If you use INLINEABLE, that should make the function specialisable to a particular monad, even if it's in a different module. You shouldn't need INLINE for that.
> 
> I don't understand the difference between cases (2) and (3).
> 
> I am still suspicious of why there are so many calls to this one function that it, alone, is allocating a significant proportion of compilation of the entire run of GHC.  Are you sure there isn't an algorithmic improvement to be had, to simply reduce the number of calls?
> 
> Simon
> 
> |  -----Original Message-----
> |  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
> |  Richard Eisenberg
> |  Sent: 16 December 2014 21:46
> |  To: Joachim Breitner
> |  Cc: ghc-devs at haskell.org
> |  Subject: Re: performance regressions
> |  
> |  I've learned several very interesting things in this analysis.
> |  
> |  - Inlining polymorphic methods is very important. Here are some data
> |  points to back up that claim:
> |     * Original implementation using zipWithAndUnzipM:    8,472,613,440
> |  bytes allocated in the heap
> |     * Adding {-# INLINE #-} to the definition thereof:   6,639,253,488
> |  bytes allocated in the heap
> |     * Using `inline` at call site to force inlining:     6,281,539,792
> |  bytes allocated in the heap
> |  
> |  The middle step above allowed GHC to specialize zipWithAndUnzipM to my
> |  particular monad, but GHC didn't see fit to actually inline the
> |  function. Using `inline` forced it, to good effect. (I did not collect
> |  data on code sizes, but it wouldn't be hard to.)
> |  
> |  By comparison:
> |     * Hand-written recursion:    6,587,809,112 bytes allocated in the
> |  heap
> |  Interestingly, this is *not* the best result!
> |  
> |  Conclusion: We should probably add INLINE pragmas to Util and
> |  MonadUtils.
> |  
> |  
> |  - I then looked at rejiggering the algorithm to keep the common case
> |  fast. This had a side effect of changing the zipWithAndUnzipM to
> |  mapAndUnzipM, from Control.Monad. To my surprise, this brought
> |  disaster!
> |     * Using `inline` and mapAndUnzipM:        7,463,047,432 bytes
> |  allocated in the heap
> |     * Hand-written recursion:                 5,848,602,848 bytes
> |  allocated in the heap
> |  
> |  That last number is better than the numbers above because of the
> |  algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised
> |  me -- it already has an INLINE pragma in Control.Monad of course.
> |  Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting
> |  inlined, but a call to `map` remained, perhaps causing extra
> |  allocation.
> |  
> |  Conclusion: We should examine the implementation of mapAndUnzipM (and
> |  similar functions) in Control.Monad. Is it as fast as possible?
> |  
> |  
> |  
> |  In the end, I was unable to bring the allocation numbers down to where
> |  they were before my work. This is because the flattener now deals in
> |  roles. Most of its behavior is the same between nominal and
> |  representational roles, so it seems silly (though very possible) to
> |  specialize the code to nominal to keep that path fast. Instead, I
> |  identified one key spot and made that go fast.
> |  
> |  Thus, there is a 7% bump to memory usage on very-type-family-heavy
> |  code, compared to before my commit on Friday. (On more ordinary code,
> |  there is no noticeable change.)
> |  
> |  Validating my patch locally now; will push when that's done.
> |  
> |  Thanks,
> |  Richard
> |  
> |  On Dec 16, 2014, at 10:41 AM, Joachim Breitner <mail at joachim-
> |  breitner.de> wrote:
> |  
> |  > Hi,
> |  >
> |  >
> |  > Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg:
> |  >> On Dec 16, 2014, at 4:01 AM, Joachim Breitner <mail at joachim-
> |  breitner.de> wrote:
> |  >>
> |  >>> another guess (without looking at the code, sorry): Are they in
> |  the
> |  >>> same module? I.e., can GHC specialize the code to your particular
> |  Monad?
> |  >
> |  >> No, they're not in the same module. I could also try moving the
> |  >> zipWithAndUnzipM function to the same module, and even specializing
> |  >> it by hand to the right monad.
> |  >
> |  > I did mean zipWithAndUnzipM, so maybe yes: Try that.
> |  >
> |  > (I find it hard to believe that any polymorphic monadic code should
> |  > perform well, with those many calls to an unknown (>>=) with a
> |  > function parameter, but maybe I'm too pessimistic here.)
> |  >
> |  >
> |  >> Could that be preventing the fusing?
> |  >
> |  > There is not going to be any fusing here, at least not list fusion;
> |  > that would require your code to be written in terms of functions
> |  with
> |  > fusion rules.
> |  >
> |  > Greetings,
> |  > Joachim
> |  >
> |  > --
> |  > Joachim "nomeata" Breitner
> |  >  mail at joachim-breitner.de * http://www.joachim-breitner.de/
> |  >  Jabber: nomeata at joachim-breitner.de  * GPG-Key: 0xF0FBF51F  Debian
> |  > Developer: nomeata at debian.org
> |  >
> |  > _______________________________________________
> |  > ghc-devs mailing list
> |  > ghc-devs at haskell.org
> |  > http://www.haskell.org/mailman/listinfo/ghc-devs
> |  
> |  _______________________________________________
> |  ghc-devs mailing list
> |  ghc-devs at haskell.org
> |  http://www.haskell.org/mailman/listinfo/ghc-devs
>