performance regressions

Wed Dec 17 09:15:07 UTC 2014

If you use INLINEABLE, that should make the function specialisable to a particular monad, even if it's in a different module. You shouldn't need INLINE for that.

I don't understand the difference between cases (2) and (3).

I am still suspicious of why there are so many calls to this one function that it, alone, is allocating a significant proportion of compilation of the entire run of GHC.  Are you sure there isn't an algorithmic improvement to be had, to simply reduce the number of calls?

Simon

|  -----Original Message-----
|  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
|  Richard Eisenberg
|  Sent: 16 December 2014 21:46
|  To: Joachim Breitner
|  Cc: ghc-devs at haskell.org
|  Subject: Re: performance regressions
|  
|  I've learned several very interesting things in this analysis.
|  
|  - Inlining polymorphic methods is very important. Here are some data
|  points to back up that claim:
|     * Original implementation using zipWithAndUnzipM:    8,472,613,440
|  bytes allocated in the heap
|     * Adding {-# INLINE #-} to the definition thereof:   6,639,253,488
|  bytes allocated in the heap
|     * Using `inline` at call site to force inlining:     6,281,539,792
|  bytes allocated in the heap
|  
|  The middle step above allowed GHC to specialize zipWithAndUnzipM to my
|  particular monad, but GHC didn't see fit to actually inline the
|  function. Using `inline` forced it, to good effect. (I did not collect
|  data on code sizes, but it wouldn't be hard to.)
|  
|  By comparison:
|     * Hand-written recursion:    6,587,809,112 bytes allocated in the
|  heap
|  Interestingly, this is *not* the best result!
|  
|  Conclusion: We should probably add INLINE pragmas to Util and
|  MonadUtils.
|  
|  
|  - I then looked at rejiggering the algorithm to keep the common case
|  fast. This had a side effect of changing the zipWithAndUnzipM to
|  mapAndUnzipM, from Control.Monad. To my surprise, this brought
|  disaster!
|     * Using `inline` and mapAndUnzipM:        7,463,047,432 bytes
|  allocated in the heap
|     * Hand-written recursion:                 5,848,602,848 bytes
|  allocated in the heap
|  
|  That last number is better than the numbers above because of the
|  algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised
|  me -- it already has an INLINE pragma in Control.Monad of course.
|  Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting
|  inlined, but a call to `map` remained, perhaps causing extra
|  allocation.
|  
|  Conclusion: We should examine the implementation of mapAndUnzipM (and
|  similar functions) in Control.Monad. Is it as fast as possible?
|  
|  
|  
|  In the end, I was unable to bring the allocation numbers down to where
|  they were before my work. This is because the flattener now deals in
|  roles. Most of its behavior is the same between nominal and
|  representational roles, so it seems silly (though very possible) to
|  specialize the code to nominal to keep that path fast. Instead, I
|  identified one key spot and made that go fast.
|  
|  Thus, there is a 7% bump to memory usage on very-type-family-heavy
|  code, compared to before my commit on Friday. (On more ordinary code,
|  there is no noticeable change.)
|  
|  Validating my patch locally now; will push when that's done.
|  
|  Thanks,
|  Richard
|  
|  On Dec 16, 2014, at 10:41 AM, Joachim Breitner <mail at joachim-
|  breitner.de> wrote:
|  
|  > Hi,
|  >
|  >
|  > Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg:
|  >> On Dec 16, 2014, at 4:01 AM, Joachim Breitner <mail at joachim-
|  breitner.de> wrote:
|  >>
|  >>> another guess (without looking at the code, sorry): Are they in
|  the
|  >>> same module? I.e., can GHC specialize the code to your particular
|  Monad?
|  >
|  >> No, they're not in the same module. I could also try moving the
|  >> zipWithAndUnzipM function to the same module, and even specializing
|  >> it by hand to the right monad.
|  >
|  > I did mean zipWithAndUnzipM, so maybe yes: Try that.
|  >
|  > (I find it hard to believe that any polymorphic monadic code should
|  > perform well, with those many calls to an unknown (>>=) with a
|  > function parameter, but maybe I'm too pessimistic here.)
|  >
|  >
|  >> Could that be preventing the fusing?
|  >
|  > There is not going to be any fusing here, at least not list fusion;
|  > that would require your code to be written in terms of functions
|  with
|  > fusion rules.
|  >
|  > Greetings,
|  > Joachim
|  >
|  > --
|  > Joachim "nomeata" Breitner
|  >  mail at joachim-breitner.de * http://www.joachim-breitner.de/
|  >  Jabber: nomeata at joachim-breitner.de  * GPG-Key: 0xF0FBF51F  Debian
|  > Developer: nomeata at debian.org
|  >
|  > _______________________________________________
|  > ghc-devs mailing list
|  > ghc-devs at haskell.org
|  > http://www.haskell.org/mailman/listinfo/ghc-devs
|  
|  _______________________________________________
|  ghc-devs mailing list
|  ghc-devs at haskell.org
|  http://www.haskell.org/mailman/listinfo/ghc-devs