performance regressions
John Lato
jwlato at gmail.com
Wed Dec 17 17:11:11 UTC 2014
Is it possible INLINE didn't inline the function because it's recursive? If
it were my function, I'd probably try a manual worker /wrapper.
On 07:59, Wed, Dec 17, 2014 Simon Peyton Jones <simonpj at microsoft.com>
wrote:
> I still would like to understand why INLINE does not make it inline.
> That's weird.
>
> Eg way to reproduce.
>
> Simion
>
> | -----Original Message-----
> | From: Richard Eisenberg [mailto:eir at cis.upenn.edu]
> | Sent: 17 December 2014 15:56
> | To: Simon Peyton Jones
> | Cc: Joachim Breitner; ghc-devs at haskell.org
> | Subject: Re: performance regressions
> |
> | By unsubstantiated guess is that INLINEABLE would have the same effect
> | as INLINE here, as GHC doesn't see fit to actually inline the
> | function, even with INLINE -- the big improvement seen between (1) and
> | (2) is actually specialization, not inlining. The jump from (2) to (3)
> | is actual inlining. Thus, it seems that GHC's heuristics for inlining
> | aren't working out for the best here.
> |
> | I've pushed my changes, though I agree with Simon that more research
> | may uncover even more improvements here. I didn't focus on the number
> | of calls because that number didn't regress. Will look into this soon.
> |
> | Richard
> |
> | On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones
> | <simonpj at microsoft.com> wrote:
> |
> | > If you use INLINEABLE, that should make the function specialisable
> | to a particular monad, even if it's in a different module. You
> | shouldn't need INLINE for that.
> | >
> | > I don't understand the difference between cases (2) and (3).
> | >
> | > I am still suspicious of why there are so many calls to this one
> | function that it, alone, is allocating a significant proportion of
> | compilation of the entire run of GHC. Are you sure there isn't an
> | algorithmic improvement to be had, to simply reduce the number of
> | calls?
> | >
> | > Simon
> | >
> | > | -----Original Message-----
> | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
> | > | Richard Eisenberg
> | > | Sent: 16 December 2014 21:46
> | > | To: Joachim Breitner
> | > | Cc: ghc-devs at haskell.org
> | > | Subject: Re: performance regressions
> | > |
> | > | I've learned several very interesting things in this analysis.
> | > |
> | > | - Inlining polymorphic methods is very important. Here are some
> | > | data points to back up that claim:
> | > | * Original implementation using zipWithAndUnzipM:
> | 8,472,613,440
> | > | bytes allocated in the heap
> | > | * Adding {-# INLINE #-} to the definition thereof:
> | 6,639,253,488
> | > | bytes allocated in the heap
> | > | * Using `inline` at call site to force inlining:
> | 6,281,539,792
> | > | bytes allocated in the heap
> | > |
> | > | The middle step above allowed GHC to specialize zipWithAndUnzipM
> | to
> | > | my particular monad, but GHC didn't see fit to actually inline
> | the
> | > | function. Using `inline` forced it, to good effect. (I did not
> | > | collect data on code sizes, but it wouldn't be hard to.)
> | > |
> | > | By comparison:
> | > | * Hand-written recursion: 6,587,809,112 bytes allocated in
> | the
> | > | heap
> | > | Interestingly, this is *not* the best result!
> | > |
> | > | Conclusion: We should probably add INLINE pragmas to Util and
> | > | MonadUtils.
> | > |
> | > |
> | > | - I then looked at rejiggering the algorithm to keep the common
> | > | case fast. This had a side effect of changing the
> | zipWithAndUnzipM
> | > | to mapAndUnzipM, from Control.Monad. To my surprise, this brought
> | > | disaster!
> | > | * Using `inline` and mapAndUnzipM: 7,463,047,432 bytes
> | > | allocated in the heap
> | > | * Hand-written recursion: 5,848,602,848 bytes
> | > | allocated in the heap
> | > |
> | > | That last number is better than the numbers above because of the
> | > | algorithm streamlining. But, the inadequacy of mapAndUnzipM
> | > | surprised me -- it already has an INLINE pragma in Control.Monad
> | of course.
> | > | Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed
> | > | getting inlined, but a call to `map` remained, perhaps causing
> | > | extra allocation.
> | > |
> | > | Conclusion: We should examine the implementation of mapAndUnzipM
> | > | (and similar functions) in Control.Monad. Is it as fast as
> | possible?
> | > |
> | > |
> | > |
> | > | In the end, I was unable to bring the allocation numbers down to
> | > | where they were before my work. This is because the flattener now
> | > | deals in roles. Most of its behavior is the same between nominal
> | > | and representational roles, so it seems silly (though very
> | > | possible) to specialize the code to nominal to keep that path
> | fast.
> | > | Instead, I identified one key spot and made that go fast.
> | > |
> | > | Thus, there is a 7% bump to memory usage on very-type-family-
> | heavy
> | > | code, compared to before my commit on Friday. (On more ordinary
> | > | code, there is no noticeable change.)
> | > |
> | > | Validating my patch locally now; will push when that's done.
> | > |
> | > | Thanks,
> | > | Richard
> | > |
> | > | On Dec 16, 2014, at 10:41 AM, Joachim Breitner <mail at joachim-
> | > | breitner.de> wrote:
> | > |
> | > | > Hi,
> | > | >
> | > | >
> | > | > Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard
> | Eisenberg:
> | > | >> On Dec 16, 2014, at 4:01 AM, Joachim Breitner <mail at joachim-
> | > | breitner.de> wrote:
> | > | >>
> | > | >>> another guess (without looking at the code, sorry): Are they
> | in
> | > | the >>> same module? I.e., can GHC specialize the code to your
> | > | particular Monad?
> | > | >
> | > | >> No, they're not in the same module. I could also try moving
> | the
> | > | >> zipWithAndUnzipM function to the same module, and even
> | > | specializing >> it by hand to the right monad.
> | > | >
> | > | > I did mean zipWithAndUnzipM, so maybe yes: Try that.
> | > | >
> | > | > (I find it hard to believe that any polymorphic monadic code
> | > | should > perform well, with those many calls to an unknown (>>=)
> | > | with a > function parameter, but maybe I'm too pessimistic here.)
> | > | > > >> Could that be preventing the fusing?
> | > | >
> | > | > There is not going to be any fusing here, at least not list
> | > | fusion; > that would require your code to be written in terms of
> | > | functions with > fusion rules.
> | > | >
> | > | > Greetings,
> | > | > Joachim
> | > | >
> | > | > --
> | > | > Joachim "nomeata" Breitner
> | > | > mail at joachim-breitner.de * http://www.joachim-breitner.de/ >
> | > | Jabber: nomeata at joachim-breitner.de * GPG-Key: 0xF0FBF51F Debian
> | > | > Developer: nomeata at debian.org > >
> | > | _______________________________________________
> | > | > ghc-devs mailing list
> | > | > ghc-devs at haskell.org
> | > | > http://www.haskell.org/mailman/listinfo/ghc-devs
> | > |
> | > | _______________________________________________
> | > | ghc-devs mailing list
> | > | ghc-devs at haskell.org
> | > | http://www.haskell.org/mailman/listinfo/ghc-devs
> | >
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20141217/c7b3eaec/attachment-0001.html>
More information about the ghc-devs
mailing list