performance regressions

Richard Eisenberg eir at
Tue Dec 16 21:45:36 UTC 2014

I've learned several very interesting things in this analysis.

- Inlining polymorphic methods is very important. Here are some data points to back up that claim:
   * Original implementation using zipWithAndUnzipM:    8,472,613,440 bytes allocated in the heap
   * Adding {-# INLINE #-} to the definition thereof:   6,639,253,488 bytes allocated in the heap
   * Using `inline` at call site to force inlining:     6,281,539,792 bytes allocated in the heap

The middle step above allowed GHC to specialize zipWithAndUnzipM to my particular monad, but GHC didn't see fit to actually inline the function. Using `inline` forced it, to good effect. (I did not collect data on code sizes, but it wouldn't be hard to.)

By comparison:
   * Hand-written recursion:    6,587,809,112 bytes allocated in the heap
Interestingly, this is *not* the best result!

Conclusion: We should probably add INLINE pragmas to Util and MonadUtils.

- I then looked at rejiggering the algorithm to keep the common case fast. This had a side effect of changing the zipWithAndUnzipM to mapAndUnzipM, from Control.Monad. To my surprise, this brought disaster!
   * Using `inline` and mapAndUnzipM:        7,463,047,432 bytes allocated in the heap
   * Hand-written recursion:                 5,848,602,848 bytes allocated in the heap

That last number is better than the numbers above because of the algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised me -- it already has an INLINE pragma in Control.Monad of course. Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting inlined, but a call to `map` remained, perhaps causing extra allocation.

Conclusion: We should examine the implementation of mapAndUnzipM (and similar functions) in Control.Monad. Is it as fast as possible?

In the end, I was unable to bring the allocation numbers down to where they were before my work. This is because the flattener now deals in roles. Most of its behavior is the same between nominal and representational roles, so it seems silly (though very possible) to specialize the code to nominal to keep that path fast. Instead, I identified one key spot and made that go fast. 

Thus, there is a 7% bump to memory usage on very-type-family-heavy code, compared to before my commit on Friday. (On more ordinary code, there is no noticeable change.)

Validating my patch locally now; will push when that's done.


On Dec 16, 2014, at 10:41 AM, Joachim Breitner <mail at> wrote:

> Hi,
> Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard Eisenberg:
>> On Dec 16, 2014, at 4:01 AM, Joachim Breitner <mail at> wrote:
>>> another guess (without looking at the code, sorry): Are they in the same
>>> module? I.e., can GHC specialize the code to your particular Monad?
>> No, they're not in the same module. I could also try moving the
>> zipWithAndUnzipM function to the same module, and even specializing it
>> by hand to the right monad.
> I did mean zipWithAndUnzipM, so maybe yes: Try that.
> (I find it hard to believe that any polymorphic monadic code should
> perform well, with those many calls to an unknown (>>=) with a function
> parameter, but maybe I’m too pessimistic here.)
>> Could that be preventing the fusing?
> There is not going to be any fusing here, at least not list fusion; that
> would require your code to be written in terms of functions with fusion
> rules.
> Greetings,
> Joachim
> -- 
> Joachim “nomeata” Breitner
>  mail at joachim-breitner.de
>  Jabber: nomeata at  • GPG-Key: 0xF0FBF51F
>  Debian Developer: nomeata at
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at

More information about the ghc-devs mailing list