performance regressions

Tue Dec 16 14:59:20 UTC 2014

On Dec 16, 2014, at 2:59 AM, Jan Stolarek <jan.stolarek at p.lodz.pl> wrote:
> What does your algorithm look like when you write it directly? Something like this:
> 
> flatten_many fmode roles tys
> = unzip `liftM` mapM go (zip roles tys)
> where
>   go (Nominal,ty)          = flatten_one (fmode { fe_eq_rel = NomEq }) ty
>   go (Representational,ty) = flatten_one (fmode { fe_eq_rel = ReprEq }) ty
>   go (Phantom, ty)         = -- See Note [Phantoms in the flattener]
>                              return (ty, mkTcPhantomCo ty ty)
> 
> ?
> 
> Maybe this has something to do with `zipWithAndUnzipM` not being tail-recursive vs. direct version 
> being able to fuse intermediate lists?

My direct version is even uglier:

> flatten_many fmode roles tys
>   = go roles tys
>   where
>     go (Nominal:rs)          (ty:tys)
>       = do { (xi, co) <- flatten_one (setFEEqRel fmode NomEq)  ty
>            ; (xis, cos) <- go rs tys
>            ; return (xi:xis, co:cos) }
>     go (Representational:rs) (ty:tys)
>       = do { (xi, co) <- flatten_one (setFEEqRel fmode ReprEq) ty
>            ; (xis, cos) <- go rs tys
>            ; return (xi:xis, co:cos) }
>     go (Phantom:rs)          (ty:tys)
>       = do { (xis, cos) <- go rs tys
>            ; -- See Note [Phantoms in the flattener]
>              return (ty:xis, mkTcPhantomCo ty ty:cos) }
>     go _ _ = return ([], [])

I could refactor to make it better, but I would be worried that the version you wrote would suffer from the same problems as zipWithAndUnzipM. Will check to see, though.

On Dec 16, 2014, at 4:01 AM, Joachim Breitner <mail at joachim-breitner.de> wrote:

> another guess (without looking at the code, sorry): Are they in the same
> module? I.e., can GHC specialize the code to your particular Monad?

No, they're not in the same module. I could also try moving the zipWithAndUnzipM function to the same module, and even specializing it by hand to the right monad. Could that be preventing the fusing?

On Dec 16, 2014, at 8:49 AM, Simon Peyton Jones <simonpj at microsoft.com> wrote:

> That seems surprising.  I'd build a profiled compiler (GhcProfiled=YES) and see what it says.  
> 
> If it increases allocation by 30% overall, there must be a LOT of calls to this function.  Should there be so many?

I've been working from a profiled compiler. That's how I found that this function was the culprit -- it certainly wasn't my first guess!

Yes, there are A LOT of calls: 7,106,808 to be exact. (The test case is perf/compiler/T9872a; the function is flatten_many). The number of calls doesn't vary from before my commit, though, so the raw number isn't the problem -- it's the allocation.

I'll try turning some of these knobs to see where the difference is.

Thanks,
Richard