[GHC] #9630: compile-time performance regression (probably due to Generics)

Fri Dec 18 00:44:33 UTC 2015

#9630: compile-time performance regression (probably due to Generics)
-------------------------------------+-------------------------------------
        Reporter:  hvr               |                Owner:  simonpj
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:  8.0.1
       Component:  Compiler          |              Version:  7.9
      Resolution:                    |             Keywords:  deriving-perf
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #9583, #10293     |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by ezyang):

 Here is a minimized test-case with no dependencies, which may be useful
 for diagnosing:

 {{{
 {-# LANGUAGE DeriveGeneric #-}
 {-# LANGUAGE TypeOperators #-}
 {-# OPTIONS_GHC -fno-warn-missing-methods #-}
 module Gen where

 import GHC.Generics
 import Control.Monad
 import Data.Monoid

 data PairS a = PairS a !(() -> ())

 newtype PutM a = Put { unPut :: PairS a }

 -- Use of this writer monad seems to be important; IO speeds it up
 type Put = PutM ()
 -- type Put = IO ()

 -- binary has INLINE pragmas on most of the instances but you can still
 -- trigger bad behavior without them.
 instance Functor PutM where
         fmap f m = Put $ let PairS a w = unPut m in PairS (f a) w

 -- Just to appease AMP
 instance Applicative PutM where
         pure  = return
         (<*>) = ap

 instance Monad PutM where
     return a = Put $ PairS a id

     m >>= k  = Put $
         let PairS a w  = unPut m
             PairS b w' = unPut (k a)
         in PairS b (w . w')

 class GBinary f where
     gput :: f t -> Put
     -- Forcing the dictionary to have two elements hurts
     -- the optimizer a lot.
     not_used :: f t

 instance GBinary a => GBinary (M1 i c a) where
     gput = gput . unM1

 instance Binary a => GBinary (K1 i a) where
     gput = put . unK1

 instance (GBinary a, GBinary b) => GBinary (a :*: b) where
     gput (x :*: y) = gput x >> gput y

 class Binary t where
     put :: t -> Put

 instance Binary () where
     put () = return ()

 data T = T () () () () () () () () () ()
            () () () () () () () () () ()
            () () () () () () () () () ()
            () () () () () () () () () ()
            () () () () () () () () () ()
            () () () () () () () () () ()
            () () () () () () () () () ()
            () () () () () () () () () ()
            () () () () () () () () () ()
            () () () () () () () () () ()
     deriving Generic

 -- Trigger specialization
 tput :: T -> Put
 tput = gput . from
 }}}

 On my machine, it takes 2.8s to build -O2, and 0.9s to build -O0. There
 are a few important ways you can tweak this:

 1. This also exhibits the "out-of-line more expensive" behavior; moving
 the code out into a separate module jumps compile time from 2.7s to 5.2s.

 2. If you replace `PutM ()` with `IO ()`, compile time goes from 2.7s to
 1.9s

 3. Removing `not_used`, pushes compile time from 2.7s to 1.5s. This DOES
 NOT stack with (2). So having to deal with dictionaries seems to make
 things work.

 I also wonder if this writer monad is actually leaking thunks, because
 apparently it's impossible to correctly implement writer without leaking.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9630#comment:22>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler