Fwd: Is anything being done to remedy the soul crushing compile times of GHC?

Wed Feb 17 10:14:00 UTC 2016

Another large culprit for performance is the fact that ghc --make
must preprocess and parse the header of every local Haskell file:
https://ghc.haskell.org/trac/ghc/ticket/618 (as well
as https://ghc.haskell.org/trac/ghc/ticket/1290).  Neil and I
have observed that when you use a build system that avoids this
work (like Shake), recompilation performance gets a lot better,
especially when you have a lot of modules.
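
To make the idea concrete, here is a rough sketch of why caching
header information helps. It is only an illustration, not how Shake or
GHC actually implement anything: the names `parse_imports` and `scan`
are made up, the regex is a crude stand-in for real header parsing, and
a real build system would likely key on modification times rather than
hashing file contents.

```python
import hashlib
import re

# Crude stand-in for header parsing: matches lines such as
# "import qualified Data.Map as M" and captures the module name.
IMPORT_RE = re.compile(r"^import\s+(?:qualified\s+)?([A-Z][\w.]*)", re.MULTILINE)


def parse_imports(source):
    """Return the sorted, de-duplicated module names imported by `source`."""
    return sorted(set(IMPORT_RE.findall(source)))


def scan(sources, cache):
    """Compute imports per file, reparsing only files whose contents changed.

    sources: dict mapping file name -> file contents
    cache:   dict mapping file name -> (content hash, imports); updated in place
    Returns (imports per file, list of files that were actually reparsed).
    """
    result, reparsed = {}, []
    for name, text in sources.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = cache.get(name)
        if cached is None or cached[0] != digest:
            # Cache miss or stale entry: this is the only place we "parse".
            cache[name] = (digest, parse_imports(text))
            reparsed.append(name)
        result[name] = cache[name][1]
    return result, reparsed
```

On the first call with an empty cache every file is parsed; on later
calls only the files whose contents changed are parsed again, which is
where the savings would come from on a tree with many modules.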

Edward

Excerpts from Ben Gamari's message of 2016-02-17 00:58:43 -0800:
> Evan Laforge <qdunkan at gmail.com> writes:
> 
> > On Wed, Feb 17, 2016 at 4:38 AM, Ben Gamari <ben at smart-cactus.org> wrote:
> >> Multiple modules aren't a problem. It is dependencies on Hackage
> >> packages that complicate matters.
> >
> > I guess the problem is when ghc breaks a bunch of hackage packages,
> > you can't build with it anymore until those packages are updated,
> > which won't happen until after the release?
> >
> This is one issue, although perhaps not the largest. Here are some of
> the issues I can think of off the top of my head:
> 
>  * The issue you point out: Hackage packages need to be updated
> 
>  * Hackage dependencies mean that the performance of the testcase is now
>    dependent upon code over which we have no control. If a test's
>    performance improves is this because the compiler improved or merely
>    because a dependency of the testcase was optimized?
> 
>    Of course, you could maintain a stable fork of the dependency, but
>    at this point you might as well just take the pieces you need and
>    fold them into the testcase.
> 
>  * Hackage dependencies greatly complicate packaging. You need to
>    somehow download and install them. The obvious approach here is to
>    use cabal-install, but it is unavailable during a GHC build.
> 
>  * Hackage dependencies make it much harder to determine what the
>    compiler is doing. If I have a directory of modules, I can rebuild
>    all of them with `ghc --make -fforce-recomp`. Things are quite a bit
>    trickier when packages enter the picture.
> 
> In short, the whole packaging system really acts as nothing more than a
> confounding factor for performance analysis, in addition to making
> implementation quite a bit trickier.
> 
> That being said, developing another performance testsuite consisting of
> a set of larger, dependency-ful applications may be useful at some
> point. I think the first priority, however, should be nofib.
> 
> > From a certain point of view, this could be motivation to either break
> > fewer things, or to patch breaking dependents as soon as the breaking
> > patch goes into ghc.  Which doesn't sound so bad in theory.  Of course
> > someone would need to spend time doing boring maintenance, but it
> > seems that will be required regardless.  And ultimately someone has to
> > do it eventually.
> >
> Much of the effort necessary to bring Hackage up to speed with a new GHC
> release isn't due to breakage; it's just bumping version bounds. I'm
> afraid the GHC project really doesn't have the man-power to do this work
> consistently. We already owe hvr a significant amount of gratitude for
> handling so many of these issues leading up to the release.
> 
> > Of course, said person's boring time might be better spent directly
> > addressing known performance problems.
> >
> Indeed.
> 
> > My impression from the reddit thread is that three things are going on:
> >
> > 1 - cabal has quite a bit of startup overhead
> 
> Yes, it would be great if someone could step up to look at Cabal's
> performance. Running `cabal build` on an up-to-date tree of a
> moderately-sized (10 kLoC, 8 components, 60 modules) Haskell project I
> have lying around takes over 5 seconds from start-to-finish.
> 
> `cabal build`ing just a single executable component takes 4 seconds.
> This same executable takes 48 seconds for GHC to build from scratch with
> optimization and 12 seconds without.
> 
> > 2 - ghc takes a long time on certain inputs, e.g. long list literals.
> > There are probably already tickets for these.
> >
> Indeed, there are plenty of pathological cases. For better or worse,
> these are generally the "easier" performance problems to tackle.
> 
> > 3 - and of course, ghc can be just generally slow, in the million tiny
> > cuts sense.
> >
> And this is the tricky one. Beginning to tackle this will require that
> someone perform some very careful measurements on current and previous
> releases.
> 
> Performance issues are always on my and Austin's to-do list, but we are
> unfortunately rather limited in the amount of time we can spend on these
> due to funding considerations. As Simon mentioned, if someone would like
> to see this fixed and has money to put towards the cause, we would love
> to hear from you.
> 
> Cheers,
> 
> - Ben