Loading GHC into GHCi (and ghcid)

Simon Marlow marlowsd at gmail.com
Fri Jun 8 07:29:03 UTC 2018

On 7 June 2018 at 22:25, Evan Laforge <qdunkan at gmail.com> wrote:

> On Thu, Jun 7, 2018 at 1:47 PM, Simon Marlow <marlowsd at gmail.com> wrote:
> > For loading large amounts of code into GHCi, you want to add -j<n> +RTS
> > -A128m where <n> is the number of cores on your machine. We've found that
> > parallel compilation works really well in GHCi provided you use a nice
> > large allocation area for the GC. This dramatically speeds up working
> > with large numbers of modules in GHCi. (500 is small!)
>
> This is a bit of a thread hijack (feel free to change the subject),
> but I also have a workflow that involves loading a lot of modules in
> ghci (500-700).  As long as I can coax ghci to load them, things are
> fast and work well, but my impression is that this isn't a common
> workflow, and specifically ghc developers don't do this, because just
> about every ghc release will break it in one way or another (e.g. by
> putting more flags in the recompile check hash), and no one seems to
> understand what I'm talking about when I suggest features to improve
> it (e.g. the recent msg about modtime and recompilation avoidance).
> Given the uphill battle, I've been thinking that linking most of those
> modules into a package and loading much fewer will be a better
> supported workflow.  It's actually less convenient, because now it's
> divided between package level (which require a restart and relink if
> they change) and ghci level (which don't), but is maybe less likely to
> be broken by ghc changes.  Also, all those loaded modules consume a
> huge amount of memory, which I haven't tracked down yet, but maybe
> packages will load more efficiently.
> But ideally I would prefer to continue to not use packages, and in
> fact do per-module more aggressively for larger codebases, because the
> need to restart ghci (or the ghc API-using program) and do a lengthy
> relink every time a module in the "wrong place" changed seems like it
> could get annoying (in fact it already is, for a cabal-oriented
> workflow).
> Does the workflow at Facebook involve loading tons of individual
> modules as I do?

Yes, our workflow involves loading a large number of modules into GHCi.
However, we have run into memory issues, which was the reason for the
recent work on fixing this space leak: https://phabricator.haskell.org/D4659

As it is, this workflow is OK thanks to Bartosz' work on speedups for large
numbers of modules, tweaking the RTS flags as I mentioned, and some other
fixes we've made in GHCi to avoid performance issues. (All of this is
upstream, incidentally.)  There is probably low-hanging fruit to be had in
reducing the memory usage of GHCi; nobody has really attacked this with the
heap profiler for a while. However, I imagine at some point loading
everything into GHCi will become unsustainable and we'll have to explore
other strategies. There are a couple of options here:
- pre-compile modules so that GHCi loads the .o files instead of
  interpreting the source
- move some of the code into pre-compiled packages, as you mentioned
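
The first option, together with the flags mentioned earlier in the thread,
might look roughly like the following. This is only a sketch: -fobject-code
is one way to have GHCi work from object code rather than the interpreter
(it is not necessarily the setup used at Facebook), the core count is
detected with getconf as an assumption about the platform, and the
resulting command is only printed, not run.

```shell
#!/bin/sh
# Sketch: assemble a GHCi command line for loading many modules.

# Number of cores. getconf _NPROCESSORS_ONLN is an assumption about the
# platform (it works on typical Linux and macOS systems); fall back to 1.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# -j<n>             load/compile modules in parallel on <n> cores
# -fobject-code     compile to object code instead of interpreting
# +RTS -A128m -RTS  give the GC a 128 MB allocation area
ghci_cmd="ghci -j${cores} -fobject-code +RTS -A128m -RTS"

# Print the command; run it from the project root to start the session.
echo "$ghci_cmd"
```

With -fobject-code, GHCi writes .o and .hi files and reuses them on the next
start when they are still up to date, at the cost of slower initial
compilation than the interpreter.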


> Or do they get packed into packages?  If it's the
> many modules, do you have recommendations for making that work well and
> keeping it working?  If packages are the way you're "supposed" to do
> things, then is there any idea about how hard it would be to reload
> packages at runtime?  If both modules and packages can be reloaded, is
> there an intended conceptual difference between a package and an
> unpackaged collection of modules?  To illustrate, I would put packages
> purely as a way to organize builds and distribution, and have no
> meaning at the compiler level, which is how I gather C compilers
> traditionally work (e.g. 'cc a.o b.o c.o' is the same as 'ar abc.a a.o
> b.o c.o; cc abc.a').  But that's clearly not how ghc sees it!
> thanks!
