GHC API: memory usage of loaded modules

Tue Nov 29 17:35:27 UTC 2016

On Mon, Nov 28, 2016 at 8:54 PM, Evan Laforge <qdunkan at gmail.com> wrote:
> I have a program that uses the GHC API to provide a REPL.  It winds up
> taking up 200mb in RAM, as measured by GHC.Stats.currentBytesUsed, but
> without the GHC API it's 5mb.  If I turn on verbose, I can see that
> GHC is loading 255 modules, all loaded binary ("skipping M ( M.hs,
> M.hs.o )") except the toplevel, and the memory use is zooming up as it
> loads them.
>
> I expect some memory usage from loading modules, but 195mb seems like
> a lot.  If I do a 'du' on the entire obj directory (which has 401
> *.hs.o files... the REPL doesn't expose everything), it's only 76mb on
> disk.  How do loaded modules wind up consuming space, and is there any
> way to use less space?
>
> The thing is, all those loaded modules are part of the application
> itself, so presumably they've already been linked into the binary and
> loaded into memory.  The ideal would be that I could somehow reuse
> that.  I imagine that I could by writing my own Haskell interpreter
> and making a big symbol table of all the callable functions, but I'd
> rather not write my own interpreter if I can use an existing one!

You'd probably find that you also want to, for example, type check the
expressions that you are interpreting. The information needed to do so
is not contained in your executable at all; it's in the .hi files that
were built alongside your program and its dependencies, and the ones
that came with the libraries bundled into GHC. I assume the in-memory
representation of these interface files is not very efficient, and
they probably account for a lot of the space usage of your program.
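One way to test that guess is to measure how much the live heap grows across the loading step. The following is a minimal sketch. It assumes GHC 8.2 or later, where GHC.Stats provides getRTSStats. The currentBytesUsed field mentioned above belongs to the older GCStats API. It also assumes the program is run with +RTS -T. liveBytesDelta is a hypothetical helper name.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performMajorGC)

-- | Run an action and report how much the live heap grew across it.
-- Live bytes are sampled right after a forced major GC, so short-lived
-- garbage is excluded. The delta is Nothing when the RTS was started
-- without +RTS -T, because getRTSStats would throw in that case.
liveBytesDelta :: MonadIO m => m a -> m (a, Maybe Integer)
liveBytesDelta action = do
  enabled <- liftIO getRTSStatsEnabled
  if not enabled
    then do
      result <- action
      pure (result, Nothing)
    else do
      before <- liftIO liveBytes
      result <- action
      after <- liftIO liveBytes
      pure (result, Just (after - before))
  where
    -- Live heap bytes as recorded by the major GC forced just before.
    liveBytes :: IO Integer
    liveBytes = do
      performMajorGC
      toInteger . gcdetails_live_bytes . gc <$> getRTSStats
```

The helper is polymorphic over MonadIO, and the Ghc monad is an instance. So inside a GHC API session it can wrap the load step directly, e.g. `liveBytesDelta (load LoadAllTargets)`. The loaded interfaces stay reachable from the session afterwards, so they show up in the delta.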

I'm not sure offhand, but perhaps passing -fignore-interface-pragmas
when you invoke the GHC API would reduce the amount of space used
while loading interface files. If you're using the bytecode
interpreter, you probably don't care about any of the information the
flag discards (unfoldings, strictness information and the like, which
mostly matter for optimization).
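Through the GHC API, the flag can be set on the session's DynFlags before anything is loaded. The following is a minimal sketch. It assumes GHC 9.x module names: GHC.Driver.Session, which older releases call DynFlags. It also assumes the ghc-paths package supplies libdir. withLeanInterfaces and ignoringInterfacePragmas are hypothetical helper names.

```haskell
import GHC (Ghc, getSessionDynFlags, runGhc, setSessionDynFlags)
import GHC.Driver.Session (GeneralFlag (Opt_IgnoreInterfacePragmas), gopt, gopt_set)
import GHC.Paths (libdir)

-- | Start a GHC API session with -fignore-interface-pragmas switched on,
-- then run the given action (e.g. setTargets and load) inside it.
withLeanInterfaces :: Ghc a -> IO a
withLeanInterfaces action = runGhc (Just libdir) $ do
  dflags <- getSessionDynFlags
  -- Same effect as -fignore-interface-pragmas on the command line:
  -- inessential .hi information (unfoldings, strictness) is ignored.
  -- The return value of setSessionDynFlags differs between GHC
  -- versions, so it is discarded explicitly.
  _ <- setSessionDynFlags (gopt_set dflags Opt_IgnoreInterfacePragmas)
  action

-- | Report whether the current session is ignoring interface pragmas.
ignoringInterfacePragmas :: Ghc Bool
ignoringInterfacePragmas = gopt Opt_IgnoreInterfacePragmas <$> getSessionDynFlags
```

A REPL would call setTargets and load inside the action, so that every interface file is read with the flag already in effect.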

If you build your executable dynamically, the GHC API should also
reuse the same shared libraries and executable image rather than
loading a second copy of the object code. If that doesn't work, it
would be helpful if you could produce a minimal example that
reproduces the problem. (The potential disadvantage is that you have
to load the entirety of each of your dependencies, rather than just
the parts you actually use.)

Regards,
Reid Barton