GHC API: memory usage of loaded modules

Tue Dec 13 18:21:48 UTC 2016

Sorry about the delay, I got distracted by an unrelated memory leak.

On Tue, Nov 29, 2016 at 9:35 AM, Reid Barton <rwbarton at gmail.com> wrote:
> You'd probably find that you also want to, for example, type check the
> expressions that you are interpreting. The information needed to do so
> is not contained in your executable at all; it's in the .hi files that
> were built alongside your program and its dependencies, and the ones
> that came with the libraries bundled into GHC. I assume the in-memory
> representation of these interface files is not very efficient, and
> they probably account for a lot of the space usage of your program.

That's true, but the .hi files on disk take up about 20k of that 76mb.
If the .o files are loaded basically directly as binary, then that
would mean 20k of .hi files turn into around 124mb in memory, which is
quite an expansion.  But then there's all of the libraries I use and
then all their libraries... perhaps those need to be loaded too?  If
so, there is more than I'm counting.  I'm not sure how to count those,
since they're not included in the "upsweep" log msgs when you do a
GHC.load.
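
One rough way to put a number on "more than I'm counting" is to add up the interface files on disk, both for the project's own build output and for the installed packages. The following is only a sketch under a few assumptions: `interfaceBytes` is a made-up helper name, it counts `.hi` and `.dyn_hi` files only (profiling interfaces such as `.p_hi` would need adding), it follows directory symlinks without loop detection, and it needs a `directory` package recent enough to provide `listDirectory` and `getFileSize`.

```haskell
import Control.Monad (forM)
import qualified System.Directory as Dir
import System.FilePath ((</>), takeExtension)

-- | Total size in bytes of all interface files (.hi and .dyn_hi) found
-- anywhere under the given directory.  Hypothetical helper, not part of
-- the GHC API.
interfaceBytes :: FilePath -> IO Integer
interfaceBytes dir = do
    names <- Dir.listDirectory dir
    sizes <- forM names $ \name -> do
        let path = dir </> name
        isDir <- Dir.doesDirectoryExist path
        if isDir
            -- Recurse into subdirectories (symlinked ones included).
            then interfaceBytes path
            else if takeExtension path `elem` [".hi", ".dyn_hi"]
                then Dir.getFileSize path
                else return 0
    return (sum sizes)
```

Pointing it at the build directory and then at the directory printed by `ghc --print-libdir` gives an upper bound on the bundled libraries' interfaces. It says nothing about which of those interfaces a given `GHC.load` actually reads, so the real figure is likely somewhere in between.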

ghci itself takes about 200mb when it loads all that stuff, so I
imagine the memory use is "working as intended", not me just using the
API wrong.

> I'm not sure offhand, but perhaps using -fignore-interface-pragmas
> when you invoke the GHC API would reduce the amount of space used
> while loading interface files, and if you're using the bytecode
> interpreter then you probably don't care about any of the information
> it will discard (which mostly has to do with optimizations).

I tried it, and I recall that it helped at the time, but now memory use
is exactly the same with or without it, whether I try with ghci or with
my own program that uses the GHC API.  E.g. I have:

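import qualified GHC.Stats
import qualified System.Mem

-- | Bytes in use after a forced major GC.  Needs +RTS -T, otherwise
-- getGCStats throws.  Bytes is a newtype defined elsewhere.
memory_used :: IO Bytes
memory_used = do
    System.Mem.performMajorGC
    stats <- GHC.Stats.getGCStats
    return $ Bytes $ fromIntegral $ GHC.Stats.currentBytesUsed stats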

in a module that loads a lot of stuff.  When I run that with ghci or
ghci -fignore-interface-pragmas, memory use is about the same.
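
For anyone repeating this measurement on a newer compiler: as far as I know, `getGCStats` was superseded by `getRTSStats` starting with base 4.10 (GHC 8.2). Below is a sketch of the same idea as the memory_used function above, written against that newer interface. The name `liveBytesAfterGC` is made up for illustration, and it returns a plain `Word64` instead of the `Bytes` wrapper.

```haskell
import Data.Word (Word64)
import qualified GHC.Stats as Stats
import qualified System.Mem as Mem

-- | Live heap bytes right after a forced major GC, or Nothing when the
-- program was not started with +RTS -T (stats collection disabled).
-- Hypothetical helper; assumes base >= 4.10.
liveBytesAfterGC :: IO (Maybe Word64)
liveBytesAfterGC = do
    enabled <- Stats.getRTSStatsEnabled
    if not enabled
        then return Nothing
        else do
            -- Force a major collection so the figure reflects live data,
            -- not garbage waiting to be collected.
            Mem.performMajorGC
            stats <- Stats.getRTSStats
            return $ Just $ Stats.gcdetails_live_bytes (Stats.gc stats)
```

Either version only reports the garbage-collected heap. As far as I understand, object code brought in by the runtime linker lives outside that heap, so these numbers are better suited to judging the in-memory cost of interface files than the total process size.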

> If you build your executable dynamically then the GHC API should also
> reuse the same shared libraries and executable image rather than
> loading a second copy of the object code. If that doesn't work then it
> would be helpful if you could produce a minimal reproducer of it not
> working. (The potential disadvantage is that you have to load the
> entirety of each of your dependencies, rather than just the parts you
> actually use.)

I do build dynamically, since it's the only option nowadays to load .o
files, but I guess what you mean is link the application as a shared
library, and then link it to the Main module for the app, and pass it
to GHC.parseDynamicFlags for the REPL?  That's a good idea.  But I'd
still be loading all those .hi files, and if the majority of the
memory use is actually from those, it might not help, right?

I don't fully understand the "have to load the entirety of your
dependencies" part.  If I'm using the same code linked into the main
application, then isn't it a given that I'm loading everything in the
application in the first place?  Or do you mean load all the .hi
files, even if I'm not exposing functions from them?  If the size of
in-memory .hi files dwarfs the binary size, then that might be a net
loss.  Though if my guess is correct about most .hi files being loaded
from external packages, then maybe there won't be much difference.