GHC API: memory usage of loaded modules

Wed Dec 14 16:21:07 UTC 2016

On Tue, Dec 13, 2016 at 1:21 PM, Evan Laforge <qdunkan at gmail.com> wrote:
> Sorry about the delay, I got distracted by an unrelated memory leak.
>
> On Tue, Nov 29, 2016 at 9:35 AM, Reid Barton <rwbarton at gmail.com> wrote:
>> You'd probably find that you also want to, for example, type check the
>> expressions that you are interpreting. The information needed to do so
>> is not contained in your executable at all; it's in the .hi files that
>> were built alongside your program and its dependencies, and the ones
>> that came with the libraries bundled into GHC. I assume the in-memory
>> representation of these interface files is not very efficient, and
>> they probably account for a lot of the space usage of your program.
>
> That's true, but the .hi files on disk take up about 20k of that 76MB.
> If the .o files are loaded basically directly as binary, then that
> would mean 20k of .hi files turn into around 124MB in memory, which is
> quite an expansion.  But then there's all of the libraries I use and
> then all their libraries... perhaps those need to be loaded too?  If
> so, there is more than I'm counting.  I'm not sure how to count those,
> since they're not included in the "upsweep" log messages when you do a
> GHC.load.

GHCi definitely needs to load some .hi files of your dependencies.
Your .hi files contain the types of your functions, needed to type
check expressions that use them. Let's say the type of one of your
functions involves ByteString. Then GHCi has to read the interface
file that defines ByteString, so that there is something in the
compiler for the type of your function to refer to.
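To make the ByteString point concrete, here is a hypothetical user module (`countByte` is an illustrative name, not from the original program). Because the exported type of `countByte` mentions `ByteString` and `Word8`, type checking an expression such as `countByte 10 s` at the prompt requires GHCi to have read the interface files that define those types, even though the expression itself never names bytestring.

```haskell
-- Hypothetical user module (countByte is an illustrative name). Its exported
-- type mentions ByteString and Word8, so GHCi must read the interface files
-- that define those types before it can type check any use of countByte.
import qualified Data.ByteString as B
import Data.Word (Word8)

-- | Count how many bytes of a ByteString equal the given byte.
countByte :: Word8 -> B.ByteString -> Int
countByte = B.count
```

Running `ghc --show-iface` on the compiled module's .hi file should list bytestring among its package dependencies.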

I'm not sure how to predict what exact set of .hi files GHCi will need
to load, but you could run your program under strace (or equivalent)
to see which .hi files it is loading. As for the expansion factor when
converting them into the compiler's internal types, I would guess it is
around 10x. However, there's also some kind of lazy loading of .hi
files, and I'm not sure how that works or what granularity it has.
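A minimal sketch of the strace approach, assuming a Linux system and a log written with something like `strace -f -e trace=open,openat -o trace.log ./your-program`. It pulls the successfully opened interface files out of the log and scales their on-disk size by the rough 10x guess above. The helper names (`hiFilesOpened`, `firstQuoted`, `estimatedInMemoryBytes`) are hypothetical, and the 10x factor is only the guess from this message, not a measurement; the lazy loading mentioned above means a file being opened does not tell you how much of it was actually turned into in-memory structures.

```haskell
import Data.List (isInfixOf, isSuffixOf, nub)
import System.IO (IOMode (ReadMode), hFileSize, withFile)

-- | Interface files a process opened successfully, in first-seen order,
-- given the text of an strace log. Assumes the usual strace line shape,
-- optionally prefixed by a pid:
--   openat(AT_FDCWD, "/path/Foo.hi", O_RDONLY) = 3
-- Calls split across "<unfinished ...>" / "resumed" lines are not handled.
hiFilesOpened :: String -> [FilePath]
hiFilesOpened = nub . concatMap fromLine . lines
  where
    fromLine l
      | isOpenCall l
      , succeeded l
      , Just path <- firstQuoted l
      , any (`isSuffixOf` path) [".hi", ".dyn_hi"] = [path]
      | otherwise = []
    isOpenCall l = "open(" `isInfixOf` l || "openat(" `isInfixOf` l
    -- Failed calls end in "= -1 ENOENT (...)" or similar.
    succeeded l = " = " `isInfixOf` l && not (" = -1" `isInfixOf` l)

-- | The first double-quoted string on a line (strace prints paths this way).
firstQuoted :: String -> Maybe String
firstQuoted s = case dropWhile (/= '"') s of
  '"' : rest -> case break (== '"') rest of
    (path, '"' : _) -> Just path
    _ -> Nothing
  _ -> Nothing

-- | The rough in-memory expansion factor guessed in this message.
expansionGuess :: Integer
expansionGuess = 10

-- | On-disk size of the given files, scaled by the guessed expansion factor.
estimatedInMemoryBytes :: [FilePath] -> IO Integer
estimatedInMemoryBytes paths = do
  sizes <- mapM (\p -> withFile p ReadMode hFileSize) paths
  pure (expansionGuess * sum sizes)
```

Usage: `readFile "trace.log" >>= estimatedInMemoryBytes . hiFilesOpened >>= print`.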

By the way, you can use `ghc --show-iface` to examine .hi files
manually, which might be illuminating.

>> If you build your executable dynamically then the GHC API should also
>> reuse the same shared libraries and executable image rather than
>> loading a second copy of the object code. If that doesn't work then it
>> would be helpful if you could produce a minimal reproducer of it not
>> working. (The potential disadvantage is that you have to load the
>> entirety of each of your dependencies, rather than just the parts you
>> actually use.)
>
> I do build dynamically, since it's the only option nowadays to load .o
> files, but I guess what you mean is link the application as a shared
> library, and then link it to the Main module for the app, and pass it
> to GHC.parseDynamicFlags for the REPL?  That's a good idea.  But I'd
> still be loading all those .hi files, and if the majority of the
> memory use is actually from those, it might not help, right?

I'm pretty sure the old way of linking your program statically, which
will cause the RTS to use its own linker to load .o files, is still
supposed to work. It has the same limitations it has always had, of
course. The new thing is that you need to build dynamically in order
to link object files into the ghc compiler itself, but that's only
because the ghc binary shipped in the binary distribution was built
dynamically; it isn't a constraint on your own use of the GHC API. (And
you can choose to build ghc statically, too; Windows builds still work
that way.)

I really just meant building your executable dynamically, i.e., with
-dynamic. If the code size is a small proportion of the total memory
use then it won't make a big difference, as you say. However, I'm not
sure that is really the case, considering that the GHC library itself
is already about 74 MB on disk.

I'm not sure why you are looking at the GHC.Stats.currentBytesUsed
number; be aware that it only measures the size of the GCed heap. Many
things that contribute to the total memory usage of your program (such
as its code size, or anything allocated by malloc or mmap) will not
show up there.
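To compare the GC heap against the whole process, here is a sketch that prints the heap's live bytes next to the resident set size read from `/proc/self/status` (Linux-only, an assumption). `currentBytesUsed` belonged to the `GCStats` record returned by `getGCStats`; later GHC releases deprecated that in favour of `getRTSStats`, and as far as I know `gcdetails_live_bytes` is the closest equivalent, so that is what the sketch uses. The names `vmRSSBytes` and `reportMemory` are hypothetical. RSS does include resident code pages and malloc'd memory, but only pages that are currently resident.

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (stripPrefix)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Resident set size in bytes, parsed from the text of /proc/self/status
-- (Linux-specific), where the relevant line looks like "VmRSS:    123456 kB".
vmRSSBytes :: String -> Maybe Integer
vmRSSBytes status =
  case [rest | l <- lines status, Just rest <- [stripPrefix "VmRSS:" l]] of
    rest : _ -> case span isDigit (dropWhile isSpace rest) of
      ("", _) -> Nothing
      (digits, _) -> Just (read digits * 1024) -- the kernel reports kB
    [] -> Nothing

-- | Print the GC heap's live bytes (closest modern analogue of the old
-- currentBytesUsed) next to the whole-process resident set size.
reportMemory :: IO ()
reportMemory = do
  enabled <- getRTSStatsEnabled -- False unless the program runs with +RTS -T
  heap <- if enabled
    then Just . gcdetails_live_bytes . gc <$> getRTSStats
    else pure Nothing
  rss <- vmRSSBytes <$> readFile "/proc/self/status"
  putStrLn ("GC heap, live bytes: " ++ maybe "unavailable (+RTS -T not set)" show heap)
  putStrLn ("Process RSS, bytes:  " ++ maybe "unavailable" show rss)
```

The gap between the two numbers is roughly the memory that a heap-only statistic cannot see.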

> I don't fully understand the "have to load the entirety of your
> dependencies" part.  If I'm using the same code linked into the main
> application, then isn't it a given that I'm loading everything in the
> application in the first place?

Let me explain what I meant with an example. If I build a hello world
program statically, I get a 1.2M executable. Let's assume most of that
size comes from the base package. If I build the same hello world
program dynamically, I get an 18K executable dynamically linked
against an 11M base shared library! At runtime, the dynamic loader
will map that whole 11M file into my process's memory space. Whether
you want to count that as part of the space usage of your program is
up to you; the code segments will be shared between multiple
simultaneous instances of your program (or other programs compiled by
GHC), but if you only run one copy of your program at a time, that
doesn't help you. It certainly won't be counted by currentBytesUsed.
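On Linux, one way to see those whole-file mappings is to total the file-backed entries in `/proc/self/maps` per file. The sketch below does that; the function names are hypothetical. Note that it measures mapped address space: the kernel pages code in on demand, so the resident portion can be smaller than the mapped size.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)
import Data.Maybe (mapMaybe)
import Numeric (readHex)

-- | Backing file and size in bytes of one line of /proc/<pid>/maps
-- (Linux-specific), or Nothing for anonymous mappings such as [heap].
-- Line shape: "start-end perms offset dev inode /path/to/file"
mapping :: String -> Maybe (FilePath, Integer)
mapping line = case words line of
  range : _perms : _offset : _dev : _inode : pathParts@(('/' : _) : _) ->
    case break (== '-') range of
      (lo, '-' : hi) -> case (readHex lo, readHex hi) of
        ([(start, "")], [(end, "")]) -> Just (unwords pathParts, end - start)
        _ -> Nothing
      _ -> Nothing
  _ -> Nothing

-- | Total mapped (not necessarily resident) bytes per file, largest first.
mappedBytesPerFile :: String -> [(FilePath, Integer)]
mappedBytesPerFile =
  sortOn (negate . snd)
    . map total
    . groupBy ((==) `on` fst)
    . sortOn fst
    . mapMaybe mapping
    . lines
  where
    -- groupBy never produces an empty group, so head is safe here.
    total grp = (fst (head grp), sum (map snd grp))
```

Usage: `readFile "/proc/self/maps" >>= mapM_ print . take 10 . mappedBytesPerFile`. In a dynamically linked build, the libHSbase shared object should appear near the top.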

The base library is composed of many individual .o files. When I
linked the hello world statically, the linker took only the .o files
that were actually needed for my program, which is why it was only
1.2M when the base library is 11M. Your real program probably uses
most of base, but it may have other dependencies of which you use only
a small part (lens, perhaps).

Now when you use the RTS linker in a statically linked program,
although some of the code you need is linked into your program
already, it's not in a usable form for the RTS linker, so it has to
load the .o files itself, effectively creating a second copy of the
code. If you use dynamic linking, then the RTS calls dlopen, which
should reuse the mappings that were made when your program was loaded.
The tradeoff is that if you use very little of your dependencies, it
might still be cheaper to store two copies of only the code that you
actually use.
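The tradeoff can be written down as a deliberately crude cost model: static linking plus the RTS linker pays roughly twice the size of the code you use, while dynamic linking pays the full size of the shared library once. The sketch below encodes that simplification; the type and function names are hypothetical, and it ignores granularity (linkers work with whole .o files, not individual functions) as well as sharing between processes.

```haskell
-- | Very rough model of the code-size cost of one dependency, in bytes.
-- Assumption (a simplification of the discussion above): with static linking
-- plus the RTS linker, the used code exists twice (once in the executable,
-- once loaded by the RTS linker); with dynamic linking the whole shared
-- library is mapped once and reused by dlopen.
data Linking = Static | Dynamic
  deriving (Eq, Show)

codeCost :: Linking -> Integer -> Integer -> Integer
codeCost Static usedBytes _fullLibraryBytes = 2 * usedBytes
codeCost Dynamic _usedBytes fullLibraryBytes = fullLibraryBytes

-- | Which option the model says is cheaper for a dependency.
cheaperLinking :: Integer -> Integer -> Linking
cheaperLinking usedBytes fullLibraryBytes
  | codeCost Static usedBytes fullLibraryBytes
      < codeCost Dynamic usedBytes fullLibraryBytes = Static
  | otherwise = Dynamic
```

With the hello world numbers (about 1.2M used out of an 11M base), `cheaperLinking 1200000 11000000` gives `Static` (2.4M versus 11M), while a program using 9M of an 11M library gets `Dynamic` (18M versus 11M).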

Regards,
Reid Barton