GHC memory usage when typechecking from source vs. loading ModIfaces

Fri Jan 17 08:07:38 UTC 2025

Hi,

I’m using the GHC API to typecheck 35,000 modules that form a complicated
dependency graph (with multiple top-level modules, i.e. there’s no single
“god module” that would transitively depend on everything else), and I
noticed that peak memory usage is wildly different when everything is done
from scratch vs. when everything is loaded from files containing ModIfaces:
17G vs. 8G. This ratio replicates for smaller samples as well, e.g. 80M vs.
33M for 407 modules.

I’m aware of https://gitlab.haskell.org/ghc/ghc/-/issues/13586 and so when
I finish typechecking a module, I take the resulting ModIface and create
the ModDetails that ends up in the HomeUnitGraph from that. My
understanding of Matt’s original GHC fix in
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5478 is that it does
the same, i.e. it makes a fresh ModDetails only once per module, after
the ModIface is ready.

But of course that still means that ModDetails can only keep growing as
more and more parts of it are used for typechecking more and more
dependants. Could that be the cause? I tried a crude experiment of “putting
the toothpaste back in the tube” by replacing all ModDetails in the HUG with
fresh ones after each module finishes typechecking, but that’s a complete
disaster for memory usage: even for the small 407-module example, the
memory usage shoots up to 1.5G. I suspect that’s because imported Ids are
no longer shared between different importing modules.

Any ideas on how I could improve memory usage in the from-scratch case, so
that it's more similar to the from-ModIface case?

Thanks,
            Gergo