GHC memory usage when typechecking from source vs. loading ModIfaces
Erdi, Gergo
Gergo.Erdi at
Fri Mar 28 08:49:10 UTC 2025
Unfortunately, I am forced to return to this problem. Everything below
is now in the context of GHC 9.12 plus the mutable HPT patch.
My test case is typechecking a tree of 2294 modules that form the
transitive closure of a single module's dependencies, all in a single
process. I ran this typechecking three times; here is what `+RTS
-s -RTS` reports for max residency:
* "cold": With no on-disk `ModIface` files, i.e. from scratch: 537 MB
* "cold-top": With all `ModIface`s already on disk, except for the
single top-level module: 302 MB
* "warm": With all `ModIface`s already on disk: 211 MB
So my stupidly naive question is: why is the "cold" case not also 302 MB?
In earlier discussion, `ModDetails` unfolding has come up. Dehydrating
`ModDetails` in the HPT all the time is disastrous for runtime, but
based on this model I would expect to see improvements from
dehydrating "every now and then". So I tried a stupid simple example
where after every 100th typechecked module, I run this function on the
topologically sorted list of modules processed so far:
    dehydrateHpt :: HscEnv -> [ModuleName] -> IO ()
    dehydrateHpt hsc_env mods = do
        let HPT{ table = hptr } = hsc_HPT hsc_env
        hpt <- readIORef hptr
        for_ mods \mod -> for_ (lookupUDFM hpt mod) \(HomeModInfo iface _details _linkable) -> do
            !details <- initModDetails hsc_env iface
            pure ()
Buuut the max residency is still 534 MB (see "cold-dehydrate"); in
fact, the profile looks exactly the same.
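
For illustration only, here is a minimal sketch of the "dehydrate every now and then" idea on toy types. Nothing in it is GHC API: `Iface`, `Details`, `Entry`, `Table`, `newTable`, `lookupSize` and `dehydrate` are hypothetical stand-ins, and the table is a plain `IORef` over a `Data.Map`. The point it tries to show is that, in this toy model, the freshly built lazy value has to be written back into the table before the old, already-forced value can become unreachable. Whether the function above needs a comparable write-back step depends on details of the mutable HPT patch that are not shown in this message.

```haskell
import Data.Foldable (for_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M

-- Toy stand-ins, NOT GHC types: an "interface" is a small seed and the
-- "details" are a larger structure derived lazily from that seed.
newtype Iface = Iface Int
newtype Details = Details [Int]

-- Lazy fields: the details stay an unevaluated thunk until something reads them.
data Entry = Entry Iface Details

type Table = IORef (M.Map String Entry)

-- Build details from an interface; the list is only materialised on demand.
initDetails :: Iface -> Details
initDetails (Iface n) = Details [1 .. n]

-- Create a table in which every entry starts out with unforced details.
newTable :: [(String, Iface)] -> IO Table
newTable ifaces =
  newIORef (M.fromList [(m, Entry i (initDetails i)) | (m, i) <- ifaces])

-- Reading the size forces the spine of the list, which the entry then retains.
lookupSize :: Table -> String -> IO (Maybe Int)
lookupSize table m = do
  entries <- readIORef table
  pure (fmap (\(Entry _ (Details xs)) -> length xs) (M.lookup m entries))

-- Replace the details of the listed modules with fresh thunks and store the
-- new entries in the table, so the old forced details are no longer referenced.
dehydrate :: Table -> [String] -> IO ()
dehydrate table mods =
  for_ mods $ \m -> modifyIORef' table (M.adjust reset m)
  where
    reset (Entry iface _) = Entry iface (initDetails iface)
```

In this model, calling `lookupSize` grows an entry and `dehydrate` shrinks it again; modules missing from the table are simply skipped.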
Speaking of the profile, in the "cold" case I see a lot of steadily
increasing heap usage from the `ModuleGraph`. I could see this
happening if typechecking from scratch involves more `modulesUnder`
calls which in turn force more and more of the `ModuleGraph`. If so,
then maybe this could be worked around by repeatedly remaking the
`ModuleGraph` just like I remake the `ModDetails` above. I tried
getting rid of this effect by `deepseq`'ing the `ModuleGraph` at the
start, with the idea being that this should "equalize" the three
scenarios if this really is a substantial source of extra memory
usage. This pushes up the warm case's memory usage to 381 MB, which is
promising, but I still see a `Word64Map` that is steadily increasing
in the "cold-force-modulegraph" case and contributes a lot to the
memory usage. Unfortunately, I don't know where that `Word64Map` is
(it could be any `Unique`-keyed environment...).
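
As a rough illustration of the "force everything up front" experiment, the sketch below deep-forces a lazily computed structure using `force` from `Control.DeepSeq` and `evaluate` from `Control.Exception`. The `Graph` type, `reachable`, `closure` and `forceClosure` are hypothetical toy definitions, not GHC's `ModuleGraph` or `modulesUnder`; the assumption is only that per-module reachability is computed lazily and would otherwise be forced piecemeal by later queries.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import qualified Data.Map.Lazy as M

-- Toy dependency graph, NOT GHC's ModuleGraph: module name -> direct imports.
type Graph = M.Map String [String]

-- Modules reachable from the given module, in depth-first discovery order.
reachable :: Graph -> String -> [String]
reachable g start = go [] (M.findWithDefault [] start g)
  where
    go seen [] = reverse seen
    go seen (m : rest)
      | m `elem` seen = go seen rest
      | otherwise     = go (m : seen) (M.findWithDefault [] m g ++ rest)

-- Each value in the result is a thunk until somebody inspects it.
closure :: Graph -> M.Map String [String]
closure g = M.mapWithKey (\m _ -> reachable g m) g

-- Force the whole closure to normal form before any queries are made.
forceClosure :: Graph -> IO (M.Map String [String])
forceClosure g = evaluate (force (closure g))
```

With `forceClosure`, the full cost of the structure is paid at the start in every scenario, which is what makes the scenarios comparable; any heap growth that remains afterwards must come from somewhere else.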
So I am now stuck at this point. To spell out my goal explicitly, I
would like to typecheck one module after another and not keep anything
more in memory around than if I loaded them from `ModIface` files.
p.s.: I couldn't find a way in the EventLog
output HTML to turn event markers on/off or filter them, so to avoid
covering the whole graph with gray lines, I mark only every 100th module.
From: Matthew Pickering <matthewtpickering at>
Sent: Wednesday, February 12, 2025 7:08 PM
To: ÉRDI Gergő <gergo at>
Cc: Erdi, Gergo <Gergo.Erdi at>; Zubin Duggal <zubin at>; Montelatici, Raphael Laurent <Raphael.Montelatici at>; GHC Devs <ghc-devs at>
Subject: [External] Re: GHC memory usage when typechecking from source vs. loading ModIfaces
You do also raise a good point about rehydration costs.
In oneshot mode, you are basically rehydrating the entire transitive closure of each module when you compile it, which obviously results in a large amount of repeated work. This is why people are investigating ideas of a persistent worker to at least avoid rehydrating all external dependencies as well.
On Mon, Feb 10, 2025 at 12:13 PM Matthew Pickering <mailto:matthewtpickering at> wrote:
Sure, you can remove them once you are sure they are not used anymore.
For clients like `GHCi` that obviously doesn't work, as the modules can be used at any point in the future, but for a batch compiler it would be fine.
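
One way to picture "remove them once you are sure they are not used anymore" for a batch run is the toy sketch below. It assumes the processing order is known in advance and that each module comes with the list of modules it needs; whether that list should hold direct imports or the full transitive closure is left to the caller. `lastUse` and `liveSets` are hypothetical names and nothing here touches the real HPT.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- For each module, the index (in processing order) of the last step that
-- needs it. A module always needs itself at its own step.
lastUse :: [(String, [String])] -> M.Map String Int
lastUse order =
  M.fromListWith max
    [ (needed, i)
    | (i, (m, deps)) <- zip [0 ..] order
    , needed <- m : deps
    ]

-- The set of modules still retained after each step, once every module
-- whose last use has passed has been evicted.
liveSets :: [(String, [String])] -> [S.Set String]
liveSets order = drop 1 (scanl step S.empty (zip [0 ..] order))
  where
    final = lastUse order
    step live (i, (m, _)) =
      S.filter (\x -> M.findWithDefault i x final > i) (S.insert m live)
```

For a linear chain where `B` needs `A` and `C` needs `B`, `A` is evicted once `B` is done and `B` once `C` is done, so at most two modules are held at any time instead of three.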
On Mon, Feb 10, 2025 at 11:56 AM ÉRDI Gergő <mailto:gergo at> wrote:
On Mon, 10 Feb 2025, Matthew Pickering wrote:
> I wonder if you have got your condition the wrong way around.
> The only "safe" time to perform rehydration is AFTER the point it can never be used
> again.
> If you rehydrate it just before it is used then you will repeat work which has already
> been done. If you do this, you will always have a trade-off between space used and
> runtime.
Oops. Yes, I have misunderstood the idea. I thought the idea was that
after loading a given module into the HPT, its ModDetails would start
out small (because of laziness) and then keep growing in size as more
and more of it is traversed, and thus forced, during the typechecking
of its dependees, so at some point we
would want to reset that into the small initial representation as created
by initModDetails.
But if the idea is that I should rehydrate modules when they can't be used
anymore, then that brings up the question why even do that, instead of
straight removing the HomeModInfos from the HPT?
This email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please delete all copies and notify the sender immediately. You may wish to refer to the incorporation details of Standard Chartered PLC, Standard Chartered Bank and their subsidiaries together with Standard Chartered Bank’s Privacy Policy via our public website.
-------------- next part --------------
A non-text attachment was scrubbed...
Type: application/x-zip-compressed
Size: 5312675 bytes
URL: <>