GHC memory usage when typechecking from source vs. loading ModIfaces

Matthew Pickering matthewtpickering at gmail.com
Tue Apr 1 09:44:16 UTC 2025


Hi Gergo,

I looked in the detailed pane, searched for ModuleGraph, hovered my mouse
over the `ModuleGraph` constructor, recorded the number of live bytes, and
divided that by 32.
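
As a rough sketch of that arithmetic (the helper names are hypothetical, and the 32 bytes per closure is simply the figure quoted above, assumed to be the same for every `ModuleGraph` closure):

```haskell
-- Hypothetical helper: estimate how many closures of one constructor are
-- live, given the total live bytes the heap profile reports for it.
-- Assumes every closure of that constructor has the same fixed size.
closureCount :: Int -> Int -> Int
closureCount bytesPerClosure liveBytes = liveBytes `div` bytesPerClosure

-- 32 bytes per ModuleGraph closure is the figure used in this thread.
moduleGraphCount :: Int -> Int
moduleGraphCount = closureCount 32
```

Working backwards, the 2301 ModuleGraphs reported by ghc-debug would correspond to 2301 * 32 = 73632 live bytes, so `moduleGraphCount 73632` evaluates to 2301.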

Matt

On Tue, Apr 1, 2025 at 7:04 AM Erdi, Gergo <Gergo.Erdi at sc.com> wrote:

> OK, scratch that, I was looking at the wrong ghc-debug output. Indeed there are
> 2301 ModuleGraphs in the heap at the end of typechecking :O
>
>
>
> *From:* Erdi, Gergo
> *Sent:* Tuesday, April 1, 2025 12:51 PM
> *To:* Matthew Pickering <matthewtpickering at gmail.com>
> *Cc:* GHC Devs <ghc-devs at haskell.org>; ÉRDI Gergő <gergo at erdi.hu>;
> Montelatici, Raphael Laurent <Raphael.Montelatici at sc.com>; Dijkstra, Atze
> <Atze.Dijkstra at sc.com>
> *Subject:* Re: GHC memory usage when typechecking from source vs. loading
> ModIfaces
>
>
>
> This sounds extremely interesting, but I don’t understand where you are
> getting this number from! How do you see in the eventlog HTMLs that I’ve
> included that there are ~2000 ModuleGraphs? I’ve now tried using ghc-debug
> to find all ModuleGraph constructors at two points in the run: just before
> typechecking the first module (after all the extendMG calls) and just after
> typechecking the last module, and even in the cold case I only see 1
> ModuleGraph before and 13 ModuleGraphs after.
>
>
>
> Also, what do you mean by “precisely one loaded per interface loaded into
> the EPS”? Since my repro has 2294 modules, wouldn’t that mean 2294
> ModuleGraphs by that metric?
>
>
>
> *From:* Matthew Pickering <matthewtpickering at gmail.com>
> *Sent:* Saturday, March 29, 2025 1:53 AM
> *To:* Erdi, Gergo <Gergo.Erdi at sc.com>
> *Cc:* GHC Devs <ghc-devs at haskell.org>; ÉRDI Gergő <gergo at erdi.hu>;
> Montelatici, Raphael Laurent <Raphael.Montelatici at sc.com>; Dijkstra, Atze
> <Atze.Dijkstra at sc.com>
> *Subject:* [External] Re: GHC memory usage when typechecking from source
> vs. loading ModIfaces
>
>
>
>
>
> Hi Gergo,
>
>
>
> I quickly tried building `Cabal` with the master branch. There is
> precisely 1 ModuleGraph allocated for the home session, and precisely one
> loaded per interface loaded into the EPS. No leaky behaviour like you can
> see in your eventlogs.
>
>
>
> It seems there are about 2000 live module graphs in your program, are you
> doing something with the API to create this many?
>
>
>
> Cheers,
>
>
>
> Matt
>
>
>
> On Fri, Mar 28, 2025 at 12:40 PM Matthew Pickering <
> matthewtpickering at gmail.com> wrote:
>
> Hi Gergo,
>
>
>
> Do you have a (synthetic?) reproducer? You have probably identified some
> memory leak. However, without any means to reproduce it, it becomes very
> difficult to investigate. I feel like we are getting into very precise
> details now, where speculating is not going to be so useful.
>
>
>
> It seems like this is an important thing for you and your company. Is
> there any budget to pay for some investigation? If that was the case then
> some effort could be made to create a synthetic reproducer and make the
> situation more robust going into the future if your requirements were
> precisely understood.
>
>
>
> Cheers,
>
>
>
> Matt
>
>
>
> On Fri, Mar 28, 2025 at 10:12 AM Erdi, Gergo <Gergo.Erdi at sc.com> wrote:
>
> Just to add that I get the same "equalizing" behaviour (but in a more
> "natural" way) if instead of deepseq-ing the ModuleGraph upfront, I just
> call `hugInstancesBelow` before processing each module. So that's
> definitely one source of extra memory usage. I wonder if it would be
> possible to rebuild the ModuleGraph periodically (similar to the ModDetails
> dehydration), or if there are references to it stored all over the place
> from `HscEnv`s scattered around in closures etc. (basically the same
> problem the HPT had before it was made into a mutable reference).
>
> -----Original Message-----
> From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Erdi, Gergo
> via ghc-devs
> Sent: Friday, March 28, 2025 4:49 PM
> To: Matthew Pickering <matthewtpickering at gmail.com>; GHC Devs <
> ghc-devs at haskell.org>
> Cc: ÉRDI Gergő <gergo at erdi.hu>; Montelatici, Raphael Laurent <
> Raphael.Montelatici at sc.com>; Dijkstra, Atze <Atze.Dijkstra at sc.com>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> Hi,
>
> Unfortunately, I am forced to return to this problem. Everything below is
> now in the context of GHC 9.12 plus the mutable HPT patch backported.
>
> My test case is typechecking a tree of 2294 modules that form the
> transitive closure of a single module's dependencies, all in a single
> process. I have done this typechecking three times, here's what `+RTS -s
> -RTS` reports for max residency:
>
> * "cold": With no on-disk `ModIface` files, i.e. from scratch: 537 MB
>
> * "cold-top": With all `ModIface`s already on disk, except for the
>   single top-level module: 302 MB
>
> * "warm": With all `ModIface`s already on disk: 211 MB
>
> So my stupidly naive question is, why is the "cold" case also not 302 MB?
>
> In earlier discussion, `ModDetails` unfolding has come up. Dehydrating
> `ModDetails` in the HPT all the time is disastrous for runtime, but based
> on this model I would expect to see improvements from dehydrating "every
> now and then". So I tried a stupid simple example where after every 100th
> typechecked module, I run this function on the topologically sorted list of
> modules processed so far:
>
>
> ```
> dehydrateHpt :: HscEnv -> [ModuleName] -> IO ()
> dehydrateHpt hsc_env mods = do
>     let HPT{ table = hptr } = hsc_HPT hsc_env
>     hpt <- readIORef hptr
>     for_ mods \mod -> for_ (lookupUDFM hpt mod) \(HomeModInfo iface _details _linkable) -> do
>         !details <- initModDetails hsc_env iface
>         pure ()
> ```
>
> Buuut the max residency is still 534 MB (see "cold-dehydrate"); in fact,
> the profile looks exactly the same.
>
> Speaking of the profile, in the "cold" case I see a lot of steadily
> increasing heap usage from the `ModuleGraph`. I could see this happening if
> typechecking from scratch involves more `modulesUnder` calls which in turn
> force more and more of the `ModuleGraph`. If so, then maybe this could be
> worked around by repeatedly remaking the `ModuleGraph` just like I remake
> the `ModDetails` above. I tried getting rid of this effect by `deepseq`'ing
> the `ModuleGraph` at the start, with the idea being that this should
> "equalize" the three scenarios if this really is a substantial source of
> extra memory usage. This pushes up the warm case's memory usage to 381 MB,
> which is promising, but I still see a `Word64Map` that is steadily
> increasing in the "cold-force-modulegraph" case and contributes a lot to
> the memory usage. Unfortunately, I don't know where that `Word64Map` is (it
> could be any `Unique`-keyed environment...).
>
> So I am now stuck at this point. To spell out my goal explicitly, I would
> like to typecheck one module after another and not keep anything more in
> memory around than if I loaded them from `ModIface` files.
>
> Thanks,
>         Gergo
>
> p.s.: I couldn't find a way in the EventLog output HTML to turn event
> markers on/off or filter them, so to avoid covering the whole graph with
> gray lines, I mark only every 100th module.
>
>
>
>
> From: Matthew Pickering <matthewtpickering at gmail.com>
> Sent: Wednesday, February 12, 2025 7:08 PM
> To: ÉRDI Gergő <gergo at erdi.hu>
> Cc: Erdi, Gergo <Gergo.Erdi at sc.com>; Zubin Duggal <zubin at well-typed.com>;
> Montelatici, Raphael Laurent <Raphael.Montelatici at sc.com>; GHC Devs <
> ghc-devs at haskell.org>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> You do also raise a good point about rehydration costs.
>
> In oneshot mode, you are basically rehydrating the entire transitive
> closure of each module when you compile it, which obviously results in a
> large amount of repeated work. This is why people are investigating ideas
> of a persistent worker to at least avoid rehydrating all external
> dependencies as well.
>
> On Mon, Feb 10, 2025 at 12:13 PM Matthew Pickering <
> matthewtpickering at gmail.com> wrote:
> Sure, you can remove them once you are sure they are not used anymore.
>
> For clients like `GHCi` that obviously doesn't work, as they can be used at
> any point in the future, but for a batch compiler it would be fine.
>
> On Mon, Feb 10, 2025 at 11:56 AM ÉRDI Gergő <gergo at erdi.hu> wrote:
> On Mon, 10 Feb 2025, Matthew Pickering wrote:
>
> > I wonder if you have got your condition the wrong way around.
> >
> > The only "safe" time to perform rehydration is AFTER the point it can
> > never be used again.
> >
> > If you rehydrate it just before it is used then you will repeat work
> > which has already been done. If you do this, you will always have a
> > trade-off between space used and runtime.
>
> Oops. Yes, I have misunderstood the idea. I thought the idea was that
> after loading a given module into the HPT, its ModDetails would start out
> small (because of laziness) and then keep growing in size as more and more
> of it is traversed, and thus forced, during the typechecking of its
> dependees, so at some point we would want to reset that into the small
> initial representation as created by initModDetails.
>
> But if the idea is that I should rehydrate modules when they can't be used
> anymore, then that brings up the question why even do that, instead of
> straight removing the HomeModInfos from the HPT?
>
> ----------------------------------------------------------------------
>
> ------------------------------
> This email and any attachments are confidential and may also be
> privileged. If you are not the intended recipient, please delete all copies
> and notify the sender immediately. You may wish to refer to the
> incorporation details of Standard Chartered PLC, Standard Chartered Bank
> and their subsidiaries together with Standard Chartered Bank’s Privacy
> Policy via our public website.
> ------------------------------
>