GHC memory usage when typechecking from source vs. loading ModIfaces

Matthew Pickering matthewtpickering at gmail.com
Wed Apr 2 09:38:34 UTC 2025


What command do I run to generate the files from this patch file? Perhaps a
link to a git repo would be a suitable way to share the reproducer?

On Wed, Apr 2, 2025 at 10:26 AM Erdi, Gergo <Gergo.Erdi at sc.com> wrote:

> PUBLIC
>
> Hi Matt,
>
> I think I have something that demonstrates that GHC (at least GHC
> 9.12.1) might have a similar problem!
>
> With the attached vacuous module hierarchy, I tried compiling M2294 from
> scratch, and then with `.hi` files for everything except the top-level
> module. I did the same with our GHC-API-using compiler as well. As you can
> see from the attached event logs, while the details differ, the overall
> shape of the memory used by ModuleGraph edges (750k of GWIB and
> NodeKey_Module constructors for the 2321 ModuleNodes and ~60k direct
> dependency edges) is pretty much the same between our compiler and GHC
> 9.12, suggesting to me that GHC is duplicating ModuleGraph node information
> in the dependency edges when building the transitive closure.
>
> Based on these measurements, do you agree that this is a GHC-side problem
> of memory usage scaling quadratically with the number of dependency edges?
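A minimal sketch of the scaling concern, using a toy graph of `Int` module IDs rather than GHC's real `ModuleGraph`/`NodeKey` types (every name here is hypothetical): if each module keeps its own transitive-closure set, a chain of n modules has only n - 1 direct edges but n(n - 1)/2 closure entries in total. This counts logical entries, not bytes; how much actually lands on the heap depends on how much structure, and how many key values, the real implementation shares.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Toy module graph: module ID -> direct imports (hypothetical stand-in,
-- not GHC's ModuleGraph / NodeKey representation).
type Graph = Map.Map Int [Int]

-- All modules reachable from m through imports (depth-first search).
reachable :: Graph -> Int -> Set.Set Int
reachable g m = go Set.empty (Map.findWithDefault [] m g)
  where
    go seen [] = seen
    go seen (d : ds)
      | d `Set.member` seen = go seen ds
      | otherwise = go (Set.insert d seen) (Map.findWithDefault [] d g ++ ds)

-- Number of direct dependency edges.
directEdges :: Graph -> Int
directEdges = sum . map length . Map.elems

-- Entries needed if every module stores its own transitive closure.
closureEntries :: Graph -> Int
closureEntries g = sum [ Set.size (reachable g m) | m <- Map.keys g ]

-- Chain M1 <- M2 <- ... <- Mn: module i imports only module i - 1.
chain :: Int -> Graph
chain n = Map.fromList [ (i, [ i - 1 | i > 1 ]) | i <- [1 .. n] ]
```

For example, `chain 100` has 99 direct edges but 4950 closure entries.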
>
> Thanks,
>
>             Gergo
>
> p.s.: Sorry for including the reproducer module tree in this weird format
> as a patch file, but I am behind a mail server that won’t let me send mails
> with too many individual files in attached archives…
>
> *From:* Matthew Pickering <matthewtpickering at gmail.com>
> *Sent:* Friday, March 28, 2025 8:40 PM
> *To:* Erdi, Gergo <Gergo.Erdi at sc.com>
> *Cc:* GHC Devs <ghc-devs at haskell.org>; ÉRDI Gergő <gergo at erdi.hu>;
> Montelatici, Raphael Laurent <Raphael.Montelatici at sc.com>; Dijkstra, Atze
> <Atze.Dijkstra at sc.com>
> *Subject:* [External] Re: GHC memory usage when typechecking from source
> vs. loading ModIfaces
>
> Hi Gergo,
>
> Do you have a (synthetic?) reproducer? You have probably identified some
> memory leak. However, without any means to reproduce it, it becomes very
> difficult to investigate. I feel like we are getting into very precise
> details now, where speculating is not going to be so useful.
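One way to produce a synthetic reproducer without attaching thousands of files is a small generator program. This sketch is hypothetical: the module names `M1`..`Mn` and the "each module imports its k predecessors" shape are assumptions, not necessarily the structure of the hierarchy attached elsewhere in this thread.

```haskell
-- Hypothetical generator for a vacuous module hierarchy M1 .. Mn.

-- Source of module M<i>, importing only the instances of each dependency.
moduleSource :: Int -> [Int] -> String
moduleSource i deps =
  unlines (("module M" ++ show i ++ " where") : [ "import M" ++ show d ++ " ()" | d <- deps ])

-- Assumed dependency shape: module i imports the k modules just before it.
bandDeps :: Int -> Int -> [Int]
bandDeps k i = [max 1 (i - k) .. i - 1]

-- Write M1.hs .. Mn.hs into the current directory; for k >= 1, Mn is the single root.
writeHierarchy :: Int -> Int -> IO ()
writeHierarchy n k =
  mapM_ (\i -> writeFile ("M" ++ show i ++ ".hs") (moduleSource i (bandDeps k i))) [1 .. n]
```

Running `writeHierarchy 2294 25` and then `ghc --make M2294.hs +RTS -s -RTS` would give a from-scratch measurement; deleting only `M2294.hi` and `M2294.o` and re-running would approximate the "cold-top" case.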
>
> It seems like this is an important thing for you and your company. Is
> there any budget to pay for some investigation? If that were the case, then
> some effort could be made to create a synthetic reproducer and make the
> situation more robust going into the future, once your requirements were
> precisely understood.
>
> Cheers,
>
> Matt
>
> On Fri, Mar 28, 2025 at 10:12 AM Erdi, Gergo <Gergo.Erdi at sc.com> wrote:
>
> Just to add that I get the same "equalizing" behaviour (but in a more
> "natural" way) if instead of deepseq-ing the ModuleGraph upfront, I just
> call `hugInstancesBelow` before processing each module. So that's
> definitely one source of extra memory usage. I wonder if it would be
> possible to rebuild the ModuleGraph periodically (similar to the ModDetails
> dehydration), or if there are references to it stored all over the place
> from `HscEnv`s scattered around in closures etc. (basically the same
> problem the HPT had before it was made into a mutable reference).
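A toy illustration of what "rebuilding periodically" could buy, under the assumption that the growth comes from lazily memoised results hanging off the graph: the hypothetical `Graph` below computes each node's closure on first demand and keeps it. `rebuild` reconstructs it from the cheap inputs alone, so the old, partially forced memo table becomes garbage, but only once nothing else (in GHC's case, some stale `HscEnv` captured in a closure) still references the old value.

```haskell
import qualified Data.Map as Map  -- lazy values: memoised closures stay thunks until demanded
import qualified Data.Set as Set

-- Hypothetical toy graph: cheap authoritative input plus a lazily
-- memoised table derived from it.
data Graph = Graph
  { edges   :: Map.Map Int [Int]          -- module -> direct imports
  , closure :: Map.Map Int (Set.Set Int)  -- filled in (and retained) as it is forced
  }

-- Build the graph; each closure is computed on first demand only.
-- Assumes the import graph is acyclic.
mkGraph :: Map.Map Int [Int] -> Graph
mkGraph es = Graph es memo
  where
    memo = Map.map reach es
    reach ds = Set.unions [ Set.insert d (Map.findWithDefault Set.empty d memo) | d <- ds ]

-- Everything below a module, forcing (and memoising) what is needed.
below :: Graph -> Int -> Set.Set Int
below g m = Map.findWithDefault Set.empty m (closure g)

-- Discard the memo table by rebuilding from the inputs alone. The old table
-- is only reclaimed if nothing else still holds the old Graph.
rebuild :: Graph -> Graph
rebuild = mkGraph . edges
```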
>
> -----Original Message-----
> From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Erdi, Gergo
> via ghc-devs
> Sent: Friday, March 28, 2025 4:49 PM
> To: Matthew Pickering <matthewtpickering at gmail.com>; GHC Devs <
> ghc-devs at haskell.org>
> Cc: ÉRDI Gergő <gergo at erdi.hu>; Montelatici, Raphael Laurent <
> Raphael.Montelatici at sc.com>; Dijkstra, Atze <Atze.Dijkstra at sc.com>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> Hi,
>
> Unfortunately, I am forced to return to this problem. Everything below is
> now in the context of GHC 9.12 plus the mutable HPT patch backported.
>
> My test case is typechecking a tree of 2294 modules that form the
> transitive closure of a single module's dependencies, all in a single
> process. I have done this typechecking three times; here is what
> `+RTS -s -RTS` reports for max residency:
>
> * "cold": With no on-disk `ModIface` files, i.e. from scratch: 537 MB
>
> * "cold-top": With all `ModIface`s already on disk, except for the
>   single top-level module: 302 MB
>
> * "warm": With all `ModIface`s already on disk: 211 MB
>
> So my stupidly naive question is, why is the "cold" case also not 302 MB?
>
> In earlier discussion, `ModDetails` unfolding has come up. Dehydrating
> `ModDetails` in the HPT all the time is disastrous for runtime, but based
> on this model I would expect to see improvements from dehydrating "every
> now and then". So I tried a stupidly simple experiment where, after every
> 100th typechecked module, I run this function on the topologically sorted
> list of modules processed so far:
>
>
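> ```
> {-# LANGUAGE BangPatterns, BlockArguments #-}
> import Data.Foldable (for_)
> import Data.IORef (readIORef)
> -- GHC API imports (HscEnv, hsc_HPT, HPT, HomeModInfo, lookupUDFM, initModDetails) omitted; they depend on the backported HPT patch.
> dehydrateHpt :: HscEnv -> [ModuleName] -> IO ()
> dehydrateHpt hsc_env mods = do
>     let HPT{ table = hptr } = hsc_HPT hsc_env
>     hpt <- readIORef hptr
>     for_ mods \mod ->
>       for_ (lookupUDFM hpt mod) \(HomeModInfo iface _details _linkable) -> do
>         -- Fresh, lazily populated ModDetails; as written it is forced to WHNF
>         -- and then dropped, since nothing here stores it back into the HPT.
>         !details <- initModDetails hsc_env iface
>         pure ()
> ```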
>
> Buuut the max residency is still 534 MB (see "cold-dehydrate"); in fact,
> the profile looks exactly the same.
>
> Speaking of the profile, in the "cold" case I see a lot of steadily
> increasing heap usage from the `ModuleGraph`. I could see this happening if
> typechecking from scratch involves more `modulesUnder` calls which in turn
> force more and more of the `ModuleGraph`. If so, then maybe this could be
> worked around by repeatedly remaking the `ModuleGraph` just like I remake
> the `ModDetails` above. I tried getting rid of this effect by `deepseq`'ing
> the `ModuleGraph` at the start, with the idea being that this should
> "equalize" the three scenarios if this really is a substantial source of
> extra memory usage. This pushes up the warm case's memory usage to 381 MB,
> which is promising, but I still see a `Word64Map` that is steadily
> increasing in the "cold-force-modulegraph" case and contributes a lot to
> the memory usage. Unfortunately, I don't know where that `Word64Map` is (it
> could be any `Unique`-keyed environment...).
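For reference, deep-forcing a structure up front is usually spelled `evaluate . force`. A minimal sketch; `lazyTable` is a hypothetical stand-in, and applying this to the real `ModuleGraph` assumes an `NFData` instance for it, as the experiment above presumably relied on. As the numbers above show, this raises the baseline rather than lowering the peak: it equalizes the scenarios, it does not shrink them.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import qualified Data.Map as Map

-- Fully evaluate a value once, now, instead of letting later traversals
-- force it (and grow the live heap) piece by piece.
forceUpfront :: NFData a => a -> IO a
forceUpfront = evaluate . force

-- Hypothetical stand-in for a lazily built table: each value is a thunk
-- until something demands it.
lazyTable :: Int -> Map.Map Int [Int]
lazyTable n = Map.fromList [ (i, [1 .. i - 1]) | i <- [1 .. n] ]
```

Usage: `t <- forceUpfront (lazyTable 100)` at the start, before any per-module processing.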
>
> So I am now stuck at this point. To spell out my goal explicitly, I would
> like to typecheck one module after another and not keep anything more in
> memory around than if I loaded them from `ModIface` files.
>
> Thanks,
>         Gergo
>
> p.s.: I couldn't find a way in the EventLog output HTML to turn event
> markers on/off or filter them, so to avoid covering the whole graph with
> gray lines, I mark only every 100th module.
>
> From: Matthew Pickering <matthewtpickering at gmail.com>
> Sent: Wednesday, February 12, 2025 7:08 PM
> To: ÉRDI Gergő <gergo at erdi.hu>
> Cc: Erdi, Gergo <Gergo.Erdi at sc.com>; Zubin Duggal <zubin at well-typed.com>;
> Montelatici, Raphael Laurent <Raphael.Montelatici at sc.com>; GHC Devs <
> ghc-devs at haskell.org>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> You do also raise a good point about rehydration costs.
>
> In oneshot mode, you are basically rehydrating the entire transitive
> closure of each module when you compile it, which obviously results in a
> large amount of repeated work. This is why people are investigating ideas
> of a persistent worker to at least avoid rehydrating all external
> dependencies as well.
>
> On Mon, Feb 10, 2025 at 12:13 PM Matthew Pickering <mailto:
> matthewtpickering at gmail.com> wrote:
> Sure, you can remove them once you are sure they are not used anymore.
>
> For clients like `GHCi` that obviously doesn't work, as the modules can be
> used again at any point in the future, but for a batch compiler it would be fine.
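For a batch compiler, "once you are sure they are not used anymore" can be computed up front from the module graph. A minimal sketch, under the conservative assumption (an assumption, not documented GHC behaviour) that compiling a module may consult anything in its transitive closure, not just its direct imports: each module becomes droppable right after the last module in compilation order that needs it. Names and types are hypothetical, and actually deleting entries from the HPT is left out.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Modules in topological order (dependencies first), each with its direct imports.
type Plan = [(String, [String])]

-- For each compilation step, the modules that could be dropped afterwards.
evictionSchedule :: Plan -> [(String, [String])]
evictionSchedule order = [ (m, Map.findWithDefault [] m evictAfter) | (m, _) <- order ]
  where
    imports = Map.fromList order
    -- Transitive closure per module; lazy Map values make this self-reference
    -- safe for an acyclic graph.
    below = Map.map closeOver imports
    closeOver ds =
      Set.unions [ Set.insert d (Map.findWithDefault Set.empty d below) | d <- ds ]
    -- Map.fromList keeps the last value per key, i.e. the latest module
    -- (in compilation order) that needs each module; a module needs itself.
    lastUser = Map.fromList
      (concat [ (m, m) : [ (d, m) | d <- Set.toList (below Map.! m) ] | (m, _) <- order ])
    -- Invert: step -> modules whose last use is that step (ascending order).
    evictAfter = Map.fromListWith (flip (++)) [ (u, [d]) | (d, u) <- Map.toList lastUser ]
```

With `[("A", []), ("B", ["A"]), ("C", []), ("D", ["C"])]`, `A` and `B` can be dropped right after `B` is compiled, before `C` and `D` are typechecked.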
>
> On Mon, Feb 10, 2025 at 11:56 AM ÉRDI Gergő <mailto:gergo at erdi.hu> wrote:
> On Mon, 10 Feb 2025, Matthew Pickering wrote:
>
> > I wonder if you have got your condition the wrong way around.
> >
> > The only "safe" time to perform rehydration is AFTER the point it can
> > never be used again.
> >
> > If you rehydrate it just before it is used then you will repeat work
> > which has already been done. If you do this, you will always have a
> > trade-off between space used and runtime.
>
> Oops. Yes, I have misunderstood the idea. I thought the idea was that
> after loading a given module into the HPT, its ModDetails would start out
> small (because of laziness) and then keep growing in size as more and more
> of it is traversed, and thus forced, during the typechecking of its
> dependees, so at some point we would want to reset that into the small
> initial representation as created by initModDetails.
>
> But if the idea is that I should rehydrate modules when they can't be used
> anymore, then that raises the question of why even do that, instead of
> simply removing the HomeModInfos from the HPT?
>
> ----------------------------------------------------------------------
> This email and any attachments are confidential and may also be
> privileged. If you are not the intended recipient, please delete all copies
> and notify the sender immediately. You may wish to refer to the
> incorporation details of Standard Chartered PLC, Standard Chartered Bank
> and their subsidiaries together with Standard Chartered Bank’s Privacy
> Policy via our public website.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20250402/1ed2a0c0/attachment.html>

