GHC memory usage when typechecking from source vs. loading ModIfaces
Matthew Pickering
matthewtpickering at gmail.com
Fri Mar 28 17:52:32 UTC 2025
Hi Gergo,
I quickly tried building `Cabal` with the master branch. There is precisely
1 ModuleGraph allocated for the home session, and precisely one loaded per
interface loaded into the EPS. No leaky behaviour like you can see in your
eventlogs.
It seems there are about 2000 live module graphs in your program. Are you
doing something with the API to create this many?
Cheers,
Matt
On Fri, Mar 28, 2025 at 12:40 PM Matthew Pickering <
matthewtpickering at gmail.com> wrote:
> Hi Gergo,
>
> Do you have a (synthetic?) reproducer? You have probably identified some
> memory leak. However, without any means to reproduce it, it becomes very
> difficult to investigate. I feel like we are getting into very precise
> details now, where speculating is not going to be so useful.
>
> It seems like this is an important thing for you and your company. Is
> there any budget to pay for some investigation? If that were the case then
> some effort could be made to create a synthetic reproducer and make the
> situation more robust going into the future if your requirements were
> precisely understood.
>
> Cheers,
>
> Matt
>
> On Fri, Mar 28, 2025 at 10:12 AM Erdi, Gergo <Gergo.Erdi at sc.com> wrote:
>
>> Just to add that I get the same "equalizing" behaviour (but in a more
>> "natural" way) if instead of deepseq-ing the ModuleGraph upfront, I just
>> call `hugInstancesBelow` before processing each module. So that's
>> definitely one source of extra memory usage. I wonder if it would be
>> possible to rebuild the ModuleGraph periodically (similar to the ModDetails
>> dehydration), or if there are references to it stored all over the place
>> from `HscEnv`s scattered around in closures etc. (basically the same
>> problem the HPT had before it was made into a mutable reference).
>>
>> -----Original Message-----
>> From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Erdi, Gergo
>> via ghc-devs
>> Sent: Friday, March 28, 2025 4:49 PM
>> To: Matthew Pickering <matthewtpickering at gmail.com>; GHC Devs <
>> ghc-devs at haskell.org>
>> Cc: ÉRDI Gergő <gergo at erdi.hu>; Montelatici, Raphael Laurent <
>> Raphael.Montelatici at sc.com>; Dijkstra, Atze <Atze.Dijkstra at sc.com>
>> Subject: [External] Re: GHC memory usage when typechecking from source
>> vs. loading ModIfaces
>>
>> Hi,
>>
>> Unfortunately, I am forced to return to this problem. Everything below is
>> now in the context of GHC 9.12 plus the mutable HPT patch backported.
>>
>> My test case is typechecking a tree of 2294 modules that form the
>> transitive closure of a single module's dependencies, all in a single
>> process. I have done this typechecking three times, here's what `+RTS -s
>> -RTS` reports for max residency:
>>
>> * "cold": With no on-disk `ModIface` files, i.e. from scratch: 537 MB
>>
>> * "cold-top": With all `ModIface`s already on disk, except for the
>> single top-level module: 302 MB
>>
>> * "warm": With all `ModIface`s already on disk: 211 MB
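>>
>> For anyone who wants to read the same figure from inside the process
>> rather than from the `-s` summary, here is a minimal sketch using
>> `GHC.Stats` from `base`. It assumes the program is run with `+RTS -T`
>> (or `-s`), and `reportMaxResidency` is a hypothetical helper name, not
>> something from GHC or from my test harness:
>>
>> ```haskell
>> import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
>> import System.IO (hPutStrLn, stderr)
>>
>> -- Hypothetical helper: print the peak live heap observed so far.
>> -- The RTS only collects these numbers when stats are enabled,
>> -- e.g. by running the program with +RTS -T.
>> reportMaxResidency :: String -> IO ()
>> reportMaxResidency label = do
>>   enabled <- getRTSStatsEnabled
>>   if enabled
>>     then do
>>       stats <- getRTSStats
>>       -- max_live_bytes is only updated after a major GC.
>>       let mb = fromIntegral (max_live_bytes stats) / (1024 * 1024) :: Double
>>       hPutStrLn stderr (label ++ ": max residency " ++ show mb ++ " MB")
>>     else hPutStrLn stderr (label ++ ": RTS stats disabled, rerun with +RTS -T")
>> ```
>>
>> Calling this after each scenario (or after every 100th module) gives a
>> rough per-phase view without having to open the eventlog.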
>>
>> So my stupidly naive question is: why is the "cold" case not also 302 MB?
>>
>> In earlier discussion, `ModDetails` unfolding has come up. Dehydrating
>> `ModDetails` in the HPT all the time is disastrous for runtime, but based
>> on this model I would expect to see improvements from dehydrating "every
>> now and then". So I tried a stupid simple example where after every 100th
>> typechecked module, I run this function on the topologically sorted list of
>> modules processed so far:
>>
>>
>> ```
>> -- Requires the BlockArguments and BangPatterns extensions.
>> dehydrateHpt :: HscEnv -> [ModuleName] -> IO ()
>> dehydrateHpt hsc_env mods = do
>>   let HPT{ table = hptr } = hsc_HPT hsc_env
>>   hpt <- readIORef hptr
>>   for_ mods \mod ->
>>     for_ (lookupUDFM hpt mod) \(HomeModInfo iface _details _linkable) -> do
>>       -- Rebuild the ModDetails from the interface and force it to WHNF.
>>       !details <- initModDetails hsc_env iface
>>       pure ()
>> ```
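>>
>> The "every 100th module" driver around this function can be sketched
>> roughly as below. This is a simplified stand-in using only `base`, not
>> the actual code: `everyNth` is a hypothetical name, and `step` stands
>> for whatever typechecks a single module.
>>
>> ```haskell
>> import Control.Monad (when)
>> import Data.Foldable (for_)
>>
>> -- Hypothetical driver: run `step` on every item in order and, after
>> -- every n-th item, run `periodic` on the items processed so far.
>> everyNth :: Int -> ([a] -> IO ()) -> (a -> IO ()) -> [a] -> IO ()
>> everyNth n periodic step items =
>>   for_ (zip [1 ..] items) $ \(i, item) -> do
>>     step item
>>     -- A non-positive n disables the periodic action entirely.
>>     when (n > 0 && i `mod` n == 0) $
>>       periodic (take i items)
>> ```
>>
>> With this, the experiment amounts to something like
>> `everyNth 100 (dehydrateHpt hsc_env) typecheckOne sortedMods`, where
>> `typecheckOne` and `sortedMods` are placeholders.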
>>
>> Buuut the max residency is still 534 MB (see "cold-dehydrate"); in fact,
>> the profile looks exactly the same.
>>
>> Speaking of the profile, in the "cold" case I see a lot of steadily
>> increasing heap usage from the `ModuleGraph`. I could see this happening if
>> typechecking from scratch involves more `modulesUnder` calls which in turn
>> force more and more of the `ModuleGraph`. If so, then maybe this could be
>> worked around by repeatedly remaking the `ModuleGraph` just like I remake
>> the `ModDetails` above. I tried getting rid of this effect by `deepseq`'ing
>> the `ModuleGraph` at the start, with the idea being that this should
>> "equalize" the three scenarios if this really is a substantial source of
>> extra memory usage. This pushes up the warm case's memory usage to 381 MB,
>> which is promising, but I still see a `Word64Map` that is steadily
>> increasing in the "cold-force-modulegraph" case and contributes a lot to
>> the memory usage. Unfortunately, I don't know where that `Word64Map` is (it
>> could be any `Unique`-keyed environment...).
>>
>> So I am now stuck at this point. To spell out my goal explicitly, I would
>> like to typecheck one module after another and not keep anything more in
>> memory around than if I loaded them from `ModIface` files.
>>
>> Thanks,
>> Gergo
>>
>> p.s.: I couldn't find a way in the EventLog output HTML to turn event
>> markers on/off or filter them, so to avoid covering the whole graph with
>> gray lines, I mark only every 100th module.
>>
>>
>>
>>
>> From: Matthew Pickering <matthewtpickering at gmail.com>
>> Sent: Wednesday, February 12, 2025 7:08 PM
>> To: ÉRDI Gergő <gergo at erdi.hu>
>> Cc: Erdi, Gergo <Gergo.Erdi at sc.com>; Zubin Duggal <zubin at well-typed.com>;
>> Montelatici, Raphael Laurent <Raphael.Montelatici at sc.com>; GHC Devs <
>> ghc-devs at haskell.org>
>> Subject: [External] Re: GHC memory usage when typechecking from source
>> vs. loading ModIfaces
>>
>> You do also raise a good point about rehydration costs.
>>
>> In oneshot mode, you are basically rehydrating the entire transitive
>> closure of each module when you compile it, which obviously results in a
>> large amount of repeated work. This is why people are investigating ideas
>> of a persistent worker to at least avoid rehydrating all external
>> dependencies as well.
>>
>> On Mon, Feb 10, 2025 at 12:13 PM Matthew Pickering
>> <matthewtpickering at gmail.com> wrote:
>> Sure, you can remove them once you are sure they are not used anymore.
>>
>> For clients like `GHCi` that obviously doesn't work, as they can be used
>> at any point in the future, but for a batch compiler it would be fine.
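>>
>> As a rough illustration of "remove them once they are not used anymore"
>> in the batch case, here is a sketch that computes, for a fixed
>> processing order, the step after which each module can be dropped. It
>> is plain `containers` code, not GHC API: `lastUses` and
>> `droppableAfter` are hypothetical names, and each dependency list is
>> assumed to contain everything a module may consult (i.e. its transitive
>> home-package dependencies).
>>
>> ```haskell
>> import qualified Data.Map.Strict as M
>>
>> -- For each module, the index of the last step (in processing order)
>> -- that consults it. A module nobody depends on is its own last use.
>> lastUses :: Ord m => [(m, [m])] -> M.Map m Int
>> lastUses order =
>>   M.fromListWith max
>>     [ (dep, i) | (i, (m, deps)) <- zip [0 ..] order, dep <- m : deps ]
>>
>> -- Modules whose entries can be removed right after step i is done.
>> droppableAfter :: Ord m => [(m, [m])] -> Int -> [m]
>> droppableAfter order i =
>>   [ m | (m, j) <- M.toList (lastUses order), j == i ]
>> ```
>>
>> A batch driver could consult `droppableAfter` after each module and
>> delete those entries from its table; an interactive client has no such
>> fixed order, which is why this does not carry over to `GHCi`.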
>>
>> On Mon, Feb 10, 2025 at 11:56 AM ÉRDI Gergő <gergo at erdi.hu> wrote:
>> On Mon, 10 Feb 2025, Matthew Pickering wrote:
>>
>> > I wonder if you have got your condition the wrong way around.
>> >
>> > The only "safe" time to perform rehydration is AFTER the point it can
>> > never be used again.
>> >
>> > If you rehydrate it just before it is used then you will repeat work
>> > which has already been done. If you do this, you will always have a
>> > trade-off between space used and runtime.
>>
>> Oops. Yes, I have misunderstood the idea. I thought the idea was that
>> after loading a given module into the HPT, its ModDetails would start out
>> small (because of laziness) and then keep growing in size as more and more
>> of it is traversed, and thus forced, during the typechecking of its
>> dependents, so at some point we would want to reset that into the small
>> initial representation as created by initModDetails.
>>
>> But if the idea is that I should rehydrate modules when they can't be
>> used anymore, then that brings up the question why even do that, instead of
>> straight removing the HomeModInfos from the HPT?
>>
>> ----------------------------------------------------------------------
>> This email and any attachments are confidential and may also be
>> privileged. If you are not the intended recipient, please delete all copies
>> and notify the sender immediately. You may wish to refer to the
>> incorporation details of Standard Chartered PLC, Standard Chartered Bank
>> and their subsidiaries together with Standard Chartered Bank’s Privacy
>> Policy via our public website.
>>
>