[External] Re: GHC memory usage when typechecking from source vs. loading ModIfaces

Matthew Pickering matthewtpickering at gmail.com
Mon Feb 10 11:49:40 UTC 2025


Gergo,

I wonder if you have got your condition the wrong way around.

The only "safe" time to perform rehydration is AFTER the point it can never
be used again.

If you rehydrate it just before it is used then you will repeat work which
has already been done. If you do this, you will always have a trade-off
between space used and runtime.
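Here is a minimal sketch of the "after its last use" policy. It rests on a simplified model, which is an assumption and not how GHC's driver represents things:

* modules are plain `String` names;
* the direct-dependency map and a topological build order are known up front;
* a module's ModDetails can only be consulted while typechecking that module or one of its transitive dependents.

`lastUse` and `resetSchedule` are hypothetical helpers, not GHC API.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical simplified model: modules are plain String names.
type ModName = String

-- For each module, the build-order index of the last module whose
-- typechecking may consult it: the module itself or any transitive
-- dependent. Assumes `order` lists every module after all of its deps.
lastUse :: Map ModName [ModName] -> [ModName] -> Map ModName Int
lastUse deps order = foldr step Map.empty (zip [0 ..] order)
  where
    -- Reverse the direct-dependency edges: module -> direct dependents.
    dependents :: Map ModName [ModName]
    dependents =
      Map.fromListWith (++) [(d, [m]) | (m, ds) <- Map.toList deps, d <- ds]

    -- foldr visits the build order back to front, so every dependent of m
    -- (which comes later in the order) already has its last use recorded.
    step (i, m) acc =
      let later = [u | d <- Map.findWithDefault [] m dependents
                     , Just u <- [Map.lookup d acc]]
      in Map.insert m (maximum (i : later)) acc

-- Invert lastUse: after typechecking the module at index i, the modules in
-- the list for i are never consulted again, so their ModDetails could be
-- reset to a fresh lazy value (or the whole entry evicted).
resetSchedule :: Map ModName Int -> Map Int [ModName]
resetSchedule lu = Map.fromListWith (++) [(i, [m]) | (m, i) <- Map.toList lu]
```

Walking the build order back to front means a module's last use is just the maximum over its direct dependents, which have already been processed. Under this model, each entry is reset once, rather than on every step.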

PS: The eventlog2html bug with -p rendering is fixed by
https://github.com/mpickering/eventlog2html/pull/192 but you should just
generate an eventlog anyway with `-l`.

Cheers,

Matt

On Wed, Feb 5, 2025 at 7:03 AM Erdi, Gergo <Gergo.Erdi at sc.com> wrote:

> Hi Matt,
>
> Thanks for your help so far!
>
> One vacation later, I am back looking at this. Unfortunately, the latest
> results I am seeing only confuse me more.
>
> I have this small test load of a 4313-module forest that I am
> typechecking. The baseline resource usage, i.e. before any tricks about
> rehydrating the ModDetails in the HPT, is 1 GB maximum residency, 113s MUT
> time and 87s GC time. My aim is to reduce the maximum residency with as
> little disruption as possible to the total runtime.
>
> My first test was the completely brute-force approach of rehydrating every
> single ModDetails in the HPT after typechecking every single module. Of
> course, this has catastrophic runtime performance, since I end up
> re-re-re-re-rehydrating every ModDetails for a total of 8,443,380 times
> (not counting the initial rehydration just after typechecking to put it in
> the HPT). So I get 290s MUT time, 252s GC time. But, the max residency goes
> down to 490 MB, showing that the idea, at least in principle, has legs.
>
> So far so good. But then my problem starts -- how do I get this max
> residency improvement with acceptable runtime? My idea was that when
> typechecking a module, it should only unfold parts of ModDetails that are
> its transitive dependencies, so it should be enough to rehydrate only those
> ModDetails. Since this still results in 3,603,206 rehydrations, I shouldn't
> be too optimistic about its performance, but it should still cut the
> overhead in half. When I try this out, I get MUT time of 257s, GC time of
> 186s. However, the max residency is 883 MB! But how is it possible that max
> residency is not the same 490 MB?!?! Does that mean typechecking a module
> can unfold parts of ModDetails that are not transitive dependencies of it?
> How would I track this down?
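Taking total runtime as MUT + GC, the figures above work out as follows. Rehydrations dropped to well under half, but the time overhead over the baseline only dropped by about 29%:

```latex
\begin{aligned}
T_{\text{base}}  &= 113 + 87  = 200\,\text{s} \\
T_{\text{all}}   &= 290 + 252 = 542\,\text{s} \quad (\text{overhead } 542 - 200 = 342\,\text{s}) \\
T_{\text{trans}} &= 257 + 186 = 443\,\text{s} \quad (\text{overhead } 443 - 200 = 243\,\text{s}) \\
\frac{3\,603\,206}{8\,443\,380} &\approx 0.43 \ (\text{rehydrations}), \qquad \frac{243}{342} \approx 0.71 \ (\text{overhead})
\end{aligned}
```

Meanwhile, max residency went from 490 MB back up to 883 MB, against the 1 GB baseline.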
>
> For reference, here is how I do the rehydration of the HPT; let me know if
> it seems fishy:
>
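> ```
> -- Needs BlockArguments and BangPatterns, plus readIORef/writeIORef
> -- (Data.IORef) and fixIO (System.IO).
> recreateModDetailsInHpt :: HscEnv -> [ModuleName] -> IO ()
> recreateModDetailsInHpt hsc_env mods = do
>     hpt <- readIORef hptr
>     -- Tie the knot: publish the (lazily computed) new table through the
>     -- IORef before building it, so thunks in the fresh ModDetails that
>     -- later consult the HPT via hsc_env see the new entries, not the old.
>     fixIO \hpt' -> do
>         writeIORef hptr hpt'
>         traverse recreate_hmi hpt
>     pure ()
>   where
>     HPT{ table = hptr } = hsc_HPT hsc_env
>
>     -- Rehydrate only the modules listed in `mods`; keep the rest as-is.
>     recreate_hmi hmi@(HomeModInfo iface _details linkable)
>         | moduleName mod `elem` mods
>         = do
>               !fresh_details <- genModDetails hsc_env iface
>               pure $ HomeModInfo iface fresh_details linkable
>
>         | otherwise
>         = pure hmi
>       where
>         mod = mi_module iface
> ```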
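The `fixIO` call above is a knot-tying idiom. The new table is written to the `IORef` before it has been computed, so entries built during the traversal can resolve references through it. Below is a toy, self-contained model of just that idiom. It uses hypothetical `Iface`/`Details` stand-ins rather than GHC's real types.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import qualified Data.Map.Lazy as Map
import Data.Map.Lazy (Map)
import System.IO (fixIO)

-- Hypothetical stand-ins, not GHC's types: an "interface" lists the names
-- of the entries it refers to, and "details" hold resolved references.
type Iface = [String]

data Details = Details
  { detailsName :: String
  , detailsRefs :: [Details]
  }

-- Rebuild every entry from its interface and publish the new table.
-- Inside fixIO, `new` stands for the finished table before it exists: it
-- is written to the IORef first, and each entry's references are thunks
-- that look other entries up in `new` rather than in the old table.
rebuildTable :: IORef (Map String Details) -> Map String Iface -> IO (Map String Details)
rebuildTable ref ifaces = fixIO $ \new -> do
  writeIORef ref new
  -- Data.Map.Lazy leaves the values unevaluated, so `new` is not forced
  -- here; forcing it before fixIO returns would make fixIO fail.
  pure (Map.mapWithKey (\name refs -> Details name [new Map.! r | r <- refs]) ifaces)
```

The same laziness requirement applies to the real code. Anything that forces the new table inside the `fixIO` body, rather than just building thunks that mention it, would break the knot.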
>
> In summary, my questions going forward are:
>
> * How come rehydrating transitive dependencies doesn't help as much for
> max residency as rehydrating all already-loaded modules?
>
> * What exactly does GHC itself do to use this new mutable HPT feature to
> good effect? I'm sure it doesn't suffer from the above-described quadratic
> slowdown.
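To make the transitive-dependency policy from the first question concrete, here is a hypothetical sketch. It uses plain `String` module names and a `Map` of direct dependencies, not GHC's `ModuleGraph`. The sketch selects the modules to rehydrate for one target and counts how many rehydrations the policy performs over a whole build order:

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

-- Hypothetical simplified model: modules are plain String names.
type ModName = String

-- All modules reachable from m via dependency edges, excluding m itself
-- (unless m sits on a cycle). The `seen` set makes shared deps visited once.
transitiveDeps :: Map ModName [ModName] -> ModName -> Set ModName
transitiveDeps deps m = go Set.empty (direct m)
  where
    direct x = Map.findWithDefault [] x deps
    go seen [] = seen
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise = go (Set.insert x seen) (direct x ++ xs)

-- Rehydrations performed if, for each module in the build order, every one
-- of its transitive dependencies is rehydrated.
rehydrationsTransitive :: Map ModName [ModName] -> [ModName] -> Int
rehydrationsTransitive deps order =
  sum [Set.size (transitiveDeps deps m) | m <- order]
```

The count grows with the sum of closure sizes rather than with the number of modules. That is why this policy still rehydrates each widely shared dependency many times over.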
>
> Thanks for the tip on the other two memory usage improvement MRs -- I
> haven't had time yet to backport them. !12582 in particular seems like it
> will need quite a bit of work to be applied on 9.8.
>
> Unfortunately, I couldn't get eventlog2html to work -- if I pass an .hp
> file with the `-p` parameter, I get an HTML file that claims "This eventlog
> was generated without heap profiling.".
>
> Thanks,
>         Gergo
>
> From: Matthew Pickering <matthewtpickering at gmail.com>
> Sent: Thursday, January 23, 2025 5:51 PM
> To: Erdi, Gergo <Gergo.Erdi at sc.com>
> Cc: ÉRDI Gergő <gergo at erdi.hu>; Zubin Duggal <zubin at well-typed.com>;
> Montelatici, Raphael Laurent <Raphael.Montelatici at sc.com>;
> GHC Devs <ghc-devs at haskell.org>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> That's good news.
>
> I don't think the first idea will do very much as there are other
> references to the final "HomeModInfo" not stored in the HPT.
>
> Have you constructed a time profile to determine why the runtime is
> higher? With the second approach you are certainly trading space usage for
> repeating work.
>
> If you actually do have a forest, then ideally you would replace the
> ModDetails after it will never be used again.
>
> You are likely also missing other patches important for memory usage.
>
> * https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12582
> * https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12347
>
> I can't comment on the 17 HPTs; what do the retainer stacks look like in
> ghc-debug?
>
> PS. Please use eventlog2html so the profiles are readable! You can use it
> on .hp profiles.
>
> Cheers,
>
> Matt
>
> On Thu, Jan 23, 2025 at 3:19 AM Erdi, Gergo <Gergo.Erdi at sc.com>
> wrote:
>
> Hi Matt & Zubin,
>
> Thanks for the help on this so far!
>
> I managed to hack the linked MR onto 9.8.4 (see
> https://gitlab.haskell.org/cactus/ghc/-/tree/cactus/backport-13675)
> and basically it seems to do what it says on the tin on a small example
> (see attached heap profile examples for typechecking 4313 modules), but I
> am unsure how to actually use it.
>
> So my understanding of the improvement here is that since now there is
> only one single HPT [*], I should be able to avoid unnecessary ballooning
> by doing two things:
>
> • Evicting `HomeModInfo`s wholesale from the HPT that are not going to be
> needed anymore, because I am done with all modules that would transitively
> depend on them. This of course only makes sense when typechecking a forest.
> • Replacing remaining `HomeModInfo`s with new ones that contain the same
> ModIface but with the ModDetails replaced by a fresh one from
> initModDetails.
>
> The attached `-after` profile shows typechecking with both of these ideas
> implemented. The first one doesn’t seem to help much on its own, but it’s
> tricky to evaluate that because it is very dependent on the shape of the
> workload (how tree-y it is). But the second one shows some serious promise
> in curtailing memory usage. However, it is also very slow – even on this
> small example, you can see its effect. On my full 35k+ module example, it
> more than doubles the runtime.
>
> What would be a good policy on when to replace ModDetails with thunks to
> avoid both the space leak and excessive rehydration churn?
>
> Also, perhaps unrelated, perhaps not – what’s with all those lists?!
>
> Thanks,
>             Gergo
>
> [*] BTW is it normal that I am still seeing several (17 in a small test
> case involving a couple hundred modules) HPT constructors in the heap? (I
> hacked it locally to be a datatype instead of a newtype just so I can see
> it in the heap). I expected to see only one.
>
> From: Matthew Pickering <matthewtpickering at gmail.com>
> Sent: Tuesday, January 21, 2025 8:24 PM
> To: ÉRDI Gergő <gergo at erdi.hu>
> Cc: Zubin Duggal <zubin at well-typed.com>; Erdi, Gergo <Gergo.Erdi at sc.com>;
> Montelatici, Raphael Laurent <Raphael.Montelatici at sc.com>;
> GHC Devs <ghc-devs at haskell.org>
> Subject: [External] Re: GHC memory usage when typechecking from source vs.
> loading ModIfaces
>
> Thanks Gergo, I think that unless we have access to your code base or a
> realistic example then the before vs after snapshot will not be so helpful.
> It's known that `ModDetails` will leak space like this.
>
> Let us know how it goes for you.
>
> Cheers,
>
> Matt
>
>
>
> On Fri, Jan 17, 2025 at 11:30 AM ÉRDI Gergő <gergo at erdi.hu> wrote:
> On Fri, 17 Jan 2025, Matthew Pickering wrote:
>
> > 1. As Zubin points out we have recently been concerned with improving
> > the memory usage of large module sessions (#25511, !13675, !13593)
> >
> > I imagine all these patches will greatly help the memory usage in your
> > use case.
>
> I'll try these out and report back.
>
> > 2. You are absolutely right that ModDetails can get forced and is never
> > reset.
> >
> > If you try !13675, it should be much more easily possible to reset the
> > ModDetails by writing into the IORef which stores each home package.
>
> Yes, that makes sense.
>
> > 3. If you share your example or perhaps even a trace from ghc-debug then
> > I will be happy to investigate further as it seems like a great test case
> > for the work we have recently been doing.
>
> Untangling just the parts that exercise the GHC API from all the other
> in-house bits will be quite a lot of work. But if just a ghc-debug
> snapshot of e.g. a small example from scratch vs. from existing ModIfaces
> would be helpful (with e.g. the top HscEnv at the time of finishing all
> typechecking as a saved closure), I can provide that no prob.
>
> Thanks,
>         Gergo
> ________________________________________
> This email and any attachments are confidential and may also be
> privileged. If you are not the intended recipient, please delete all copies
> and notify the sender immediately. You may wish to refer to the
> incorporation details of Standard Chartered PLC, Standard Chartered Bank
> and their subsidiaries together with Standard Chartered Bank’s Privacy
> Policy via our public website.