Building on limited memory environments (or does -M really work?)

Wed Nov 8 18:23:49 UTC 2017

Note that there are two -j flags, one to `stack` and one to `ghc`.

I'm not completely sure what `stack`'s -j does at the moment, but IIRC it
specifies how many *packages* are built in parallel (running tests may
count towards that job limit too). This means that `stack build -j3` may
(and most likely will) spawn 3 GHC processes building different
packages, if the dependency graph permits. Since -M applies to each
process separately, `stack -j3 --ghc-options="+RTS -M1G"` may end up
using around 3G of memory.
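
The arithmetic behind that 3G figure can be sketched in shell. The
helper name `worst_case_heap_gb` is made up for illustration, and the
result is only an upper bound on the combined -M caps, assuming every
GHC process that stack spawns hits its cap at the same time. It says
nothing about memory used outside the GHC heap.

```shell
# Hypothetical helper: combined heap cap if every concurrent GHC
# process reaches its own -M limit at the same moment.
worst_case_heap_gb() {
    stack_jobs=$1    # value given to stack's -j (concurrent GHC processes)
    heap_cap_gb=$2   # value given to each GHC's RTS as -M<n>G
    echo $(( stack_jobs * heap_cap_gb ))
}

worst_case_heap_gb 3 1   # stack -j3 with -M1G
```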

And as Ben already mentioned, GHC's -j affects "module
parallelisation", i.e. how many modules are built simultaneously.

So there are two `-j`s: stack's affects how many GHC processes there
are, and GHC's only how many threads there are within one process. The
former multiplies -M memory usage, the latter doesn't.
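
To make the distinction concrete, here is a dry-run sketch that only
prints a command line carrying both flags, without invoking stack. The
function name `build_command` and the values 2, 4 and 1G are
placeholders picked for illustration, not recommendations.

```shell
# Dry run: print the command instead of executing it.
build_command() {
    packages=$1   # stack's -j: how many GHC processes may run at once
    modules=$2    # GHC's -j: modules compiled in parallel inside one process
    heap=$3       # RTS -M cap, applied to each GHC process separately
    printf 'stack build -j%s --ghc-options="-j%s +RTS -M%s -RTS"\n' \
        "$packages" "$modules" "$heap"
}

build_command 2 4 1G
```

With these values only the first number (2) multiplies the -M cap; the
second (4) shares a single process's heap.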

Another thing to consider is how both affect performance, and that's
not an easy question to answer. To make the situation even more
interesting, recent versions of cabal support per-component builds, so
instead of "how many packages we build in parallel", it's "how many
components we build in parallel". As the component graph is more
granular (e.g. tests and executables are mostly independent leaves),
there are more parallelisation opportunities. I remember reading a
discussion that suggested having `-j N:M` for "N GHCs building M
modules", so you could fine-tune that for your multi-package project.
Also, https://github.com/haskell/cabal/issues/976 discusses how to make
multiple GHCs co-operate under a shared module limit.

Digging a bit through Cabal's issue tracker also turns up issues like
https://github.com/haskell/cabal/issues/1529, for example about the
linker's memory usage and how -j affects it. That might affect you when
a memory-hungry package is being built (GHC eats memory) while `stack`
is also linking an executable from another package (also memory
hungry).

As a final note: you can add per-package ghc-options in `stack.yaml` [1]
(and in `cabal.project`). At work, for example, I have (in
`cabal.project`) a few packages with

    package memory-hungry-package
        ghc-options: +RTS -M2G -RTS

while other packages are unrestricted. This wins us a bit of time, as
for most packages GHC's RTS doesn't need to worry about memory usage.
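
For the `stack.yaml` side, a rough equivalent might look like the
fragment printed below. This is a sketch based on the ghc-options
section of the stack documentation [1], not something taken from a real
project: `memory-hungry-package` is the same placeholder name as above,
and `stack_yaml_fragment` is a made-up helper that only prints the text.

```shell
# Hypothetical helper: print a stack.yaml fragment that caps the heap
# for one package and leaves all other packages unrestricted.
stack_yaml_fragment() {
    cat <<'EOF'
ghc-options:
  memory-hungry-package: +RTS -M2G -RTS
EOF
}

stack_yaml_fragment
```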

Cheers, Oleg

- [1]
https://docs.haskellstack.org/en/stable/yaml_configuration/#ghc-options

On 08.11.2017 17:27, Ben Gamari wrote:
> Saurabh Nanda <saurabhnanda at gmail.com> writes:
>
>>> Did you ever make any progress on this, Saurabh?
>>>
>> We made progress in some sense, by introducing a separate `stack build -j1`
>> step in our CI pipeline for compiling packages that are known to use a lot
>> of memory.
>>
>>
>>>  * -j just tells GHC to parallelise compilation across modules. This can
>>>     increase the maximum heap size needed by the compiler.
>>>
>>
>> From the docs, it wasn't very clear to me how -j interacts with -M when
>> both options are passed to the GHC process. Is it the max heap size
>> across all builds, or per build?
>>
> The short answer is that they don't interact: -j is a GHC flag whereas
> -M is an RTS flag. -M controls the amount of heap that the RTS allows
> the mutator (GHC in this case) to allocate. This includes all
> threads. GHC when run with -j is just like any other threaded Haskell
> program.
>
>>>  * -M is a bit tricky to define. For one, it defines the maximum heap
>>>    size beyond which we will terminate. However, we also use it in the
>>>    garbage collector to make various decisions about GC scheduling. I'll
>>>    admit that I'm not terribly familiar with the details here.
>>>
>>> Note that -M does not guarantee that GHC will find a way to keep your
>>> program under the limit that you provide. It merely ensures that the
>>> program doesn't exceed the given size, aborting if necessary.
>>>
>>
>> Quoting from
>> https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-flag--M
>> :
>>
>> *> The maximum heap size also affects other garbage collection parameters:
>> when the amount of live data in the heap exceeds a certain fraction of the
>> maximum heap size, compacting collection will be automatically enabled for
>> the oldest generation, and the -F parameter will be reduced in order to
>> avoid exceeding the maximum heap size.*
>>
>> It just makes it sound like the RTS is going to tweak the GC algo, and the
>> number of times GC is run, to avoid crossing the heap limit. However, I've
>> found the GHC process easily consuming more memory than what is specified
>> in the -M flag (as reported by top).
>>
> Yes, as I mentioned we do tweak some things in the GC; however, these
> tweaks are really a best-effort attempt to avoid going over the limit.
> It's entirely possible that your mutator will be terminated if it wants
> to use significantly more than the limit set with -M. There is
> relatively little else the RTS can do in this case that wouldn't require
> explicit cooperation from the mutator to keep working sets down.
>
> Cheers,
>
> - Ben
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
