Issues with the large address space allocator and HPC systems with resource limits

Yitzchak Gale gale at sefer.org
Wed Jul 4 12:06:57 UTC 2018


Pre-allocating a large address space also causes trouble on the Windows
Subsystem for Linux. There too, building GHC with
--disable-large-address-space solves the problem.

If your OS happens to be Ubuntu, there is an unofficial PPA for GHC
which includes versions of GHC and cabal-install that were compiled
without the pre-allocation, and which compile executables that do not
do the pre-allocation.

The PPA is here:

https://launchpad.net/~hvr/+archive/ubuntu/ghc-wsl

Yitz

On Tue, Jul 3, 2018 at 11:29 PM, Luis Pedro Coelho
<luispedro at big-data-biology.org> wrote:
> Dear GHC devs,
>
> I hope this is the right forum to bring this up.
>
> I am the lead developer of NGLess (https://github.com/ngless-toolkit/ngless), a bioinformatics tool written in Haskell. Several users have complained about not being able to easily use NGLess in an academic cluster environment because it allocates 1TB of address space (e.g., https://groups.google.com/forum/#!topic/ngless/9su2E0EdeCc; I have also received several private emails on this issue).
>
> In particular, many systems are set up with a limit on the address space so that if the job allocates more than the given limit, it is immediately killed.
>
> This appears to be the default way to set up SGE, the most widely used batch system. Users are dependent on their sysadmins and lack the permissions to change these settings easily (and may not always be cognizant of the difference between "allocating address space" and "allocating memory"). Using ulimit seems to make the issue disappear on most, but not all, user setups.
>
> I have now built NGLess with a version of GHC that was compiled without the large address space allocator (using ./configure --disable-large-address-space). At least locally, this seems to run correctly and solve the issue.
>
> I assume that there are performance or other reasons to use the large address space allocator as the default, but, right now, for the problem space I am working in, disabling it seems to be a better trade-off. In principle, the RTS that is used for GHC and the one that is used for the programme being linked do not need to be the same. Is there any possibility of making this choice when a programme is linked and not when GHC is compiled?
>
> Thank you for all your effort!
>
> Luis
>
> --
> Luis Pedro Coelho | Fudan University | http://luispedro.org
>
> PI of Big Data Biology Lab at Fudan University (start mid-2018)
> http://big-data-biology.org
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
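
For readers who want to see the failure mode in isolation, here is a
minimal sketch of how an address-space limit rejects a large reservation
even though no memory is ever written. It assumes Linux and CPython. The
helper name can_reserve, the 1 GiB limit and the 4 GiB mapping are all
made up for illustration (a scaled-down stand-in for the 1TB
reservation), and the anonymous mmap is not the call the GHC RTS makes.

```python
import subprocess
import sys
import textwrap

GIB = 1024 ** 3
MIB = 1024 ** 2

# Runs in a child process so that the limit never affects the caller.
# argv[1] is the address-space limit in bytes, argv[2] the mapping size.
CHILD = textwrap.dedent("""
    import mmap, resource, sys
    limit, size = int(sys.argv[1]), int(sys.argv[2])
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)          # the soft limit may not exceed the hard one
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    try:
        mmap.mmap(-1, size)               # anonymous mapping; no page is written
    except (OSError, MemoryError):
        sys.exit(1)                       # the kernel refused the mapping
    sys.exit(0)
""")


def can_reserve(limit_bytes, size_bytes):
    """Hypothetical helper: True if a child process limited to limit_bytes
    of address space manages to map size_bytes of anonymous memory."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(size_bytes)]
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("64 MiB under a 1 GiB limit:", can_reserve(GIB, 64 * MIB))
    print("4 GiB under a 1 GiB limit: ", can_reserve(GIB, 4 * GIB))
```

The second reservation is refused purely because of its size in the
address space, which is the distinction between "allocating address
space" and "allocating memory" drawn in the quoted message.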

