Issues with the large address space allocator and HPC systems with resource limits
Luis Pedro Coelho
luispedro at big-data-biology.org
Tue Jul 3 20:29:34 UTC 2018
Dear GHC devs,
I hope this is the right forum to bring this up.
I am the lead developer of NGLess (https://github.com/ngless-toolkit/ngless, a bioinformatics tool written in Haskell). Several users have complained about not being able to easily use NGLess in an academic cluster environment because it allocates 1TB of address space (e.g., https://groups.google.com/forum/#!topic/ngless/9su2E0EdeCc; I have also received several private emails about this issue).
In particular, many systems are set up with a limit on the address space so that if the job allocates more than the given limit, it is immediately killed.
This appears to be the default way to set up SGE, the most widely used batch system. Users are dependent on their sysadmins and lack the permissions to change these settings easily (and may not always be cognizant of the difference between "allocating address space" and "allocating memory"). Using ulimit seems to make the issue disappear on most, but not all, user setups.
I have now built NGLess with a version of GHC that was compiled without the large address allocator (using ./configure --disable-large-address-space). At least locally, this seems to run correctly and solve the issue.
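For reference, the build configuration is the one mentioned above; the surrounding steps are an illustrative sketch of a standard GHC source build, not an exact recipe:

```shell
# In a GHC source tree: disable the 1TB address-space reservation
# in the RTS at configure time (flag as described above).
./configure --disable-large-address-space
make
make install
```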
I assume there are performance or other reasons to use the large address space allocator as the default but, right now, for the problem space I am working in, disabling it seems to be the better trade-off. In principle, the RTS used to build GHC itself and the one linked into the programme being compiled do not need to be the same. Is there any possibility of making this choice when a programme is linked, rather than when GHC is compiled?
Thank you for all your effort!
Luis Pedro Coelho | Fudan University | http://luispedro.org
PI of Big Data Biology Lab at Fudan University (start mid-2018)