Issues with the large address space allocator and HPC systems with resource limits

Luis Pedro Coelho luispedro at big-data-biology.org
Thu Jul 5 12:35:44 UTC 2018


Thanks for your feedback.


> > I am the lead developer of NGLess
> > (https://github.com/ngless-toolkit/ngless a bioinformatics tool,
> > written in Haskell). Several users have complained about not being
> > able to easily use NGLess in an academic cluster environment due to
> > the fact that it allocates 1TB of address space (e.g.,
> > https://groups.google.com/forum/#!topic/ngless/9su2E0EdeCc and I have
> > also gotten several private emails on this issue).
> >
> > In particular, many systems are set up with a limit on the address
> > space so that if the job allocates more than the given limit, it is
> > immediately killed.
> [snip]
> Are these address space limits advertised via getrlimit(2)? If so, have
> you tried GHC 8.6.1-alpha1? While fixing #14492 I taught GHC to respect
> rlimits when allocating its heap, so this might work now.

Thanks for fixing #14492 (that was us reporting it too, btw)!

Some well-configured systems do advertise the limits correctly, and there things work.

For unrelated reasons, I have access to an AWS virtual private cloud, which was set up using SGE with default settings (and I believe it is fairly up to date). I took the opportunity to test, and indeed NGLess runs without a glitch even without the #14492 fix (just that some memory is wasted).

However, it seems that some users are running on misconfigured systems so that NGLess (and Haskell) end up getting blamed for the situation (furthermore, not all sysadmins are as helpful as they could be).

> Indeed you will take a bit of performance hit by using the one-step
> allocator since the check of whether an object resides in the heap
> (which is very hot during GC) is a fair bit more complex.

I expected that. As I said, for now, I prefer that trade-off: we are already orders of magnitude faster than our competition, so I can afford a slowdown to support all these old-school HPC systems (which, for better or for worse, are still used by many of our target users).

> As far as I know the choice of allocator has no effect on code
> generation so in principle it should be possible to link the same code
> against either RTS. However, the build system looks to be built around
> the assumption that the choice is made at configure-time.
> 
> I'm sure this could be fixed, but it's not immediately obvious how. I
> suppose you could make the two allocators different RTS ways (e.g. like
> the distinction between event-logged, debugging, profiled and vanilla
> RTSs), but that would double the already large number of ways.

I see.

This becomes a distribution issue, then. I currently distribute the binaries through bioconda, and stack is used for compilation. I will see whether I can convince conda/stack to recompile GHC with the right flags for me.
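For anyone attempting the same, the relevant switch appears to be GHC's configure-time `--disable-large-address-space` flag, which selects the one-step allocator. A rough sketch of a from-source build (assuming the make-based build system of that era; details may vary between GHC versions):

```
./boot
./configure --disable-large-address-space
make -j
```

Getting stack or conda to pick up the resulting custom GHC is a separate question I still need to work out.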

Thank you,
Luis


More information about the ghc-devs mailing list