Why do we put GMP allocations on GHC heaps?

Austin Seipp austin at well-typed.com
Wed Oct 23 16:53:22 UTC 2013


Apologies for the quick double-take, minor amendments:

We actually want libtommath. Also it is not BSD3 - it is public domain
(it's also 100% portable ISO C and builds anywhere.)

On Wed, Oct 23, 2013 at 11:49 AM, Austin Seipp <austin at well-typed.com> wrote:
> First, I took some time a year or two ago to give a go at writing an
> integer-openssl implementation. I don't have the code anymore (it was
> more of a fun exercise,) sorry. :(
>
> However, the real problem with the whole integer situation IMO, is
> that the choice of integer implementation is completely non-modular,
> everything else aside. In a theoretically ideal world, it would be a
> link-time option, e.g.
>
> $ ghc foo.hs
> $ ghc -threaded foo.hs
>
> will relink `foo` with the -threaded runtime. Similarly we would like to say:
>
> $ ghc foo.hs
> $ ghc -integer-type=gmp foo.hs
> $ ghc -integer-type=openssl foo.hs
> $ ghc -integer-type=simple foo.hs
>
> and so on. This is besides all of the talk of "we need to tell the GC
> that a root is here, so it doesn't free things, etc." as in the case
> with MPFR.
>
> The current scheme is non-modular because it depends on a
> package-level interface which the compiler cannot hide. Furthermore,
> it depends on a specific build of GHC to choose, and various people
> end up needing this but do not want to recompile (for the standard -
> valid - typical deployment reasons: you want a stable version of your
> compiler, not a home-grown one.)
>
> Unfortunately the current situation with a package-level interface has
> various ramifications making this theory quite difficult in practice.
>
> A) Because it's an exposed-package, people actually do, sometimes,
> depend on it or the API in various ways. This means swapping
> implementations becomes more difficult (OTOH, such low-level-ness is a
> rare occurrence for most users, so perhaps this wouldn't hurt too many
> people.)
> B) The bigger problem related to A) is that because it's an
> exposed-package, GHC will inline the hell out of unfoldings for
> integer-gmp primitives at call sites. That means a compiled module may
> get an inlined copy of some integer-gmp primitive for example (a
> library could use Integer internally,) making re-linking later
> basically impossible.
> C) Probably other things I can't think of off the top of my head.
>
> In particular I don't know how to do this and reconcile A and B
> without destroying the efficiency of the GMP-related primitives, or
> highly restricting the Integer implemenation API and specializing it
> further in the compiler to handle stuff like this. Or something. The
> idea is that we inline all the unfoldings to minimize the overhead at
> call sites, but I'm not sure the best way to do this at the linker
> level - in particular we might need to stub things out per-way,
> because relinking requires the same symbols both ways, and we want an
> interface that does not cost us much. It doesn't cost too much for the
> RTS because the bits to relink are fairly minimal and don't hurt the
> fast path very much, AFAIK.
>
> If half of the problem is actually integration issues, can I ask why
> we don't just pick a small C library and offer that up? It is highly
> likely we will *never* beat GMP unfortunately. Beating it in some
> specific cases might work, but it has been ruthlessly optimized for
> years by many skilled people, and I highly doubt we can beat it on
> average without a very, very substantial amount of work. It would be
> interesting work - but I'm not convinced we can't have an acceptable
> solution in the mean time (and by all means, should we get something
> faster, it can just replace integer-simple.)
>
> As an example, libtomcrypt is damn fast, BSD3 licensed, very small,
> and well-respected. Why don't we:
>
> 1) Take libtomcrypt, modify the symbol names to be private to GHC
> (e.g. prefix everything with `ghc_integer_`) - this means the symbol
> names will be internal to GHC, so this also doesn't affect relinks
> against other copies of libtomcrypt (many projects include their own
> copy.) It is so small it basically comes under our control completely.
>
> 2) We can then offer people a integer-tomcrypt which does not have
> compatibility issues, and does not cause as much pain as the
> integration of widely deployed libraries like GMP/OpenSSL, which are
> in various use by many Haskell packages already.
>
> This should give us the flexibility of integer-simple without
> compromising so much on the efficiency aspect.
>
> This is also a very half-baked idea, but just a suggestion. I'd love a
> more modular solution to Integer as well, as I know many other people
> would.
>
>
>
> On Wed, Oct 23, 2013 at 11:08 AM, Bryan O'Sullivan <bos at serpentine.com> wrote:
>>
>> On Wed, Oct 23, 2013 at 4:31 AM, Gergely Risko <gergely at risko.hu> wrote:
>>>
>>> I can understand that this may be slower in CPU, but can you please
>>> elaborate why would it be worse in memory, how the frees wouldn't happen
>>> in a "timely manner"?  I thought finalisers are called when the
>>> referencee is GCd, so if we free the mpz in the callback, then where are
>>> we going wrong?
>>
>>
>> There is no guarantee that finalizers will be called at all, much less that
>> they will be called in a timely manner. This is a general and well-known
>> property of all garbage collectors, not something unique to GHC.
>>
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>
>
>
>
> --
> Regards,
>
> Austin Seipp, Haskell Consultant
> Well-Typed LLP, http://www.well-typed.com/



-- 
Regards,

Austin Seipp, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/


More information about the ghc-devs mailing list