FFI Bindings to Libraries using GMP
Peter Tanski
p.tanski at gmail.com
Mon Sep 17 23:49:02 EDT 2007
On Sep 14, 2007, at 9:14 AM, Benedikt Huber wrote:
>> | I've been struggling using FFI bindings to libraries which rely
>> on the
>> | GNU Mp Bignum library (gmp).
>> It's an issue that bites very few users, but it bites them hard.
>> It's also tricky, but not impossible, to fix. The combination
>> keeps meaning that at GHC HQ we work on things that affect more
>> people. I doubt we can spare effort to design and implement a fix
>> in the near future -- we keep hoping someone else step up and
>> tackle it!
>>
>> Peter Tanski did exactly that (he's the author of the
>> ReplacingGMPNotes above), but he's been very quiet recently. I
>> don't know where he is up to. Perhaps someone else would like to
>> join in?
I apologise for being away. The company I work for has been ramping
up for a launch and I have been working very long hours (nights and
weekends, too).
> Thank you for the information - I'm also willing to help, though
> I'm not too familiar with the GHC internals (yet).
> I do like the idea of optionally linking with a pure-haskell
> library, but I'm interested in a solution with comparable
> performance. Commenting solutions to ticket #311:
It goes beyond mere familiarity with the internals: the GMP functions
are threaded throughout the RTS and the PrimOps files. Of all the
primitive operations, they are the most ubiquitous for interfering in
other code. The rough list I put on the ReplacingGMP page is a start
but the more I work with the RTS the more little things keep turning
up. At the least I should update the page.
> (2) Using the standard allocation functions for the gmp memory
> managment (maybe as compile flag) as suggested in http://
> www.haskell.org/pipermail/glasgow-haskell-users/2006-July/
> 010660.html would also resolve ticket #311.
> In this case at least the dynamic part of gmp integers has to be
> resized using external allocation functions, and a finalizer
> (mpz_clear) has to be called when an Integer is garbage collected.
> It seems that the performance loss by using malloc is significant
> [1], as lots of allocations and reallocations of very small chunks
> occur in a functional setting; some kind of (non garbage
> collected !) memory pool allocation would certainly help. I'm not
> sure what overhead is associated with calling a finalizer ?
The problem of lots of small allocations affects the garbage
collector, as well. In the current implementation, each GMP
operation calls doYouWantToGC()--I'm sure you have seen the note in
PrimOps.cmm, for example--which may act as a stop-world garbage
collection. The byte arrays for GMP are also pinned. Compared to
this, a FFI implementation using finalizers, which have horrible but
practical guarantees on when they are called, would work much
better. The best solution would be to revamp the way Integer types
are implemented, so when possible they are mutable under the hood,
much like using the binary += instead of the ternary +. Enumerations
like the test in [1], below, would not be mutable unless there were
some information such as a good consumer function that indicated the
intermediate values were only temporarily necessary.
> (3) So when replacing GMP with the BN library of OpenSSL (see
> http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/
> PerformanceMeasurements), it would propably be neccessary to
> refactor the library, so custom allocation can be used as well.
> This does not seem too difficult at a first glance though.
The OpenSSL library is not GPL compatible, so there would be
licensing problems for GPL'd system distributions; it is also
relatively slow, though it does have a noticeably constant curve for
exponential functions. The one problem you will find with _all_
potential replacement libraries is incompatible behaviour for bitwise
functions: they are implemented arithmetically in GMP but logically
elsewhere (when they are implemented at all). (Note: if you are
looking for the left-shift and right-shift operations in GMP, they
are hidden in mpz_mul_2exp and mpz_t_div_q_2exp.) LibTomMath, for
example uses pure logical shifts which do not produce correct
results. I could go on about many other small differences but the
end result is that you would have to do a lot of hacking to get a
library that would replace all the functionality GMP provides. That
is why I started a replacement from scratch.
> So I'd like to investigate the second or third option, as far as my
> knowledge and time permits it.
> Of course it would be wise to check first if Peter Tanski is
> already/still working on a GMP replacement.
I left off working on it for some time, but things may slow down a
little for now so I will (hopefully) have time to package it up. I
meant to do that more than a month ago for Thorkil, who has written a
multi-precision integer library before and wanted to help.
> [1]
> Simple Performance Test on (ghc-darwin-i386-6.6.1):
>
> The haskell function (k was taken as 10M)
> > test k = (iterateT k (fromIntegral (maxBound ::Int))) :: Integer
> where
> > iterateT 0 v = v; iterateT k v = v `seq` iterateT (k-1) (v+10000)
> triggers around k allocations and k reallocations by the gmp library.
>
> The rough C equivalent, calling sequences of
> > malloc(3), mpz_init_set(gmp), mpz_add_ui(gmp), mpz_clear(gmp) and
> free(3),
> takes more than 2 times as long, with 25% of the time spend in
> allocating and freeing pointers to gmp integers (mpz_ptr) and 50%
> of the time spend in gmp allocator functions (i.e. resizing gmp
> integers = (re)allocating limbs).
Malloc is fast but not nearly as fast as the RTS alloc functions; one
thing I have not started is integrating the replacement library with
GHC, mostly because the replacement library (on par or faster than
GMP) uses SIMD functions whenever possible and they require proper
alignment.
> I also performed the test with the datatype suggested by John
> Meacham (using a gmp library with renamed symbols),
> > data FInteger = FInteger Int# (!ForeignPtr Mpz)
> but it was around 8x slower, maybe due to the ForeignPtr and FFI
> overhead, or due to missing optimizations in the code.
That is quite an interesting result. Are these "safe" foreign imports?
Cheers,
Pete
More information about the Glasgow-haskell-users
mailing list