FFI Bindings to Libraries using GMP

Sun Sep 30 23:43:46 EDT 2007

Hello Benedikt,

I apologise for the late reply.  I am travelling tomorrow but I will  
try to get something an alpha implementation out by this Wednesday.   
For now here are some preliminary answers:

On Sep 28, 2007, at 7:41 AM, Benedikt Huber wrote:

> Am 18.09.2007 um 05:49 schrieb Peter Tanski:
>> The best solution would be to revamp the way Integer types are  
>> implemented, so when possible they are mutable under the hood,  
>> much like using the binary += instead of the ternary +.   
>> Enumerations like the test in [1], below, would not be mutable  
>> unless there were some information such as a good consumer  
>> function that indicated the intermediate values were only  
>> temporarily necessary.
> I'm not sure if I understand this correctly;
> Do you want to expose an unsafe/IO interface for destructive  
> Integer manipulation?

I would not expose it, just optimise it, in the same way as we can  
replace recursion with loops at the Cmm level.  The end result would  
involve re-cycling integer memory so you might say that in some  
equations integers are mutable.  (If it is provable that an integer  
value would not be used again, then it does not seem right not to  
recycle the memory.)

>> The OpenSSL library is not GPL compatible, so there would be  
>> licensing problems for GPL'd system distributions; it is also  
>> relatively slow, though it does have a noticeably constant curve  
>> for exponential functions.
> Maybe you should add a note to
> http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes/ 
> PerformanceMeasurements.
> The statistics suggest that the OpenSSL BN has comparable  
> performance to the GMP, especially for smaller numbers.
>
> Some note about the (very confusing) licensing issues regarding  
> OpenSSL would also be nice.

I will add this to the wiki.  In short, paragraph 10 of the GPL and  
paragraph 11 of the LGPL--here I may have the paragraphs wrong-- 
prohibit placing any additional restrictions on your licensees.   
OpenSSL places an additional restriction on licensees: you cannot use  
the name 'OpenSSL' with regard to your product, so the OpenSSL  
license is incompatible with the GPL/LGPL.

>>> [1] Simple Performance Test on (ghc-darwin-i386-6.6.1):
>> Malloc is fast but not nearly as fast as the RTS alloc functions;  
>> one thing I have not started is integrating the replacement  
>> library with GHC, mostly because the replacement library (on par  
>> or faster than GMP) uses SIMD functions whenever possible and they  
>> require proper alignment.
>
> Ok, it's good to know you're already working on integrating a  
> (native) replacement library.

It's workable for now but I need to finish Toom3, a basic FFT, and  
some specialised division operations.  I also need to give Thorkil  
Naur a crack at it.  All of this has been on hold because I have been  
too selfish and perfectionistic to give anyone what I consider a mess  
and I have been working too many hours to fix it.  (This seems to be  
a common problem of mine; I intend to change that.)

>>> I also performed the test  with the datatype suggested by John  
>>> Meacham  (using a gmp library with renamed symbols),
>>> > data FInteger = FInteger Int# (!ForeignPtr Mpz)
>>> but it was around 8x slower, maybe due to the ForeignPtr and FFI  
>>> overhead, or due to missing optimizations in the code.
>>
>> That is quite an interesting result.  Are these "safe" foreign  
>> imports?
> No. Note that `FInteger' above is even faster than the build-in  
> Integer type for small integers (Ints), so I was talking about  
> allocation of gmp integers. I elaborated the test a little, it now  
> shows consistent results I think [1a]; a lot of performance is lost  
> when doing many allocations using malloc, and even more invoking  
> ForeignPtr finalizers.

I found the same thing when I tried that; malloc is slow compared to  
GC-based alloc.  The ForeignPtr finalizers do not always run since-- 
as far as I know--they are only guaranteed to run before RTS shutdown.

> I'm still interested in sensible solutions to Bug #311, and maybe  
> nevertheless simple switching to standard gmp allocation (either   
> with finalizers or copying limbs when entering/leaving the gmp) via  
> a compile flag would be the right thing for many applications.
> I'm also looking forward to see the results of the replacement  
> library you're trying to integrate, and those of haskell Integer
> implementations.

The fastest interim solution I can come up with for you would be to  
use Isaac Dupree's Haskell-based integer library and set up  
preprocessor defines so you could build ghc (HEAD) from source and  
use that.  Would that be sufficient for now?

Cheers,
Pete

> [1a] Integer Allocation Test
> > allocTest :: Int -> `some Integral Type T'
> > allocTest iterations = (iterateT iterations INIT) where
> >   iterateT 0 v = v
> >   iterateT k v = v `seq` iterateT (k-1) (v+STEP)
>
> - Small BigNums Allocation Test (INIT = 2^31, STEP = 10^5, k=10^6)
> Results (utime samples.sort[3..7].average) on darwin-i386 (dualcore):
>     0.04s destructive-update C implementation
>     0.19s  with T = Integer
>     0.71s  non-destructive-update C implementation using malloc
>     with T = FInteger {-# UNPACK -#} !Int# {-# UNPACK -#} ! 
> (ForeignPtr Mpz)
> 	0.90s  with newForeignPtr_ (no finalizer, space leak)
>     	1.87s with Foreign.Concurrent.newForeignPtr mpz do 
> { hs_mpz_clear mpz; free mpz}
>     	1.94s  with newForeignPtr free_mpz_c_impl mpz
>
> - Small Integers Allocation Test (INIT=0,STEP=4,k=2*10^8)
> Results (utime samples.sort[3..7].average) on darwin-i386 (dualcore):
>    0.67s for Int
>    2.54s for FInteger
>    3.62s for Integer