GMP unboxed number representation

Marcin 'Qrczak' Kowalczyk qrczak@knm.org.pl
Thu, 3 May 2001 22:23:03 +0200


[Wiadomość wysłana również na grupę dyskusyjną.]

Thu, 3 May 2001 20:48:11 +0200, Hans Aberg <haberg@matematik.su.se> pisze:

> I am chatting with the GMP (GNU Multi-Precision number library) people
> about "unboxed" (no pointer) representations.

Do you mean storing smaller values directly in the structure?

> What do you experts on such implementations think would be a suitable
> low-level GMP implementation; that is, that would make the use in
> say GHC easier?

GHC does that optimization itself. Its Integer representation is thus:

data Integer
   = S# Int#            -- small integers
   | J# Int# ByteArray# -- large integers

It keeps integers in the small variant where possible, switching to
use the gmp variant on overflow.

It's essential that these two variants are visible to ghc, so it
can often generate appropriate dispatching code instead of physical
allocation of these structures.

The Int# in the J# variant corresponds to _mp_size in __mpz_struct, and
the ByteArray# holds the pointer to Haskell heap block which contains:
* a header pointer used by the GC,
* _mp_alloc,
* all the libs pointed to by _mp_d.

Changing the representation in gmp would require massive rewriting
in ghc. I don't know what improvement could be made by the way,
so let me just describe what is currently going on.

Code for primitive versions of (+)::Integer->Integer->Integer etc. has
a comment:
    /* ToDo: this is shockingly inefficient */

It creates MP_INT variables, fills them from arguments of Haskell's
J# objects (passed separately):
    arg1._mp_alloc        = d1->words;
    arg1._mp_size         = (s1);
    arg1._mp_d            = (unsigned long int *) (BYTE_ARR_CTS(d1));
    arg2._mp_alloc        = d2->words;
    arg2._mp_size         = (s2);
    arg2._mp_d            = (unsigned long int *) (BYTE_ARR_CTS(d2));
calls the gmp function, and returns the _mp_size of the result together
with its _mp_d-sizeof(two words) on the Haskell stack.

ghc calls mp_set_memory_functions on startup to let gmp allocate
ByteArray#s with appropriate header on the Haskell heap.

ghc's Int# has always the same size as a pointer, either 32 or 64 bits.
The runtime assumes that limbs have this size too, e.g. _mp_alloc here
is just the number of words in the ByteArray#.

There is no provision for turning the J# representation back into S#
if it gets small enough. (If there was, it would not be enough to
check abs(_mp_size)<=1, because mpz with abs(_mp_size)<=1 is able to
represent one bit more than a single Int#, because the sign bit is
stored in the _mp_size's sign.)

I think it's essential to not make the S# case slower, but the
representation of larger numbers could be changed if only somebody
did the huge task of rewriting everything (with #ifdefs for older
gmp). I'm not sure how to integrate gmp's view of optimizing small
integers with ghc's view.

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK