GMP unboxed number representation

Fri, 4 May 2001 11:32:36 +0200

At 22:23 +0200 2001/05/03, Marcin 'Qrczak' Kowalczyk wrote:
>> I am chatting with the GMP (GNU Multi-Precision number library) people
>> about "unboxed" (no pointer) representations.
>
>Do you mean storing smaller values directly in the structure?

Yes, in one form or another. Let's focus on integers. Then GMP currently uses
  typedef struct {
    int _mp_alloc;
    int _mp_size;
    mp_limb_t *_mp_d;
  } __mpz_struct;
where _mp_d is the pointer to the dynamically allocated number (mp_limb_t
is an integral type of suitable size for the platform).

My idea is to somehow augment it, so that when numbers are small, they do
not use dynamical allocation. Say
  typedef struct {
    union {
      mp_int     _mp_n;
      mp_limb_t* _mp_d;
    };
    int unboxed;
} __mpz_struct;
or some variation of it.

One problems though is the GC (Garbage Collector): My immediate concern is
to write a C++ wrap, in which case the return of an integer is slow, as it
will invoke the copy constructor, causing a dynamic allocation. One way
around it is to use a reference count.

Then, in order to avoid a dynamic allocation for the ref count as well, one
wants the object pointed to by _mp_d above contain all data, say
  typedef struct {
   union {
      mp_int     _mp_n;
      mp_limb_t* _mp_all;	 /* Pointer to all.  */
    };
    int unboxed;
  } __mpz_struct;

  #define _mpz_alloc  _mp_all[0]
  #define _mpz_count  _mp_all[1]
  #define _mpz_size   _mp_all[2]
  #define _mpz_d      _mp_all + 3

But, after all, a ref count is just one primitive form of GC, used in C++
because it is easy to automate, and because it is difficult to implement a
more advanced form of GC.

So here comes the question in, if now GMP should serve say the
implementation of compilers such as GHC, what kind of low number
representations should one use?

One advantage of having it in GMP is that it supports low-level assembler code.

>GHC does that optimization itself. Its Integer representation is thus:
>
>data Integer
>   = S# Int#            -- small integers
>   | J# Int# ByteArray# -- large integers
>
>It keeps integers in the small variant where possible, switching to
>use the gmp variant on overflow.
>
>It's essential that these two variants are visible to ghc, so it
>can often generate appropriate dispatching code instead of physical
>allocation of these structures.

So this suggest that it might be a disadvantage for GHC to have a GMP
mergeing the two number representations. Or is it so that the dispatch code
can only be generated if the input data is sufficiently static, in which it
would be great advantage with it in GMP in the case the data is dynamic?

>The Int# in the J# variant corresponds to _mp_size in __mpz_struct, and
>the ByteArray# holds the pointer to Haskell heap block which contains:
>* a header pointer used by the GC,
>* _mp_alloc,
>* all the libs pointed to by _mp_d.
>
>Changing the representation in gmp would require massive rewriting
>in ghc. I don't know what improvement could be made by the way,
>so let me just describe what is currently going on.

You shouldn't worry that anything going on in GMP would not respect upwards
compatibility. It seems that this type of GMP discussions have taken place
from time to time; I am only the only bringing it up now.

The GMP developers have so far judged that the effort does not outweigh the
advantage of having a type which is faster for small numbers. Do you think
that the speed-up for smaller numbers would be significant for such a
rewrite?

>Code for primitive versions of (+)::Integer->Integer->Integer etc. has
>a comment:
>    /* ToDo: this is shockingly inefficient */

Yes, I think so to. This is one reason for moving it into GMP, because on
an assembler level, one can do more efficient overflow checks, etc.

>It creates MP_INT variables, fills them from arguments of Haskell's
>J# objects (passed separately):
>    arg1._mp_alloc        = d1->words;
>    arg1._mp_size         = (s1);
>    arg1._mp_d            = (unsigned long int *) (BYTE_ARR_CTS(d1));
>    arg2._mp_alloc        = d2->words;
>    arg2._mp_size         = (s2);
>    arg2._mp_d            = (unsigned long int *) (BYTE_ARR_CTS(d2));
>calls the gmp function, and returns the _mp_size of the result together
>with its _mp_d-sizeof(two words) on the Haskell stack.
>
>ghc calls mp_set_memory_functions on startup to let gmp allocate
>ByteArray#s with appropriate header on the Haskell heap.
>
>ghc's Int# has always the same size as a pointer, either 32 or 64 bits.
>The runtime assumes that limbs have this size too, e.g. _mp_alloc here
>is just the number of words in the ByteArray#.
>
>There is no provision for turning the J# representation back into S#
>if it gets small enough. (If there was, it would not be enough to
>check abs(_mp_size)<=1, because mpz with abs(_mp_size)<=1 is able to
>represent one bit more than a single Int#, because the sign bit is
>stored in the _mp_size's sign.)
>
>I think it's essential to not make the S# case slower, but the
>representation of larger numbers could be changed if only somebody
>did the huge task of rewriting everything (with #ifdefs for older
>gmp). I'm not sure how to integrate gmp's view of optimizing small
>integers with ghc's view.

My guess is that the S# representation should be used for situations where
small static size can be detected. Even overflow detection slows down these
primitive a lot (or so I am told). Then when the static detection cannot be
used, one would have to work with the J# representation.

  Hans Aberg