GMP unboxed number representation

Hans Aberg haberg@matematik.su.se
Mon, 7 May 2001 19:45:59 +0200


At 18:45 +0200 2001/05/07, Marcin 'Qrczak' Kowalczyk wrote:
>> GMP has a very interesting multi-precision floating number type as
>> well, but it has the same problem as the integers: It does not use
>> the native double's for small float, so it probably becomes slow.
>
>I think that it's easier to check machine-size int for overflow
>than to check double for overflow or loss of precision, so it's
>impractical to use native double and keep predictable precision.

It is easier, right.

But for floats one can make in most cases an overflow estimate that will
take care of most cases. For the rest of the cases, one can simply carry
out the computations and check if it resulted in a NaN: If the CPU has a
FPU, floats compute quickly, so this will probably be much faster than
using multiprecision anyway.

>> Perhaps GMP should provide small number arithmetic, with a fast
>> way to determine if the answer fits in a small number representation.
>
>Indeed. Ghc has those primops, used for Integer arithmetic, with
>tricky implementations in the case the code is generated via C:
...
[Thanks for code snippet.]
..
>> So this, in the end, suggests that one perhaps should get a better GMP
>> interface for perhaps both small and large number representations. But it
>> should then be so that GHC could use that interface, rather than abusing
>> its internals.
>
>Yes, except that I'm not sure how much harder for ghc would be to
>use a GMP's interface than to do it itself.

I arrived at a suggestion for GMP:
  typedef struct  {
    _mp_int _mp_s;     /* Small number representation. */
    mp_limb_t *_mp_d;  /* Pointer to the limbs and all other data.  */
  } __mpz_struct;
where simply _mp_d is set to NULL if one is using the small number
representation.

(Strictly speaking, this might a new type with a different name in order to
keep GMP upwards compatibility; let's skip that aspect here.)

I find it interesting for several reasons:

First, the small number format is wholly untampered with, so all one is
down to is overflow checks. If a CPU would support instructions for
  +, -, *: 1-register x 1-register -> 2-register
then the small number representations could be made very fast. (But I am
told that many CPU's do not support that.)

But the conversion back and forth from native integral types to GMP types
are also easy:

For example, the signed integral type one merely puts into the _mp_s field
and sets _mp_d to NULL. The unsigned integral type one checks the most
significant bit; if it is 0, one converts it as a signed, if it is 1, one
sets it to the one limb format.

GMP would only need to define the basic arithmetic operations involving
__mpz_struct; the ones involving native integral types could be macroed on
top of them.

Can you tell me the pros and cons relative GHC of this suggestion?

>BTW, it could be nice to have a better way for writing large integer
>literals. Ghc used to convert them from a decimal string, and now it
>builds them from pieces by * and + in base 2^31-1 or 2^63-1. Both
>approaches are a bit ugly. But probably large integer literals are
>not that common for it to matter much.

I am not sure what you mean here; in what context should these integer
literals be entered?


  Hans Aberg