GMP unboxed number representation

Marcin 'Qrczak' Kowalczyk qrczak@knm.org.pl
Fri, 4 May 2001 19:06:53 +0200


On Fri, May 04, 2001 at 11:32:36AM +0200, Hans Aberg wrote:

> Let's focus on integers.

ghc doesn't use other gmp types.

> One problems though is the GC (Garbage Collector):

Indeed. ghc solves it by using gmp format only temporarily to perform
operations, and keeping numbers in its own structures.

> Then, in order to avoid a dynamic allocation for the ref count
> as well, one wants the object pointed to by _mp_d above contain
> all data,

Attaching data to _mp_d can be done without changing gmp, but also
without the ability to mix other library using gmp in the same program:
by providing appropriate allocation functions, which allocate more,
fill the header and return a shifted pointer to gmp. This is what
ghc does.

I think you would have to be careful to not use gmp functions which
change numbers in place. In Haskell there is no problem because the
interface is functional anyway, but a C++ user might want to have
++z performed in place. When the number can be shared, it doesn't
work. So either do it functionally or copy by value. In either case
sometimes you lose. Leaving gmp memory management to the programmer
would make using it awful.

> So here comes the question in, if now GMP should serve say the
> implementation of compilers such as GHC, what kind of low number
> representations should one use?

I don't know how to do it better than currently: that it lets the
program/library which uses gmp manage gmp's memory.

> So this suggest that it might be a disadvantage for GHC to have a
> GMP mergeing the two number representations.

It would be an unnecessary complication, but manageable. Since ghc
creates mpz objects for each operation, it could use only the big
representation of gmp on inputs, handling small numbers itself as
currently. But it must be prepared to receive small result. It would
be silly to wrap it in an allocated memory if gmp produced a small
answer, so primops (implemented in C or assembler) should return
either representation.

It would be a bit ugly. Ghc's primops use a restriced set of types
and rarely allocate memory themselves. For example plusInteger#
has the type
    Int# -> ByteArr# -> Int# -> ByteArr# -> (# Int#, ByteArr# #)
where (# x, y #) means to return both values on the stack.
It is wrapped for integer addition thus:

plusInteger :: Inteter -> Integer -> Integer
plusInteger i1@(S# i) i2@(S# j)  = case addIntC# i j of { (# r, c #) ->
                                   if c ==# 0# then S# r
                                   else toBig i1 + toBig i2 }
plusInteger i1@(J# _ _) i2@(S# _) = i1 + toBig i2
plusInteger i1@(S# _) i2@(J# _ _) = toBig i1 + i2
plusInteger (J# s1 d1) (J# s2 d2) = case plusInteger# s1 d1 s2 d2 of
    (# s, d #) -> J# s d

Returning one of two variants from a primop requires expressing it
similarly to a C struct, without type punning. So it could be this:
    plusInteger# :: Int# -> ByteArr#
                 -> Int# -> ByteArr#
                 -> (# Int#, Int#, ByteArr# #)
plusInteger (J# s1 d1) (J# s2 d2) = case plusInteger# s1 d1 s2 d2 of
    (# 0#, i, _ #) -> S# i
    (# _,  s, d #) -> J# s d

The third component in the S# case is ignored, but the primop must
pass something - a pointer to a dummy Haskell object (this technique
is used in some other primops).

This assumes that the condition for having an alternative
representation is the same in ghc and gmp: fitting into a single
signed integer of the limb size (i.e. pointer size).

Note that the J# representation currently relies on the fact that
mpz can be reconstructed from an integer + an array of words of an
explicit size. It's not possible to change the representation of
mpz and keep the J# representation in ghc.

> Or is it so that the dispatch code can only be generated if the
> input data is sufficiently static, in which it would be great
> advantage with it in GMP in the case the data is dynamic?

It's dynamic too. An advantage of making the variants visible to
the compiler is that a function returning an Integer is compiled to
a code which takes an address of two continuations as an argument,
and enters the continuation corresponding to the constructor returned,
passing it constructor arguments as arguments. It doesn't necessarily
allocate S# or J# node on the heap if it's to be evaluated right
away. This happens to all algebraic types in general (perhaps there
are size constraints, I don't know). So I guess that dispatching is
best done by using an algebraic type.

But I'm not sure. The ghc documentation says that ghc loves
single-constructor types. It was written a long time ago, but perhaps
it's still true. So how to distinguish representations in another
way? Well, perhaps
    data Integer = J# Int# Int# ByteArray#
with a dummy object for the ByteArray# in the short case would work.
I don't know how well: it's ugly but maybe fast, and it seems easier
to integrate with a variant representation in gmp.

In any case better judging of performance of ghc for various
representations should be done by somebody with more knowledge than me.
I don't know what are exact the overheads in various cases.

> You shouldn't worry that anything going on in GMP would not respect
> upwards compatibility.

ghc abuses gmp by using its internals directly. When the representation
of mpz changes, ghc must be changed, even if the C interface remains
compatible.

Even renaming some gmp functions and providing old names as macros
(or something like this) in gmp3 caused compatibility problem for ghc,
and ghc-4.08 requires gmp2, because assembler code calls C functions
directly and cpp couldn't translate the names.

> The GMP developers have so far judged that the effort does not
> outweigh the advantage of having a type which is faster for small
> numbers. Do you think that the speed-up for smaller numbers would
> be significant for such a rewrite?

It is significant. I don't know any numbers, but ghc's handling
Integers was told to get a big speedup after introducing the separate
representation for small integers.

But this might be partially because calling gmp functions has a lot of
indirections, construction of mpz objects etc., so the speedup could
result from avoiding calling gmp at all. Allocation in ghc is fast.

Perhaps in your C++ wrappers you can distinguish the variant above
gmp, and use gmp at all only for large numbers? C++ doesn't have
such indirections, you would surely keep original mpz objects, but
it might be easier than changing gmp.

Note that I'm not against changing gmp. It's not obvious how the
consequences would be...

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK